[mpich2-dev] mpiexec (+mpdboot, mpdcheck...) problem

Jovana Knezevic jovana.knezevic.83 at gmail.com
Mon Oct 19 09:14:25 CDT 2009


Hello everyone!

I am trying to run my parallel program on 9 machines, each with 2 Opteron
processors. I access all machines via ssh, and I can ssh from one
machine to another without a password.
The mpdboot command (run as described in the documentation) produced an
error similar to one that other users on this list have reported:
mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on lxsrv171;
recvd output={}

I tried mpdcheck -l to see what would happen and it didn't produce any
output (is this good or bad?)
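
For reference, the kind of basic network checks that matter here can be
sketched in a few lines of Python. This is only a rough stand-in written
for illustration, not mpdcheck itself; the function names resolve_report
and can_connect are hypothetical. It checks whether a host name resolves,
whether it resolves to a loopback address (which I understand can confuse
daemons that must be reachable from other machines), and whether a TCP
connection to a given host and port can be opened.

```python
import socket


def resolve_report(hostname):
    """Resolve hostname and flag loopback results (hypothetical helper)."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror as exc:
        # Name could not be resolved at all.
        return {"host": hostname, "ip": None, "loopback": False, "error": str(exc)}
    # A name that maps to 127.x.x.x is only reachable from the local machine.
    return {"host": hostname, "ip": ip, "loopback": ip.startswith("127."), "error": None}


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running resolve_report on each name in the hostfile from each machine, and
can_connect against the host and port printed by mpdtrace -l, would show
whether the machines can see each other at the TCP level.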

When I started mpd manually on machines lxsrv171 to lxsrv178 with
mpd -n -h host -p port
where host and port are the values reported by
mpdtrace -l
on the machine I call mpiexec from (lxsrv170), execution was finally
possible. However, it did not give the expected results: it seems that
most of the processes are not communicating with each other.
(I tried a simple "ring" program to make sure this is not due to my code,
but it behaves exactly the same.)

BTW, my hostfile looks like
lxsrv170:2
lxsrv171:2
lxsrv172:2
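
To make the format explicit: each line is a host name, optionally followed
by a colon and the number of processes to place on that host. A minimal
Python sketch of reading such a file is below. The function name
parse_hostfile is hypothetical, and the handling of comments and of extra
tokens after the host entry is my assumption, not taken from the mpd
source.

```python
def parse_hostfile(text):
    """Parse 'host[:ncpus]' lines into a list of (host, ncpus) tuples."""
    hosts = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are skipped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Only the first whitespace-separated token is the host entry.
        entry = line.split()[0]
        name, _, count = entry.partition(":")
        # A missing count is treated as one process on that host.
        hosts.append((name, int(count) if count else 1))
    return hosts


hostfile = "lxsrv170:2\nlxsrv171:2\nlxsrv172:2\n"
entries = parse_hostfile(hostfile)
total_slots = sum(n for _, n in entries)
```

For the three lines shown above this gives three (host, 2) entries and a
total of 6 process slots.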

I would be most grateful if someone could help. Thanks in advance.

Regards,
Jovana