[mpich2-dev] mpiexec (+mpdboot, mpdcheck...) problem
Rajeev Thakur
thakur at mcs.anl.gov
Mon Oct 19 20:37:10 CDT 2009
Did you follow all the debugging steps with mpdcheck as described in
Appendix A.2 of the installation guide?
Rajeev
_____
From: mpich2-dev-bounces at mcs.anl.gov
[mailto:mpich2-dev-bounces at mcs.anl.gov] On Behalf Of Jovana Knezevic
Sent: Monday, October 19, 2009 9:14 AM
To: mpich2-dev at mcs.anl.gov
Subject: [mpich2-dev] mpiexec (+mpdboot, mpdcheck...) problem
Hello everyone!
I am trying to run my parallel program on 9 machines, each with 2
Opteron processors. I access all machines via ssh, and I can 'ssh'
from one machine to another without a password.
The mpdboot command (run as described in the documentation) fails with
an error similar to one that other users on this list have reported:
mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on
lxsrv171; recvd output={}
I tried mpdcheck -l to see what would happen, and it produced no
output (is this good or bad?).
When I 'manually' set the hosts and ports on machines lxsrv171 to
lxsrv178 with
mpd -n -h host -p port
(where host and port are the values I got from mpdtrace -l on lxsrv170,
the machine that I am calling mpiexec from), execution was finally
possible. However, it did not give the expected results: it seems that
most of the processes are not communicating with each other.
(I tried a simple "ring" program to make sure this is not due to my
code, but it behaves in exactly the same way.)
BTW, my hostfile looks like this:
lxsrv170:2
lxsrv171:2
lxsrv172:2
I would be most grateful if someone could help. Thanks in advance.
Regards,
Jovana
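
As a rough illustration of the setup described in the message above, the
following Python sketch parses a hostfile in the host:ncpus form shown there
and lists, for each machine other than the one mpiexec is called from, the
mpd command quoted in the message. The function names, the default of one
process per host when no count is given, and the port number 41234 are
assumptions made for this example; the real port has to be taken from
mpdtrace -l on the entry host. The sketch only builds command strings and
does not start anything.

```python
def parse_hostfile(text):
    """Parse lines of the form 'host' or 'host:ncpus' into (host, ncpus) pairs.

    Blank lines are skipped. A host without a count is assumed to
    provide one slot (an assumption of this sketch).
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts.append((name.strip(), int(count) if count.strip() else 1))
    return hosts


def total_slots(hosts):
    """Sum the per-host process counts."""
    return sum(ncpus for _, ncpus in hosts)


def manual_mpd_commands(hosts, entry_host, entry_port):
    """Return (host, command) pairs for every host except the entry host.

    The command uses the same form as in the message:
    mpd -n -h host -p port
    """
    command = f"mpd -n -h {entry_host} -p {entry_port}"
    return [(name, command) for name, _ in hosts if name != entry_host]


if __name__ == "__main__":
    # Hostfile contents as quoted in the message.
    example = "lxsrv170:2\nlxsrv171:2\nlxsrv172:2\n"
    hosts = parse_hostfile(example)
    print("total slots:", total_slots(hosts))
    # 41234 is a placeholder port, not a value from the original report.
    for host, cmd in manual_mpd_commands(hosts, "lxsrv170", 41234):
        print(f"on {host}: {cmd}")
```

Comparing the total slot count with the number of processes requested from
mpiexec, and checking that every listed host actually appears in the output
of mpdtrace -l, is one way to confirm that all machines really joined the
same ring.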
...