[mpich2-dev] mpiexec (+mpdboot, mpdcheck...) problem

Rajeev Thakur thakur at mcs.anl.gov
Mon Oct 19 20:37:10 CDT 2009


Did you follow all the debugging steps with mpdcheck as described in
Appendix A.2 of the installation guide?
 
Rajeev


  _____  

From: mpich2-dev-bounces at mcs.anl.gov
[mailto:mpich2-dev-bounces at mcs.anl.gov] On Behalf Of Jovana Knezevic
Sent: Monday, October 19, 2009 9:14 AM
To: mpich2-dev at mcs.anl.gov
Subject: [mpich2-dev] mpiexec (+mpdboot, mpdcheck...) problem


Hello everyone!

I am trying to run my parallel program on 9 machines, each with 2
Opteron processors. I access all machines via ssh, and I can ssh
from one machine to another without a password.
The mpdboot command (run as described in the documentation) failed with
an error similar to one that other users on this list have reported:
mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on
lxsrv171; recvd output={}

I tried mpdcheck -l to see what would happen, and it produced no
output (is this good or bad?).

When I started the mpds 'manually' on machines lxsrv171 to lxsrv178 with
mpd -n -h host -p port, using the host and port reported by
mpdtrace -l on the machine that I call mpiexec from (lxsrv170),
execution was finally possible. However, it did not give the expected
results: it seems that most of the processes are not communicating
with each other.
(I tried a simple "ring" program to make sure this is not due to my
code, but it behaves exactly the same).
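
For reference, the manual startup described above could be scripted roughly as follows. This is only a sketch: it assumes that mpdtrace -l prints one line of the form "host_port (ip)" for the local mpd, and the port 43210 and the address 10.0.0.1 in the usage lines are made-up placeholders, as are the function names. The mpd flags are the ones quoted in the message, nothing more.

```python
import re


def parse_mpdtrace_line(line):
    """Split one assumed 'host_port (ip)' line into (host, port)."""
    # Greedy match so that a hostname containing '_' still splits
    # at the last underscore before the port digits.
    m = re.match(r"^(\S+)_(\d+)(?:\s+\(.*\))?\s*$", line.strip())
    if m is None:
        raise ValueError("unrecognized mpdtrace -l line: %r" % line)
    return m.group(1), int(m.group(2))


def mpd_join_commands(trace_line, workers):
    """Map each worker host to the mpd command that joins the ring.

    The command string mirrors the form used in the message:
    mpd -n -h host -p port
    """
    host, port = parse_mpdtrace_line(trace_line)
    return {w: "mpd -n -h %s -p %d" % (host, port) for w in workers}


if __name__ == "__main__":
    # Hypothetical mpdtrace -l output from lxsrv170 (port and IP invented).
    trace = "lxsrv170_43210 (10.0.0.1)"
    workers = ["lxsrv17%d" % i for i in range(1, 9)]  # lxsrv171..lxsrv178
    for worker, cmd in mpd_join_commands(trace, workers).items():
        print(worker + ": " + cmd)
```

Each printed command would then be run on the named machine, for example over ssh. If the mpds still cannot reach each other afterwards, the script changes nothing about that; it only removes copy-and-paste mistakes in the host and port.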

BTW, my hostfile looks like
lxsrv170:2
lxsrv171:2
lxsrv172:2
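
A quick way to sanity-check a hostfile in this host:ncpus form is to parse it and count the process slots it offers. The sketch below assumes one entry per line and treats a missing ":ncpus" suffix as a single slot (that default is an assumption); the function name is made up for illustration, and only the three entries quoted above are used.

```python
def parse_hostfile(text):
    """Return a list of (host, ncpus) pairs from 'host:ncpus' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, count = line.partition(":")
        # Assumption: an entry without ':ncpus' counts as one slot.
        entries.append((name, int(count) if sep else 1))
    return entries


if __name__ == "__main__":
    hostfile = "lxsrv170:2\nlxsrv171:2\nlxsrv172:2\n"
    entries = parse_hostfile(hostfile)
    total = sum(n for _, n in entries)
    print("%d hosts, %d process slots" % (len(entries), total))
```

Comparing the slot total with the -n value passed to mpiexec shows whether the hostfile actually covers the number of processes being requested.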

I would be most grateful if someone could help. Thanks in advance.

Regards,
Jovana
...

 


