[mpich-discuss] Can't boot mpd anymore after cluster reboot
Dave Goodell
goodell at mcs.anl.gov
Wed Jan 27 13:48:35 CST 2010
On Jan 27, 2010, at 1:38 PM, Thomas Ruedas wrote:
> Rajeev Thakur wrote:
>> Probably something changed with the networking settings on the
>> machines. Is there a firewall? You can use the mpdcheck utility to
>> debug the problem as described in the Appendix of the installation
>> guide.
> I have done a couple of tests like those in the appendix, and it
> seems that the nodes can communicate. I can even start a ring and
> execute e.g. hostname on two nodes via mpiexec if I start the mpds
> on the nodes separately by hand, the one on the second node as a
> client to the one on the first. Nonetheless, mpdboot specifically
> does not work.
> Any ideas?
The mpdcheck utility is usually the best method for diagnosing
networking problems that will interfere with mpd and mpdboot.
Sometimes "mpdboot -v <ORIGINAL_ARGS_HERE>" also helps.
Alternatively, you can try using the hydra process manager instead
(mpiexec.hydra):
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
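
If you want a quick sanity check that does not depend on mpd at all, the
kind of test mpdcheck performs can be roughly approximated with a few
lines of Python. This is only a sketch and not mpdcheck's actual
implementation; the function names below are made up for illustration.
It checks that a hostname resolves forward and backward, and that a TCP
connection to a port on another node succeeds.

```python
import socket


def check_name_resolution(hostname):
    """Resolve hostname to an IPv4 address and try the reverse lookup.

    Returns (ip, reverse_name); reverse_name is None if the reverse
    lookup fails. Raises socket.gaierror if the forward lookup fails.
    """
    ip = socket.gethostbyname(hostname)
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        reverse_name = None
    return ip, reverse_name


def open_listener(host=""):
    """Listen on an OS-chosen ephemeral port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, call open_listener() on the first node and note the port,
then call can_connect(first_node, port) from the second node, and
repeat in the other direction. If check_name_resolution() returns a
loopback address or an unexpected name for a node's own hostname, that
is worth looking at as well.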
-Dave