[mpich-discuss] Can't boot mpd anymore after cluster reboot

Dave Goodell goodell at mcs.anl.gov
Wed Jan 27 13:48:35 CST 2010


On Jan 27, 2010, at 1:38 PM, Thomas Ruedas wrote:

> Rajeev Thakur wrote:
>> Probably something changed with the networking settings on the
>> machines. Is there a firewall? You can use the mpdcheck utility to
>> debug the problem as described in the Appendix of the installation
>> guide.
> I have done a couple of tests like those in the appendix, and it  
> seems that the nodes can communicate. I can even start a ring and  
> execute e.g. hostname on two nodes via mpiexec if I start the mpds  
> on the nodes separately by hand, the one on the second node as a  
> client to the one on the first. Nonetheless, mpdboot specifically  
> does not work.
> Any ideas?

The mpdcheck utility is usually the best way to diagnose networking
problems that interfere with mpd and mpdboot. Sometimes rerunning
mpdboot in verbose mode, "mpdboot -v <ORIGINAL_ARGS_HERE>", also helps,
because it shows more of what mpdboot is doing before it fails.
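If it helps to see what the mpdcheck server/client test in the
installation guide's appendix is probing (one node listens on a TCP
port, the other connects to it, then you swap roles), here is a rough
Python sketch of that idea. It is NOT part of MPICH2 and does not
replace mpdcheck; the function names listen_once, serve_one and
try_connect are hypothetical, and it only checks plain TCP reachability
in one direction per run.

```python
import socket
import threading


def listen_once(host="0.0.0.0", port=0, timeout=10.0):
    """Server half: bind and listen; port=0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.settimeout(timeout)  # don't block forever if nobody connects
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def serve_one(srv):
    """Accept a single client, read one line, reply 'ack', then close."""
    try:
        conn, _ = srv.accept()
        with conn:
            buf = b""
            while b"\n" not in buf:
                chunk = conn.recv(64)
                if not chunk:
                    break
                buf += chunk
            conn.sendall(b"ack\n")
    finally:
        srv.close()


def try_connect(host, port, timeout=5.0):
    """Client half: connect, say hello, expect 'ack'. True on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as c:
            c.sendall(b"hello\n")
            reply = b""
            while True:
                chunk = c.recv(64)
                if not chunk:  # server closed the connection
                    break
                reply += chunk
            return reply == b"ack\n"
    except OSError:  # refused, unreachable, timed out, unresolvable, ...
        return False


if __name__ == "__main__":
    # Loopback demo. On a real cluster you would run the listen/serve half
    # on node A and try_connect("<nodeA>", port) on node B, then swap roles.
    srv, port = listen_once("127.0.0.1")
    t = threading.Thread(target=serve_one, args=(srv,))
    t.start()
    print("loopback ok:", try_connect("127.0.0.1", port))
    t.join()
```

A failure in only one direction usually points at a firewall or at a
hostname that resolves differently on the two nodes, which is the kind
of problem mpdcheck is meant to surface.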

Alternatively, you can try using the hydra process manager instead  
(mpiexec.hydra):

http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager

-Dave
