[mpich-discuss] Can't boot mpd anymore after cluster reboot
    Dave Goodell
    goodell at mcs.anl.gov
    Wed Jan 27 13:48:35 CST 2010

On Jan 27, 2010, at 1:38 PM, Thomas Ruedas wrote:
> Rajeev Thakur wrote:
>> Probably something changed with the networking settings on the
>> machines.
>> Is there a firewall? You can use the mpdcheck utility to debug the
>> problem as described in the Appendix of the installation guide.
> I have done a couple of tests like those in the appendix, and it  
> seems that the nodes can communicate. I can even start a ring and  
> execute e.g. hostname on two nodes via mpiexec if I start the mpds  
> on the nodes separately by hand, the one on the second node as a  
> client to the one on the first. Nonetheless, mpdboot specifically  
> does not work.
> Any ideas?
The mpdcheck utility is usually the best method for diagnosing  
networking problems that will interfere with mpd and mpdboot.   
Sometimes "mpdboot -v <ORIGINAL_ARGS_HERE>" also helps.
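
A rough sketch of what that diagnosis might look like, in case it helps.
The host names (node1, node2), the port, and the hosts file name
(mpd.hosts) are placeholders, and the flags are the ones I recall from the
installer's guide appendix, so check them against "mpdcheck -h" and
"mpdboot -h" on your install. The script only prints the commands unless
you set DRY_RUN=0.

```shell
#!/bin/sh
# Dry-run sketch: prints each diagnostic command; set DRY_RUN=0 to execute.
# All host names, the port, and the hosts file name are placeholders.
DRY_RUN="${DRY_RUN:-1}"
HOSTS_FILE="mpd.hosts"     # assumed: one host name per line
SERVER_HOST="node1"        # assumed: host where "mpdcheck -s" is running
SERVER_PORT="12345"        # assumed: use the port that "mpdcheck -s" prints

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# 1. Check name resolution and reachability for every host in the file.
run mpdcheck -f "$HOSTS_FILE" -ssh

# 2. Two-node test. On node1 start a listener (it prints a host and port
#    and then waits), so run that one by hand in its own terminal:
#        mpdcheck -s
#    Then, on node2, connect to the host and port it printed:
run mpdcheck -c "$SERVER_HOST" "$SERVER_PORT"

# 3. Re-run mpdboot verbosely with the same arguments that failed.
run mpdboot -v -n 2 -f "$HOSTS_FILE"
```

If step 2 works in one direction but not the other, that usually points
at a firewall or at a host name resolving to the wrong address.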
Alternatively, you can try using the hydra process manager instead  
(mpiexec.hydra):
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
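
As a minimal sketch of the hydra route (the file name "hosts" and the
node names are placeholders, and this assumes your build installed
mpiexec.hydra): hydra needs no mpd ring, so there is no mpdboot step. As
written the script only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Dry-run sketch: prints the hydra launch command; set DRY_RUN=0 to execute.
# "hosts" is a placeholder file with one host name per line, e.g.
#     node1
#     node2
DRY_RUN="${DRY_RUN:-1}"
HOSTS_FILE="hosts"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Launch two processes across the listed hosts; each prints its host name.
run mpiexec.hydra -f "$HOSTS_FILE" -n 2 hostname
```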
-Dave
    
    