[mpich-discuss] Can't boot mpd anymore after cluster reboot

Dave Goodell goodell at mcs.anl.gov
Wed Jan 27 14:16:56 CST 2010


Do all nodes have a ".mpd.conf" with the same content in your home
directory (usually via an NFS mount)?  Did one of your nodes come back
with a bad mount for the home directories?  Are you out of disk space
in /tmp on one of the machines?  Given that you seem to be able
to start mpds by hand and form a ring, I'm guessing that
none of these is your problem.  The only other thing to check is
that you can still ssh between the nodes without a password, but IIRC
mpdcheck is supposed to check this for you already.
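
If it saves some typing, the checks above could be scripted roughly as
follows.  This is only a sketch: check_nodes is a made-up helper (not
part of MPICH2), and it assumes that cksum and df are available on every
node and that each node's ".mpd.conf" lives in $HOME.

```shell
# check_nodes HOST...  (hypothetical helper, not part of MPICH2)
# For each host: verify passwordless ssh, checksum ~/.mpd.conf, and
# report free space in /tmp.  Returns non-zero if any check fails.
check_nodes() {
    ref=""      # checksum seen on the first reachable host
    status=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ! out=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
                'cksum < "$HOME/.mpd.conf" && df -P /tmp | tail -n 1'); then
            echo "$host: ssh failed or .mpd.conf unreadable"
            status=1
            continue
        fi
        sum=$(printf '%s\n' "$out" | sed -n 1p)
        # Column 4 of "df -P" is the available space in 1024-byte blocks.
        avail=$(printf '%s\n' "$out" | sed -n 2p | awk '{print $4}')
        echo "$host: .mpd.conf cksum=[$sum] /tmp free=${avail}K"
        if [ -z "$ref" ]; then
            ref=$sum
        elif [ "$sum" != "$ref" ]; then
            echo "$host: .mpd.conf differs from the first host"
            status=1
        fi
    done
    return $status
}
```

You would call it as "check_nodes xenia.gl.ciw.edu node2", where node2
stands in for whatever else is listed in your mpd.hosts.  A host that
prompts for a password, has no readable .mpd.conf, or reports a
different checksum is flagged, and the function returns non-zero.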

Other than that, I'm not sure what to tell you.  These error messages  
are unfortunately non-specific; several different problems can cause  
that output.  Both mpd and mpdboot are very temperamental programs,  
which is one of the many reasons that we are replacing them with Hydra.

-Dave

On Jan 27, 2010, at 2:04 PM, Thomas Ruedas wrote:

> Dave Goodell wrote:
>> The mpdcheck utility is usually the best method for diagnosing  
>> networking problems that will interfere with mpd and mpdboot.   
>> Sometimes "mpdboot -v <ORIGINAL_ARGS_HERE>" also helps.
> Ok, so I tried this:
> mpdboot -v -n 2 -f mpd.hosts --ncpus=2
> running mpdallexit on xenia.gl.ciw.edu
> LAUNCHED mpd on xenia.gl.ciw.edu  via
> mpdboot_xenia.gl.ciw.edu (handle_mpd_output 406): failed to  
> handshake with mpd on xenia.gl.ciw.edu; recvd output={}
>
> I don't know whether there is a linebreak after "via" in the 2nd  
> line (i.e. if it actually reads "via mpdboot") or if something is  
> actually missing, but that's all I get.
>> Alternatively, you can try using the hydra process manager instead  
>> (mpiexec.hydra):
> This version *should* work - I have used it before the shutdown this  
> way, and no updates have been made since.
> Thomas
> -- 
> -----------------------------------
> Thomas Ruedas
> Department of Terrestrial Magnetism
> Carnegie Institution of Washington
> http://www.dtm.ciw.edu/users/ruedas/
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


