[mpich-discuss] Can't boot mpd anymore after cluster reboot
Dave Goodell
goodell at mcs.anl.gov
Wed Jan 27 14:16:56 CST 2010
Do all nodes have a ".mpd.conf" with the same content in your home
directory (usually via an NFS mount)? Did one of your nodes come back
with a bad mount for the home directories? Are you out of disk space
in /tmp on one of the machines? Given that you seem to be able
to start mpds by hand and form a ring, I'm guessing that
none of these is your problem. The only other thing to check is
that you can still ssh between the nodes without a password, but IIRC
mpdcheck is supposed to check this for you already.
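
If you would rather script those checks than run them by hand, something
along these lines might do as a rough sketch. It is not part of MPICH; the
function names, the 10 MB threshold and the assumed mpd.hosts layout (one
host per line, optional ":ncpus" suffix, "#" comments) are all illustrative
assumptions. It relies only on ssh, md5sum and df being available on the
nodes.

```python
import subprocess


def parse_hosts(text):
    """Extract host names from mpd.hosts-style text.

    Assumed layout: one host per line, optional ':ncpus' suffix,
    optional trailing fields, '#' starts a comment.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def remote(host, command, timeout=20):
    """Run a command over ssh; BatchMode makes ssh fail instead of prompting."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             host, command],
            capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return result.returncode == 0, result.stdout.strip()


def check_hosts(hosts, run=remote, min_free_kb=10240):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    checksums = {}
    for host in hosts:
        # 1. Passwordless ssh must work before anything else can be checked.
        ok, _ = run(host, "true")
        if not ok:
            problems.append(f"{host}: passwordless ssh failed")
            continue
        # 2. Checksum ~/.mpd.conf; this also catches a bad home mount.
        ok, out = run(host, "md5sum < ~/.mpd.conf")
        if ok and out:
            checksums[host] = out.split()[0]
        else:
            problems.append(f"{host}: cannot read ~/.mpd.conf")
        # 3. Free space in /tmp (POSIX df format, 4th column = available KB).
        ok, out = run(host, "df -Pk /tmp | tail -n 1")
        fields = out.split()
        if not ok or len(fields) < 4 or not fields[3].isdigit():
            problems.append(f"{host}: could not read free space in /tmp")
        elif int(fields[3]) < min_free_kb:
            problems.append(f"{host}: only {fields[3]} KB free in /tmp")
    if len(set(checksums.values())) > 1:
        problems.append("~/.mpd.conf content differs between nodes")
    return problems
```

To use it, read your host file and print whatever comes back, e.g.
check_hosts(parse_hosts(open("mpd.hosts").read())). An empty list only
means these particular checks passed, not that mpdboot will work.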
Other than that, I'm not sure what to tell you. These error messages
are unfortunately non-specific; several different problems can cause
that output. Both mpd and mpdboot are very temperamental programs,
which is one of the many reasons that we are replacing them with Hydra.
-Dave
On Jan 27, 2010, at 2:04 PM, Thomas Ruedas wrote:
> Dave Goodell wrote:
>> The mpdcheck utility is usually the best method for diagnosing
>> networking problems that will interfere with mpd and mpdboot.
>> Sometimes "mpdboot -v <ORIGINAL_ARGS_HERE>" also helps.
> Ok, so I tried this:
> mpdboot -v -n 2 -f mpd.hosts --ncpus=2
> running mpdallexit on xenia.gl.ciw.edu
> LAUNCHED mpd on xenia.gl.ciw.edu via
> mpdboot_xenia.gl.ciw.edu (handle_mpd_output 406): failed to
> handshake with mpd on xenia.gl.ciw.edu; recvd output={}
>
> I don't know whether there is a linebreak after "via" in the 2nd
> line (i.e. if it actually reads "via mpdboot") or if something is
> actually missing, but that's all I get.
>> Alternatively, you can try using the hydra process manager instead
>> (mpiexec.hydra):
> This version *should* work - I have used it before the shutdown this
> way, and no updates have been made since.
> Thomas
> --
> -----------------------------------
> Thomas Ruedas
> Department of Terrestrial Magnetism
> Carnegie Institution of Washington
> http://www.dtm.ciw.edu/users/ruedas/
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss