[mpich-discuss] Can't boot mpd anymore after cluster reboot
Rajeev Thakur
thakur at mcs.anl.gov
Wed Jan 27 12:25:33 CST 2010
Probably something changed with the networking settings on the machines.
Is there a firewall? You can use the mpdcheck utility to debug the
problem as described in the Appendix of the installation guide.
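
If mpdcheck is not at hand, a rough stand-in for the same kind of test is
to open a listening TCP socket on one machine and try to connect to it
from another. The sketch below is not mpdcheck and not part of MPICH; the
function names (listen_once, accept_and_echo, can_connect) are
hypothetical, and it assumes only that Python is available on the nodes.
If the connection is refused or times out between two nodes, a firewall
or a changed network setting is a likely suspect.

```python
import socket
import threading


def listen_once(host="", port=0):
    """Open a listening TCP socket; port=0 lets the OS pick a free port.

    Returns the socket and the port number actually bound.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def accept_and_echo(srv):
    """Accept a single connection, echo back what was received, then close."""
    conn, _addr = srv.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)
    srv.close()


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP round trip to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ping")
            return s.recv(1024) == b"ping"
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

To try it, call listen_once() and accept_and_echo() on the head node,
note the port that was chosen, and then call
can_connect("head.gl.ciw.edu", port) from a compute node (and the other
way around). A False result in either direction points at the network
rather than at mpd itself.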
Rajeev
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Thomas Ruedas
> Sent: Wednesday, January 27, 2010 12:12 PM
> To: MPICH Discussion List
> Subject: [mpich-discuss] Can't boot mpd anymore after cluster reboot
>
> Hi,
> I wonder why I can't boot mpds on the cluster anymore. If I try to do it
> the way (I think) I used to do it, by executing on the head node:
> mpdboot --totalnum=43 --file=machines2
>
> I get this error:
> mpdboot_head.gl.ciw.edu (handle_mpd_output 415): failed to connect to
> mpd on head.gl.ciw.edu
>
> but I can execute a single mpd just as
> mpd &
> on the head node. When I do that and try afterwards:
> mpdboot --totalnum=43 --file=machines2
>
> I get this error:
> mpdboot_head.gl.ciw.edu (handle_mpd_output 406): failed to handshake
> with mpd on head.gl.ciw.edu; recvd output={}
>
> and all mpds die. What is wrong here?
> Thomas
> --
> -----------------------------------
> Thomas Ruedas
> Department of Terrestrial Magnetism
> Carnegie Institution of Washington
> http://www.dtm.ciw.edu/users/ruedas/