[mpich-discuss] Can't boot mpd anymore after cluster reboot

Rajeev Thakur thakur at mcs.anl.gov
Wed Jan 27 12:25:33 CST 2010


Probably something changed in the network settings on the machines.
Is there a firewall? You can use the mpdcheck utility to debug the
problem, as described in the appendix of the installation guide.
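If mpdcheck is not at hand, the core question it answers ("can a TCP connection be opened from one node to a port on another?") can be approximated with a small script. The sketch below is not mpdcheck and does not reproduce its output. It is a hypothetical stand-in: the function names `start_listener`, `serve_once` and `try_connect` are invented for illustration. Run the listener on one node, note the port it prints, then try connecting to that host and port from the other node. A failure or timeout points at a firewall or routing/name-resolution problem rather than at mpd itself.

```python
import socket

GREETING = b"netcheck-ok\n"  # hypothetical fixed reply sent by the listener


def start_listener(host="0.0.0.0"):
    """Open a listening TCP socket on an OS-chosen free port.

    Returns (socket, port) so the port can be passed to the client side.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))  # port 0 lets the kernel pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def serve_once(srv):
    """Accept a single connection, send the greeting, then close it."""
    conn, _addr = srv.accept()
    with conn:
        conn.sendall(GREETING)


def try_connect(host, port, timeout=5.0):
    """Return True if we can connect and read the full greeting.

    Any socket error (refused, unreachable, timeout) returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as c:
            chunks = []
            while True:  # read until the listener closes the connection
                data = c.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks) == GREETING
    except OSError:
        return False
```

On the first node you would call `start_listener()`, print the port, and call `serve_once(srv)`. On the second node, `try_connect("head.gl.ciw.edu", port)` should return `True` if nothing is blocking the path. Testing in both directions is worthwhile, since a firewall may only block one of them.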

Rajeev 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Thomas Ruedas
> Sent: Wednesday, January 27, 2010 12:12 PM
> To: MPICH Discussion List
> Subject: [mpich-discuss] Can't boot mpd anymore after cluster reboot
> 
> Hi,
> I wonder why I can't boot mpds on the cluster anymore. If I try to do
> it the way (I think) I used to do it, by executing on the head node:
> mpdboot --totalnum=43 --file=machines2
> 
> I get this error:
> mpdboot_head.gl.ciw.edu (handle_mpd_output 415): failed to connect to 
> mpd on head.gl.ciw.edu
> 
> but I can execute a single mpd just as
> mpd &
> on the head node. When I do that and try afterwards:
> mpdboot --totalnum=43 --file=machines2
> 
> I get this error:
> mpdboot_head.gl.ciw.edu (handle_mpd_output 406): failed to handshake 
> with mpd on head.gl.ciw.edu; recvd output={}
> 
> and all mpds die. What is wrong here?
> Thomas
> -- 
> -----------------------------------
> Thomas Ruedas
> Department of Terrestrial Magnetism
> Carnegie Institution of Washington
> http://www.dtm.ciw.edu/users/ruedas/
