[mpich-discuss] Trouble with new installation -- failed to connect to mpd

Rajeev Thakur thakur at mcs.anl.gov
Mon Dec 1 13:01:33 CST 2008


This means a simple client on one machine was not able to connect to a
simple server on another machine in the cluster (independent of MPICH or
MPD). Can you check the network settings on the machines? Is there a
firewall blocking access?
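You can also reproduce mpdcheck's client/server test by hand, which makes it easy to try each direction separately. The Python 3 sketch below is a minimal stand-in, not mpdcheck's own code. The names run_server, accept_one, try_connect and self_test are hypothetical. It assumes Python 3 is available on the nodes and that the port the OS picks is not otherwise filtered.

```python
import socket
import sys
import threading


def run_server(host="0.0.0.0", port=0):
    """Open a listening TCP socket; port 0 asks the OS for any free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def accept_one(srv):
    """Accept a single client, send a short greeting, then shut down."""
    conn, _peer = srv.accept()
    with conn:
        conn.sendall(b"ok\n")
    srv.close()


def try_connect(host, port, timeout=5.0):
    """Return None if a TCP connection succeeds, else the OSError raised."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.recv(16)  # wait for the server's greeting
        return None
    except OSError as exc:  # covers timeouts, refused, no route to host
        return exc


def self_test():
    """Run server and client in one process over loopback (local sanity check)."""
    srv = run_server("127.0.0.1", 0)
    port = srv.getsockname()[1]
    t = threading.Thread(target=accept_one, args=(srv,), daemon=True)
    t.start()
    err = try_connect("127.0.0.1", port)
    t.join(timeout=5)
    return err is None


if __name__ == "__main__":
    if sys.argv[1:2] == ["serve"]:
        srv = run_server()
        print("listening on port", srv.getsockname()[1])
        accept_one(srv)
    elif len(sys.argv) == 4 and sys.argv[1] == "connect":
        err = try_connect(sys.argv[2], int(sys.argv[3]))
        print("connected OK" if err is None else f"connect failed: {err!r}")
    elif sys.argv[1:2] == ["selftest"]:
        print("loopback OK" if self_test() else "loopback FAILED")
```

Usage, assuming the file is saved as check.py (hypothetical name): start `python3 check.py serve` on one node, then run `python3 check.py connect <that-node> <port>` on the other, and repeat with the roles swapped. Note that `selftest` only exercises the loopback interface. If only one direction fails, the likeliest suspect is a firewall on the node acting as server.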

Rajeev
 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin Svetitsky
> Sent: Monday, December 01, 2008 9:42 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Trouble with new installation -- failed to connect to mpd
> 
> Thanks, Dave.  mpdcheck indeed points to a problem.  But the message is
> not very illuminating, apart from pointing out which links are giving
> trouble.  What really has me worried is that mpdcheck gives me the
> *same* error message on my old cluster -- where MPICH has been working
> fine for a year!  The message:
> 
> [root at nodeF ~]# mpdcheck -f mpd.hosts -ssh
> client on nodeE failed to access the server
> here is the output:
> Traceback (most recent call last):
>    File "/usr/local/bin/mpdcheck.py", line 103, in ?
>      sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
>    File "<string>", line 1, in connect
> socket.error: (113, 'No route to host')
> 
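On Linux, errno 113 is EHOSTUNREACH. If the target host is otherwise up (ssh to it works, since mpdcheck -ssh got as far as running the client), this error commonly comes from a firewall rule rather than from real routing. The rule in question rejects the connection with an ICMP "host prohibited" reply, as the stock iptables setup on Red Hat-style systems does. The small Python 3 helper below is a hypothetical aid, not part of MPICH. It maps the usual connect() errors to likely causes, and the hints are rules of thumb, not diagnoses.

```python
import errno
import socket

# Rules of thumb for connect() failures; these are hints, not diagnoses.
HINTS = {
    errno.EHOSTUNREACH: "no route to host: often a firewall rule that REJECTs "
                        "with an ICMP host-prohibited reply, or a routing problem",
    errno.ECONNREFUSED: "host reachable, but nothing is listening on that port",
    errno.ETIMEDOUT: "no reply at all: packets silently dropped (firewall DROP) "
                     "or the host is down",
    errno.ENETUNREACH: "no route to that network: check interface, netmask, gateway",
}


def explain(exc):
    """Map an exception from socket.connect() to a likely cause."""
    if isinstance(exc, socket.timeout):  # Python-level timeout carries no errno
        return HINTS[errno.ETIMEDOUT]
    return HINTS.get(getattr(exc, "errno", None), f"unclassified error: {exc!r}")


if __name__ == "__main__":
    # The error reported by mpdcheck above, rebuilt as a Python 3 exception.
    err = OSError(errno.EHOSTUNREACH, "No route to host")
    print(errno.EHOSTUNREACH, explain(err))
```

On a Red Hat-style node, `iptables -L -n` would show whether such a REJECT rule is present.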
> 
> 
> Dave Goodell wrote:
> > Hi Ben,
> > 
> > Please try the MPD troubleshooting steps listed in Appendix A of the
> > install guide:
> >
> > http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.0.8-installguide.pdf
> >
> > In particular, the mpdcheck utility should give you a better clue about
> > where the problem is.
> > 
> > -Dave
> > 
> > On Dec 1, 2008, at 4:11 AM, Benjamin Svetitsky wrote:
> > 
> >> Dear MPI community,
> >>
> >> I already have MPICH 1.0.8 running well on a cluster of four Linux
> >> quad-core machines.  But now I can't get it running on a new cluster.
> >> I think I installed everything exactly as on the first system.  But
> >> when I try to mpdboot as root I get a minimal error message:
> >>
> >> [root at nodeE ~]# mpdboot -n 4 -f /root/mpd.hosts
> >> mpdboot_nodeE (handle_mpd_output 401): failed to connect to mpd on nodeF
> >>
> >> The /root/mpd.hosts contains:
> >> nodeE
> >> nodeF
> >> nodeG
> >> nodeH
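Since the failures show up in both directions (nodeE to nodeF, and nodeF to nodeE), it may help to test every ordered pair of hosts in this file rather than just one. Below is a hypothetical Python 3 sketch of building that test list. read_hosts and check_plan are illustrative names. It assumes one host per line, optionally followed by a ":ncpus" suffix, and it also skips "#" comments, which is an assumption about the file format.

```python
from itertools import permutations


def read_hosts(text):
    """Extract host names from mpd.hosts-style text (format assumptions above)."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments (assumed syntax)
        if not line:
            continue
        # First whitespace-separated field, minus any ':ncpus' suffix.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def check_plan(hosts):
    """Every ordered (server, client) pair, so each direction is tested."""
    return list(permutations(hosts, 2))


if __name__ == "__main__":
    example = "nodeE\nnodeF\nnodeG\nnodeH\n"
    for server, client in check_plan(read_hosts(example)):
        print(f"server on {server}, client on {client}")
```

For the four hosts above this gives 12 ordered pairs, each of which can be fed to a client/server connectivity check such as mpdcheck's.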
> >>
> >> Oddly enough, after the failure of mpdboot as above I find:
> >> [root at nodeE ~]# mpdtrace
> >> nodeE
> >> nodeF
> >>
> >> If I do mpdallexit and log into nodeF, the result is:
> >> [root at nodeF ~]# mpdboot -n 4 -f /root/mpd.hosts
> >> mpdboot_nodeF (handle_mpd_output 392): failed to handshake with mpd on nodeE; recvd output={}
> >>
> >> Do I have a network problem or is it an MPICH problem?
> >>
> >> Thanks,
> >>     Ben
> >>
> >> -- 
> >> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
> >> School of Physics and Astronomy  Fax:              +972-3-640 7932
> >> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
> >> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
> 
> 