[mpich-discuss] Trouble with new installation -- failed to connect to mpd

Benjamin Svetitsky bqs at julian.tau.ac.il
Tue Dec 2 02:12:29 CST 2008


Thanks Rajeev, we are rechecking network settings.  But I don't trust 
mpdcheck.  On my old cluster (running MPICH for a year now), when I run
mpdcheck -f ~/hosts.mpd -ssh
it says:

client on nodeA failed to access the server
here is the output:
Traceback (most recent call last):
   File "/usr/local/bin/mpdcheck.py", line 103, in ?
     sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
   File "<string>", line 1, in connect
socket.gaierror: (-3, 'Temporary failure in name resolution')

But when I run mpdcheck -s (and -c) between the two nodes, there is no 
problem in either direction.  (Does it matter that mpd was running jobs 
at the time of this test?)

		Ben
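
A socket.gaierror is raised before any packet is sent: the client could not
translate the host name into an address, so the failure lies in name
resolution (/etc/hosts or DNS) on the node running the client, not in MPD
itself. A minimal Python sketch (Python because mpdcheck.py is Python) that
resolves each name the way a client would; resolve_hosts is a hypothetical
helper, not part of MPD, and it only checks IPv4 lookups:

```python
import socket


def resolve_hosts(hosts):
    """Resolve each host name to its IPv4 addresses.

    Returns a dict mapping host -> sorted list of addresses, or an error
    string when resolution fails with socket.gaierror (as in the traceback).
    resolve_hosts is a hypothetical helper, not part of MPD.
    """
    results = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            results[host] = "name resolution failed: %s" % exc
            continue
        # info[4] is the (address, port) sockaddr tuple for AF_INET
        results[host] = sorted({info[4][0] for info in infos})
    return results
```

Run it on every node, e.g. print(resolve_hosts(["nodeE", "nodeF", "nodeG",
"nodeH"])). A name that resolves on one node but not on another, or to
different addresses, could explain why mpdcheck -f fails while a manual
-s/-c test between the same two machines succeeds.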

Rajeev Thakur wrote:
> This means a simple client on one machine was not able to connect to a
> simple server on another machine in the cluster (independent of MPICH or
> MPD). Can you check the networking settings on the machines? Is there a
> firewall preventing access?
> 
> Rajeev
>  
> 
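
The test Rajeev describes can be approximated in a few lines of Python. This
is a sketch of the idea, not mpdcheck's actual code; start_server,
serve_once, client_check and local_selftest are hypothetical names. The
server binds to all interfaces on a port chosen by the OS, and the client
connects, sends a probe and expects it echoed back:

```python
import socket
import threading


def start_server(host="", port=0):
    """Listen for one TCP client; port 0 lets the OS pick a free port.

    Returns the listening socket and the port actually bound.
    host="" means all local interfaces (INADDR_ANY).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def serve_once(srv, timeout=None):
    """Accept a single client, echo what it sends, then close the server.

    timeout=None waits forever; a number makes accept() give up after that
    many seconds (raising socket.timeout).
    """
    srv.settimeout(timeout)
    try:
        conn, _addr = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))
    finally:
        srv.close()


def client_check(host, port, timeout=5.0):
    """Connect to host:port, send a probe and return the echoed bytes."""
    # create_connection resolves the name first, so a gaierror or a
    # connect error (e.g. "No route to host") would surface here.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ping")
        return sock.recv(1024)


def local_selftest():
    """Run server and client over loopback in one process."""
    srv, port = start_server("127.0.0.1")
    worker = threading.Thread(target=serve_once, args=(srv, 5.0))
    worker.start()
    try:
        return client_check("127.0.0.1", port)
    finally:
        worker.join()
```

On a real pair of nodes, call start_server() and serve_once() on one node,
note the port, and call client_check("nodeF", port) on the other. If
local_selftest() succeeds but the cross-node call fails, the problem lies in
the network path between the machines (routing, name resolution or a
firewall), which is Rajeev's point.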
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
>> Benjamin Svetitsky
>> Sent: Monday, December 01, 2008 9:42 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Trouble with new installation -- failed to connect to mpd
>>
>> Thanks, Dave.  mpdcheck indeed points to a problem.  But the message is
>> not very illuminating, apart from pointing out which links are giving
>> trouble.  What really has me worried is that mpdcheck gives me the
>> *same* error message on my old cluster -- where MPICH has been working
>> fine for a year!  The message:
>>
>> [root at nodeF ~]# mpdcheck -f mpd.hosts -ssh
>> client on nodeE failed to access the server
>> here is the output:
>> Traceback (most recent call last):
>>    File "/usr/local/bin/mpdcheck.py", line 103, in ?
>>      sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
>>    File "<string>", line 1, in connect
>> socket.error: (113, 'No route to host')
>>
>>
>>
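
On Linux, errno 113 is EHOSTUNREACH. Unlike the name-resolution error, it
means the name did resolve but the kernel reported the host as unreachable;
a firewall rule that rejects packets with an ICMP host-prohibited reply is a
common cause, as is a genuine routing problem. The sketch below (a
hypothetical diagnose_connect helper, whose categories are only a rough
guide) sorts the usual connect failures apart:

```python
import errno
import socket


def diagnose_connect(host, port, timeout=5.0):
    """Attempt one TCP connection and return a short diagnosis string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror as exc:
        # The name never became an address (the nodeA case above).
        return "name resolution failed: %s" % exc
    except socket.timeout:
        # No answer at all: packets may be silently dropped.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # errno 113 on Linux, the nodeE case above.
            return "no route to host (routing, or a rejecting firewall)"
        if exc.errno == errno.ECONNREFUSED:
            # Host reachable, but nothing is listening on that port.
            return "connection refused"
        return "connect failed: %s" % exc
```

For example, diagnose_connect("nodeE", port) with the port printed by the
server side. "connection refused" would mean the host is reachable and the
port simply has no listener, which points away from a firewall or routing
problem.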
>> Dave Goodell wrote:
>>> Hi Ben,
>>>
>>> Please try the MPD troubleshooting steps listed in appendix A of the
>>> install guide:
>>>
>>> http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.0.8-installguide.pdf
>>>
>>> In particular, the mpdcheck utility should give you a better clue about
>>> where the problem is.
>>>
>>> -Dave
>>>
>>> On Dec 1, 2008, at 4:11 AM, Benjamin Svetitsky wrote:
>>>
>>>> Dear MPI community,
>>>>
>>>> I already have MPICH2 1.0.8 running well on a cluster of four Linux
>>>> quad cores.  But now I can't get it running on a new cluster.  I think
>>>> I installed everything exactly like the first system.  But when I try
>>>> to mpdboot as root I get a minimal error message:
>>>>
>>>> [root at nodeE ~]# mpdboot -n 4 -f /root/mpd.hosts
>>>> mpdboot_nodeE (handle_mpd_output 401): failed to connect to mpd on nodeF
>>>>
>>>> The /root/mpd.hosts contains:
>>>> nodeE
>>>> nodeF
>>>> nodeG
>>>> nodeH
>>>>
>>>> Oddly enough, after the failure of mpdboot as above I find:
>>>> [root at nodeE ~]# mpdtrace
>>>> nodeE
>>>> nodeF
>>>>
>>>> If I do mpdallexit and log into nodeF, the result is:
>>>> [root at nodeF ~]# mpdboot -n 4 -f /root/mpd.hosts
>>>> mpdboot_nodeF (handle_mpd_output 392): failed to handshake with mpd on nodeE; recvd output={}
>>>>
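
Since the failures depend on direction (nodeE cannot connect to nodeF's mpd,
while starting from nodeF fails the handshake with nodeE), it can help to
run the mpdcheck server/client test for every ordered pair of hosts rather
than for a single pair. A small sketch that reads an mpd.hosts-style file
and lists the pairs; read_hosts and ordered_pairs are hypothetical helpers,
and skipping blank lines, '#' comments and duplicates is an assumption about
the file format:

```python
from itertools import permutations


def read_hosts(text):
    """Parse mpd.hosts-style text: one host name per line.

    Assumption: blank lines and lines starting with '#' are ignored, and a
    repeated name is kept only once (first occurrence wins).
    """
    hosts = []
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#") and name not in hosts:
            hosts.append(name)
    return hosts


def ordered_pairs(hosts):
    """Every (server, client) pair, so each direction is tested separately."""
    return list(permutations(hosts, 2))
```

Usage: with hosts = read_hosts(open("/root/mpd.hosts").read()), for each
(server, client) pair run mpdcheck -s on the server node and mpdcheck -c on
the client node with the host and port the server reports (the traceback's
sock.connect((argv[argidx+1],int(argv[argidx+2]))) suggests -c takes a host
and a port). For the four hosts above this gives 12 tests.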
>>>> Do I have a network problem or is it an MPICH problem?
>>>>
>>>> Thanks,
>>>>     Ben
>>>>
>>>> -- 
>>>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>>>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>>>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>>>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
>>
