[mpich-discuss] Trouble with new installation -- failed to connect to mpd

Benjamin Svetitsky bqs at julian.tau.ac.il
Mon Dec 1 09:42:11 CST 2008


Thanks, Dave.  mpdcheck indeed points to a problem.  But the message is 
not very illuminating, apart from pointing out which links are giving 
trouble.  What really has me worried is that mpdcheck gives me the 
*same* error message on my old cluster -- where MPICH has been working 
fine for a year!  The message:

[root@nodeF ~]# mpdcheck -f mpd.hosts -ssh
client on nodeE failed to access the server
here is the output:
Traceback (most recent call last):
   File "/usr/local/bin/mpdcheck.py", line 103, in ?
     sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
   File "<string>", line 1, in connect
socket.error: (113, 'No route to host')
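For reference, the step that fails in the traceback is a plain TCP `connect()` from the client node to a host and port given on the command line. On Linux, errno 113 is `EHOSTUNREACH` ("No route to host"). Between machines on the same subnet, that usually means something, such as a host firewall, is rejecting the connection, rather than that no process is listening. That is an assumption about this cluster, not something mpdcheck reports.

The sketch below reproduces the same client-side test using only Python's standard `socket` and `errno` modules. It reports which errno a connection attempt fails with, so a rejected connection can be told apart from a refused or timed-out one. The function name `check_tcp` is hypothetical and is not part of mpdcheck or MPICH.

```python
import errno
import socket
import threading


def check_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection to (host, port), like mpdcheck's client step.

    Returns "ok" on success, otherwise a short description of the failure,
    e.g. "EHOSTUNREACH (113): No route to host" on Linux.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all: packets are probably being dropped silently.
        return "timed out"
    except OSError as exc:
        # Map the numeric errno (e.g. 113) to its symbolic name.
        name = errno.errorcode.get(exc.errno, "unknown errno")
        return f"{name} ({exc.errno}): {exc.strerror}"


def _accept_once(server):
    # Accept a single connection and close it immediately.
    conn, _ = server.accept()
    conn.close()


if __name__ == "__main__":
    # Local self-test: listen on an ephemeral loopback port, then connect to it.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=_accept_once, args=(server,), daemon=True)
    worker.start()
    print(check_tcp("127.0.0.1", port))  # prints: ok
    worker.join(timeout=5)
    server.close()
```

Between nodes, you would call something like `check_tcp("nodeF", port)` from nodeE against a port that a listener on nodeF has open. Getting `ECONNREFUSED` means the host was reached but nothing was listening on that port. Getting `EHOSTUNREACH` means the connection was rejected before it reached any listening process.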



Dave Goodell wrote:
> Hi Ben,
> 
> Please try the MPD troubleshooting steps listed in appendix A of the 
> install guide: 
> http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.0.8-installguide.pdf 
> 
> 
> In particular, the mpdcheck utility should give you a better clue about 
> where the problem is.
> 
> -Dave
> 
> On Dec 1, 2008, at 4:11 AM, Benjamin Svetitsky wrote:
> 
>> Dear MPI community,
>>
>> I already have MPICH 1.0.8 running well on a cluster of four Linux 
>> quad-core machines.  But now I can't get it running on a new cluster.  I think 
>> I installed everything exactly as on the first system.  But when I try 
>> to mpdboot as root I get a minimal error message:
>>
>> [root@nodeE ~]# mpdboot -n 4 -f /root/mpd.hosts
>> mpdboot_nodeE (handle_mpd_output 401): failed to connect to mpd on nodeF
>>
>> The /root/mpd.hosts contains:
>> nodeE
>> nodeF
>> nodeG
>> nodeH
>>
>> Oddly enough, after the failure of mpdboot as above I find:
>> [root@nodeE ~]# mpdtrace
>> nodeE
>> nodeF
>>
>> If I do mpdallexit and log into nodeF, the result is:
>> [root@nodeF ~]# mpdboot -n 4 -f /root/mpd.hosts
>> mpdboot_nodeF (handle_mpd_output 392): failed to handshake with mpd on nodeE; recvd output={}
>>
>> Do I have a network problem or is it an MPICH problem?
>>
>> Thanks,
>>     Ben
>>
>> -- 
>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
