[mpich-discuss] problems with mpdboot

bjday bjday at cse.usf.edu
Tue Apr 7 10:25:24 CDT 2009


I tried mpdcheck again as instructed in the Troubleshooting section of 
the installation guide, and the client (pc12) successfully received the 
ack from the server.  The server (pc19) got the connection from the 
client and successfully received the message from the client.

I also tried the ssh command and received "c4labpc19.csee.usf.edu" as 
the response.

Once mpd is started on the master, I can connect the slaves by hand as 
soon as I get the port number from the server. I can also run 
"mpdboot -n 1", and the master node is the only output when mpdtrace is 
run.  The error occurs only when n>1, when mpdboot tries to start the 
slave nodes remotely.

Thank you,
Brian

Pavan Balaji wrote:
>
> Can you try mpdcheck to make sure there are no network infrastructure 
> issues (e.g., firewalls or errors in /etc/hosts)?
>
> Another quick check is to make sure each host can ssh to another host 
> with the name given in the host file. For example, try:
>
>  $ ssh c4labpc12.csee.usf.edu -t "ssh c4labpc19.csee.usf.edu hostname"
>
>  -- Pavan
>
> bjday wrote:
>> Pavan,
>>
>> Yes, the names returned by "hostname" and the names in mpd.hosts are 
>> both the fully qualified names.
>>
>> Thank you,
>> Brian
>>
>>
>> Pavan Balaji wrote:
>>>
>>> Check if your host file contains the same name as what is returned 
>>> by the "hostname" command (e.g., "foo" is different from 
>>> "foo.domain.edu"). Otherwise, mpd can't find the local hostname in 
>>> your host file.
>>>
>>>  -- Pavan
>>>
>>> bjday wrote:
>>>> Hello MPICH2 Gurus
>>>>
>>>> I am installing MPICH2 on some lab computers at the request of a 
>>>> professor.  I have run into a problem during testing.  When I run 
>>>> mpdboot I receive this error:
>>>>
>>>> mpdboot -n 2 -f mpd.hosts -v -d
>>>> debug: starting
>>>> running mpdallexit on c4labpc19.csee.usf.edu
>>>> LAUNCHED mpd on c4labpc19.csee.usf.edu  via
>>>> debug: launch cmd= /usr/local/mpich2/bin/mpd.py   --ncpus=1 -e -d
>>>> debug: mpd on c4labpc19.csee.usf.edu  on port 37116
>>>> RUNNING: mpd on c4labpc19.csee.usf.edu
>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 37116, 
>>>> 'entry_port': '', 'host': 'c4labpc19.csee.usf.edu', 'entry_host': 
>>>> '', 'ifhn': ''}
>>>> LAUNCHED mpd on c4labpc12.csee.usf.edu  via  c4labpc19.csee.usf.edu
>>>> debug: launch cmd= ssh -x -n -q c4labpc12.csee.usf.edu 
>>>> '/usr/local/mpich2/bin/mpd.py  -h c4labpc19.csee.usf.edu -p 37116  
>>>> --ncpus=1 -e -d'
>>>> debug: mpd on c4labpc12.csee.usf.edu  on port no_port
>>>> mpdboot_c4labpc19.csee.usf.edu (handle_mpd_output 406): from mpd on 
>>>> c4labpc12.csee.usf.edu, invalid port info:
>>>> no_port
>>>>
>>>> I have seen this in the forums, but no resolution was posted.  I 
>>>> have gone through the troubleshooting section of the install guide 
>>>> and can complete everything up to step 7, where mpdboot is used.  I 
>>>> can start mpd on the master, get the port, then connect the slave 
>>>> computers by specifying the master name and port number.  Any ideas 
>>>> why pc12 is reporting no port?
>>>>
>>>> Thank you,
>>>> Brian
>>>
>>
>


