[mpich-discuss] problems with mpdboot
bjday
bjday at cse.usf.edu
Tue Apr 7 10:25:24 CDT 2009
I ran mpdcheck again as instructed in the Troubleshooting section of the
installation guide, and the client (pc12) successfully received the ack
from the server. The server (pc19) got the connection from the client and
successfully received the message from the client.
I also tried the ssh command and received "c4labpc19.csee.usf.edu" as the
response.
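For more than two machines, the same ssh check can be repeated for every ordered pair of hosts. The following is only a minimal sketch of that idea, not part of MPICH2: the function names ssh_hostname_checks and run_checks are made up for illustration, and the host list is assumed to be typed in by hand rather than parsed from mpd.hosts.

```python
import subprocess


def ssh_hostname_checks(hosts):
    """Build one nested ssh command per ordered pair of distinct hosts.

    Each entry is (src, dst, argv), where argv mirrors the suggested check:
        ssh SRC -t "ssh DST hostname"
    """
    checks = []
    for src in hosts:
        for dst in hosts:
            if src != dst:
                inner = "ssh " + dst + " hostname"
                checks.append((src, dst, ["ssh", src, "-t", inner]))
    return checks


def run_checks(hosts, timeout=30):
    """Run every check and report whether DST answered with its own name."""
    for src, dst, argv in ssh_hostname_checks(hosts):
        # subprocess.run raises TimeoutExpired if ssh hangs, for example
        # while waiting at a password prompt.
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout)
        reported = result.stdout.strip()
        status = "ok" if reported == dst else "MISMATCH"
        print(f"{src} -> {dst}: got {reported!r} ({status})")
```

Calling run_checks(["c4labpc19.csee.usf.edu", "c4labpc12.csee.usf.edu"]) would run the check in both directions. A mismatch or an empty reply points at name resolution or ssh setup rather than at mpd itself.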
Once mpd is started on the master, I can connect the slaves by hand once I
get the port number from the server. I can also run mpdboot -n 1, and the
master node is the only output when mpdtrace is run. The error occurs only
when n > 1, that is, when mpdboot tries to start the slave nodes remotely.
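One way to see what the remote mpd prints before mpdboot gives up might be to repeat the launch command from the debug output by hand, without ssh's -q option (which suppresses most warnings and diagnostics). This is a rough sketch under assumptions: the helper names remote_mpd_argv and try_remote_launch are hypothetical, the mpd.py path and the -e -d flags are copied verbatim from the "launch cmd=" line in the debug output, and the port has to be whatever the master mpd is listening on at the time.

```python
import subprocess


def remote_mpd_argv(remote, master, port,
                    mpd_path="/usr/local/mpich2/bin/mpd.py", ncpus=1):
    """Rebuild the launch command shown in the mpdboot debug output.

    The only deliberate change is that ssh's -q flag is left out, so any
    ssh warnings or errors stay visible.
    """
    remote_cmd = f"{mpd_path} -h {master} -p {port} --ncpus={ncpus} -e -d"
    return ["ssh", "-x", "-n", remote, remote_cmd]


def try_remote_launch(remote, master, port, timeout=30):
    """Run the launch command and return (exit code, stdout, stderr)."""
    argv = remote_mpd_argv(remote, master, port)
    # subprocess.run raises TimeoutExpired if ssh does not return in time.
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return result.returncode, result.stdout, result.stderr
```

With an mpd already running on the master, try_remote_launch("c4labpc12.csee.usf.edu", "c4labpc19.csee.usf.edu", 37116) would return whatever pc12 prints in place of a port number. If the launch does succeed, the new mpd joins the ring and has to be shut down again afterwards.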
Thank you,
Brian
Pavan Balaji wrote:
>
> Can you try mpdcheck to make sure there are no network infrastructure
> issues (e.g., firewalls or errors in /etc/hosts)?
>
> Another quick check is to make sure each host can ssh to another host
> with the name given in the host file. For example, try:
>
> $ ssh c4labpc12.csee.usf.edu -t "ssh c4labpc19.csee.usf.edu hostname"
>
> -- Pavan
>
> bjday wrote:
>> Pavan,
>>
>> Yes the names returned by "hostname" and the names in mpd.hosts are
>> the fully qualified names.
>>
>> Thank you,
>> Brian
>>
>>
>> Pavan Balaji wrote:
>>>
>>> Check if your host file contains the same name as what is returned
>>> by the "hostname" command (e.g., "foo" is different from
>>> "foo.domain.edu"). Otherwise, mpd can't find the local hostname in
>>> your host file.
>>>
>>> -- Pavan
>>>
>>> bjday wrote:
>>>> Hello MPICH2 Gurus
>>>>
>>>> I am installing MPICH2 on some lab computers at the request of a
>>>> professor. I have run into a problem during testing. When I run
>>>> mpdboot I receive this error:
>>>>
>>>> mpdboot -n 2 -f mpd.hosts -v -d
>>>> debug: starting
>>>> running mpdallexit on c4labpc19.csee.usf.edu
>>>> LAUNCHED mpd on c4labpc19.csee.usf.edu via
>>>> debug: launch cmd= /usr/local/mpich2/bin/mpd.py --ncpus=1 -e -d
>>>> debug: mpd on c4labpc19.csee.usf.edu on port 37116
>>>> RUNNING: mpd on c4labpc19.csee.usf.edu
>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 37116,
>>>> 'entry_port': '', 'host': 'c4labpc19.csee.usf.edu', 'entry_host':
>>>> '', 'ifhn': ''}
>>>> LAUNCHED mpd on c4labpc12.csee.usf.edu via c4labpc19.csee.usf.edu
>>>> debug: launch cmd= ssh -x -n -q c4labpc12.csee.usf.edu
>>>> '/usr/local/mpich2/bin/mpd.py -h c4labpc19.csee.usf.edu -p 37116
>>>> --ncpus=1 -e -d'
>>>> debug: mpd on c4labpc12.csee.usf.edu on port no_port
>>>> mpdboot_c4labpc19.csee.usf.edu (handle_mpd_output 406): from mpd on
>>>> c4labpc12.csee.usf.edu, invalid port info:
>>>> no_port
>>>>
>>>> I have seen this in the forums, but no resolution was posted. I
>>>> have gone through the troubleshooting steps in the install guide
>>>> and I can complete them up to step 7, where mpdboot is used. I
>>>> can start mpd on the master, get the port, then connect the slave
>>>> computers by specifying the master name and port number. Any ideas
>>>> why pc12 is reporting no port?
>>>>
>>>> Thank you,
>>>> Brian
>>>
>>
>