[mpich-discuss] problems with mpdboot
Pavan Balaji
balaji at mcs.anl.gov
Tue Apr 7 10:11:36 CDT 2009
Can you try mpdcheck to make sure there are no network infrastructure
issues (e.g., firewalls or errors in /etc/hosts)?
Another quick check is to make sure each host can ssh to every other
host using the name given in the host file. For example, try:
$ ssh c4labpc12.csee.usf.edu -t "ssh c4labpc19.csee.usf.edu hostname"
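
The name checks discussed in this thread can also be scripted. The
following is only a rough sketch, not part of MPICH2 and not a
replacement for mpdcheck. It assumes the host file has one host per
line, optionally followed by ":ncpus" and other fields, with "#"
starting a comment. The function names (parse_hosts, local_name_listed,
unresolvable, report) are made up for illustration.

```python
import socket


def parse_hosts(text):
    """Return host names from mpd.hosts-style text (assumed: one host per line)."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Keep only the name: drop an optional ":ncpus" suffix and extra fields.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def local_name_listed(hosts, local_name):
    """True if local_name appears verbatim ("foo" is not "foo.domain.edu")."""
    return local_name in hosts


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts that the resolver cannot map to an address."""
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
    return bad


def report(path):
    """Print both checks for the host file at path (hypothetical helper)."""
    with open(path) as f:
        hosts = parse_hosts(f.read())
    print("local name listed:", local_name_listed(hosts, socket.gethostname()))
    print("unresolvable:", unresolvable(hosts))
```

Calling report("mpd.hosts") on each machine would show whether that
machine's own "hostname" output is listed and whether every listed name
resolves there. It does not test ssh or firewalls, so the ssh command
above and mpdcheck are still needed.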
-- Pavan
bjday wrote:
> Pavan,
>
> Yes, the names returned by "hostname" and the names in mpd.hosts are the
> fully qualified names.
>
> Thank you,
> Brian
>
>
> Pavan Balaji wrote:
>>
>> Check if your host file contains the same name as what is returned by
>> the "hostname" command (e.g., "foo" is different from
>> "foo.domain.edu"). Otherwise, mpd can't find the local hostname in
>> your host file.
>>
>> -- Pavan
>>
>> bjday wrote:
>>> Hello MPICH2 Gurus
>>>
>>> I am installing MPICH2 on some lab computers at the request of a
>>> professor. I have run into a problem during testing. When I run
>>> mpdboot I receive this error:
>>>
>>> mpdboot -n 2 -f mpd.hosts -v -d
>>> debug: starting
>>> running mpdallexit on c4labpc19.csee.usf.edu
>>> LAUNCHED mpd on c4labpc19.csee.usf.edu via
>>> debug: launch cmd= /usr/local/mpich2/bin/mpd.py --ncpus=1 -e -d
>>> debug: mpd on c4labpc19.csee.usf.edu on port 37116
>>> RUNNING: mpd on c4labpc19.csee.usf.edu
>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 37116,
>>> 'entry_port': '', 'host': 'c4labpc19.csee.usf.edu', 'entry_host': '',
>>> 'ifhn': ''}
>>> LAUNCHED mpd on c4labpc12.csee.usf.edu via c4labpc19.csee.usf.edu
>>> debug: launch cmd= ssh -x -n -q c4labpc12.csee.usf.edu
>>> '/usr/local/mpich2/bin/mpd.py -h c4labpc19.csee.usf.edu -p 37116
>>> --ncpus=1 -e -d'
>>> debug: mpd on c4labpc12.csee.usf.edu on port no_port
>>> mpdboot_c4labpc19.csee.usf.edu (handle_mpd_output 406): from mpd on
>>> c4labpc12.csee.usf.edu, invalid port info:
>>> no_port
>>>
>>> I have seen this in the forums, but no resolution was posted. I
>>> have gone through the troubleshooting steps in the install guide
>>> and can complete them up to step 7, where mpdboot is used. I can
>>> start mpd on the master, get the port, then connect the slave
>>> computers by specifying the master name and port number. Any ideas
>>> why pc12 is reporting no port?
>>>
>>> Thank you,
>>> Brian
>>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji