[mpich-discuss] mpdboot fails

Robert Kubrick robertkubrick at gmail.com
Wed Aug 27 10:09:49 CDT 2008


I tried mpdcheck -s / -c, and it works between the two machines in both directions:

iMac:$ mpdcheck -s
server listening at INADDR_ANY on: iMac.local 65357
server has conn on <socket._socketobject object at 0x8c030> from ('192.168.2.2', 43331)
server successfully recvd msg from client: hello_from_client_to_server

wlaptop:$ mpdcheck -c iMac 65357
client successfully recvd ack from server: ack_from_server_to_client
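
For reference, here is a minimal sketch of the kind of round trip that the output above suggests: a server accepts one connection, reads a greeting, and sends back an ack. This is not mpdcheck's actual code. The function names (serve_once, client_check, round_trip) are made up for illustration, and it is written for a current Python 3 rather than the python2.3 that mpd runs under here.

```python
import socket
import threading


def serve_once(server_sock, results):
    """Accept one connection, record the client's message, send an ack."""
    conn, peer = server_sock.accept()
    with conn:
        results["peer"] = peer
        results["server_got"] = conn.recv(1024).decode()
        conn.sendall(b"ack_from_server_to_client")


def client_check(host, port, timeout=5.0):
    """Connect to the server, send a greeting, and return the ack text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"hello_from_client_to_server")
        return sock.recv(1024).decode()


def round_trip(host="127.0.0.1"):
    """Run the server and the client in one process as a self-test."""
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((host, 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock, results))
        worker.start()
        results["client_got"] = client_check(host, port)
        worker.join()
    return results


if __name__ == "__main__":
    print(round_trip())
```

To test between two machines, client_check would be called on one host with the other host's name and listening port, the same way mpdcheck -c is given "iMac 65357" above.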

I run the mpd ring manually:

On iMac:

iMac:$ mpd&
[1] 13755
iMac:$ mpdtrace -l
iMac.local_65501 (192.168.2.1)

On wlaptop:

wlaptop:$ mpd -h iMac -p 65501&

Then I run a test on the local machine:

iMac:$ mpiexec -n 2 /bin/hostname
iMac.local
wlaptop

So far so good. But when I try to run 4 hostname processes, I start  
having problems:

iMac:$ mpiexec -n 4 hostname
iMac.local_mpdman_2: conn error in connect_lhs: Operation timed out

The command hangs for a while and then prints the timeout message.  
Let's take a look at the mpd processes:

$ ps -A|fgrep mpd
13755  p1  S      0:00.28 python2.3 /opt/local/mpich2/bin/mpd
13805  p1  S      0:00.01 python2.3 /opt/local/mpich2/bin/mpd
13806  p1  S      0:00.01 python2.3 /opt/local/mpich2/bin/mpd
iMac:$ mpdtrace -l
iMac.local_65501 (192.168.2.1)
wlaptop_53313 (127.0.1.1)

I have to manually kill the extra mpd processes on the local machine.
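
One thing that stands out in the mpdtrace -l output is that wlaptop is listed with 127.0.1.1, which is in the loopback range (127.0.0.0/8), while iMac is listed with its LAN address. I don't know whether this is what causes the connect_lhs timeout, but it seems worth checking what each host name resolves to. A minimal sketch, assuming Python 3 is available; the function names are made up for illustration:

```python
import ipaddress
import socket


def is_loopback(addr):
    """Return True if addr (a dotted IPv4 or IPv6 string) is a loopback address."""
    return ipaddress.ip_address(addr).is_loopback


def resolve(name):
    """Return the IPv4 address the resolver gives for name, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return None


def report(names):
    """Build one human-readable line per host name."""
    lines = []
    for name in names:
        addr = resolve(name)
        if addr is None:
            status = "does not resolve"
        elif is_loopback(addr):
            status = f"{addr} (loopback, not reachable from other hosts)"
        else:
            status = addr
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    # Check the name this machine reports for itself.
    for line in report([socket.gethostname()]):
        print(line)
```

Running this on each machine shows whether the machine's own host name maps to a loopback address there.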

On Aug 26, 2008, at 7:11 PM, Rajeev Thakur wrote:

> --mpd just specifies the path for mpd on remote hosts (the same path for all
> remote hosts). If it still doesn't work, there may be some problem with the
> networking configuration on the two machines. To debug the problem, you can
> use the mpdcheck utility as described in the installation guide (all steps).
>
> Rajeev
>
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
>> Sent: Tuesday, August 26, 2008 6:05 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpdboot fails
>>
>> I'm not clear on how --mpd works: I want to specify two
>> different paths, one for the local machine and one for the
>> remote host (or possibly a number of different remote hosts).
>> Here it looks like I can only set one path for all the hosts:
>>
>> $ mpdboot --verbose --totalnum=2 --mpd=/usr/local/mpich2/bin/mpd --debug
>> debug: starting
>> running mpdallexit on iMac.local
>> LAUNCHED mpd on iMac.local  via
>> debug: launch cmd= /usr/local/mpich2/bin/mpd   --ncpus=1 -e -d
>> debug: mpd on iMac.local  on port no_port
>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on iMac.local, invalid port info:
>> no_port
>>
>>
>> $ mpdboot --verbose --totalnum=2 --debug
>> debug: starting
>> running mpdallexit on iMac.local
>> LAUNCHED mpd on iMac.local  via
>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py   --ncpus=1 -e -d
>> debug: mpd on iMac.local  on port 62629
>> RUNNING: mpd on iMac.local
>> debug: info for running mpd: {'ncpus': 1, 'list_port': 62629, 'entry_port': '', 'host': 'iMac.local', 'entry_host': '', 'ifhn': ''}
>> LAUNCHED mpd on wlaptop  via  iMac.local
>> debug: launch cmd= ssh -x -n -q wlaptop '/opt/local/mpich2/bin/mpd.py  -h iMac.local -p 62629  --ncpus=1 -e -d'
>> debug: mpd on wlaptop  on port no_port
>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on wlaptop, invalid port info:
>> no_port
>>
>>
>>
>>
>> On Aug 25, 2008, at 2:20 PM, Rajeev Thakur wrote:
>>
>>>> The path to mpd on 'wlaptop' is actually a different one than on
>>>> iMac.local. How can I specify a different remote path for mpd?
>>>
>>> With the --mpd option to mpdboot.
>>>
>>> Rajeev
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
>>>> Sent: Sunday, August 24, 2008 5:19 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] mpdboot fails
>>>>
>>>> mpdboot fails when I try to run with this command:
>>>>
>>>> # mpdboot --totalnum=2 --verbose -d
>>>> debug: starting
>>>> running mpdallexit on iMac.local
>>>> LAUNCHED mpd on iMac.local  via
>>>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py   --ncpus=1 -e -d
>>>> debug: mpd on iMac.local  on port 61785
>>>> RUNNING: mpd on iMac.local
>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 61785, 'entry_port': '', 'host': 'iMac.local', 'entry_host': '', 'ifhn': ''}
>>>> LAUNCHED mpd on wlaptop  via  iMac.local
>>>> debug: launch cmd= ssh -x -n -q wlaptop '/opt/local/mpich2/bin/mpd.py  -h iMac.local -p 61785  --ncpus=1 -e -d'
>>>> debug: mpd on wlaptop  on port no_port
>>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on wlaptop, invalid port info:
>>>> no_port
>>>>
>>>>
>>>> The path to mpd on 'wlaptop' is actually a different one than on
>>>> iMac.local. How can I specify a different remote path for mpd?
>>>>
>>>>
>>>
>>
>>
>



