[mpich-discuss] mpdboot error: failed to handshake with mpd on 192.168.1.248; recvd output={}

Zengming Zhang nicegiving at gmail.com
Fri Jan 8 03:26:35 CST 2010


Hi all:

Happy New Year!

I am completely new to MPICH2, and I have a problem that has almost
driven me crazy while installing MPICH2 1.2.1. Details follow:

QUESTION:
-----------------------------------------------------------------------
I installed MPICH2 1.2.1 from source on my Ubuntu 8.10 desktop system.
Building and installing gave no errors. However, when I use the mpdboot
command to start the two nodes configured in my mpd.hosts file, I get
this error:

zzm@zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts
mpdboot_zzm-desktop (handle_mpd_output 407): failed to handshake with mpd on 192.168.1.248; recvd output={}
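For reference, mpd.hosts is just a list of hosts, one per line. Below is a minimal Python 3 sketch of reading such a file. It assumes, though this post does not confirm it, that a line may carry an optional `:ncpus` suffix and that `#` starts a comment. `read_mpd_hosts` is a hypothetical helper, not part of MPICH2:

```python
def read_mpd_hosts(text):
    """Parse mpd.hosts-style text into (host, ncpus) pairs.

    Assumptions: each non-blank line is 'host' or 'host:ncpus',
    anything after '#' is a comment, and ncpus defaults to 1.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" in line:
            name, ncpus = line.rsplit(":", 1)
            hosts.append((name.strip(), int(ncpus)))
        else:
            hosts.append((line, 1))
    return hosts


# The mpd.hosts file used in this post:
example = "192.168.1.248\n192.168.1.190\n"
print(read_mpd_hosts(example))
# prints [('192.168.1.248', 1), ('192.168.1.190', 1)]
```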

Here is more detailed output:
zzm@zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts --chkup -v -d
debug: starting
checking 192.168.1.248
there are 2 hosts up (counting local)
running mpdallexit on zzm-desktop
LAUNCHED mpd on zzm-desktop  via  
debug: launch cmd= /home/zzm/bin/mpich2/bin/mpd.py   --ncpus=1 -e -d
debug: mpd on zzm-desktop  on port 43220
RUNNING: mpd on zzm-desktop
debug: info for running mpd: {'ncpus': 1, 'list_port': 43220, 'entry_port': '', 'host': 'zzm-desktop', 'entry_host': '', 'ifhn': ''}
LAUNCHED mpd on 192.168.1.248  via  zzm-desktop
debug: launch cmd= ssh -x -n -q 192.168.1.248 '/home/zzm/bin/mpich2/bin/mpd.py  -h zzm-desktop -p 43220  --ncpus=1 -e -d'
debug: mpd on 192.168.1.248  on port 49130
mpdboot_zzm-desktop (handle_mpd_output 407): failed to handshake with mpd on 192.168.1.248; recvd output={}
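The verbose run shows that mpdboot did receive a port (49130) from the mpd on 192.168.1.248 before the handshake failed, so that mpd appears to have started. Below is a hypothetical Python 3 helper for pulling the host/port pairs out of such output. It assumes the `debug: mpd on HOST on port N` line format seen above:

```python
import re

# Matches lines like "debug: mpd on zzm-desktop  on port 43220"
MPD_PORT_RE = re.compile(r"debug: mpd on (\S+)\s+on port (\d+)")


def mpd_ports(debug_output):
    """Return {host: port} for every mpd that reported a listening port."""
    return {host: int(port) for host, port in MPD_PORT_RE.findall(debug_output)}


log = """debug: mpd on zzm-desktop  on port 43220
RUNNING: mpd on zzm-desktop
debug: mpd on 192.168.1.248  on port 49130
"""
print(mpd_ports(log))
# prints {'zzm-desktop': 43220, '192.168.1.248': 49130}
```

The ports presumably differ from run to run, so they are only useful while the mpds from that run are still alive.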

Note the line "there are 2 hosts up (counting local)". I think this
means that both computers are reachable; am I right?
-----------------------------------------------------------------------

COMPUTER CONFIGURATIONS:
-----------------------------------------------------------------------
Some configuration details of my computers are listed below:
zzm@zzm-desktop:~$ whoami
zzm
zzm@zzm-desktop:~$ more /home/zzm/.mpd.conf
secretword=nicegiving
zzm@zzm-desktop:~$ chmod 600 .mpd.conf
zzm@zzm-desktop:~$ more /home/zzm/mpd.hosts
192.168.1.248
192.168.1.190
zzm@zzm-desktop:~$ python -V
Python 2.5.2
zzm@zzm-desktop:~$
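MPD takes its secret word from ~/.mpd.conf, and the file should be readable and writable by its owner only, which is why chmod 600 is used above. Below is a minimal Python 3 sketch for checking both points on each machine. The function names are hypothetical, and treating any group or other permission bit as a problem is an assumption:

```python
import os
import stat


def mode_problem(mode):
    """Describe a permission problem, or return None if mode is owner-only."""
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        return "mode %03o allows group/other access; expected 600" % mode
    return None


def has_secretword(text):
    """True if some line of the config text sets secretword=..."""
    return any(line.strip().startswith("secretword=") for line in text.splitlines())


def check_mpd_conf(path):
    """Return a list of problems found with the config file at `path`."""
    problems = []
    problem = mode_problem(stat.S_IMODE(os.stat(path).st_mode))
    if problem:
        problems.append(problem)
    with open(path) as f:
        if not has_secretword(f.read()):
            problems.append("no secretword= line")
    return problems


if __name__ == "__main__":
    conf = os.path.expanduser("~/.mpd.conf")
    if os.path.exists(conf):
        print(check_mpd_conf(conf) or "ok")
    else:
        print(conf, "not found")
```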

The main computer has IP address 192.168.1.190 and hostname
zzm-desktop; all commands are run on it. The other computer,
192.168.1.248, is used to test the parallel environment; its hostname is
zcni-desktop.

The /etc/hosts files on the two computers are:
zzm@zzm-desktop:~$ more /etc/hosts
192.168.1.190   zzm-desktop
... ...
zzm@zcni-desktop:~$ more /etc/hosts
192.168.1.190   zcni-desktop
... ...
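The launch command in the debug output passes `-h zzm-desktop` to the mpd on 192.168.1.248, so it seems worth comparing what each machine's /etc/hosts says about these names. Below is a rough Python 3 sketch. It assumes the usual `IP name [aliases...]` layout and that the first matching line wins. `parse_hosts` is hypothetical. The `... ...` parts of the files are not reproduced, so the lookups reflect only the lines quoted:

```python
def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-style text to its IP address."""
    names = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, split on whitespace
        if len(fields) < 2:
            continue
        ip, *aliases = fields
        for name in aliases:
            names.setdefault(name, ip)  # first matching line wins
    return names


# Only the lines quoted above:
zzm_hosts = parse_hosts("192.168.1.190   zzm-desktop\n")
zcni_hosts = parse_hosts("192.168.1.190   zcni-desktop\n")
for name in ("zzm-desktop", "zcni-desktop"):
    print(name, zzm_hosts.get(name), zcni_hosts.get(name))
# prints:
# zzm-desktop 192.168.1.190 None
# zcni-desktop None 192.168.1.190
```

In the quoted lines, zcni-desktop's own file maps zcni-desktop to 192.168.1.190, the address given for zzm-desktop. This may only be a transcription slip, but it is worth confirming against the complete files.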

Also, I can access zcni-desktop via SSH without entering a password.
The only problem was that logging in took a long time, about 10
seconds. To avoid SSH timeouts, I added "UseDNS no" at the end of
/etc/ssh/sshd_config on both computers, and now I can log in to
zcni-desktop via SSH immediately.
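The SSH delay described above can be measured before and after the `UseDNS no` change. Below is a minimal Python 3 sketch that times a non-interactive login with the same `ssh -x -n -q` flags mpdboot used in the debug output. Adding `-o BatchMode=yes`, so that ssh fails instead of prompting, is an assumption here, and the helper names are hypothetical:

```python
import subprocess
import time


def timed_run(cmd, timeout=60):
    """Run cmd (a list of args); return (exit status, elapsed seconds)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL, timeout=timeout)
    return proc.returncode, time.monotonic() - start


def ssh_login_time(host):
    """Time a non-interactive ssh login using mpdboot's flags from the log.

    BatchMode=yes is an added assumption: ssh then fails rather than
    waiting at a password prompt.
    """
    return timed_run(["ssh", "-x", "-n", "-q", "-o", "BatchMode=yes",
                      host, "true"])
```

For example, `ssh_login_time("192.168.1.248")` returns the exit status and elapsed seconds. A status of 0 together with a short elapsed time would suggest that passwordless, non-interactive login works quickly.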

The MPICH2 installation folder exists on each computer, and
$MPICH2/bin has been added to $PATH on each computer as well:
zzm@zzm-desktop:~/bin/mpich2$ ls
bin  etc  include  lib  sbin  share
zzm@zzm-desktop:~/bin/mpich2$ pwd
/home/zzm/bin/mpich2
zzm@zzm-desktop:~/bin/mpich2$ which mpd
/home/zzm/bin/mpich2/bin/mpd

zzm@zcni-desktop:/home/zzm/bin/mpich2$ ls
bin  etc  include  lib  sbin  share
zzm@zcni-desktop:/home/zzm/bin/mpich2$ pwd
/home/zzm/bin/mpich2
zzm@zcni-desktop:/home/zzm/bin$ which mpd
/home/zzm/bin/mpich2/bin/mpd
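`which mpd` depends on $PATH. The launch command in the debug output, however, runs mpd.py by absolute path over ssh, so that exact path has to exist and be executable on 192.168.1.248. Below is a minimal Python 3 sketch to run locally on each machine. The helper name is hypothetical:

```python
import os

# The absolute path that mpdboot passed over ssh in the debug output.
MPD_PY = "/home/zzm/bin/mpich2/bin/mpd.py"


def is_executable_file(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    status = "ok" if is_executable_file(MPD_PY) else "missing or not executable"
    print(MPD_PY, status)
```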
-----------------------------------------------------------------------

WHAT I DID TO TRY TO FIX IT:
-----------------------------------------------------------------------
I can run these commands successfully on each computer individually:
zzm@zzm-desktop:~$ mpd &
[1] 9219
zzm@zzm-desktop:~$ mpdtrace
zzm-desktop
zzm@zzm-desktop:~$ mpdallexit
[1]+  Done                    mpd
zzm@zzm-desktop:~$
zzm@zcni-desktop:~$ mpd &
[1] 9219
zzm@zcni-desktop:~$ mpdtrace
zcni-desktop
zzm@zcni-desktop:~$ mpdallexit
[1]+  Done                    mpd
zzm@zcni-desktop:~$


I also ran mpdcheck on each computer. It reported an error only when I
used the -ssh option:

zzm@zcni-desktop:~$ mpdcheck
zzm@zcni-desktop:~$

zzm@zzm-desktop:~$ mpdcheck
zzm@zzm-desktop:~$ mpdcheck -f mpd.hosts
zzm@zzm-desktop:~$

zzm@zzm-desktop:~$ mpdcheck -f mpd.hosts -ssh
** timed out waiting for client on 192.168.1.248 to produce output
client on 192.168.1.248 failed to access the server
here is the output:
zzm@zzm-desktop:~$
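As far as I know, mpdcheck also has a server/client mode (`mpdcheck -s` on one machine, then `mpdcheck -c host port` on the other) that tests a direct TCP connection between the two hosts without ssh. Below is a rough Python 3 analogue of that idea, not mpdcheck's actual protocol. One side accepts a connection and echoes a line, and the other connects and checks the echo. The helper names are hypothetical. As written, it only self-tests over loopback. Across machines, the server socket would be bound to `""` on one host and `client_roundtrip(server_ip, port)` called from the other:

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection on a listening socket and echo one line back."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rwb") as f:
        f.write(f.readline())  # echo the first line unchanged
        f.flush()


def client_roundtrip(host, port, message=b"hello\n", timeout=5.0):
    """Connect to host:port, send one line, and return the echoed line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rwb") as f:
            f.write(message)
            f.flush()
            return f.readline()


def loopback_selftest():
    """Run the server and client on 127.0.0.1 in one process."""
    server = socket.socket()
    server.settimeout(5.0)  # don't block forever if no client arrives
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=serve_once, args=(server,))
    t.start()
    try:
        return client_roundtrip("127.0.0.1", port)
    finally:
        t.join()
        server.close()


if __name__ == "__main__":
    print(loopback_selftest())  # prints b'hello\n'
```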


But I can access 192.168.1.248 via SSH without entering a password, as
mentioned above (in the "COMPUTER CONFIGURATIONS" part).

I also wondered whether the port might be blocked, i.e. whether the
firewall causes the problem. However, I ran the following command on
both computers and still got the error:
zzm@zzm-desktop:~/bin/mpich2/bin$ sudo iptables -F
[sudo] password for zzm:
zzm@zzm-desktop:~/bin/mpich2/bin$

zzm@zcni-desktop:/home/zzm/bin/mpich2/bin$ sudo iptables -F
[sudo] password for zzm:
zzm@zcni-desktop:/home/zzm/bin/mpich2/bin$
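Note that `iptables -F` removes the rules in each chain but does not reset the chains' default policies, so the output of `sudo iptables -L -n` is worth checking as well. To test a port directly, here is a minimal Python 3 sketch that tries a TCP connection with a timeout. The helper name is hypothetical:

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

For example, `port_open("192.168.1.248", 49130)` could be tried if the remote mpd from the failed run is still running. The port number changes between runs, so it has to be taken from the current `mpdboot -v -d` output.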

Can anyone give me a clue? Any help getting out of this problem would
be very much appreciated!

Thanks in advance!

Best regards,
Zengming
ZheJiang Univ. HangZhou, China
