[mpich-discuss] mpdboot error: failed to handshake with mpd on 192.168.1.248; recvd output={}

Rajeev Thakur thakur at mcs.anl.gov
Tue Jan 12 10:08:19 CST 2010


Can you try using the Hydra process manager instead? It should have been built by default.
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
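
For reference, here is a minimal sketch of what a Hydra launch could look like, written in Python (Python 3 is assumed here, not the 2.5.2 shown in the quoted message). Unlike mpdboot, Hydra does not need a ring of daemons to be started first. The sketch assumes the launcher is installed as mpiexec.hydra, that it takes a host file with one host per line via -f and a process count via -n, and that hosts.hydra and ./cpi are placeholder names only; please check the wiki page above for the details of your build.

```python
import shlex


def hydra_command(hosts, nprocs, program, hostfile="hosts.hydra",
                  launcher="mpiexec.hydra"):
    """Return (hostfile_text, argv) for a Hydra launch.

    Assumptions: the launcher name and host file path are placeholders;
    -f (host file) and -n (process count) are the options relied on.
    """
    hostfile_text = "\n".join(hosts) + "\n"   # one host per line
    argv = [launcher, "-f", hostfile, "-n", str(nprocs), program]
    return hostfile_text, argv


if __name__ == "__main__":
    # Addresses taken from the quoted message; ./cpi is a placeholder.
    text, argv = hydra_command(["192.168.1.190", "192.168.1.248"], 2, "./cpi")
    print(text, end="")
    print(" ".join(shlex.quote(a) for a in argv))
```

The first part of the output is what would go into the host file, and the last line is the command to run from the directory that contains it.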

Rajeev 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Zengming Zhang
> Sent: Friday, January 08, 2010 3:27 AM
> To: mpich2 mailling list
> Subject: [mpich-discuss] mpdboot error: failed to handshake 
> with mpd on 192.168.1.248; recvd output={}
> 
> Hi all:
> 
> Happy New Year!
> 
> I am completely new to mpich2, and I have a problem that
> has almost driven me crazy while installing mpich2 1.2.1.
> The details follow:
> 
> QUESTION:
> -----------------------------------------------------------------------
> I installed mpich2 1.2.1 from source on my Ubuntu 8.10 desktop
> system. The build and installation reported no errors, but when I
> use the mpdboot command to start the two nodes configured in the
> mpd.hosts file, I get this error:
> 
> zzm at zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts 
> mpdboot_zzm-desktop (handle_mpd_output 407): failed to 
> handshake with mpd on 192.168.1.248; recvd output={}
> 
> Here is more detailed output from the same command:
> zzm at zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts --chkup -v -d
> debug: starting
> checking 192.168.1.248
> there are 2 hosts up (counting local)
> running mpdallexit on zzm-desktop
> LAUNCHED mpd on zzm-desktop  via  
> debug: launch cmd= /home/zzm/bin/mpich2/bin/mpd.py   --ncpus=1 -e -d
> debug: mpd on zzm-desktop  on port 43220
> RUNNING: mpd on zzm-desktop
> debug: info for running mpd: {'ncpus': 1, 'list_port': 43220,
> 'entry_port': '', 'host': 'zzm-desktop', 'entry_host': '', 
> 'ifhn': ''}
> LAUNCHED mpd on 192.168.1.248  via  zzm-desktop
> debug: launch cmd= ssh -x -n -q 192.168.1.248 
> '/home/zzm/bin/mpich2/bin/mpd.py  -h zzm-desktop -p 43220  
> --ncpus=1 -e -d' 
> debug: mpd on 192.168.1.248  on port 49130 
> mpdboot_zzm-desktop (handle_mpd_output 407): failed to 
> handshake with mpd on 192.168.1.248; recvd output={}
> 
> Note the line "there are 2 hosts up (counting local)". I think this
> means that both computers can be reached. Am I right?
> -----------------------------------------------------------------------
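
The launch command in the debug output passes "-h zzm-desktop -p 43220" to the remote mpd, which suggests that the mpd on 192.168.1.248 has to resolve the name zzm-desktop and open a TCP connection back to that port. The "hosts up" check does not necessarily exercise that return path. A minimal sketch for testing it by hand from the second machine, assuming Python 3 and with can_connect as a hypothetical helper name:

```python
import socket
import sys


def can_connect(host, port, timeout=5.0):
    """Return (True, "") if a TCP connection to host:port succeeds,
    otherwise (False, reason)."""
    try:
        # create_connection resolves the name, then connects
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:   # name lookup failure, refusal, or timeout
        return False, str(exc)


if __name__ == "__main__":
    # Usage: python check_port.py zzm-desktop 43220
    # The port changes on every mpdboot run, so use the current value.
    if len(sys.argv) == 3:
        print(can_connect(sys.argv[1], int(sys.argv[2])))
```

If this fails on 192.168.1.248 while an mpd is listening on zzm-desktop, the reason string should hint at whether name resolution or the connection itself is the problem.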
> 
> COMPUTER CONFIGURATIONS:
> -----------------------------------------------------------------------
> Some details of my computers' configuration are listed below:
> zzm at zzm-desktop:~$ whoami
> zzm
> zzm at zzm-desktop:~$ more /home/zzm/.mpd.conf 
> secretword=nicegiving
> zzm at zzm-desktop:~$ chmod 600 .mpd.conf
> zzm at zzm-desktop:~$ more /home/zzm/mpd.hosts
> 192.168.1.248
> 192.168.1.190
> zzm at zzm-desktop:~$ python -V
> Python 2.5.2
> zzm at zzm-desktop:~$ 
> 
> The main computer has IP address 192.168.1.190 and hostname
> zzm-desktop; all commands are run on it.
> 192.168.1.248 is a second computer used to test the parallel
> environment; its hostname is zcni-desktop.
> 
> The /etc/hosts files on the two computers are:
> zzm at zzm-desktop:~$ more /etc/hosts
> 192.168.1.190   zzm-desktop
> ... ...
> zzm at zcni-desktop:~$ more /etc/hosts
> 192.168.1.190   zcni-desktop
> ... ...
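
One detail in these two listings may be worth double-checking: as quoted, the file on zcni-desktop maps the name zcni-desktop to 192.168.1.190, which is the address given above for zzm-desktop. That may only be a copy-and-paste slip in the message, but a mismatch of this kind could plausibly confuse the mpds when they look each other up by name. A minimal sketch for comparing an /etc/hosts-style file against the addresses you expect, assuming Python 3; parse_hosts and mismatches are hypothetical helper names:

```python
def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its IP."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)       # first entry wins
    return mapping


def mismatches(hosts_text, expected):
    """Return {name: (found_ip, expected_ip)} for every name in
    `expected` that is missing from hosts_text or mapped elsewhere."""
    found = parse_hosts(hosts_text)
    return {name: (found.get(name), ip)
            for name, ip in expected.items()
            if found.get(name) != ip}


if __name__ == "__main__":
    # Addresses as stated in the message; the file content is only the
    # visible line of the listing from zcni-desktop.
    expected = {"zzm-desktop": "192.168.1.190",
                "zcni-desktop": "192.168.1.248"}
    print(mismatches("192.168.1.190   zcni-desktop\n", expected))
```

Since the listings are abbreviated with "... ...", names reported as missing here may simply be on the lines that were left out.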
> 
> Also, I can log in to zcni-desktop over SSH without entering a
> password. The only issue was that the login took a long time,
> about 10 seconds.
> So, to avoid SSH timeouts, I added "UseDNS no" to
> /etc/ssh/sshd_config on each of the two computers.
> Now I can log in to zcni-desktop over SSH immediately.
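
A login delay of this kind is often attributed to sshd doing a reverse DNS lookup of the client's address, which is the lookup that "UseDNS no" switches off. That setting only affects sshd; name lookups made by other programs are unchanged, so it may be worth timing one directly on each machine. A minimal sketch, assuming Python 3 and with timed_reverse_lookup as a hypothetical helper name:

```python
import socket
import sys
import time


def timed_reverse_lookup(ip):
    """Return (hostname_or_None, seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:          # no reverse entry, or the resolver failed
        name = None
    return name, time.monotonic() - start


if __name__ == "__main__":
    # Usage: python revlookup.py 192.168.1.190 192.168.1.248
    for ip in sys.argv[1:]:
        name, seconds = timed_reverse_lookup(ip)
        print("%s -> %s (%.2f s)" % (ip, name, seconds))
```

A lookup that takes several seconds, or that returns a name other than the one you expect, would point at name resolution rather than at mpich2 itself.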
> 
> The mpich2 installation folder is present on both computers,
> and $MPICH2/bin has been added to $PATH on each of them:
> zzm at zzm-desktop:~/bin/mpich2$ ls
> bin  etc  include  lib  sbin  share
> zzm at zzm-desktop:~/bin/mpich2$ pwd
> /home/zzm/bin/mpich2
> zzm at zzm-desktop:~/bin/mpich2$ which mpd
> /home/zzm/bin/mpich2/bin/mpd
> 
> zzm at zcni-desktop:/home/zzm/bin/mpich2$ ls
> bin  etc  include  lib  sbin  share
> zzm at zcni-desktop:/home/zzm/bin/mpich2$ pwd
> /home/zzm/bin/mpich2
> zzm at zcni-desktop:/home/zzm/bin$ which mpd
> /home/zzm/bin/mpich2/bin/mpd
> -----------------------------------------------------------------------
> 
> WHAT I DID TO TRY TO FIX IT:
> -----------------------------------------------------------------------
> I can run these commands on each computer individually:
> zzm at zzm-desktop:~$ mpd &
> [1] 9219
> zzm at zzm-desktop:~$ mpdtrace
> zzm-desktop
> zzm at zzm-desktop:~$ mpdallexit
> [1]+  Done                    mpd
> zzm at zzm-desktop:~$
> zzm at zcni-desktop:~$ mpd &
> [1] 9219
> zzm at zcni-desktop:~$ mpdtrace
> zcni-desktop
> zzm at zcni-desktop:~$ mpdallexit
> [1]+  Done                    mpd
> zzm at zcni-desktop:~$
> 
> 
> I also ran the mpdcheck command on each computer and got an error
> report when using the -ssh option:
> 
> zzm at zcni-desktop:~$ mpdcheck
> zzm at zcni-desktop:~$
> 
> zzm at zzm-desktop:~$ mpdcheck
> zzm at zzm-desktop:~$ mpdcheck -f mpd.hosts 
> zzm at zzm-desktop:~$ 
> 
> zzm at zzm-desktop:~$ mpdcheck -f mpd.hosts -ssh
> ** timed out waiting for client on 192.168.1.248 to produce output
> client on 192.168.1.248 failed to access the server
> here is the output:
> zzm at zzm-desktop:~$
> 
> 
> But I can access 192.168.1.248 via SSH without entering a
> password, as I mentioned above (in the "COMPUTER CONFIGURATIONS" part).
> 
> I also wondered whether the port is open: is a firewall causing
> the problem? But I ran the following command on both computers
> and still got the error:
> zzm at zzm-desktop:~/bin/mpich2/bin$ sudo iptables -F
> [sudo] password for zzm: 
> zzm at zzm-desktop:~/bin/mpich2/bin$ 
> 
> zzm at zcni-desktop:/home/zzm/bin/mpich2/bin$ sudo iptables -F
> [sudo] password for zzm:
> zzm at zcni-desktop:/home/zzm/bin/mpich2/bin$
> 
> So, could anyone give me a clue? Please help me solve this
> problem; any advice would be very much appreciated!
> 
> Thanks in advance!
> 
> Best regards,
> Zengming
> ZheJiang Univ. HangZhou, China
> 
> 


