[mpich-discuss] mpdboot error: failed to handshake with mpd on 192.168.1.248; recvd output={}
Rajeev Thakur
thakur at mcs.anl.gov
Tue Jan 12 10:08:19 CST 2010
Can you try using the Hydra process manager instead? It should have been built by default.
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
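
A minimal sketch of what a first Hydra test might look like, assuming the launcher is installed as mpiexec.hydra and accepts the -f host-file option described on the wiki page above. The file name hydra.hosts is hypothetical; the two addresses are the ones from the mpd.hosts file quoted below.

```shell
# Hypothetical host file name; one host per line, same addresses as mpd.hosts.
hostfile=./hydra.hosts
printf '%s\n' 192.168.1.248 192.168.1.190 > "$hostfile"

# Hydra starts the remote processes over ssh itself, so there is no mpdboot step.
if command -v mpiexec.hydra >/dev/null 2>&1; then
    # Start with a trivial non-MPI program: each process prints its host name.
    mpiexec.hydra -f "$hostfile" -n 2 hostname
else
    echo "mpiexec.hydra not found in PATH; check the MPICH2 bin directory"
fi
```

If both host names come back, the same command line with an MPI program in place of hostname is the natural next step.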
Rajeev
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Zengming Zhang
> Sent: Friday, January 08, 2010 3:27 AM
> To: mpich2 mailling list
> Subject: [mpich-discuss] mpdboot error: failed to handshake
> with mpd on 192.168.1.248; recvd output={}
>
> Hi all:
>
> Happy New Year!
>
> I am completely new to the mpich2 software, and I have a
> problem that has nearly driven me crazy while installing mpich2
> 1.2.1. The details follow:
>
> QUESTION:
> -----------------------------------------------------------------------
> I have installed mpich2 version 1.2.1 from source code
> on my Ubuntu 8.10 desktop system. The build and install
> completed without errors, but when I use the mpdboot
> command to start the two nodes configured in my mpd.hosts
> file, I get this error:
>
> zzm at zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts
> mpdboot_zzm-desktop (handle_mpd_output 407): failed to
> handshake with mpd on 192.168.1.248; recvd output={}
>
> There is a more detailed command output here:
> zzm at zzm-desktop:~$ mpdboot -n 2 -f mpd.hosts --chkup -v -d
> debug: starting
> checking 192.168.1.248
> there are 2 hosts up (counting local)
> running mpdallexit on zzm-desktop
> LAUNCHED mpd on zzm-desktop via
> debug: launch cmd= /home/zzm/bin/mpich2/bin/mpd.py --ncpus=1 -e -d
> debug: mpd on zzm-desktop on port 43220
> RUNNING: mpd on zzm-desktop
> debug: info for running mpd: {'ncpus': 1, 'list_port': 43220,
> 'entry_port': '', 'host': 'zzm-desktop', 'entry_host': '',
> 'ifhn': ''}
> LAUNCHED mpd on 192.168.1.248 via zzm-desktop
> debug: launch cmd= ssh -x -n -q 192.168.1.248
> '/home/zzm/bin/mpich2/bin/mpd.py -h zzm-desktop -p 43220
> --ncpus=1 -e -d'
> debug: mpd on 192.168.1.248 on port 49130
> mpdboot_zzm-desktop (handle_mpd_output 407): failed to
> handshake with mpd on 192.168.1.248; recvd output={}
>
> Note the line "there are 2 hosts up (counting local)". I think this
> means that both computers are reachable. Am I right?
> -----------------------------------------------------------------------
>
> COMPUTER CONFIGURATIONS:
> -----------------------------------------------------------------------
> Some configuration details of my computers are listed below:
> zzm at zzm-desktop:~$ whoami
> zzm
> zzm at zzm-desktop:~$ more /home/zzm/.mpd.conf
> secretword=nicegiving
> zzm at zzm-desktop:~$ chmod 600 .mpd.conf
> zzm at zzm-desktop:~$ more /home/zzm/mpd.hosts
> 192.168.1.248
> 192.168.1.190
> zzm at zzm-desktop:~$ python -V
> Python 2.5.2
> zzm at zzm-desktop:~$
>
> The main computer has IP address 192.168.1.190 and hostname
> zzm-desktop; all commands are run on it.
> 192.168.1.248 is a second computer used to test the parallel
> environment; its hostname is zcni-desktop.
>
> The /etc/hosts files on the two computers are:
> zzm at zzm-desktop:~$ more /etc/hosts
> 192.168.1.190 zzm-desktop
> ... ...
> zzm at zcni-desktop:~$ more /etc/hosts
> 192.168.1.190 zcni-desktop
> ... ...
>
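
The addresses stated above and the /etc/hosts line shown for zcni-desktop can be compared mechanically. As quoted, that file maps zcni-desktop to 192.168.1.190, which is the address given for zzm-desktop. Whether that is a slip in the message or the real file contents is not clear from the post. The following is only a sketch of such a check: the function names lookup_host and check_host and the scratch file hosts.sample are hypothetical, and the elided "... ..." lines are left out.

```shell
# lookup_host NAME [FILE]: print the first address that FILE maps NAME to.
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# check_host NAME EXPECTED_IP [FILE]: compare the mapping with the expected address.
check_host() {
    found=$(lookup_host "$1" "${3:-/etc/hosts}")
    if [ "$found" = "$2" ]; then
        echo "$1 ok ($found)"
    else
        echo "$1 MISMATCH: expected $2, found ${found:-nothing}"
        return 1
    fi
}

# Example using only the line quoted above for zcni-desktop, in a scratch file.
printf '%s\n' '192.168.1.190 zcni-desktop' > ./hosts.sample
check_host zcni-desktop 192.168.1.248 ./hosts.sample || true
```

With the quoted line this prints "zcni-desktop MISMATCH: expected 192.168.1.248, found 192.168.1.190". Running check_host on each machine against the real /etc/hosts, for both hostnames, would show whether each machine resolves the other to the right address.
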
> Also, I can log in to zcni-desktop over SSH without entering a
> password. The only issue was a long wait, about 10 seconds,
> before the login completed.
> To avoid SSH timing out, I added "UseDNS no" at the end
> of /etc/ssh/sshd_config on each of the two computers.
> Now I can log in to zcni-desktop over SSH immediately.
>
> The mpich2 installation directory is present on each
> computer, and $MPICH2/bin has been added to $PATH on each
> computer as well:
> zzm at zzm-desktop:~/bin/mpich2$ ls
> bin etc include lib sbin share
> zzm at zzm-desktop:~/bin/mpich2$ pwd
> /home/zzm/bin/mpich2
> zzm at zzm-desktop:~/bin/mpich2$ which mpd
> /home/zzm/bin/mpich2/bin/mpd
>
> zzm at zcni-desktop:/home/zzm/bin/mpich2$ ls
> bin etc include lib sbin share
> zzm at zcni-desktop:/home/zzm/bin/mpich2$ pwd
> /home/zzm/bin/mpich2
> zzm at zcni-desktop:/home/zzm/bin$ which mpd
> /home/zzm/bin/mpich2/bin/mpd
> -----------------------------------------------------------------------
>
> WHAT I DID TO TRY TO FIX IT:
> -----------------------------------------------------------------------
> I can run these commands on each computer individually:
> zzm at zzm-desktop:~$ mpd &
> [1] 9219
> zzm at zzm-desktop:~$ mpdtrace
> zzm-desktop
> zzm at zzm-desktop:~$ mpdallexit
> [1]+ Done mpd
> zzm at zzm-desktop:~$
> zzm at zcni-desktop:~$ mpd &
> [1] 9219
> zzm at zcni-desktop:~$ mpdtrace
> zcni-desktop
> zzm at zcni-desktop:~$ mpdallexit
> [1]+ Done mpd
> zzm at zcni-desktop:~$
>
>
> I also ran the mpdcheck command on each computer, and got an error report
> when using the -ssh option:
>
> zzm at zcni-desktop:~$ mpdcheck
> zzm at zcni-desktop:~$
>
> zzm at zzm-desktop:~$ mpdcheck
> zzm at zzm-desktop:~$ mpdcheck -f mpd.hosts
> zzm at zzm-desktop:~$
>
> zzm at zzm-desktop:~$ mpdcheck -f mpd.hosts -ssh
> ** timed out waiting for client on 192.168.1.248 to produce output
> client on 192.168.1.248 failed to access the server
> here is the output:
> zzm at zzm-desktop:~$
>
>
> But I can log in to 192.168.1.248 over SSH without entering a
> password, as I mentioned above (in the "COMPUTER CONFIGURATIONS:" part).
>
> I also wondered whether a closed port or the firewall might be
> causing the problem. I ran the following command on both
> computers, but still got the error:
> zzm at zzm-desktop:~/bin/mpich2/bin$ sudo iptables -F
> [sudo] password for zzm:
> zzm at zzm-desktop:~/bin/mpich2/bin$
>
> zzm at zcni-desktop:/home/zzm/bin/mpich2/bin$ sudo iptables -F
> [sudo] password for zzm:
> zzm at zcni-desktop:/home/zzm/bin/mpich2/bin$
>
> Could anyone give me a clue? Please help me resolve this
> problem; any advice would be very much appreciated!
>
> Thanks in advance!
>
> Best regards,
> Zengming
> ZheJiang Univ. HangZhou, China
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>