[mpich-discuss] mpdboot handshake problem

Camilo Rostoker rostoker at gmail.com
Tue May 6 13:27:56 CDT 2008


I have gotten that error in the past; adding "--remcons" as a parameter to
mpdboot seemed to help.
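
For example, here is a rough sketch of your command with the flag added. The
host file path and rsh location are simply copied from your message, the
MPDBOOT_ARGS variable is only for illustration, and I am assuming your
mpdboot accepts "--remcons" (run "mpdboot --help" to confirm on your install):

```shell
# Sketch only: the original mpdboot arguments plus --remcons.
# Paths are taken from the quoted message; adjust them for your cluster.
MPDBOOT_ARGS="-n 2 -f $HOME/mpd.hosts --rsh=/usr/bin/rsh --remcons -v -d"

if command -v mpdboot >/dev/null 2>&1; then
    # Word splitting of $MPDBOOT_ARGS is intentional here.
    # If the ring comes up, list its members with mpdtrace.
    mpdboot $MPDBOOT_ARGS && mpdtrace -l
else
    echo "mpdboot not found in PATH; would run: mpdboot $MPDBOOT_ARGS"
fi
```

If the ring starts, mpdtrace -l should list both n001 and n002.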

C.


On Tue, May 6, 2008 at 10:04 AM, Qi Ying <qi.ying at gmail.com> wrote:

> Hi All,
>
> Recently I had trouble starting mpd using mpdboot on my cluster. This
> seems to be caused by a single node in the system (n002). The following is
> the debugging output. However, I can manually start mpd on n001 and
> n002 (and have them join the ring), and there is no problem. Any
> insights or suggestions?
>
> Thanks,
>
> Qi Ying
>
> [qying at n001~] $ mpdboot -n 2 -f ~/mpd.hosts --rsh=/usr/bin/rsh -v -d
>
> running mpdallexit on n001.newwolf.edu
> LAUNCHED mpd on n001.newwolf.edu  via
> debug: launch cmd= /opt/mpich2/bin/mpd.py   --ncpus=1 -e -d
> debug: mpd on n001.newwolf.edu  on port 55059
> RUNNING: mpd on n001.newwolf.edu
> debug: info for running mpd: {'ncpus': 1, 'list_port': 55059,
> 'entry_port': '', 'host': 'n001.newwolf.edu', 'entry_host': '',
> 'ifhn': ''}
> LAUNCHED mpd on n002  via  n001.newwolf.edu
> debug: launch cmd= /usr/bin/rsh -n n002 '/opt/mpich2/bin/mpd.py  -h
> n001.newwolf.edu -p 55059  --ncpus=1 -e -d'
> debug: mpd on n002  on port 54319
> mpdboot_n001.newwolf.edu (handle_mpd_output 385): failed to handshake
> with mpd on n002; recvd output={}
>
>