[mpich-discuss] mpdboot handshake problem
Qi Ying
qi.ying at gmail.com
Tue May 6 12:04:26 CDT 2008
Hi All,
Recently I have had trouble starting mpd with mpdboot on my cluster.
The problem seems to be caused by a single node in the system (n002).
The debugging output is below. However, I can start mpd manually on
n001 and n002 (and have them join the ring) without any problem. Any
insights or suggestions?
Thanks,
Qi Ying
[qying@n001~] $ mpdboot -n 2 -f ~/mpd.hosts --rsh=/usr/bin/rsh -v -d
running mpdallexit on n001.newwolf.edu
LAUNCHED mpd on n001.newwolf.edu via
debug: launch cmd= /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
debug: mpd on n001.newwolf.edu on port 55059
RUNNING: mpd on n001.newwolf.edu
debug: info for running mpd: {'ncpus': 1, 'list_port': 55059, 'entry_port': '', 'host': 'n001.newwolf.edu', 'entry_host': '', 'ifhn': ''}
LAUNCHED mpd on n002 via n001.newwolf.edu
debug: launch cmd= /usr/bin/rsh -n n002 '/opt/mpich2/bin/mpd.py -h n001.newwolf.edu -p 55059 --ncpus=1 -e -d'
debug: mpd on n002 on port 54319
mpdboot_n001.newwolf.edu (handle_mpd_output 385): failed to handshake with mpd on n002; recvd output={}
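
One thing that might help narrow this down is checking, from n002, whether
the name and port that mpdboot passes to the remote mpd (here
n001.newwolf.edu and 55059) can be resolved and reached. The output above
does not confirm that name resolution or connectivity is the cause; that is
only an assumption, and the helper below (check_endpoint is a made-up name,
not part of MPICH2) is a minimal sketch for testing it using only the Python
standard library.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (addresses, reachable) for a TCP endpoint.

    addresses: sorted list of IP addresses that `host` resolves to
               (empty if resolution fails).
    reachable: True if a TCP connection to host:port succeeds
               within `timeout` seconds.
    Hypothetical diagnostic helper; not part of mpd or mpdboot.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve on this machine.
        return [], False

    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})

    try:
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True
    except OSError:
        # Refused, timed out, unreachable, or filtered.
        return addresses, False
```

For example, while mpd is running on n001, calling
check_endpoint('n001.newwolf.edu', 55059) in a Python session on n002 would
show which addresses n002 resolves for n001 and whether the listening port
accepts a connection. The port number changes each time mpd starts, so it
has to be taken from the current debug output.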