I have gotten that error in the past; adding "--remcons" as a parameter to mpdboot seemed to help.<br><br>C.<br><br><br><div class="gmail_quote">On Tue, May 6, 2008 at 10:04 AM, Qi Ying <<a href="mailto:qi.ying@gmail.com">qi.ying@gmail.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi All,<br>
<br>
Recently I have had trouble starting mpd with mpdboot on my cluster. The<br>
problem seems to be caused by a single node in the system (n002). The<br>
debugging output is below. However, I can start mpd manually on n001<br>
and n002 (and have them join the ring) without any problem. Any<br>
insights or suggestions?<br>
<br>
Thanks,<br>
<br>
Qi Ying<br>
<br>
[qying@n001~] $ mpdboot -n 2 -f ~/mpd.hosts --rsh=/usr/bin/rsh -v -d<br>
<br>
running mpdallexit on n001.newwolf.edu<br>
LAUNCHED mpd on n001.newwolf.edu via<br>
debug: launch cmd= /opt/mpich2/bin/mpd.py --ncpus=1 -e -d<br>
debug: mpd on n001.newwolf.edu on port 55059<br>
RUNNING: mpd on n001.newwolf.edu<br>
debug: info for running mpd: {'ncpus': 1, 'list_port': 55059, 'entry_port': '', 'host': 'n001.newwolf.edu', 'entry_host': '', 'ifhn': ''}<br>
LAUNCHED mpd on n002 via n001.newwolf.edu<br>
debug: launch cmd= /usr/bin/rsh -n n002 '/opt/mpich2/bin/mpd.py -h n001.newwolf.edu -p 55059 --ncpus=1 -e -d'<br>
debug: mpd on n002 on port 54319<br>
mpdboot_n001.newwolf.edu (handle_mpd_output 385): failed to handshake with mpd on n002; recvd output={}<br>
<br>
</blockquote></div><br>
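A minimal sketch of the suggestion, assuming you keep the same host count, hosts file and rsh path as in your original command. The wrapper name mpdboot_remcons is hypothetical; the only change from your command is the added --remcons flag, which in my case merely seemed to help (see `mpdboot --help` for what it changes).

```shell
#!/bin/sh
# Hypothetical wrapper: boot the mpd ring with --remcons added.
# Defaults (2 hosts, ~/mpd.hosts, /usr/bin/rsh) are copied from the
# mpdboot command in the original post; adjust for your cluster.
mpdboot_remcons() {
    nhosts=${1:-2}                   # number of mpds to start (-n)
    hostfile=${2:-$HOME/mpd.hosts}   # hosts file (-f)
    rsh=${3:-/usr/bin/rsh}           # remote shell used to reach n002
    # --remcons is the flag that seemed to help; -v -d keep the
    # verbose/debug output so a failed handshake is still visible
    mpdboot -n "$nhosts" -f "$hostfile" --rsh="$rsh" --remcons -v -d
}
```

Run it as `mpdboot_remcons` (or `mpdboot_remcons 2 ~/mpd.hosts /usr/bin/rsh`), then check with `mpdtrace` whether both n001 and n002 show up in the ring.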