[MPICH] MPI2 bails on mpdboot but works if I add the ring manually?
Shaun Q
shaun at qualheim.org
Fri Mar 31 13:08:25 CST 2006
Hi there guys:
I'm trying to bring up an mpd ring on a new 64-bit diskless cluster and am
having trouble getting the nodes to connect.
So I run mpdboot to start mpds on 4 identical machines via rsh:
    % mpdboot -n 4 --rsh=rsh &
It returns the following error:
    mpdboot_ct105 (handle_mpd_output 368): failed to connect to mpd on ct107
The host named at the end of that message (ct107 here) changes from one
attempt to the next, cycling through each of the four machines.
This is the output from my /var/log/messages:
Mar 31 13:01:34 ct105 mpd: mpd starting; no mpdid yet
Mar 31 13:01:34 ct105 mpd: mpd has mpdid=ct105_37225 (port=37225)
Mar 31 13:01:34 ct105 python2.4: mpdboot_ct105 (handle_mpd_output 368): failed to connect to mpd on ct107
Mar 31 13:01:35 ct105 mpd: mpd ending mpdid=ct105_37225 (inside cleanup)
However, I am able to start a ring by issuing the mpd commands manually:
mpd and mpdtrace -l on the first node, then mpd -h <host> -p <port> & on
each of the other nodes.
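For the manual startup described above, the host and port given to mpd -h ... -p ... come from the mpdid that mpdtrace -l reports. A minimal sketch, assuming each mpd is listed as host_port (optionally followed by an IP address in parentheses), matching the "mpdid=ct105_37225 (port=37225)" line in the log: join_cmd is a hypothetical helper, not part of MPICH2, that turns one such line into the command the other nodes would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): build the "join the ring" command
# from one line of `mpdtrace -l` output, ASSUMED to look like
#   ct105_37225 (192.168.0.105)
# i.e. an mpdid of the form host_port, optionally followed by an address.
join_cmd() {
    mpdid=${1%% *}     # keep only the first field, dropping " (address)"
    host=${mpdid%_*}   # everything before the last "_" is the host name
    port=${mpdid##*_}  # everything after the last "_" is the mpd's port
    printf 'mpd -h %s -p %s &\n' "$host" "$port"
}

# Example: join_cmd "ct105_37225" prints: mpd -h ct105 -p 37225 &
```

The printed line is what would then be run on each remaining node; splitting on the last underscore relies on host names not containing "_", which ordinary host names do not.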
So what do you think: could this be an rsh issue or a Python issue?
Any ideas?
Thanks!
Shaun Qualheim
Convergent Thinking
More information about the mpich-discuss mailing list