[mpich-discuss] mpdboot hanging
Jason Palmer
jason at sccn.ucsd.edu
Fri Feb 12 19:44:49 CST 2010
My problem may involve --maxbranch. I don't recall needing to set this
before to start 6 mpd procs, one local and 5 on 5 remote hosts, but now, to
start more than 4 remote mpd's (which it says is the maxbranch default), I
need to set --maxbranch=5, for example.
Maybe I'm misremembering how mpdboot worked. It is supposed to return after
starting the mpd's, right?
Is setting maxbranch always required to start more than 4 remote mpd's?
Here is what I'm getting, where "mpdfile" contains the hostnames. The
traceback occurs after hitting ctrl-C.
[jason at juggling ~]$ mpdboot -f mpdfile -n 7 --verbose --maxbranch=6
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-17 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-18 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-19 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-20 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-15 via juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason at juggling ~]$
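
To see which hosts mpdboot may still be waiting on, the LAUNCHED and RUNNING lines of the verbose output can be compared. The following is only a rough sketch in Python, assuming the --verbose line format shown in the session above. The function name pending_hosts and the SAMPLE string are made up for illustration and are not part of MPICH2.

```python
import re

# Patterns assume the "mpdboot --verbose" line format seen in the session above.
LAUNCHED = re.compile(r"^LAUNCHED mpd on (\S+)")
RUNNING = re.compile(r"^RUNNING: mpd on (\S+)")


def pending_hosts(output):
    """Return hosts that were LAUNCHED but never reported RUNNING (launch order)."""
    launched = []
    running = set()
    for raw in output.splitlines():
        line = raw.strip()
        m = LAUNCHED.match(line)
        if m:
            launched.append(m.group(1))
            continue
        m = RUNNING.match(line)
        if m:
            running.add(m.group(1))
    return [host for host in launched if host not in running]


# Hypothetical sample: a shortened copy of the verbose output above.
SAMPLE = """\
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-17 via juggling.ucsd.edu
"""

if __name__ == "__main__":
    for host in pending_hosts(SAMPLE):
        print("no RUNNING line seen for", host)
```

Applied to the full output above, this would list all six compute nodes, since only juggling.ucsd.edu has a RUNNING line before the hang.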
Thanks,
Jason
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 3:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdboot hanging
Hi,
This is probably something simple, but when I run mpdboot with a file
containing node names, mpd is started on all the nodes but the last one
in the list (in the mpd.hosts file), and mpdboot hangs without returning. If
I hit ctrl-C, it breaks, saying it was in a function "handle shells that
echo", with the mpd's that were started still up.
I ran mpdboot successfully before as I recall, with no hanging, and all the
mpd's requested being started on all the nodes in the file, so it seems like
something simple has changed to cause this issue.
Any help greatly appreciated.
Thanks,
Jason