[mpich-discuss] mpdboot hanging
Rajeev Thakur
thakur at mcs.anl.gov
Sun Feb 14 19:23:53 CST 2010
It shouldn't need --maxbranch. Try shuffling the hosts in the hostfile
and see if the problem persists with the same host. In that case, there
may be something wrong with the networking configuration for that host.
Or try using the Hydra process manager, which doesn't require setting up
MPDs. You can use mpiexec.hydra.
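
For the shuffling test, a small script along these lines would do. It is
only a sketch: the function name and the output file name are made up for
illustration, and it assumes the hostfile has one host entry per line.

```python
import random
from pathlib import Path


def shuffle_hostfile(src, dst, seed=None):
    """Write the hosts listed in src to dst in random order; return the new order."""
    # Keep only non-empty lines; each one is assumed to be a single host entry.
    hosts = [line.strip() for line in Path(src).read_text().splitlines() if line.strip()]
    # A private Random instance keeps the shuffle reproducible when a seed is given.
    random.Random(seed).shuffle(hosts)
    Path(dst).write_text("\n".join(hosts) + "\n")
    return hosts
```

For example, call shuffle_hostfile("mpdfile", "mpdfile.shuffled") and pass
the new file to mpdboot -f. If the hang follows one particular host
regardless of its position in the file, that host is the one to look at.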
Rajeev
_____
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 7:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpdboot hanging
My problem may involve --maxbranch. I don't recall needing to set this
before to start 6 mpd processes (one local and 5 on 5 remote hosts), but
now, to start more than 4 remote mpd's, which it says is the maxbranch
default, I need to set, for example, --maxbranch=5.
Maybe I'm misremembering how mpdboot worked. It is supposed to return
after starting the mpd's, right?
Is setting maxbranch always required to start more than 4 remote mpd's?
Here is what I'm getting, where "mpdfile" contains the hostnames. The
traceback occurs after hitting Ctrl-C.
[jason@juggling ~]$ mpdboot -f mpdfile -n 7 --verbose --maxbranch=6
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-17 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-18 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-19 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-20 via juggling.ucsd.edu
LAUNCHED mpd on compute-0-15 via juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason@juggling ~]$
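
The traceback shows mpdboot interrupted inside fd.readlines(), and a
readlines() call does not return until it sees end-of-file. That is
consistent with one launched process never finishing or closing its output,
though the traceback alone does not prove it. The sketch below is not
mpdboot code; read_output_with_timeout is a hypothetical helper that shows
the general alternative of reading a child's output with select() and a
timeout, so a silent child cannot block the caller forever. It assumes a
POSIX system, where select() works on pipes.

```python
import os
import select
import subprocess


def read_output_with_timeout(cmd, timeout):
    """Run cmd and collect its stdout lines.

    Returns (lines, timed_out). timed_out is True if the child produced no
    output for `timeout` seconds before closing stdout; the child is then killed.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    chunks = []
    timed_out = False
    while True:
        # Wait until the pipe is readable or the timeout expires.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            timed_out = True
            proc.kill()
            break
        data = os.read(fd, 4096)
        if not data:  # empty read means the child closed its stdout (EOF)
            break
        chunks.append(data)
    proc.stdout.close()
    proc.wait()
    return b"".join(chunks).decode().splitlines(), timed_out
```

With a plain readlines() the caller would sit in the read until the child
exits; here the timeout turns a silent child into a reportable condition.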
Thanks,
Jason
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 3:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdboot hanging
Hi,
This is probably something simple, but when I run mpdboot with a file
containing node names, mpd is started on all the nodes but the last
one in the list (in the mpd.hosts file), and mpdboot hangs without
returning. If I hit Ctrl-C, it breaks saying it was in a function "handle
shells that echo", with the mpd's that were started still up.
I ran mpdboot successfully before as I recall, with no hanging, and all
the mpd's requested being started on all the nodes in the file, so it
seems like something simple has changed to cause this issue.
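
One quick check when a single host never comes up is whether every name in
the host file still resolves from the launching node. The following is a
rough sketch of that check only; unresolvable_hosts is a hypothetical
helper, and a name that resolves can of course still be unreachable for
other reasons.

```python
import socket


def unresolvable_hosts(hosts):
    """Return the host names that fail name resolution from this machine."""
    failed = []
    for host in hosts:
        try:
            socket.gethostbyname(host)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(host)
    return failed
```

Feeding it the names from the host file gives a list of entries to look at
first; an empty list only says that name lookup is not the problem.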
Any help greatly appreciated.
Thanks,
Jason