[mpich-discuss] mpdboot hanging

Jason Palmer jason at sccn.ucsd.edu
Fri Feb 12 19:44:49 CST 2010


My problem may involve --maxbranch. I don't recall needing to set this
before to start 6 mpd processes (one local and 5 on 5 remote hosts), but now,
to start more than 4 remote mpd's (which mpdboot says is the maxbranch
default), I need to set it explicitly, for example --maxbranch=5.

Maybe I'm misremembering how mpdboot worked. It is supposed to return after
starting the mpd's, right?

Is setting maxbranch always required to start more than 4 remote mpd's?
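
One way to picture why maxbranch could matter here is as a limit on how many mpd's any single mpd launches. The sketch below is only a toy model under that assumption: plan_launch and its breadth-first assignment are hypothetical and are not taken from the mpdboot source. The host names are the ones from my mpdfile.

```python
from collections import deque

# Host names from this report; the assignment policy below is an assumption.
LOCAL_HOST = "juggling.ucsd.edu"
REMOTE_HOSTS = ["compute-0-16", "compute-0-17", "compute-0-18",
                "compute-0-19", "compute-0-20", "compute-0-15"]


def plan_launch(root, remote_hosts, maxbranch):
    """Return (host, parent) pairs for a breadth-first launch tree.

    Toy model only: each already-started mpd launches at most
    `maxbranch` further mpd's, starting from the local one.
    """
    if maxbranch < 1:
        raise ValueError("maxbranch must be at least 1")
    plan = []
    parents = deque([root])        # mpd's that can still launch children
    pending = deque(remote_hosts)  # hosts that still need an mpd
    while pending:
        parent = parents.popleft()
        for _ in range(maxbranch):
            if not pending:
                break
            child = pending.popleft()
            plan.append((child, parent))
            parents.append(child)
    return plan


for limit in (4, 6):
    print("maxbranch =", limit)
    for host, parent in plan_launch(LOCAL_HOST, REMOTE_HOSTS, limit):
        print("  mpd on", host, "via", parent)
```

In this model, a limit of 4 leaves the last two hosts to be launched via compute-0-16 rather than from the local host, while a limit of 6 launches all six from juggling.ucsd.edu. If mpdboot behaves anything like this, whether the compute nodes can start mpd's on each other would be worth checking.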

Here is what I'm getting, where "mpdfile" contains the hostnames. The
traceback occurs after hitting Ctrl-C.

[jason@juggling ~]$ mpdboot -f mpdfile -n 7 --verbose --maxbranch=6
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu  via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-17  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-18  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-19  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-20  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-15  via  juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason@juggling ~]$
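
The traceback shows mpdboot sitting in fd.readlines() when it was interrupted. In Python, readlines() on a pipe does not return until the writing end is closed, so a child that prints something and then keeps running will hold the reader there. The sketch below demonstrates only that general behavior with a stand-in child process. It is not mpdboot code, and the name timed_readlines and the "mpd started" message are made up for illustration.

```python
import subprocess
import sys
import time


def timed_readlines(delay):
    """Start a child that prints one line, then stays alive for `delay`
    seconds, and measure how long readlines() on its stdout takes."""
    child_code = (
        "import sys, time\n"
        "print('mpd started')\n"
        "sys.stdout.flush()\n"
        "time.sleep(%r)\n" % delay
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE, text=True)
    start = time.monotonic()
    # readlines() reads until end-of-file, which only arrives once the
    # child exits and its end of the pipe is closed.
    lines = proc.stdout.readlines()
    elapsed = time.monotonic() - start
    proc.stdout.close()
    proc.wait()
    return lines, elapsed


lines, elapsed = timed_readlines(1.0)
print("read", lines, "after about", round(elapsed, 1), "seconds")
```

The line is available almost immediately, but the call only returns once the child has exited. A launched process that never closes its output would therefore look like the hang I am seeing, although I can't say whether that is the actual cause here.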

Thanks,

Jason

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 3:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdboot hanging

Hi,

This is probably something simple, but when I run mpdboot with a file
containing node names (the mpd.hosts file), mpd is started on all the nodes
but the last one in the list, and mpdboot hangs without returning. If I hit
Ctrl-C, it breaks, saying it was in a function "handle shells that echo",
and the mpd's that were started are still up.
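
A quick check that may help narrow this down is whether each host in the file can be reached without a password prompt. The sketch below assumes ssh is the remote launcher and that the hosts file has one host per line, optionally followed by ":ncpus" and other fields, with "#" starting a comment. parse_hosts and can_reach are hypothetical helpers, not part of MPICH2.

```python
import subprocess


def parse_hosts(text):
    """Extract host names from hosts-file text (assumed format:
    'host[:ncpus] [other fields]', with '#' starting a comment)."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        first_field = line.split()[0]
        hosts.append(first_field.split(":", 1)[0])
    return hosts


def can_reach(host, timeout=5):
    """Return True if 'ssh host true' succeeds without prompting."""
    cmd = ["ssh", "-o", "BatchMode=yes",
           "-o", "ConnectTimeout=%d" % timeout, host, "true"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example usage (not run here; "mpd.hosts" is the file passed to mpdboot -f):
#   with open("mpd.hosts") as f:
#       for host in parse_hosts(f.read()):
#           print(host, "ok" if can_reach(host) else "UNREACHABLE")
```

BatchMode=yes makes ssh fail instead of waiting for a password, so a host that would stall an unattended launch shows up as unreachable rather than hanging the check.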

I ran mpdboot successfully before as I recall, with no hanging, and all the
mpd's requested being started on all the nodes in the file, so it seems like
something simple has changed to cause this issue.

Any help greatly appreciated.

Thanks,

Jason
