[mpich-discuss] mpdboot hanging

Rajeev Thakur thakur at mcs.anl.gov
Sun Feb 14 19:23:53 CST 2010


It shouldn't need --maxbranch. Try shuffling the hosts in the hostfile
and see whether the hang follows the same host. If it does, there may be
something wrong with the networking configuration for that host.
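
A minimal sketch of that check in Python 3, assuming the hostfile lists
one hostname per line with no extra fields. The function names
shuffled_hosts and unresolvable are hypothetical helpers, not part of
MPICH2. A failed name lookup is only one of several possible networking
problems, so a clean result here does not rule the host out.

```python
import random
import socket


def shuffled_hosts(lines, seed=None):
    """Return the non-blank host lines in random order.

    Assumes one hostname per line; blank lines are dropped.
    """
    hosts = [ln.strip() for ln in lines if ln.strip()]
    rng = random.Random(seed)  # seed only to make a run repeatable
    rng.shuffle(hosts)
    return hosts


def unresolvable(hosts, resolver=socket.gethostbyname):
    """Return the hosts whose name lookup raises an error."""
    bad = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
    return bad
```

To use it, read the lines of the existing hostfile, write the result of
shuffled_hosts() to a new file, and pass that file to mpdboot -f. If
mpdboot stalls on the same host wherever it sits in the list, that host
is the one to examine.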
 
Or try using the Hydra process manager, which doesn't require setting up
MPDs. You can use mpiexec.hydra.
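
For illustration, a small Python 3 sketch that builds a Hydra launch
command. The -f (host file) and -n (process count) options are the ones
described in the Hydra documentation; it is worth confirming them
against the help output of the installed version. The program name
./a.out and the helper hydra_command are placeholders, not anything
from this thread.

```python
import subprocess


def hydra_command(hostfile, nprocs, program, *args):
    """Build the argument list for launching `program` through Hydra."""
    return ["mpiexec.hydra", "-f", hostfile, "-n", str(nprocs), program, *args]


def run_with_hydra(hostfile, nprocs, program, *args):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(hydra_command(hostfile, nprocs, program, *args), check=True)
```

For example, hydra_command("mpdfile", 7, "./a.out") corresponds to
running "mpiexec.hydra -f mpdfile -n 7 ./a.out" directly, with no
mpdboot step beforehand.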
 
Rajeev


  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 7:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpdboot hanging



My problem may involve --maxbranch. I don't recall needing to set this
before to start 6 mpd procs, one local and 5 on 5 remote hosts, but now
to start more than 4 remote mpd's, which it says is the maxbranch
default, I need to set --maxbranch=5, for example.

 

Maybe I'm misremembering how mpdboot worked. It is supposed to return
after starting the mpd's, right?

 

Is setting maxbranch always required to start more than 4 remote mpd's?

 

Here is what I'm getting, where "mpdfile" contains the hostnames. The
traceback occurs after hitting ctrl-c.

 

[jason@juggling ~]$ mpdboot -f mpdfile -n 7 --verbose --maxbranch=6
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu  via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-17  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-18  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-19  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-20  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-15  via  juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason@juggling ~]$
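
The traceback shows mpdboot waiting in fd.readlines(). On a pipe, that
call does not return until the writing side closes its end, so a remote
shell that stays open without producing more output can leave the caller
blocked. The sketch below is not a patch to mpdboot. It only illustrates,
in Python 3 on a POSIX system, how a read can give up after a timeout
instead of waiting for end-of-file. The function name
read_available_lines and the timeout value are assumptions made for the
example.

```python
import os
import select


def read_available_lines(fd, timeout):
    """Collect lines from file descriptor `fd`.

    Stops at end-of-file, or when `timeout` seconds pass with no new data.
    Returns (lines, hit_eof).
    """
    chunks = []
    hit_eof = False
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:  # nothing arrived in time: stop instead of blocking
            break
        data = os.read(fd, 4096)
        if not data:  # an empty read means the writer closed its end
            hit_eof = True
            break
        chunks.append(data)
    text = b"".join(chunks).decode(errors="replace")
    return text.splitlines(), hit_eof
```

A result with hit_eof set to False means the other side was still
holding the pipe open when the timeout expired, which is the situation
in which a plain readlines() would keep waiting.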

 

Thanks,

Jason

 

 

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 3:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdboot hanging

 

Hi,

This is probably something simple, but when I run mpdboot with a file
containing node names, mpd is started on all the nodes but the last
one in the list (in the mpd.hosts file), and mpdboot hangs without
returning. If I hit ctrl-C it breaks, saying it was in a function "handle
shells that echo", with the mpd's that were started still up.

 

I ran mpdboot successfully before as I recall, with no hanging, and all
the mpd's requested being started on all the nodes in the file, so it
seems like something simple has changed to cause this issue.

 

Any help greatly appreciated.

 

Thanks,

Jason
