[mpich-discuss] mpdboot hanging

Jason Palmer jason at sccn.ucsd.edu
Mon Feb 15 01:58:02 CST 2010


I tried shuffling the hosts; it hangs on the last host in the list (see
output below). I realized that it was working before because I was actually
calling the mpich2 installed in /opt/mpich2, which was built with an older
gcc that does not support OpenMP. That mpdboot works fine, but the one I
installed to use gcc-4.4.3 hangs on the last node, as seen below. The mpiCC,
etc. that I built work OK, so I guess I could use the older mpdboot to launch
the mpd's and use the mpiCC etc. that I built to compile. It would be nice to
know what the difference between the two mpdboots is, though. The compilation
options reported by mpich2version are the same for both (I recompiled the one
I built several times).
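Because two MPICH2 installations are on the machine, it may help to confirm which copy of each MPD tool the shell actually runs. A minimal Python 3 sketch follows; the helper name find_all_on_path is hypothetical and not part of MPICH2. It lists every executable with a given name on PATH, in search order, so the first entry is the one a bare `mpdboot` would invoke.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable file called `name` on PATH, in search order.

    The first entry is the one a shell would run for a bare `name`.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:  # skip directories listed twice
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("mpdboot", "mpd", "mpiexec", "mpicc"):
        print(tool + ":", find_all_on_path(tool) or "not found")
```

Running it on the head node and on a compute node would show whether /opt/mpich2 or /home/jason/mpich2-1.2.1-install comes first on each.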

 

[jason@juggling ~]$ cat mpdfile2
compute-0-20
compute-0-16
compute-0-17
compute-0-18
compute-0-19
[jason@juggling ~]$ mpdboot -f mpdfile2 -n 6 --verbose
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu  via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-20  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-16  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-17  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-18  via  juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason@juggling ~]$ mpdtrace
juggling
compute-0-18
compute-0-17
compute-0-16
compute-0-20
[jason@juggling ~]$

 

Thanks,

Jason

 

 

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
Sent: Sunday, February 14, 2010 5:24 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpdboot hanging

 

It shouldn't need --maxbranch. Try shuffling the hosts in the hostfile and
see if the problem persists with the same host. In that case, there may be
something wrong with the networking configuration for that host.
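One way to make the shuffling test systematic is to generate one rotation of the hostfile per host, so that every host takes the last position exactly once, and then run mpdboot with each file by hand. If the hang always follows the same host, that host's networking setup is the likely suspect; if it always hits whichever host is last, the problem more likely lies elsewhere. A minimal Python 3 sketch, assuming a plain hostfile with one name per line (the .rotN output names are made up for this example):

```python
import sys


def read_hosts(path):
    """Read a hostfile: one host per line, blank lines and # comments skipped."""
    hosts = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if line:
                hosts.append(line)
    return hosts


def rotations(hosts):
    """Return every rotation of `hosts`; each host ends up last exactly once."""
    return [hosts[i + 1:] + hosts[:i + 1] for i in range(len(hosts))]


if __name__ == "__main__":
    for src in sys.argv[1:]:  # e.g. mpdfile2
        for i, order in enumerate(rotations(read_hosts(src))):
            out = "%s.rot%d" % (src, i)  # hypothetical naming scheme
            with open(out, "w") as f:
                f.write("\n".join(order) + "\n")
            print("%s: last host is %s" % (out, order[-1]))
```

With the five-host mpdfile2 this writes mpdfile2.rot0 through mpdfile2.rot4, each of which can be tried with `mpdboot -f mpdfile2.rotN -n 6 --verbose`.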

 

Or try using the Hydra process manager, which doesn't require setting up
MPDs. You can use mpiexec.hydra.

 

Rajeev

 

  _____  

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 7:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpdboot hanging

My problem may involve --maxbranch. I don't recall needing to set it before
to start 6 mpd processes (one local and one on each of 5 remote hosts), but
now, to start more than 4 remote mpd's (4 being the maxbranch default,
according to mpdboot), I need to set something like --maxbranch=5.

 

Maybe I'm misremembering how mpdboot worked. It is supposed to return after
starting the mpd's, right?

 

Is setting maxbranch always required to start more than 4 remote mpd's?

 

Here is what I'm getting, where "mpdfile" contains the hostnames. The
traceback occurs after hitting Ctrl-C.

 

[jason@juggling ~]$ mpdboot -f mpdfile -n 7 --verbose --maxbranch=6
running mpdallexit on juggling.ucsd.edu
LAUNCHED mpd on juggling.ucsd.edu  via
RUNNING: mpd on juggling.ucsd.edu
LAUNCHED mpd on compute-0-16  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-17  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-18  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-19  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-20  via  juggling.ucsd.edu
LAUNCHED mpd on compute-0-15  via  juggling.ucsd.edu
Traceback (most recent call last):
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/home/jason/mpich2-1.2.1-install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
[jason@juggling ~]$

 

Thanks,

Jason

 

 

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jason Palmer
Sent: Friday, February 12, 2010 3:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdboot hanging

 

Hi,

This is probably something simple, but when I run mpdboot with a file
containing node names (the mpd.hosts file), mpd is started on all the nodes
except the last one in the list, and mpdboot hangs without returning. If I
hit Ctrl-C, it breaks out, saying it was in a function that "handles output
from shells that echo", and the mpd's that were started are still up.

 

As I recall, I ran mpdboot successfully before, with no hanging and with all
the requested mpd's started on all the nodes in the file, so it seems like
something simple has changed to cause this issue.

 

Any help greatly appreciated.

 

Thanks,

Jason


