[mpich-discuss] mpdboot and hostsfile

Kenin Coloma keninc at gmail.com
Tue Dec 1 17:33:58 CST 2009


After upgrading from mpich2-1.1.1 to mpich2-1.2.1, mpdboot stopped working
with a fairly simple host file.

(on compute06)
mpdboot --totalnum=6 --ncpus=0

host file:
compute07
compute08
compute09
compute10
compute11
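
As a sanity check on the numbers above, the host file and --totalnum can be
compared with a short script. This is only a sketch and not part of MPICH2:
the function names are hypothetical, and it assumes the usual mpd host-file
layout (one host per line, an optional ":ncpus" suffix, "#" comments) and that
--totalnum counts the local mpd plus one mpd per listed host.

```python
def read_hosts(text):
    """Return the host names listed in an mpd-style host file (assumed format)."""
    hosts = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the host name: first token, without any ":ncpus" suffix.
        hosts.append(line.split()[0].split(":")[0])
    return hosts


def max_totalnum(hosts):
    """Largest --totalnum the file supports, assuming the local mpd counts as one."""
    return len(hosts) + 1
```

With the five hosts listed above, max_totalnum gives 6, so --totalnum=6 is
consistent with the host file under these assumptions.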

mpdboot hangs after launching mpd on compute10; compute11 is never launched.
The traceback below comes from interrupting the hung process with Ctrl-C.

[kcoloma at compute06 ~]$ /rd_personalization08/kcoloma/mpich_install/bin/mpdboot \
    --totalnum=6 --ncpus=0 --file=/home/kcoloma/mpiHosts.txt \
    --mpd=/rd_personalization08/kcoloma/mpich_install/bin/mpd --verbose
running mpdallexit on compute06
LAUNCHED mpd on compute06  via
RUNNING: mpd on compute06
LAUNCHED mpd on compute07  via  compute06
LAUNCHED mpd on compute08  via  compute06
LAUNCHED mpd on compute09  via  compute06
LAUNCHED mpd on compute10  via  compute06
Traceback (most recent call last):
  File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 476, in ?
    mpdboot()
  File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 347, in mpdboot
    handle_mpd_output(fd,fd2idx,hostsAndInfo)
  File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 385, in handle_mpd_output
    for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt

The hang occurs whenever --totalnum is greater than 1.
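
The traceback shows the process sitting in fd.readlines(), which on a pipe
does not return until the writer closes its end. Whether that is the actual
cause here is an assumption; nothing in the output confirms it. As an
illustration only (this is not the mpdboot code, and read_available is a
hypothetical helper), a read guarded by select returns after a timeout instead
of blocking indefinitely. It relies on select working on pipe descriptors, so
it is limited to POSIX systems.

```python
import os
import select


def read_available(fd, timeout):
    """Read what is available on fd, or return None if nothing arrives in time.

    Returns b"" once the writing end has been closed (end of file).
    """
    # Wait up to `timeout` seconds for fd to become readable.
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None  # timed out: the writer is silent but still open
    return os.read(fd, 4096)
```

A caller can treat None as "the remote mpd has not answered yet" and report
which host it is waiting on, rather than hanging silently.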

The mpdboot.py scripts are identical between the two versions of mpich2, but
the mpd.py scripts changed to address ticket #905. Rolling back to the
mpich2-1.1.1p1 mpd.py fixes the mpdboot issue I'm having.
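
To see exactly what changed between the two mpd.py versions, a unified diff
is enough. The sketch below uses Python's standard difflib; the function name
and the default file labels are hypothetical, and the caller is expected to
read the two files from wherever the two installs live.

```python
import difflib


def diff_text(old_text, new_text, old_name="old/mpd.py", new_name="new/mpd.py"):
    """Return the unified diff between two versions of a file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )
```

An empty list means the two files are identical, which is a quick way to
confirm that mpdboot.py really is unchanged between the releases.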