[mpich-discuss] mpdboot and hostsfile
Dave Goodell
goodell at mcs.anl.gov
Fri Dec 4 14:42:40 CST 2009
(reposting this on this thread in case you didn't see the other thread)
---------8<--------
This has been fixed in the trunk. Anyone who needs a fix in the short
term should be able to download the following copy of mpd.py and drop
it into src/pm/mpd/ in their MPICH2 source tree (and then re-install
MPICH2):
https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
---------8<--------
By the way, I tested the --ncpus=0 behavior and found that it doesn't
do what you would expect. It is treated as if you had passed
--ncpus=1 on the mpdboot host. If you want a "head node" with mpd, you
are probably better off dropping the "--ncpus=0" argument to mpdboot
and passing "-1" to your mpiexec commands so that the first process is
not launched on the head node. Or use hydra, which has no trouble
launching processes on exactly the hosts that you specify.
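
For what it's worth, here is a rough sketch of the mpd workaround as a
small Python 3 helper that only builds the two command lines (it doesn't
run anything). The function name, its parameters, the process count, and
./a.out are made up for illustration; the only flags used are --totalnum
and --file from the original report, mpiexec's usual -n, and the "-1"
mentioned above.

```python
import shlex


def head_node_commands(hosts_file, num_hosts, num_procs, program):
    """Build (not run) mpdboot and mpiexec command lines for a head-node setup.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Boot the ring without --ncpus=0, since that is treated like --ncpus=1.
    mpdboot = ["mpdboot", f"--totalnum={num_hosts}", f"--file={hosts_file}"]
    # "-1" asks mpiexec not to start the first process on the local (head) node.
    mpiexec = ["mpiexec", "-1", "-n", str(num_procs), program]
    return mpdboot, mpiexec


# Example values: 6 hosts in total (head node plus 5 in the hosts file),
# and a placeholder program name.
boot, run = head_node_commands("/home/kcoloma/mpiHosts.txt", 6, 5, "./a.out")
print(shlex.join(boot))
print(shlex.join(run))
```

Keeping the commands as argument lists makes it easy to hand them to
subprocess.run() later without worrying about shell quoting.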
-Dave
On Dec 2, 2009, at 3:46 PM, Kenin Coloma wrote:
> Thanks, Dave!
>
> The idea was that we wanted to run the mpds under root so that
> anyone could use them and have a "job submission" node. We haven't
> gotten to the point where we needed/wanted to set up real resource
> management/schedulers etc. - but hopefully we will!
>
> -kenin
>
> On Wed, Dec 2, 2009 at 12:39 PM, Dave Goodell <goodell at mcs.anl.gov>
> wrote:
> Hmm... I'm slightly surprised that "--ncpus=0" ever worked.
> Glancing at the code right now, there's nothing that I see that
> specifically would cause a problem, but it's likely that's a broken
> corner case. Skimming the code I bet that it will even accept a
> negative ncpus argument, which clearly doesn't make any sense.
>
> Also, it seems strange that this would fail with the fairly minor
> modifications that are present in the 1.2.1 mpd.
>
> It sounds like you have a reasonable workaround for this right now,
> so I've filed this as a ticket to fix later: https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>
> Another alternative if you don't need dynamic process support is to
> use the hydra process manager: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>
> -Dave
>
>
> On Dec 1, 2009, at 5:33 PM, Kenin Coloma wrote:
>
> In mpich2-1.2.1, mpdboot stopped working (after upgrading from
> mpich2-1.1.1) with a fairly simple host file:
>
> (on compute06)
> mpdboot --totalnum=6 --ncpus=0
>
> host file:
> compute07
> compute08
> compute09
> compute10
> compute11
>
> mpdboot will hang after trying to launch mpd on compute10
>
> [kcoloma at compute06 ~]$ /rd_personalization08/kcoloma/mpich_install/bin/mpdboot --totalnum=6 --ncpus=0 --file=/home/kcoloma/mpiHosts.txt --mpd=/rd_personalization08/kcoloma/mpich_install/bin/mpd --verbose
> running mpdallexit on compute06
> LAUNCHED mpd on compute06 via
> RUNNING: mpd on compute06
> LAUNCHED mpd on compute07 via compute06
> LAUNCHED mpd on compute08 via compute06
> LAUNCHED mpd on compute09 via compute06
> LAUNCHED mpd on compute10 via compute06
> Traceback (most recent call last):
>   File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 476, in ?
>     mpdboot()
>   File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 347, in mpdboot
>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>   File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 385, in handle_mpd_output
>     for line in fd.readlines(): # handle output from shells that echo stuff
> KeyboardInterrupt
>
> It will hang as long as --totalnum > 1.
>
> The mpdboot.py scripts are the same between the two versions of mpich,
> but the mpd.py scripts changed to address ticket #905. I've found
> that rolling back to the mpich2-1.1.1p1 mpd.py fixes the mpdboot
> issue I'm having.
>
> _______________________________________________
> mpich-discuss mailing list
>
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss