Thanks, Dave!<br><br>The idea was to run the mpds as root so that anyone could use them and we would have a "job submission" node. We haven't yet reached the point where we need or want to set up real resource management or schedulers, but hopefully we will!<br>
<br>-kenin<br><br><div class="gmail_quote">On Wed, Dec 2, 2009 at 12:39 PM, Dave Goodell <span dir="ltr"><<a href="mailto:goodell@mcs.anl.gov">goodell@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hmm... I'm slightly surprised that "--ncpus=0" ever worked. Glancing at the code right now, I don't see anything that would specifically cause a problem, but it's likely a broken corner case. Skimming the code, I bet that it will even accept a negative ncpus argument, which clearly doesn't make any sense.<br>
<br>
Also, it seems strange that this would fail with the fairly minor modifications that are present in the 1.2.1 mpd.<br>
<br>
It sounds like you have a reasonable workaround for this right now, so I've filed this as a ticket to fix later: <a href="https://trac.mcs.anl.gov/projects/mpich2/ticket/963" target="_blank">https://trac.mcs.anl.gov/projects/mpich2/ticket/963</a><br>
<br>
Another alternative if you don't need dynamic process support is to use the hydra process manager: <a href="http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager" target="_blank">http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager</a><br>
<br>
-Dave<div><div></div><div class="h5"><br>
<br>
On Dec 1, 2009, at 5:33 PM, Kenin Coloma wrote:<br>
<br>
</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">
After upgrading from mpich2-1.1.1 to mpich2-1.2.1, mpdboot stopped working with a fairly simple host file.<br>
<br>
(on compute06)<br>
mpdboot --totalnum=6 --ncpus=0<br>
<br>
host file:<br>
compute07<br>
compute08<br>
compute09<br>
compute10<br>
compute11<br>
<br>
mpdboot will hang after trying to launch mpd on compute10<br>
<br>
[kcoloma@compute06 ~]$ /rd_personalization08/kcoloma/mpich_install/bin/mpdboot --totalnum=6 --ncpus=0 --file=/home/kcoloma/mpiHosts.txt --mpd=/rd_personalization08/kcoloma/mpich_install/bin/mpd --verbose<br>
running mpdallexit on compute06<br>
LAUNCHED mpd on compute06 via<br>
RUNNING: mpd on compute06<br>
LAUNCHED mpd on compute07 via compute06<br>
LAUNCHED mpd on compute08 via compute06<br>
LAUNCHED mpd on compute09 via compute06<br>
LAUNCHED mpd on compute10 via compute06<br>
Traceback (most recent call last):<br>
File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 476, in ?<br>
mpdboot()<br>
File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 347, in mpdboot<br>
handle_mpd_output(fd,fd2idx,hostsAndInfo)<br>
File "/rd_personalization08/kcoloma/mpich_install/bin/mpdboot", line 385, in handle_mpd_output<br>
for line in fd.readlines(): # handle output from shells that echo stuff<br>
KeyboardInterrupt<br>
<br>
It will hang as long as --totalnum > 1.<br>
<br>
The mpdboot.py scripts are the same between the two versions of mpich, but the mpd.py scripts changed to address ticket #905. I've found that rolling back to the mpich2-1.1.1p1 mpd.py fixes the mpdboot issue I'm having.<br>
<br></div></div>
_______________________________________________<br>
mpich-discuss mailing list<div class="im"><br>
<a href="mailto:mpich-discuss@mcs.anl.gov" target="_blank">mpich-discuss@mcs.anl.gov</a><br>
</div><a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
</blockquote>
<br>
</blockquote></div><br>
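As an illustration of Dave's remark that mpdboot appears to accept zero or negative --ncpus values, here is a minimal sketch of up-front argument validation. It is not mpdboot's actual code: the names positive_int and parse_boot_args and the program name bootcheck are hypothetical, and only the --totalnum and --ncpus option names come from the thread. Whether --ncpus=0 should be rejected or given a defined meaning is a decision for the MPICH developers; the sketch simply assumes that values below 1 are invalid.

```python
import argparse


def positive_int(text):
    """Convert text to an int and reject anything below 1 (assumed policy)."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError("expected an integer, got %r" % text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be >= 1, got %d" % value)
    return value


def parse_boot_args(argv):
    """Parse a hypothetical subset of mpdboot-style options."""
    parser = argparse.ArgumentParser(prog="bootcheck")
    # Both counts go through the same validator, so bad values are
    # rejected before any daemon would be launched.
    parser.add_argument("--totalnum", type=positive_int, default=1)
    parser.add_argument("--ncpus", type=positive_int, default=1)
    return parser.parse_args(argv)
```

With this sketch, parse_boot_args(["--totalnum=6", "--ncpus=0"]) exits with a usage error instead of proceeding, which would surface the problem before any mpd is started.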