[mpich2-dev] Hydra Process Manager
Pavan Balaji
balaji at mcs.anl.gov
Thu Apr 26 17:40:03 CDT 2012
Hi Rayson,
Sorry about the delay in responding. Comments inline.
On 04/23/2012 03:21 PM, Rayson Ho wrote:
> 1) I could not find the design doc which is supposed to be available at:
>
> https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/branches/dev/hydra/doc/hydra
>
> Can I get access to the design doc??
You can find the overall design here:
http://wiki.mcs.anl.gov/mpich2/index.php/Hydra_Process_Management_Framework
We are working on a paper that has more details. Hopefully, that'll be
ready in a few weeks.
> 2) In tools/bootstrap/external/pbs_wait.c (yes, we mainly support Open
> Grid Scheduler/Grid Engine, but we also have sites running other batch
> systems and thus we infrequently support Torque & SLURM
> users):
>
> while (events_count) {
> err = tm_poll(TM_NULL_EVENT, &e, 0, &poll_err);
>
> ...
> }
>
> Seems like a busy waiting loop, is there a reason to not pass wait=1
> to tm_poll()??
We were missing the ability to time out in tm_poll(), so we had to
implement it ourselves with a busy loop. Note that this busy loop is
called only at the very end, after all the MPI processes have terminated.
It only waits for the proxies to terminate, which should be very
quick.
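To illustrate the idea, here is a minimal sketch of a non-blocking poll wrapped in a loop that gives up after a deadline. It is not the actual Hydra code: wait_with_timeout(), poll_fn_t and fake_poll() are hypothetical names, and the callback only stands in for a tm_poll() call made with wait=0 as in the snippet you quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical non-blocking poll callback (stand-in for tm_poll with wait=0):
 * returns 1 if an event completed, 0 if nothing is ready yet. */
typedef int (*poll_fn_t)(void *ctx);

/* Spin on poll_fn until events_count events have completed or timeout_sec
 * seconds have elapsed. Returns 0 on success, -1 on timeout. */
int wait_with_timeout(poll_fn_t poll_fn, void *ctx, int events_count,
                      double timeout_sec)
{
    assert(poll_fn != NULL);
    time_t start = time(NULL);

    while (events_count > 0) {
        if (poll_fn(ctx)) {
            events_count--;            /* one pending event finished */
        } else if (difftime(time(NULL), start) >= timeout_sec) {
            return -1;                 /* deadline passed, stop waiting */
        }
    }
    return 0;
}

/* Demo event source (assumption, for illustration only): reports one
 * completion per call until the counter in ctx reaches zero. */
static int fake_poll(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

A blocking poll would avoid the spinning, but then the caller could not bail out if a proxy never reports back, which is the trade-off described above.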
-- Pavan
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji