[mpich2-dev] Hydra Process Manager

Pavan Balaji balaji at mcs.anl.gov
Thu Apr 26 17:40:03 CDT 2012


Hi Rayson,

Sorry about the delay in responding.  Comments inline.

On 04/23/2012 03:21 PM, Rayson Ho wrote:
> 1) I could not find the design doc which is supposed to be available at:
>
> https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/branches/dev/hydra/doc/hydra
>
> Can I get access to the design doc??

You can find the overall design here:

http://wiki.mcs.anl.gov/mpich2/index.php/Hydra_Process_Management_Framework

We are working on a paper that has more details.  Hopefully, that'll be 
ready in a few weeks.

> 2) In tools/bootstrap/external/pbs_wait.c (yes, we mainly support Open
> Grid Scheduler/Grid Engine, but we also have sites running other batch
> systems and thus we infrequently support Torque & SLURM
> users):
>
>      while (events_count) {
>          err = tm_poll(TM_NULL_EVENT, &e, 0, &poll_err);
>
>          ...
>      }
>
> Seems like a busy waiting loop, is there a reason to not pass wait=1
> to tm_poll()??

tm_poll() gave us no way to time out, so we had to implement the
timeout ourselves with a busy loop.  Note that this busy loop runs at
the very end, after all the MPI processes have terminated.  It only
waits for the proxies to terminate, which should be very quick.
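
To illustrate the pattern, here is a minimal sketch of a polling loop
with a hand-rolled timeout.  This is not the actual pbs_wait.c code:
poll_once_fn, wait_for_events and fake_poll are hypothetical names, and
the callback merely stands in for a non-blocking tm_poll() call
(wait=0).  The one-second clock granularity is also just an assumption
made to keep the sketch in standard C.

```c
#include <time.h>

/* Hypothetical stand-in for one non-blocking poll (e.g. tm_poll() with
 * wait=0): returns 1 if an event completed, 0 if nothing was ready,
 * and -1 on error. */
typedef int (*poll_once_fn)(void *ctx);

/* Busy-poll until events_count events have completed or timeout_sec
 * seconds have elapsed.  Returns 0 when all events completed, 1 on
 * timeout, and -1 if the poll callback reported an error. */
int wait_for_events(poll_once_fn poll_once, void *ctx,
                    int events_count, double timeout_sec)
{
    time_t start = time(NULL);

    while (events_count > 0) {
        int rc = poll_once(ctx);

        if (rc < 0)
            return -1;          /* poll failed */
        if (rc > 0)
            events_count--;     /* one more event finished */
        else if (difftime(time(NULL), start) >= timeout_sec)
            return 1;           /* nothing ready and out of time */
    }

    return 0;
}

/* Example stub: reports a completed event on every third call and
 * counts how often it was invoked. */
int fake_poll(void *ctx)
{
    int *calls = (int *) ctx;

    (*calls)++;
    return (*calls % 3 == 0) ? 1 : 0;
}
```

The deadline is only checked when a poll comes back empty, so a steady
stream of completions is never cut short.  The loop does spin on the
CPU while it waits, which is acceptable only because the wait is
expected to be brief.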

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

