[mpich-discuss] MPI_Comm_spawn(), dynamic distribution and hydra

Pavan Balaji balaji at mcs.anl.gov
Wed Mar 2 14:28:59 CST 2011


Hi Roberto,

Hydra doesn't currently have the capability to request more 
resources from a resource manager. It can only query the resource 
manager for the resources that have already been allocated.

But if you ask Hydra to spawn a process on a particular node (using the 
info argument), it will do that.
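
A minimal sketch of what that could look like, assuming the reserved 
"host" info key of MPI_Comm_spawn() is what selects the node. The helper 
pick_host(), the USE_MPI macro, the node names and the "./worker" 
executable are all hypothetical and only there for illustration; the 
host list would normally come from the existing allocation.

```c
#include <stdio.h>
#include <string.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Hypothetical helper: copy the index-th entry (0-based) of a
 * comma-separated host list into out. Returns 0 on success, -1 if the
 * index is out of range, the entry is empty, or out is too small. */
int pick_host(const char *hosts, int index, char *out, size_t outlen)
{
    const char *start = hosts;
    size_t len;

    while (index > 0) {
        start = strchr(start, ',');
        if (start == NULL)
            return -1;
        start++;                 /* skip the comma */
        index--;
    }
    len = strcspn(start, ",");   /* length up to the next comma or end */
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

#ifdef USE_MPI
int main(int argc, char **argv)
{
    char host[256];
    int errcodes[2];
    MPI_Info info;
    MPI_Comm children;

    MPI_Init(&argc, &argv);

    /* Assumed list of nodes that are already part of the allocation. */
    if (pick_host("node01,node02,node03", 1, host, sizeof host) != 0)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* "host" is a reserved info key for MPI_Comm_spawn(). */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", host);

    /* Spawn two copies of the (hypothetical) worker on that node. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &children, errcodes);
    MPI_Info_free(&info);

    /* The worker is expected to disconnect from its parent as well. */
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
#endif
```

The MPI part is only compiled when USE_MPI is defined (for example 
with mpicc -DUSE_MPI), so the host-selection helper can be tried on 
its own. Reserving additional nodes from the resource manager would 
still have to happen outside Hydra.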

  -- Pavan

On 03/02/2011 05:55 AM, Roberto Fichera wrote:
> Hi All,
>
> I wrote a parallelization library on top of MPICH2 that basically
> distributes jobs dynamically among all the assigned nodes. The
> library makes heavy use of MPI_Comm_spawn() with an MPICH2 build
> configured for MPI_THREAD_MULTIPLE. This works quite well, and we can easily
> use different algorithms for the spawned jobs once the data has been decomposed.
> Furthermore, the library lets the user assign more than one node to any given job.
> This ends up "electing" the given slave node to become the master of a node subset
> (submastering), which performs a finer-grained parallel distribution according to
> the user's algorithm. So a typical distribution scenario looks like a tree where
> the nodes are submasters. We have discovered that, for certain algorithms, we might
> need to decide dynamically how many nodes to dedicate to any given submastering
> parallel computation. So I started to investigate whether it would be possible to
> interface the library we use with the cluster resource manager.
>
> My idea is that, under certain conditions, before calling MPI_Comm_spawn() to spawn
> a job on a well-defined hostname, I would like to reserve, via the resource manager, the
> number of nodes requested by the given submastering job. My first thought was to write
> an interface to PBS (our cluster manager/scheduler) using its libpbs, basically
> implementing something like a qsub/mpiexec function call. However, I came to the
> conclusion that, for implementing the dynamic bootstrap process on the newly allocated
> nodes, Hydra would be much more flexible, since it seems to provide a more
> abstract interface for this kind of integration.
>
> So my question is: does anyone know a better way to do this? Could this approach
> cause problems that I am not seeing?
>
> Any suggestions would be greatly appreciated.
>
> Thanks in advance.
> Roberto Fichera.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

