[mpich-discuss] communicator creation/deletion semantics and performance

Tue Oct 30 12:48:05 CDT 2012

If you want to dynamically generate communicators on a group without
the operation being collective on the parent communicator (which might be
MPI_COMM_WORLD in the worst case), see
http://www.mcs.anl.gov/publications/paper_detail.php?id=1695.  I
suspect this will be useful for doing the dynamic unions you describe
below.  This operation is part of MPI-3 (see MPI_Comm_create_group)
and was implemented in MPICH2 a while ago.  I don't know whether the
optimized (i.e. internal) implementation works on BGQ, but I suspect it
can be done relatively easily in the event that the on-top-of-MPI
implementation isn't fast enough.  The on-top-of-MPI version is quite
fast for small groups (faster than MPI_Comm_create), at least on BGP;
see the paper for details.
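
As a rough illustration of the dynamic unions discussed here, the
following C sketch computes the world ranks that belong to a union of
torus dimensions and, when compiled against an MPI-3 library with
-DUSE_MPI, builds a communicator over just those ranks with
MPI_Comm_create_group.  The names union_ranks, make_union_comm, MAX_DIMS
and USE_MPI are hypothetical, and the row-major rank ordering is an
assumption; the real rank-to-coordinate mapping on BG/Q depends on how
the job was mapped.

```c
#include <assert.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

#define MAX_DIMS 8 /* hypothetical upper bound on the number of dimensions */

/*
 * Fill ranks[] with the world ranks of every process that shares with
 * my_rank the coordinates of all dimensions NOT selected in in_union[].
 * ASSUMPTION: ranks are laid out row-major (last dimension fastest).
 * Returns the number of ranks written; they come out in increasing order.
 */
int union_ranks(int ndims, const int dims[], const int in_union[],
                int my_rank, int ranks[])
{
    int coord[MAX_DIMS], stride[MAX_DIMS];
    int s = 1, base = 0, count = 1;

    assert(ndims > 0 && ndims <= MAX_DIMS);

    /* Decode my_rank into per-dimension coordinates and strides. */
    for (int d = ndims - 1; d >= 0; d--) {
        stride[d] = s;
        coord[d] = (my_rank / s) % dims[d];
        s *= dims[d];
    }

    /* Fixed dimensions contribute a constant offset; the selected
     * dimensions determine how many ranks are in the union. */
    for (int d = 0; d < ndims; d++) {
        if (in_union[d])
            count *= dims[d];
        else
            base += coord[d] * stride[d];
    }

    /* Enumerate every coordinate combination of the selected dimensions. */
    for (int i = 0; i < count; i++) {
        int rem = i, r = base;
        for (int d = ndims - 1; d >= 0; d--) {
            if (!in_union[d])
                continue;
            r += (rem % dims[d]) * stride[d];
            rem /= dims[d];
        }
        ranks[i] = r;
    }
    return count;
}

#ifdef USE_MPI
/*
 * Build a communicator over the n listed world ranks.  The call to
 * MPI_Comm_create_group is collective only over the members of the
 * group, not over MPI_COMM_WORLD.  The tag only needs to be a valid
 * tag that does not clash with a concurrent call on the same parent.
 */
MPI_Comm make_union_comm(int n, const int ranks[], int tag)
{
    MPI_Group world_group, union_group;
    MPI_Comm union_comm;

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, n, ranks, &union_group);
    MPI_Comm_create_group(MPI_COMM_WORLD, union_group, tag, &union_comm);
    MPI_Group_free(&union_group);
    MPI_Group_free(&world_group);
    return union_comm; /* caller releases it with MPI_Comm_free */
}
#endif
```

Only the processes listed in ranks[] need to call make_union_comm, and
the resulting communicator can be released with MPI_Comm_free once the
iteration that needed it is done.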

I believe that MPI_Comm_split does communication, potentially
expensive communication (allgather? or maybe that was the optimized
version...).  At some point, I recall Bill Gropp and coworkers doing
some work to optimize it because of a problem we observed at scale on
BGP.  I don't know the status of that and whether or not it is part of
MPICH2 yet.  As for MPI_Comm_free, I suspect it is quite cheap and
does at most a barrier, but I am speculating.

I can hack an optimized version of MPI_Comm_create_group into BGQ-MPI
using PAMI if IBM doesn't do it first.
PAMI_Geometry_create_endpointlist is collective only over the output
geometry, which matches the semantics of MPI_Comm_create_group.

Is this at all helpful?

Jeff

On Tue, Oct 30, 2012 at 11:33 AM, Edgar Solomonik
<solomon at eecs.berkeley.edu> wrote:
> Hello,
>
> For my application, I need to maintain or dynamically create a large number
> of communicators.  My current solution has been to initialize a large number
> of communicators at start-up and make dynamic decisions on which to use
> later.  I have run into MPI errors due to creating too many communicators on
> some occasions, but have so far been able to resolve this by limiting the
> set.
>
> However, I am now interested in employing an even larger set of
> communicators, that is harder to generate completely.  So, I would like to
> move to an approach which dynamically creates and frees communicators on
> demand.  I am concerned about two issues:
>
> 1. Is there an overhead to MPI_Comm_split and MPI_Comm_free, for instance do
> they need to perform inter-process communication?
> 2. Does the limit on the number of communicators bound the number of
> communicators ever created or the number of live (non-freed) communicators?
>
> My specific use-case is merging sets of communicators in dynamic ways.  e.g.
> on BG/Q I form up 6 communicators for each dimension (+1 for intra-node) and
> then make dynamic mapping decisions which select unions of the communicators
> to map to.  So, I either need to construct a fairly complicated tree
> data-structure to keep up with all possible unions of communicators or I can
> simply create the unions on demand and free them once I am done using them
> after a given iteration.  So far, I have used only contiguous unions of
> communicators, which is a smaller set and easier to keep track of in a flat
> data-structure, but I want even more generality now.
>
> Thanks,
> Edgar
>

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond