[mpich-discuss] communicator creation/deletion semantics and performance

Jim Dinan dinan at mcs.anl.gov
Tue Oct 30 13:14:59 CDT 2012


The paper that Jeff cited presents an implementation of 
MPI_Comm_create_group that is built on top of MPI-2 and uses recursive 
intercommunicator merging:

https://www.mcs.anl.gov/publications/paper_detail.php?id=1695

At this year's EuroMPI, we presented a paper on the native MPI-3 
implementation of MPI_Comm_create_group and compared its cost with 
MPI_Comm_create and the technique presented in the above paper at 
EuroMPI '11.  The general takeaway is that MPI_Comm_create_group should 
always be the cheaper option:

https://www.mcs.anl.gov/publications/paper_detail.php?id=2061
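
As a rough illustration of how MPI_Comm_create_group might be applied to
the use case quoted below, here is a minimal sketch of the one step that
needs no communication: each member computes the list of ranks in the new
group locally.  It assumes the processes form a logical grid with ranks
numbered in row-major order, and that a "union" of per-dimension
communicators means the sub-grid spanned by the selected dimensions.  The
function name subgrid_ranks and its parameters are hypothetical; they are
not taken from either paper or from MPICH.

```c
#include <stddef.h>

/*
 * Hypothetical helper: list the ranks of the sub-grid that contains the
 * process at `coords` and spans the dimensions flagged in `keep`.
 *
 * Assumption: ranks are numbered in row-major order over a logical grid
 * with extents `dims` (last dimension varies fastest).
 *
 * Writes the ranks to `out` in increasing order and returns their count.
 * `out` must have room for the product of the kept extents.
 */
int subgrid_ranks(int ndims, const int *dims, const int *coords,
                  const int *keep, int *out)
{
    /* Number of members = product of the extents of the kept dimensions. */
    int count = 1;
    for (int d = 0; d < ndims; d++) {
        if (keep[d]) {
            count *= dims[d];
        }
    }

    for (int i = 0; i < count; i++) {
        int rem = i;     /* index within the sub-grid, decoded below */
        int rank = 0;
        int stride = 1;  /* row-major stride of dimension d */

        /* Walk from the fastest-varying dimension to the slowest. */
        for (int d = ndims - 1; d >= 0; d--) {
            int c;
            if (keep[d]) {
                c = rem % dims[d];   /* coordinate varies along kept dims */
                rem /= dims[d];
            } else {
                c = coords[d];       /* fixed to the caller's coordinate */
            }
            rank += c * stride;
            stride *= dims[d];
        }
        out[i] = rank;
    }
    return count;
}
```

The resulting list could then be passed to MPI_Group_incl on the parent
communicator's group, and that group to MPI_Comm_create_group, which only
the listed ranks need to call; the communicator would be released with
MPI_Comm_free once the iteration is done.  On a real machine the mapping
from ranks to coordinates may differ from the row-major assumption used
here.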

  ~Jim.

On 10/30/12 12:48 PM, Jeff Hammond wrote:
> If you want to dynamically generate communicators on a group without
> the operation being collective on the parent communicator (which might be
> MPI_COMM_WORLD in the worst case), see
> http://www.mcs.anl.gov/publications/paper_detail.php?id=1695.  I
> suspect this will be useful for doing the dynamic unions you describe
> below.  This operation is part of MPI-3 (see MPI_Comm_create_group)
> and was implemented in MPICH2 a while ago.  I don't know whether the optimized
> (i.e. internal) implementation works on BGQ, but I suspect it can be
> done relatively easily in the event that the on-top-of-MPI
> implementation isn't fast enough (it is quite fast - faster than
> MPI_Comm_create - for small groups, at least on BGP (see paper for
> details)).
>
> I believe that MPI_Comm_split does communication, potentially
> expensive communication (allgather? or maybe that was the optimized
> version...).  At some point, I recall Bill Gropp and coworkers doing
> some work to optimize it because of a problem we observed at scale on
> BGP.  I don't know the status of that and whether or not it is part of
> MPICH2 yet.  As for MPI_Comm_free, I suspect it is quite cheap and
> does at most a barrier, but I am speculating.
>
> I can hack an optimized version of MPI_Comm_create_group into BGQ-MPI
> using PAMI if IBM doesn't do it first.
> PAMI_Geometry_create_endpointlist is collective only over the output
> geometry, which matches the semantics of MPI_Comm_create_group.
>
> Is this at all helpful?
>
> Jeff
>
> On Tue, Oct 30, 2012 at 11:33 AM, Edgar Solomonik
> <solomon at eecs.berkeley.edu> wrote:
>> Hello,
>>
>> For my application, I need to maintain or dynamically create a large number
>> of communicators.  My current solution has been to initialize a large number
>> of communicators at start-up and make dynamic decisions on which to use
>> later.  I have run into MPI errors due to creating too many communicators on
>> some occasions, but have so far been able to resolve this by limiting the
>> set.
>>
>> However, I am now interested in employing an even larger set of
>> communicators, one that is harder to generate completely.  So, I would like to
>> move to an approach which dynamically creates and frees communicators on
>> demand.  I am concerned about two issues:
>>
>> 1. Is there an overhead to MPI_Comm_split and MPI_Comm_free? For instance, do
>> they need to perform inter-process communication?
>> 2. Does the limit on the number of communicators bound the number of
>> communicators ever created or the number of live (non-freed) communicators?
>>
>> My specific use-case is merging sets of communicators in dynamic ways.  e.g.
>> on BG/Q I form up 6 communicators for each dimension (+1 for intra-node) and
>> then make dynamic mapping decisions which select unions of the communicators
>> to map to.  So, I either need to construct a fairly complicated tree
>> data-structure to keep up with all possible unions of communicators or I can
>> simply create the unions on demand and free them once I am done using them
>> after a given iteration.  So far, I have used only contiguous unions of
>> communicators, which is a smaller set and easier to keep track of in a flat
>> data-structure, but I want even more generality now.
>>
>> Thanks,
>> Edgar