[mpich-discuss] communicator creation/deletion semantics and performance

Tue Oct 30 17:42:45 CDT 2012

Thank you for the responses.

It sounds like dynamic communicator creation/deletion is good enough for my
purposes.  I looked into the papers and I don't think the advantages of
create_group are important in my scenario (global synchronization is not an
issue).  I will probably do this with comm_split, and look into the MPI-3
functionality if split becomes a bottleneck.
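
A minimal sketch of the on-demand approach with MPI_Comm_split, for
concreteness.  It assumes (this is not established anywhere in the thread)
that every rank knows its coordinates on a processor grid and that a union
of dimensions is passed as a bitmask.  The helper names union_color,
union_key and make_union_comm are hypothetical, and the MPI part is only
compiled when USE_MPI is defined (e.g. mpicc -DUSE_MPI).

```c
#include <assert.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Color for MPI_Comm_split: a mixed-radix index over the dimensions that
 * are NOT in the union.  Ranks that agree on all of those coordinates get
 * the same color and therefore end up in the same new communicator. */
int union_color(int ndims, const int *dims, const int *coords, unsigned mask)
{
    int color = 0;
    assert(ndims > 0 && ndims <= 31);
    for (int d = 0; d < ndims; d++) {
        if (!(mask & (1u << d)))
            color = color * dims[d] + coords[d];
    }
    return color;
}

/* Key for MPI_Comm_split: a mixed-radix index over the dimensions that
 * ARE in the union.  It fixes the rank order inside the new communicator. */
int union_key(int ndims, const int *dims, const int *coords, unsigned mask)
{
    int key = 0;
    assert(ndims > 0 && ndims <= 31);
    for (int d = 0; d < ndims; d++) {
        if (mask & (1u << d))
            key = key * dims[d] + coords[d];
    }
    return key;
}

#ifdef USE_MPI
/* Collective over `parent`: every rank of `parent` must call this with the
 * same mask.  The caller releases the result with MPI_Comm_free. */
int make_union_comm(MPI_Comm parent, int ndims, const int *dims,
                    const int *coords, unsigned mask, MPI_Comm *out)
{
    return MPI_Comm_split(parent,
                          union_color(ndims, dims, coords, mask),
                          union_key(ndims, dims, coords, mask),
                          out);
}
#endif
```

Under these assumptions, each iteration would call make_union_comm with the
mask chosen by the mapping decision and call MPI_Comm_free on the result
once it is done with it.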

Thanks again,
Edgar

On Tue, Oct 30, 2012 at 11:14 AM, Jim Dinan <dinan at mcs.anl.gov> wrote:

> The paper that Jeff cited presents an implementation of
> MPI_Comm_create_group that is built on top of MPI-2 and uses recursive
> intercommunicator merging:
>
> https://www.mcs.anl.gov/publications/paper_detail.php?id=1695
>
> At this year's EuroMPI, we presented a paper on the native MPI-3
> implementation of MPI_Comm_create_group and compared its cost with
> MPI_Comm_create and the technique presented in the above paper at EuroMPI
> '11.  The general takeaway is that MPI_Comm_create_group should always be
> the cheaper option:
>
> https://www.mcs.anl.gov/publications/paper_detail.php?id=2061
>
>  ~Jim.
>
>
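
A hedged sketch of how a union could be built with the MPI-3 call discussed
above.  MPI_Comm_create_group is collective only over the members of the
group, so only those ranks call it, and they must all pass the same group.
Merging two sorted rank lists is one way to get an identical list on every
member.  The helper names merge_ranks and create_union_comm are
hypothetical, and representing each communicator as a sorted list of parent
ranks is an assumption made for illustration.  The MPI part is only
compiled when USE_MPI is defined.

```c
#include <assert.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Merge two ascending, duplicate-free lists of parent ranks into `out`
 * (capacity na + nb).  Returns the number of ranks written; ranks present
 * in both lists appear once. */
int merge_ranks(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, n = 0;
    assert(na >= 0 && nb >= 0);
    while (i < na || j < nb) {
        int next;
        if (j >= nb || (i < na && a[i] < b[j])) {
            next = a[i++];
        } else if (i >= na || b[j] < a[i]) {
            next = b[j++];
        } else {            /* same rank in both lists */
            next = a[i];
            i++;
            j++;
        }
        out[n++] = next;
    }
    return n;
}

#ifdef USE_MPI
/* Called only by the ranks listed in `ranks` (ranks are relative to
 * `parent`).  `tag` separates concurrent creations on the same parent.
 * The caller releases the result with MPI_Comm_free. */
int create_union_comm(MPI_Comm parent, const int *ranks, int n, int tag,
                      MPI_Comm *out)
{
    MPI_Group parent_group, union_group;
    int err;

    MPI_Comm_group(parent, &parent_group);
    MPI_Group_incl(parent_group, n, ranks, &union_group);
    err = MPI_Comm_create_group(parent, union_group, tag, out);
    MPI_Group_free(&union_group);
    MPI_Group_free(&parent_group);
    return err;
}
#endif
```

Ranks outside the merged list do not take part at all, which is the
difference from MPI_Comm_create and MPI_Comm_split on the parent.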
> On 10/30/12 12:48 PM, Jeff Hammond wrote:
>
>> If you want to dynamically generate communicators on a group without
>> the operation being collective on the parent communicator (which might
>> be MPI_COMM_WORLD in the worst case), see
>> http://www.mcs.anl.gov/publications/paper_detail.php?id=1695.  I
>> suspect this will be useful for doing the dynamic unions you describe
>> below.  This operation is part of MPI-3 (see MPI_Comm_create_group)
>> and was implemented in MPICH2 a while ago.  I don't know whether the
>> optimized (i.e. internal) implementation works on BGQ, but I suspect it
>> can be done relatively easily in the event that the on-top-of-MPI
>> implementation isn't fast enough (it is quite fast, faster than
>> MPI_Comm_create, for small groups, at least on BGP; see the paper for
>> details).
>>
>> I believe that MPI_Comm_split does communication, potentially
>> expensive communication (allgather? or maybe that was the optimized
>> version...).  At some point, I recall Bill Gropp and coworkers doing
>> some work to optimize it because of a problem we observed at scale on
>> BGP.  I don't know the status of that and whether or not it is part of
>> MPICH2 yet.  As for MPI_Comm_free, I suspect it is quite cheap and
>> does at most a barrier, but I am speculating.
>>
>> I can hack an optimized version of MPI_Comm_create_group into BGQ-MPI
>> using PAMI if IBM doesn't do it first.
>> PAMI_Geometry_create_endpointlist is collective only over the output
>> geometry, which matches the semantics of MPI_Comm_create_group.
>>
>> Is this at all helpful?
>>
>> Jeff
>>
>> On Tue, Oct 30, 2012 at 11:33 AM, Edgar Solomonik
>> <solomon at eecs.berkeley.edu> wrote:
>>
>>> Hello,
>>>
>>> For my application, I need to maintain or dynamically create a large
>>> number of communicators.  My current solution has been to initialize a
>>> large number of communicators at start-up and make dynamic decisions on
>>> which to use later.  I have run into MPI errors due to creating too many
>>> communicators on some occasions, but have so far been able to resolve
>>> this by limiting the set.
>>>
>>> However, I am now interested in employing an even larger set of
>>> communicators that is harder to generate completely.  So, I would like
>>> to move to an approach which dynamically creates and frees communicators
>>> on demand.  I am concerned about two issues:
>>>
>>> 1. Is there an overhead to MPI_Comm_split and MPI_Comm_free?  For
>>> instance, do they need to perform inter-process communication?
>>> 2. Does the limit on the number of communicators bound the number of
>>> communicators ever created or the number of live (non-freed)
>>> communicators?
>>>
>>> My specific use-case is merging sets of communicators in dynamic ways,
>>> e.g. on BG/Q I form up 6 communicators for each dimension (+1 for
>>> intra-node) and then make dynamic mapping decisions which select unions
>>> of the communicators to map to.  So, I either need to construct a fairly
>>> complicated tree data-structure to keep up with all possible unions of
>>> communicators or I can simply create the unions on demand and free them
>>> once I am done using them after a given iteration.  So far, I have used
>>> only contiguous unions of communicators, which is a smaller set and
>>> easier to keep track of in a flat data-structure, but I want even more
>>> generality now.
>>>
>>> Thanks,
>>> Edgar
>>>
>>> _______________________________________________
>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>> To manage subscription options or unsubscribe:
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>>>
>>
>>