[mpich-discuss] intercommunicator support in MPICH

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Jul 20 13:13:36 CDT 2010


Hi, Jim,

We have developed a system, called I/O delegate, that uses the same approach of allocating
an additional set of MPI processes to handle I/O requests forwarded from the MPI clients.
If you are interested, please refer to "Scaling parallel I/O performance through I/O delegate
and caching system," published at SC08.
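The delegate processes essentially sit in a service loop, receiving forwarded requests and
doing the file-system work on the clients' behalf. A rough sketch of such a loop in C is
below; the REQ_WRITE/REQ_SHUTDOWN tags and the message layout are illustrative only, not
the actual protocol from our paper:

    #include <mpi.h>
    #include <stdlib.h>

    #define REQ_WRITE    1
    #define REQ_SHUTDOWN 2
    #define MAX_BUF      (1 << 20)

    void delegate_loop(MPI_Comm inter)  /* 'inter' connects to the compute group */
    {
        char *buf = malloc(MAX_BUF);
        MPI_Status st;
        int count;

        for (;;) {
            /* Wait for the next request from any compute process. */
            MPI_Recv(buf, MAX_BUF, MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     inter, &st);
            if (st.MPI_TAG == REQ_SHUTDOWN)
                break;
            if (st.MPI_TAG == REQ_WRITE) {
                MPI_Get_count(&st, MPI_BYTE, &count);
                /* ... cache/aggregate 'count' bytes and write them to the
                 * file system on behalf of rank st.MPI_SOURCE ... */
            }
        }
        free(buf);
    }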

We ran our experiments on a Cray XT (Franklin at NERSC). As far as we know, Cray's MPI does not
yet support MPI-2's dynamic process management (e.g., MPI_Comm_spawn). Perhaps the Cray
MPI developers on this list can shed some light on their plans for supporting it.
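For comparison, here is roughly what the spawn-based setup would look like on a platform
that does support it; the executable name "io_delegate" and the delegate count of 4 are
assumptions for illustration. MPI_Comm_spawn itself returns an intercommunicator connecting
the parent (compute) processes to the spawned children:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm inter;       /* parents <-> spawned delegates */
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Collective over MPI_COMM_WORLD; spawn 4 delegate processes. */
        MPI_Comm_spawn("io_delegate", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0 /* root */, MPI_COMM_WORLD, &inter, errcodes);

        /* ... forward I/O requests to the delegates over 'inter' ... */

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }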

So, we used MPI_Comm_split on Franklin, and the intercommunicator support there worked just fine.
As for the communication performance between the application processes and the delegate processes,
we did not explicitly benchmark the intercommunicator, but the cost did not
appear to be particularly high.
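For anyone wanting to try the same pattern, a minimal sketch of a split-based setup follows.
The 90/10 compute-to-delegate ratio, the leader ranks, and the tag are illustrative choices
(not necessarily what we used), and it assumes at least 10 processes so the delegate group
is non-empty:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm local, inter;
        int rank, nprocs, ndel, is_io;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Reserve roughly a tenth of the ranks as I/O delegates. */
        ndel  = nprocs / 10;
        is_io = (rank >= nprocs - ndel);

        /* Color 0 = compute group, color 1 = delegate group. */
        MPI_Comm_split(MPI_COMM_WORLD, is_io, rank, &local);

        /* The local leader is local rank 0; the remote leader is named
         * by its rank in the peer communicator (MPI_COMM_WORLD here). */
        MPI_Intercomm_create(local, 0, MPI_COMM_WORLD,
                             is_io ? 0 : nprocs - ndel, 0 /* tag */, &inter);

        /* Compute ranks forward I/O requests over 'inter'; delegate
         * ranks service them (see the loop sketched earlier). */

        MPI_Comm_free(&inter);
        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }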


Wei-keng

On Jul 20, 2010, at 12:41 PM, Rob Latham wrote:

> 
> (please keep Jim cc'ed on followups, thanks)
> 
> On Tue, Jul 20, 2010 at 11:32:16AM -0500, Dave Goodell wrote:
>> Intercommunicators are definitely supported in MPICH2.  You probably
>> have the old MPICH installed instead, which does not support
>> intercommunicators (nor is MPICH itself supported any longer).
> 
> Jim does explicitly mention the Cray.  Any chance that Jaguar is
> running some old version of MPICH2 with shoddy intercommunicator
> support?
> 
> Jim is also coming from AIX: do you know of anything about IBM's
> intercommunicator support that might make the transition to MPICH2
> odd?  (Due to, say, defects in either the IBM or MPICH2
> implementation: as we know, the standard is one thing, but
> implementations have varying degrees of "quality.")
> 
>> Point-to-point performance in intercommunicators should generally be
>> identical to performance in intracommunicators.  Collective
>> communication routines for intercommunicators have not been
>> extensively tuned, so they may not quite perform as well as they
>> could, depending on the particular collective and way it is invoked.
> 
> Well, there you have it, Jim: it's supposed to "just work".  Perhaps
> you can tell us a bit more about how you are creating the
> intercommunicators and how you are using them?
> 
> ==rob
> 
>> 
>> On Jul 20, 2010, at 8:05 AM CDT, Rob Latham wrote:
>> 
>>> Hi Jim.  I'm interested in hearing more about how this async I/O
>>> strategy plays out on other platforms.
>>> 
>>> I'm moving this to the mpich-discuss list, because as far as I know
>>> intercommunicators are supported on MPICH2, but the folks on the
>>> mpich-discuss list will be able to speak with more authority on that
>>> matter.
>>> 
>>> What is it about intercommunicators that does not work for you?  Are
>>> you splitting up COMM_WORLD to form comp_comm and io_comm?
>>> 
>>> There might be performance implications with intercommunicators.  Could
>>> the link between the two sets become the bottleneck here?  I presume you
>>> are transferring a lot of data to io_comm.
>>> 
>>> MPICH guys, Jim's original email is below. 
>>> ==rob
>>> 
>>> On Mon, Jul 19, 2010 at 04:44:50PM -0600, Jim Edwards wrote:
>>>> Hi All,
>>>> 
>>>> I have created a new repository branch and checked in the beginnings of a
>>>> version of pio which allows the io tasks to be a disjoint set of tasks from
>>>> those used for computation.
>>>> 
>>>> The io_comm and the comp_comm are disjoint, and pio_init
>>>> is called with an intercommunicator which spans the two task sets.  The
>>>> compute task set returns while the io task set waits in a callback loop for
>>>> further instructions.
>>>> 
>>>> I have added three new tests in the pio test suite, and all of them pass on
>>>> bluefire.  Then I discovered that MPICH does not support MPI
>>>> intercommunicators.  These are part of the MPI-2 standard, and I thought
>>>> that all of the MPI implementations were there by now?  Apparently not.  Is
>>>> there another MPI implementation that we can try on jaguar or edinburgh?
>>>> 
>>>> Currently all of the pio commands are still synchronous calls - that is, the
>>>> compute tasks cannot continue until the write has completed.  My eventual
>>>> plan is to relax this requirement to see if there is a performance advantage
>>>> - but if AIX-POE is the only environment to support this model, I may have to
>>>> rethink the approach.
>>>> 
>>>> If you get a chance please have a look at the implementation in
>>>> https://parallelio.googlecode.com/svn/branches/async_pio1_1_1/
>>>> 
>>>> If enough of you are interested, we can schedule a conference call to go over how it
>>>> works and some of the things that still need to be done.
>>>> 
>>>> Jim
>>>> 
>>> 
>>> -- 
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
>> 
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> 


