[mpich-discuss] intercommunicator support in MPICH
Wei-keng Liao
wkliao at ece.northwestern.edu
Tue Jul 20 13:13:36 CDT 2010
Hi, Jim,
We developed a system, called I/O delegate, that uses the same approach of allocating
an additional set of MPI processes to handle I/O requests forwarded from the MPI clients.
If you are interested, please refer to "Scaling parallel I/O performance through I/O delegate and
caching system," published at SC08.
We ran our experiments on a Cray XT (Franklin at NERSC). As far as we know, Cray's MPI does not
yet support MPI-2's dynamic process management (e.g., MPI_Comm_spawn), but maybe the Cray
MPI developers on this list can shed some light on their plans for supporting it.
So, we used MPI_Comm_split on Franklin instead, and the intercommunicator support there works just fine.
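Roughly, the setup looks like the sketch below. This is only an illustration, not our actual
delegate code: the number of delegate processes, the rank layout, and the variable names are
made up, and all error checking is omitted.

    /* Sketch: split MPI_COMM_WORLD into a compute group and a small group of
     * I/O delegate processes, then connect the two disjoint groups with an
     * intercommunicator. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int world_rank, world_size;
        const int io_nprocs = 4;   /* number of delegate processes (made up) */
        MPI_Comm local_comm, inter_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        /* the last io_nprocs ranks become I/O delegates, the rest compute */
        int is_io = (world_rank >= world_size - io_nprocs);
        MPI_Comm_split(MPI_COMM_WORLD, is_io, world_rank, &local_comm);

        /* the local leader is rank 0 of each group; the remote leader is rank 0
         * of the other group, addressed by its MPI_COMM_WORLD rank */
        int remote_leader = is_io ? 0 : world_size - io_nprocs;
        MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                             0 /* tag */, &inter_comm);

        /* compute ranks forward I/O requests over inter_comm; delegate ranks
         * sit in a service loop and handle them */

        MPI_Comm_free(&inter_comm);
        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }

On a system whose MPI supports dynamic process management, one could instead MPI_Comm_spawn
the delegate processes and get the intercommunicator back directly from the spawn call.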
As for the communication performance between the application processes and the delegate processes,
we did not explicitly benchmark the intercommunicator, but its cost did not
seem to be particularly high.
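If one wanted a quick feel for the point-to-point cost, a simple round trip between compute
rank 0 and delegate rank 0 over the intercommunicator would do. Continuing the sketch above
(again just an illustration, not a benchmark we actually ran; printf needs <stdio.h>):

    /* Goes where the "forward I/O requests" comment is in the sketch above.
     * Note that in point-to-point calls on an intercommunicator, the rank
     * argument names a rank in the remote group. */
    enum { MSG_BYTES = 1 << 20 };           /* 1 MiB payload, arbitrary */
    static char buf[MSG_BYTES];

    if (!is_io && world_rank == 0) {        /* a compute process */
        double t0 = MPI_Wtime();
        MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 99, inter_comm);
        MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 99, inter_comm, MPI_STATUS_IGNORE);
        printf("intercomm round trip: %g seconds\n", MPI_Wtime() - t0);
    } else if (is_io) {
        int io_rank;
        MPI_Comm_rank(local_comm, &io_rank);
        if (io_rank == 0) {                 /* the delegate peer */
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 99, inter_comm, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 99, inter_comm);
        }
    }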
Wei-keng
On Jul 20, 2010, at 12:41 PM, Rob Latham wrote:
>
> (please keep Jim cc'ed on followups, thanks)
>
> On Tue, Jul 20, 2010 at 11:32:16AM -0500, Dave Goodell wrote:
>> Intercommunicators are definitely supported in MPICH2. You probably
>> have MPICH installed instead, which does not support
>> intercommunicators (nor is it supported in general).
>
> Jim does explicitly mention the Cray. Any chance that Jaguar is
> running some old version of MPICH2 with shoddy intercommunicator
> support?
>
> Jim is also coming from AIX: do you know of anything about the IBM
> intercommunicator support that might make the transition to MPICH2
> odd? (due to, say, defects in either the IBM or MPICH2
> implementation: as we know, the standard is one thing but
> implementations have varying degrees of "quality")
>
>> Point-to-point performance in intercommunicators should generally be
>> identical to performance in intracommunicators. Collective
>> communication routines for intercommunicators have not been
>> extensively tuned, so they may not quite perform as well as they
>> could, depending on the particular collective and way it is invoked.
>
> Well there you have it, Jim: it's supposed to "just work". Perhaps
> you can tell us a bit more about how you are creating the
> intercommunicators and how you are using them?
>
> ==rob
>
>>
>> On Jul 20, 2010, at 8:05 AM CDT, Rob Latham wrote:
>>
>>> Hi Jim. I'm interested in hearing more about how this async i/o
>>> strategy plays out on other platforms.
>>>
>>> I'm moving this to the mpich-discuss list, because as far as I know
>>> intercommunicators are supported on MPICH2, but the folks on the
>>> mpich-discuss list will be able to speak with more authority on that
>>> matter.
>>>
>>> What is it about intercommunicators that does not work for you? Are
>>> you splitting up COMM_WORLD to form comp_comm and io_comm?
>>>
>>> There might be performance implications with intercommunicators. Can
>>> the link between the two sets be the bottleneck here? I presume you
>>> are transferring a lot of data to io_comm.
>>>
>>> MPICH guys, Jim's original email is below.
>>> ==rob
>>>
>>> On Mon, Jul 19, 2010 at 04:44:50PM -0600, Jim Edwards wrote:
>>>> Hi All,
>>>>
>>>> I have created a new repository branch and checked in the beginnings of a
>>>> version of pio which allows the io tasks to be a disjoint set of tasks from
>>>> those used for computation.
>>>>
>>>> The io_comm and the comp_comm are disjoint and pio_init
>>>> is called with an intercommunicator which spans the two task sets. The
>>>> compute task set returns while the io task set waits in a call back loop for
>>>> further instructions.
>>>>
>>>> I have added three new tests in the pio test suite and all of them pass on
>>>> bluefire. Then I discovered that the MPICH there does not support MPI
>>>> intercommunicators. These are part of the MPI-2 standard, and I thought
>>>> that all of the MPI implementations were there by now? Apparently not. Is
>>>> there another MPI implementation that we can try on jaguar or edinburgh?
>>>>
>>>> Currently all of the pio commands are still synchronous calls - that is, the
>>>> compute tasks cannot continue until the write has completed. My eventual
>>>> plan is to relax this requirement to see if there is a performance advantage
>>>> - but if AIX-POE is the only environment that supports this model, I may have to
>>>> rethink the approach.
>>>>
>>>> If you get a chance please have a look at the implementation in
>>>> https://parallelio.googlecode.com/svn/branches/async_pio1_1_1/
>>>>
>>>> If enough of you are interested we can schedule a con-call to go over how it
>>>> works and some of the things that still need to be done.
>>>>
>>>> Jim
>>>>
>>>
>>> --
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
>>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>