[mpich-discuss] intercommunicator support in MPICH

Jim Edwards jedwards at ucar.edu
Tue Jul 20 13:03:13 CDT 2010


Hi Rob,

I am getting different errors depending on the platform I am running on.
The only one that is clearly an MPI problem is edinburgh, with MPICH-1.2.7p1,
so I will request an update to MPICH2.

I am working on debugging on jaguar; if you have an account there, maybe you
could have a look at /tmp/work/jedwards/testpio/all.asb04.  There is a
problem where something is either not being communicated or is communicated
incorrectly; I've yet to find the source.

Bluegene also has problems that I've yet to look into; I suspect I just need
to dig deeper.
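
Rob, to your question below about how the intercommunicators are created: the
setup is roughly like the sketch below (C/MPI for brevity - the actual pio
code is Fortran and differs in detail, and the number of io tasks, the leader
ranks, and the tag here are only illustrative).  MPI_COMM_WORLD is split into
disjoint io and compute groups, and MPI_Intercomm_create joins them into the
intercommunicator that pio_init is handed.

#include <mpi.h>

#define IO_TASKS 4   /* illustrative number of dedicated io tasks */

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Comm local_comm;   /* becomes io_comm or comp_comm */
    MPI_Comm intercomm;    /* spans the two disjoint task sets */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    /* assumes world_size > IO_TASKS so both groups are non-empty */

    /* First IO_TASKS ranks become io tasks; the rest do computation. */
    int is_io = (world_rank < IO_TASKS);
    MPI_Comm_split(MPI_COMM_WORLD, is_io, world_rank, &local_comm);

    /* Rank 0 of each group is the local leader; the remote leader is
       named by its rank in MPI_COMM_WORLD (the peer communicator). */
    int remote_leader = is_io ? IO_TASKS : 0;
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         1001 /* tag */, &intercomm);

    /* ... pio_init would be handed intercomm here ... */

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}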



On Tue, Jul 20, 2010 at 11:41 AM, Rob Latham <robl at mcs.anl.gov> wrote:

>
> (please keep Jim cc'ed on followups, thanks)
>
> On Tue, Jul 20, 2010 at 11:32:16AM -0500, Dave Goodell wrote:
> > Intercommunicators are definitely supported in MPICH2.  You probably
> > have MPICH installed instead, which does not support
> > intercommunicators (nor is it supported in general).
>
> Jim does explicitly mention the Cray.  Any chance that Jaguar is
> running some old version of MPICH2 with shoddy intercommunicator
> support?
>
> Jim is also coming from AIX: do you know of anything about the IBM
> intercommunicator support that might make the transition to MPICH2
> odd?  (due to, say, defects in either the IBM or MPICH2
> implementation:  as we know, the standard is one thing but
> implementations have varying degrees of "quality")
>
> > Point-to-point performance in intercommunicators should generally be
> > identical to performance in intracommunicators.  Collective
> > communication routines for intercommunicators have not been
> > extensively tuned, so they may not quite perform as well as they
> > could, depending on the particular collective and the way it is invoked.
>
> Well there you have it, Jim: it's supposed to "just work".  Perhaps
> you can tell us a bit more about how you are creating the
> intercommunicators and how you are using them?
>
> ==rob
>
> >
> > On Jul 20, 2010, at 8:05 AM CDT, Rob Latham wrote:
> >
> > > Hi Jim.  I'm interested in hearing more about how this async i/o
> > > strategy plays out on other platforms.
> > >
> > > I'm moving this to the mpich-discuss list, because as far as I know
> > > intercommunicators are supported on MPICH2, but the folks on the
> > > mpich-discuss list will be able to speak with more authority on that
> > > matter.
> > >
> > > What is it about intercommunicators that does not work for you?  Are
> > > you splitting up COMM_WORLD to form comp_comm and io_comm?
> > >
> > > There might be performance implications with intercommunicators.  Can
> > > the link between the two sets be the bottleneck here?  I presume you
> > > are transferring a lot of data to io_comm.
> > >
> > > MPICH guys, Jim's original email is below.
> > > ==rob
> > >
> > > On Mon, Jul 19, 2010 at 04:44:50PM -0600, Jim Edwards wrote:
> > >> Hi All,
> > >>
> > >> I have created a new repository branch and checked in the beginnings of
> > >> a version of pio which allows the io tasks to be a disjoint set of
> > >> tasks from those used for computation.
> > >>
> > >> The io_comm and the comp_comm are disjoint, and pio_init is called with
> > >> an intercommunicator which spans the two task sets.  The compute task
> > >> set returns while the io task set waits in a callback loop for further
> > >> instructions.
> > >>
> > >> I have added three new tests to the pio test suite and all of them pass
> > >> on bluefire.  Then I discovered that MPICH does not support MPI
> > >> intercommunicators.  These are part of the MPI-2 standard, and I thought
> > >> that all of the MPI implementations were there by now?  Apparently not.
> > >> Is there another MPI implementation that we can try on jaguar or
> > >> edinburgh?
> > >>
> > >> Currently all of the pio commands are still synchronous calls - that
> > >> is, the compute tasks cannot continue until the write has completed.
> > >> My eventual plan is to relax this requirement to see if there is a
> > >> performance advantage, but if AIX-POE is the only environment that
> > >> supports this model I may have to rethink the approach.
> > >>
> > >> If you get a chance please have a look at the implementation in
> > >> https://parallelio.googlecode.com/svn/branches/async_pio1_1_1/
> > >>
> > >> If enough of you are interested, we can schedule a conference call to
> > >> go over how it works and some of the things that still need to be done.
> > >>
> > >> Jim
> > >>
> > >
> > > --
> > > Rob Latham
> > > Mathematics and Computer Science Division
> > > Argonne National Lab, IL USA
> >
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
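
For reference, here is an equally rough sketch of the callback-loop model
described in my original message above: the io tasks sit in a loop waiting
for a command broadcast across the intercommunicator, while the compute
tasks issue commands and return to computation.  Again this is illustrative
C rather than the actual pio Fortran, and the command codes and the use of
an intercommunicator MPI_Bcast are assumptions made for the example.

#include <mpi.h>

enum { CMD_QUIT = 0, CMD_WRITE = 1 };   /* hypothetical command codes */

/* Runs on the io tasks: wait for commands from the compute side.  On an
   intercommunicator bcast, the receiving group names the root by its rank
   within the remote (compute) group - here compute rank 0. */
void io_task_loop(MPI_Comm intercomm)
{
    int cmd;
    do {
        MPI_Bcast(&cmd, 1, MPI_INT, 0, intercomm);
        if (cmd == CMD_WRITE) {
            /* ... receive the data over intercomm and perform the write ... */
        }
    } while (cmd != CMD_QUIT);
}

/* Runs on the compute tasks: broadcast one command to the io group.  In the
   broadcasting group, the root passes MPI_ROOT and every other member of
   that group passes MPI_PROC_NULL. */
void compute_send_cmd(MPI_Comm comp_comm, MPI_Comm intercomm, int cmd)
{
    int comp_rank;
    MPI_Comm_rank(comp_comm, &comp_rank);
    int root = (comp_rank == 0) ? MPI_ROOT : MPI_PROC_NULL;
    MPI_Bcast(&cmd, 1, MPI_INT, root, intercomm);
}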