Hi Rob,

I am getting different errors depending on the platform I am running on. The only one that is clearly an MPI problem is edinburgh with mpich-1.2.7p1, so I will request an update to MPICH2.

I am working on debugging on jaguar - if you have an account there, maybe you could have a look at /tmp/work/jedwards/testpio/all.asb04? There is a problem where something is either not being communicated or is communicated incorrectly; I've yet to find the source.

Bluegene also has problems that I've yet to look into - I suspect I just need to dig deeper.

On Tue, Jul 20, 2010 at 11:41 AM, Rob Latham <robl@mcs.anl.gov> wrote:
(please keep Jim cc'ed on followups, thanks)

On Tue, Jul 20, 2010 at 11:32:16AM -0500, Dave Goodell wrote:
> Intercommunicators are definitely supported in MPICH2. You probably
> have MPICH installed instead, which does not support
> intercommunicators (nor is it supported in general).

Jim does explicitly mention the Cray. Any chance that Jaguar is
running some old version of MPICH2 with shoddy intercommunicator
support?

Jim is also coming from AIX: do you know of anything about the IBM
intercommunicator support that might make the transition to MPICH2
odd? (due to, say, defects in either the IBM or MPICH2
implementation: as we know, the standard is one thing but
implementations have varying degrees of "quality")

> Point-to-point performance in intercommunicators should generally be
> identical to performance in intracommunicators. Collective
> communication routines for intercommunicators have not been
> extensively tuned, so they may not quite perform as well as they
> could, depending on the particular collective and way it is invoked.

Well there you have it, Jim: it's supposed to "just work". Perhaps
you can tell us a bit more about how you are creating the
intercommunicators and how you are using them?

==rob

>
> On Jul 20, 2010, at 8:05 AM CDT, Rob Latham wrote:
>
> > Hi Jim. I'm interested in hearing more about how this async i/o
> > strategy plays out on other platforms.
> >
> > I'm moving this to the mpich-discuss list, because as far as I know
> > intercommunicators are supported on MPICH2, but the folks on the
> > mpich-discuss list will be able to speak with more authority on that
> > matter.
> >
> > What is it about intercommunicators that does not work for you? Are
> > you splitting up COMM_WORLD to form comp_comm and io_comm?
> >
> > There might be performance implications with intercommunicators. Can
> > the link between the two sets be the bottleneck here? I presume you
> > are transferring a lot of data to io_comm.
> >
> > MPICH guys, Jim's original email is below.
> > ==rob
> >
> > On Mon, Jul 19, 2010 at 04:44:50PM -0600, Jim Edwards wrote:
> >> Hi All,
> >>
> >> I have created a new repository branch and checked in the beginnings of a
> >> version of pio which allows the io tasks to be a disjoint set of tasks from
> >> those used for computation.
> >>
> >> The io_comm and the comp_comm are disjoint, and pio_init
> >> is called with an intercommunicator which spans the two task sets. The
> >> compute task set returns while the io task set waits in a callback loop for
> >> further instructions.
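
(For orientation, here is a minimal sketch of one way a disjoint comp_comm/io_comm pair and a spanning intercommunicator can be built with plain MPI-2 calls. The choice to put the I/O tasks on the last n_io ranks of MPI_COMM_WORLD, the leader ranks, and the tag value are assumptions made purely for illustration - this is not pio's actual code.)

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    int n_io = 4;                       /* assumed number of I/O tasks */
    MPI_Comm local_comm, inter_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumption for the sketch: the last n_io ranks become I/O tasks. */
    int is_io = (rank >= size - n_io);

    /* Disjoint intracommunicators: comp_comm (color 0) or io_comm (color 1). */
    MPI_Comm_split(MPI_COMM_WORLD, is_io, rank, &local_comm);

    /* Intercommunicator spanning the two task sets.  The remote leader is
       named by its rank in the peer communicator (MPI_COMM_WORLD): world
       rank 0 leads the compute group, world rank size-n_io the I/O group. */
    int remote_leader = is_io ? 0 : size - n_io;
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         42 /* matching tag on both sides */, &inter_comm);

    /* A pio_init-style call would take inter_comm here: compute tasks would
       then return to the application, while the I/O tasks would wait on the
       intercommunicator for further instructions. */

    MPI_Comm_free(&inter_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}
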
> >>
> >> I have added three new tests in the pio test suite and all of them pass on
> >> bluefire. Then I discovered that mpich does not support MPI
> >> intercommunicators. These are part of the MPI-2 standard, and I thought
> >> that all of the MPI implementations were there by now? Apparently not. Is
> >> there another MPI implementation that we can try on jaguar or edinburgh?
> >>
> >> Currently all of the pio commands are still synchronous calls - that is, the
> >> compute tasks cannot continue until the write has completed. My eventual
> >> plan is to relax this requirement to see if there is a performance advantage
> >> - but if AIX-POE is the only environment to support this model I may have to
> >> rethink the approach.
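
(Again only a hedged sketch, this time of the synchronous pattern described above: the compute side sends a command across the intercommunicator and blocks until the I/O side acknowledges that the write has finished, while the I/O leader waits in a loop for instructions. The command codes, tags, and leader-to-leader exchange are invented for illustration and are not pio's actual protocol.)

#include <mpi.h>

/* Invented command codes and tags, purely for illustration. */
enum { CMD_WRITE = 1, CMD_EXIT = 2 };
#define CMD_TAG 100
#define ACK_TAG 101

/* I/O leader: wait in a loop on the intercommunicator for instructions.
   In an intercommunicator, point-to-point ranks name processes in the
   *remote* group, so rank 0 below is the compute-side leader.  Forwarding
   each command to the other I/O tasks over io_comm is omitted here. */
void io_task_loop(MPI_Comm inter_comm, MPI_Comm io_comm)
{
    int cmd = 0, ack = 0;
    (void)io_comm;              /* would be used for the actual write */
    do {
        MPI_Recv(&cmd, 1, MPI_INT, 0, CMD_TAG, inter_comm, MPI_STATUS_IGNORE);
        if (cmd == CMD_WRITE) {
            /* ... receive the data and perform the write ... */
        }
        /* Acknowledge completion so the compute side may continue. */
        MPI_Send(&ack, 1, MPI_INT, 0, ACK_TAG, inter_comm);
    } while (cmd != CMD_EXIT);
}

/* Compute leader: a synchronous write blocks here until the I/O side
   acknowledges.  Relaxing that requirement would mean deferring this
   receive (e.g. via MPI_Irecv) so computation can continue meanwhile. */
void sync_write(MPI_Comm inter_comm)
{
    int cmd = CMD_WRITE, ack;
    MPI_Send(&cmd, 1, MPI_INT, 0, CMD_TAG, inter_comm);
    /* ... send the data to be written ... */
    MPI_Recv(&ack, 1, MPI_INT, 0, ACK_TAG, inter_comm, MPI_STATUS_IGNORE);
}
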
> >>
> >> If you get a chance please have a look at the implementation in
> >> https://parallelio.googlecode.com/svn/branches/async_pio1_1_1/
> >>
> >> If enough of you are interested we can schedule a con-call to go over how it
> >> works and some of the things that still need to be done.
> >>
> >> Jim
> >>
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA