[mpich-discuss] Relative ordering of MPI_Iprobe()s and MPI_Barrier()s

Darius Buntinas buntinas at mcs.anl.gov
Wed Aug 18 14:58:04 CDT 2010


It looks like this behavior is due to the way our progress engine is implemented.  We implement intranode communication using lock-free queues: each process has a single receive queue onto which other processes enqueue new messages.  When Iprobe is called, the next message in the receive queue (if any) is dequeued and processed.  Essentially we're handling only one message at a time when Iprobe is called.  In your test program, the message you're looking for may be on the receive queue behind messages from other processes, so several calls to Iprobe are needed until you get to the one you're looking for.
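
To make the effect concrete, here is a small toy model of the behavior described above. It is only a sketch, not MPICH code: the names (toy_proc, toy_enqueue, toy_iprobe, toy_probes_needed) and the fixed sizes are made up for illustration, and it assumes a probe makes exactly one progress step before checking for a match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT MPICH's implementation. All names are hypothetical. */

#define TOY_QUEUE_CAP 16
#define TOY_MAX_RANKS 8

typedef struct {
    int    src[TOY_QUEUE_CAP];   /* sender rank of each queued message, FIFO order */
    size_t head, tail;           /* dequeue at head, enqueue at tail */
    int    seen[TOY_MAX_RANKS];  /* processed-but-unreceived messages per sender */
} toy_proc;

/* Another process appends a message to this process's single receive queue. */
void toy_enqueue(toy_proc *p, int sender)
{
    assert(p->tail < TOY_QUEUE_CAP);
    assert(sender >= 0 && sender < TOY_MAX_RANKS);
    p->src[p->tail++] = sender;
}

/* One probe: process at most ONE queued message, then report whether a
 * message from `sender` has been processed so far. */
bool toy_iprobe(toy_proc *p, int sender)
{
    assert(sender >= 0 && sender < TOY_MAX_RANKS);
    if (p->head < p->tail) {
        p->seen[p->src[p->head++]]++;   /* the single progress step */
    }
    return p->seen[sender] > 0;
}

/* Count how many probes it takes to see a message from `sender`.
 * Returns -1 if it is still not visible after max_calls probes. */
int toy_probes_needed(toy_proc *p, int sender, int max_calls)
{
    for (int calls = 1; calls <= max_calls; calls++) {
        if (toy_iprobe(p, sender)) {
            return calls;
        }
    }
    return -1;
}
```

With messages from ranks 1, 2 and 3 enqueued in that order, the first two toy_iprobe calls for rank 3 report no match even though the message is already sitting in the queue; only the third call gets to it.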

The (bad) fix I sent earlier was supposed to get around another issue with fastboxes (an internal fast-path for messages) that may cause the same symptoms even if there are no messages from other processes interfering.  I have a correct fix for this now, but the larger issue described above requires a significant redesign of the progress engine.

We were aiming for a light-weight Iprobe implementation, since we figured it would be used in a polling loop by the application.  So, while the implementation technically follows the standard, I understand that this may have a performance impact in some applications.  Is this a problem for you guys?

-d


On Aug 18, 2010, at 3:38 AM, Edric Ellis wrote:

> Hi Darius,
> 
> Thanks for that - I haven't tried 1.3b1 yet - but that patch applied on its own to 1.2.1p1 appears to cause my example program to lock up because some of the Iprobes appear never to succeed.
> 
> Based on what Dave and Anh have said, it sounds like the right approach is for us not to make assumptions about the relative timing of barriers and probes. (Which leaves us with a puzzle of how to test our wrapper around MPI_Barrier...)
> 
> Cheers,
> 
> Edric.
> 
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-
>> bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
>> Sent: Tuesday, August 17, 2010 7:42 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Relative ordering of MPI_Iprobe()s and
>> MPI_Barrier()s
>> 
>> 
>> Can you try this patch (it's based on the trunk, but it should be simple
>> enough that it should still apply)?
>> 
>> http://mpich2.pastebin.com/z7ASbdrk
>> 
>> -d
>> 
>> On Aug 17, 2010, at 11:15 AM, Dave Goodell wrote:
>> 
>>> On Aug 17, 2010, at 11:14 AM CDT, Darius Buntinas wrote:
>>> 
>>>> I believe we might have fixed this.  Can you send us the test program, or try 1.3b1?
>>> 
>>> I believe it was attached to the original email...
>>> 
>>> -Dave
>>> 
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> 


