[mpich-discuss] Relative ordering of MPI_Iprobe()s and MPI_Barrier()s

Dave Goodell goodell at mcs.anl.gov
Tue Aug 17 11:06:39 CDT 2010


Hi Edric,

My understanding of the MPI Standard is that case (1) is true.  That is, point-to-point and collective communication occur in separate contexts and don't interfere with each other, except insofar as many pt2pt and all collective calls may block a process.  See MPI-2.2 p.133:18 and p.188:43.  So while the MPI_Barrier in your example does ensure that the MPI_Send calls have been posted before any process leaves the barrier, it says nothing about when the probing processes will begin to "see" those sends.
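
For reference, here is a minimal sketch of the pattern I think you are describing (my own reconstruction, not your attached testprobe.cpp; the payload, tags, and printout are made up):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        char byte = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* step 1: tiny send to every other rank (presumed eager) */
        for (i = 0; i < size; ++i)
            if (i != rank)
                MPI_Send(&byte, 1, MPI_CHAR, i, 0, MPI_COMM_WORLD);

        /* step 2: barrier -- every rank has posted its sends by now */
        MPI_Barrier(MPI_COMM_WORLD);

        /* step 3: one MPI_Iprobe per sender; the standard does not promise
           flag == 1 here, only that repeated probing eventually succeeds */
        for (i = 0; i < size; ++i) {
            if (i != rank) {
                int flag;
                MPI_Status status;
                MPI_Iprobe(i, 0, MPI_COMM_WORLD, &flag, &status);
                printf("rank %d: message from %d visible? %d\n", rank, i, flag);
            }
        }

        /* drain the messages so MPI_Finalize sees no pending traffic */
        for (i = 0; i < size; ++i)
            if (i != rank)
                MPI_Recv(&byte, 1, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }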

The MPI standard does require that a busy-waiting MPI_Iprobe loop will eventually see the incoming message (MPI-2.2 p.66:24).  But this seems to be the behavior that you are seeing, so I think we are conforming to the standard here.
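
In other words, a spin loop of the following shape must terminate once the matching send has been posted ('src', 'tag', and the function name here are just placeholders):

    #include <mpi.h>

    /* spin until a message from 'src' with tag 'tag' is visible on 'comm';
       MPI-2.2 p.66:24 guarantees this eventually returns once the matching
       send has been posted */
    static void wait_for_message(int src, int tag, MPI_Comm comm)
    {
        int flag = 0;
        MPI_Status status;
        while (!flag)
            MPI_Iprobe(src, tag, comm, &flag, &status);
    }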

I suspect that the implementation reason for the behavior you are seeing in nemesis is that we don't poll the network (TCP in this case) as often as we poll shared memory when making progress.  But I haven't played with your example code at all yet.

BTW, your test program could theoretically deadlock.  If your MPI_Send calls blocked until the receiver arrived at some sort of reception call (Recv, Probe, etc.), then all of your processes would be stuck in MPI_Send on line 15.  MPI_Isend is a safer way to code this.  "Eager" sending is not required by the MPI standard, even though it is extremely common practice.
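
Something along these lines would avoid the problem (again only a sketch; the fixed-size arrays are an assumption to keep the example short):

    #include <mpi.h>

    /* nonblocking variant: post all sends with MPI_Isend, barrier, do the
       probes, and only then complete the sends with MPI_Waitall */
    void exchange(int rank, int size)
    {
        char payload[64] = {0};   /* one byte per destination */
        MPI_Request reqs[64];     /* assumes at most 65 processes, for brevity */
        int i, n = 0;

        for (i = 0; i < size; ++i)
            if (i != rank) {
                MPI_Isend(&payload[n], 1, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                          &reqs[n]);
                ++n;
            }

        MPI_Barrier(MPI_COMM_WORLD);

        /* ... MPI_Iprobe checks and MPI_Recv calls go here ... */

        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    }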

-Dave

On Aug 17, 2010, at 10:59 AM CDT, Edric Ellis wrote:

> Hi mpich2-discuss,
>  
> We’re in the process of moving to MPICH2-1.2.1p1 using the SMPD/nemesis variant (on Linux only - on Windows, we’re waiting for a fix to ticket #895), and we’ve found a discrepancy in the behaviour compared to the sock variant. I’m not sure if this is a real bug, or if I’ve missed something in the MPI standard. Our test for our wrapper around MPI_Barrier() essentially proceeds as follows (see attached for a C test case which usually shows this problem when running with 10 processes). Each process does this:
>  
> 1. Call MPI_Send() to each other process in turn with a tiny payload (assuming that this will be sent in the “eager” mode).
> 2. MPI_Barrier()
> 3. Check that MPI_Iprobe() indicates a message ready to receive from each other process
>  
> With the sock variant, this works as I expect - each process gets a return from MPI_Iprobe indicating that there is indeed a message waiting from each other process. With nemesis, this isn’t always the case - sometimes multiple calls to MPI_Iprobe are required. (Could this be related to ticket #1062?).
>  
> I couldn’t see in the MPI standard where the “expected” behaviour of the above might be specified, but it’s possible that I’ve missed something.
>  
> I can see several options for where a problem might exist:
>  
> 1. MPI doesn’t actually specify that these MPI_Iprobe()s should definitely return “true”
> 2. The nemesis channel isn’t preserving the ordering between MPI_Barrier() and pt2pt communications in the way I expect
>  
> As it happens, our usage of MPI_Iprobe() is basically restricted to our test code, so we could modify our tests not to rely on the old behaviour, but we’d like to understand better where the problem is.
>  
> Cheers,
>  
> Edric.
>  
> <testprobe.cpp>


