[mpich-discuss] Assertion failure from too many MPI_Gets between fences
William Gropp
wgropp at illinois.edu
Sat Jan 8 10:09:17 CST 2011
This is one of the things that the current RMA re-write will fix.
This is definitely a bug in MPICH2.
In the short term, I advise using a datatype with the MPI_Get to
reduce the number of individual Get operations. I'd rather do the
full fix than add a workaround for this case.
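For example, here is a minimal sketch of that approach (the element
count, displacement array, and window name below are placeholders,
not taken from the original report):

-----8<-----
#include <mpi.h>

/* Gather nelem doubles from scattered offsets on target_rank with a
 * single MPI_Get, instead of issuing one Get per element. */
void gather_scattered(double *local, int nelem, int *target_displs,
                      int target_rank, MPI_Win win)
{
    MPI_Datatype ttype;

    /* One block of length 1 at each target displacement. */
    MPI_Type_create_indexed_block(nelem, 1, target_displs,
                                  MPI_DOUBLE, &ttype);
    MPI_Type_commit(&ttype);

    MPI_Win_fence(0, win);
    MPI_Get(local, nelem, MPI_DOUBLE,
            target_rank, 0, 1, ttype, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&ttype);
}
-----8<-----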
Bill
On Jan 7, 2011, at 5:55 PM, Dave Goodell wrote:
> On Jan 7, 2011, at 5:26 PM CST, Jeremiah Willcock wrote:
>
>> On Fri, 7 Jan 2011, Dave Goodell wrote:
>>
>>> MPI-2.2, page 339, line 13-14: "These operations are nonblocking:
>>> the call initiates the transfer, but the transfer may continue
>>> after the call returns."
>>>
>>> This language is weaker than I would like, because the clarifying
>>> statement after the colon does not say that the call cannot block,
>>> which implicitly waters down the natural MPI meaning of
>>> "nonblocking". But I think the intent is clear: the call should
>>> not block the user while waiting on the action of another process.
>>> After further thought I can't come up with any realistic example
>>> where a blocking-for-flow-control MPI_Get causes a deadlock, but I
>>> think the behavior is still intended to be disallowed by the
>>> standard.
>>
>> I think that the progress clarification at the top of page 371 of
>> MPI 2.2 (end of section 11.7.2) would cover the case in which some
>> one-sided operations blocked for flow control. Or could there be
>> deadlocks even with MPI progress semantics?
>
> As I said, I couldn't come up with a _realistic_ program where this
> would result in a deadlock. But an unrealistic program is exactly
> the sort of thing that is discussed in the second paragraph of that
> Rationale passage. Something ridiculous like:
>
> -----8<-----
> if (rank == 0) {
>     for (int i = 0; i < 1000000; ++i) {
>         MPI_Get(..., /*rank=*/1, ...);
>     }
>     send_on_socket_to_rank_1(...);
>     MPI_Win_fence(...);
> }
> else {
>     /* do some compute or even nothing here */
>     blocking_socket_recv_from_rank_0(...);
>     MPI_Win_fence(...);
> }
> -----8<-----
>
> Under my "blocking for flow control is not allowed" interpretation,
> the user could assume this program won't deadlock. Under the
> opposing interpretation it easily could if the implementation does
> not provide asynchronous progress (which is a valid and common
> implementation choice).
>
> Instead of a socket send/recv pair, you could stick any sort of non-
> MPI synchronizing operation in there. Shared memory barriers or
> mutexes, UNIX FIFOs, some sort of non-MPI file I/O, etc. I don't
> consider any of these cases to be practical or realistic MPI
> programs, but they do illustrate the point.
>
> Now, that all said, we could probably offer "block user RMA calls
> for flow control" as some sort of non-standard-compliant option that
> you could turn on via an environment variable in MPICH2.
>
> -Dave
>
William Gropp
Deputy Director for Research
Institute for Advanced Computing Applications and Technologies
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign