[mpich-discuss] Assertion failure from too many MPI_Gets between fences
William Gropp
wgropp at illinois.edu
Sat Jan 8 10:09:17 CST 2011
This is one of the things that the current RMA re-write will fix.
This is definitely a bug in MPICH2.
In the short term, I advise using a datatype with the MPI_Get to
reduce the number of individual Get operations. I'd rather do the
full fix than add a workaround for this case.
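For example, here is a minimal sketch of that approach (the element
count, displacement array, and window name below are placeholders,
not taken from the original report):

-----8<-----
#include <mpi.h>

/* Gather nelem doubles from scattered offsets on target_rank with a
 * single MPI_Get, instead of issuing one Get per element. */
void gather_scattered(double *local, int nelem, int *target_displs,
                      int target_rank, MPI_Win win)
{
    MPI_Datatype ttype;

    /* One block of length 1 at each target displacement. */
    MPI_Type_create_indexed_block(nelem, 1, target_displs,
                                  MPI_DOUBLE, &ttype);
    MPI_Type_commit(&ttype);

    MPI_Win_fence(0, win);
    MPI_Get(local, nelem, MPI_DOUBLE,
            target_rank, 0, 1, ttype, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&ttype);
}
-----8<-----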
Bill
On Jan 7, 2011, at 5:55 PM, Dave Goodell wrote:
> On Jan 7, 2011, at 5:26 PM CST, Jeremiah Willcock wrote:
>
>> On Fri, 7 Jan 2011, Dave Goodell wrote:
>>
>>> MPI-2.2, page 339, line 13-14: "These operations are nonblocking:
>>> the call initiates the transfer, but the transfer may continue
>>> after the call returns."
>>>
>>> This language is weaker than I would like, because the clarifying
>>> statement after the colon does not say that the call cannot block,
>>> which implicitly waters down the natural MPI meaning of
>>> "nonblocking". But I think the intent is clear: the call should
>>> not block the user while waiting on the action of another process.
>>> After further thought I can't come up with any realistic example
>>> where a blocking-for-flow-control MPI_Get causes a deadlock, but I
>>> think the behavior is still intended to be disallowed by the
>>> standard.
>>
>> I think that the progress clarification at the top of page 371 of
>> MPI 2.2 (end of section 11.7.2) would cover the case in which some
>> one-sided operations blocked for flow control. Or could there be
>> deadlocks even with MPI progress semantics?
>
> As I said, I couldn't come up with a _realistic_ program where this
> would result in a deadlock. But an unrealistic program is exactly
> the sort of thing that is discussed in the second paragraph of that
> Rationale passage. Something ridiculous like:
>
> -----8<-----
> if (rank == 0) {
>     for (int i = 0; i < 1000000; ++i) {
>         MPI_Get(..., /*rank=*/1, ...);
>     }
>     send_on_socket_to_rank_1(...);
>     MPI_Win_fence(...);
> }
> else {
>     /* do some compute or even nothing here */
>     blocking_socket_recv_from_rank_0(...);
>     MPI_Win_fence(...);
> }
> -----8<-----
>
> Under my "blocking for flow control is not allowed" interpretation,
> the user could assume this program won't deadlock. Under the
> opposing interpretation it easily could if the implementation does
> not provide asynchronous progress (which is a valid and common
> implementation choice).
>
> Instead of a socket send/recv pair, you could stick any sort of non-
> MPI synchronizing operation in there. Shared memory barriers or
> mutexes, UNIX FIFOs, some sort of non-MPI file I/O, etc. I don't
> consider any of these cases to be practical or realistic MPI
> programs, but they do illustrate the point.
>
> Now, that all said, we could probably offer "block user RMA calls
> for flow control" as some sort of non-standard-compliant option that
> you could turn on via an environment variable in MPICH2.
>
> -Dave
>
William Gropp
Deputy Director for Research
Institute for Advanced Computing Applications and Technologies
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign