[mpich-discuss] Assertion failure from too many MPI_Gets between fences

Jeremiah Willcock jewillco at osl.iu.edu
Fri Jan 7 16:03:24 CST 2011


On Fri, 7 Jan 2011, Dave Goodell wrote:

> On Jan 7, 2011, at 3:44 PM CST, Jeremiah Willcock wrote:
>
>> On Fri, 7 Jan 2011, Dave Goodell wrote:
>>
>>> On Jan 7, 2011, at 12:56 PM CST, Jeremiah Willcock wrote:
>>>> on some or all ranks.  I am using the SVN head version currently, but the same error (and same line number) occurred with 1.3.1.  I am running two processes on one machine using "mpiexec -n 2 app"; the platform is x86-64 Linux (RHEL 5.5, gcc 4.1.2).  The number of MPI_Get operations required seems to be about 260k; fewer appears to work fine, but the exact number required for the error varies.  The kind of code I am using is:
>>>
>>> 260k is a large number of requests, if one req is being allocated for 
>>> each Get.  Requests are unfortunately large, somewhere on the order of 
>>> 1 KiB, so 260k reqs is in the neighborhood of 260 MiB of memory, 
>>> possibly double that if I'm lowballing the request size.  The handle 
>>> allocator has a theoretical capacity of at least 2^26 handles (~67 
>>> million), so I don't think that we hit an intrinsic addressing limit.
>>>
>>> Is your application memory-constrained?
>>
>> Not really, at least at the sizes I'm testing at so far.  Is there a 
>> good way to test how much memory it is actually using, or how much 
>> memory MPICH is using?
>
> Not off the top of my head.  It seems like you are using a small-ish 
> test program to check this.  Can you send that to me so that I can 
> reproduce this for myself and play with the bug?

I don't really have a small program for this, just part of a larger one 
that I was putting print statements in and hacking on.  The relevant part 
of the code just creates a window for an array (with different data on 
different ranks) then does MPI_Gets on random parts of it to implement a 
large gather operation.  A code such as row-distributed sparse 
matrix-vector multiplication would have this kind of access pattern as 
well.
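
For concreteness, the pattern boils down to something like the sketch 
below.  It is not the actual application code (the array size, element 
type, and Get count are made-up placeholders), but the structure is the 
same: one opening fence, a large batch of MPI_Gets to random remote 
locations, and one closing fence.

/* Sketch only, not the real application: each rank exposes a local
 * array through a window, then issues many MPI_Gets at random remote
 * offsets between a pair of fences.  Sizes below are made up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_SIZE 1000000   /* elements exposed per rank (placeholder) */
#define NUM_GETS   300000    /* enough Gets to hit the failure for me   */

int main(int argc, char **argv)
{
    int rank, size, i;
    double *local, *results;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local   = malloc(LOCAL_SIZE * sizeof(double));
    results = malloc(NUM_GETS * sizeof(double));
    for (i = 0; i < LOCAL_SIZE; ++i)
        local[i] = rank * (double)LOCAL_SIZE + i;  /* different data per rank */

    MPI_Win_create(local, LOCAL_SIZE * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    srand(rank + 1);
    MPI_Win_fence(0, win);
    for (i = 0; i < NUM_GETS; ++i) {
        int target = rand() % size;        /* random owning rank   */
        int offset = rand() % LOCAL_SIZE;  /* random remote index  */
        MPI_Get(&results[i], 1, MPI_DOUBLE, target,
                offset, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);   /* the Gets only complete here */

    printf("rank %d: results[0] = %g\n", rank, results[0]);

    MPI_Win_free(&win);
    free(local);
    free(results);
    MPI_Finalize();
    return 0;
}

With NUM_GETS in the ~260k range I mentioned, a loop like this is 
where the assertion fires for me.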

>> Inserting fences periodically in my code fixes the problem, but the 
>> fence frequency needed is proportional to the number of ranks in the 
>> job.  I think the MPI implementation should automatically do whatever 
>> flow control it needs to avoid running out of memory, no matter how 
>> many requests the application feeds in.
>
> I agree, the MPI implementation should take care of that, within reason. 
> But forcing an MPI_Get to block will require a careful reading 
> of the MPI standard to ensure that's valid behavior.  I think I can 
> construct scenarios where an MPI_Get that blocks would cause a 
> deadlock...

I believe (like everything else in MPI) they can be nonblocking or 
blocking, at the implementation's choice, but I don't know for sure. 
Remember that you can't assume a get has completed until the next 
fence, so you can't depend on seeing the answer before then (which I 
think is the deadlock scenario you were worried about).  I treat the 
MPI_Gets as nonblocking in my application; that's why I start a large 
number of them and then use a fence to complete them all before using 
the results.
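
For reference, the periodic-fence workaround I mentioned above amounts 
to replacing the Get loop in the sketch earlier in this message with 
something like the following.  BATCH is a made-up tuning knob, and this 
only works because every rank issues the same number of Gets (the 
fences are collective):

#define BATCH 100000   /* placeholder; has to shrink as ranks grow */

/* Replaces the fenced Get loop in the earlier sketch: complete the
 * Gets in batches so the implementation can reclaim its requests. */
MPI_Win_fence(0, win);
for (i = 0; i < NUM_GETS; ++i) {
    int target = rand() % size;
    int offset = rand() % LOCAL_SIZE;
    MPI_Get(&results[i], 1, MPI_DOUBLE, target,
            offset, 1, MPI_DOUBLE, win);
    if ((i + 1) % BATCH == 0)
        MPI_Win_fence(0, win);   /* complete this batch of Gets */
}
MPI_Win_fence(0, win);           /* complete any remaining Gets */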

> Without knowing exactly why it's failing, other than some sort of 
> request allocation problem, it's hard to say what the right fix is.

Is there a patch I could apply to MPICH to help diagnose the problem?  A 
branch I could switch to for testing?

-- Jeremiah Willcock

