[mpich-discuss] Assertion failure from too many MPI_Gets between fences

Jeremiah Willcock jewillco at osl.iu.edu
Fri Jan 7 15:44:53 CST 2011


On Fri, 7 Jan 2011, Dave Goodell wrote:

> On Jan 7, 2011, at 12:56 PM CST, Jeremiah Willcock wrote:
>
>> When I run a large number of MPI_Get operations (8 bytes each) between two MPI_Win_fence calls, I sometimes receive the error:
>>
>> Assertion failed in file ch3_istartmsg.c at line 90: sreq != NULL
>> internal ABORT - process 0
>
> This assertion failure indicates that MPICH2 ran out of memory while 
> allocating a new request: 
> https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c#L88
>
> Unfortunately, I can't say for sure whether the heap ran out of memory 
> in general, or some other more specific resource in the handle allocator 
> was exhausted.

I have a lot of memory on my system (including swap), and the application 
does not use much data with the inputs I'm testing so far.  One other 
symptom is that the number of MPI_Gets I can run between fences goes down 
as I add more ranks (and my application counts the total number issued 
from each rank, regardless of destination).

>> on some or all ranks.  I am using the SVN head version currently, but 
>> the same error (and same line number) occurred with 1.3.1.  I am 
>> running two processes on one machine using "mpiexec -n 2 app"; the 
>> platform is x86-64 Linux (RHEL 5.5, gcc 4.1.2).  The number of MPI_Get 
>> operations required seems to be about 260k; fewer appears to work fine, 
>> but the exact number required for the error varies.  The kind of code I 
>> am using is:
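
The pattern is essentially the following (a minimal sketch, not the actual
application; the window setup, buffer names, sizes, and target choice are
all illustrative):

    /* Sketch only: many small Gets queued inside a single fence epoch. */
    #include <mpi.h>
    #include <stdint.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const size_t num_gets = 300000;   /* around 260k triggers the abort */
        size_t i;
        uint64_t *local, *results;
        int rank, size;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local   = calloc(num_gets, sizeof(*local));
        results = calloc(num_gets, sizeof(*results));
        MPI_Win_create(local, num_gets * sizeof(*local), sizeof(*local),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        for (i = 0; i < num_gets; ++i)    /* 8-byte Gets, all in one epoch */
            MPI_Get(&results[i], 1, MPI_UINT64_T, (rank + 1) % size,
                    (MPI_Aint)i, 1, MPI_UINT64_T, win);
        MPI_Win_fence(0, win);            /* the failure shows up here */

        MPI_Win_free(&win);
        free(local);
        free(results);
        MPI_Finalize();
        return 0;
    }
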
>
> 260k is a large number of requests, if one request is being allocated for 
> each Get.  Requests are unfortunately large, somewhere on the order of 
> 1 KiB, so 260k requests is in the neighborhood of 260 MiB of memory, 
> possibly double that if I'm lowballing the request size.  The handle 
> allocator has a theoretical capacity of at least 2^26 requests (~67 
> million), so I don't think that we hit an intrinsic addressing limit.
>
> Is your application memory-constrained?

Not really, at least at the sizes I'm testing so far.  Is there a good 
way to measure how much memory it is actually using, or how much memory 
MPICH is using?
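
One rough option would be to watch the whole process's resident set from 
/proc around the Get loop (a Linux-specific sketch; it lumps MPICH's 
allocations together with the application's, so it only gives an upper 
bound on what MPICH itself is using):

    #include <stdio.h>
    #include <string.h>

    /* Print the current resident set size by scanning /proc/self/status. */
    static void report_vmrss(const char *tag)
    {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) return;
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                fprintf(stderr, "%s: %s", tag, line);
        fclose(f);
    }

Calling that before the first fence and again just before the closing fence 
would at least show how fast memory grows as the Gets are queued up.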

Inserting fences periodically in my code fixes the problem, but the fence 
frequency needed is proportional to the number of ranks in the job.  I 
think the MPI implementation should automatically do whatever flow control 
it needs to avoid running out of memory, no matter how many requests the 
application feeds in.
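
The workaround looks roughly like this (a sketch reusing the names from the 
earlier snippet; BATCH is an illustrative tuning knob, and in practice it 
has to shrink as the number of ranks grows):

    #define BATCH 100000   /* illustrative; must drop as ranks increase */

    MPI_Win_fence(0, win);
    for (i = 0; i < num_gets; ++i) {
        MPI_Get(&results[i], 1, MPI_UINT64_T, (rank + 1) % size,
                (MPI_Aint)i, 1, MPI_UINT64_T, win);
        if ((i + 1) % BATCH == 0)
            MPI_Win_fence(0, win);   /* complete pending Gets, open new epoch */
    }
    MPI_Win_fence(0, win);

Since MPI_Win_fence is collective over the window's communicator, every rank 
has to hit those extra fences at the same points, which is part of what 
makes this workaround awkward.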

-- Jeremiah Willcock

