[mpich-discuss] Assertion failure from too many MPI_Gets between fences

Dave Goodell goodell at mcs.anl.gov
Fri Jan 7 15:39:45 CST 2011


On Jan 7, 2011, at 12:56 PM CST, Jeremiah Willcock wrote:

> When I run a large number of MPI_Get operations (8 bytes each) between two MPI_Fences, I sometimes receive the error:
> 
> Assertion failed in file ch3_istartmsg.c at line 90: sreq != NULL
> internal ABORT - process 0

This assertion failure indicates that MPICH2 ran out of memory while allocating a new request: https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c#L88

Unfortunately, I can't say for sure whether the heap was exhausted in general or whether some more specific resource in the handle allocator ran out.

> on some or all ranks.  I am currently using the SVN head version, but the same error (and same line number) occurred with 1.3.1.  I am running two processes on one machine using "mpiexec -n 2 app"; the platform is x86-64 Linux (RHEL 5.5, gcc 4.1.2).  The number of MPI_Get operations required seems to be about 260k; fewer appear to work fine, but the exact number required for the error varies.  The kind of code I am using is:
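
Presumably something along these lines, then (a minimal sketch, not the original code; buffer names and sizes are illustrative):

  #include <stdlib.h>
  #include <mpi.h>

  /* Many small MPI_Gets inside a single fence epoch; run with
   * "mpiexec -n 2".  An illustrative reconstruction of the pattern
   * described above, not the poster's actual code. */
  int main(int argc, char **argv)
  {
      const int nget = 260000;     /* roughly where the failure appears */
      int rank, i;
      double *base, *results;
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Alloc_mem(nget * sizeof(double), MPI_INFO_NULL, &base);
      results = malloc(nget * sizeof(double));
      MPI_Win_create(base, nget * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Win_fence(0, win);
      if (rank == 0) {
          for (i = 0; i < nget; ++i)  /* 8 bytes per Get, all from rank 1 */
              MPI_Get(&results[i], 1, MPI_DOUBLE, 1, (MPI_Aint)i,
                      1, MPI_DOUBLE, win);
      }
      MPI_Win_fence(0, win);

      MPI_Win_free(&win);
      MPI_Free_mem(base);
      free(results);
      MPI_Finalize();
      return 0;
  }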

260k is a large number of requests, if one request is being allocated for each Get.  Requests are unfortunately large, somewhere on the order of 1 KiB, so 260k requests is in the neighborhood of 260 MiB of memory, possibly double that if I'm lowballing the request size.  The handle allocator has a theoretical capacity of at least 2^26 requests (~67 million), so I don't think we hit an intrinsic addressing limit.

Is your application memory-constrained?
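
If it is, one workaround to try (a sketch, assuming one request is held per Get until the closing fence) is to split the transfer into smaller fence epochs, so that only a bounded number of requests is outstanding at a time:

  /* Hypothetical batched variant.  MPI_Win_fence both completes the
   * current epoch and opens the next, so one call per batch finishes
   * the outstanding Gets and lets their requests be reclaimed.  Fence
   * is collective: every rank in the window's communicator must make
   * the same number of fence calls. */
  static void batched_gets(double *results, int nget, int target,
                           MPI_Win win)
  {
      const int batch = 65536;   /* illustrative; tune to available memory */
      int i;

      MPI_Win_fence(0, win);
      for (i = 0; i < nget; ++i) {
          MPI_Get(&results[i], 1, MPI_DOUBLE, target, (MPI_Aint)i,
                  1, MPI_DOUBLE, win);
          if ((i + 1) % batch == 0)
              MPI_Win_fence(0, win);  /* end this batch, start a new epoch */
      }
      MPI_Win_fence(0, win);
  }

With batch = 65536 and ~1 KiB per request, at most ~64 MiB of request memory would be live at once, at the cost of a few extra synchronizations.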

-Dave


