[mpich-discuss] Assertion failure from too many MPI_Gets between fences
Dave Goodell
goodell at mcs.anl.gov
Fri Jan 7 15:39:45 CST 2011
On Jan 7, 2011, at 12:56 PM CST, Jeremiah Willcock wrote:
> When I run a large number of MPI_Get operations (8 bytes each) between two MPI_Fences, I sometimes receive the error:
> Assertion failed in file ch3_istartmsg.c at line 90: sreq != NULL
> internal ABORT - process 0
This assertion failure indicates that MPICH2 ran out of memory while allocating a new request: https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c#L88
Unfortunately, I can't say for sure whether the heap ran out of memory in general, or some other more specific resource in the handle allocator was exhausted.
> on some or all ranks. I am using the SVN head version currently, but the same error (and same line number) occurred with 1.3.1. I am running two processes on one machine using "mpiexec -n 2 app"; the platform is x86-64 Linux (RHEL 5.5, gcc 4.1.2). The number of MPI_Get operations required seems to be about 260k; fewer appears to work fine, but the exact number required for the error varies. The kind of code I am using is:
260k is a large number of requests if one request is being allocated for each Get. Requests are unfortunately large, somewhere on the order of 1 KiB each, so 260k requests come to roughly 260 MiB of memory, possibly double that if I'm lowballing the request size. The handle allocator has a theoretical capacity of at least 2^26 handles (~67 million), so I don't think we hit an intrinsic addressing limit.
Is your application memory-constrained?