Out of memory problem

Julien Bodart julien.bodart at gmail.com
Sat Sep 5 06:44:55 CDT 2009


Hi everybody,

I am using parallel-netcdf on a Blue Gene/P computer and I run into problems
when running large cases.

The error message looks like:

"abort(1) on node 2820 (rank 2820 in comm 1140850688): application called
MPI_Abort(MPI_COMM_WORLD, 1) - process 2820
Out of memory in file
/bglhome/usr6/bgbuild/V1R3M0_460_2008-081112P/ppc/bgp/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_wrcoll.c,
line 1238"

It looks like a memory leak, since the program performs many intermediate
saves successfully during the run and then suddenly stops with the error
message above.
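One way to test the memory-leak hypothesis is to count the heap bytes the application itself holds around each intermediate save. This is a minimal sketch that assumes the application's own allocations are routed through the hypothetical wrappers `tracked_malloc`/`tracked_free` (they are not part of pnetcdf, MPI, or the Blue Gene/P toolchain). It cannot see memory that pnetcdf or ROMIO allocate internally, so a flat count only rules out leaks in the wrapped application code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical leak-tracking wrappers, not part of pnetcdf or MPI.
 * Each block carries a header recording its size so tracked_free()
 * can subtract the right amount from the running total. */
typedef union {
    size_t size;
    max_align_t align; /* keeps the user pointer suitably aligned */
} alloc_header;

static size_t g_live_bytes = 0; /* bytes currently held via tracked_malloc */

void *tracked_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(alloc_header))
        return NULL; /* size would overflow with the header added */
    alloc_header *h = malloc(sizeof(alloc_header) + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    g_live_bytes += n;
    return h + 1; /* user memory starts right after the header */
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    alloc_header *h = (alloc_header *)p - 1;
    assert(g_live_bytes >= h->size); /* catches double frees / foreign pointers */
    g_live_bytes -= h->size;
    free(h);
}

size_t tracked_live_bytes(void)
{
    return g_live_bytes;
}

/* Print and return how much the tracked heap grew since `before`.
 * Growth that accumulates from one save to the next suggests the
 * application code itself is leaking. */
size_t report_save_growth(int step, size_t before)
{
    size_t after = tracked_live_bytes();
    size_t growth = after > before ? after - before : 0;
    fprintf(stderr, "save %d: live heap %zu bytes (+%zu)\n", step, after, growth);
    return growth;
}
```

In the time-stepping loop, record `tracked_live_bytes()` just before each intermediate save (or at the start of each output interval) and pass it to `report_save_growth()` afterwards. If the reported growth stays at zero while the run still dies, the extra memory is being consumed outside the wrapped code, for example in the I/O path.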
According to the core file, it happens while saving independent data. This is
not the first time I have had such a problem when reading or saving
independent data. I usually work around it by using collective communication
before saving/reading the data, but this is not possible this time.

So my question is:
Do you think it comes from my code, which might have some memory leaks that
cause trouble when pnetcdf allocates temporary arrays, or could it come from
incorrect use of pnetcdf with independent data?

Thanks in advance.

Julien


More information about the parallel-netcdf mailing list