[mpich-discuss] MPI-IO ERROR

Rob Latham robl at mcs.anl.gov
Fri Oct 1 13:03:35 CDT 2010


On Mon, Sep 27, 2010 at 08:49:55AM -0700, Weiqiang Wang wrote:
> I'm trying to use MPI-IO on BlueGene/P cluster by incorporating it into my Fortran77 code.

Good to hear from you, Weiqiang.  I see you also emailed me, but I was
on vacation last week.  I'm still working through the accumulated
emails.

> The program works fine, and it has reduced file-writing time by
> several times compared to writing out files from each core.

Glad to hear that.

> However, I found that after I scale my program to more CPUs (from
> 32,768 to 65,536), some problems start appearing.  In my two tests,
> the system complained that it could not allocate sufficient memory
> on the I/O nodes.  In these two tests, I tried to write out a total
> of 12,582,912 atoms' worth of info (x, y, z coordinates and
> velocities, all in double precision).  These data are distributed
> uniformly among all the processors.

In the past, this type of error has typically come up when all
processes perform a collective read of a small config file.  I haven't
seen this error in the write path before.
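
For what it's worth, the usual fix on the read side is to have rank 0
read the small file and broadcast it, rather than every process
opening it.  A rough sketch (the file name and array here are just
placeholders):

      INCLUDE 'mpif.h'
      INTEGER RANK, IERROR
      DOUBLE PRECISION PARAMS(100)

C     Only rank 0 touches the file; everyone else gets the data via
C     broadcast.  PARAMS and config.dat are placeholders.
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERROR)
      IF (RANK .EQ. 0) THEN
         OPEN(10, FILE='config.dat')
         READ(10, *) PARAMS
         CLOSE(10)
      END IF
      CALL MPI_BCAST(PARAMS, 100, MPI_DOUBLE_PRECISION, 0,
     &     MPI_COMM_WORLD, IERROR)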

> Here below are the details of the messages in the two tests:
> 
> 1) ======================
> <Sep 24 22:40:53.496483> FE_MPI (Info) : Starting job 1636055
> <Sep 24 22:40:53.576159> FE_MPI (Info) : Waiting for job to terminate
> <Sep 24 22:40:55.770176> BE_MPI (Info) : IO - Threads initialized
> <Sep 24 22:40:55.784851> BE_MPI (Info) : I/O input runner thread terminated
> "remd22.f", line 903: 1525-037 The I/O statement cannot be processed because the I/O subsystem is unable to allocate sufficient memory for the oper
> ation.  The program will stop.
> <Sep 24 22:42:06.025409> BE_MPI (Info) : I/O output runner thread terminated
> <Sep 24 22:42:06.069553> BE_MPI (Info) : Job 1636055 switched to state TERMINATED ('T')

This one I don't recognize, but as Dave suggests, you're likely
running out of memory on the compute node.

> 2) =======================
> Out of memory in file /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_wrcoll.c, line 498
> Out of memory in file /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_wrcoll.c, line 498

This particular line (ad_bgl_wrcoll.c, line 498) is where the MPI-IO
library allocates the temporary buffer for the two-phase collective
I/O optimization.  By default that temporary buffer is 16 MiB.  You
can request a smaller size, but if memory pressure is severe enough,
freeing up an extra 12 MiB may not matter.  Go any smaller than 4 MiB
and you'll likely stop seeing those nice performance gains.

In my crude and limited understanding of Fortran, here's how you might
set a smaller hint to see if you get further:

      INCLUDE 'mpif.h'
      INTEGER INFO, FH, IERROR
      CHARACTER*32 KEY, VALUE

      KEY = 'cb_buffer_size'
      VALUE = '4194304'

C     Create an info object carrying the smaller collective-buffer hint
      CALL MPI_INFO_CREATE(INFO, IERROR)
      CALL MPI_INFO_SET(INFO, KEY, VALUE, IERROR)
      CALL MPI_FILE_OPEN(MPI_COMM_WORLD, 'myfile.dat',
     &     MPI_MODE_CREATE+MPI_MODE_RDWR, INFO, FH, IERROR)
C     The info object can be freed once the file is open
      CALL MPI_INFO_FREE(INFO, IERROR)
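
If you want to double-check that the hint took effect, you can read it
back from the open file handle.  A quick sketch (INFO2 and FLAG are
names I'm making up here):

      INTEGER INFO2
      LOGICAL FLAG
C     Ask ROMIO which hints it actually applied to the file
      CALL MPI_FILE_GET_INFO(FH, INFO2, IERROR)
      CALL MPI_INFO_GET(INFO2, 'cb_buffer_size', 32, VALUE, FLAG,
     &     IERROR)
      IF (FLAG) PRINT *, 'cb_buffer_size = ', VALUE
      CALL MPI_INFO_FREE(INFO2, IERROR)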


==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

