collective memory-mapped array

Rob Latham robl at mcs.anl.gov
Mon Jan 25 10:28:41 CST 2010


On Mon, Jan 25, 2010 at 04:32:06PM +0100, Jose Gracia wrote:
> Dear all,
> 
> I am having problems writing a 4-dimensional (actually, the first
> dim has length one only) memory-mapped array to disk collectively.
> The array is not strided but mapped non-trivially in memory. The
> call I want, which doesn't work (i.e., the code hangs forever), is:
> 
> 
> status=ncmpi_put_varm_double_all(ncid,varidp[6],start4,count4,stride4,imap4,tem_bck+offset);
>
> 
> The strange thing is that writing in independent mode works and
> produces the expected result:
> 
> status=ncmpi_begin_indep_data(ncid);
> status=ncmpi_put_varm_double(ncid,varidp[6],start4,count4,stride4,imap4,tem_bck+offset);
> status=ncmpi_end_indep_data(ncid);
> 
> 
> I am trying this with PNetCDF 1.0.3 and older versions (1.1.0 will
> not compile for me), built with Intel compiler v11.0 and Open MPI
> v1.3, on a Lustre file system (though tests on an NFS-mounted file
> system didn't work either).
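
One way to narrow this down, offered as a sketch rather than a known fix: in the netCDF/PnetCDF varm interface, imap[k] gives the distance in memory, counted in elements (not bytes), between consecutive values along dimension k, while stride only selects elements on the file side. Packing the buffer by hand with that rule and writing the contiguous result collectively with ncmpi_put_vars_double_all (same start, count and stride) might show whether the hang follows the varm code path. pack_varm4 is a hypothetical helper, not a PnetCDF function, and long long stands in for MPI_Offset so the sketch builds without MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debugging helper (not part of PnetCDF).
 * Copies the elements that a 4-D varm call would read from `mapped`
 * into `packed`, in the row-major order a vara/vars call expects.
 *   count[k] : number of elements along dimension k (as passed to varm)
 *   imap[k]  : distance, in elements, between consecutive values
 *              along dimension k in `mapped`
 * long long stands in for MPI_Offset. The file-side stride is not
 * needed here: it selects file elements, while imap alone describes
 * the memory layout. Returns the number of elements packed. */
static size_t pack_varm4(const double *mapped, const long long count[4],
                         const long long imap[4], double *packed)
{
    size_t n = 0;
    assert(mapped != NULL && packed != NULL);
    for (long long i = 0; i < count[0]; i++)
        for (long long j = 0; j < count[1]; j++)
            for (long long k = 0; k < count[2]; k++)
                for (long long l = 0; l < count[3]; l++)
                    packed[n++] = mapped[i * imap[0] + j * imap[1]
                                         + k * imap[2] + l * imap[3]];
    return n;
}
```

For example, with count = {1, 2, 3, 4} and a buffer stored in column-major (Fortran) order, imap would be {1, 1, 2, 6}; after packing, packed[1] holds mapped[6], the next value along the last dimension.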

OK, I think we need to know whether this is a true hang or just a very,
very slow response from the underlying file system.

On your system, do you have a way to capture a backtrace of some of
the MPI processes?  I would like to see what the hung processes are
trying to do.
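
If attaching a debugger to a remote rank (e.g. gdb -p PID, then bt) is not practical, one hedged option on Linux with glibc is to let each rank print its own stack on request. The sketch below installs a SIGUSR1 handler that writes a backtrace to stderr, so a hung rank could be poked with kill -USR1 <pid>. It assumes glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>; install_backtrace_handler and dump_backtrace are hypothetical names, and linking with -rdynamic usually makes the output show function names.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Counts how many times the handler has run. */
static volatile sig_atomic_t dumps_written = 0;

/* Write the interrupted call stack to stderr. backtrace() is not
 * formally async-signal-safe, so treat this as a debugging aid only. */
static void dump_backtrace(int sig)
{
    static const char msg[] = "---- backtrace on signal ----\n";
    void *frames[64];
    int n;
    ssize_t r;

    (void)sig;
    n = backtrace(frames, 64);
    r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    dumps_written++;
}

/* Install dump_backtrace for SIGUSR1. Returns 0 on success, -1 on error. */
static int install_backtrace_handler(void)
{
    void *warmup[1];
    struct sigaction sa;

    /* Call backtrace() once up front so glibc loads its unwinder here,
     * outside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = dump_backtrace;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Calling install_backtrace_handler() right after MPI_Init would make every rank respond to SIGUSR1; since the handler runs on the interrupted thread, the printed frames should include whatever call the rank is blocked in.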

> I would love to provide a minimal example showing the error, but I
> can't, as all my test codes work fine :(
> 
> The strange thing is that running the same code on a big (as in
> lots of memory) single-node machine with the same number of MPI
> tasks works fine. But this isn't an option for production runs.

That's another hint that this might be a file system issue, but a
backtrace would still help pinpoint the problem.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

