pnetcdf 1.2.0 create file issue

Rob Latham robl at mcs.anl.gov
Fri May 11 10:43:20 CDT 2012


On Thu, May 10, 2012 at 03:28:57PM -0600, Jim Edwards wrote:
> This occurs on the ncsa machine bluewaters.   I am using pnetcdf1.2.0 and
> pgi 11.10.0

I need one more bit of information: the version of MPT you are using.

> The issue is that calling nfmpi_createfile would sometimes result in an
> error:
> 
> MPI_File_open : Other I/O error , error stack:
> (unknown)(): Other I/O error
> 126: MPI_File_open : Other I/O error , error stack:
> (unknown)(): Other I/O error
>   Error on create :           502          -32
> 
> The error appears to be intermittent and I could not get it to occur at all
> on a small number of tasks (160) but it occurs with high frequency when
> using a larger number of tasks (>=1600).    I traced the problem to the use
> of nf_clobber in the mode argument, removing the nf_clobber seems to have
> solved the problem and I think that create implies clobber anyway doesn't
> it?    

> Can someone who knows what is going on under the covers enlighten me
> with some understanding of this issue?   I suspect that one task is trying
> to clobber the file that another has just created or something of that
> nature.

Unfortunately, "under the covers" here means "inside the MPI-IO
library", which we don't have access to.   

In the create case we call MPI_File_open with "MPI_MODE_RDWR |
MPI_MODE_CREATE", and if noclobber is set, we add MPI_MODE_EXCL.

OK, so that's pnetcdf.  What's going on in MPI-IO?  Well, Cray based
their MPI-IO on our ROMIO, but I'm not sure which version.

Let me cook up a quick MPI-IO-only test case you can run to trigger
this problem, and then you can beat Cray over the head with it.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

