pnetcdf 1.2.0 create file issue

Wei-keng Liao wkliao at ece.northwestern.edu
Fri May 11 12:32:16 CDT 2012


In ncmpi_create/nfmpi_create, NC_CLOBBER/nf_clobber is the default mode,
so passing it or omitting it in the create call should make no difference.
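
For example, the two create calls in the sketch below request the same thing,
because NC_CLOBBER is defined as 0 (a minimal C sketch; error checking is
omitted and "foo.nc" is just a placeholder file name):

    #include <mpi.h>
    #include <pnetcdf.h>

    int main(int argc, char **argv) {
        int ncid;
        MPI_Init(&argc, &argv);
        /* NC_CLOBBER == 0, so these two create calls are equivalent */
        ncmpi_create(MPI_COMM_WORLD, "foo.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
        ncmpi_close(ncid);
        ncmpi_create(MPI_COMM_WORLD, "foo.nc", 0, MPI_INFO_NULL, &ncid);
        ncmpi_close(ncid);
        MPI_Finalize();
        return 0;
    }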

If you also tried nf_noclobber, then the MPI_MODE_EXCL issue Rob mentioned
can come up.
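
Roughly speaking, create maps onto MPI_File_open as in the sketch below.
This is my own illustration of the flags Rob lists in the quoted message,
not the actual PnetCDF source, and "foo.nc" is again a placeholder. With
noclobber, MPI_MODE_EXCL makes the open fail if the file already exists:

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_File fh;
        int amode = MPI_MODE_RDWR | MPI_MODE_CREATE;  /* clobber (default) */
        /* amode |= MPI_MODE_EXCL; */                 /* added for noclobber */
        MPI_Init(&argc, &argv);
        if (MPI_File_open(MPI_COMM_WORLD, "foo.nc", amode,
                          MPI_INFO_NULL, &fh) == MPI_SUCCESS)
            MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

Running something like this at the failing process counts, with and without
MPI_MODE_EXCL, is the kind of MPI-IO-only reproducer Rob mentions below.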

There have been many fixes since 1.2.0, and some of them are for create and open.
Would you like to give the latest SVN a try? I have used the latest on Hopper
(also a Cray) on 40K processes without a problem, but my MPT is newer:
xt-mpich2/5.4.4.


Wei-keng


On May 11, 2012, at 11:03 AM, Jim Edwards wrote:

> 
> 
> On Fri, May 11, 2012 at 9:43 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> On Thu, May 10, 2012 at 03:28:57PM -0600, Jim Edwards wrote:
> > This occurs on the NCSA machine Blue Waters.  I am using pnetcdf 1.2.0 and
> > PGI 11.10.0.
> 
> I need one more bit of information: the version of MPT you are using.
> 
> Sorry, what's MPT?  MPI?
> Currently Loaded Modulefiles:
>   1) modules/3.2.6.6                       9) user-paths                           17) xpmem/0.1-2.0400.31280.3.1.gem
>   2) xtpe-network-gemini                  10) pgi/11.10.0                          18) xe-sysroot/4.0.46
>   3) xt-mpich2/5.4.2                      11) xt-libsci/11.0.04                    19) xt-asyncpe/5.07
>   4) xtpe-interlagos                      12) udreg/2.3.1-1.0400.4264.3.1.gem      20) atp/1.4.1
>   5) eswrap/1.0.12                        13) ugni/2.3-1.0400.4374.4.88.gem        21) PrgEnv-pgi/4.0.46
>   6) torque/2.5.10                        14) pmi/3.0.0-1.0000.8661.28.2807.gem    22) hdf5-parallel/1.8.7
>   7) moab/6.1.5                           15) dmapp/3.2.1-1.0400.4255.2.159.gem    23) netcdf-hdf5parallel/4.1.3
>   8) scripts                              16) gni-headers/2.1-1.0400.4351.3.1.gem  24) parallel-netcdf/1.2.0
> 
> > The issue is that calling nfmpi_createfile would sometimes result in an
> > error:
> >
> > MPI_File_open : Other I/O error , error stack:
> > (unknown)(): Other I/O error
> > 126: MPI_File_open : Other I/O error , error stack:
> > (unknown)(): Other I/O error
> >   Error on create :           502          -32
> >
> > The error appears to be intermittent: I could not get it to occur at all
> > on a small number of tasks (160), but it occurs with high frequency when
> > using a larger number of tasks (>=1600).  I traced the problem to the use
> > of nf_clobber in the mode argument; removing the nf_clobber seems to have
> > solved the problem, and I think that create implies clobber anyway, doesn't
> > it?
> 
> > Can someone who knows what is going on under the covers enlighten me
> > about this issue?  I suspect that one task is trying to clobber the file
> > that another has just created, or something of that nature.
> 
> Unfortunately, "under the covers" here means "inside the MPI-IO
> library", which we don't have access to.
> 
> In the create case we call MPI_File_open with "MPI_MODE_RDWR |
> MPI_MODE_CREATE", and if noclobber is set, we add MPI_MODE_EXCL.
> 
> OK, so that's pnetcdf.  What's going on in MPI-IO?  Well, Cray based
> their MPI-IO on our ROMIO, but I'm not sure which version.
> 
> Let me cook up a quick MPI-IO-only test case you can run to trigger
> this problem and then you can beat cray over the head with it.
> 
> 
> Sounds good, thanks.
>  
> ==rob
> 
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> 
> 
> 
> -- 
> Jim Edwards
> 
> CESM Software Engineering Group
> National Center for Atmospheric Research
> Boulder, CO 
> 303-497-1842
> 


