[netCDF #KLB-596506]: apparent bug in netcdf-4.2

Unidata netCDF Support support-netcdf at unidata.ucar.edu
Tue Mar 5 10:33:39 CST 2013


Hi Wei-keng,

> This variable alignment is a PnetCDF behavior.
> The default alignment value for each non-record variable is 512 bytes in PnetCDF.
> 
> According to CDF-1 and CDF-2 file format specifications, each variable has
> a field named "begin" which is the variable's file starting location.
> var := name  nelems  [dimid ...]  vatt_array  nc_type  vsize  begin
> 
> We believe PnetCDF's variable alignment does not violate the CDF spec, and hence
> implemented this default alignment in the hope of improving performance. This alignment
> can be turned off by setting the two hints below.
> MPI_Info_set(info, "nc_header_align_size", "1");
> MPI_Info_set(info, "nc_var_align_size",    "1");
> 
> I wonder if you can send us the file and program to reproduce the corruption problem.

Yes, I got it from Jim Edwards. It's a 1.4GB file, along with a
Makefile and the original Fortran program that demonstrated the bug:

  ftp://ftp.cgd.ucar.edu/pub/eaton/nfbug.tar

I converted it to a C program demonstrating the bug, which is available
here:

  https://bugtracking.unidata.ucar.edu/browse/NCF-234

Now that we know what caused the observed symptoms, it would probably
be easy to create a smaller example by running ncgen linked with the
pnetcdf library on a CDL file that just has the first few of 288
variables.  It might even suffice to include just the first 1 or 2 of
the scalar integer variables that are stored with 512-byte alignment:

  int timemgr_rst_nstep_rad_prev ;
  int timemgr_rst_type ;
  
and the first vector variable 

  double grid1d_lon(gridcell) ;

but I'm attaching a truncated version of the CDL for the file, which
has the whole header but data values for only the first 11 variables.

The file can be read by our netCDF library, but when it's opened for
writing and the schema is changed by calling nc_redef(ncid) and
eventually nc_enddef(ncid), the variable offsets in the header are
rewritten assuming 4-byte alignment, so subsequent reads get bad data
values.  For the file in question, rewriting bad offsets only happens
when the file is a "CDF2" 64-bit offset format file.  The bug demo
program behaves correctly if run on a classic format version of the
original file.
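
To make the mismatch concrete, here is a minimal sketch in C of how the
two layouts diverge.  It is not code from either library: round_up()
and layout_begins() are hypothetical helpers, and offsets are taken
relative to the start of the fixed-size data block.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be > 0). */
size_t round_up(size_t offset, size_t align)
{
    assert(align > 0);
    return ((offset + align - 1) / align) * align;
}

/*
 * Hypothetical helper: compute the starting offset of each variable,
 * relative to the start of the fixed-size data block, when every
 * variable is placed at a multiple of align.
 */
void layout_begins(const size_t *vsizes, size_t nvars, size_t align,
                   size_t *begins)
{
    size_t offset = 0;
    for (size_t i = 0; i < nvars; i++) {
        offset = round_up(offset, align);  /* pad up to the boundary */
        begins[i] = offset;
        offset += vsizes[i];               /* skip over this variable's data */
    }
}
```

For nine 4-byte scalar integers, align = 4 gives offsets 0, 4, ..., 32,
while align = 512 gives 0, 512, ..., 4096.  Rewriting a header that
holds the second set with the first set makes every variable after the
first point at the wrong bytes.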

I think you're right that the file format specification permits the
way pnetcdf is making use of the variable offsets in the header, but
the library has made the additional assumption of 4-byte alignment
within the fixed data section at least since version 3.6.0 in 2004
(I've tested that and intervening versions have the same bug).
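
One conceivable way to drop that assumption, sketched below in C, is to
check whether the offsets already stored in the header are usable and
keep them when they are, instead of always recomputing them.  This is
only an illustration under my own assumptions, not the actual
NC_begins() logic or a committed fix: struct var_entry and
begins_are_usable() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one fixed-size variable's header entry. */
struct var_entry {
    uint64_t begin;  /* file offset of the variable's data, from the header */
    uint64_t vsize;  /* number of bytes the variable's data occupies */
};

/*
 * Return true if the variables, taken in header order, start at or
 * after data_start and do not overlap one another.  Gaps between
 * variables (such as 512-byte alignment padding) are accepted.
 */
bool begins_are_usable(const struct var_entry *vars, size_t nvars,
                       uint64_t data_start)
{
    uint64_t min_begin = data_start;
    for (size_t i = 0; i < nvars; i++) {
        if (vars[i].begin < min_begin)
            return false;                  /* overlaps header or previous var */
        min_begin = vars[i].begin + vars[i].vsize;
    }
    return true;
}
```

With a check like this, a redefinition that does not need to move any
data could leave 512-byte-aligned offsets untouched; when the header
grows past data_start the check fails and the data would have to be
moved as before.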

So we should take responsibility for fixing this ...

--Russ

> Wei-keng
> 
> On Mar 4, 2013, at 6:49 PM, Jim Edwards wrote:
> 
> > Hi Russ,
> >
> > That turns out to have been the problem.   The original file was created with pnetcdf.
> >
> > Jim
> >
> >
> >
> > On Mon, Mar 4, 2013 at 3:12 PM, Jim Edwards <jedwards at ucar.edu> wrote:
> > Russ,
> >
> > We think that the original file may have been written with pnetcdf.   We are going to try to recreate the file with netcdf and again with pnetcdf and see if that explains the issue.
> >
> > Jim
> >
> >
> > On Mon, Mar 4, 2013 at 2:31 PM, Samuel Levis <slevis at ucar.edu> wrote:
> > Not exactly. I tried 2-degree to 2-degree, 2-degree to 0.5, 2-degree to 0.25, and others. All cases worked except the ones with the 0.5-degree file as output.
> >
> > I also tried 0.5-degree to 0.5-degree (mapping the file into itself) and that failed. When I say failed, I mean that the output file ends up with junk in it.
> >
> > Sam
> >
> >
> > On 03/04/2013 02:26 PM, Jim Edwards wrote:
> >> Hi Russ,
> >>
> >> Another piece of information.   This program interpolates data from a file of one resolution (2 degree in this case) to another.  When the output file is low resolution, 1/2 degree or lower, the output file looks fine, no corruption that we can detect.   It's only when the output file is higher resolution (1/4 degree) that this problem comes about.
> >>
> >> Jim
> >>
> >> On Mon, Mar 4, 2013 at 2:04 PM, Jim Edwards <jedwards at ucar.edu> wrote:
> >> Hi Russ,
> >>
> >> It looks like that file was originally created on bluefire on 11/21/11, I don't have any information about which netcdf library was used, but I think that some adjustment may have been made inside netcdf for performance on gpfs filesystems.
> >>
> >> But doesn't your own
> >> int nc__enddef(int ncid, size_t h_minfree, size_t v_align,
> >>                     size_t v_minfree, size_t r_align);
> >>
> >>
> >> allow for changing this alignment?   I don't know that that was done for this file, but it would seem to suggest that there is no assumption being violated about these alignments.  Or that one part of netcdf is assuming something which another part is not.
> >>
> >>
> >>
> >> On Mon, Mar 4, 2013 at 12:53 PM, Unidata netCDF Support <support-netcdf at unidata.ucar.edu> wrote:
> >> Hi Jim,
> >>
> >> I'm curious how the original file you provided was created and perhaps
> >> modified.  It has a peculiar alignment characteristic that I haven't
> >> seen before, and if there are more netCDF files being created the same
> >> way, we may need to adapt.
> >>
> >> Could you tell me the history of the file, what file system it was
> >> written on, and whether the netCDF library with which it was written
> >> was modified in any way?
> >>
> >> The file has this characteristic, which would indicate a non-Posix
> >> file system: it is using 512-byte alignment of data values rather than
> >> the 4-byte alignment assumed by netCDF. So, for example, the data
> >> block for fixed-size variables begins with 9 scalar integers that
> >> should take 4 bytes each. The offsets computed for these values from
> >> the beginning of the fixed-size data block are 0, 4, 8, 12, 16, 20,
> >> 24, 28, 32, so there is no padding or wasted space. The offsets from
> >> the beginning of the fixed-size data block that are actually stored in the
> >> header for these variables are 0, 512, 1024, ... , 4096. If the file
> >> system used to write the data originally could not write data on
> >> 4-byte boundaries, I think that violates the assumption of netCDF and
> >> POSIX I/O. Nevertheless, if the nc_enddef() call pays attention to the
> >> file offsets for each variable that are stored in the header (as the
> >> netCDF library does when reading the file), rather than computing them
> >> from assuming 4-byte alignment, perhaps this file can be modified
> >> correctly.
> >>
> >> The function where we might be able to adapt to this is
> >> nc3internal.c:NC_begins(), which is called from
> >> nc3internal.c:NC_enddef().  In any case it's a netCDF bug to write
> >> something that can't be later read correctly, so if our unmodified
> >> library wrote that file and we can't adapt to it, then it was a bug
> >> to not emit an error message for trying to create a file on the original
> >> non-POSIX file system.  Also, the data seems to all be there in the
> >> "corrupted" file, which can be fixed by just restoring the variable
> >> offsets in the file header to the peculiar values in the original ...
> >>
> >> --Russ
> >>
> >> Russ Rew                                         UCAR Unidata Program
> >> russ at unidata.ucar.edu                      http://www.unidata.ucar.edu
> >>
> >>
> >>
> >> Ticket Details
> >> ===================
> >> Ticket ID: KLB-596506
> >> Department: Support netCDF
> >> Priority: Normal
> >> Status: Closed
> >>
> >>
> >>
> >>
> >> --
> >> Jim Edwards
> >>
> >> CESM Software Engineering Group
> >> National Center for Atmospheric Research
> >> Boulder, CO
> >> 303-497-1842
> >
> > --
> > Samuel Levis -
> > slevis at ucar.edu
> >
> > National Center for Atmospheric Research
> > PO Box 3000, Boulder CO 80307-3000      <- use for mail
> > 3090 Center Green Dr., Boulder CO 80301 <- vs. shipping
> >
> > tel
> > 303 497-1627
> > ; fax -1348; skype: samuellevis2
> >
> > http://www.cgd.ucar.edu/tss
> >
> >
> > Terrestrial Sciences Section in the
> > Climate & Global Dynamics Division
> >
> 
> 
Russ Rew                                         UCAR Unidata Program
russ at unidata.ucar.edu                      http://www.unidata.ucar.edu



-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_orig.cdl-trunc
Type: application/octet-stream
Size: 539312 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20130305/a120d0e8/attachment-0001.obj>

