[netCDF #KLB-596506]: apparent bug in netcdf-4.2

Rob Latham robl at mcs.anl.gov
Tue Mar 5 10:08:54 CST 2013


On Mon, Mar 04, 2013 at 09:02:23PM -0700, Unidata netCDF Support wrote:
> Jim,
> 
> > That turns out to have been the problem.   The original file was created
> > with pnetcdf.
> 
> So now I'm a little confused about the relationship between pnetcdf and netCDF.

Me too!

> Except for pnetcdf's CDF-5 format (64-bit everything), I thought their software could
> read and write both netCDF-3 formats, "classic" and 64-bit offset.  

That is indeed our intention.  If we cannot, that's a *serious* bug on
our (team pnetcdf's) part.

> But your comment seems to imply that files created with pnetcdf
> might not be modifiable by our netCDF software.  I didn't see any
> warnings about this limitation in the parallel-netCDF documentation,
> and it seems like a problem that our software can open and try to
> write to a pnetcdf file in a way that seems to silently corrupt the
> variable offsets in the file header.
> 
> Is this all due to a difference in interpretation of what the v_align parameter means in
> the nc__enddef(ncid, h_minfree, v_align, v_minfree, r_align) function?

It must be.  We relied on wording such as 
"The file format requires mod 4 alignment, so the align parameters are
silently rounded up to multiples of 4" 

What is the proper interpretation of 'r_align'?

==rob

> 
> --Russ
> 
> > On Mon, Mar 4, 2013 at 3:12 PM, Jim Edwards <jedwards at ucar.edu> wrote:
> > 
> > > Russ,
> > >
> > > We think that the original file may have been written with pnetcdf.   We
> > > are going to try to recreate the file with netcdf and again with pnetcdf
> > > and see if that explains the issue.
> > >
> > > Jim
> > >
> > >
> > > On Mon, Mar 4, 2013 at 2:31 PM, Samuel Levis <slevis at ucar.edu> wrote:
> > >
> > >>  Not exactly. I tried 2-degree to 2-degree, 2-degree to 0.5, 2-degree to
> > >> 0.25, and others. All cases worked except the ones with the 0.5-degree file
> > >> as output.
> > >>
> > >> I also tried 0.5-degree to 0.5-degree (mapping the file into itself) and
> > >> that failed. When I say failed, I mean that the output file ends up with
> > >> junk in it.
> > >>
> > >> Sam
> > >>
> > >>
> > >> On 03/04/2013 02:26 PM, Jim Edwards wrote:
> > >>
> > >> Hi Russ,
> > >>
> > >> Another piece of information.   This program interpolates data from a
> > >> file of one resolution (2 degree in this case) to another.  When the output
> > >> file is low resolution, 1/2 degree or lower, the output file looks fine, no
> > >> corruption that we can detect.   It's only when the output file is higher
> > >> resolution (1/4 degree) that this problem comes about.
> > >>
> > >> Jim
> > >>
> > >> On Mon, Mar 4, 2013 at 2:04 PM, Jim Edwards <jedwards at ucar.edu> wrote:
> > >>
> > >>> Hi Russ,
> > >>>
> > >>> It looks like that file was originally created on bluefire on 11/21/11.
> > >>> I don't have any information about which netcdf library was used, but I
> > >>> think that some adjustment may have been made inside netcdf for performance
> > >>> on GPFS filesystems.
> > >>>
> > >>> But doesn't your own
> > >>>
> > >>> int nc__enddef(int ncid, size_t h_minfree, size_t v_align,
> > >>>                     size_t v_minfree, size_t r_align);
> > >>>
> > >>>
> > >>> allow for changing this alignment?   I don't know that that was done for
> > >>> this file, but it would seem to suggest that there is no assumption being
> > >>> violated about these alignments.  Or that one part of netcdf is assuming
> > >>> something which another part is not.
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Mar 4, 2013 at 12:53 PM, Unidata netCDF Support <
> > >>> support-netcdf at unidata.ucar.edu> wrote:
> > >>>
> > >>>> Hi Jim,
> > >>>>
> > >>>> I'm curious how the original file you provided was created and perhaps
> > >>>> modified.  It has a peculiar alignment characteristic that I haven't
> > >>>> seen before, and if there are more netCDF files being created the same
> > >>>> way, we may need to adapt.
> > >>>>
> > >>>> Could you tell me the history of the file, what file system it was
> > >>>> written on, and whether the netCDF library with which it was written
> > >>>> was modified in any way?
> > >>>>
> > >>>> The file has this characteristic, which would indicate a non-POSIX
> > >>>> file system: it is using 512-byte alignment of data values rather than
> > >>>> the 4-byte alignment assumed by netCDF. So, for example, the data
> > >>>> block for fixed-size variables begins with 9 scalar integers that
> > >>>> should take 4 bytes each. The offsets computed for these values from
> > >>>> the beginning of the fixed-size data block are 0, 4, 8, 12, 16, 20,
> > >>>> 24, 28, 32, so there is no padding or wasted space. The offsets from
> > >>>> the beginning of the fixed-size data block that are actually stored in
> > >>>> the header for these variables are 0, 512, 1024, ..., 4096. If the file
> > >>>> system used to write the data originally could not write data on
> > >>>> 4-byte boundaries, I think that violates the assumption of netCDF and
> > >>>> POSIX I/O. Nevertheless, if the nc_enddef() call pays attention to the
> > >>>> file offsets for each variable that are stored in the header (as the
> > >>>> netCDF library does when reading the file), rather than computing them
> > >>>> from assuming 4-byte alignment, perhaps this file can be modified
> > >>>> correctly.
> > >>>>
> > >>>> The function where we might be able to adapt to this is
> > >>>> nc3internal.c:NC_begins(), which is called from
> > >>>> nc3internal.c:NC_enddef().  In any case it's a netCDF bug to write
> > >>>> something that can't later be read correctly, so if our unmodified
> > >>>> library wrote that file and we can't adapt to it, then it was a bug
> > >>>> to not emit an error message for trying to create a file on the original
> > >>>> non-POSIX file system.  Also, the data seems to all be there in the
> > >>>> "corrupted" file, which can be fixed by just restoring the variable
> > >>>> offsets in the file header to the peculiar values in the original ...
> > >>>>
> > >>>> --Russ
> > >>>>
> > >>>> Russ Rew                                         UCAR Unidata Program
> > >>>> russ at unidata.ucar.edu                      http://www.unidata.ucar.edu
> > >>>>
> > >>>>
> > >>>>
> > >>>> Ticket Details
> > >>>> ===================
> > >>>> Ticket ID: KLB-596506
> > >>>> Department: Support netCDF
> > >>>> Priority: Normal
> > >>>> Status: Closed
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>  --
> > >>> Jim Edwards
> > >>>
> > >>> CESM Software Engineering Group
> > >>> National Center for Atmospheric Research
> > >>> Boulder, CO
> > >>> 303-497-1842
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Jim Edwards
> > >>
> > >>
> > >> --
> > >> Samuel Levis - slevis at ucar.edu
> > >> National Center for Atmospheric Research
> > >> PO Box 3000, Boulder CO 80307-3000      <- use for mail
> > >> 3090 Center Green Dr., Boulder CO 80301 <- vs. shipping
> > >>
> > >> tel 303 497-1627; fax -1348; skype: samuellevis2
> > >> http://www.cgd.ucar.edu/tss
> > >>
> > >> Terrestrial Sciences Section in the
> > >> Climate & Global Dynamics Division
> > >>
> > >>
> > >
> > >
> > > --
> > > Jim Edwards
> > >
> > 
> > 
> > 
> > --
> > Jim Edwards
> > 
> > 
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

