pnetcdf bug?

Wei-keng Liao wkliao at eecs.northwestern.edu
Tue Oct 27 17:29:07 CDT 2015


Hi, Bill

I confirm this is a bug in netCDF. Please go ahead and submit a bug report to the netCDF group.

Below is the patch to fix this bug.

% diff wkliao/libsrc/nc3internal.c ../netcdf-4.3.3.1/libsrc/nc3internal.c
213c213
< 		        if ((*vpp)->begin < ncp->old->vars.value[j]->begin) {
---
> 		        if ((*vpp)->begin < ncp->old->vars.value[j]->begin)
218,219d217
<                             index = (*vpp)->begin;
<                         }
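
As a rough illustration of what the patch changes, here is a simplified model, not the netCDF source. It assumes that variable offsets are assigned from a running offset (called index here, after the variable in the diff), that an existing variable keeps its old offset when the recomputed one would be smaller, and that the patch additionally moves the running offset up to that old offset. The Var struct, assign_begins, and the update_index flag are hypothetical names made up for this sketch.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a fixed-size variable. */
typedef struct {
    long begin; /* starting file offset in bytes */
    long len;   /* size in bytes */
} Var;

/*
 * Assign a starting offset to each of nvars variables, packing them one
 * after another from header_end. The first nold variables already exist
 * in the file at old_begin[i]; when the packed offset would be smaller
 * than the old one, the old offset is kept so the data does not move.
 * If update_index is nonzero, the running offset is also moved up to the
 * kept offset, which is the step the patch appears to add (an assumption
 * drawn from the diff, not from the full source).
 */
void assign_begins(Var *vars, int nvars, const long *old_begin, int nold,
                   long header_end, int update_index)
{
    long index = header_end; /* running offset */

    assert(nold <= nvars);
    for (int i = 0; i < nvars; i++) {
        vars[i].begin = index;
        if (i < nold && vars[i].begin < old_begin[i]) {
            vars[i].begin = old_begin[i];
            if (update_index)
                index = vars[i].begin;
        }
        index += vars[i].len;
    }
}
```

With made-up numbers (header ending at byte 300, two existing 400-byte variables at offsets 512 and 1024, two new 100-byte variables), update_index = 0 gives begins 512, 1024, 1100, 1200, so both new variables land inside the second existing variable (bytes 1024 to 1423). With update_index = 1 the begins are 512, 1024, 1424, 1524 and nothing overlaps.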


I also wrote a short program (attached) that adds 2 new variables and tested
it on your file created with PnetCDF. I had to add a printf statement to the
netCDF library to print the variable offsets. See the comments inside the test
program. You can also send the code to netCDF support.
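
Once the offsets have been printed, checking them for overlap is mechanical. The sketch below is not part of the attached program; the Extent struct and count_overlaps are hypothetical names, and the caller is assumed to fill in each variable's starting offset and size in bytes by hand from the printed values.

```c
#include <stdio.h>

/* Hypothetical record of the bytes one variable occupies in the file. */
typedef struct {
    const char *name;
    long long begin; /* starting file offset in bytes */
    long long len;   /* size in bytes */
} Extent;

/*
 * Report every pair of variables whose byte ranges intersect and return
 * the number of such pairs. Ranges are treated as half-open intervals
 * [begin, begin + len); empty ranges never overlap anything.
 */
int count_overlaps(const Extent *v, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (v[i].len > 0 && v[j].len > 0 &&
                v[i].begin < v[j].begin + v[j].len &&
                v[j].begin < v[i].begin + v[i].len) {
                printf("%s [%lld, %lld) overlaps %s [%lld, %lld)\n",
                       v[i].name, v[i].begin, v[i].begin + v[i].len,
                       v[j].name, v[j].begin, v[j].begin + v[j].len);
                count++;
            }
        }
    }
    return count;
}
```

A return value of 0 means no two variables share a byte. The pairwise loop is quadratic in the number of variables, which should be fine for a one-off check on a single file.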

If you decide to apply the patch to your netCDF library, please let me know
if it works for you.

Wei-keng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: add_var.c
Type: application/octet-stream
Size: 1576 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20151027/fe0b3d8e/attachment-0001.obj>
-------------- next part --------------

On Oct 27, 2015, at 3:19 PM, Bill Sacks wrote:

> Hi Wei-keng,
> 
> Thanks very much for looking into this. I'm happy to submit a bug to the netCDF group if you think that's the best next step.
> 
> Superficially, this sure sounds similar to https://bugtracking.unidata.ucar.edu/browse/NCF-234, although maybe there are details that make it differ.
> 
> Thanks,
> Bill
> 
>> On Oct 27, 2015, at 1:11 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>> 
>> Hi, Bill
>> 
>> I checked the file starting offsets for the two newly added variables.
>> It appears that ncks (netCDF underneath) does not respect the offset
>> alignment used in the files created by PnetCDF.
>> 
>> Your file created by netCDF has no alignment between adjacent variables.
>> The other file, created by PnetCDF, uses an alignment of 512 bytes.
>> So, when ncks adds 2 new variables, I found that the file offsets of the
>> two new variables overlap with the last variable of the existing file.
>> This indicates a bug in the netCDF library, as ncks does not use the PnetCDF library.
>> 
>> I will dig into the netCDF library to see what happens internally.
>> 
>> Wei-keng
>> 
>> On Oct 27, 2015, at 1:41 PM, Bill Sacks wrote:
>> 
>>> Looking back at my notes, it seems that this problem sometimes appears in differences in actual values; i.e., it doesn't appear to just be a difference in where there are fill values.
>>> 
>>> Thank you,
>>> Bill
>>> 
>>>> On Oct 27, 2015, at 12:30 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>> 
>>>> Hi, Bill
>>>> 
>>>> I can reproduce what you are seeing.
>>>> 
>>>> If the differences happen only to those missing array elements (fill values),
>>>> then this is because PnetCDF supports fill mode only as of version 1.6.1.
>>>> Please note that the way fill mode is used differs from netCDF. See the release notes
>>>> and example code at
>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/ReleaseNotes-1.6.1
>>>> 
>>>> Please let me know if this is the case.
>>>> 
>>>> Wei-keng
>>>> 
>>>> On Oct 27, 2015, at 12:41 PM, Bill Sacks wrote:
>>>> 
>>>>> I have put the attachment on a public ftp server:
>>>>> 
>>>>> ftp ftp.cgd.ucar.edu
>>>>> 
>>>>> user name: anonymous
>>>>> password: (your email address)
>>>>> 
>>>>> cd pub/sacks
>>>>> get pnetcdf_bug.tar.gz
>>>>> 
>>>>> Thanks,
>>>>> Bill
>>>>> 
>>>>>> On Oct 27, 2015, at 11:11 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>>> 
>>>>>> Hi, Bill
>>>>>> 
>>>>>> Bug NCF-234 should not be the cause, as you are using netCDF 4.3.3.1.
>>>>>> The fix was applied in 4.3.0. I will take a look and get back to you.
>>>>>> 
>>>>>> Somehow your attachment did not come through my mail system.
>>>>>> I checked the PnetCDF mail archive and it does not appear there either.
>>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2015-October/001746.html
>>>>>> 
>>>>>> Maybe the file is too big? If that is the case, please send it to me directly.
>>>>>> Thanks
>>>>>> 
>>>>>> Wei-keng
>>>>>> 
>>>>>> On Oct 27, 2015, at 10:36 AM, Bill Sacks wrote:
>>>>>> 
>>>>>>> I wonder if this could be related to this (fixed) bug:
>>>>>>> 
>>>>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-234
>>>>>>> 
>>>>>>> As with that one, it's possible that the problem is actually in netCDF and not in pnetcdf. Does anyone have an idea for how to determine if this is a pnetcdf problem or a netcdf problem? Or should I go ahead and post this to the netcdf bug list as well?
>>>>>>> 
>>>>>>> Charlie: I'm feeling more and more that NCO is probably off the hook here: sorry for dragging you into this initially :-)
>>>>>>> 
>>>>>>> Bill
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 27, 2015, at 9:21 AM, Bill Sacks <wsacks at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I have run into what appears to be a bug in pnetcdf: I have a file written by pnetcdf (via CESM). When I try to append a variable onto it using ncks -A, the new variable gets written properly, but a different variable on the file gets garbage values put into it. If the original file is written with standard netcdf rather than pnetcdf, the problem does not occur.
>>>>>>>> 
>>>>>>>> I am attaching a tar file that contains files needed to see the problem. It contains two restart files written by CESM (file names beginning check_ncks...): one written with pnetcdf and one with standard netcdf (the latter has "netcdf" in its name). It also contains a third file from which I was trying to copy variables onto this file.
>>>>>>>> 
>>>>>>>> To reproduce:
>>>>>>>> 
>>>>>>>> cp check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc test.nc
>>>>>>>> ncks -A -v COL_Z_p,LEVGRND_CLASS_p finidat_interp_dest.nc test.nc 
>>>>>>>> ncdump -v plant_nalloc check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc > dump1
>>>>>>>> ncdump -v plant_nalloc test.nc > dump2
>>>>>>>> diff dump1 dump2 | less
>>>>>>>> 
>>>>>>>> Notice that many points that were FillValue have been replaced by garbage. 
>>>>>>>> 
>>>>>>>> If you do the same thing, but using check_ncks_problem_noInterp_netcdf_1027.clm2.r.0001-01-01-01800.nc, then the dumps are identical.
>>>>>>>> 
>>>>>>>> I originally filed a bug report with NCO <https://sourceforge.net/p/nco/bugs/84/>, but Charlie Zender and Jim Edwards both feel that this is most likely a problem in the writing of the original file, which points to a possible pnetcdf problem.
>>>>>>>> 
>>>>>>>> CESM was built with
>>>>>>>> 
>>>>>>>>      module load netcdf-mpi/4.3.3.1
>>>>>>>>      module load pnetcdf/1.6.0
>>>>>>>> 
>>>>>>>> (on NCAR's yellowstone machine).
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> Bill
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Bill Sacks
>>>>>>>> CESM Software Engineering Group
>>>>>>>> National Center for Atmospheric Research
>>>>>>>> (303) 497-1762


