pnetcdf bug?
Bill Sacks
wsacks at gmail.com
Wed Oct 28 07:29:45 CDT 2015
Hi Wei-keng,
Do you have any sense of when this bug would apply? I am telling people to use caution when doing any manipulations of files written by pnetcdf, using tools built on top of the vanilla netcdf library (i.e., not pnetcdf-based tools). Would you agree?
Thanks,
Bill
> On Oct 27, 2015, at 4:29 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>
> Hi, Bill
>
> I confirm this is a bug in netCDF. Please go ahead and submit a bug report to the netCDF group.
>
> Below is the patch to fix this bug.
>
> % diff wkliao/libsrc/nc3internal.c ../netcdf-4.3.3.1/libsrc/nc3internal.c
> 213c213
> < if ((*vpp)->begin < ncp->old->vars.value[j]->begin) {
> ---
> > if ((*vpp)->begin < ncp->old->vars.value[j]->begin)
> 218,219d217
> < index = (*vpp)->begin;
> < }
>
>
> I also wrote a short program (attached) that adds 2 new variables and tested
> it on your file created with PnetCDF. I had to add a printf statement to the
> netCDF library to print the variable offsets. See the comments inside the test
> program. You can also send the code to netCDF support.
>
> If you decide to apply the patch to your netCDF library, please let me know
> if it works for you.
>
> Wei-keng
>
> <add_var.c>
> On Oct 27, 2015, at 3:19 PM, Bill Sacks wrote:
>
>> Hi Wei-keng,
>>
>> Thanks very much for looking into this. I'm happy to submit a bug to the netCDF group if you think that's the best next step.
>>
>> Superficially, this sure sounds similar to https://bugtracking.unidata.ucar.edu/browse/NCF-234 – but maybe there are details that make it differ.
>>
>> Thanks,
>> Bill
>>
>>> On Oct 27, 2015, at 1:11 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>
>>> Hi, Bill
>>>
>>> I checked the file starting offsets for the two newly added variables.
>>> It appears that ncks (netCDF underneath) does not respect the offset
>>> alignment used in the files created by PnetCDF.
>>>
>>> Your file created by netCDF has no alignment in between two adjacent variables.
>>> The other file created by PnetCDF has an alignment of 512 bytes.
>>> So, when ncks adds 2 new variables, I found the file offsets of the
>>> two new variables overlap with the last variable of the existing file.
>>> This indicates a bug in the netCDF library, as ncks does not use the PnetCDF library.
>>>
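A small shell sketch of the arithmetic behind the overlap described above. It is not taken from ncks, netCDF, or PnetCDF: the helper names align_up and ranges_overlap, and every offset and size used, are hypothetical. It only illustrates how a new variable placed without regard to 512-byte padding could land inside the variable before it.

```shell
#!/bin/sh
# Illustrative only: the offsets and sizes below are made-up numbers,
# not values read from the files discussed in this thread.

# Round an offset up to the next multiple of the alignment (e.g. 512 bytes).
align_up() {
    off=$1; align=$2
    echo $(( (off + align - 1) / align * align ))
}

# Succeed (exit 0) if byte ranges [b1, b1+len1) and [b2, b2+len2) overlap.
ranges_overlap() {
    b1=$1; len1=$2; b2=$3; len2=$4
    [ "$b1" -lt $(( b2 + len2 )) ] && [ "$b2" -lt $(( b1 + len1 )) ]
}

# Hypothetical layout: the preceding data ends at byte 1300, so with
# 512-byte alignment the last existing variable begins at 1536.
packed_begin=1300
last_begin=$(align_up "$packed_begin" 512)
last_len=1000
last_end=$(( last_begin + last_len ))

# A new variable placed at the true end of the last variable is safe.
good_begin=$last_end
# A new variable placed as if there had been no padding is not.
bad_begin=$(( packed_begin + last_len ))

for b in "$good_begin" "$bad_begin"; do
    if ranges_overlap "$last_begin" "$last_len" "$b" 400; then
        echo "new variable at $b overlaps the last existing variable"
    else
        echo "new variable at $b is clear of the last existing variable"
    fi
done
```

To check a real file, the begin offsets would have to come from the file header itself, for example through the printf added to the netCDF library as mentioned elsewhere in this thread.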
>>> I will dig into the netCDF library to see what happens internally.
>>>
>>> Wei-keng
>>>
>>> On Oct 27, 2015, at 1:41 PM, Bill Sacks wrote:
>>>
>>>> Looking back at my notes, it seems that this problem sometimes appears in differences in actual values – i.e., it doesn't appear to just be a difference in where there are fill values.
>>>>
>>>> Thank you,
>>>> Bill
>>>>
>>>>> On Oct 27, 2015, at 12:30 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>>
>>>>> Hi, Bill
>>>>>
>>>>> I can reproduce what you are seeing.
>>>>>
>>>>> If the differences happen only to those missing array elements (fill values),
>>>>> then this is because PnetCDF supports fill mode only as of version 1.6.1.
>>>>> Please note that the way fill mode is used differs from netCDF. See the release notes
>>>>> and example code in
>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/ReleaseNotes-1.6.1
>>>>>
>>>>> Please let me know if this is the case.
>>>>>
>>>>> Wei-keng
>>>>>
>>>>> On Oct 27, 2015, at 12:41 PM, Bill Sacks wrote:
>>>>>
>>>>>> I have put the attachment on a public ftp server:
>>>>>>
>>>>>> ftp ftp.cgd.ucar.edu
>>>>>>
>>>>>> user name: anonymous
>>>>>> password: (your email address)
>>>>>>
>>>>>> cd pub/sacks
>>>>>> get pnetcdf_bug.tar.gz
>>>>>>
>>>>>> Thanks,
>>>>>> Bill
>>>>>>
>>>>>>> On Oct 27, 2015, at 11:11 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>>>>
>>>>>>> Hi, Bill
>>>>>>>
>>>>>>> Bug NCF-234 should not be the cause, as you are using netCDF 4.3.3.1.
>>>>>>> The fix has been applied to 4.3.0. I will take a look and get back to you.
>>>>>>>
>>>>>>> Somehow your attachment did not come through my mail system.
>>>>>>> I checked the PnetCDF mail archive and it does not appear there either.
>>>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2015-October/001746.html
>>>>>>>
>>>>>>> Maybe the file is too big? If that is the case, please send it to me directly.
>>>>>>> Thanks
>>>>>>>
>>>>>>> Wei-keng
>>>>>>>
>>>>>>> On Oct 27, 2015, at 10:36 AM, Bill Sacks wrote:
>>>>>>>
>>>>>>>> I wonder if this could be related to this (fixed) bug:
>>>>>>>>
>>>>>>>> https://bugtracking.unidata.ucar.edu/browse/NCF-234
>>>>>>>>
>>>>>>>> As with that one, it's possible that the problem is actually in netCDF and not in pnetcdf. Does anyone have an idea for how to determine if this is a pnetcdf problem or a netcdf problem? Or should I go ahead and post this to the netcdf bug list as well?
>>>>>>>>
>>>>>>>> Charlie: I'm feeling more and more that NCO is probably off the hook here: sorry for dragging you into this initially :-)
>>>>>>>>
>>>>>>>> Bill
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Oct 27, 2015, at 9:21 AM, Bill Sacks <wsacks at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have run into what appears to be a bug in pnetcdf: I have a file written by pnetcdf (via CESM). When I try to append a variable onto it using ncks -A, the new variable gets written properly, but a different variable on the file gets garbage values put into it. If the original file is written with standard netcdf rather than pnetcdf, the problem does not occur.
>>>>>>>>>
>>>>>>>>> I am attaching a tar file that contains files needed to see the problem. It contains two restart files written by CESM (file names beginning check_ncks...): one written with pnetcdf and one with standard netcdf (the latter has "netcdf" in its name). It also contains a third file from which I was trying to copy variables onto this file.
>>>>>>>>>
>>>>>>>>> To reproduce:
>>>>>>>>>
>>>>>>>>> cp check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc test.nc
>>>>>>>>> ncks -A -v COL_Z_p,LEVGRND_CLASS_p finidat_interp_dest.nc test.nc
>>>>>>>>> ncdump -v plant_nalloc check_ncks_problem_noInterp_1027.clm2.r.0001-01-01-01800.nc > dump1
>>>>>>>>> ncdump -v plant_nalloc test.nc > dump2
>>>>>>>>> diff dump1 dump2 | less
>>>>>>>>>
>>>>>>>>> Notice that many points that were FillValue have been replaced by garbage.
>>>>>>>>>
>>>>>>>>> If you do the same thing, but using check_ncks_problem_noInterp_netcdf_1027.clm2.r.0001-01-01-01800.nc, then the dumps are identical.
>>>>>>>>>
>>>>>>>>> I originally filed a bug report with NCO <https://sourceforge.net/p/nco/bugs/84/>, but Charlie Zender and Jim Edwards both feel that this is most likely a problem in the writing of the original file, which points to a possible pnetcdf problem.
>>>>>>>>>
>>>>>>>>> CESM was built with
>>>>>>>>>
>>>>>>>>> module load netcdf-mpi/4.3.3.1
>>>>>>>>> module load pnetcdf/1.6.0
>>>>>>>>>
>>>>>>>>> (on NCAR's yellowstone machine).
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Bill
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Bill Sacks
>>>>>>>>> CESM Software Engineering Group
>>>>>>>>> National Center for Atmospheric Research
>>>>>>>>> (303) 497-1762