writing large variables

John Clyne clyne at ucar.edu
Wed Jan 16 12:25:07 CST 2013


On Jan 16, 2013, at 11:07 AM, Wei-keng Liao wrote:

> In your case, NC_64BIT_DATA is indeed required.
> 
> In netcdf, if you define a variable with > 2^31 elements and it is the last
> variable defined in the file, then you probably can still use CDF-2.
> Below is the netcdf code I tested (using netcdf library version 4.2.1.1).
> var2 is the variable with 8B elements.
> 

Thanks for clarifying that, Wei-keng. We are indeed able to write and read "large" variables with netCDF in CDF-2 format with the restrictions noted. The problem is that we need to write the data in parallel, and the last time we ran the experiment, which was admittedly a while ago, pnetcdf performed significantly better than Unidata's netCDF-4.
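
As a rough sketch of the distinction drawn in this thread (element count versus byte size), the C fragment below computes the element count for a set of dimension lengths and applies the 2^31 rule of thumb quoted below. The names num_elements, format_hint, FITS_CDF2 and NEEDS_CDF5 are made up for illustration and are not part of netCDF or pnetcdf, and the sketch ignores the last-variable exception mentioned for netCDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for this sketch; not netCDF or pnetcdf constants. */
enum fmt_hint { FITS_CDF2, NEEDS_CDF5 };

/* Number of array elements: the product of the dimension lengths. */
static uint64_t num_elements(const uint64_t *dims, size_t ndims)
{
    uint64_t n = 1;
    assert(dims != NULL && ndims > 0);
    for (size_t i = 0; i < ndims; i++)
        n *= dims[i];
    return n;
}

/* Rule of thumb from the thread: the element count, not the byte size,
 * decides. Fewer than 2^31 elements is treated as fitting CDF-2 here
 * (assumption: the boundary case of exactly 2^31 is counted as too large). */
static enum fmt_hint format_hint(const uint64_t *dims, size_t ndims)
{
    return num_elements(dims, ndims) < (UINT64_C(1) << 31) ? FITS_CDF2
                                                           : NEEDS_CDF5;
}
```

For the test program quoted below, var1 (2 x 1048576) has 2^21 elements and var2 (1048576 x 8192) has 2^33, so only var2 crosses the threshold. At 8 bytes per double the two variables together hold 68736253952 bytes, which is 148 bytes short of the listed file size; the difference is presumably the file header.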

jc

> #include <stdio.h>
> #include <stdlib.h>   /* exit() */
> #include <netcdf.h>
> 
> #define NZ 2
> #define NY 1048576
> #define NX 8192
> 
> #define ERR(e) {if (e!= NC_NOERR) {printf("Error: %s\n", nc_strerror(e)); exit(-1);}}
> 
> int main(int argc, char* argv[])
> {
>    int ncid, varid1, varid2, old_modep, cmode, err;
>    int dimids[3];
>    size_t start[3], count[2];
>    double buf = 0.0;   /* value written to the last element of each variable */
> 
>    cmode = NC_CLOBBER | NC_64BIT_OFFSET;
> 
>    if (err = nc_create("testfile.nc", cmode, &ncid)) ERR(err);
>    if (err = nc_def_dim(ncid, "z", NZ, &dimids[0])) ERR(err);
>    if (err = nc_def_dim(ncid, "y", NY, &dimids[1])) ERR(err);
>    if (err = nc_def_dim(ncid, "x", NX, &dimids[2])) ERR(err);
> 
>    if (err = nc_def_var(ncid, "var1", NC_DOUBLE, 2, dimids, &varid1))
>        ERR(err);
>    if (err = nc_def_var(ncid, "var2", NC_DOUBLE, 2, dimids+1, &varid2))
>        ERR(err);
> 
>    if (err = nc_set_fill(ncid, NC_NOFILL, &old_modep)) ERR(err);
>    if (err = nc_enddef(ncid)) ERR(err);
> 
>    /* write the last element */
>    start[0] = NZ-1;
>    start[1] = NY-1;
>    start[2] = NX-1;
>    count[0] = count[1] = 1;
>    if (err = nc_put_vara_double(ncid, varid1, start, count, &buf))
>        ERR(err);
>    if (err = nc_put_vara_double(ncid, varid2, start+1, count, &buf))
>        ERR(err);
> 
>    if (err = nc_close(ncid)) ERR(err);
> 
>    return 0;
> }
> 
> % ls -l testfile.nc
> -rw------- 1 wkliao users 68736254100 Jan 16 11:58 testfile.nc
> 
> % ncdump -h testfile.nc
> netcdf testfile {
> dimensions:
> 	z = 2 ;
> 	y = 1048576 ;
> 	x = 8192 ;
> variables:
> 	double var1(z, y) ;
> 	double var2(y, x) ;
> }
> 
> % ncdump -k testfile.nc
> 64-bit offset
> 
> 
> Wei-keng
> 
> On Jan 16, 2013, at 11:49 AM, John Clyne wrote:
> 
>> Hi Wei-Keng,
>> 
>> I should have been more clear. The array has more than 2^31 elements. Our test case presently has on the order of 2^33 elements, and soon we'll need to support 2^36 elements or more.
>> 
>> It sounds like the NC_64BIT_DATA flag is required in our case?
>> 
>> thanks - jc
>> 
>> On Jan 16, 2013, at 9:27 AM, Wei-keng Liao wrote:
>> 
>>> Hi, John,
>>> 
>>> The mode NC_64BIT_DATA (CDF-5 format) allows you to define an array variable
>>> that has more than 2^31 elements. Note this is about the number of "elements"
>>> not the size of an array.
>>> 
>>> If your array has fewer elements but the size is more than 4GB, then
>>> NC_64BIT_OFFSET can still be used. For example, double foo[Z][Y][X] has
>>> Z*Y*X elements. If Z*Y*X < 2^31 and Z*Y*X*sizeof(double) > 2^31, then you
>>> can still use NC_64BIT_OFFSET.
>>> 
>>> Is this your case?
>>> 
>>> Wei-keng
>>> 
>>> On Jan 16, 2013, at 9:58 AM, John Clyne wrote:
>>> 
>>>> Thanks for the quick response, Rob. I've poked the Unidata folks as well to see if they have any updates on their CDF-5 support plans. One followup question: Is it possible to output large variables from pnetcdf without using CDF-5? netCDF seems to support this in a CDF-2 format, albeit with restrictions. For our application we can live with those restrictions.
>>>> 
>>>> Thanks again for your help.
>>>> 
>>>> Best,
>>>> 
>>>> jc
>>>> 
>>>> On Jan 16, 2013, at 7:54 AM, Rob Latham wrote:
>>>> 
>>>>> On Tue, Jan 15, 2013 at 05:09:05PM -0700, John Clyne wrote:
>>>>>> Is it possible to write a large variable (>4GB) to a file with pnetcdf and read back the variable from the resulting file with netCDF? Outputting a large variable with pnetcdf appears to require passing the NC_64BIT_DATA flag (not NC_64BIT_OFFSET) to nc_create_par() - without this flag an error is generated. The file is written successfully, but when NC_64BIT_DATA is used the file is unrecognized by netcdf. For example:
>>>>>> 
>>>>>> yslogin2[43] ncdump -h vx.0000.nc0
>>>>>> ncdump: vx.0000.nc0: NetCDF: Unknown file format
>>>>>> 
>>>>>> From what I can gather from the web, the NC_64BIT_DATA flag results in the generation of a CDF-5 formatted file. Is there support for CDF-5 in netCDF, or any other options for mixing pnetcdf and netCDF?
>>>>> 
>>>>> Hi John:  the short answer is there is no "unidata netCDF" support for
>>>>> pnetcdf's CDF-5 (giant variables) file format.  
>>>>> 
>>>>> I've been working with Unidata  on and off over the last few years to
>>>>> find a way that we could use NetCDF-4's "netcdf on pnetcdf" feature to
>>>>> support CDF-5, but that support right now only exists as a series of
>>>>> patches yet to be incorporated into Unidata's tree.
>>>>> 
>>>>> ==rob
>>>>> 
>>>>> -- 
>>>>> Rob Latham
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Lab, IL USA
>>>> 
>>>> John Clyne
>>>> National Center for Atmospheric Research
>>>> 303.497.1236 (w), 303.809.1922 (c)
>>>> clyne at ucar.edu
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
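
The "Unknown file format" error quoted above comes down to the file signature: classic-model files begin with the bytes 'C', 'D', 'F' followed by a version byte of 1 (classic), 2 (64-bit offset) or 5 (64-bit data). A minimal sketch of checking this by hand follows; cdf_version is a hypothetical helper written for this illustration, not a netCDF or pnetcdf call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1, 2 or 5 for a CDF-1, CDF-2 or CDF-5 signature, 0 for any
 * other signature, and -1 if the file cannot be opened or is shorter
 * than four bytes. */
static int cdf_version(const char *path)
{
    unsigned char magic[4];
    size_t n;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);
    if (n != sizeof magic)
        return -1;
    if (memcmp(magic, "CDF", 3) != 0)
        return 0;   /* some other format, e.g. HDF5-based netCDF-4 */
    if (magic[3] == 1 || magic[3] == 2 || magic[3] == 5)
        return magic[3];
    return 0;
}
```

Under these assumptions, a file written with NC_64BIT_DATA would report 5, which a netCDF build without CDF-5 support has no way to interpret, while the testfile.nc above would report 2.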

John Clyne
National Center for Atmospheric Research
303.497.1236 (w), 303.809.1922 (c)
clyne at ucar.edu




