[EXTERNAL] Re: Problem in variable bounds checking

Sjaardema, Gregory D gdsjaar at sandia.gov
Thu Mar 31 13:53:31 CDT 2016


I have a similar patch submitted to NetCDF team.  The same test occurs three times in the nc4hdf.c file; two times without the count[]==0 check, one time with it.  If they accept the pull request, I will then resubmit this to the pnetcdf group.

Thanks,
..Greg

-- 
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems”







On 3/31/16, 12:35 PM, "Wei-keng Liao" <wkliao at eecs.northwestern.edu> wrote:

>Hi, Greg
>
>This out-of-bound check is unfortunately enforced by netCDF even if count is zero.
>There is a comment mentioned at the top of that function. PnetCDF tries to conform
>with netCDF on returning error codes, and hence enforces the same check.
>
>A while ago when I revised this part of codes, I tried to skip this check for
>zero-length request, but found that will cause many fails for the internal test
>programs. Unless netCDF changes its error checking on this, PnetCDF will keep
>this the same. Here is a small program I tested against netCDF that can generate
>the same error.
>
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <netcdf.h>
>
>#define ERR {if(err!=NC_NOERR){printf("Error at line=%d: %s\n", __LINE__, nc_strerror(err));}}
>
>int main(int argc, char** argv) {
>    int err, ncid, varid, dimid, buf[10];
>    size_t start, count;
>
>    err = nc_create("testfile.nc", NC_WRITE, &ncid); ERR
>    err = nc_def_dim(ncid, "dim", 10, &dimid); ERR
>    err = nc_def_var(ncid, "var", NC_INT, 1, &dimid, &varid); ERR
>    err = nc_enddef(ncid); ERR
>
>    start = 10;
>    count = 0;
>    err = nc_put_vara_int(ncid, varid, &start, &count, buf); ERR
>    err = nc_close(ncid); ERR
>    return 0;
>}
>
>
>
>Wei-keng
>
>On Mar 31, 2016, at 12:43 PM, Sjaardema, Gregory D wrote:
>
>> The following code is in filetype.c, function `NC_start_count_stride_ck`
>> 
>>>     for (; i<varp->ndims; i++) {
>>>         if (start[i] < 0 || start[i] >= varp->shape[i])
>>>             DEBUG_RETURN_ERROR(NC_EINVALCOORDS)
>>> 
>>>         if (varp->shape[i] < 0) DEBUG_RETURN_ERROR(NC_EEDGE)
>>> 
>>>         if (count != NULL) {
>>>             if (count[i] < 0) /* no negative count[] */
>>>                 DEBUG_RETURN_ERROR(NC_ENEGATIVECNT)
>>> 
>>>             if (stride == NULL) { /* for vara APIs */
>>>                 if (count[i] > varp->shape[i] ||
>>>                     start[i] + count[i] > varp->shape[i])
>>>                     DEBUG_RETURN_ERROR(NC_EEDGE)
>>>             }
>>>             else { /* for vars APIs */
>>>                 if (count[i] > 0 &&
>>>                     start[i] + (count[i]-1) * stride[i] >= varp->shape[i])
>>>                     DEBUG_RETURN_ERROR(NC_EEDGE)
>>>                 if (stride[i] == 0) DEBUG_RETURN_ERROR(NC_ESTRIDE)
>>>             }
>>>         }
>>>         /* else is for var1 APIs */
>>> 
>> There is an issue when the process with the highest rank has zero items to output.  As an example, if I have 4 mpi processes which are each writing the following amount of data:
>>  * rank 0: 0 items
>>  * rank 1: 2548 items
>>  * rank 2: 4352 items
>>  * rank 3: 0 items.
>> 
>> I will define the variable to have a length of 6900 items (0 + 2548 + 4352 + 0).  When I am outputting data to the variable, each rank will call nc_put_vara_longlong with the following start and count values:
>>  * rank 0: start = 0, count = 0
>>  * rank 1: start = 0, count = 2548
>>  * rank 2: start = 2548, count = 4352
>>  * rank 3: start = 6900, count = 0.
>> 
>> In each case, the `start` for rank N is equal to `start` for rank N-1 + `count` for rank N-1.  This all works ok until the highest rank is writing 0 items.  In that case, the `start` value for that rank is equal to the total size of the variable and the check in the code fragment shown above fails since `start[i] == varp->shape[i]`.
>> 
>> This could be fixed in the application code by checking whether the `count` is zero and if so, then set `start` to 0 also, but I think that is a kluge that should not be required. 
>> 
>> My suggestion is to make the test be:
>> ```
>>   if (start[i] < 0 || (start[i] >= varp->shape[i] && count[i] > 0))
>> ```
>> This is in version 1.7.0.  It also appears in 1.6.1 in the function Nccoordck.
>> 
>> ..Greg
>> 
>> -- 
>> "A supercomputer is a device for turning compute-bound problems into I/O-bound problems”
>


More information about the parallel-netcdf mailing list