Problem in variable bounds checking

Sjaardema, Gregory D gdsjaar at sandia.gov
Thu Mar 31 12:43:47 CDT 2016


The following code is in filetype.c, in the function `NC_start_count_stride_ck`:

    for (; i<varp->ndims; i++) {
        if (start[i] < 0 || start[i] >= varp->shape[i])
            DEBUG_RETURN_ERROR(NC_EINVALCOORDS)

        if (varp->shape[i] < 0) DEBUG_RETURN_ERROR(NC_EEDGE)

        if (count != NULL) {
            if (count[i] < 0) /* no negative count[] */
                DEBUG_RETURN_ERROR(NC_ENEGATIVECNT)

            if (stride == NULL) { /* for vara APIs */
                if (count[i] > varp->shape[i] ||
                    start[i] + count[i] > varp->shape[i])
                    DEBUG_RETURN_ERROR(NC_EEDGE)
            }
            else { /* for vars APIs */
                if (count[i] > 0 &&
                    start[i] + (count[i]-1) * stride[i] >= varp->shape[i])
                    DEBUG_RETURN_ERROR(NC_EEDGE)
                if (stride[i] == 0) DEBUG_RETURN_ERROR(NC_ESTRIDE)
            }
        }
        /* else is for var1 APIs */

There is an issue when the process with the highest rank has zero items to output.  As an example, suppose I have 4 MPI processes, each writing the following amount of data:
 * rank 0: 0 items
 * rank 1: 2548 items
 * rank 2: 4352 items
 * rank 3: 0 items.

I will define the variable to have a length of 6900 items (0 + 2548 + 4352 + 0).  When I am outputting data to the variable, each rank will call `nc_put_vara_longlong` with the following `start` and `count` values:
 * rank 0: start = 0, count = 0
 * rank 1: start = 0, count = 2548
 * rank 2: start = 2548, count = 4352
 * rank 3: start = 6900, count = 0.

In each case, the `start` for rank N equals the `start` for rank N-1 plus the `count` for rank N-1.  This works until the highest rank writes 0 items.  In that case, the `start` value for that rank equals the total size of the variable, so the first check in the code fragment shown above returns `NC_EINVALCOORDS`, since `start[i] == varp->shape[i]` satisfies `start[i] >= varp->shape[i]`.
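
To make the arithmetic concrete, here is a minimal sketch of how those `start` values arise as an exclusive prefix sum of the per-rank counts, and why the last one trips the first test in the fragment above. The names `compute_starts` and `start_rejected` are made up for illustration and are not part of the library; in a real MPI program the offsets would more likely come from something like `MPI_Exscan` rather than a local array.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum: starts[r] = counts[0] + ... + counts[r-1].
 * Returns the grand total, i.e. the length the variable is defined with.
 * Hypothetical helper for illustration only. */
long long compute_starts(const long long *counts, size_t nranks,
                         long long *starts)
{
    long long offset = 0;
    for (size_t r = 0; r < nranks; r++) {
        assert(counts[r] >= 0);  /* a rank cannot write a negative count */
        starts[r] = offset;
        offset += counts[r];
    }
    return offset;
}

/* Mirrors only the first test of the quoted fragment for one dimension:
 * nonzero means the request would be rejected with NC_EINVALCOORDS. */
int start_rejected(long long start, long long shape)
{
    return start < 0 || start >= shape;
}
```

With counts {0, 2548, 4352, 0} this yields starts {0, 0, 2548, 6900} and a total of 6900, so `start_rejected(6900, 6900)` is nonzero for rank 3 even though that rank writes nothing.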

This could be worked around in the application code by checking whether `count` is zero and, if so, setting `start` to 0 as well, but I think that is a kluge that should not be required.
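
For reference, the application-side workaround would look roughly like the sketch below. `workaround_start` is a hypothetical helper name, and the sketch handles a single dimension only.

```c
#include <assert.h>

/* Hypothetical application-side workaround: when a rank writes nothing,
 * replace its start with 0 so that start < shape holds for any
 * non-empty variable. Otherwise the start is passed through unchanged. */
long long workaround_start(long long start, long long count)
{
    assert(count >= 0);  /* negative counts are invalid in any case */
    return (count == 0) ? 0 : start;
}
```

Every caller would need to remember to apply this before each write, which is why I would rather see it handled in the library.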

My suggestion is to make the test be:
```
  if (start[i] < 0 || (start[i] >= varp->shape[i] && count[i] > 0))
```
The code shown above is from version 1.7.0.  The same check also appears in 1.6.1, in the function Nccoordck.
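
One caveat with the suggested test: in the quoted fragment the `start` check runs before the `count != NULL` test, and `count` is NULL for the var1 APIs, so `count[i]` cannot be dereferenced unconditionally. Below is a standalone sketch of the relaxed check for one dimension with that guard added. The function name is hypothetical, and treating a NULL `count` as a one-element access is my assumption, not something taken from the library source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-dimension version of the suggested test.
 * Returns nonzero if the request should be rejected.
 * count == NULL models the var1 APIs; the assumption here is that
 * such a call accesses exactly one element. */
int start_rejected_relaxed(long long start, long long shape,
                           const long long *count)
{
    assert(shape >= 0);  /* the fragment rejects negative shapes separately */

    long long n = (count != NULL) ? *count : 1;

    /* Same condition as the suggested test, with n standing in
     * for count[i] so that a NULL count is never dereferenced. */
    return start < 0 || (start >= shape && n > 0);
}
```

With this form, `start = 6900, count = 0` against a shape of 6900 is accepted, while `start = 6900, count = 1` is still rejected. Note that a zero-count request is then accepted for any non-negative `start`, including one past the end of the variable; whether that is acceptable is a decision for the library maintainers.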

..Greg

--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems"