Issue in tests/C/pres_temp_wr.c

Wei-keng Liao wkliao at eecs.northwestern.edu
Mon Apr 4 18:09:18 CDT 2016


Hi, Greg

Thanks for reporting the problem. The fix is now in the source repo.
http://trac.mcs.anl.gov/projects/parallel-netcdf/changeset/2391

Wei-keng

On Apr 4, 2016, at 5:04 PM, Sjaardema, Gregory D wrote:

> While looking through the parallel-netcdf-1.7.0 tests, I stumbled across the pres_temp_wr.c file.  I think the logic in this file is flawed.
> 
> 	• Each processor has the exact same view of the 3D pressure and temperature data.  This is OK for a test.
> 	• The code below is used for each processor to write out a portion of the pressure and temperature data to the file:
>>>    /* These settings tell netcdf to write one timestep of data. (The
>>>      setting of start[0] inside the loop below tells netCDF which
>>>      timestep to write.) */
>>>    count[0] = 1;
>>>    count[1] = NLVL/nprocs;
>>>    count[2] = NLAT;
>>>    count[3] = NLON;
>>>    start[1] = 0;
>>>    start[2] = 0;
>>>    start[3] = 0;
>>> 
>>>    /* Write the pretend data. This will write our surface pressure and
>>>       surface temperature data. The arrays only hold one timestep worth
>>>       of data. We will just rewrite the same data for each timestep. In
>>>       a real application, the data would change between timesteps. */
>>> 
>>>    for (rec = 0; rec < NREC; rec++)
>>>    {
>>>       start[0] = rec;
>>>       err = ncmpi_put_vara_float_all(ncid, pres_varid, start, count, &pres_out[0][0][0]);
>>>       CHECK_ERR
>>>       err = ncmpi_put_vara_float_all(ncid, temp_varid, start, count, &temp_out[0][0][0]);
>>>       CHECK_ERR
>>>    }
> 	• I think that the start[1] value and the first index of the pres_out and temp_out arrays in the calls to ncmpi_put_vara_float_all are incorrect.  With these values, every process writes the same first 1/nprocs portion of the array, so only that portion is written and the remainder is zero-filled.
> 		• This is verified by running pres_temp_wr on 1, 2, and 4 processors and doing an ncdump of the file.
> 		• All runs should give the same file, but they don’t.
> 	• I think that the correct code should be something similar to:
>>>    /* These settings tell netcdf to write one timestep of data. (The
>>>      setting of start[0] inside the loop below tells netCDF which
>>>      timestep to write.) */
>>>    count[0] = 1;
>>>    count[1] = NLVL/nprocs;
>>>    count[2] = NLAT;
>>>    count[3] = NLON;
>>>    start[1] = rank*(NLVL/nprocs);
>>>    start[2] = 0;
>>>    start[3] = 0;
>>> 
>>>    /* Write the pretend data. This will write our surface pressure and
>>>       surface temperature data. The arrays only hold one timestep worth
>>>       of data. We will just rewrite the same data for each timestep. In
>>>       a real application, the data would change between timesteps. */
>>> 
>>>    for (rec = 0; rec < NREC; rec++)
>>>    {
>>>       start[0] = rec;
>>>       err = ncmpi_put_vara_float_all(ncid, pres_varid, start, count, &pres_out[start[1]][0][0]);
>>>       CHECK_ERR
>>>       err = ncmpi_put_vara_float_all(ncid, temp_varid, start, count, &temp_out[start[1]][0][0]);
>>>       CHECK_ERR
>>>    }
> 
> 	• With these changes, the exact same file is written for 1, 2, and 4 processors.
> 	• Similar changes are needed for pres_temp_rd.c.
> 	• Note that a “real” application would probably not replicate the pres_out and temp_out arrays on all processes and instead each process would have its own portion of the array; however, for this test in which the arrays are replicated on all processors, I think the changes shown above are needed to give correct results.
> ..Greg
> 
> -- 
> "A supercomputer is a device for turning compute-bound problems into I/O-bound problems"
