Issue in tests/C/pres_temp_wr.c
Wei-keng Liao
wkliao at eecs.northwestern.edu
Mon Apr 4 18:09:18 CDT 2016
Hi, Greg
Thanks for reporting the problem. The fix is now in the source repo.
http://trac.mcs.anl.gov/projects/parallel-netcdf/changeset/2391
Wei-keng
On Apr 4, 2016, at 5:04 PM, Sjaardema, Gregory D wrote:
> While looking through the parallel-netcdf-1.7.0 tests, I stumbled across the pres_temp_wr.c file. I think the logic in this file is flawed.
>
> • Each processor has the exact same view of the 3D pressure and temperature data. This is OK for a test.
> • The code below is used for each processor to write out a portion of the pressure and temperature data to the file:
>>> /* These settings tell netcdf to write one timestep of data. (The
>>>    setting of start[0] inside the loop below tells netCDF which
>>>    timestep to write.) */
>>> count[0] = 1;
>>> count[1] = NLVL/nprocs;
>>> count[2] = NLAT;
>>> count[3] = NLON;
>>> start[1] = 0;
>>> start[2] = 0;
>>> start[3] = 0;
>>>
>>> /* Write the pretend data. This will write our surface pressure and
>>> surface temperature data. The arrays only hold one timestep worth
>>> of data. We will just rewrite the same data for each timestep. In
>>> a real application, the data would change between timesteps. */
>>>
>>> for (rec = 0; rec < NREC; rec++)
>>> {
>>>     start[0] = rec;
>>>     err = ncmpi_put_vara_float_all(ncid, pres_varid, start, count, &pres_out[0][0][0]);
>>>     CHECK_ERR
>>>     err = ncmpi_put_vara_float_all(ncid, temp_varid, start, count, &temp_out[0][0][0]);
>>>     CHECK_ERR
>>> }
> • I think that the start[1] value and the first index of the pres_out and temp_out arrays in the calls to ncmpi_put_vara_float_all are incorrect. With these values, every process writes the same first NLVL/nprocs levels, so only a 1/nprocs portion of the array is written and the remainder is zero-filled.
> • This is verified by running pres_temp_wr on 1, 2, and 4 processors and doing an ncdump of the file.
> • All runs should give the same file, but they don’t.
> • I think that the correct code should be something similar to:
>>> /* These settings tell netcdf to write one timestep of data. (The
>>>    setting of start[0] inside the loop below tells netCDF which
>>>    timestep to write.) */
>>> count[0] = 1;
>>> count[1] = NLVL/nprocs;
>>> count[2] = NLAT;
>>> count[3] = NLON;
>>> start[1] = rank*(NLVL/nprocs);
>>> start[2] = 0;
>>> start[3] = 0;
>>>
>>> /* Write the pretend data. This will write our surface pressure and
>>> surface temperature data. The arrays only hold one timestep worth
>>> of data. We will just rewrite the same data for each timestep. In
>>> a real application, the data would change between timesteps. */
>>>
>>> for (rec = 0; rec < NREC; rec++)
>>> {
>>>     start[0] = rec;
>>>     err = ncmpi_put_vara_float_all(ncid, pres_varid, start, count, &pres_out[start[1]][0][0]);
>>>     CHECK_ERR
>>>     err = ncmpi_put_vara_float_all(ncid, temp_varid, start, count, &temp_out[start[1]][0][0]);
>>>     CHECK_ERR
>>> }
>
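The difference between the two versions of start[1] can be checked without MPI or PnetCDF. The following is only a minimal sketch: NLVL = 4 is an assumed value chosen so that it divides evenly by 1, 2, and 4, and levels_covered is a hypothetical helper, not part of the test program. It marks which levels along dimension 1 would be touched by any rank and counts them.

```c
#include <assert.h>
#include <string.h>

#define NLVL 4  /* assumed value for illustration; must be divisible by nprocs */

/* Hypothetical helper: count how many of the NLVL levels are written when
 * each of nprocs ranks writes NLVL/nprocs levels.
 * use_rank_offset == 0 mimics the original code   (start[1] = 0).
 * use_rank_offset != 0 mimics the proposed change (start[1] = rank*(NLVL/nprocs)). */
static int levels_covered(int nprocs, int use_rank_offset)
{
    int written[NLVL];
    int count, covered = 0;

    assert(nprocs > 0 && NLVL % nprocs == 0);
    count = NLVL / nprocs;
    memset(written, 0, sizeof written);

    for (int rank = 0; rank < nprocs; rank++) {
        int start = use_rank_offset ? rank * count : 0;
        for (int lvl = start; lvl < start + count; lvl++)
            written[lvl] = 1;   /* this level is written by some rank */
    }
    for (int lvl = 0; lvl < NLVL; lvl++)
        covered += written[lvl];
    return covered;
}
```

Under these assumptions, levels_covered(n, 0) shrinks as n grows because every rank overwrites the same leading levels, while levels_covered(n, 1) returns NLVL for n = 1, 2, and 4.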
> • With these changes, the exact same file is written for 1, 2, and 4 processors.
> • Similar changes are needed for pres_temp_rd.c.
> • Note that a “real” application would probably not replicate the pres_out and temp_out arrays on all processes and instead each process would have its own portion of the array; however, for this test in which the arrays are replicated on all processors, I think the changes shown above are needed to give correct results.
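The non-replicated layout mentioned in the last point could look roughly like the sketch below. Everything here is an assumption for illustration: the sizes NLVL, NLAT, and NLON, the base value 900, and the helper make_local_slab are hypothetical and are not taken from the test program. Each rank allocates only its own NLVL/nprocs levels and fills them from the global index, so the assembled data does not depend on nprocs.

```c
#include <assert.h>
#include <stdlib.h>

#define NLVL 4   /* assumed sizes, for illustration only */
#define NLAT 6
#define NLON 12

/* Hypothetical helper: allocate and fill only this rank's slab of levels.
 * *first_lvl receives the global index of the slab's first level (the value
 * that would go into start[1]); *nlvl_local receives the number of levels in
 * the slab (the value that would go into count[1]).
 * Returns NULL on allocation failure; the caller frees the buffer. */
static float *make_local_slab(int rank, int nprocs, int *first_lvl, int *nlvl_local)
{
    float *slab;

    assert(nprocs > 0 && NLVL % nprocs == 0);
    assert(rank >= 0 && rank < nprocs);

    *nlvl_local = NLVL / nprocs;
    *first_lvl = rank * (*nlvl_local);

    slab = malloc((size_t)(*nlvl_local) * NLAT * NLON * sizeof *slab);
    if (slab == NULL)
        return NULL;

    for (int l = 0; l < *nlvl_local; l++)
        for (int j = 0; j < NLAT; j++)
            for (int i = 0; i < NLON; i++) {
                /* linear index into the full (replicated) array */
                int global = ((*first_lvl + l) * NLAT + j) * NLON + i;
                slab[(l * NLAT + j) * NLON + i] = 900.0f + (float)global;
            }
    return slab;
}
```

With a slab like this, the write call would pass the slab pointer itself as the buffer, with start[1] = first_lvl and count[1] = nlvl_local, instead of indexing into a replicated array with &pres_out[start[1]][0][0].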
> ..Greg
>
> --
> "A supercomputer is a device for turning compute-bound problems into I/O-bound problems"