Unchecked memory allocation and potential performance problem
Rob Ross
rross at mcs.anl.gov
Wed Dec 6 10:50:51 CST 2006
Thanks for the reminder, Russ.
I think it would be in our best interest to align based on file system
boundaries, picking something like 256K in the cases where we don't know
anything useful and aren't told anything by the user.
Let's have a look at nc__enddef and see how it works. We'll probably
want a more obvious name, or to allow the user to just pass the same
additional information in MPI_Info at open time...
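
A minimal sketch of the fallback rule described above, not an actual implementation. The helper names choose_alignment and align_up and the DEFAULT_ALIGN macro are hypothetical and are not part of netCDF or parallel-netcdf. The order of preference (user value, then file system value, then 256K) is an assumption based on the wording above.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback alignment when neither the user nor the file system tells us
 * anything useful: 256 KiB, the value suggested in this message. */
#define DEFAULT_ALIGN ((size_t)256 * 1024)

/* Hypothetical helper: choose an alignment in bytes.
 *   user_hint - value supplied by the user (0 means "not given")
 *   fs_block  - block or stripe size reported by the file system
 *               (0 means "unknown") */
size_t choose_alignment(size_t user_hint, size_t fs_block)
{
    if (user_hint != 0)
        return user_hint;      /* the user told us explicitly */
    if (fs_block != 0)
        return fs_block;       /* we learned it from the file system */
    return DEFAULT_ALIGN;      /* we know nothing: use 256 KiB */
}

/* Round offset up to the next multiple of align (align must be nonzero). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0);
    size_t rem = offset % align;
    return rem == 0 ? offset : offset + (align - rem);
}
```

With no information at all, choose_alignment(0, 0) yields 262144, and align_up(300000, 262144) moves an offset of 300000 to 524288, the next 256K boundary.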
Thanks!
Rob
William Gropp wrote:
> By aligned, I meant on file block boundaries. Just as data not on "word
> size" boundaries can be slow in the processor, data not on file block
> boundaries, particularly when multiple threads/processes are accessing
> the same file, can be slower than aligned data (see O_DIRECT
> restrictions on some filesystems). Of course, those boundaries are
> multiples of 16 to 256k :)
>
> Bill
>
> On Dec 6, 2006, at 10:18 AM, Russ Rew wrote:
>
>> On Wed, Dec 06, 2006 at 09:53:18 -0600, Rob Latham wrote:
>>> Off the top of my head there are two not-too-hard ways we can do this:
>>>
>>> There's nothing in the CDF-1 or CDF-2 file format spec that prevents
>>> us from using an arbitrarily large header to describe the data. If we
>>> know the right parameters for alignment and blocksize, we can pad the
>>> header out to a useful point (which might somewhat reduce the chance a
>>> re-definition would trigger a costly data shuffle).
>>>
>>> Same thing for variables. We don't *have* to place variables butting
>>> up against each other. They could also be padded out to beneficial
>>> points in the file. This change would be more invasive than padding
>>> the header.
>>
>> There is a documented serial netCDF-3 interface for reserving extra
>> space in the header and for controlling alignment of the data sections
>> for fixed-size and record variables, using the function nc__enddef
>> (note the two underscores in the name):
>>
>>
>> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c.html#nc_005f_005fenddef
>>
>>
>> Also by default, data for variables starts on four-byte boundaries, so
>> badly aligned accesses should not occur except possibly when getting
>> subsets of byte or short variables.
>>
>> --Russ
>>
>>
>>
>
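
The two padding ideas in the quoted messages (pad the header out to a useful point, and pad between variables rather than placing them back to back) can be sketched as a simple offset calculation. This is only an illustration under stated assumptions: round_up and layout_fixed_vars are hypothetical names, a single alignment value is used for both the header and the variables, and record variables are ignored. In the serial library the comparable controls are the h_minfree, v_align, v_minfree and r_align arguments of nc__enddef.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be nonzero). */
size_t round_up(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/* Hypothetical layout routine. Given the unpadded header size and the byte
 * size of each fixed-size variable, store each variable's starting offset
 * in begin[]. The header is padded so the first variable starts on a
 * multiple of align, and each later variable is pushed to the next multiple
 * of align instead of butting up against its predecessor.
 * Returns the offset just past the last variable's data, or the padded
 * header size if there are no variables. */
size_t layout_fixed_vars(size_t header_size, const size_t *var_size,
                         size_t nvars, size_t align, size_t *begin)
{
    size_t offset = round_up(header_size, align);
    size_t end = offset;
    for (size_t i = 0; i < nvars; i++) {
        begin[i] = offset;               /* aligned start of variable i */
        end = offset + var_size[i];      /* first byte after its data */
        offset = round_up(end, align);   /* aligned start of the next one */
    }
    return end;
}
```

For a 300-byte header, variables of 1000, 5000 and 4096 bytes, and a 4096-byte alignment, the variables start at offsets 4096, 8192 and 16384. The space between the end of the header and offset 4096 is the slack that could absorb a later re-definition without shuffling data.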