Unchecked memory allocation and potential performance problem

William Gropp gropp at mcs.anl.gov
Wed Dec 6 10:40:20 CST 2006


By aligned, I meant on file block boundaries.  Just as data not on  
"word size" boundaries can be slow in the processor, data not on file  
block boundaries, particularly when multiple threads/processes are  
accessing the same file, can be slower than aligned data (see  
O_DIRECT restrictions on some filesystems).  Of course, those  
boundaries are multiples of 16 to 256k :)
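
As a minimal sketch of what "aligned on a file block boundary" means
in terms of offsets: the helper name align_up and the 64k block size
used in the note below are illustrative assumptions, not anything
taken from the netCDF or parallel-netcdf sources.

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte offset up to the next multiple of the file block size.
 * Hypothetical helper for illustration; block must be nonzero. */
static size_t align_up(size_t offset, size_t block)
{
    assert(block > 0);
    size_t rem = offset % block;
    /* Already on a boundary: leave the offset unchanged. */
    return rem == 0 ? offset : offset + (block - rem);
}
```

With a 64k block, align_up(1234, 65536) gives 65536, so a header
ending at byte 1234 would be padded and the data would start on the
first block boundary after it.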

Bill

On Dec 6, 2006, at 10:18 AM, Russ Rew wrote:

> On Wed, Dec 06, 2006 at 09:53:18 -0600, Rob Latham wrote:
>> Off the top of my head there are two not-too-hard ways we can do this:
>>
>> There's nothing in the CDF-1 or CDF-2 file format spec that prevents
>> us from using an arbitrarily large header to describe the data.  If we
>> know the right parameters for alignment and blocksize, we can pad the
>> header out to a useful point (which might somewhat reduce the chance a
>> re-definition would trigger a costly data shuffle).
>>
>> Same thing for variables.  We don't *have* to place variables butting
>> up against each other.  They could also be padded out to beneficial
>> points in the file.  This change would be more invasive than padding
>> the header.
>
> There is a documented serial netCDF-3 interface for reserving extra
> space in the header and for controlling alignment of the data sections
> for fixed-size and record variables, using the function nc__enddef
> (note the two underscores in the name):
>
>   http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c.html#nc_005f_005fenddef
>
> Also by default, data for variables starts on four-byte boundaries, so
> badly aligned accesses should not occur except possibly when getting
> subsets of byte or short variables.
>
> --Russ