metadata consistency

Wei-keng Liao wkliao at ece.northwestern.edu
Thu Jul 11 14:18:45 CDT 2013


In netCDF, put_att_<type> is allowed in data mode only when
it is used to change an existing attribute. I consider
this case rare, which is why I am asking the community whether
it is common practice.

The "rank 0 wins" solution does not solve the read case.
For example, suppose one process changes a time stamp (a global
attribute) in data mode. The other processes will not be made aware
of this change, so when they call get_att(), they will get the old value.
Will this be OK?
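
The read case above can be modeled in plain C without MPI. The sketch
below is only an illustration, not PnetCDF code: header_copy, NPROCS,
ATT_LEN and all function names are hypothetical, and each array element
stands in for one process's in-memory copy of the file header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NPROCS  4    /* hypothetical number of MPI processes */
#define ATT_LEN 32   /* hypothetical fixed attribute size */

/* Each simulated process keeps its own in-memory copy of the header. */
typedef struct {
    char time_stamp[ATT_LEN];   /* a global attribute */
} header_copy;

/* Give every copy the same initial value, as after leaving define mode. */
static void init_headers(header_copy *hdr, const char *value)
{
    for (int r = 0; r < NPROCS; r++)
        snprintf(hdr[r].time_stamp, ATT_LEN, "%s", value);
}

/* Independent put in data mode: only the caller's copy is updated. */
static void local_put_att(header_copy *hdr, int rank, const char *value)
{
    assert(rank >= 0 && rank < NPROCS);
    snprintf(hdr[rank].time_stamp, ATT_LEN, "%s", value);
}

/* get_att reads the caller's own copy; no communication takes place. */
static const char *local_get_att(const header_copy *hdr, int rank)
{
    assert(rank >= 0 && rank < NPROCS);
    return hdr[rank].time_stamp;
}

/* "Rank 0 wins": every copy is overwritten with rank 0's copy. */
static void rank0_wins_sync(header_copy *hdr)
{
    for (int r = 1; r < NPROCS; r++)
        hdr[r] = hdr[0];
}

/* Returns 1 if all copies agree with rank 0's copy, 0 otherwise. */
static int headers_consistent(const header_copy *hdr)
{
    for (int r = 1; r < NPROCS; r++)
        if (strcmp(hdr[r].time_stamp, hdr[0].time_stamp) != 0)
            return 0;
    return 1;
}
```

In this model, after local_put_att() on rank 2, local_get_att() on any
other rank still returns the old value. A later rank0_wins_sync() makes
the copies agree again, but only by discarding rank 2's update.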

A milder solution is to make these APIs collective when they are
called in data mode.
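
As a rough illustration of what "collective" could mean here, the
plain C model below (no MPI, not PnetCDF code) applies an attribute
change only if every process passes the same value. The names
header_copy, collective_put_att, NPROCS and ATT_LEN are hypothetical,
and rejecting mismatched arguments is an assumption about one possible
design. A real implementation would presumably compare the arguments
with MPI collective calls instead of a loop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NPROCS  4    /* hypothetical number of MPI processes */
#define ATT_LEN 32   /* hypothetical fixed attribute size */

/* One in-memory header copy per simulated process. */
typedef struct {
    char time_stamp[ATT_LEN];   /* a global attribute */
} header_copy;

/*
 * Simulated collective put: args[r] is the value that rank r passes in.
 * Returns 0 and updates every copy if all ranks agree with rank 0.
 * Returns -1 and leaves every copy untouched if any rank disagrees.
 */
static int collective_put_att(header_copy *hdr, const char *const args[])
{
    /* Consistency check first, so a mismatch changes nothing. */
    for (int r = 0; r < NPROCS; r++) {
        assert(args[r] != NULL);
        if (strcmp(args[r], args[0]) != 0)
            return -1;
    }

    /* All ranks agree: apply the same value to every copy. */
    for (int r = 0; r < NPROCS; r++)
        snprintf(hdr[r].time_stamp, ATT_LEN, "%s", args[0]);
    return 0;
}
```

Because every process takes part in the call, a later get_att() on any
process sees the new value, which is the property the independent put
lacks.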

Wei-keng

On Jul 11, 2013, at 1:21 PM, Rob Latham wrote:

> On Thu, Jul 11, 2013 at 12:09:54PM -0500, Wei-keng Liao wrote:
>> One minor correction. The last API in the list was intended to be
>> the family of ncmpi_put_att_<type> APIs. The <type> can be one
>> of text, uchar, schar, short, int, ...
>> 
> 
> I'm a little wary of changing our semantics in such a mature piece of
> software.  I think you are right that most people are already doing
> this, but it makes me a bit nervous.  
> 
> The put_att_<type> change makes me the most nervous.
> 
> It's not as nice as your proposal, but could we just say "rank 0 wins
> if there is ever inconsistent metadata" ?
> 
> ==rob
> 
>> Wei-keng
>> 
>> On Jul 11, 2013, at 12:03 PM, Wei-keng Liao wrote:
>> 
>>> Dear PnetCDF users,
>>> 
>>> I am working on strengthening PnetCDF's metadata consistency and
>>> would like to change/limit the usage of APIs that modify the
>>> metadata (file header) of a netCDF file. These APIs are:
>>>   ncmpi_rename_dim(),
>>>   ncmpi_rename_var(),
>>>   ncmpi_copy_att(),
>>>   ncmpi_rename_att(), and
>>>   ncmpii_put_att().              <------- correction !
>>> 
>>> (Consistency here refers to the consistency of the file header
>>> data stored in memory across all MPI processes.)
>>> 
>>> In netCDF, the above APIs are allowed in data mode if the space
>>> required to store the new metadata (attributes, names, etc.) is
>>> less than the old one. Otherwise, they must be called in the define
>>> mode.
>>> 
>>> In PnetCDF, I would like to change that to allow these APIs only
>>> in define mode. If your applications require the above APIs to
>>> be called in data mode, please do let me know.
>>> 
>>> Here is my reason for the above change. In data mode, if metadata
>>> is changed in one process's memory (or even if the change is written
>>> to the file by that process because NC_SHARE is set), there is no
>>> way to propagate the change from this process to other processes
>>> until ncmpi_close() or ncmpi_sync() is called. If these APIs are
>>> allowed in define mode only, we can rely on ncmpi_enddef() to
>>> ensure/check the consistency.
>>> 
>>> Please let me know if your applications will have a problem with
>>> such a change.
>>> 
>>> (My plan is to make NC_SHARE the default mode for PnetCDF, as
>>> PnetCDF IS developed to handle parallel access to shared files.
>>> The API changes suggested above are the first step of my plan.)
>>> 
>>> 
>>> Wei-keng
>>> 
>> 
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA


