problems writing vars with pnetcdf

Rob Ross rross at mcs.anl.gov
Sat Dec 4 12:01:59 CST 2004


Hi Katie,

It sounds like you and Jianwei have come up with a couple of different 
solutions to this problem.

It is best to stay in collective mode if you can.  As Jianwei mentioned, 
the libraries can apply optimizations in that case (transparently to the 
application) that they cannot apply in independent mode, because in 
independent mode they know less about what the application as a whole 
is doing.
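
To make the collective case concrete, here is a minimal sketch of how each 
processor's slab in a shared 1-D particle variable could be laid out when 
some processors hold zero particles.  The helper name particle_offsets is 
hypothetical and not part of pnetcdf, and the serial loop stands in for 
what a real MPI code would do with MPI_Exscan.  The pnetcdf call appears 
only in a comment, as an illustration of where the collective write goes.

```c
#include <stddef.h>

/* Hypothetical helper (not a pnetcdf function): given the number of
 * particles held by each rank, fill starts[] with the offset of each
 * rank's slab in the shared particle variable (an exclusive prefix sum)
 * and return the total particle count.  In an MPI program each rank
 * would obtain its own offset with MPI_Exscan instead of this loop. */
long long particle_offsets(const long long *counts, size_t nranks,
                           long long *starts)
{
    long long total = 0;
    for (size_t r = 0; r < nranks; r++) {
        starts[r] = total;   /* a zero-count rank shares its neighbor's offset */
        total += counts[r];
    }
    return total;
}

/* In collective mode every rank then makes the same call, even when its
 * count is 0, for example (illustrative only, variable names assumed):
 *
 *     MPI_Offset start = starts[rank], count = counts[rank];
 *     ncmpi_put_vara_double_all(ncid, varid, &start, &count, buf);
 *
 * The count == 0 case is the one this thread is about. */
```

With counts of 3, 0, 5 and 2 particles on four processors, the second 
processor gets a valid offset and a count of zero, so it can still take 
part in the collective call without contributing data.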

We will fix our code so that your zero-particle writes work as desired.  
In the meantime, Jianwei came up with a little hack that will get around 
the problem for now.

Will that work for you?

Thanks!

Rob

On Fri, 3 Dec 2004, Katie Antypas wrote:

> Thanks for the email.  I'll try to make that fix.  
> 
> We had one other idea for a fix that currently doesn't work, but let me
> run it by you.  Pnetcdf allows you to work in collective and independent
> data modes.  Right now we are doing everything in collective mode (i.e.,
> all put calls end in _all).  We thought that possibly we could get
> around this bug by writing the particles out in the independent mode.  
> That way a processor with zero particles wouldn't make the put_vars call
> at all and then the synchronization wouldn't get messed up.(?)
> 
> This seems to be more of the way that hdf5 works for us.  We don't write a 
> zero length array, instead the processor with zero particles doesn't make 
> the h5_write call.
> 
> I've been reading the bit of documentation on this which talks very 
> briefly about setting MPI_File_set_view as a file handle for collective 
> operations and MPI_COMM_SELF as the handler for independent mode.
> 
> There is this mysterious line in the documentation though, 'It is difficult
> if not impossible in the general case to ensure consistency of access
> when a collection of processes are using multiple MPI_File handles to
> access the same file with mixed independent and collective
> operations....'
> 
> which sounds like this might be a more complicated fix.
> 
> Any thoughts? Do you think using independent mode could fix this?



