pnetcdf and large transfers

Rob Latham robl at mcs.anl.gov
Tue Jul 2 16:44:25 CDT 2013


On Tue, Jul 02, 2013 at 04:29:38PM -0500, Wei-keng Liao wrote:
> > Perhaps I misunderstand, but I think that in the case that the I/O is to a single variable and the variable size is such that the access cannot be too large, we can safely avoid the allreduce. Right?
> 
> If the variables are fixed-sized (non-record) and the size is defined < 2GiB, then you are right
> we can avoid the allreduce (for blocking APIs only). Otherwise, I think allreduce is still
> necessary for RobL's approach.

Right. That won't catch all cases, but it will catch the important
ones: if the amount of data stored is small, then there's not a lot of
I/O over which to amortize the cost of the allreduce.
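
As a rough illustration of the "small fixed-size variable" test discussed
above, here is a minimal sketch in C. The function name and its arguments
are hypothetical and are not part of PnetCDF; it assumes the dimension
lengths and element size are already known from the variable's definition,
and that "small enough" means the whole variable fits in an int byte count.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not a PnetCDF routine): returns 1 if a fixed-size
 * variable with the given dimension lengths and element size occupies at
 * most INT_MAX bytes in total, and 0 otherwise. */
int var_fits_in_int_count(const uint64_t *dims, int ndims, uint64_t elem_size)
{
    const uint64_t limit = (uint64_t)INT_MAX;

    /* A zero-length dimension or element means zero bytes overall. */
    if (elem_size == 0) return 1;
    for (int i = 0; i < ndims; i++)
        if (dims[i] == 0) return 1;

    if (elem_size > limit) return 0;

    uint64_t total = elem_size;
    for (int i = 0; i < ndims; i++) {
        /* total * dims[i] > limit, tested by division to avoid overflow */
        if (dims[i] > limit / total) return 0;
        total *= dims[i];
    }
    return 1;
}
```

If this returns 1 for a non-record variable, no single request to that
variable can exceed the int limit, so every process can reach the same
conclusion locally without communicating.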

> > Is there something additional that we could learn as an artifact of the collective (currently proposed as an allreduce) that might help us in optimizing I/O generally? 
> 
> I am not sure for optimization, just thinking about making it work.
> I wonder when the request size is that big, should we worry about the cost
> of that one additional allreduce?

> I just remember if the default collective I/O buffer size is used (cb_buffer_size=16MiB),
> then the maximal amount of individual read/write (made by aggregators) is 16 MiB. Thus,
> I don't think we will have a problem for collective I/O (where two-phase I/O actually
> involves). It is independent I/O. Is my understanding correct?

We still need to "feed" the MPI-IO routine, though.  MPI_File_read_all
takes an integer 'count' parameter: presently we count off N elements
of MPI_BYTE, so N cannot exceed what an int can hold.

Yes, once we get the request down into MPI-IO we're in good shape. 
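
To make the integer 'count' limit concrete, here is a minimal sketch in C
of one way to describe a large byte count as "some number of fixed-size
chunks plus a remainder", each of which fits in an int. This is an
assumption for illustration, not necessarily what PnetCDF does; the struct
and function names are hypothetical. The resulting numbers could then be
used to build a derived datatype (for example with MPI_Type_contiguous and
MPI_Type_create_struct) so that the 'count' passed to MPI-IO stays small.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical descriptor for a byte count too large for an int 'count'. */
typedef struct {
    int nchunks;      /* number of full chunks */
    int chunk_bytes;  /* size of each full chunk, in bytes */
    int remainder;    /* leftover bytes, always < chunk_bytes */
} big_count_t;

/* Split nbytes into full chunks plus a remainder.
 * Returns 0 on success, -1 if chunk_bytes is not positive or the number
 * of chunks would itself overflow an int. */
int split_big_count(uint64_t nbytes, int chunk_bytes, big_count_t *out)
{
    if (chunk_bytes <= 0) return -1;

    uint64_t nchunks = nbytes / (uint64_t)chunk_bytes;
    if (nchunks > (uint64_t)INT_MAX) return -1;

    out->nchunks = (int)nchunks;
    out->chunk_bytes = chunk_bytes;
    out->remainder = (int)(nbytes % (uint64_t)chunk_bytes);
    return 0;
}
```

For example, a request of 5 GiB plus 7 bytes with 1 GiB chunks comes out
as 5 chunks and a remainder of 7 bytes.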

> > I would like to have a solution in ROMIO also, but prefer a
> > solution that is available soonest to our users, and a PnetCDF fix
> > is superior with respect to that metric (as RobL says)...
> 
> In this case, we can use RobL's approach on blocking (independent?)
> APIs and for big variables.  For other cases, return errors?

I'm going to do two things (and not do one thing):

- check for the _x variants of the type routines (e.g.
  MPI_Type_size_x).  If those exist, I will presume the implementation
  has given some thought to datatypes describing large amounts of
  memory.

- Implement the RobR "only for large variables" optimization

- leave non-blocking alone for now, on the assumption that
  non-blocking's primary use case is to combine many small I/O
  requests into larger ones.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

