parallel-netcdf buffered I/O interface

Wei-keng Liao wkliao at ece.northwestern.edu
Wed Aug 15 10:18:36 CDT 2012


> The  NC_EINSUFFBUF error code is returned from the bput call?

I found a bug: 1.3.0 fails to return this error code. r1086 fixes it.


>   If you get that error, will you need to make that same bput call again after flushing?  And what would the other tasks involved in the same bput call, whose buffers weren't full, do?

My idea is to skip the bput request when NC_EINSUFFBUF is returned.
Flushing at the wait call will only flush the successful bput calls, so yes,
you need to make the same failed bput call again after flushing.

Please note that bput APIs are independent. There is no "other tasks in the same bput call" issue.
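The skip-and-retry idea above can be sketched as follows. This is an illustrative fragment, not code from the library: it assumes the pnetcdf 1.3.x bput API (e.g. ncmpi_bput_vara_double) and a caller-maintained array of pending request IDs; the names write_with_retry, nreqs, and reqs are hypothetical.

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Post a buffered put; if the attached buffer is full, flush the
 * pending requests and repost the same failed bput call.
 * "reqs" holds the request IDs of bput calls posted so far and
 * "*nreqs" their count (both caller-maintained). */
int write_with_retry(int ncid, int varid,
                     const MPI_Offset *start, const MPI_Offset *count,
                     const double *buf, int *reqs, int *nreqs)
{
    int err = ncmpi_bput_vara_double(ncid, varid, start, count,
                                     buf, &reqs[*nreqs]);
    if (err == NC_EINSUFFBUF) {
        /* Attached buffer ran out of space: flush what was posted... */
        err = ncmpi_wait_all(ncid, *nreqs, reqs, NULL);
        if (err != NC_NOERR) return err;
        *nreqs = 0;  /* buffer is marked empty after wait_all */

        /* ...then make the same bput call again. */
        err = ncmpi_bput_vara_double(ncid, varid, start, count,
                                     buf, &reqs[*nreqs]);
    }
    if (err == NC_NOERR) (*nreqs)++;
    return err;
}
```

Note that ncmpi_wait_all is collective, so in a parallel run every process must reach it together; a rank that flushes alone because only its buffer filled would need the independent ncmpi_wait instead, with the performance cost discussed below.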


> I could use a query function. To avoid the independent write calls I would do an mpi_allreduce on the max memory used before calling the mpi_waitall; if the max is approaching the buffer size I would flush all io tasks. This is basically what I have implemented in pio with iput: I have a user-determined limit on the size of the buffer and grow the buffer with each iput call, and when the buffer meets (or exceeds) the limit on any task I call waitall on all tasks.

This is a nice idea.


Please let me know if the new query API below will be sufficient for you.

  int ncmpi_inq_buffer_usage(int ncid, MPI_Offset *usage);

  * "usage" will be returned with the current buffer usage in bytes.
  * Possible error codes: invalid ncid, or no attached buffer found.
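Combined with your allreduce scheme, the proposed query could be used like this. Again a sketch under assumptions: ncmpi_inq_buffer_usage is the API proposed above (not yet released), "bufsize" is the value previously passed to ncmpi_buffer_attach, the 90% threshold is arbitrary, and maybe_flush is a hypothetical helper.

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Collectively decide whether to flush: every process queries its
 * local buffer usage, an allreduce finds the global maximum, and all
 * processes call the collective wait_all together if any buffer is
 * close to full. This avoids the independent wait entirely. */
int maybe_flush(int ncid, MPI_Offset bufsize, MPI_Comm comm,
                int *nreqs, int *reqs)
{
    MPI_Offset usage, max_usage;
    int err = ncmpi_inq_buffer_usage(ncid, &usage);
    if (err != NC_NOERR) return err;

    MPI_Allreduce(&usage, &max_usage, 1, MPI_OFFSET, MPI_MAX, comm);

    if (max_usage > bufsize * 9 / 10) {   /* any buffer >90% full */
        err = ncmpi_wait_all(ncid, *nreqs, reqs, NULL);
        *nreqs = 0;  /* internal buffer is empty after wait_all */
    }
    return err;
}
```

Because every process sees the same max_usage, they all take the same branch, so the collective wait_all is always called by everyone or by no one.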



Wei-keng



> 
> 
> On Tue, Aug 14, 2012 at 10:07 PM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> Hi, Jim,
> 
> The usage of the bput APIs is very similar to iput, except for the following:
> 1. users must tell pnetcdf the size of the buffer to be used by pnetcdf internally (attach and detach calls).
> 2. once a bput API returns, the user's buffer can be reused or freed (because the write
>   data has been copied to the internal buffer).
> 
> The internal buffer is per file (as the attach API requires an ncid argument.) It can be used to aggregate
> requests to multiple variables defined in the file.
> 
> I did not implement a query API to check the current usage of the buffer. If this query is useful, we
> can implement it. Let me know. But please note this query would be an independent call, so you
> would have to call the independent wait (nfmpi_wait). The independent wait uses MPI independent I/O,
> which performs poorly and is not recommended. Otherwise, you need an MPI reduce to ensure all
> processes know when to call the collective wait_all.
> 
> You are right about flushing. The buffer is not flushed automatically; all file I/O happens in wait_all.
> If the attached buffer runs out of space, the non-fatal error code NC_EINSUFFBUF is returned. It can be
> used to decide when to call the wait API, as described above. However, automatic flushing would require
> MPI independent I/O, again meaning poor performance. So, I recommend making sure the buffer size is
> sufficiently large. In addition, if you let pnetcdf do type conversion between two types of different sizes
> (e.g. short to int), you must calculate the size of the attach buffer using the larger type.
> 
> If automatic flushing is highly desired, we can add it later.
> 
> Once the call to wait/wait_all returns, the internal buffer is marked empty.
> 
> Let me know if the above answers your questions.
> 
> Wei-keng
> 
> On Aug 14, 2012, at 2:04 PM, Jim Edwards wrote:
> 
> > No, the flush must happen in the nfmpi_wait_all.
> > But does that call mark the buffer as empty?  I'll wait and bug
> > Wei-keng.
> >
> > On Tue, Aug 14, 2012 at 12:56 PM, Rob Latham <robl at mcs.anl.gov> wrote:
> > On Tue, Aug 14, 2012 at 12:52:46PM -0600, Jim Edwards wrote:
> >> Hi Rob,
> >>
> >> I assume that the same buffer can be used for multiple variables (as long
> >> as they are associated with the same file).    Is there a query function so
> >> that you know when you've used the entire buffer and it's time to flush?
> >
> > It does not appear to be so.  The only non-data-movement routines in
> > the API are these:
> >
> > int ncmpi_buffer_attach(int ncid, MPI_Offset bufsize);
> > int ncmpi_buffer_detach(int ncid);
> >
> > The end-user doesn't flush, I don't think.  I had the impression that once the
> > buffer filled up, the library did the flush, then started filling up the buffer
> > again.  This one I'll need Wei-keng to confirm.
> >
> > ==rob
> >
> >> Jim
> >>
> >> On Tue, Aug 14, 2012 at 11:41 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> >>
> >>> On Tue, Aug 14, 2012 at 10:50:15AM -0600, Jim Edwards wrote:
> >>>> No, I'm using iput and blocking get.   I'm doing my own buffering layer
> >>> in
> >>>> pio.   I might consider using the bput functions - can you point me to
> >>> some
> >>>> documentation/examples?
> >>>
> >>> Sure.  It's too bad Wei-keng is on vacation this month, as he's the
> >>> one who designed and implemented this new feature for pnetcdf 1.3.0.
> >>> Wei-keng: i'm not expecting you to reply while on vacation.  I'm just
> >>> CCing you so you know I'm talking about your work :>
> >>>
> >>> I think this might be the entire contents of our documentation:
> >>>
> >>> "A new set of buffered put APIs (eg. ncmpi_bput_vara_float) is added.
> >>> They make a copy of the user's buffer internally, so the user's buffer
> >>> can be reused when the call returns. Their usage is similar to the
> >>> iput APIs. "
> >>>
> >>> Hey, check that out: Wei-keng wrote up a fortran example:
> >>>
> >>>
> >>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-bufferedf.F
> >>>
> >>> There's also the C version:
> >>>
> >>>
> >>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-buffered.c
> >>>
> >>>
> >>> ==rob
> >>>
> >>>> On Tue, Aug 14, 2012 at 10:16 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> >>>>
> >>>>> Hi Jim
> >>>>>
> >>>>> You've been using the new 'bput/bget' routines, right?  Can you tell
> >>>>> me a bit about what you are using them for, and what -- if any --
> >>>>> benefit they've provided?
> >>>>>
> >>>>> (Rationale: our program management likes to see papers and
> >>>>> presentations, but the most valued contribution is 'science impact').
> >>>>>
> >>>>> Thanks
> >>>>> ==rob
> >>>>>
> >>>>> --
> >>>>> Rob Latham
> >>>>> Mathematics and Computer Science Division
> >>>>> Argonne National Lab, IL USA
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Rob Latham
> >>> Mathematics and Computer Science Division
> >>> Argonne National Lab, IL USA
> >>>
> >>
> >>
> >>
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
> >
> >
> > --
> > Jim Edwards
> >
> > CESM Software Engineering Group
> > National Center for Atmospheric Research
> > Boulder, CO
> > 303-497-1842
> >
> 
> 
> 
> 
> 
> -- 
> Jim Edwards
> 
> 
> 
