parallel-netcdf buffered I/O interface

Jim Edwards edwards.jim at gmail.com
Sun Aug 19 09:50:40 CDT 2012


I have already explained that the problem arose from running config.status
instead of repeating the entire configure sequence. My configure.in is the
one in the repo, and if I do the full autoreconf/configure sequence it works fine.

On Sun, Aug 19, 2012 at 8:42 AM, Wei-keng Liao
<wkliao at ece.northwestern.edu>wrote:

> Hi, Jim
>
> Could you run the command "svn diff configure.in"? It should report no
> difference between your local copy and the SVN version. You can then run
> "autoreconf" to generate a new file named "configure". Please note that the
> SVN repo does not contain "configure" or "config.status". In the next step,
> you should run "configure", rather than "config.status"; config.status is a
> by-product of running "configure".
>
> My way of building pnetcdf from the SVN repo is given below, as an example.
> 1. svn co https://svn.mcs.anl.gov/repos/parallel-netcdf/trunk
> 2. cd trunk
> 3. autoreconf
> 4. ./configure --prefix=$HOME/PnetCDF --with-mpi=$HOME/MPICH/2-1.4.1p1
> RM=/bin/rm
> 5. make
> 6. make install
>
> Could you please try the above and let me know if you still have the
> problem?
>
> Wei-keng
>
> On Aug 19, 2012, at 8:06 AM, Jim Edwards wrote:
>
> > Hi Wei-keng,
> >
> > Here's the problem. When I updated from subversion and didn't see any
> > autoconf-dependent changes, I ran config.status rather than
> > autoreconf/configure. The config.status script doesn't have the sed
> > commands.
> >
> > On Sat, Aug 18, 2012 at 6:18 PM, Wei-keng Liao <
> wkliao at ece.northwestern.edu> wrote:
> >
> > It is unlikely to be autoconf.
> >
> > In pnetcdf's configure.in, we first generate pnetcdf_inc from
> > pnetcdf_inc.in by removing some define macros (they are commented out
> > C-style, i.e. with /* and */).
> >
> > Then, at line 592 of configure.in, we use the command "sed" to replace
> > "/*" with the F90-style comment character "!" and to remove "*/":
> >
> > 592   sed -e "s%/\*%!%g" -e "s%\*/%%g" < src/libf/pnetcdf_inc > pnetcdf_inc && mv pnetcdf_inc src/libf/pnetcdf_inc
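> >
> > As a quick illustration (the comment text is made up, only to show what
> > the substitution does), a line such as
> >
> >     /* this macro is not defined */
> >
> > comes out of the sed command as
> >
> >     ! this macro is not defined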
> >
> > Could you check if your configure.in file is the same as the SVN's?
> >
> >
> > Wei-keng
> >
> > On Aug 18, 2012, at 8:37 AM, Jim Edwards wrote:
> >
> > > They aren't in pnetcdf_inc.in - it seems that configure is generating
> > > the comments. I have autoconf 2.67 - could that be the problem?
> > >
> > > On Fri, Aug 17, 2012 at 8:52 PM, Wei-keng Liao <
> wkliao at ece.northwestern.edu> wrote:
> > >
> > > Strange. I did not see any C-style comments in the source file
> > > pnetcdf_inc.in. pnetcdf_inc is generated from pnetcdf_inc.in at
> > > configure time.
> > >
> > > Could you try a clean build, starting from running "autoreconf"?
> > > If the problem persists, please let us know.
> > >
> > > Wei-keng
> > >
> > > On Aug 17, 2012, at 12:01 PM, Jim Edwards wrote:
> > >
> > > > Hi Wei-keng,
> > > >
> > > > In order to build r1088 using xlf I had to edit the file
> > > > src/libf/pnetcdf_inc and add a ! in front of each of the C-style
> > > > comments...
> > > >
> > > > Jim
> > > >
> > > > On Thu, Aug 16, 2012 at 5:20 AM, Wei-keng Liao <
> wkliao at ece.northwestern.edu> wrote:
> > > > ncmpi_inq_buffer_usage and its fortran API are now added in r1087
> > > >
> > > > Wei-keng
> > > >
> > > > On Aug 15, 2012, at 11:27 AM, Rob Latham wrote:
> > > >
> > > > > On Wed, Aug 15, 2012 at 10:10:02AM -0600, Jim Edwards wrote:
> > > > >> Okay, so when do you need to call nfmpi_begin_indep_mode/
> > > > >> nfmpi_end_indep_mode?    It doesn't seem to
> > > > >> be entirely consistent anymore - is it?
> > > > >
> > > > > nfmpi_begin_indep_mode and nfmpi_end_indep_mode should continue to
> > > > > wrap the blocking and independent nfmpi_put_ and nfmpi_get routines
> > > > > (those that do not end in _all).
> > > > >
> > > > > begin/end should also bracket the independent nfmpi_wait, I think.
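> > > > >
> > > > > In C terms, the pattern looks roughly like this (a minimal sketch;
> > > > > the ncid/varid/start/count/buf values are assumed to come from the
> > > > > usual create/def_var setup, so this is only illustrative):
> > > > >
> > > > >   int err;
> > > > >   err = ncmpi_begin_indep_mode(ncid);        /* enter independent data mode  */
> > > > >   err = ncmpi_put_vara_float(ncid, varid,    /* blocking, independent put    */
> > > > >                              start, count, buf);
> > > > >   err = ncmpi_end_indep_mode(ncid);          /* back to collective data mode */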
> > > > >
> > > > > If you are interested, I think the reason for all this flipping
> > > > > around is essentially so that we can keep the number of records in
> > > > > a record variable consistent among processes.
> > > > >
> > > > > ==rob
> > > > >
> > > > >>
> > > > >> On Wed, Aug 15, 2012 at 10:01 AM, Rob Latham <robl at mcs.anl.gov>
> wrote:
> > > > >>
> > > > >>> On Wed, Aug 15, 2012 at 09:32:56AM -0600, Jim Edwards wrote:
> > > > >>>> Hi Wei-keng,
> > > > >>>>
> > > > >>>> Yes, that looks like what I would need. I have to think about the
> > > > >>>> independent aspect - currently I am using collective operations in
> > > > >>>> almost all cases. The performance trade-offs of independent vs.
> > > > >>>> collective operations are not really clear to me. Why no collective
> > > > >>>> bputs?
> > > > >>>
> > > > >>> Aw, Wei-keng already replied. Well, here's my answer, which says
> > > > >>> the same thing as Wei-keng but emphasises the "put it on a list"
> > > > >>> and "execute this list" aspects of these APIs.
> > > > >>>
> > > > >>> The 'buffered put' routines are a variant of the non-blocking
> > > > >>> routines.  These routines defer all I/O to the wait or wait_all
> > > > >>> routine, where all pending I/O requests for a given process are
> > > > >>> stitched together into one bigger request.
> > > > >>>
> > > > >>> So, issuing an I/O operation under these interfaces is essentially
> > > > >>> "put it on a list". Then, "execute this list" can be done either
> > > > >>> independently (ncmpi_wait) or collectively (ncmpi_wait_all).
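> > > > >>>
> > > > >>> A rough C sketch of that model (illustrative only; the ncid, varid,
> > > > >>> start, count, and buf arguments are assumed to be set up elsewhere):
> > > > >>>
> > > > >>>   int reqs[2], stats[2];
> > > > >>>   /* "put it on a list": nothing is written yet */
> > > > >>>   ncmpi_iput_vara_float(ncid, varid1, start1, count1, buf1, &reqs[0]);
> > > > >>>   ncmpi_iput_vara_float(ncid, varid2, start2, count2, buf2, &reqs[1]);
> > > > >>>   /* "execute this list": collectively ... */
> > > > >>>   ncmpi_wait_all(ncid, 2, reqs, stats);
> > > > >>>   /* ... or independently: ncmpi_wait(ncid, 2, reqs, stats); */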
> > > > >>>
> > > > >>> A very early instance of these routines did the "put it on a list"
> > > > >>> collectively. This approach did not work out so well for
> > > > >>> applications (like, for example, Chombo) where processes make a
> > > > >>> bunch of small uncoordinated I/O requests, but still have a clear
> > > > >>> part of their code where "collectively wait for everyone to finish"
> > > > >>> made sense.
> > > > >>>
> > > > >>> I hope you have enjoyed today's episode of Parallel-NetCDF
> > > > >>> history theater.
> > > > >>>
> > > > >>> ==rob
> > > > >>>
> > > > >>>> On Wed, Aug 15, 2012 at 9:18 AM, Wei-keng Liao
> > > > >>>> <wkliao at ece.northwestern.edu>wrote:
> > > > >>>>
> > > > >>>>>> The  NC_EINSUFFBUF error code is returned from the bput call?
> > > > >>>>>
> > > > >>>>> I found a bug where 1.3.0 fails to return this error code.
> > > > >>>>> r1086 fixes this bug.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> If you get that error, will you need to make that same bput call
> > > > >>>>>> again after flushing? But what would the other tasks involved in
> > > > >>>>>> the same bput call, who didn't have full buffers, do?
> > > > >>>>>
> > > > >>>>> My idea is to skip the bput request when NC_EINSUFFBUF is
> > > > >>>>> returned. Flushing at the wait call will only flush those
> > > > >>>>> successful bput calls, so yes, you need to make the same failed
> > > > >>>>> bput call again after flushing.
> > > > >>>>>
> > > > >>>>> Please note that the bput APIs are independent, so there is no
> > > > >>>>> "other tasks in the same bput call" issue.
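> > > > >>>>>
> > > > >>>>> Roughly, in C (a sketch of the retry idea above; the pending
> > > > >>>>> request bookkeeping nreqs/reqs/stats is assumed to exist already):
> > > > >>>>>
> > > > >>>>>   int req;
> > > > >>>>>   int err = ncmpi_bput_vara_float(ncid, varid, start, count, buf, &req);
> > > > >>>>>   if (err == NC_EINSUFFBUF) {
> > > > >>>>>       /* request was skipped: flush what has been queued so far */
> > > > >>>>>       ncmpi_wait(ncid, nreqs, reqs, stats);   /* independent flush */
> > > > >>>>>       nreqs = 0;
> > > > >>>>>       /* then issue the same bput call again */
> > > > >>>>>       err = ncmpi_bput_vara_float(ncid, varid, start, count, buf, &req);
> > > > >>>>>   }
> > > > >>>>>   if (err == NC_NOERR) reqs[nreqs++] = req;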
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> I could use a query function and, to avoid the independent write
> > > > >>>>>> calls, do an mpi_allreduce on the max memory used before calling
> > > > >>>>>> mpi_waitall. If the max is approaching the buffer size, I would
> > > > >>>>>> flush all I/O tasks. This is basically what I have implemented in
> > > > >>>>>> PIO with iput: I have a user-determined limit on the size of the
> > > > >>>>>> buffer and grow the buffer with each iput call; when the buffer
> > > > >>>>>> meets (or exceeds) the limit on any task, I call waitall on all
> > > > >>>>>> tasks.
> > > > >>>>>
> > > > >>>>> This is a nice idea.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Please let me know if the new query API below will be sufficient
> > > > >>>>> for you.
> > > > >>>>>
> > > > >>>>>  int ncmpi_inq_buffer_usage(int ncid, MPI_Offset *usage);
> > > > >>>>>
> > > > >>>>>  * "usage" will be returned with the current buffer usage in bytes.
> > > > >>>>>  * Error codes may be invalid ncid or no attached buffer found.
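> > > > >>>>>
> > > > >>>>> For the coordination scheme you describe, the usage could look
> > > > >>>>> something like this in C (a sketch only; comm, threshold, and the
> > > > >>>>> reqs/nreqs/stats bookkeeping are assumptions, and MPI_OFFSET needs
> > > > >>>>> an MPI library that provides that datatype):
> > > > >>>>>
> > > > >>>>>   MPI_Offset usage, max_usage;
> > > > >>>>>   ncmpi_inq_buffer_usage(ncid, &usage);
> > > > >>>>>   MPI_Allreduce(&usage, &max_usage, 1, MPI_OFFSET, MPI_MAX, comm);
> > > > >>>>>   if (max_usage > threshold) {   /* e.g. most of the attached buffer */
> > > > >>>>>       ncmpi_wait_all(ncid, nreqs, reqs, stats);  /* collective flush on every task */
> > > > >>>>>       nreqs = 0;
> > > > >>>>>   }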
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Wei-keng
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Tue, Aug 14, 2012 at 10:07 PM, Wei-keng Liao <
> > > > >>>>> wkliao at ece.northwestern.edu> wrote:
> > > > >>>>>> Hi, Jim,
> > > > >>>>>>
> > > > >>>>>> The usage of the bput APIs is very similar to iput, except for
> > > > >>>>>> the following:
> > > > >>>>>> 1. Users must tell pnetcdf the size of the buffer to be used by
> > > > >>>>>>    pnetcdf internally (the attach and detach calls).
> > > > >>>>>> 2. Once a bput API returns, the user's buffer can be reused or
> > > > >>>>>>    freed (because the write data has been copied to the internal
> > > > >>>>>>    buffer).
> > > > >>>>>>
> > > > >>>>>> The internal buffer is per file (as the attach API requires an
> > > > >>>>>> ncid argument). It can be used to aggregate requests to multiple
> > > > >>>>>> variables defined in the file.
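> > > > >>>>>>
> > > > >>>>>> A minimal C sketch of that lifecycle (the buffer-size calculation
> > > > >>>>>> and the variable names are only illustrative):
> > > > >>>>>>
> > > > >>>>>>   MPI_Offset bufsize = nelems * sizeof(float);  /* room for the queued writes */
> > > > >>>>>>   int req, st;
> > > > >>>>>>   ncmpi_buffer_attach(ncid, bufsize);           /* 1. give pnetcdf its buffer  */
> > > > >>>>>>   ncmpi_bput_vara_float(ncid, varid, start, count, buf, &req);
> > > > >>>>>>   /* buf may be reused or freed here: the data was copied internally */
> > > > >>>>>>   ncmpi_wait_all(ncid, 1, &req, &st);           /* the file I/O happens here   */
> > > > >>>>>>   ncmpi_buffer_detach(ncid);                    /* release the internal buffer */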
> > > > >>>>>>
> > > > >>>>>> I did not implement a query API to check the current usage of
> > > > >>>>>> the buffer. If this query is useful, we can implement it. Let me
> > > > >>>>>> know. But please note that this query will be an independent
> > > > >>>>>> call, so you would have to call the independent wait (nfmpi_wait).
> > > > >>>>>> Independent wait uses MPI independent I/O, which causes poor
> > > > >>>>>> performance and is not recommended. Otherwise, you need an MPI
> > > > >>>>>> reduce to ensure all processes know when to call the collective
> > > > >>>>>> wait_all.
> > > > >>>>>>
> > > > >>>>>> You are right about flushing. The buffer will not be flushed
> > > > >>>>>> automatically, and all file I/O happens in wait_all. If the
> > > > >>>>>> attached buffer runs out of space, the NC_EINSUFFBUF error code
> > > > >>>>>> (non-fatal) will be returned. It can be used to decide when to
> > > > >>>>>> call the wait API, as described above. However, automatic
> > > > >>>>>> flushing would require MPI independent I/O, again meaning poor
> > > > >>>>>> performance. So, I recommend making sure the buffer size is
> > > > >>>>>> sufficiently large. In addition, if you let pnetcdf do type
> > > > >>>>>> conversion between two types of different sizes (e.g. short to
> > > > >>>>>> int), you must calculate the size of the attached buffer using
> > > > >>>>>> the larger type.
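> > > > >>>>>>
> > > > >>>>>> For instance (the element count here is made up), bput-ing 1000
> > > > >>>>>> values from a short buffer into an NC_INT variable needs room for
> > > > >>>>>> the converted data:
> > > > >>>>>>
> > > > >>>>>>   MPI_Offset nelems  = 1000;
> > > > >>>>>>   MPI_Offset bufsize = nelems * sizeof(int);  /* 4000 bytes, not nelems*sizeof(short) = 2000 */
> > > > >>>>>>   ncmpi_buffer_attach(ncid, bufsize);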
> > > > >>>>>>
> > > > >>>>>> If automatic flushing is highly desired, we can add it later.
> > > > >>>>>>
> > > > >>>>>> Once the call to wait/wait_all returns, the internal buffer
> is marked
> > > > >>>>> empty.
> > > > >>>>>>
> > > > >>>>>> Let me know if the above answers your questions.
> > > > >>>>>>
> > > > >>>>>> Wei-keng
> > > > >>>>>>
> > > > >>>>>> On Aug 14, 2012, at 2:04 PM, Jim Edwards wrote:
> > > > >>>>>>
> > > > >>>>>>> No, the flush must happen in the nfmpi_wait_all.
> > > > >>>>>>> But does that call mark the buffer as empty?  I'll wait and
> bug
> > > > >>>>>>> Wei-keng.
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Aug 14, 2012 at 12:56 PM, Rob Latham <
> robl at mcs.anl.gov>
> > > > >>> wrote:
> > > > >>>>>>> On Tue, Aug 14, 2012 at 12:52:46PM -0600, Jim Edwards wrote:
> > > > >>>>>>>> Hi Rob,
> > > > >>>>>>>>
> > > > >>>>>>>> I assume that the same buffer can be used for multiple
> > > > >>>>>>>> variables (as long as they are associated with the same file).
> > > > >>>>>>>> Is there a query function so that you know when you've used the
> > > > >>>>>>>> entire buffer and it's time to flush?
> > > > >>>>>>>
> > > > >>>>>>> It does not appear to be so. The only non-data-movement
> > > > >>>>>>> routines in the API are these:
> > > > >>>>>>>
> > > > >>>>>>> int ncmpi_buffer_attach(int ncid, MPI_Offset bufsize);
> > > > >>>>>>> int ncmpi_buffer_detach(int ncid);
> > > > >>>>>>>
> > > > >>>>>>> The end-user doesn't flush, I don't think. I had the impression
> > > > >>>>>>> that once the buffer filled up, the library did the flush, then
> > > > >>>>>>> started filling up the buffer again. This one I'll need Wei-keng
> > > > >>>>>>> to confirm.
> > > > >>>>>>>
> > > > >>>>>>> ==rob
> > > > >>>>>>>
> > > > >>>>>>>> Jim
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, Aug 14, 2012 at 11:41 AM, Rob Latham <
> robl at mcs.anl.gov>
> > > > >>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> On Tue, Aug 14, 2012 at 10:50:15AM -0600, Jim Edwards wrote:
> > > > >>>>>>>>>> No, I'm using iput and blocking get. I'm doing my own
> > > > >>>>>>>>>> buffering layer in pio. I might consider using the bput
> > > > >>>>>>>>>> functions - can you point me to some documentation/examples?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Sure. It's too bad Wei-keng is on vacation this month, as
> > > > >>>>>>>>> he's the one who designed and implemented this new feature for
> > > > >>>>>>>>> pnetcdf 1.3.0. Wei-keng: I'm not expecting you to reply while
> > > > >>>>>>>>> on vacation. I'm just CCing you so you know I'm talking about
> > > > >>>>>>>>> your work :>
> > > > >>>>>>>>>
> > > > >>>>>>>>> I think this might be the entire contents of our documentation:
> > > > >>>>>>>>>
> > > > >>>>>>>>> "A new set of buffered put APIs (e.g. ncmpi_bput_vara_float) is
> > > > >>>>>>>>> added. They make a copy of the user's buffer internally, so the
> > > > >>>>>>>>> user's buffer can be reused when the call returns. Their usage
> > > > >>>>>>>>> is similar to the iput APIs."
> > > > >>>>>>>>>
> > > > >>>>>>>>> Hey, check that out: Wei-keng wrote up a fortran example:
> > > > >>>>>>>>>
> > > > >>>>>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-bufferedf.F
> > > > >>>>>>>>>
> > > > >>>>>>>>> There's also the C version:
> > > > >>>>>>>>>
> > > > >>>>>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-buffered.c
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> ==rob
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On Tue, Aug 14, 2012 at 10:16 AM, Rob Latham <
> robl at mcs.anl.gov>
> > > > >>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Jim
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> You've been using the new 'bput/bget' routines, right? Can
> > > > >>>>>>>>>>> you tell me a bit about what you are using them for, and
> > > > >>>>>>>>>>> what -- if any -- benefit they've provided?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> (Rationale: our program management likes to see papers and
> > > > >>>>>>>>>>> presentations, but the most valued contribution is 'science
> > > > >>>>>>>>>>> impact').
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks
> > > > >>>>>>>>>>> ==rob
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>> Rob Latham
> > > > >>>>>>>>>>> Mathematics and Computer Science Division
> > > > >>>>>>>>>>> Argonne National Lab, IL USA
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Rob Latham
> > > > >>>>>>>>> Mathematics and Computer Science Division
> > > > >>>>>>>>> Argonne National Lab, IL USA
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Rob Latham
> > > > >>>>>>> Mathematics and Computer Science Division
> > > > >>>>>>> Argonne National Lab, IL USA
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Jim Edwards
> > > > >>>>>>>
> > > > >>>>>>> CESM Software Engineering Group
> > > > >>>>>>> National Center for Atmospheric Research
> > > > >>>>>>> Boulder, CO
> > > > >>>>>>> 303-497-1842
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>> Jim Edwards
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>> --
> > > > >>> Rob Latham
> > > > >>> Mathematics and Computer Science Division
> > > > >>> Argonne National Lab, IL USA
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > > --
> > > > > Rob Latham
> > > > > Mathematics and Computer Science Division
> > > > > Argonne National Lab, IL USA
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Jim Edwards
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Jim Edwards
> > >
> > >
> > >
> >
> >
> >
> >
> > --
> > Jim Edwards
> >
> >
> >
>
>


-- 

Jim Edwards