parallel-netcdf buffered I/O interface

Wei-keng Liao wkliao at ece.northwestern.edu
Sun Aug 19 09:42:19 CDT 2012


Hi, Jim

Could you run the command "svn diff configure.in"? It should report no difference
between your local copy and the SVN version. You can then run "autoreconf" to
generate a new file named "configure". Please note that the SVN repo contains neither
"configure" nor "config.status". In the next step, you should run "configure" rather
than "config.status"; config.status is a by-product of running "configure".

My way of building pnetcdf from the SVN repo is given below, as an example.
1. svn co https://svn.mcs.anl.gov/repos/parallel-netcdf/trunk
2. cd trunk
3. autoreconf
4. ./configure --prefix=$HOME/PnetCDF --with-mpi=$HOME/MPICH/2-1.4.1p1 RM=/bin/rm
5. make
6. make install

Could you please try the above and let me know if you still have the problem?

Wei-keng

On Aug 19, 2012, at 8:06 AM, Jim Edwards wrote:

> Hi Wei-keng,
> 
> Here's the problem.  When I updated from Subversion and didn't see any autoconf-dependent changes, I ran config.status rather than autoreconf/configure.  The config.status script doesn't have the sed commands.
> 
> On Sat, Aug 18, 2012 at 6:18 PM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> 
> It is unlikely to be autoconf.
> 
> In pnetcdf's configure.in, we first generate pnetcdf_inc
> from pnetcdf_inc.in by commenting out some define macros
> with C-style comments, i.e. /* and */.
> 
> Then, at line 592 of configure.in, we use the "sed" command to
> replace "/*" with the F90-style comment character "!" and to remove "*/".
> 
> 592   sed -e "s%/\*%!%g" -e "s%\*/%%g" <src/libf/pnetcdf_inc>pnetcdf_inc && mv pnetcdf_inc src/libf/pnetcdf_inc
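> 
> As an illustration (the comment text here is made up, not copied from
> pnetcdf_inc.in), that substitution turns a line like
> 
>     /* this macro is handled elsewhere */
> 
> into
> 
>     ! this macro is handled elsewhere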
> 
> Could you check if your configure.in file is the same as the SVN's?
> 
> 
> Wei-keng
> 
> On Aug 18, 2012, at 8:37 AM, Jim Edwards wrote:
> 
> > They aren't in pnetcdf_inc.in - it seems that configure is generating the comments.    I have autoconf 2.67 - could that be the problem?
> >
> > On Fri, Aug 17, 2012 at 8:52 PM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> >
> > Strange. I did not see any C-style comments in the source file pnetcdf_inc.in.
> > pnetcdf_inc is generated from pnetcdf_inc.in at configure time.
> >
> > Could you try a clean build starting from running command "autoreconf"?
> > If the problem persists, please let us know.
> >
> > Wei-keng
> >
> > On Aug 17, 2012, at 12:01 PM, Jim Edwards wrote:
> >
> > > Hi Wei-keng,
> > >
> > > In order to build r1088 using xlf, I had to edit the file src/libf/pnetcdf_inc and add a ! in front of each of
> > > the C-style comments...
> > >
> > > Jim
> > >
> > > On Thu, Aug 16, 2012 at 5:20 AM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> > > ncmpi_inq_buffer_usage and its fortran API are now added in r1087
> > >
> > > Wei-keng
> > >
> > > On Aug 15, 2012, at 11:27 AM, Rob Latham wrote:
> > >
> > > > On Wed, Aug 15, 2012 at 10:10:02AM -0600, Jim Edwards wrote:
> > > >> Okay, so when do you need to call nfmpi_begin_indep_mode/
> > > >> nfmpi_end_indep_mode?    It doesn't seem to
> > > >> be entirely consistent anymore - is it?
> > > >
> > > > nfmpi_begin_indep_mode and nfmpi_end_indep_mode should continue to
> > > > wrap the blocking and independent nfmpi_put_ and nfmpi_get routines
> > > > (those that do not end in _all).
> > > >
> > > > begin/end should also bracket the independent nfmpi_wait, I think.
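> > > > 
> > > > A minimal sketch of that bracketing, written against the C API (the Fortran
> > > > nfmpi_ calls follow the same pattern; ncid, varid, start, count, buf and the
> > > > request arrays are placeholders):
> > > > 
> > > >   ncmpi_begin_indep_data(ncid);             /* leave collective data mode   */
> > > >   ncmpi_put_vara_float(ncid, varid,         /* blocking, independent put    */
> > > >                        start, count, buf);
> > > >   ncmpi_wait(ncid, nreqs, reqs, statuses);  /* independent nonblocking wait */
> > > >   ncmpi_end_indep_data(ncid);               /* back to collective data mode */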
> > > >
> > > > If you are interested, I think the reason for all this flipping around
> > > > is essentially so we can keep the number of records in a record variable
> > > > consistent among processors.
> > > >
> > > > ==rob
> > > >
> > > >>
> > > >> On Wed, Aug 15, 2012 at 10:01 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> > > >>
> > > >>> On Wed, Aug 15, 2012 at 09:32:56AM -0600, Jim Edwards wrote:
> > > >>>> Hi Wei-keng,
> > > >>>>
> > > >>>> Yes, that looks like what I would need.  I have to think about the
> > > >>>> independent aspect - currently I am using collective operations in almost
> > > >>>> all cases.  The performance trade-offs of independent vs. collective
> > > >>>> operations are not really clear to me.  Why no collective bputs?
> > > >>>
> > > >>> Aw, Wei-keng already replied.   Well, here's my answer, which says the
> > > >>> same thing as Wei-keng but emphasises the "put it on a list" and
> > > >>> "execute this list" aspects of these APIs.
> > > >>>
> > > >>> The 'buffered put' routines are a variant of the non-blocking
> > > >>> routines.  These routines defer all I/O to the wait or wait_all
> > > >>> routine, where all pending I/O requests for a given process are
> > > >>> stitched together into one bigger request.
> > > >>>
> > > >>> So, issuing an I/O operation under these interfaces is essentially
> > > >>> "put it on a list".  Then, "execute this list" can be done either
> > > >>> independently (ncmpi_wait) or collectively (ncmpi_wait_all).
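> > > >>> 
> > > >>> In rough C form (ncid, varid, start, count, buf and the request bookkeeping
> > > >>> are placeholders; iput is shown, bput behaves the same way):
> > > >>> 
> > > >>>   /* "put it on a list": no file I/O happens here */
> > > >>>   ncmpi_iput_vara_float(ncid, varid, start, count, buf, &reqs[0]);
> > > >>> 
> > > >>>   /* "execute this list", either independently ...                     */
> > > >>>   ncmpi_wait(ncid, nreqs, reqs, statuses);
> > > >>>   /* ... or collectively (pick one; every process must call wait_all): */
> > > >>>   ncmpi_wait_all(ncid, nreqs, reqs, statuses);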
> > > >>>
> > > >>> A very early instance of these routines did the "put it on a list"
> > > >>> collectively.  This approach did not work out so well for applications
> > > >>> (like for example Chombo) where processes make a bunch of small
> > > >>> uncoordinated I/O requests, but still have a clear part of their code
> > > >>> where "collectively wait for everyone to finish" made sense.
> > > >>>
> > > >>> I hope you have enjoyed today's episode of Parallel-NetCDF history
> > > >>> theater.
> > > >>>
> > > >>> ==rob
> > > >>>
> > > >>>> On Wed, Aug 15, 2012 at 9:18 AM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> > > >>>>
> > > >>>>>> The  NC_EINSUFFBUF error code is returned from the bput call?
> > > >>>>>
> > > >>>>> I found a bug that 1.3.0 fails to return this error code. r1086 fixes this bug.
> > > >>>>>
> > > >>>>>
> > > >>>>>>  If you get that error, will you need to make that same bput call again
> > > >>>>>> after flushing?  But the other tasks involved in the same bput call who
> > > >>>>>> didn't have full buffers would do what?
> > > >>>>>
> > > >>>>> My idea is to skip the bput request when NC_EINSUFFBUF is returned.
> > > >>>>> Flushing at the wait call will only flush those successful bput calls, so
> > > >>>>> yes, you need to make the same failed bput call again after flushing.
> > > >>>>>
> > > >>>>> Please note that bput APIs are independent. There is no "other tasks in
> > > >>>>> the same bput call" issue.
> > > >>>>>
> > > >>>>>
> > > >>>>>> I could use a query function and, to avoid the independent write calls,
> > > >>>>>> would do an mpi_allreduce on the max memory used before calling the
> > > >>>>>> mpi_waitall.  If the max is approaching the buffer size I would flush all
> > > >>>>>> io tasks. This is basically what I have implemented in pio with iput - I
> > > >>>>>> have a user-determined limit on the size of the buffer and grow the buffer
> > > >>>>>> with each iput call; when the buffer meets (or exceeds) the limit on any
> > > >>>>>> task I call waitall on all tasks.
> > > >>>>>
> > > >>>>> This is a nice idea.
> > > >>>>>
> > > >>>>>
> > > >>>>> Please let me know if the new query API below will be sufficient for you.
> > > >>>>>
> > > >>>>>  int ncmpi_inq_buffer_usage(int ncid, MPI_Offset *usage);
> > > >>>>>
> > > >>>>>  * "usage" will be returned with the current buffer usage in bytes.
> > > >>>>>  * Error codes may be invalid ncid or no attached buffer found.
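> > > >>>>> 
> > > >>>>> A sketch of the flush-decision pattern discussed above, using the proposed
> > > >>>>> query (comm, buffer_limit, nreqs, reqs and statuses are placeholders):
> > > >>>>> 
> > > >>>>>   MPI_Offset usage;
> > > >>>>>   long long  local, global;
> > > >>>>> 
> > > >>>>>   ncmpi_inq_buffer_usage(ncid, &usage);     /* local buffer usage, bytes */
> > > >>>>>   local = (long long) usage;
> > > >>>>>   MPI_Allreduce(&local, &global, 1, MPI_LONG_LONG, MPI_MAX, comm);
> > > >>>>> 
> > > >>>>>   if (global >= buffer_limit)               /* same answer on every rank */
> > > >>>>>       ncmpi_wait_all(ncid, nreqs, reqs, statuses);  /* collective flush  */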
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> Wei-keng
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Tue, Aug 14, 2012 at 10:07 PM, Wei-keng Liao <wkliao at ece.northwestern.edu> wrote:
> > > >>>>>> Hi, Jim,
> > > >>>>>>
> > > >>>>>> The usage of the bput APIs is very similar to iput, except for the following.
> > > >>>>>> 1. Users must tell pnetcdf the size of the buffer to be used by pnetcdf
> > > >>>>>> internally (the attach and detach calls).
> > > >>>>>> 2. Once a bput API returns, the user's buffer can be reused or freed
> > > >>>>>> (because the write data has been copied to the internal buffer).
> > > >>>>>>
> > > >>>>>> The internal buffer is per file (as the attach API requires an ncid
> > > >>>>>> argument). It can be used to aggregate requests to multiple variables
> > > >>>>>> defined in the file.
> > > >>>>>>
> > > >>>>>> I did not implement a query API to check the current usage of the
> > > >>>>>> buffer. If this query is useful, we can implement it. Let me know. But
> > > >>>>>> please note this query will be an independent call, so you will have to
> > > >>>>>> call the independent wait (nfmpi_wait). Independent wait uses MPI
> > > >>>>>> independent I/O, which causes poor performance and is not recommended.
> > > >>>>>> Otherwise, you need an MPI reduce to ensure all processes know when to
> > > >>>>>> call the collective wait_all.
> > > >>>>>>
> > > >>>>>> You are right about flushing. The buffer will not be flushed
> > > >>>>>> automatically; all file I/O happens in wait_all. If the attached buffer
> > > >>>>>> runs out of space, the NC_EINSUFFBUF error code (non-fatal) is returned.
> > > >>>>>> It can be used to decide when to call the wait API, as described above.
> > > >>>>>> However, automatic flushing would require MPI independent I/O, again
> > > >>>>>> meaning poor performance. So, I recommend making sure the buffer size is
> > > >>>>>> sufficiently large. In addition, if you let pnetcdf do type conversion
> > > >>>>>> between two types of different sizes (e.g. short to int), you must
> > > >>>>>> calculate the size of the attach buffer using the larger type.
> > > >>>>>>
> > > >>>>>> If automatic flushing is highly desired, we can add it later.
> > > >>>>>>
> > > >>>>>> Once the call to wait/wait_all returns, the internal buffer is marked empty.
> > > >>>>>>
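> > > >>>>>> Putting the pieces together, a minimal C sketch of one write phase (ncid,
> > > >>>>>> varid, start, count and nelems are placeholders; error checking omitted):
> > > >>>>>> 
> > > >>>>>>   float *buf = (float*) malloc(nelems * sizeof(float));
> > > >>>>>>   int    req, status;
> > > >>>>>> 
> > > >>>>>>   /* attach the internal buffer; if pnetcdf will convert to a larger
> > > >>>>>>      file type, size it using the larger type */
> > > >>>>>>   ncmpi_buffer_attach(ncid, nelems * sizeof(float));
> > > >>>>>> 
> > > >>>>>>   ncmpi_bput_vara_float(ncid, varid, start, count, buf, &req);
> > > >>>>>>   free(buf);                     /* safe: the data was copied internally */
> > > >>>>>> 
> > > >>>>>>   ncmpi_wait_all(ncid, 1, &req, &status);   /* all file I/O happens here */
> > > >>>>>>   ncmpi_buffer_detach(ncid);
> > > >>>>>> 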
> > > >>>>>> Let me know if the above answers your questions.
> > > >>>>>>
> > > >>>>>> Wei-keng
> > > >>>>>>
> > > >>>>>> On Aug 14, 2012, at 2:04 PM, Jim Edwards wrote:
> > > >>>>>>
> > > >>>>>>> No, the flush must happen in the nfmpi_wait_all.
> > > >>>>>>> But does that call mark the buffer as empty?  I'll wait and bug
> > > >>>>>>> Wei-keng.
> > > >>>>>>>
> > > >>>>>>> On Tue, Aug 14, 2012 at 12:56 PM, Rob Latham <robl at mcs.anl.gov> wrote:
> > > >>>>>>> On Tue, Aug 14, 2012 at 12:52:46PM -0600, Jim Edwards wrote:
> > > >>>>>>>> Hi Rob,
> > > >>>>>>>>
> > > >>>>>>>> I assume that the same buffer can be used for multiple variables (as
> > > >>>>>>>> long as they are associated with the same file).  Is there a query
> > > >>>>>>>> function so that you know when you've used the entire buffer and it's
> > > >>>>>>>> time to flush?
> > > >>>>>>>
> > > >>>>>>> It does not appear to be so.  The only non-data-movement routines in
> > > >>>>>>> the API are these:
> > > >>>>>>>
> > > >>>>>>> int ncmpi_buffer_attach(int ncid, MPI_Offset bufsize);
> > > >>>>>>> int ncmpi_buffer_detach(int ncid);
> > > >>>>>>>
> > > >>>>>>> The end-user doesn't flush, I don't think.  I had the impression that
> > > >>>>>>> once the buffer filled up, the library did the flush, then started
> > > >>>>>>> filling up the buffer again.  This one I'll need Wei-keng to confirm.
> > > >>>>>>>
> > > >>>>>>> ==rob
> > > >>>>>>>
> > > >>>>>>>> Jim
> > > >>>>>>>>
> > > >>>>>>>> On Tue, Aug 14, 2012 at 11:41 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> On Tue, Aug 14, 2012 at 10:50:15AM -0600, Jim Edwards wrote:
> > > >>>>>>>>>> No, I'm using iput and blocking get.  I'm doing my own buffering
> > > >>>>>>>>>> layer in pio.  I might consider using the bput functions - can you
> > > >>>>>>>>>> point me to some documentation/examples?
> > > >>>>>>>>>
> > > >>>>>>>>> Sure.  It's too bad Wei-keng is on vacation this month, as he's the
> > > >>>>>>>>> one who designed and implemented this new feature for pnetcdf 1.3.0.
> > > >>>>>>>>> Wei-keng: I'm not expecting you to reply while on vacation.  I'm just
> > > >>>>>>>>> CCing you so you know I'm talking about your work :>
> > > >>>>>>>>>
> > > >>>>>>>>> I think this might be the entire contents of our documentation:
> > > >>>>>>>>>
> > > >>>>>>>>> "A new set of buffered put APIs (e.g. ncmpi_bput_vara_float) is added.
> > > >>>>>>>>> They make a copy of the user's buffer internally, so the user's buffer
> > > >>>>>>>>> can be reused when the call returns. Their usage is similar to the
> > > >>>>>>>>> iput APIs."
> > > >>>>>>>>>
> > > >>>>>>>>> Hey, check that out: Wei-keng wrote up a fortran example:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-bufferedf.F
> > > >>>>>>>>>
> > > >>>>>>>>> There's also the C version:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-buffered.c
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> ==rob
> > > >>>>>>>>>
> > > >>>>>>>>>> On Tue, Aug 14, 2012 at 10:16 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Jim
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> You've been using the new 'bput/bget' routines, right?  Can you tell
> > > >>>>>>>>>>> me a bit about what you are using them for, and what -- if any --
> > > >>>>>>>>>>> benefit they've provided?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> (Rationale: our program management likes to see papers and
> > > >>>>>>>>>>> presentations, but the most valued contribution is 'science impact').
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks
> > > >>>>>>>>>>> ==rob
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>> Rob Latham
> > > >>>>>>>>>>> Mathematics and Computer Science Division
> > > >>>>>>>>>>> Argonne National Lab, IL USA
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Rob Latham
> > > >>>>>>>>> Mathematics and Computer Science Division
> > > >>>>>>>>> Argonne National Lab, IL USA
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Rob Latham
> > > >>>>>>> Mathematics and Computer Science Division
> > > >>>>>>> Argonne National Lab, IL USA
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Jim Edwards
> > > >>>>>>>
> > > >>>>>>> CESM Software Engineering Group
> > > >>>>>>> National Center for Atmospheric Research
> > > >>>>>>> Boulder, CO
> > > >>>>>>> 303-497-1842
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Jim Edwards
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> --
> > > >>> Rob Latham
> > > >>> Mathematics and Computer Science Division
> > > >>> Argonne National Lab, IL USA
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >
> > > > --
> > > > Rob Latham
> > > > Mathematics and Computer Science Division
> > > > Argonne National Lab, IL USA
> > >
> > >
> > >
> > >
> > > --
> > > Jim Edwards
> > >
> > >
> > >
> >
> >
> >
> >
> > --
> > Jim Edwards
> >
> >
> >
> 
> 
> 
> 
> -- 
> Jim Edwards
> 
> 
> 


