Hi Wei-keng,<br><br>To build r1088 with xlf, I had to edit the file src/libf/pnetcdf_inc and add a ! in front of each of the C-style comments...<br><br>Jim<br><br><div class="gmail_quote">On Thu, Aug 16, 2012 at 5:20 AM, Wei-keng Liao <span dir="ltr"><<a href="mailto:wkliao@ece.northwestern.edu" target="_blank">wkliao@ece.northwestern.edu</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">ncmpi_inq_buffer_usage and its Fortran API have been added in r1087<br>

<span class="HOEnZb"><font color="#888888"><br>

Wei-keng<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Aug 15, 2012, at 11:27 AM, Rob Latham wrote:<br>

<br>

> On Wed, Aug 15, 2012 at 10:10:02AM -0600, Jim Edwards wrote:<br>

>> Okay, so when do you need to call nfmpi_begin_indep_mode/<br>

>> nfmpi_end_indep_mode?    It doesn't seem to<br>

>> be entirely consistent anymore - is it?<br>

><br>

> nfmpi_begin_indep_mode and nfmpi_end_indep_mode should continue to<br>

> wrap the blocking and independent nfmpi_put_ and nfmpi_get routines<br>

> (those that do not end in _all).<br>

><br>

> begin/end should also bracket the independent nfmpi_wait, I think.<br>

><br>

> If you are interested, I think the reason for all this flipping around<br>

> is essentially so we can keep consistent among processors the number<br>

> of records in a record variable.<br>

><br>

> ==rob<br>

><br>

>><br>

>> On Wed, Aug 15, 2012 at 10:01 AM, Rob Latham <<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>> wrote:<br>

>><br>

>>> On Wed, Aug 15, 2012 at 09:32:56AM -0600, Jim Edwards wrote:<br>

>>>> Hi Wei-keng,<br>

>>>><br>

>>>> Yes that looks like what I would need.   I have to think about the<br>

>>>> independent aspect - currently I am using collective operations in almost<br>

>>>> all cases.  The performance trade-offs of independent vs collective<br>

>>>> operations are not really clear to me.  Why no collective bputs?<br>

>>><br>

>>> Aw, Wei-keng already replied.   Well, here's my answer, which says the<br>

>>> same thing as Wei-keng but emphasises the "put it on a list" and<br>

>>> "execute this list" aspects of these APIs.<br>

>>><br>

>>> The 'buffered put' routines are a variant of the non-blocking<br>

>>> routines.  These routines defer all I/O to the wait or wait_all<br>

>>> routine, where all pending I/O requests for a given process are<br>

>>> stitched together into one bigger request.<br>

>>><br>

>>> So, issuing an I/O operation under these interfaces is essentially<br>

>>> "put it on a list".  Then, "execute this list" can be done either<br>

>>> independently (ncmpi_wait) or collectively (ncmpi_wait_all).<br>

>>><br>

>>> A very early instance of these routines did the "put it on a list"<br>

>>> collectively.  This approach did not work out so well for applications<br>

>>> (for example, Chombo) where processes make a bunch of small<br>

>>> uncoordinated I/O requests, but still have a clear part of their code<br>

>>> where "collectively wait for everyone to finish" made sense.<br>

>>><br>

>>> I hope you have enjoyed today's episode of Parallel-NetCDF history<br>

>>> theater.<br>

>>><br>

>>> ==rob<br>

>>><br>

>>>> On Wed, Aug 15, 2012 at 9:18 AM, Wei-keng Liao<br>

>>>> <<a href="mailto:wkliao@ece.northwestern.edu">wkliao@ece.northwestern.edu</a>>wrote:<br>

>>>><br>

>>>>>> The  NC_EINSUFFBUF error code is returned from the bput call?<br>

>>>>><br>

>>>>> I found a bug that 1.3.0 fails to return this error code. r1086 fixes<br>

>>> this<br>

>>>>> bug.<br>

>>>>><br>

>>>>><br>

>>>>>>  If you get that error will you need to make that same bput call<br>

>>> again<br>

>>>>> after flushing?  But the other tasks involved in the same bput call who<br>

>>>>> didn't have full buffers would do what?<br>

>>>>><br>

>>>>> My idea is to skip the bput request when NC_EINSUFFBUF is returned.<br>

>>>>> Flushing at the wait call will only flush those successful bput calls,<br>

>>> so<br>

>>>>> yes<br>

>>>>> you need to make the same failed bput call again after flushing.<br>
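<br>
The skip, flush, and retry sequence described above can be sketched with a toy model. This is not PnetCDF code: toy_buffer, toy_bput, toy_wait, toy_bput_retry, and the TOY_ codes are hypothetical stand-ins for the attached buffer, a bput call, a wait call, and the NC_EINSUFFBUF return, used here only to show the control flow.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the per-file attached buffer; not a PnetCDF type. */
typedef struct {
    size_t capacity; /* size given at attach time */
    size_t used;     /* bytes held by pending requests */
    int flushes;     /* number of simulated wait calls */
} toy_buffer;

/* Placeholder codes, not the library's actual values. */
enum { TOY_OK = 0, TOY_EINSUFFBUF = -1 };

/* Simulated bput: if the request does not fit, it is skipped. */
static int toy_bput(toy_buffer *b, size_t nbytes)
{
    assert(b->used <= b->capacity);
    if (nbytes > b->capacity - b->used)
        return TOY_EINSUFFBUF;
    b->used += nbytes;
    return TOY_OK;
}

/* Simulated wait: pending requests are written, buffer is marked empty. */
static void toy_wait(toy_buffer *b)
{
    b->used = 0;
    b->flushes++;
}

/* Post a request; if it was skipped, flush and post the same request again. */
static int toy_bput_retry(toy_buffer *b, size_t nbytes)
{
    int err = toy_bput(b, nbytes);
    if (err == TOY_EINSUFFBUF) {
        toy_wait(b);
        err = toy_bput(b, nbytes);
    }
    return err;
}
```

A request larger than the whole buffer still fails after the flush, so the caller has to handle that case separately. In a real program the flush would be a wait call, with the independent versus collective trade-off discussed in this thread.<br>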

>>>>><br>

>>>>> Please note that bput APIs are independent. There is no "other tasks in<br>

>>>>> the same bput call" issue.<br>

>>>>><br>

>>>>><br>

>>>>>> I could use a query function and to avoid the independent write calls<br>

>>>>> would do an mpi_allreduce on the max memory used before calling the<br>

>>>>> mpi_waitall.  If the max is approaching the buffer size I would flush<br>

>>> all<br>

>>>>> io tasks. This is basically what I have implemented in pio with iput -<br>

>>> I<br>

>>>>> have a user determined limit on the size of the buffer and grow the<br>

>>> buffer<br>

>>>>> with each iput call, when the buffer meets (or exceeds) the limit on<br>

>>> any<br>

>>>>> task I call waitall on all tasks.<br>

>>>>><br>

>>>>> This is a nice idea.<br>

>>>>><br>

>>>>><br>

>>>>> Please let me know if the new query API below will be sufficient for<br>

>>> you.<br>

>>>>><br>

>>>>>  int ncmpi_inq_buffer_usage(int ncid, MPI_Offset *usage);<br>

>>>>><br>

>>>>>  * "usage" will be returned with the current buffer usage in bytes.<br>

>>>>>  * Error codes may be invalid ncid or no attached buffer found.<br>
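<br>
A minimal sketch of the flush decision discussed above, written without MPI so it stands alone. The helper name should_flush and the threshold parameter are assumptions for illustration. Each usage[i] stands in for the value ncmpi_inq_buffer_usage would report on task i, and the loop stands in for an MPI_Allreduce with MPI_MAX, so that every task reaches the same answer before a collective wait_all.<br>

```c
#include <assert.h>

/* Hypothetical helper, not part of PnetCDF.
 * usage[i]  : buffer usage in bytes on task i (long long stands in
 *             for MPI_Offset)
 * bufsize   : size passed to the attach call
 * threshold : fraction of bufsize at which all tasks should flush
 * Returns 1 if every task should call the collective wait, else 0. */
static int should_flush(const long long *usage, int ntasks,
                        long long bufsize, double threshold)
{
    assert(ntasks > 0);

    /* Stand-in for MPI_Allreduce(..., MPI_MAX, ...) across tasks. */
    long long max_usage = usage[0];
    for (int i = 1; i < ntasks; i++)
        if (usage[i] > max_usage)
            max_usage = usage[i];

    return (double)max_usage >= threshold * (double)bufsize;
}
```

Because the decision depends only on the global maximum, a single nearly full task is enough to make all tasks flush together, which matches the approach described for pio with iput.<br>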

>>>>><br>

>>>>><br>

>>>>><br>

>>>>> Wei-keng<br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>>><br>

>>>>>><br>

>>>>>> On Tue, Aug 14, 2012 at 10:07 PM, Wei-keng Liao <<br>

>>>>> <a href="mailto:wkliao@ece.northwestern.edu">wkliao@ece.northwestern.edu</a>> wrote:<br>

>>>>>> Hi, Jim,<br>

>>>>>><br>

>>>>>> The usage of bput APIs is very similar to iput, except the<br>

>>> following:<br>

>>>>>> 1. users must tell pnetcdf the size of buffer to be used by pnetcdf<br>

>>>>> internally (attach and detach calls).<br>

>>>>>> 2. once a bput API returns, user's buffer can be reused or freed<br>

>>>>> (because the write<br>

>>>>>>  data has been copied to the internal buffer.)<br>

>>>>>><br>

>>>>>> The internal buffer is per file (as the attach API requires an ncid<br>

>>>>> argument.) It can be used to aggregate<br>

>>>>>> requests to multiple variables defined in the file.<br>

>>>>>><br>

>>>>>> I did not implement a query API to check the current usage of the<br>

>>>>> buffer. If this query is useful, we<br>

>>>>>> can implement it. Let me know. But please note this query will be an<br>

>>>>> independent call, so you<br>

>>>>>> will have to call independent wait (nfmpi_wait). Independent wait<br>

>>> uses<br>

>>>>> MPI independent I/O, causing<br>

>>>>>> poor performance, and is not recommended. Otherwise, you need an MPI reduce<br>

>>> to<br>

>>>>> ensure all processes know<br>

>>>>>> when to call the collective wait_all.<br>

>>>>>><br>

>>>>>> You are right about flushing. The buffer will not be flushed<br>

>>>>> automatically and all file I/O happens in wait_all.<br>

>>>>>> If the attached buffer runs out of space, the NC_EINSUFFBUF error code<br>

>>>>> (non-fatal) is returned. It can be<br>

>>>>>> used to decide when to call the wait API, as described above. However,<br>

>>>>> automatic flushing would require MPI<br>

>>>>>> independent I/O, again meaning poor performance. So, I recommend<br>

>>>>> making sure the buffer size is<br>

>>>>>> sufficiently large. In addition, if you let pnetcdf do type conversion<br>

>>>>> between two types of different size<br>

>>>>>> (e.g. short to int), you must calculate the size of attach buffer<br>

>>> using<br>

>>>>> the larger type.<br>
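<br>
The sizing rule above (use the larger of the two types when conversion is involved) can be written as a small calculation. The helper name bput_request_bytes is hypothetical; the result is the number of bytes to budget for one request when deciding what size to pass to the attach call.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PnetCDF: bytes to budget for one
 * bput request when the in-memory and in-file element sizes differ.
 * The larger of the two element sizes is used. */
static size_t bput_request_bytes(size_t nelems,
                                 size_t mem_elem_size,
                                 size_t file_elem_size)
{
    size_t larger = mem_elem_size > file_elem_size ? mem_elem_size
                                                   : file_elem_size;

    /* Guard against overflow in the multiplication below. */
    assert(nelems == 0 || larger <= (size_t)-1 / nelems);

    return nelems * larger;
}
```

For example, 1000 elements converted from short to int would be budgeted as 1000 * sizeof(int) bytes. The attach size would then be the sum of this value over all requests expected to be pending before a wait.<br>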

>>>>>><br>

>>>>>> If automatic flushing is highly desired, we can add it later.<br>

>>>>>><br>

>>>>>> Once the call to wait/wait_all returns, the internal buffer is marked<br>

>>>>> empty.<br>

>>>>>><br>

>>>>>> Let me know if the above answers your questions.<br>

>>>>>><br>

>>>>>> Wei-keng<br>

>>>>>><br>

>>>>>> On Aug 14, 2012, at 2:04 PM, Jim Edwards wrote:<br>

>>>>>><br>

>>>>>>> No, the flush must happen in the nfmpi_wait_all.<br>

>>>>>>> But does that call mark the buffer as empty?  I'll wait and bug<br>

>>>>>>> Wei-keng.<br>

>>>>>>><br>

>>>>>>> On Tue, Aug 14, 2012 at 12:56 PM, Rob Latham <<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>><br>

>>> wrote:<br>

>>>>>>> On Tue, Aug 14, 2012 at 12:52:46PM -0600, Jim Edwards wrote:<br>

>>>>>>>> Hi Rob,<br>

>>>>>>>><br>

>>>>>>>> I assume that the same buffer can be used for multiple variables<br>

>>> (as<br>

>>>>> long<br>

>>>>>>>> as they are associated with the same file).    Is there a query<br>

>>>>> function so<br>

>>>>>>>> that you know when you've used the entire buffer and it's time to<br>

>>>>> flush?<br>

>>>>>>><br>

>>>>>>> It does not appear to be so.  The only non-data-movement routines<br>

>>> in<br>

>>>>>>> the API are these:<br>

>>>>>>><br>

>>>>>>> int ncmpi_buffer_attach(int ncid, MPI_Offset bufsize);<br>

>>>>>>> int ncmpi_buffer_detach(int ncid);<br>

>>>>>>><br>

>>>>>>> The end-user doesn't flush, I don't think.  I had the impression<br>

>>> that<br>

>>>>> once the<br>

>>>>>>> buffer filled up, the library did the flush, then started filling<br>

>>> up<br>

>>>>> the buffer<br>

>>>>>>> again.  This one I'll need Wei-keng to confirm.<br>

>>>>>>><br>

>>>>>>> ==rob<br>

>>>>>>><br>

>>>>>>>> Jim<br>

>>>>>>>><br>

>>>>>>>> On Tue, Aug 14, 2012 at 11:41 AM, Rob Latham <<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>><br>

>>>>> wrote:<br>

>>>>>>>><br>

>>>>>>>>> On Tue, Aug 14, 2012 at 10:50:15AM -0600, Jim Edwards wrote:<br>

>>>>>>>>>> No, I'm using iput and blocking get.   I'm doing my own<br>

>>> buffering<br>

>>>>> layer<br>

>>>>>>>>> in<br>

>>>>>>>>>> pio.   I might consider using the bput functions - can you<br>

>>> point me<br>

>>>>> to<br>

>>>>>>>>> some<br>

>>>>>>>>>> documentation/examples?<br>

>>>>>>>>><br>

>>>>>>>>> Sure.  It's too bad Wei-keng is on vacation this month, as he's<br>

>>> the<br>

>>>>>>>>> one who designed and implemented this new feature for pnetcdf<br>

>>> 1.3.0.<br>

>>>>>>>>> Wei-keng: I'm not expecting you to reply while on vacation.  I'm<br>

>>> just<br>

>>>>>>>>> CCing you so you know I'm talking about your work :><br>

>>>>>>>>><br>

>>>>>>>>> I think this might be the entire contents of our documentation:<br>

>>>>>>>>><br>

>>>>>>>>> "A new set of buffered put APIs (eg. ncmpi_bput_vara_float) is<br>

>>> added.<br>

>>>>>>>>> They make a copy of the user's buffer internally, so the user's<br>

>>>>> buffer<br>

>>>>>>>>> can be reused when the call returns. Their usage are similar to<br>

>>> the<br>

>>>>>>>>> iput APIs. "<br>

>>>>>>>>><br>

>>>>>>>>> Hey, check that out: Wei-keng wrote up a fortran example:<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>><br>

>>> <a href="http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-bufferedf.F" target="_blank">http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-bufferedf.F</a><br>


>>>>>>>>><br>

>>>>>>>>> There's also the C version:<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>><br>

>>> <a href="http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-buffered.c" target="_blank">http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-write-buffered.c</a><br>


>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> ==rob<br>

>>>>>>>>><br>

>>>>>>>>>> On Tue, Aug 14, 2012 at 10:16 AM, Rob Latham <<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>><br>

>>>>> wrote:<br>

>>>>>>>>>><br>

>>>>>>>>>>> Hi Jim<br>

>>>>>>>>>>><br>

>>>>>>>>>>> You've been using the new 'bput/bget' routines, right?  Can you<br>

>>>>> tell<br>

>>>>>>>>>>> me a bit about what you are using them for, and what -- if any<br>

>>> --<br>

>>>>>>>>>>> benefit they've provided?<br>

>>>>>>>>>>><br>

>>>>>>>>>>> (Rationale: our program management likes to see papers and<br>

>>>>>>>>>>> presentations, but the most valued contribution is 'science<br>

>>>>> impact').<br>

>>>>>>>>>>><br>

>>>>>>>>>>> Thanks<br>

>>>>>>>>>>> ==rob<br>

>>>>>>>>>>><br>

>>>>>>>>>>> --<br>

>>>>>>>>>>> Rob Latham<br>

>>>>>>>>>>> Mathematics and Computer Science Division<br>

>>>>>>>>>>> Argonne National Lab, IL USA<br>


>>>>>>> --<br>

>>>>>>> Jim Edwards<br>

>>>>>>><br>

>>>>>>> CESM Software Engineering Group<br>

>>>>>>> National Center for Atmospheric Research<br>

>>>>>>> Boulder, CO<br>

>>>>>>> <a href="tel:303-497-1842" value="+13034971842">303-497-1842</a><br>


<br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><pre>Jim Edwards<br><br><br></pre><br>