[EXTERNAL] RE: Burst buffer - blocking or unblocking transfer?

Sjaardema, Gregory D gdsjaar at sandia.gov
Wed Sep 23 16:33:35 CDT 2020


A similar use case may be a client that writes one file per checkpoint dump.  Once the current checkpoint is finished, the client closes the file and then continues with the calculations.  Does that file get moved over to stable storage at that point while the client is calculating, or does the client have to wait after each dump/close cycle?

..Greg

On 9/23/20, 3:31 PM, "parallel-netcdf on behalf of Michael Laufer" <parallel-netcdf-bounces at lists.mcs.anl.gov on behalf of michael.laufer at toganetworks.com> wrote:

    Rob,

    I am referring to a case of an application (WRF, for instance) that is writing checkpoint files periodically (no reading involved).
    So my question is: once the application hands off the write request to Parallel-NetCDF with the burst buffer enabled, and a quick write is made to the burst buffer, will the application continue to perform calculations, or will it have to wait until the file is finally transferred to (slow) stable storage before proceeding?

    Michael



    -----Original Message-----
    From: Latham, Robert J. [mailto:robl at mcs.anl.gov] 
    Sent: Wednesday, September 23, 2020 9:59 PM
    To: parallel-netcdf at lists.mcs.anl.gov; michael.laufer at toganetworks.com
    Subject: Re: Burst buffer - blocking or unblocking transfer?

    On Tue, 2020-09-22 at 18:29 +0000, Michael Laufer wrote:
    > Hi,
    >  
    > In reference to the burst buffer feature introduced in v1.10.0:
    > When the burst buffer flushes to the (long-term storage) disk, does it 
    > do so in a blocking or non-blocking fashion?
    >  
    > It appears to me that it is blocking, but I am not 100% sure. If that 
    > is the case, why not use a non-blocking (async) transfer?
    > This would allow the computation to continue while the data transfer 
    > from BB to disk is running.
    >  
    > Please let me know if I am missing something.
    > Michael Laufer
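The overlap Michael proposes can be sketched with a toy Python model. This is not PnetCDF code: the functions `drain`, `compute_step`, and `run` are hypothetical, a tuple snapshot stands in for the quick write to the burst buffer, and a dictionary stands in for stable storage. With `overlap=False` each drain runs inline (blocking); with `overlap=True` it is handed to a background worker so the next compute step can start immediately, and every outstanding drain is awaited before the run ends.

```python
from concurrent.futures import ThreadPoolExecutor


def drain(name, data, stable_storage):
    """Copy one staged checkpoint to (toy) stable storage."""
    stable_storage[name] = data
    return name


def compute_step(state):
    """Stand-in for one timestep of application computation."""
    return [x * 2 + 1 for x in state]


def run(steps, overlap):
    """Checkpoint, then compute, for `steps` iterations.

    overlap=False: wait for each drain before computing (blocking).
    overlap=True: start the drain in the background and keep computing.
    """
    stable_storage = {}
    state = [1, 2, 3]
    pending = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(steps):
            name = f"ckpt_{step}"
            # Immutable snapshot of the state; stands in for the quick
            # write into the burst buffer.
            data = tuple(state)
            if overlap:
                # Non-blocking: queue the drain and return immediately.
                pending.append(pool.submit(drain, name, data, stable_storage))
            else:
                # Blocking: the drain finishes before computation resumes.
                drain(name, data, stable_storage)
            state = compute_step(state)
        # Before finishing, every outstanding drain must complete
        # (result() also re-raises any exception from the worker).
        for fut in pending:
            fut.result()
    return state, stable_storage
```

Both modes should leave identical checkpoints in stable storage; they differ only in when the application waits. Whether PnetCDF itself has anything to overlap the drain with is what Rob's reply addresses.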

    Parallel-NetCDF could definitely issue MPI_File_iwrite calls, but what would it overlap that I/O with?  The replay from the burst buffer log to stable storage happens when pnetcdf closes a file, waits for operations to complete, flushes data, or encounters a read.

    In the first three cases, there is no operation we can overlap with the write.

    In the last case, we wait for the logs to replay so that the read returns the data the application expects.
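The trigger points listed above can be modeled with a small hypothetical class, `ToyBurstBufferFile`. It assumes a simplified log of `(offset, data)` records and a dict-based "stable storage"; it sketches the described behavior and is not PnetCDF's implementation. `put` only appends to the log, while `close`, `wait`, `flush`, and `get` each replay the whole log synchronously before returning.

```python
class ToyBurstBufferFile:
    """Toy log-based burst buffer (hypothetical, not PnetCDF's code).

    Writes are appended to a log; the log is replayed to stable storage,
    synchronously, on close, wait, flush, or before a read.
    """

    def __init__(self):
        self.log = []      # pending (offset, data) records in the burst buffer
        self.stable = {}   # byte offset -> byte value on "stable storage"
        self.replays = 0   # number of non-empty replays performed

    def put(self, offset, data):
        # Fast path: only append to the log; stable storage is untouched.
        self.log.append((offset, bytes(data)))

    def _replay(self):
        # Blocking replay: the caller does not return until the log is drained.
        if not self.log:
            return
        for offset, data in self.log:      # apply records in issue order
            for i, b in enumerate(data):
                self.stable[offset + i] = b
        self.log.clear()
        self.replays += 1

    def wait(self):
        self._replay()

    def flush(self):
        self._replay()

    def get(self, offset, length):
        # Replay first so the read sees every earlier write; unwritten bytes read as 0.
        self._replay()
        return bytes(self.stable.get(offset + i, 0) for i in range(length))

    def close(self):
        self._replay()
```

For example, two `put` calls leave stable storage empty; the following `get` replays both records and then reads them back. In every case the replay runs inside a call the application is already waiting on.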

    The big benefit of using the burst buffer feature is that it soaks up tiny noncontiguous writes.
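To see why buffering tiny noncontiguous writes can help, here is a toy Python illustration, assuming the replay step is free to merge adjacent byte ranges before touching storage. The function `coalesce` is hypothetical and does not describe how PnetCDF actually replays its log; it applies logged writes in issue order (later writes win) and returns the contiguous runs that would be sent to storage.

```python
def coalesce(writes):
    """Merge logged (offset, data) writes into contiguous runs.

    Writes are applied in issue order, so a later write to the same byte
    wins. Returns a list of (offset, bytes) runs sorted by offset.
    """
    image = {}
    for offset, data in writes:
        for i, b in enumerate(data):
            image[offset + i] = b

    runs = []
    start = prev = None
    buf = bytearray()
    for pos in sorted(image):
        if prev is not None and pos == prev + 1:
            buf.append(image[pos])           # extends the current run
        else:
            if buf:
                runs.append((start, bytes(buf)))
            start, buf = pos, bytearray([image[pos]])  # a gap starts a new run
        prev = pos
    if buf:
        runs.append((start, bytes(buf)))
    return runs
```

For example, eight 2-byte writes at offsets 0, 4, 8, and so on up to 28, followed by eight more at offsets 2, 6, and so on up to 30, collapse into a single 32-byte run starting at offset 0: sixteen tiny requests become one large one.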

    ==rob


