Burst buffer - blocking or unblocking transfer?

Michael Laufer michael.laufer at toganetworks.com
Wed Sep 23 14:34:23 CDT 2020


Rob,

I am referring to a case of an application (WRF, for instance) that is writing checkpoint files periodically (no reading involved).
So my question is, once the application hands off the write request to Parallel-NetCDF with burst buffer and a quick write is made to the burst buffer, will the application continue to perform calculations or will it have to wait until the file is finally transferred to (slow) stable storage to proceed?

Michael



-----Original Message-----
From: Latham, Robert J. [mailto:robl at mcs.anl.gov] 
Sent: Wednesday, September 23, 2020 9:59 PM
To: parallel-netcdf at lists.mcs.anl.gov; michael.laufer at toganetworks.com
Subject: Re: Burst buffer - blocking or unblocking transfer?

On Tue, 2020-09-22 at 18:29 +0000, Michael Laufer wrote:
> Hi,
>  
> In reference to the burst buffer feature introduced in v1.10.0:
> When the burst buffer flushes to the (long term storage) disk, does it 
> do so in a blocking or unblocking fashion?
>  
> It appears to me that it is blocking, but I am not 100% sure. If that 
> is the case, why not use an unblocking (async) transfer?
> This would allow the computation to continue while the data transfer 
> from BB to disk is running.
>  
> Please let me know if I am missing something.
> Michael Laufer

Parallel-NetCDF could definitely issue MPI_File_iwrite calls, but with what would it overlap that I/O?  The replay from burst buffer log to stable storage happens when pnetcdf closes a file, waits for operations to complete, flushes data, or finds a read.

In the first three cases, there is no operation we can overlap with the write.

In the last case, we wait for the logs to replay so we read back data as the application expects it.

The big benefit of using the burst buffer feature is to soak up tiny noncontiguous writes.
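
To make the four replay triggers above concrete, here is a small toy model in Python. It is only a sketch of the behaviour described in this message, not PnetCDF's implementation or API: the class BurstBufferLog and its methods put, get, wait, flush and close are hypothetical names invented for illustration. Writes are appended to an in-memory log, and the log is replayed, blocking the caller, only on wait, flush, close, or a read.

```python
class BurstBufferLog:
    """Toy model (hypothetical, not PnetCDF code) of a log replayed to storage."""

    def __init__(self):
        self.log = []       # pending (offset, data) records; stands in for the burst buffer
        self.storage = {}   # offset -> data; stands in for the file on stable storage
        self.replays = []   # name of the call that triggered each replay

    def put(self, offset, data):
        # A write only appends to the log; nothing reaches storage yet.
        self.log.append((offset, data))

    def _replay(self, trigger):
        # Blocking: control returns to the caller only once the log is drained.
        if not self.log:
            return
        for offset, data in self.log:
            self.storage[offset] = data
        self.log.clear()
        self.replays.append(trigger)

    def wait(self):
        self._replay("wait")

    def flush(self):
        self._replay("flush")

    def close(self):
        self._replay("close")

    def get(self, offset):
        # A read must see earlier writes, so the log is replayed first.
        self._replay("read")
        return self.storage.get(offset)


def demo():
    bb = BurstBufferLog()
    bb.put(0, "a")
    bb.put(8, "b")
    pending_before = len(bb.log)  # both writes are still only in the log
    value = bb.get(8)             # the read forces a replay
    bb.put(16, "c")
    bb.close()                    # close forces the final replay
    return pending_before, value, bb.replays, sorted(bb.storage)


if __name__ == "__main__":
    print(demo())
```

In this model the individual put calls return immediately, and the cost of moving data to storage is paid at the next read, wait, flush or close.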

==rob

