Performance tuning problem with iput_vara_double/wait_all

Wei-keng Liao wkliao at ece.northwestern.edu
Thu Dec 27 15:56:38 CST 2012


Hi, Phil,

Sorry, I misread your code.
My understanding now is that each process writes 24*52 array elements
per call, and each of those elements lands in the file (numlon*numlat)
elements away from its next/previous element.
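
Something like the following is what I picture now (a sketch; the
variable shape (pft, weeks, lat, lon) and all names below are my
guesses, not your actual code):

    #include <pnetcdf.h>

    /* one nonblocking put per (lat, lon) grid point this process owns */
    void write_point(int ncid, int varid, MPI_Offset j, MPI_Offset i,
                     const double *buf, int *req)
    {
        MPI_Offset start[4] = {0, 0, j, i};    /* (pft, weeks, lat, lon) */
        MPI_Offset count[4] = {24, 52, 1, 1};
        /* the 24*52 elements of this request sit numlat*numlon
           elements apart from one another in the file */
        ncmpi_iput_vara_double(ncid, varid, start, count, buf, req);
    }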

Each PnetCDF nonblocking API call defines an MPI derived data type (a
file type) to represent that call's file access layout. At wait time,
the file types of all pending requests are concatenated into a single
one, which is then used to make an MPI collective write call.
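
Conceptually, the file type built for one of your iput_vara_double
calls looks like the sketch below (not the actual library code; I am
only illustrating the idea):

    #include <mpi.h>

    MPI_Datatype filetype_for_request(int numlat, int numlon, int j, int i)
    {
        MPI_Datatype ftype;
        int sizes[4]    = {24, 52, numlat, numlon};  /* whole variable */
        int subsizes[4] = {24, 52, 1, 1};            /* request count  */
        int starts[4]   = {0, 0, j, i};              /* request start  */
        MPI_Type_create_subarray(4, sizes, subsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &ftype);
        MPI_Type_commit(&ftype);
        /* at wait time, such types from all pending requests are
           concatenated and passed to MPI_File_set_view before the
           collective write */
        return ftype;
    }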

The problem is that MPI-IO requires a file type's displacements to be
monotonically non-decreasing. Your code produces a concatenated file
type that violates this requirement, so PnetCDF cannot aggregate the
requests the way you expect it to. In the end, PnetCDF makes several
MPI collective I/O calls, each accessing non-contiguous file
locations.
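
Here is a toy example of the interleaving, using numlat=4, numlon=8
and only 4 of the 24*52 entries to keep the numbers small:

    #include <stdio.h>

    int main(void)
    {
        long numlat = 4, numlon = 8, plane = numlat * numlon;
        long baseA  = 1*numlon + 2;   /* request A: point (j=1, i=2) */
        long baseB  = 0*numlon + 3;   /* request B: point (j=0, i=3) */

        for (long k = 0; k < 4; k++)
            printf("A: %3ld   B: %3ld\n", baseA + k*plane, baseB + k*plane);

        /* A covers offsets 10, 42, 74, 106 and B covers 3, 35, 67, 99.
           Appending B's file type after A's jumps from offset 106 back
           to 3, and B-then-A interleaves just the same, so no ordering
           of whole requests keeps the offsets non-decreasing. */
        return 0;
    }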

Maybe high-level aggregation will be unavoidable in your case.
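For example (just a sketch, under the assumption that a contiguous
slab decomposition is acceptable for the I/O phase), you could
redistribute the points, say with MPI_Alltoallv, so that each rank
owns a contiguous band of latitudes, then issue one large write:

    #include <pnetcdf.h>

    /* after redistribution this rank owns latitudes [lat0, lat0+nlat) */
    void write_slab(int ncid, int varid, MPI_Offset lat0, MPI_Offset nlat,
                    MPI_Offset numlon, const double *slab_buf)
    {
        MPI_Offset start[4] = {0, 0, lat0, 0};
        MPI_Offset count[4] = {24, 52, nlat, numlon};
        /* one large request with monotonically increasing offsets */
        ncmpi_put_vara_double_all(ncid, varid, start, count, slab_buf);
    }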

Wei-keng

On Dec 27, 2012, at 3:03 PM, Phil Miller wrote:

> On Thu, Dec 27, 2012 at 2:47 PM, Wei-keng Liao
> <wkliao at ece.northwestern.edu> wrote:
>> It looks like you are writing one array element of 8 bytes at a time.
>> Performance will definitely be poor for this I/O strategy.
>> 
>> Please check if the indices translated by the mask are actually continuous.
>> If so, you can replace the loop with one write call.
> 
> Hi Wei-keng,
> 
> I don't think it should be 8 bytes (1 element) at a time - each call
> should deposit 24*52 elements of data, spanning the entirety of the
> pft and weeks dimensions. Do you mean that because the file data isn't
> contiguous in those dimensions, the writes won't end up being
> combined?
> 
> I could see what happens if I transpose the file dimensions to put pft
> and weeks first, I guess.
> 
> Further, the documentation seems to indicate that by using collective
> data mode and non-blocking puts, the library will use MPI-IO's
> collective write calls to perform a two-phase redistribute-and-write,
> avoiding actually sending any tiny writes to the filesystem.
> 
> Perhaps I've misunderstood something on one of those points, though?
> 
> As for the mask continuity, they're explicitly not continuous. It's
> round-robin over the Earth's surface, so as to obtain good load
> balance, with gaps where a grid point is over water versus land. I
> have some other code that I'm adding now that fills in the gaps with a
> sentinel value, but making even those continuous will be challenging.
> 
> Phil


