Performance tuning problem with iput_vara_double/wait_all

Phil Miller mille121 at illinois.edu
Thu Dec 27 15:03:13 CST 2012


On Thu, Dec 27, 2012 at 2:47 PM, Wei-keng Liao
<wkliao at ece.northwestern.edu> wrote:
> It looks like you are writing one array element of 8 bytes at a time.
> Performance will definitely be poor for this I/O strategy.
>
> Please check if the indices translated by mask are actually continuous.
> If so, you can replace the loop with one write call.

Hi Wei-keng,

I don't think it should be 8 bytes (1 element) at a time - each call
should deposit 24*52 elements of data, spanning the entirety of the
pft and weeks dimensions. Do you mean that because the file data isn't
contiguous in those dimensions, the writes won't end up being
combined?

I could see what happens if I transpose the file dimensions to put pft
and weeks first, I guess.
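
A rough sketch of the layout arithmetic behind this question, assuming a row-major (C order) variable, 24 pfts, 52 weeks, and a made-up grid size of 1000 points (NGRID, the dimension orders, and the helper name are illustrative, not taken from the actual file). It counts how many contiguous pieces one grid point's 24*52 block occupies in the file under two dimension orders:

```python
from itertools import product

def contiguous_runs(shape, start, count):
    """Return (number of contiguous runs, number of elements) that a
    start/count subarray occupies in a row-major array of the given shape."""
    # Element strides for row-major layout: last dimension varies fastest.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    # Linear element offset of every element in the subarray.
    offsets = sorted(
        sum((s + k) * st for s, k, st in zip(start, idx, strides))
        for idx in product(*(range(c) for c in count))
    )
    # A new run starts wherever two neighbouring offsets are not adjacent.
    runs = 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)
    return runs, len(offsets)

NGRID, NPFT, NWEEKS = 1000, 24, 52  # NGRID is a hypothetical value

# Grid dimension varies fastest: (pft, weeks, grid), one grid point per call.
runs_a, n_a = contiguous_runs((NPFT, NWEEKS, NGRID), (0, 0, 7), (NPFT, NWEEKS, 1))

# Transposed so pft and weeks vary fastest: (grid, pft, weeks).
runs_b, n_b = contiguous_runs((NGRID, NPFT, NWEEKS), (7, 0, 0), (1, NPFT, NWEEKS))

print(runs_a, n_a)  # 1248 1248
print(runs_b, n_b)  # 1 1248
```

Under these assumptions, a call that deposits 1248 elements still lands as 1248 separate 8-byte pieces when the grid dimension varies fastest, and as a single 9984-byte piece when pft and weeks vary fastest.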

Further, the documentation seems to indicate that by using collective
data mode and non-blocking puts, the library will use MPI-IO's
collective write calls to perform a two-phase redistribute-and-write,
avoiding actually sending any tiny writes to the filesystem.
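
To make the two-phase idea concrete, here is a toy model in plain Python. It is not how MPI-IO or any particular implementation actually does it; the function name, the equal-sized file domains, and the 12-element "file" are all assumptions for illustration. Each rank holds scattered (offset, value) pieces, the pieces are routed to the aggregator that owns that part of the file, and each aggregator then issues one contiguous write:

```python
def two_phase_write(per_rank_pieces, file_len, n_aggregators):
    """Toy two-phase write. Returns (file contents, number of write requests)."""
    # Phase 1 (redistribute): split the file into equal contiguous domains
    # and hand every piece to the aggregator that owns its offset.
    domain = -(-file_len // n_aggregators)  # ceiling division
    buffers = [dict() for _ in range(n_aggregators)]
    for pieces in per_rank_pieces:
        for offset, value in pieces:
            buffers[offset // domain][offset] = value

    # Phase 2 (write): each aggregator writes its range in a single request.
    file = [None] * file_len
    requests = 0
    for buf in buffers:
        if not buf:
            continue
        lo, hi = min(buf), max(buf)
        # This toy only handles ranges with no holes in them.
        if len(buf) != hi - lo + 1:
            raise ValueError("hole in aggregator range; not handled in this sketch")
        file[lo:hi + 1] = [buf[o] for o in range(lo, hi + 1)]
        requests += 1
    return file, requests

# Four ranks, each owning every fourth element of a 12-element file.
per_rank = [[(o, float(o)) for o in range(r, 12, 4)] for r in range(4)]
naive_requests = sum(len(p) for p in per_rank)

file, requests = two_phase_write(per_rank, 12, 2)
print(naive_requests, requests)  # 12 2
```

In this toy, twelve one-element writes collapse into two six-element writes. Whether the real library gets the same benefit for a given access pattern is exactly the open question in this thread.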

Perhaps I've misunderstood something on one of those points, though?

As for the mask indices, they're explicitly not contiguous. The
assignment is round-robin over the Earth's surface, so as to obtain
good load balance, with gaps where a grid point is over water rather
than land. I have some other code that I'm adding now that fills in
the gaps with a sentinel value, but making even those contiguous will
be challenging.
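
A small sketch of why the indices don't coalesce, using a made-up 16-point grid, 4 ranks, and a fake mask where every fifth point is water (all of these numbers, and the helper name, are hypothetical). It groups indices into (start, count) runs, which is the form a single contiguous write would need:

```python
def coalesce(indices):
    """Group indices into (start, count) runs of consecutive values."""
    runs = []
    for i in sorted(indices):
        if runs and runs[-1][0] + runs[-1][1] == i:
            # Extends the current run by one element.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((i, 1))
    return runs

NPROCS, NPOINTS = 4, 16  # hypothetical sizes
land = [i for i in range(NPOINTS) if i % 5 != 0]  # fake mask: every 5th point is water

# Round-robin: rank r takes every NPROCS-th land point, starting at the r-th.
owned_by_rank1 = land[1::NPROCS]

print(coalesce(owned_by_rank1))   # [(2, 1), (7, 1), (12, 1)]
print(coalesce(land))             # [(1, 4), (6, 4), (11, 4)]
print(coalesce(range(NPOINTS)))   # [(0, 16)]
```

With round-robin dealing across more than one rank, no rank ever owns two adjacent land points, so every per-rank run has length 1. Filling water points with a sentinel makes the global index space a single run, but each rank's own share stays scattered.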

Phil

