pnetCDF performance issues
Rob Latham
robl at mcs.anl.gov
Tue Mar 8 20:09:43 CST 2011
On Tue, Mar 08, 2011 at 07:44:44PM -0600, William Gropp wrote:
> Thanks, Rob.
>
> This does raise several questions:
>
> 1) What should the defaults be so that users get good performance
> "out of the box"?
> 2) Can/should pnetCDF diagnose poor choices and inform the user?
> 3) Can/should MPI-IO "fix" this by exploiting the MPI-IO semantics
> to permit converting writes to be aligned (e.g., by caching)?
>
> Of these, (1) is the most important for pnetCDF, particularly as
> users compare approaches.
One:
pnetcdf could stat the file system, but take a peek at ROMIO's file
system detection code to see how portable statfs really is. Today
it is perhaps less of a problem than when that code was written a
decade ago. What I mean to say is: is there a portable way to
determine alignment? st_blksize is probably our best bet, but on
Lustre it matters less whether requests are block-aligned than whether
each process's accesses hit the same OST.
Just because it's hard doesn't mean we shouldn't do it, of course...
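For the st_blksize route, a minimal sketch, assuming a POSIX system: stat() the file (or its directory), treat st_blksize as the preferred alignment, and round offsets up to a multiple of it. The helper names preferred_block_size() and round_up() are hypothetical, not part of pnetcdf or ROMIO, and as said above, on Lustre the stripe layout matters more than this value.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the preferred I/O block size (st_blksize)
 * reported for `path`, or -1 if stat() fails. st_blksize is only a hint;
 * on Lustre the stripe size / OST mapping matters more. */
static long preferred_block_size(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;
    return (long) sb.st_blksize;
}

/* Round a non-negative file offset up to the next multiple of `align`. */
static long long round_up(long long off, long long align)
{
    assert(align > 0 && off >= 0);
    return ((off + align - 1) / align) * align;
}
```

A library could, for example, place the start of each variable's data at round_up(offset, preferred_block_size(dir)).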
HDF5 has this problem too: both libraries would benefit from an MPI-IO
interface to "file system features". Alignment and "optimum transfer
size" come to mind. Others, no doubt.
Two:
pnetcdf has two ways to get information back to the caller: the return
code and the info object. A read-only hint in the info object, say
"pnetcdf_how_we_doin", might do the trick.
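What such a hint might report could be as simple as a count of poorly aligned requests. A minimal sketch, assuming the library can see each request's file offset and length plus some alignment value: count_misaligned() and its message format are hypothetical, and "pnetcdf_how_we_doin" is only the name proposed here, not an existing hint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic: count requests whose start offset or length is
 * not a multiple of `align` (either one forces a partial-block access), and
 * format a short summary a library could expose as a read-only hint value. */
static int count_misaligned(const long long *offsets, const long long *lengths,
                            int n, long long align, char *msg, size_t msglen)
{
    int bad = 0;
    assert(align > 0);
    for (int i = 0; i < n; i++)
        if (offsets[i] % align != 0 || lengths[i] % align != 0)
            bad++;
    snprintf(msg, msglen, "%d of %d requests misaligned to %lld bytes",
             bad, n, align);
    return bad;
}
```

Returning the count as well as the string lets the library decide separately whether to warn through the return code.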
Three:
Some MPI-IO implementations do fix this, as long as collective I/O is
used. The MPI-IO on BlueGene, for example, always forces collective
I/O (even if operations do not overlap), then aligns file domains
to block size boundaries. I know, I just complained about how
un-portable st_blksize can be, but 'ad_bgl' gets to make some
simplifying assumptions.
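The idea behind block-aligned file domains can be shown with a simplified model (not the actual ad_bgl or ROMIO code; the function name is hypothetical): split the aggregate byte range evenly across the aggregators, then round each interior boundary up to a block multiple so no two aggregators ever write into the same block.

```c
#include <assert.h>

/* Simplified model: split [start, end) into `naggs` contiguous file domains
 * and round every interior boundary up to a multiple of `block`.
 * `bounds` must hold naggs + 1 entries; domain i is [bounds[i], bounds[i+1])
 * and may be empty when the range is small relative to the block size. */
static void aligned_file_domains(long long start, long long end, int naggs,
                                 long long block, long long *bounds)
{
    assert(naggs > 0 && block > 0 && start >= 0 && end >= start);
    long long span = end - start;
    bounds[0] = start;
    for (int i = 1; i < naggs; i++) {
        long long b = start + (span * i) / naggs;  /* even split point */
        b = ((b + block - 1) / block) * block;      /* round up to a block */
        bounds[i] = (b > end) ? end : b;            /* never past the end */
    }
    bounds[naggs] = end;
}
```

With [0, 10000), four aggregators, and 1024-byte blocks, the boundaries become 0, 3072, 5120, 8192, 10000: domains are uneven, but every interior boundary sits on a block edge.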
ROMIO, at least in recent versions, can also do some file domain magic:
- "romio_min_fdomain_size" will enforce a lower bound on the amount of
I/O an aggregator will do.
- set the "striping_unit" hint and ROMIO will ensure file domain
boundaries are aligned to a multiple of that value.
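The effect of a lower bound on file domain size can be sketched with another simplified model, in the spirit of "romio_min_fdomain_size" but not ROMIO's source (fdomain_size() is hypothetical): divide the bytes evenly, but never hand a busy aggregator less than the minimum, so small collective writes land on fewer aggregators doing bigger requests.

```c
#include <assert.h>

/* Simplified model: divide `span` bytes evenly over `naggs` aggregators, but
 * raise the per-aggregator domain size to at least `min_fd` bytes.
 * Returns the domain size; *active gets how many aggregators receive data. */
static long long fdomain_size(long long span, int naggs, long long min_fd,
                              int *active)
{
    assert(span > 0 && naggs > 0 && min_fd >= 0);
    long long fd = (span + naggs - 1) / naggs;  /* ceiling division */
    if (fd < min_fd)
        fd = min_fd;
    long long used = (span + fd - 1) / fd;      /* domains that hold data */
    *active = (int) (used < naggs ? used : naggs);
    return fd;
}
```

For a 1 MiB write over 8 aggregators with a 512 KiB minimum, only 2 aggregators end up doing I/O, each with a 512 KiB domain.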
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA