[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30

Jed Brown jedbrown at mcs.anl.gov
Sun Aug 18 08:10:19 CDT 2013


Juha Jäykkä <juhaj at iki.fi> writes:

> For small files, chunking is probably not going to change performance in any 
> significant manner, so one option could be to simply not chunk small files at 
> all

This is effectively what is done now, considering that HDF5 needs
chunking to be enabled to use H5S_UNLIMITED.

> and then chunk big files "optimally" – whatever that means. The HDF Group
> seems to think that "the chunk size be approximately equal to the
> average expected size of the data block needed by the application."
> (http://www.hdfgroup.org/training/HDFtraining/UsersGuide/Perform.fm2.html)
> For more chunking stuff:
>
> In the case of PETSc I think that means not the WHOLE application, but one MPI 
> rank (or perhaps one SMP host running a mixture of MPI ranks and OpenMP 
> threads), which is probably always going to be < 4 GB (except perhaps in the 
> mixture case).

Output uses a collective write, so the granularity of the IO node is
probably more relevant for writing (e.g., BG/Q would have one IO node
per 128 compute nodes), but almost any chunk size should perform
similarly.  It would make a lot more difference for something like
visualization where subsets of the data are read, typically with
independent IO.
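
For concreteness, a collective write with the HDF5 C API looks roughly
like this (an untested sketch, not PETSc's actual code; the file and
dataset names are made up):

  #include <hdf5.h>
  #include <mpi.h>
  #include <stdlib.h>

  /* Sketch: each rank writes its slab of a 1-D dataset collectively. */
  int main(int argc, char **argv)
  {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    hsize_t nlocal = 1024, nglobal = nlocal * size, offset = nlocal * rank;
    double *buf = malloc(nlocal * sizeof(double));
    for (hsize_t i = 0; i < nlocal; i++) buf[i] = (double)(offset + i);

    /* Open the file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    hid_t filespace = H5Screate_simple(1, &nglobal, NULL);
    hid_t dset = H5Dcreate2(file, "x", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Select this rank's hyperslab and write it collectively. */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &nlocal, NULL);
    hid_t memspace = H5Screate_simple(1, &nlocal, NULL);
    hid_t xfer = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(xfer, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, xfer, buf);

    H5Pclose(xfer); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    free(buf);
    MPI_Finalize();
    return 0;
  }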

> turning chunking completely off works too

Are you sure?  Did you try writing a second time step?  The
documentation says that H5S_UNLIMITED requires chunking.
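
For reference, extending a dataset only works with a chunked layout; a
minimal untested sketch (sizes and names are illustrative):

  #include <hdf5.h>

  int main(void)
  {
    hsize_t dims[1]    = {1000};
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hsize_t chunk[1]   = {1000};

    hid_t file  = H5Fcreate("ts.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    /* Required: H5Dcreate2 fails for unlimited dims with contiguous layout. */
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset  = H5Dcreate2(file, "ts", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Appending a second time step means growing the dataset: */
    hsize_t newdims[1] = {2000};
    H5Dset_extent(dset, newdims);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
  }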

> See above, but note also that there can at most be 64k chunks in the file, so 
> fixing the chunk size to 10 MiB means limiting file size to 640 GiB.

Thanks for noticing this limit.  This might come from the 64k limit
on attribute sizes.

> My suggestion is to give PETSc a little more logic here, something like this:
>
> if sizeof(data) > 4GiB * 64k: no chunking # impossible to chunk!
> elif sizeof(data) < small_file_limit: no chunking # probably best for speed
> elif current rank's data size < 4 GB: chunk using current ranks data size

Chunk size needs to be collective.  We could compute an average size
from each subdomain, but can't just use the subdomain size.
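
A sketch of what I mean (illustrative, not actual PETSc code):

  #include <hdf5.h>
  #include <mpi.h>

  /* Agree on one chunk size across ranks by averaging the local
     subdomain sizes; the helper name and heuristic are made up. */
  static hsize_t average_chunk_size(hsize_t nlocal, MPI_Comm comm)
  {
    unsigned long long local = (unsigned long long)nlocal, sum;
    int size;
    MPI_Comm_size(comm, &size);
    MPI_Allreduce(&local, &sum, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
    return (hsize_t)(sum / size); /* identical on every rank, so safe for H5Pset_chunk */
  }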

> else divide current rank's data size by 2**(number of dimensions) until < 4 GB 
> and then use that chunk size.

We might want the chunk size to be smaller than 4GiB anyway to avoid
out-of-memory problems for readers and writers.
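
The halving heuristic with a smaller cap might look like this (untested
sketch; the 256 MiB cap is only an example of "smaller than 4 GiB"):

  #include <hdf5.h>

  /* Halve every chunk dimension until the chunk fits under the cap. */
  static void shrink_chunk(int ndims, hsize_t chunk[], size_t elem_size)
  {
    const unsigned long long cap = 256ULL << 20; /* bytes */
    for (;;) {
      unsigned long long bytes = elem_size;
      int d, shrunk = 0;
      for (d = 0; d < ndims; d++) bytes *= chunk[d];
      if (bytes <= cap) return;
      for (d = 0; d < ndims; d++)
        if (chunk[d] > 1) { chunk[d] = (chunk[d] + 1) / 2; shrunk = 1; }
      if (!shrunk) return; /* every dimension is already 1; give up */
    }
  }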


I think the chunk size (or maximum chunk size) should be settable by the user.
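
Something like this, perhaps (the option name -viewer_hdf5_chunk_size is
hypothetical, and the PetscOptionsGetInt signature is the one in recent
PETSc releases):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscInt  max_chunk = 256; /* hypothetical cap in MiB */
    PetscBool set;
    PetscInitialize(&argc, &argv, NULL, NULL);
    PetscOptionsGetInt(NULL, NULL, "-viewer_hdf5_chunk_size", &max_chunk, &set);
    PetscPrintf(PETSC_COMM_WORLD, "max chunk size: %d MiB\n", (int)max_chunk);
    PetscFinalize();
    return 0;
  }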