[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30
Jed Brown
jedbrown at mcs.anl.gov
Wed Oct 23 07:39:19 CDT 2013
Juha Jäykkä <juhaj at iki.fi> writes:
> Hi Jed,
>
> My first attempt at sanitising chunking.
Thanks, the logic looks reasonable. Unfortunately, I don't think we can
have comprehensive tests because the large file sizes won't work in many
test environments. That said, we could have a configurable parameter
instead of hard-coding 4 GiB, in which case we could set it to 1 MiB (or
less) for testing.
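For example (just a sketch; the option name is made up and where exactly
it gets read is open for discussion):

  PetscInt max_chunk_mib = 4096;   /* default 4 GiB, expressed in MiB */
  ierr = PetscOptionsGetInt(NULL,"-hdf5_max_chunk_mib",&max_chunk_mib,NULL);CHKERRQ(ierr);
  max_chunk_size = (hsize_t)max_chunk_mib*1024*1024;

Then passing "-hdf5_max_chunk_mib 1" in a test harness would exercise the
splitting logic without actually writing multi-GiB files.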
Could you make a commit locally and use "git format-patch" to prepare a
patch (or use a bitbucket pull request)? The body of your email would
make a fantastic commit message. Instructions here:
https://bitbucket.org/petsc/petsc/wiki/Home
Further comments below.
> --- gr2.c.orig 2013-05-14 03:17:51.000000000 +0300
> +++ gr2.c 2013-10-21 19:14:11.000000000 +0300
> @@ -341,6 +341,15 @@
> const char *vecname;
> PetscErrorCode ierr;
>
> + // just in case some libraries do not empty memory for local variables:
We have to use C89-style comments because of Microsoft.
In the C language, variables are not automatically initialized. You can
do it in the declaration using
hsize_t maxDims[6] = {0},dims[6] = {0}, ...
> + for (dim=0;dim<6;dim++) {
> + chunkDims[dim] = 0;
> + maxDims[dim] = 0;
> + dims[dim] = 0;
> + count[dim] = 0;
> + offset[dim] = 0;
> + }
> +
> PetscFunctionBegin;
We cannot have executable statements before PetscFunctionBegin.
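Folding the zeroing into the declarations fixes both issues at once,
e.g. (a sketch; the remaining declarations are elided):

  hsize_t        maxDims[6] = {0},dims[6] = {0},chunkDims[6] = {0},count[6] = {0},offset[6] = {0};
  const char     *vecname;
  PetscErrorCode ierr;

  PetscFunctionBegin;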
> ierr = PetscViewerHDF5OpenGroup(viewer, &file_id, &group);CHKERRQ(ierr);
> ierr = PetscViewerHDF5GetTimestep(viewer, &timestep);CHKERRQ(ierr);
> @@ -395,6 +405,103 @@
> chunkDims[dim] = dims[dim];
> ++dim;
> #endif
> +
> + // some requirements for the chunks (assuming no split along time-step)
> + int ranks;
Declarations cannot be mixed with statements in C89.
> + MPI_Comm_size(PETSC_COMM_WORLD, &ranks);
PETSC_COMM_WORLD cannot be used in library code because this has to
work when the vector resides on a subcommunicator. Use
PetscObjectComm((PetscObject)xin). And please name the variable "size"
or "comm_size".
> + hsize_t chunk_size, target_size, dim_save,
> + vec_size = sizeof(double)*da->M*da->N*da->P*da->w,
> + avg_local_vec_size = (hsize_t) ceil(vec_size*1.0/ranks),
> + max_chunks = 65536,
> + max_chunk_size = 4294967296,
> + GiB = 1073741824,
There is a potential problem with these long integer literals: some
compilers, I believe C++ compilers prior to C++11 in particular, are not
guaranteed to give an unsuffixed literal this large a sufficiently wide
type.
One reasonable thing is to write
hsize_t Ki = 1024;
and then
Gi = Ki*Ki*Ki;
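That is (sketch):

  hsize_t Ki = 1024,Gi;
  Gi = Ki*Ki*Ki;
  max_chunks     = 64*Ki;   /* 65536 */
  max_chunk_size = 4*Gi;    /* 4294967296, built without any large literal */

Every literal stays comfortably within int range and the arithmetic is
carried out in hsize_t.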
I would prefer to move all the logic below into a function
VecGetHDF5ChunkSize(). Note that almost the same logic appears in
VecView_MPI_HDF5 (src/vec/vec/impls/mpi/pdvec.c) so both versions should
call this function. VecView_MPI_HDF5 can pass in 1D (or 2D, due to
blocks) global dimensions.
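A possible signature (entirely up for discussion; the DMDA case would
presumably also need the process-grid information):

  /* hypothetical helper shared by gr2.c and pdvec.c */
  PetscErrorCode VecGetHDF5ChunkSize(PetscInt dim,const hsize_t dims[],hsize_t max_chunk_size,hsize_t chunkDims[]);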
> +#define fmin3(a,b,c) fmin(a,fmin(b,c))
> +#define fmax3(a,b,c) fmax(a,fmax(b,c))
> +#define max(a,b) (a>=b?a:b)
Please use PetscMax() and PetscMin(). You can define the 3-way versions
at the top of the file if need be.
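E.g., at the top of the file (a sketch; these are not existing PETSc
macros):

  #define PetscMin3(a,b,c) PetscMin((a),PetscMin((b),(c)))
  #define PetscMax3(a,b,c) PetscMax((a),PetscMax((b),(c)))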
> + dim_save=dim;
> + target_size = (hsize_t) ceil(fmin3(vec_size, fmax3(avg_local_vec_size, ceil(vec_size*1.0/max_chunks), min_size), max_chunk_size));
> + chunk_size = (hsize_t) max(1,chunkDims[0])*max(1,chunkDims[1])*max(1,chunkDims[2])*max(1,chunkDims[3])*max(1,chunkDims[4])*max(1,chunkDims[5])*sizeof(PetscScalar);
> +
> + // if size/rank > max_chunk_size, we need radical measures: even going down to
> + // avg_local_vec_size is not enough, so we simply use chunk size of 4 GiB no matter
> + // what, composed in the most efficient way possible.
> + // N.B. this minimises the number of chunks, which may or may not be the optimal
> + // solution. In a BG, for example, the optimal solution is probably to make # chunks = #
> + // IO nodes involved, but this author has no access to a BG to figure out how to
> + // reliably find the right number. And even then it may or may not be enough.
> + if (avg_local_vec_size > max_chunk_size) {
> + // check if we can just split local z-axis: is that enough?
> + zslices=(hsize_t) ceil(vec_size*1.0/(da->p*max_chunk_size))*zslices;
> + if (zslices > da->P) {
> + // lattice is too large in xy-directions, splitting z only is not enough
> + zslices = da->P;
> + yslices=(hsize_t) ceil(vec_size*1.0/(zslices*da->n*max_chunk_size))*yslices;
> + if (yslices > da->N) {
> + // lattice is too large in x-direction, splitting along z, y is not enough
> + yslices = da->N;
You mixed tabs and spaces here. Please use all spaces.
> + xslices=(hsize_t) ceil(vec_size*1.0/(zslices*yslices*da->m*max_chunk_size))*xslices;
> + }
> + }
> + dim = 0;
> + if (timestep >= 0) {
> + ++dim;
> + }
> + // prefer to split z-axis, even down to planar slices
> + if (da->dim == 3) {
> + chunkDims[dim++] = (hsize_t) floor(da->P*1.0/zslices);
> + chunkDims[dim++] = (hsize_t) floor(da->N*1.0/yslices);
> + chunkDims[dim++] = (hsize_t) floor(da->M*1.0/xslices);
> + } else {
> + // This is a 2D world exceeding 4GiB in size; yes, I've seen them, even used myself
> + chunkDims[dim++] = (hsize_t) floor(da->N*1.0/yslices);
> + chunkDims[dim++] = (hsize_t) floor(da->M*1.0/xslices);
> + }
> + chunk_size = (hsize_t) fmax(1,chunkDims[0])*fmax(1,chunkDims[1])*fmax(1,chunkDims[2])*fmax(1,chunkDims[3])*fmax(1,chunkDims[4])*fmax(1,chunkDims[5])*sizeof(double);
> + } else {
> + if (target_size < chunk_size) {
> + // only change the defaults if target_size < chunk_size
> + dim = 0;
> + if (timestep >= 0) {
> + ++dim;
> + }
> + // prefer to split z-axis, even down to planar slices
> + if (da->dim == 3) {
> + // try splitting the z-axis to core-size bits, i.e. divide chunk size by # ranks in z-direction
> + if (target_size >= chunk_size/da->p) {
> + // just make chunks the size of <local_z>x<whole_world_y>x<whole_world_x>x<dof>
> + chunkDims[dim] = (hsize_t) ceil(da->P*1.0/da->p);
> + } else {
> + // oops, just splitting the z-axis is NOT ENOUGH, need to split more; let's be
> + // radical and let everyone write all they've got
> + chunkDims[dim++] = (hsize_t) ceil(da->P*1.0/da->p);
> + chunkDims[dim++] = (hsize_t) ceil(da->N*1.0/da->n);
> + chunkDims[dim++] = (hsize_t) ceil(da->M*1.0/da->m);
> + }
> + } else {
> + // This is a 2D world exceeding 4GiB in size; yes, I've seen them, even used myself
> + if (target_size >= chunk_size/da->n) {
> + // just make chunks the size of <local_z>x<whole_world_y>x<whole_world_x>x<dof>
> + chunkDims[dim] = (hsize_t) ceil(da->N*1.0/da->n);
> + } else {
> + // oops, just splitting the z-axis is NOT ENOUGH, need to split more; let's be
> + // radical and let everyone write all they've got
> + chunkDims[dim++] = (hsize_t) ceil(da->N*1.0/da->n);
> + chunkDims[dim++] = (hsize_t) ceil(da->M*1.0/da->m);
> + }
> +
> + }
> + chunk_size = (hsize_t) fmax(1,chunkDims[0])*fmax(1,chunkDims[1])*fmax(1,chunkDims[2])*fmax(1,chunkDims[3])*fmax(1,chunkDims[4])*fmax(1,chunkDims[5])*sizeof(double);
> + } else {
> + // precomputed chunks are fine, we don't need to do anything
> + }
> + }
> + dim=dim_save; // need to have the original value here no matter what!
> +
> filespace = H5Screate_simple(dim, dims, maxDims);
> if (filespace == -1) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_LIB,"Cannot H5Screate_simple()");
>