[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30

Jed Brown jedbrown at mcs.anl.gov
Wed Oct 23 07:39:19 CDT 2013


Juha Jäykkä <juhaj at iki.fi> writes:

> Hi Jed, 
>
> My first attempt at sanitising chunking.

Thanks, the logic looks reasonable.  Unfortunately, I don't think we can
have comprehensive tests because the large file sizes won't work in many
test environments.  That said, we could have a configurable parameter
instead of hard-coding 4 GiB, in which case we could set it to 1 MiB (or
less) for testing.
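For example (hypothetical option name; PetscOptionsGetInt is just one
way such a parameter could be wired up):

  PetscInt max_chunk_size = 4096; /* in MiB, i.e., 4 GiB by default */
  ierr = PetscOptionsGetInt(NULL,"-hdf5_max_chunk_size",&max_chunk_size,NULL);CHKERRQ(ierr);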

Could you make a commit locally and use "git format-patch" to prepare a
patch (or use a bitbucket pull request)?  The body of your email would
make a fantastic commit message.  Instructions here:

https://bitbucket.org/petsc/petsc/wiki/Home

Further comments below.

> --- gr2.c.orig	2013-05-14 03:17:51.000000000 +0300
> +++ gr2.c	2013-10-21 19:14:11.000000000 +0300
> @@ -341,6 +341,15 @@
>    const char     *vecname;
>    PetscErrorCode ierr;
>  
> +  // just in case some libraries do not empty memory for local variables:

We have to use C89-style comments because of Microsoft.
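That is, the comment above would become

  /* just in case some libraries do not empty memory for local variables */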

In the C language, variables are not automatically initialized.  You can
do it in the declaration using

  hsize_t maxDims[6] = {0},dims[6] = {0}, ...
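For the five arrays in your loop below, that would be

  hsize_t maxDims[6] = {0},dims[6] = {0},chunkDims[6] = {0};
  hsize_t count[6] = {0},offset[6] = {0};

after which the loop can go away entirely.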

> +  for (dim=0;dim<6;dim++) {
> +    chunkDims[dim] = 0;
> +    maxDims[dim] = 0;
> +    dims[dim] = 0;
> +    count[dim] = 0;
> +    offset[dim] = 0;
> +  }
> +
>    PetscFunctionBegin;

We cannot have executable statements before PetscFunctionBegin.
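Note that initializers in declarations are fine before
PetscFunctionBegin; only executable statements are not:

  hsize_t dims[6] = {0};  /* fine: an initializer, not a statement */

  PetscFunctionBegin;     /* executable statements go after this */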

>    ierr = PetscViewerHDF5OpenGroup(viewer, &file_id, &group);CHKERRQ(ierr);
>    ierr = PetscViewerHDF5GetTimestep(viewer, &timestep);CHKERRQ(ierr);
> @@ -395,6 +405,103 @@
>    chunkDims[dim] = dims[dim];
>    ++dim;
>  #endif
> +
> +  // some requirements for the chunks (assuming no split along time-step)
> +  int ranks;

Declarations cannot be mixed with statements in C89.

> +  MPI_Comm_size(PETSC_COMM_WORLD, &ranks);

PETSC_COMM_WORLD cannot be used in library code because this has to
work when the vector resides on a subcommunicator.  Use
PetscObjectComm((PetscObject)xin).  And please name the variable "size"
or "comm_size".

> +  hsize_t chunk_size, target_size, dim_save,
> +    vec_size = sizeof(double)*da->M*da->N*da->P*da->w,
> +    avg_local_vec_size = (hsize_t) ceil(vec_size*1.0/ranks),
> +    max_chunks = 65536,
> +    max_chunk_size = 4294967296,
> +    GiB = 1073741824,

There is a potential problem with these long integer literals: some
compilers, I believe including C++ compilers prior to C++11, are not
guaranteed to give a large integer literal a sufficiently large type.
One reasonable approach is to write

  hsize_t Ki = 1024;

and then

  Gi = Ki*Ki*Ki;
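or, applied to your constants,

  hsize_t Ki = 1024;
  hsize_t max_chunks = 64*Ki;          /* 65536 */
  hsize_t max_chunk_size = 4*Ki*Ki*Ki; /* 4 GiB */

Since Ki already has type hsize_t, the products are computed in
hsize_t and nothing overflows.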


I would prefer to move all the logic below into a function
VecGetHDF5ChunkSize().  Note that almost the same logic appears in
VecView_MPI_HDF5 (src/vec/vec/impls/mpi/pdvec.c) so both versions should
call this function.  VecView_MPI_HDF5 can pass in 1D (or 2D, due to
blocks) global dimensions.
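One possible signature, only as a sketch (adjust the argument list to
whatever the two callers actually need):

  static PetscErrorCode VecGetHDF5ChunkSize(DM_DA *da,Vec xin,PetscInt dimension,
                                            hsize_t dims[],hsize_t chunkDims[]);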

> +#define fmin3(a,b,c)	fmin(a,fmin(b,c))
> +#define fmax3(a,b,c)	fmax(a,fmax(b,c))
> +#define max(a,b)        (a>=b?a:b)

Please use PetscMax() and PetscMin().  You can define the 3-way versions
at the top of the file if need be.
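For example (the names are just a suggestion):

  #define PetscMin3(a,b,c) PetscMin(PetscMin(a,b),c)
  #define PetscMax3(a,b,c) PetscMax(PetscMax(a,b),c)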

> +  dim_save=dim;
> +  target_size = (hsize_t) ceil(fmin3(vec_size, fmax3(avg_local_vec_size, ceil(vec_size*1.0/max_chunks), min_size), max_chunk_size));
> +  chunk_size = (hsize_t) max(1,chunkDims[0])*max(1,chunkDims[1])*max(1,chunkDims[2])*max(1,chunkDims[3])*max(1,chunkDims[4])*max(1,chunkDims[5])*sizeof(PetscScalar);
> +
> +  // if size/rank > max_chunk_size, we need radical measures: even going down to
> +  // avg_local_vec_size is not enough, so we simply use chunk size of 4 GiB no matter
> +  // what, composed in the most efficient way possible.
> +  // N.B. this minimises the number of chunks, which may or may not be the optimal
> +  // solution. In a BG, for example, the optimal solution is probably to make # chunks = #
> +  // IO nodes involved, but this author has no access to a BG to figure out how to
> +  // reliably find the right number. And even then it may or may not be enough.
> +  if (avg_local_vec_size > max_chunk_size) {
> +    // check if we can just split local z-axis: is that enough?
> +    zslices=(hsize_t) ceil(vec_size*1.0/(da->p*max_chunk_size))*zslices;
> +    if (zslices > da->P) {
> +      // lattice is too large in xy-directions, splitting z only is not enough
> +      zslices = da->P;
> +      yslices=(hsize_t) ceil(vec_size*1.0/(zslices*da->n*max_chunk_size))*yslices;
> +      if (yslices > da->N) {
> +	// lattice is too large in x-direction, splitting along z, y is not enough
> +	yslices = da->N;

You mixed tabs and spaces here.  Please use all spaces.

> +	xslices=(hsize_t) ceil(vec_size*1.0/(zslices*yslices*da->m*max_chunk_size))*xslices;
> +      }
> +    }
> +    dim = 0;
> +    if (timestep >= 0) {
> +      ++dim;
> +    }
> +    // prefer to split z-axis, even down to planar slices
> +    if (da->dim == 3) {
> +      chunkDims[dim++] = (hsize_t) floor(da->P*1.0/zslices);
> +      chunkDims[dim++] = (hsize_t) floor(da->N*1.0/yslices);
> +      chunkDims[dim++] = (hsize_t) floor(da->M*1.0/xslices);
> +    } else {
> +      // This is a 2D world exceeding 4GiB in size; yes, I've seen them, even used myself
> +      chunkDims[dim++] = (hsize_t) floor(da->N*1.0/yslices);
> +      chunkDims[dim++] = (hsize_t) floor(da->M*1.0/xslices);
> +    }
> +    chunk_size = (hsize_t) fmax(1,chunkDims[0])*fmax(1,chunkDims[1])*fmax(1,chunkDims[2])*fmax(1,chunkDims[3])*fmax(1,chunkDims[4])*fmax(1,chunkDims[5])*sizeof(double);
> +  } else {
> +    if (target_size < chunk_size) {
> +      // only change the defaults if target_size < chunk_size
> +      dim = 0;
> +      if (timestep >= 0) {
> +	++dim;
> +      }
> +      // prefer to split z-axis, even down to planar slices
> +      if (da->dim == 3) {
> +	// try splitting the z-axis to core-size bits, i.e. divide chunk size by # ranks in z-direction
> +	if (target_size >= chunk_size/da->p) {
> +	  // just make chunks the size of <local_z>x<whole_world_y>x<whole_world_x>x<dof>
> +	  chunkDims[dim] = (hsize_t) ceil(da->P*1.0/da->p);
> +	} else {
> +	  // oops, just splitting the z-axis is NOT ENOUGH, need to split more; let's be
> +	  // radical and let everyone write all they've got
> +	  chunkDims[dim++] = (hsize_t) ceil(da->P*1.0/da->p);
> +	  chunkDims[dim++] = (hsize_t) ceil(da->N*1.0/da->n);
> +	  chunkDims[dim++] = (hsize_t) ceil(da->M*1.0/da->m);
> +	}
> +      } else {
> +	// This is a 2D world exceeding 4GiB in size; yes, I've seen them, even used myself
> +	if (target_size >= chunk_size/da->n) {
> +	  // just make chunks the size of <local_z>x<whole_world_y>x<whole_world_x>x<dof>
> +	  chunkDims[dim] = (hsize_t) ceil(da->N*1.0/da->n);
> +	} else {
> +	  // oops, just splitting the z-axis is NOT ENOUGH, need to split more; let's be
> +	  // radical and let everyone write all they've got
> +	  chunkDims[dim++] = (hsize_t) ceil(da->N*1.0/da->n);
> +	  chunkDims[dim++] = (hsize_t) ceil(da->M*1.0/da->m);
> +	}
> +      
> +      }
> +      chunk_size = (hsize_t) fmax(1,chunkDims[0])*fmax(1,chunkDims[1])*fmax(1,chunkDims[2])*fmax(1,chunkDims[3])*fmax(1,chunkDims[4])*fmax(1,chunkDims[5])*sizeof(double);
> +    } else {
> +      // precomputed chunks are fine, we don't need to do anything
> +    }
> +  }
> +  dim=dim_save; // need to have the original value here no matter what!
> +
>    filespace = H5Screate_simple(dim, dims, maxDims);
>    if (filespace == -1) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_LIB,"Cannot H5Screate_simple()");
>  