[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30
Juha Jäykkä
juhaj at iki.fi
Sun Oct 6 11:59:37 CDT 2013
> > Actually, I didn't ask for it. I only asked for a bug to be fixed. A bug
> > which means a > 4 GB Vec cannot be saved into an HDF5 file using PETSc
> > VecView, because chunking *was* introduced, but with insane chunk sizes
> Ah, right. I would note that using the local size is also flawed
> because we can only have 65k chunks, but we sometimes run jobs with more
> than that number of processes. Maybe we need something like this?
>
> chunk_size = min(vec_size, max(avg_local_vec_size, vec_size/65k, 10 MiB), 4 GiB)
Argh, messy indeed. Are you sure you mean 65k and not 64 Ki? I made a small
table of the situation just to make sure I am not missing anything. In the
table, "small" means < 4 GB, "large" means >= 4 GB, "few" means < 65k ranks,
"many" means >= 65k ranks. Note that local size > global size is impossible,
but I include those rows in the table for completeness's sake.
local size   global size   # ranks   resulting chunk size
----------   -----------   -------   --------------------
small        small         few       global size
small        small         many      global size [1]
small        large         few       avg local size
small        large         many      4 GiB
large        small         few       impossible
large        small         many      impossible
large        large         few       4 GiB [2]
large        large         many      global size / 65k (i.e. 65k chunks)
[1] It sounds improbable that anyone would run a problem with < 4 GiB of data
on >= 65k ranks, but fortunately it is not a problem anyway.
[2] Unless I'm mistaken, this situation always gives < 65k chunks at a 4 GiB
chunk size.
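To convince myself, here is a small standalone sketch in plain C of how I read
your formula (not PETSc code; the variable names, the byte units and the 64 Ki
reading of "65k" are my assumptions):

  #include <stdio.h>

  /* Sketch of the proposed heuristic, all sizes in bytes:
     chunk_size = min(vec_size, max(avg_local_size, vec_size/65k, 10 MiB), 4 GiB) */
  static unsigned long long chunk_size(unsigned long long vec_size,
                                       unsigned long long avg_local_size)
  {
    const unsigned long long MiB = 1024ULL*1024ULL, GiB = 1024ULL*MiB;
    const unsigned long long max_chunks = 65536ULL;  /* assuming "65k" means 64 Ki */
    unsigned long long chunk = avg_local_size;

    /* max(avg local size, vec_size/65k, 10 MiB) */
    if (chunk < vec_size/max_chunks) chunk = vec_size/max_chunks;
    if (chunk < 10*MiB)              chunk = 10*MiB;
    /* min(..., 4 GiB, vec_size) */
    if (chunk > 4*GiB)               chunk = 4*GiB;
    if (chunk > vec_size)            chunk = vec_size;
    return chunk;
  }

  int main(void)
  {
    /* e.g. an 8 GiB Vec on 16 ranks: avg local size 512 MiB -> 512 MiB chunks */
    printf("%llu\n", chunk_size(8ULL<<30, 512ULL<<20));
    return 0;
  }

For an 8 GiB Vec on 16 ranks this gives 512 MiB chunks, i.e. the "small local /
large global / few ranks" row of the table above, which is what I would expect.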
I also believe your formula gives "the right" answer in each case. Just one
more question: is "average local size" a good choice, or would "max local size"
be better? The latter will put more unnecessary (padding) data in the file, but
unless I'm mistaken, the former requires extra MPI communication at write time
to fill in the portions of ranks whose local size is less than the average.
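(Obtaining either number should be cheap, to be clear; the communication I
worry about is during the actual write, when a chunk spans more than one rank's
portion. A minimal sketch of one way to compute both candidates, with made-up
variable names:)

  #include <mpi.h>

  /* One way to obtain the two candidate chunk bases.
     local_bytes is this rank's share of the Vec, global_bytes the whole Vec. */
  static void chunk_bases(MPI_Comm comm, unsigned long long local_bytes,
                          unsigned long long global_bytes,
                          unsigned long long *avg_local,
                          unsigned long long *max_local)
  {
    int nranks;
    MPI_Comm_size(comm, &nranks);

    /* Average local size: follows from the known global size, no messages needed. */
    *avg_local = global_bytes / (unsigned long long)nranks;

    /* Maximum local size: one reduction over the communicator
       (unless the ownership ranges are already known everywhere). */
    MPI_Allreduce(&local_bytes, max_local, 1, MPI_UNSIGNED_LONG_LONG, MPI_MAX, comm);
  }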
HDF5 really needs to fix this internally. As it stands, a single HDF5 dataset
cannot hold much more than 256 TiB (65k chunks of at most 4 GiB each). Not that
many people would want such files anyway, but then again, "640 kiB should be
enough for everybody", right? I'm running simulations which take more than a
terabyte of memory, and I'm far from the biggest memory consumer in the world,
so the limit is not as distant as it might seem.
> I think we're planning to tag 3.4.3 in the next couple weeks. There
> might be a 3.4.4 as well, but I could see going straight to 3.5.
Ok. I don't see myself having time to fix and test this in two weeks, but
3.4.4 should be doable. Anyone else want to fix the bug by then?
Cheers,
Juha
--
-----------------------------------------------
| Juha Jäykkä, juhaj at iki.fi |
| http://koti.kapsi.fi/~juhaj/ |
-----------------------------------------------