[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30

Juha Jäykkä juhaj at iki.fi
Sun Oct 6 11:59:37 CDT 2013


> > Actually, I didn't ask for it. I only asked for a bug to be fixed. A bug
> > which means > 4 GB Vec cannot be saved into a HDF5 file using PETSc
> > VecView, because chunking *was* introduced, but with insane chunk sizes
> Ah, right.  I would note that using the local size is also flawed
> because we can only have 65k chunks, but we sometimes run jobs with more
> than that number of processes.  Maybe we need something like this?
> 
>   chunk_size = min(vec_size, max(avg_local_vec_size, vec_size/65k, 10 MiB), 4 GiB)

Argh, messy indeed. Are you sure you mean 65 k and not 64 Ki? I made a small 
table of the situation just to make sure I am not missing anything. In the 
table, "small" means < 4 GiB, "large" means >= 4 GiB, "few" means < 65 k ranks, 
and "many" means >= 65 k ranks. Note that local size > global size is 
impossible, but I include those rows for completeness.

  local size   global size   # ranks   chunks
  small        small         few       global size
  small        small         many      global size [1]
  small        large         few       avg local size
  small        large         many      4 GiB
  large        small         few       impossible
  large        small         many      impossible
  large        large         few       4 GiB [2]
  large        large         many      65 k chunks

[1] It sounds improbable that anyone would run a problem with < 4 GiB of data 
on >= 65 k ranks, but fortunately that case is not a problem anyway.

[2] Unless I'm mistaken, this situation will always give fewer than 65 k 
chunks with a 4 GiB chunk size.
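
Just to make sure we read the formula the same way, here is a rough C sketch 
of how I understand it. The names and helpers are mine, none of this exists 
in PETSc, and I put 64 Ki for the chunk count limit pending the 65 k vs 64 Ki 
question; all sizes are in bytes:

  #include <stdint.h>

  #define KiB        (1024ULL)
  #define MiB        (1024ULL * KiB)
  #define GiB        (1024ULL * MiB)
  #define MAX_CHUNKS (64ULL * KiB)    /* 65 k or 64 Ki, to be decided */
  #define MIN_CHUNK  (10ULL * MiB)    /* don't bother with tiny chunks */
  #define MAX_CHUNK  (4ULL * GiB)     /* HDF5's per-chunk limit */

  static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
  {
    uint64_t m = a > b ? a : b;
    return c > m ? c : m;
  }

  static uint64_t min2(uint64_t a, uint64_t b)
  {
    return a < b ? a : b;
  }

  /* chunk_size = min(vec_size, max(avg_local, vec_size/65k, 10 MiB), 4 GiB) */
  static uint64_t chunk_size(uint64_t vec_size, uint64_t avg_local_size)
  {
    uint64_t want = max3(avg_local_size, vec_size / MAX_CHUNKS, MIN_CHUNK);
    return min2(vec_size, min2(want, MAX_CHUNK));
  }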

I also believe your formula gives "the right" answer in each case. Just one 
more question: is "average local size" a good choice, or is it better to use 
"max local size"? The latter will put some unnecessary padding in the file, 
but unless I'm mistaken, the former will require extra MPI communication to 
fill in the portions of chunks belonging to ranks whose local size is less 
than the average.
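
If the max turned out to be the better choice, getting it would only cost one 
extra reduction up front. A minimal sketch, assuming we do it at VecView time 
(not actual PETSc code, and the function name is made up):

  #include <petscvec.h>

  /* Largest local size over all ranks, with a single allreduce. */
  static PetscErrorCode GetMaxLocalSize(Vec vec, PetscInt *nmax)
  {
    PetscErrorCode ierr;
    PetscInt       nlocal;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(vec, &nlocal);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&nlocal, nmax, 1, MPIU_INT, MPI_MAX,
                         PetscObjectComm((PetscObject)vec));CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }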

HDF5 really needs to fix this internally. As it stands, a single HDF5 dataset 
cannot hold more than roughly 256 TiB. Not that many people would want such 
files anyway, but then again, "640 kiB should be enough for everybody", right? 
I'm running simulations which take more than a terabyte of memory, and I'm far 
from the biggest memory consumer in the world, so the limit is not as distant 
as it might seem.
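
(For the record, that figure is just the two limits multiplied together: 
65 536 chunks × 4 GiB per chunk = 262 144 GiB = 256 TiB, or slightly less if 
the limit really is 65 000 chunks.)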

> I think we're planning to tag 3.4.3 in the next couple weeks.  There
> might be a 3.4.4 as well, but I could see going straight to 3.5.

Ok. I don't see myself having time to fix and test this in two weeks, but 
3.4.4 should be doable. Anyone else want to fix the bug by then?

Cheers,
Juha

-- 
		 -----------------------------------------------
		| Juha Jäykkä, juhaj at iki.fi			|
		| http://koti.kapsi.fi/~juhaj/			|
		 -----------------------------------------------


