[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30

Juha Jäykkä juhaj at iki.fi
Sat Oct 5 02:50:14 CDT 2013


> What range of chunk sizes are you using?  For each fixed number of
> ranks, how does the performance vary when varying chunk size from, say,
> 5MB to 500MB?

I didn't write down the results as they were just a byproduct of getting 
usable performance, but I will do that when I have the time.

> > Why not use the size of the local part of the DA/Vec? That would guarantee
> That's fine, but the chunk size needs to be *collective* so we need to
> do a reduction or otherwise compute the "average size".

I guess I was lucky to have every rank's local Vec be precisely the same size (no 
wonder: my test was a 1024^3 lattice on 4096 ranks, which means 64^3 points per 
rank).

What happens when, say, one has three ranks with Vec lengths of 120, 120, and 128 
bytes (i.e. 15, 15, and 16 doubles)? The average becomes 122.667, which isn't even 
an integer. What should the chunk dimensions be here? Ceil(122.667), perhaps? But 
then the sum of the chunk sizes exceeds the data size, so there will be extra 
space in the file (which I presume HDF5 will take care of when accessing the 
file), is that right?

I'm just asking all these stupid questions since you asked for a patch. ;)

> > Is the granularity (number of ranks actually doing disc IO) settable on
> > HDF5 side or does that need to be set in MPI-IO?
> I'm not sure what you mean.  On a system like BG, the computed nodes are
> not connected to disks and instead have to send the data to IO nodes.
> The distribution of IO nodes is part of the machine design.  The ranks
> participating in IO are just rearranging data before sending it to the
> IO nodes.

Sorry, I forgot about BG and its kin. I meant the ranks participating in IO. 
Obviously the number of IO nodes is determined by the hardware. I just had the 
Cray XC30 in mind, where the ranks participating in IO are the IO nodes too, so I 
didn't think to make the distinction between "ranks doing MPI-IO" and "IO nodes", 
which of course I should have.

> Send a patch (or submit a pull request) against 'maint' and we'll
> consider it.  As long as the change doesn't break any existing uses, it
> could be merged to 'maint' (thus v3.4.k for k>=3) after testing.

I'll try to get something useful in. What's the timetable?

Cheers,
Juha

-- 
		 -----------------------------------------------
		| Juha Jäykkä, juhaj at iki.fi			|
		| http://koti.kapsi.fi/~juhaj/			|
		 -----------------------------------------------


