[petsc-users] Unable to create >4GB sized HDF5 files on Cray XC30

Jed Brown jedbrown at mcs.anl.gov
Fri Oct 4 16:23:00 CDT 2013


Juha Jäykkä <juhaj at iki.fi> writes:

> On Sunday 18 August 2013 08:10:19 Jed Brown wrote:
>> Output uses a collective write, so the granularity of the IO node is
>> probably more relevant for writing (e.g., BG/Q would have one IO node
>> per 128 compute nodes), but almost any chunk size should perform
>> similarly.  It would make a lot more difference for something like
>
> I ran into this on a Cray XC30 and it's certainly not the case there that any 
> chunk size performs even close to similarly: I can get IO throughput from 
> roughly 50 MB/s to 16 GB/s depending on the chunk sizes and number of ranks 
> participating in the MPI IO operation (underneath H5Dwrite()). 

What range of chunk sizes are you using?  For each fixed number of
ranks, how does the performance vary when varying chunk size from, say,
5MB to 500MB?
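
For context, here is a minimal sketch of where the chunk size actually
enters on the HDF5 side: chunk dimensions are fixed on the dataset
creation property list before H5Dcreate.  The dataset name, rank, and
sizes are illustrative only (not what the PETSc viewer uses), and, if I
recall correctly, each chunk also has to stay under HDF5's 4 GB
per-chunk limit, which is presumably where the failure in the subject
line comes from.

  /* Illustrative sketch only: create a 1-D chunked dataset.  The
   * dataset name "/x" and the chunk extent are assumptions. */
  #include <hdf5.h>

  hid_t create_chunked_dataset(hid_t file_id, hsize_t global_n, hsize_t chunk_n)
  {
    hsize_t dims[1]  = {global_n};
    hsize_t chunk[1] = {chunk_n};   /* e.g. the local Vec size, or a capped value */

    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);   /* chunk layout must be set before H5Dcreate */

    hid_t dset = H5Dcreate2(file_id, "/x", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;                    /* caller closes with H5Dclose() */
  }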

> Yes, this certainly needs to be considered, too. I guess huge chunks are bad 
> here?

Likely, but it depends on what you are looking at.

>> Chunk size needs to be collective.  We could compute an average size
>> from each subdomain, but can't just use the subdomain size.
>
> Why not use the size of the local part of the DA/Vec? That would guarantee

That's fine, but the chunk size needs to be *collective*, so we need to
do a reduction or otherwise compute the "average size".
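
As an illustration of that reduction (not the actual code in gr2.c),
one way to get a chunk size that every rank agrees on is to average
the local sizes with an MPI_Allreduce:

  /* Sketch only: compute a collectively-agreed chunk size as the mean
   * of the per-rank local sizes.  Every rank receives the same sum and
   * therefore computes the same result. */
  #include <hdf5.h>
  #include <mpi.h>

  static hsize_t collective_chunk_size(MPI_Comm comm, hsize_t local_n)
  {
    unsigned long long sum = 0, mine = (unsigned long long)local_n;
    int size;

    MPI_Comm_size(comm, &size);
    MPI_Allreduce(&mine, &sum, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
    return (hsize_t)(sum / (unsigned long long)size);
  }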

>> I think the chunk size (or maximum chunk size) should be settable by the
>> user.
>
> I agree, that would be the best solution.
>
> Is the granularity (number of ranks actually doing disc IO) settable on HDF5 
> side or does that need to be set in MPI-IO?

I'm not sure what you mean.  On a system like BG, the compute nodes are
not connected to disks and instead have to send the data to IO nodes.
The distribution of IO nodes is part of the machine design.  The ranks
participating in IO are just rearranging data before sending it to the
IO nodes.
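
That said, on ROMIO-based stacks (including Cray's MPICH) the number of
aggregator ranks that actually touch the disk can usually be influenced
through MPI-IO info hints, which HDF5 forwards if you attach them to
the file access property list.  The hint names and values below are a
sketch; whether they are honored, and what the defaults are, depends on
the MPI implementation and the site configuration.

  /* Sketch only: pass collective-buffering hints to MPI-IO via HDF5
   * (requires a parallel HDF5 build).  "romio_cb_write" and "cb_nodes"
   * are ROMIO/Cray-MPICH hint names; the value 16 is illustrative. */
  #include <hdf5.h>
  #include <mpi.h>

  hid_t fapl_with_io_hints(MPI_Comm comm)
  {
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable"); /* force collective buffering */
    MPI_Info_set(info, "cb_nodes", "16");           /* number of aggregator ranks */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);             /* hints travel with the fapl */
    MPI_Info_free(&info);
    return fapl;   /* use with H5Fcreate/H5Fopen; H5Pclose when done */
  }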

> Any idea which version of PETSc this fix might get into? I currently keep my 
> own patched version of gr2.c around, which uses local-Vec-size chunks and it 
> works ok, but I'd like to be able to use vanilla PETSc again.

Send a patch (or submit a pull request) against 'maint' and we'll
consider it.  As long as the change doesn't break any existing uses, it
could be merged to 'maint' (thus v3.4.k for k>=3) after testing.