[petsc-users] Load Vec from 1D HDF5 dataset MATT READ THIS EMAIL!

Fri Mar 20 02:27:21 CDT 2015

In my opinion, one should handle a 1D Vec as a 1D dataset and vice 
versa. Why? Because that is the most logical behaviour for the users.

In the code I am working on, I use Python, with h5py, to generate grids, 
which is used to create a DMDA, and the initial condition on this DMDA 
is again loaded from the same HDF5 file created by the Python script. 
This works GREAT, expect reading these 1D datasets (grid point locations).

When dealing with DMDA's this is simple. If I have a 3D DMDA with dof=1, 
then I create a numpy array with three dimensions, fill it with data and 
write it to a HDF5 file, like this:

   p = np.zeros([nz, ny, nx]) # Note only 3 dims!
   # Fill p with data here
   f = h5py.File('grid.h5', 'w')
   f.create_dataset('/FIELDS/p', data=p)

If my DMDA has dof=3, I do the following:

   U = np.zeros([nz, ny, nx, 3]) # Note 4 dims!
   # Fill U with data here
   f = h5py.File('grid.h5', 'w')
   f.create_dataset('/FIELDS/U', data=U)

So there is obviously a case of special-treatment in these cases when 
dof=1, since that leads to one lower dimension.

I think this "special treatment" of DMDA Vecs with dof=1 is a consistent 
behaviour.

I think that the cases of plain, non-DMDA Vecs as well should be handled 
the same way, i.e. a Vec with bs=1, no timestepping and no complex 
numbers should be 1D. This would be the same "logic" as used in DMDA 
Vecs, where dof=1 means one lower dimension.

I also think that one should accept reading of a 2D dataset with size 
Nx1 into a Vec, in order to give maximum flexibility for the user. The 
possibility of manipulating "heavy data" in HDF5 files in simple Python 
scripts (for pre- and postprocessing) is simply a too sweet fruit to 
leave behind, and then you need as much flexibility as possible for the 
users.

Regards,
Håkon

On 20. mars 2015 03:22, Barry Smith wrote:
>
>> On Mar 19, 2015, at 9:05 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Thu, Mar 19, 2015 at 8:00 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>    MATT READ THIS EMAIL!
>>
>>    Ok here is the scoop.  In  VecView_MPI_HDF5() is the code (use meta x vc-annotate in emacs)
>>
>> df09a399 src/vec/vec/impls/mpi/pdvec.c (Matthew Knepley      2011-01-20 16:27:23 -0600  682)   if (bs >= 1) {
>> 272345b0 src/vec/vec/impls/mpi/pdvec.c (Karl Rupp            2013-01-26 20:27:50 -0600  683)     dims[dim]      = bs;
>> 272345b0 src/vec/vec/impls/mpi/pdvec.c (Karl Rupp            2013-01-26 20:27:50 -0600  684)     maxDims[dim]   = dims[dim];
>> 61ea93a1 src/vec/vec/impls/mpi/pdvec.c (Brad Aagaard         2010-11-06 15:31:18 -0700  685)     chunkDims[dim] = dims[dim];
>> 43c8710a src/vec/vec/impls/mpi/pdvec.c (Matthew Knepley      2010-11-05 19:21:58 -0500  686)     ++dim;
>> 5464dd29 src/vec/vec/impls/mpi/pdvec.c (Matthew Knepley      2010-10-31 19:04:52 -0500  687)   }  if (bs >= 1) {
>>
>>    then in PetscErrorCode VecLoad_HDF5(Vec xin, PetscViewer viewer)
>>
>>    f6e5521d src/vec/vec/utils/vecio.c     (Karl Rupp            2013-01-28 13:32:07 -0600 277) if (bs >= 1) ++dim;
>>
>> I check the commit message for df09a399 at bitbucket.com/petsc/petsc/ and it is
>>
>> Small change to HDF5 output (now Vec always has blocksize dim)
>>
>> So what is happening is even the simplest vector (with a block size of 1) is treated as a 2d HDF5 object
>>
>> Matt, what was the rational for making vectors always have a dimension of block size even when it is one? Seems bad to me. Maybe makes HDF5 readers simpler since they don't need to handle bs 1 differently?
>>
>> My rationale was to simplify our code path, and have one way to handle fields with components. I did not
>> want to special case a field with one component.
>
>    Why have the meaningless glop of   if (bs >= 1) { in the code? If you are always adding one to the dimension then always add it. Having this if test in the source is strange and confusing. It should be removed in several places.
>
>    This is not documented anywhere. How is anyone to know that plain old PETSc vectors always have at least two dimensions in HDF5? That is certainly not intuitive. At a minimum the error message which is currently
>
>> [0]PETSC ERROR: Unexpected data in file
>>>> [0]PETSC ERROR: Dimension of array in file 1 not 2 as expected
>
>   could point (when the HDF5 file has a dimension of 1)  to a PETSc FAQ or manual page that explains that PETSc vectors always have a dimension of at least 2 and at that complex PETSC_SCALAR makes it a dimension of at least 3 currently users have to search rather obscure code to figure this out.
>
> Barry
>
>>
>>    Thanks,
>>
>>       Matt
>>
>>   Barry
>>
>>
>>
>>
>>> On Mar 19, 2015, at 12:31 PM, Håkon Strandenes <haakon at hakostra.net> wrote:
>>>
>>> I'm sorry I didn't include this in the first email. Here is an example HDF5-file, together with a sample program that demonstrate the problem.
>>>
>>> I have been trying to dig into VecLoad_HDF5() in vecio.c in order to suggest a solution myself, however, I do not think I fully understand the concept of the Vec block size and how that affects how the Vec is stored in memory and on disk (HDF5 file).
>>>
>>> Regards,
>>> Håkon Strandenes
>>>
>>>
>>> On 19. mars 2015 18:17, Barry Smith wrote:
>>>>
>>>>    Please email the file in.h5 so we can try to reproduce the problem and debug it. Are you using petsc version 3.5.?
>>>>
>>>>    Barry
>>>>
>>>>> On Mar 19, 2015, at 4:19 AM, Håkon Strandenes <haakon at hakostra.net> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am again in trouble with PETSc and its HDF5-related functions.
>>>>>
>>>>> I have a code in which I want to read a 1D dataset from a HDF5 file into a PETSc Vec. The array are "small", and each MPI process should read the entire dataset and store it in its own sequential Vec.
>>>>>
>>>>> Currently my code is as follows (somewhat simplified):
>>>>>
>>>>> VecCreate(PETSC_COMM_SELF, &v);
>>>>> VecSetType(v, VECSEQ);
>>>>> PetscObjectSetName((PetscObject)v, "x");
>>>>> PetscViewerHDF5Open(PETSC_COMM_WORLD, "in.h5", FILE_MODE_READ, &viewer);
>>>>> VecLoad(v, viewer);
>>>>>
>>>>> This works, but only if the dataset in the HDF5 file is a 2D dataset, of size Nx1, where N is the number of elements I read. If the dataset in the HDF5 file is of size N (i.e. a 1D dataset) this does not work.
>>>>>
>>>>> The error message I get is the following:
>>>>> [0]PETSC ERROR: Unexpected data in file
>>>>> [0]PETSC ERROR: Dimension of array in file 1 not 2 as expected
>>>>> etc. and it points back to the line 296 in file vecio.c, function VecLoad_HDF5().
>>>>>
>>>>> Technically this is not a problem. I could of course just go ahead and "reshape" all my datasets into 2D datasets. I do however find this behaviour very un-logical. Why can't I read a 1D dataset into a Vec?
>>>>>
>>>>> So my question are:
>>>>> 1) Do I do anything wrong when creating my Vec and reading my datasets?
>>>>> 2) Is there anything I can do to read my 1D datasets into Vecs?
>>>>> 3) Could PETSc perhaps be a little less restrictive in these cases?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Regards,
>>>>> Håkon Strandenes
>>>>
>>>>
>>> <VecLoadTest.tar.gz>
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>
>