[petsc-users] Load Vec from 1D HDF5 dataset MATT READ THIS EMAIL!

Sun Mar 29 14:41:53 CDT 2015

> On Mar 29, 2015, at 6:40 AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Sat, Mar 28, 2015 at 10:36 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> > On Mar 28, 2015, at 10:04 PM, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > On Sat, Mar 28, 2015 at 9:59 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Hakon,
> >
> >     I have pushed a change to the branch barry/feature-hdf5-flexible-initial-dimension and to next so that plain Vecs, and Vecs obtained from DMCreateGlobalVector() with a DMDA, will NOT have the extra dimension if bs == 1. To add that extra dimension even when bs == 1 with VecView(), or to handle a file with that extra dimension with VecLoad(), one must call PetscViewerHDF5SetBaseDimension2() or pass -viewer_hdf5_base_dimension2 true
> >
> >    Please try it out and let me know if you have any trouble
> >
> >    Thanks
> >
> >   Barry
> >
> >   Matt,
> >
> >    Scream as much as you want but adding that extra dimension automatically is not intuitive so it cannot be the default.
> >
> > Interfaces should be intuitive,  file formats should be consistent.
> 
>   Hakon sent specific examples where, because of the file format, the user interface is extremely non-intuitive (below). A user loading up a "plain old vector" from a file format expects a one-dimensional beast, and that is what we should deliver to them. Now you could argue that the HDF5 interface sucks because it bleeds the file format through the interface. That could be true, I don't care; that is what we are stuck with.
> 
> But you changed the PETSc output format in response, which is not necessary.
> 
> We could put the special case in the reader,

   Matt, 

     The problem is that we are not providing "the reader", nor could we!  "The reader" is whichever of the many, many tools the user is using to read from the HDF5 file. It might be Matlab, Python, a Fortran program, a Ruby program, whatever. And the "reader code" that the user has to write is directly dependent on the format used in the file. In his example

>   g = h5py.File('grid.h5', 'r')
>   x = g['/MESH/nodes/x'][:,0]

  How do we provide this code to the user? Or are you saying we have to provide PETSc-specific HDF5 readers for all packages and configurations? That is totally unrealistic, and even if we did provide them no one would use them. The entire point of HDF5 is that you can write your own readers; you are not stuck with using only the readers supplied by whoever provided the data set. There is not, and should not be, a PETSc-specific HDF5 format with its own readers and writers.

  Barry
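
  For illustration only, here is a minimal Python sketch of the kind of
  reader-side code each user would have to write to cope with both layouts
  (a plain 1D dataset, or one with the extra dimension of size 1). The
  helper names as_1d and bounding_box_volume are hypothetical and not part
  of PETSc or h5py, and plain Python lists stand in for the datasets one
  would normally read with h5py.

```python
def as_1d(values):
    """Return a flat list of floats from an (N,) or an (N, 1) layout.

    Hypothetical helper: `values` is any iterable whose entries are either
    scalars (1D layout) or length-1 sequences (extra dimension of size 1).
    """
    flat = []
    for entry in values:
        try:
            (item,) = entry      # (N, 1) layout: unpack the single component
        except TypeError:
            item = entry         # (N,) layout: entry is already a scalar
        flat.append(float(item))
    return flat


def bounding_box_volume(x, y, z):
    """Bounding-box volume from three coordinate arrays in either layout."""
    x, y, z = as_1d(x), as_1d(y), as_1d(z)
    return (x[-1] - x[0]) * (y[-1] - y[0]) * (z[-1] - z[0])


if __name__ == "__main__":
    x_1d = [0.0, 0.5, 1.0]          # dataset written without the extra dimension
    y_2d = [[0.0], [1.0], [2.0]]    # dataset written with the extra dimension
    z_1d = [0.0, 3.0]
    print(bounding_box_volume(x_1d, y_2d, z_1d))
```

  A row with more than one component (bs > 1) makes the unpacking raise
  ValueError, so the sketch refuses such data instead of silently dropping
  components.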

> so that he could load the 1D vector he expects, or we could load the 2D vector from the PETSc format.
> He gets the intuitive load he wants, but we get a consistent format.
> 
> Now he might get what he expects from View(), which I am not sure he was asking for, but we have to special case all our tools which manipulate
> the PETSc format.
> 
>    Matt
>  
> 
>   Barry
> 
> 
> 
> > This gets that entirely wrong.
> >
> >    Matt
> >
> > > On Mar 25, 2015, at 10:36 AM, Håkon Strandenes <haakon at hakostra.net> wrote:
> > >
> > > Did you come to any conclusion on this issue?
> > >
> > > Regards,
> > > Håkon
> > >
> > > On 20. mars 2015 22:02, Håkon Strandenes wrote:
> > >> On 20. mars 2015 20:48, Barry Smith wrote:
> > >>> Why is 1 dimension a special case that is not worthy of its own
> > >>> format? The same thing would hold for 2d and 3d. One could then argue
> > >>> that we should have a single six dimensional format for the files for
> > >>> all vectors that PETSc produces. Then a 1d problem has five of the
> > >>> dimensions being 1.
> > >>
> > >> This is a very good point, and supports my view.
> > >>
> > >> Let me give two very simple example cases:
> > >>
> > >>
> > >> Case 1:
> > >> Create a list of grid points in an external preprocessor for the purpose
> > >> of loading this into a Vec later:
> > >>
> > >>   x = np.linspace(0.0, 1.0, num=21)
> > >>   f.create_dataset('/MESH/nodes/x', data=x)
> > >>
> > >> vs.
> > >>
> > >>   x = np.linspace(0.0, 1.0, num=21)
> > >>   x = x.reshape((21,1))
> > >>   f.create_dataset('/MESH/nodes/x', data=x)
> > >>
> > >>
> > >> Case 2:
> > >> Read three Vecs written to disk by PETSc, and calculate total "bounding
> > >> box volume" of the grid:
> > >>
> > >>   g = h5py.File('grid.h5', 'r')
> > >>   x = g['/MESH/nodes/x']
> > >>   y = g['/MESH/nodes/y']
> > >>   z = g['/MESH/nodes/z']
> > >>   Vol = (x[-1] - x[0])*(y[-1] - y[0])*(z[-1] - z[0])
> > >>
> > >> vs.
> > >>
> > >>   g = h5py.File('grid.h5', 'r')
> > >>   x = g['/MESH/nodes/x'][:,0]
> > >>   y = g['/MESH/nodes/y'][:,0]
> > >>   z = g['/MESH/nodes/z'][:,0]
> > >>   Vol = (x[-1] - x[0])*(y[-1] - y[0])*(z[-1] - z[0])
> > >>
> > >>
> > >> In both cases I think handling this extra, unnecessary dimension makes
> > >> the code less attractive. It's not that either way is difficult,
> > >> problematic or impossible, but it's just that 1D Vecs should intuitively
> > >> be 1D datasets, and not 2D, 3D or 6D. This seriously confused me for
> > >> quite a while until I figured this out, even after having written an
> > >> entire Navier-Stokes DNS solver using the PETSc library for everything
> > >> except time integration and filling these simple 1D coordinate arrays!
> > >>
> > >> Regards,
> > >> Håkon
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener