[petsc-dev] Writing rich state

Tue Feb 23 10:24:24 CST 2010

   I've thought about this but never done anything; I think it is worth
investigating.

   BTW: My long-term goal is also that all PETSc source code lives in
an appropriate database, with appropriate relationships and metadata
stored there.

   The fact that we (meaning HPC and OpenSource in general) use flat  
files so much shows a failure of something.

    Barry

On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:

> Matt and I talked about this a couple months ago, but I'd like to also
> mention it here.  It seems to me that data formats like HDF5 are really
> a pain to use for generic purposes, because you end up trying to map a
> directed graph of object relations (composition) into a hierarchical
> data format, and then implement relational queries on top of this
> hierarchy.  (I've done this, to some extent, and I ended up writing
> cumbersome code to walk this hierarchy to answer queries that would be
> one-line SQL queries.)
>
> To elaborate slightly on the problem, the goal would be to write vectors
> living on a DMComposite, with extra semantics like time step and units,
> in a way that could be used for visualization as well as checkpoints for
> forward and adjoint models.  PETSc's unadorned binary IO is fine if the
> same code is going to read it back in, because everything will be wired
> up correctly and we're just loading into a Vec (although it's already
> somewhat tricky when the layout changes in the unstructured case).  But
> there just isn't enough metadata to operate on in any sort of generic
> way, and I hate writing custom code to describe meshes and relations
> between them.
>
> Current scientific data formats (at least those I have seen) are a
> hassle to use since they have poor support for expressing relations.
> HDF5 has the equivalent of file-system symlinks, but after
> normalization, all the relations end up being encoded as a bunch of
> symlinks, which is a relatively low-level view and isn't a particularly
> convenient thing to traverse when answering a query.
>
> So I'm curious if anyone has put such metadata into a relational
> database instead of trying to contort it into one of these "scientific"
> data formats.  My thought would be to drop only the metadata into
> something like Sqlite, and write the arrays themselves using MPI-IO (or
> HDF5/NetCDF/whatever, but these don't provide much when we aren't using
> them for metadata).  This would allow efficient support of queries like
> "all vector fields at step M" and "fields B and C from step M to N on
> subdomains intersecting bounding box XYZ".  This isn't completely
> different from what XDMF tries to do, but experimentation with that left
> a sour taste.  Is SQL a stupid idea for this purpose and I'd be better
> off writing code to support the queries I want on HDF5/XDMF/something
> else?
>
> Jed
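
For concreteness, the split proposed in the quoted message (metadata in
SQLite, bulk arrays in separate files) might look roughly like the sketch
below. It is an illustration only, not anything that exists in PETSc: the
table and column names (field, step, subdomain, block), the file naming
(step0.bin, ...) and the helper functions are all hypothetical, and the
array files are only referenced by path, byte offset and length, never
actually written. It uses Python's standard sqlite3 module with an
in-memory database.

```python
import sqlite3

# Hypothetical schema: only metadata lives in SQLite; each "block" row points
# at a byte range in an array file written separately (e.g. with MPI-IO).
SCHEMA = """
CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT UNIQUE, units TEXT);
CREATE TABLE step (id INTEGER PRIMARY KEY, time REAL);
CREATE TABLE subdomain (
    id INTEGER PRIMARY KEY,
    xmin REAL, xmax REAL, ymin REAL, ymax REAL, zmin REAL, zmax REAL);
CREATE TABLE block (
    field_id INTEGER REFERENCES field(id),
    step_id INTEGER REFERENCES step(id),
    subdomain_id INTEGER REFERENCES subdomain(id),
    path TEXT, byte_offset INTEGER, nbytes INTEGER);
"""


def build_demo():
    """Create an in-memory database with made-up fields, steps and subdomains."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO field VALUES (?, ?, ?)",
                   [(1, "A", "K"), (2, "B", "Pa"), (3, "C", "m/s")])
    db.executemany("INSERT INTO step VALUES (?, ?)",
                   [(s, 0.1 * s) for s in range(4)])
    # Two unit cubes side by side along x.
    db.executemany("INSERT INTO subdomain VALUES (?, ?, ?, ?, ?, ?, ?)",
                   [(0, 0, 1, 0, 1, 0, 1), (1, 1, 2, 0, 1, 0, 1)])
    nbytes = 800  # pretend each block holds 100 doubles
    rows = []
    for s in range(4):
        offset = 0  # one array file per step, blocks laid out back to back
        for f in (1, 2, 3):
            for d in (0, 1):
                rows.append((f, s, d, f"step{s}.bin", offset, nbytes))
                offset += nbytes
    db.executemany("INSERT INTO block VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


def fields_at_step(db, m):
    """'All vector fields at step M'."""
    cur = db.execute(
        "SELECT DISTINCT f.name FROM block b JOIN field f ON f.id = b.field_id "
        "WHERE b.step_id = ? ORDER BY f.name", (m,))
    return [row[0] for row in cur]


def blocks_in_box(db, names, m, n, box):
    """'Fields in names from step M to N on subdomains intersecting box'."""
    xmin, xmax, ymin, ymax, zmin, zmax = box
    marks = ",".join("?" * len(names))
    sql = f"""
        SELECT f.name, b.step_id, b.subdomain_id, b.path, b.byte_offset, b.nbytes
        FROM block b
        JOIN field f ON f.id = b.field_id
        JOIN subdomain d ON d.id = b.subdomain_id
        WHERE f.name IN ({marks})
          AND b.step_id BETWEEN ? AND ?
          AND d.xmin <= ? AND d.xmax >= ?
          AND d.ymin <= ? AND d.ymax >= ?
          AND d.zmin <= ? AND d.zmax >= ?
        ORDER BY b.step_id, f.name, b.subdomain_id"""
    params = [*names, m, n, xmax, xmin, ymax, ymin, zmax, zmin]
    return db.execute(sql, params).fetchall()
```

A reader would then call something like
blocks_in_box(db, ["B", "C"], 1, 2, (1.5, 1.8, 0.2, 0.8, 0.2, 0.8)) and use
the returned (path, byte_offset, nbytes) triples to read just those byte
ranges from the array files. The bounding-box test here is a plain
axis-aligned overlap check; a real mesh would likely need a richer
description of each subdomain.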