[petsc-dev] Writing rich state
Barry Smith
bsmith at mcs.anl.gov
Tue Feb 23 10:24:24 CST 2010
I've thought about this but never done anything; I think it is worth
investigating.
BTW: My long-term goal is also that all PETSc source code lives in
an appropriate database with appropriate relationships and meta-data
stored there.
The fact that we (meaning HPC and OpenSource in general) use flat
files so much shows a failure of something.
Barry
On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:
> Matt and I talked about this a couple months ago, but I'd like to also
> mention it here. It seems to me that data formats like HDF5 are really
> a pain to use for generic purposes, because you end up trying to map a
> directed graph of object relations (composition) into a hierarchical
> data format, and then implement relational queries on top of this
> hierarchy. (I've done this, to some extent, and I ended up writing
> cumbersome code to walk this hierarchy to answer queries that would be
> one-line SQL queries.)
>
> To elaborate slightly on the problem, the goal would be to write vectors
> living on a DMComposite, with extra semantics like time step and units,
> in a way that could be used for visualization as well as checkpoints for
> forward and adjoint models. PETSc's unadorned binary IO is fine if the
> same code is going to read it back in, because everything will be wired
> up correctly and we're just loading into a Vec (although it's already
> somewhat tricky when the layout changes in the unstructured case). But
> there just isn't enough metadata to operate on in any sort of generic
> way, and I hate writing custom code to describe meshes and relations
> between them.
>
> Current scientific data formats (at least those I have seen) are a
> hassle to use since they have poor support for expressing relations.
> HDF5 has the equivalent of file-system symlinks, but after
> normalization, all the relations end up being encoded as a bunch of
> symlinks, which is a relatively low-level view and isn't a particularly
> convenient thing to traverse when answering a query.
>
> So I'm curious if anyone has put such metadata into a relational
> database instead of trying to contort it into one of these "scientific"
> data formats. My thought would be to drop only the metadata into
> something like Sqlite, and write the arrays themselves using MPI-IO (or
> HDF5/NetCDF/whatever, but these don't provide much when we aren't using
> them for metadata). This would allow efficient support of queries like
> "all vector fields at step M" and "fields B and C from step M to N on
> subdomains intersecting bounding box XYZ". This isn't completely
> different from what XDMF tries to do, but experimentation with that left
> a sour taste. Is SQL a stupid idea for this purpose and I'd be better
> off writing code to support the queries I want on HDF5/XDMF/something
> else?
>
> Jed
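
To make the proposal above concrete, here is a minimal sketch of keeping
only the metadata in SQLite, using Python's standard sqlite3 module. The
schema (tables field, subdomain, chunk), the file names, and the helper
functions are all hypothetical, invented for illustration; nothing here
is PETSc API. Each chunk row only records where a block of values would
live in a separately written binary file (for example one written with
MPI-IO); no array data is stored in the database.

```python
import sqlite3

# Hypothetical schema: one row per (field, subdomain, step) block of values.
SCHEMA = """
CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT UNIQUE, units TEXT);
CREATE TABLE subdomain (
    id INTEGER PRIMARY KEY,
    xmin REAL, xmax REAL, ymin REAL, ymax REAL, zmin REAL, zmax REAL);
CREATE TABLE chunk (
    field_id INTEGER REFERENCES field(id),
    subdomain_id INTEGER REFERENCES subdomain(id),
    step INTEGER, time REAL,
    path TEXT, byte_offset INTEGER, n_values INTEGER);
"""


def build_example_db():
    """Create an in-memory database populated with made-up metadata."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO field VALUES (?, ?, ?)",
                   [(1, "A", "Pa"), (2, "B", "m/s"), (3, "C", "K")])
    db.executemany("INSERT INTO subdomain VALUES (?, ?, ?, ?, ?, ?, ?)",
                   [(0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0),
                    (1, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0)])
    rows = []
    for step in range(4):
        offset = 0  # byte offset within this step's (assumed) binary file
        for field_id in (1, 2, 3):
            for sub_id in (0, 1):
                rows.append((field_id, sub_id, step, 0.1 * step,
                             "step%04d.bin" % step, offset, 100))
                offset += 100 * 8  # 100 values of 8 bytes each
    db.executemany("INSERT INTO chunk VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return db


def fields_at_step(db, step):
    """Query: "all vector fields at step M"."""
    cur = db.execute(
        "SELECT DISTINCT f.name FROM chunk c "
        "JOIN field f ON f.id = c.field_id "
        "WHERE c.step = ? ORDER BY f.name", (step,))
    return [row[0] for row in cur]


def chunks_in_box(db, names, first, last, box):
    """Query: named fields from step first..last on subdomains whose
    bounding box intersects box = (xmin, xmax, ymin, ymax, zmin, zmax)."""
    xmin, xmax, ymin, ymax, zmin, zmax = box
    marks = ",".join("?" * len(names))
    sql = (
        "SELECT f.name, c.step, c.subdomain_id, "
        "c.path, c.byte_offset, c.n_values "
        "FROM chunk c "
        "JOIN field f ON f.id = c.field_id "
        "JOIN subdomain s ON s.id = c.subdomain_id "
        "WHERE f.name IN (%s) AND c.step BETWEEN ? AND ? "
        "AND s.xmin <= ? AND s.xmax >= ? "
        "AND s.ymin <= ? AND s.ymax >= ? "
        "AND s.zmin <= ? AND s.zmax >= ? "
        "ORDER BY c.step, f.name, c.subdomain_id" % marks)
    args = list(names) + [first, last, xmax, xmin, ymax, ymin, zmax, zmin]
    return db.execute(sql, args).fetchall()
```

A reader would call chunks_in_box(db, ["B", "C"], M, N, box) and then
seek to each returned (path, byte_offset) to load n_values numbers. The
bounding-box test is a plain interval-overlap check per axis; an R-tree
index could replace it for large subdomain counts, but that is beyond
this sketch.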