[petsc-dev] Writing rich state

Dmitry Karpeev karpeev at mcs.anl.gov
Tue Feb 23 13:44:55 CST 2010


Yes, but what about using Spotlight programmatically (e.g., from
PETSc) to store rich state,
checkpointing, etc?
For example, I want to store a Vec.  How do I label it?  There may be
various user contexts
that share it, so I'd like to label it with all of them.

In a way, I don't want to have to look at my home directory (or any
directory) at all.
I just want to extract files based on a given (set of) label(s).
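
A minimal sketch of this kind of label-based retrieval, assuming the labels
live in a small SQLite index next to the files. This is neither Spotlight's
API nor anything in PETSc; the names open_store, tag and find, and the file
names in the demo, are hypothetical and chosen only for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a label index. The schema is an assumption of this sketch."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS labels (
            file_id INTEGER NOT NULL REFERENCES files(id),
            label   TEXT NOT NULL,
            UNIQUE (file_id, label)
        );
        """
    )
    return conn


def tag(conn, path, *labels):
    """Attach any number of labels to a stored file (e.g. a dumped Vec)."""
    conn.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    (file_id,) = conn.execute(
        "SELECT id FROM files WHERE path = ?", (path,)
    ).fetchone()
    conn.executemany(
        "INSERT OR IGNORE INTO labels (file_id, label) VALUES (?, ?)",
        [(file_id, label) for label in labels],
    )
    conn.commit()


def find(conn, *labels):
    """Return paths carrying ALL of the given labels, in any order."""
    wanted = sorted(set(labels))
    if not wanted:
        return []
    marks = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"SELECT f.path FROM files f JOIN labels l ON l.file_id = f.id "
        f"WHERE l.label IN ({marks}) "
        f"GROUP BY f.id HAVING COUNT(DISTINCT l.label) = ? "
        f"ORDER BY f.path",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]


# Hypothetical usage: one Vec file shared by two user contexts.
store = open_store()
tag(store, "vec_0001.bin", "PETSc", "checkpoint", "ctx-flow", "ctx-heat")
tag(store, "proposal.tex", "PETSc", "DOE", "Proposals")
print(find(store, "ctx-heat", "PETSc"))
print(find(store, "DOE", "Proposals", "PETSc"))
```

The lookup does not depend on the order of the labels, and the same file is
found through either of the contexts that share it.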

Dmitry.

On Tue, Feb 23, 2010 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   With Google (and Spotlight on the Mac) is there any need to organize
> anything anymore? Just burp down the data any way you please anywhere you
> want it and then have smart search tools find it for you and format it the
> way you need it at the time you need it? This does mean you need decent
> tools to parse random stuff for the search to understand it.
>
>   Ironically, in the past few years with Spotlight on my Mac I actually do a
> better job of organizing my home directory structure than I ever have
> before.
>
>   Barry
>
> On Feb 23, 2010, at 1:31 PM, Dmitry Karpeev wrote:
>
>> This takes the discussion in a somewhat tangential direction, but consider
>> this:
>>
>> We use hierarchical file systems, which are also a pain.
>> Say, I'm working on project PETSc and I'm writing a DOE proposal for it.
>> Should I put it in ~/PETSc/Proposals/DOE/proposal or
>> ~/Proposals/DOE/PETSc/proposal or
>> ~/Proposals/PETSc/DOE?
>> Later (3 months from now) I might want to come back and retrieve a
>> file from that proposal tree.
>> Where do I look for it?
>> Maybe I should have all of these paths, all but one being soft links
>> to the master path?
>> I've tried that.  It's a pain.
>>
>> Basically, any hierarchical storage format, such as a file system,
>> will impose a tree structure on
>> what is fundamentally a (hyper)graph.
>> GMail solves a similar problem by allowing multiple labels on a piece of
>> email.
>> Then I can search on any or several of the labels: Proposals, DOE,
>> PETSc, irrespective of the order.
>> A file system imposes an artificial order.
>> You can think of labels as being the hyperedges in the hypergraph.
>>
>> It would be nice to have a file system that functioned a bit like
>> GMail, I think.
>> In fact, I've thought about writing a Python replacement for 'ls',
>> that would list files with a given label or labels.   I'm too lazy and
>> incompetent, however.
>> In the simplest case the metadata could go right into the filename,
>> but maybe that's not
>> a good thing to do in general.
>>
>>
>> Dmitry.
>>
>> On Tue, Feb 23, 2010 at 10:24 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>  I've thought about this but never done anything, I think it is worth
>>> investigating.
>>>
>>>  BTW: My long term goal is also that all PETSc source code lives in an
>>> appropriate database with appropriate relationships and meta-data stored
>>> there.
>>>
>>>  The fact that we (meaning HPC and OpenSource in general) use flat files
>>> so
>>> much shows a failure of something.
>>>
>>>  Barry
>>>
>>> On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:
>>>
>>>> Matt and I talked about this a couple months ago, but I'd like to also
>>>> mention it here.  It seems to me that data formats like HDF5 are really
>>>> a pain to use for generic purposes, because you end up trying to map a
>>>> directed graph of object relations (composition) into a hierarchical
>>>> data format, and then implement relational queries on top of this
>>>> hierarchy.  (I've done this, to some extent, and I ended up writing
>>>> cumbersome code to walk this hierarchy to answer queries that would be
>>>> one-line SQL queries.)
>>>>
>>>> To elaborate slightly on the problem, the goal would be to write vectors
>>>> living on a DMComposite, with extra semantics like time step and units,
>>>> in a way that could be used for visualization as well as checkpoints for
>>>> forward and adjoint models.  PETSc's unadorned binary IO is fine if the
>>>> same code is going to read it back in, because everything will be wired
>>>> up correctly and we're just loading into a Vec (although it's already
>>>> somewhat tricky when the layout changes in the unstructured case).  But
>>>> there just isn't enough metadata to operate on in any sort of generic
>>>> way, and I hate writing custom code to describe meshes and relations
>>>> between them.
>>>>
>>>> Current scientific data formats (at least those I have seen) are a
>>>> hassle to use since they have poor support for expressing relations.
>>>> HDF5 has the equivalent of file-system symlinks, but after
>>>> normalization, all the relations end up being encoded as a bunch of
>>>> symlinks, which is a relatively low-level view and isn't a particularly
>>>> convenient thing to traverse when answering a query.
>>>>
>>>> So I'm curious if anyone has put such metadata into a relational
>>>> database instead of trying to contort it into one of these "scientific"
>>>> data formats.  My thought would be to drop only the metadata into
>>>> something like Sqlite, and write the arrays themselves using MPI-IO (or
>>>> HDF5/NetCDF/whatever, but these don't provide much when we aren't using
>>>> them for metadata).  This would allow efficient support of queries like
>>>> "all vector fields at step M" and "fields B and C from step M to N on
>>>> subdomains intersecting bounding box XYZ".  This isn't completely
>>>> different from what XDMF tries to do, but experimentation with that left
>>>> a sour taste.  Is SQL a stupid idea for this purpose and I'd be better
>>>> off writing code to support the queries I want on HDF5/XDMF/something
>>>> else?
>>>>
>>>> Jed
>>>
>>>
>
>
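
A minimal sketch of the metadata-in-SQLite idea from Jed's message quoted
above, assuming the arrays themselves are written elsewhere (MPI-IO, PETSc
binary, ...) and only their description goes into the database. The table
layout, the column names, the file names and the two query helpers are all
assumptions made for illustration; nothing here is an existing PETSc format.

```python
import sqlite3

# One row per (field, step, subdomain) array; the raw data stays in `path`.
SCHEMA = """
CREATE TABLE arrays (
    id          INTEGER PRIMARY KEY,
    field       TEXT    NOT NULL,
    step        INTEGER NOT NULL,
    units       TEXT,
    subdomain   INTEGER NOT NULL,
    path        TEXT    NOT NULL,   -- file holding the raw array
    byte_offset INTEGER NOT NULL,   -- where this array starts in that file
    xmin REAL, xmax REAL, ymin REAL, ymax REAL, zmin REAL, zmax REAL
);
"""


def fields_at_step(conn, step):
    """'All vector fields at step M'."""
    rows = conn.execute(
        "SELECT DISTINCT field FROM arrays WHERE step = ? ORDER BY field",
        (step,),
    )
    return [row[0] for row in rows]


def fields_in_box(conn, fields, first_step, last_step, box):
    """'Fields B and C from step M to N on subdomains intersecting a box'."""
    xlo, xhi, ylo, yhi, zlo, zhi = box
    marks = ",".join("?" * len(fields))
    sql = f"""
        SELECT field, step, subdomain, path, byte_offset FROM arrays
        WHERE field IN ({marks}) AND step BETWEEN ? AND ?
          AND xmin <= ? AND xmax >= ?
          AND ymin <= ? AND ymax >= ?
          AND zmin <= ? AND zmax >= ?
        ORDER BY step, field, subdomain
    """
    params = (*fields, first_step, last_step, xhi, xlo, yhi, ylo, zhi, zlo)
    return conn.execute(sql, params).fetchall()


# Hypothetical data: two subdomains side by side in x, fields B and C, 3 steps.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
boxes = {0: (0.0, 1.0, 0.0, 1.0, 0.0, 1.0), 1: (1.0, 2.0, 0.0, 1.0, 0.0, 1.0)}
for step in range(3):
    for field, units in (("B", "T"), ("C", "mol/m^3")):
        for sub, box in boxes.items():
            conn.execute(
                "INSERT INTO arrays (field, step, units, subdomain, path,"
                " byte_offset, xmin, xmax, ymin, ymax, zmin, zmax)"
                " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                (field, step, units, sub, f"{field}_{step}.bin", 0, *box),
            )
conn.commit()

print(fields_at_step(conn, 1))
print(fields_in_box(conn, ["B", "C"], 1, 2, (1.5, 1.8, 0.0, 1.0, 0.0, 1.0)))
```

Each query is a single SELECT; the code reading the arrays would then only
need the returned path and byte offset.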


