[petsc-dev] Writing rich state
Barry Smith
bsmith at mcs.anl.gov
Tue Feb 23 13:50:02 CST 2010
On Feb 23, 2010, at 1:44 PM, Dmitry Karpeev wrote:
> Yes, but what about using Spotlight programmatically (e.g., from PETSc)
> to store rich state, checkpointing, etc.?
> For example, I want to store a Vec. How do I label it? There may be
> various user contexts that share it, so I'd like to label it with all
> of them.
>
> In a way, I don't want to have to look at my home directory (or any
> directory) at all.
> I just want to extract files based on a given (set of) label(s).
>
Yes
> Dmitry.
>
> On Tue, Feb 23, 2010 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov>
> wrote:
>>
>> With Google (and Spotlight on the Mac) is there any need to organize
>> anything anymore? Just burp down the data any way you please, anywhere
>> you want it, and then have smart search tools find it for you and
>> format it the way you need it at the time you need it? This does mean
>> you need decent tools to parse random stuff for the search to
>> understand it.
>>
>> Ironically, in the past few years with Spotlight on my Mac I actually
>> do a better job of organizing my home directory structure than I ever
>> have before.
>>
>> Barry
>>
>> On Feb 23, 2010, at 1:31 PM, Dmitry Karpeev wrote:
>>
>>> This takes the discussion in a somewhat tangential direction, but
>>> consider this:
>>>
>>> We use hierarchical file systems, which are also a pain.
>>> Say I'm working on project PETSc and I'm writing a DOE proposal for it.
>>> Should I put it in ~/PETSc/Proposals/DOE/proposal or
>>> ~/Proposals/DOE/PETSc/proposal or
>>> ~/Proposals/PETSc/DOE?
>>> Later (3 months from now) I might want to come back and retrieve a
>>> file from that proposal tree.
>>> Where do I look for it?
>>> Maybe I should have all of these paths, all but one being soft links
>>> to the master path?
>>> I've tried that. It's a pain.
>>>
>>> Basically, any hierarchical storage format, such as a file system,
>>> will impose a tree structure on what is fundamentally a (hyper)graph.
>>> GMail solves a similar problem by allowing multiple labels on a piece
>>> of email.
>>> Then I can search on any or several of the labels: Proposals, DOE,
>>> PETSc, irrespective of the order.
>>> A file system imposes an artificial order.
>>> You can think of labels as being the hyperedges in the hypergraph.
>>>
>>> It would be nice to have a file system that functioned a bit like
>>> GMail, I think.
>>> In fact, I've thought about writing a Python replacement for 'ls'
>>> that would list files with a given label or labels. I'm too lazy and
>>> incompetent, however.
>>> In the simplest case the metadata could go right into the filename,
>>> but maybe that's not a good thing to do in general.
>>>
>>>
>>> Dmitry.
>>>
>>> On Tue, Feb 23, 2010 at 10:24 AM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>>>
>>>> I've thought about this but never done anything; I think it is worth
>>>> investigating.
>>>>
>>>> BTW: My long-term goal is also that all PETSc source code lives in an
>>>> appropriate database with appropriate relationships and meta-data
>>>> stored there.
>>>>
>>>> The fact that we (meaning HPC and OpenSource in general) use flat
>>>> files so much shows a failure of something.
>>>>
>>>> Barry
>>>>
>>>> On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:
>>>>
>>>>> Matt and I talked about this a couple months ago, but I'd like to
>>>>> also mention it here. It seems to me that data formats like HDF5 are
>>>>> really a pain to use for generic purposes, because you end up trying
>>>>> to map a directed graph of object relations (composition) into a
>>>>> hierarchical data format, and then implement relational queries on
>>>>> top of this hierarchy. (I've done this, to some extent, and I ended
>>>>> up writing cumbersome code to walk this hierarchy to answer queries
>>>>> that would be one-line SQL queries.)
>>>>>
>>>>> To elaborate slightly on the problem, the goal would be to write
>>>>> vectors living on a DMComposite, with extra semantics like time step
>>>>> and units, in a way that could be used for visualization as well as
>>>>> checkpoints for forward and adjoint models. PETSc's unadorned binary
>>>>> IO is fine if the same code is going to read it back in, because
>>>>> everything will be wired up correctly and we're just loading into a
>>>>> Vec (although it's already somewhat tricky when the layout changes in
>>>>> the unstructured case). But there just isn't enough metadata to
>>>>> operate on in any sort of generic way, and I hate writing custom code
>>>>> to describe meshes and relations between them.
>>>>>
>>>>> Current scientific data formats (at least those I have seen) are a
>>>>> hassle to use since they have poor support for expressing relations.
>>>>> HDF5 has the equivalent of file-system symlinks, but after
>>>>> normalization, all the relations end up being encoded as a bunch of
>>>>> symlinks, which is a relatively low-level view and isn't a
>>>>> particularly convenient thing to traverse when answering a query.
>>>>>
>>>>> So I'm curious if anyone has put such metadata into a relational
>>>>> database instead of trying to contort it into one of these
>>>>> "scientific" data formats. My thought would be to drop only the
>>>>> metadata into something like SQLite, and write the arrays themselves
>>>>> using MPI-IO (or HDF5/NetCDF/whatever, but these don't provide much
>>>>> when we aren't using them for metadata). This would allow efficient
>>>>> support of queries like "all vector fields at step M" and "fields B
>>>>> and C from step M to N on subdomains intersecting bounding box XYZ".
>>>>> This isn't completely different from what XDMF tries to do, but
>>>>> experimentation with that left a sour taste. Is SQL a stupid idea for
>>>>> this purpose, and would I be better off writing code to support the
>>>>> queries I want on HDF5/XDMF/something else?
>>>>>
>>>>> Jed
>>>>
>>>>
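Jed's proposal (metadata in SQLite, raw arrays written separately) might be sketched roughly as follows. This is only an illustration using Python's standard sqlite3 module, not anything that exists in PETSc: the table names (field, chunk), the column names, and the helpers open_catalog, add_chunk, fields_at_step and chunks_in_box are all hypothetical. Each row of chunk is assumed to describe one subdomain's piece of one field at one step, along with where its bytes live in a separately written file.

```python
import sqlite3

# Hypothetical schema: one row per field, one row per (field, step, subdomain)
# chunk. The arrays themselves are NOT stored here, only where to find them.
SCHEMA = """
CREATE TABLE field (
    id    INTEGER PRIMARY KEY,
    name  TEXT UNIQUE NOT NULL,
    units TEXT
);
CREATE TABLE chunk (
    id          INTEGER PRIMARY KEY,
    field_id    INTEGER NOT NULL REFERENCES field(id),
    step        INTEGER NOT NULL,
    subdomain   INTEGER NOT NULL,
    path        TEXT    NOT NULL,   -- file holding the raw array data
    byte_offset INTEGER NOT NULL,   -- where this chunk starts in that file
    nbytes      INTEGER NOT NULL,
    xmin REAL, xmax REAL, ymin REAL, ymax REAL, zmin REAL, zmax REAL
);
"""

def open_catalog(path=":memory:"):
    """Open (or create) the metadata catalog and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_chunk(conn, name, units, step, subdomain, path, byte_offset, nbytes, box):
    """Record one chunk; box is (xmin, xmax, ymin, ymax, zmin, zmax)."""
    conn.execute("INSERT OR IGNORE INTO field(name, units) VALUES (?, ?)",
                 (name, units))
    (field_id,) = conn.execute("SELECT id FROM field WHERE name = ?",
                               (name,)).fetchone()
    conn.execute(
        "INSERT INTO chunk(field_id, step, subdomain, path, byte_offset, nbytes,"
        " xmin, xmax, ymin, ymax, zmin, zmax)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (field_id, step, subdomain, path, byte_offset, nbytes, *box))

def fields_at_step(conn, step):
    """'All vector fields at step M': names of fields with data at that step."""
    rows = conn.execute(
        "SELECT DISTINCT f.name FROM chunk c JOIN field f ON f.id = c.field_id"
        " WHERE c.step = ? ORDER BY f.name", (step,))
    return [r[0] for r in rows]

def chunks_in_box(conn, names, first, last, box):
    """'Fields B and C from step M to N on subdomains intersecting box XYZ'."""
    xmin, xmax, ymin, ymax, zmin, zmax = box
    marks = ",".join("?" * len(names))  # one placeholder per field name
    sql = f"""
        SELECT f.name, c.step, c.subdomain, c.path, c.byte_offset, c.nbytes
        FROM chunk c JOIN field f ON f.id = c.field_id
        WHERE f.name IN ({marks})
          AND c.step BETWEEN ? AND ?
          AND c.xmin <= ? AND c.xmax >= ?
          AND c.ymin <= ? AND c.ymax >= ?
          AND c.zmin <= ? AND c.zmax >= ?
        ORDER BY c.step, f.name, c.subdomain"""
    params = (*names, first, last, xmax, xmin, ymax, ymin, zmax, zmin)
    return conn.execute(sql, params).fetchall()
```

A reader would then use the returned path, byte_offset and nbytes to pull only the matching chunks out of the array files (for example with MPI-IO), which is the part this sketch leaves out.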
>>
>>
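Dmitry's GMail-style labels, where a file carries several labels and is retrieved by any subset of them in any order, can be illustrated with a small in-memory index. This is a minimal sketch only: the LabelIndex class and its tag and find methods are hypothetical names, nothing is persisted to disk, and it is not how Spotlight or any file system works.

```python
from collections import defaultdict

class LabelIndex:
    """Maps each label to the set of paths carrying it (labels as hyperedges)."""

    def __init__(self):
        self._by_label = defaultdict(set)

    def tag(self, path, *labels):
        """Attach any number of labels to a path."""
        for label in labels:
            self._by_label[label].add(path)

    def find(self, *labels):
        """Return sorted paths that carry ALL the given labels, in any order."""
        if not labels:
            return []
        # .get avoids creating empty entries for labels that were never used
        sets = [self._by_label.get(label, set()) for label in labels]
        return sorted(set.intersection(*sets))


index = LabelIndex()
index.tag("proposal.tex", "PETSc", "DOE", "Proposals")
index.tag("notes.txt", "PETSc")
print(index.find("DOE", "PETSc"))   # same result as find("PETSc", "DOE")
```

Because lookup is a set intersection, the order of the labels does not matter, which is the property a directory tree cannot offer.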