[petsc-dev] Writing rich state

Barry Smith bsmith at mcs.anl.gov
Tue Feb 23 13:50:02 CST 2010

On Feb 23, 2010, at 1:44 PM, Dmitry Karpeev wrote:

> Yes, but what about using Spotlight programmatically (e.g., from
> PETSc) to store rich state,
> checkpointing, etc?
> For example, I want to store a Vec.  How do I label it?  There may be
> various user contexts that share it, so I'd like to label it with all
> of them.
> In a way, I don't want to have to look at my home directory (or any
> directory) at all.
> I just want to extract files based on a given (set of) label(s).

> Dmitry.
> On Tue, Feb 23, 2010 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>>   With Google (and Spotlight on the Mac) is there any need to organize
>> anything anymore? Just burp down the data any way you please, anywhere
>> you want it, and then have smart search tools find it for you and format
>> it the way you need it at the time you need it? This does mean you need
>> decent tools to parse random stuff for the search to understand it.
>>   Ironically, in the past few years with Spotlight on my Mac I actually
>> do a better job of organizing my home directory structure than I ever
>> have before.
>>   Barry
>> On Feb 23, 2010, at 1:31 PM, Dmitry Karpeev wrote:
>>> This takes the discussion in a somewhat tangential direction, but
>>> consider this:
>>> We use hierarchical file systems, which are also a pain.
>>> Say, I'm working on project PETSc and I'm writing a DOE proposal for it.
>>> Should I put it in ~/PETSc/Proposals/DOE/proposal or
>>> ~/Proposals/DOE/PETSc/proposal or
>>> ~/Proposals/PETSc/DOE?
>>> Later (3 months from now) I might want to come back and retrieve a
>>> file from that proposal tree.
>>> Where do I look for it?
>>> Maybe I should have all of these paths, all but one being soft links
>>> to the master path?
>>> I've tried that.  It's a pain.
>>> Basically, any hierarchical storage format, such as a file system,
>>> will impose a tree structure on
>>> what is fundamentally a (hyper)graph.
>>> GMail solves a similar problem by allowing multiple labels on a piece
>>> of email.
>>> Then I can search on any or several of the labels: Proposals, DOE,
>>> PETSc, irrespective of the order.
>>> A file system imposes an artificial order.
>>> You can think of labels as being the hyperedges in the hypergraph.
>>> It would be nice to have a file system that functioned a bit like
>>> GMail, I think.
>>> In fact, I've thought about writing a Python replacement for 'ls'
>>> that would list files with a given label or labels.  I'm too lazy and
>>> incompetent, however.
>>> In the simplest case the metadata could go right into the filename,
>>> but maybe that's not a good thing to do in general.
>>> Dmitry.
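
The label idea above (search on any subset of labels, in any order) can be
sketched with a small SQLite index.  This is only an illustration under
assumptions: the table name `tag` and the helpers `open_labels`, `label`
and `find` are hypothetical, not part of PETSc, Spotlight or any existing
tool, and the "files" are just path strings that the index never opens.

```python
import sqlite3


def open_labels(path=":memory:"):
    """Open (or create) a hypothetical label index; one row per (path, label)."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS tag (
            path  TEXT NOT NULL,
            label TEXT NOT NULL,
            PRIMARY KEY (path, label)   -- a file carries each label at most once
        );
    """)
    return db


def label(db, path, *labels):
    """Attach any number of labels to a path; repeated labels are ignored."""
    db.executemany(
        "INSERT OR IGNORE INTO tag (path, label) VALUES (?, ?)",
        [(path, name) for name in labels],
    )


def find(db, *labels):
    """Return the paths that carry ALL of the given labels, in any order."""
    wanted = set(labels)
    if not wanted:
        return []
    marks = ",".join("?" * len(wanted))
    rows = db.execute(
        f"SELECT path FROM tag WHERE label IN ({marks}) "
        "GROUP BY path HAVING COUNT(*) = ? ORDER BY path",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]
```

With this, find(db, "PETSc", "DOE") and find(db, "DOE", "PETSc") return the
same list, which is the point: the labels are unordered, unlike the
components of a path.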
>>> On Tue, Feb 23, 2010 at 10:24 AM, Barry Smith <bsmith at mcs.anl.gov>  
>>> wrote:
>>>>  I've thought about this but never done anything; I think it is worth
>>>> investigating.
>>>>  BTW: My long term goal is also that all PETSc source code lives in an
>>>> appropriate database with appropriate relationships and meta-data
>>>> stored there.
>>>>  The fact that we (meaning HPC and OpenSource in general) use flat
>>>> files so much shows a failure of something.
>>>>  Barry
>>>> On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:
>>>>> Matt and I talked about this a couple months ago, but I'd like to also
>>>>> mention it here.  It seems to me that data formats like HDF5 are really
>>>>> a pain to use for generic purposes, because you end up trying to map a
>>>>> directed graph of object relations (composition) into a hierarchical
>>>>> data format, and then implement relational queries on top of this
>>>>> hierarchy.  (I've done this, to some extent, and I ended up writing
>>>>> cumbersome code to walk this hierarchy to answer queries that would be
>>>>> one-line SQL queries.)
>>>>> To elaborate slightly on the problem, the goal would be to write vectors
>>>>> living on a DMComposite, with extra semantics like time step and units,
>>>>> in a way that could be used for visualization as well as checkpoints for
>>>>> forward and adjoint models.  PETSc's unadorned binary IO is fine if the
>>>>> same code is going to read it back in, because everything will be wired
>>>>> up correctly and we're just loading into a Vec (although it's already
>>>>> somewhat tricky when the layout changes in the unstructured case).  But
>>>>> there just isn't enough metadata to operate on in any sort of generic
>>>>> way, and I hate writing custom code to describe meshes and relations
>>>>> between them.
>>>>> Current scientific data formats (at least those I have seen) are a
>>>>> hassle to use since they have poor support for expressing relations.
>>>>> HDF5 has the equivalent of file-system symlinks, but after
>>>>> normalization, all the relations end up being encoded as a bunch of
>>>>> symlinks, which is a relatively low-level view and isn't a particularly
>>>>> convenient thing to traverse when answering a query.
>>>>> So I'm curious if anyone has put such metadata into a relational
>>>>> database instead of trying to contort it into one of these "scientific"
>>>>> data formats.  My thought would be to drop only the metadata into
>>>>> something like SQLite, and write the arrays themselves using MPI-IO (or
>>>>> HDF5/NetCDF/whatever, but these don't provide much when we aren't using
>>>>> them for metadata).  This would allow efficient support of queries like
>>>>> "all vector fields at step M" and "fields B and C from step M to N on
>>>>> subdomains intersecting bounding box XYZ".  This isn't completely
>>>>> different from what XDMF tries to do, but experimentation with that left
>>>>> a sour taste.  Is SQL a stupid idea for this purpose, and would I be
>>>>> better off writing code to support the queries I want on
>>>>> HDF5/XDMF/something else?
>>>>> Jed
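
Jed's suggestion (metadata in SQLite, raw arrays written separately) and
his two example queries can be sketched roughly as below.  Everything here
is an assumption made for illustration: the `field` table, its columns and
the helper functions are hypothetical, and the catalog only records where
an array was written (file, byte offset, length); writing the array itself
with MPI-IO or anything else is outside the sketch.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create a hypothetical metadata catalog; array data lives in other files."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS field (
            id        INTEGER PRIMARY KEY,
            name      TEXT    NOT NULL,   -- field name, e.g. 'B' or 'C'
            units     TEXT,
            step      INTEGER NOT NULL,   -- time step
            subdomain INTEGER NOT NULL,
            xmin REAL, xmax REAL,         -- bounding box of the subdomain
            ymin REAL, ymax REAL,
            zmin REAL, zmax REAL,
            file      TEXT    NOT NULL,   -- where the raw array was written
            offset    INTEGER NOT NULL,   -- byte offset inside that file
            nbytes    INTEGER NOT NULL
        );
    """)
    return db


def add_field(db, name, units, step, subdomain, box, file, offset, nbytes):
    """Record one written array; box = (xmin, xmax, ymin, ymax, zmin, zmax)."""
    db.execute(
        "INSERT INTO field (name, units, step, subdomain, "
        "xmin, xmax, ymin, ymax, zmin, zmax, file, offset, nbytes) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (name, units, step, subdomain, *box, file, offset, nbytes),
    )


def fields_at_step(db, step):
    """'All vector fields at step M'."""
    return db.execute(
        "SELECT name, subdomain, file, offset FROM field "
        "WHERE step = ? ORDER BY name, subdomain",
        (step,),
    ).fetchall()


def fields_in_box(db, names, step_lo, step_hi, box):
    """'Fields B and C from step M to N on subdomains intersecting a box'."""
    qxmin, qxmax, qymin, qymax, qzmin, qzmax = box
    marks = ",".join("?" * len(names))
    # Two boxes intersect when they overlap along every axis.
    sql = (
        "SELECT name, step, subdomain, file, offset FROM field "
        f"WHERE name IN ({marks}) AND step BETWEEN ? AND ? "
        "AND xmin <= ? AND xmax >= ? "
        "AND ymin <= ? AND ymax >= ? "
        "AND zmin <= ? AND zmax >= ? "
        "ORDER BY step, name, subdomain"
    )
    args = (*names, step_lo, step_hi, qxmax, qxmin, qymax, qymin, qzmax, qzmin)
    return db.execute(sql, args).fetchall()
```

Each query is a single SELECT, which is the contrast Jed draws with walking
an HDF5 hierarchy by hand.  Whether a flat table like this is enough for
DMComposite layouts is an open question that the sketch does not address.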
