[Swift-user] Virtual data schema / catalog?

Allen, M. David dmallen at mitre.org
Fri Sep 14 08:39:17 CDT 2007


Thanks for the information.  I think something like Xindice might be a
decent starting point.  The XML is more interesting to me than the
KML.

The paper already has a decent starting point for a relational schema
to store this kind of information.  Figuring out the XPath to massage
the information from a hierarchical form into that schema would be a
pain, but it's doable.  This may be something that I'll explore.  If
I'm able to pull something together, I'll post the results.
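
As a rough sketch of what that mapping might look like: the XML shape,
the attribute names, and the two-table layout below are all made up for
illustration.  They are not Swift's actual XML form or the schema from
the paper.  It uses Python's ElementTree (which supports a limited XPath
subset) and an in-memory SQLite database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML shape, invented for illustration only;
# this is NOT Swift's real intermediate XML format.
SAMPLE = """
<workflow>
  <call id="c1" proc="convert">
    <input dataset="raw.img"/>
    <output dataset="slice.img"/>
  </call>
  <call id="c2" proc="average">
    <input dataset="slice.img"/>
    <output dataset="atlas.img"/>
  </call>
</workflow>
"""


def load_into_tables(xml_text, conn):
    """Flatten nested <call> elements into two relational tables."""
    # Assumed table layout: one row per call, one row per dataset use.
    conn.execute("CREATE TABLE calls (id TEXT PRIMARY KEY, proc TEXT)")
    conn.execute(
        "CREATE TABLE dataset_use (call_id TEXT, dataset TEXT, direction TEXT)"
    )
    root = ET.fromstring(xml_text)
    for call in root.findall("./call"):
        call_id = call.get("id")
        conn.execute(
            "INSERT INTO calls VALUES (?, ?)", (call_id, call.get("proc"))
        )
        # Each child <input>/<output> becomes one row keyed by its parent call.
        for direction in ("input", "output"):
            for node in call.findall("./" + direction):
                conn.execute(
                    "INSERT INTO dataset_use VALUES (?, ?, ?)",
                    (call_id, node.get("dataset"), direction),
                )


def producers_of(conn, dataset):
    """Return the procedure names whose calls wrote the given dataset."""
    rows = conn.execute(
        "SELECT c.proc FROM calls c "
        "JOIN dataset_use u ON u.call_id = c.id "
        "WHERE u.dataset = ? AND u.direction = 'output' "
        "ORDER BY c.id",
        (dataset,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
load_into_tables(SAMPLE, conn)
print(producers_of(conn, "atlas.img"))
```

The parent/child nesting turns into a foreign key (call_id), which is
the part I expect to be painful once the real documents get deeper.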

Being able to look through a provenance chain would be good because it
could allow selective regeneration of data sets.  That is, in some cases I
don't want to run the entire workflow; I just want software that will
figure out which datasets in a workflow are missing, and then recompute
only those pieces (and any others that depend on them).
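
To make that concrete, here is a minimal sketch of the idea.  The
workflow table, the step names, and the steps_to_rerun helper are all
hypothetical; nothing here is an existing Swift feature.  It assumes the
provenance information has already been reduced to "step -> (inputs,
outputs)" and that we know which datasets currently exist.

```python
from collections import deque

# Hypothetical workflow: step name -> (input datasets, output datasets).
STEPS = {
    "convert": (["raw.img"], ["slice.img"]),
    "average": (["slice.img"], ["atlas.img"]),
    "render": (["atlas.img"], ["atlas.png"]),
    "report": (["raw.img"], ["summary.txt"]),
}


def steps_to_rerun(steps, existing):
    """Return the set of steps that must run to restore missing datasets.

    A step is selected if any of its outputs is missing, or if it
    consumes an output of a step that is already selected.
    """
    # Seed: steps with at least one missing output.
    dirty = {
        name
        for name, (_, outputs) in steps.items()
        if any(out not in existing for out in outputs)
    }

    # Index: dataset -> steps that read it.
    consumers = {}
    for name, (inputs, _) in steps.items():
        for dataset in inputs:
            consumers.setdefault(dataset, []).append(name)

    # Breadth-first walk downstream from every seeded step.
    queue = deque(sorted(dirty))
    while queue:
        name = queue.popleft()
        for out in steps[name][1]:
            for downstream in consumers.get(out, []):
                if downstream not in dirty:
                    dirty.add(downstream)
                    queue.append(downstream)
    return dirty


# atlas.img has gone missing; everything else is still on disk.
existing = {"raw.img", "slice.img", "atlas.png", "summary.txt"}
print(sorted(steps_to_rerun(STEPS, existing)))
```

Here "average" is selected because its output is missing, and "render"
because it depends on that output, while "convert" and "report" are left
alone.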

Thanks again

-- David 

-----Original Message-----
From: Ben Clifford [mailto:benc at hawaga.org.uk] 
Sent: Friday, September 14, 2007 9:21 AM
To: Allen, M. David
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Virtual data schema / catalog?


There's a long term plan to have something like a VDC but no 
implementation.

Tibi has worked on an experiment management database for one specific 
project - maybe he will comment on how that overlaps here.

The origins of the XML and KML files are roughly analogous to VDS1's
XML form of VDL and to Condor DAGman graphs (respectively).

The XML files are more or less the same description as the SwiftScript 
.swift files, but in an XML form. The KML files are lower-level, 
describing various things that need to happen inside the execution 
environment itself.

The intention (or at least my intention) was/is that the XML would be 
stuff that would go into a VDC, with the KML files not being 
standardised/shareable.



In addition, there's a utility called kickstart which will generate 
invocation information for runs on the actual sites that the jobs run
on. Swift can be configured to always run that and bring the files back
to the submitting system.

kickstart was also present in VDS1 - it's the same executable for both.

The invocation records from this are also in XML form.


So the short answer is: it's easy to get a bunch of XML descriptions of 
both the high level workflow and actual invocations dumped into files
in a directory. We haven't got anything in Swift that will do anything
with those files.

If I was hacking round with this, then given the XML nature of the 
invocation records and the XML intermediate form of SwiftScript, I'd be 
inclined to make something that would import them all into an XML
database like Xindice and then play around making XPath queries against
that.
