[Swift-user] Virtual data schema / catalog?

Fri Sep 14 09:52:59 CDT 2007

One of the Swift-related tools that I have been working on is an
"experiment tracking" utility.
Essentially it's a python script that is executed by the user after
the workflow has finished, and which parses the execution log file for
useful information, such as parameters that have been passed to the
atomic procedures, and the names of the files that were involved in
the current execution. Since the swift script from which the workflow
engine produces the log is up to the user, the log parser has been
built with a lot of assumptions in mind, and still needs to be refined
further.
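
To give a rough idea of the parsing step, here is a minimal sketch. It
is not the code from saveExperiment.py: the log line layout, the
JOB_START marker and the rule for spotting file names are all
assumptions made up for this example, and real Swift logs would need a
different pattern.

```python
import re

# Hypothetical log line layout, assumed only for this sketch:
#   <timestamp> JOB_START app=<name> args=<space-separated arguments>
JOB_LINE = re.compile(r"JOB_START\s+app=(?P<app>\S+)\s+args=(?P<args>.*)$")

# Assumption: any argument ending in a short extension is a file name.
# A parameter such as 0.5 would be misclassified by this rule, which is
# exactly the kind of assumption that needs refining.
FILE_LIKE = re.compile(r"^[\w./-]+\.\w{1,5}$")


def parse_log(lines):
    """Collect, per invocation, the app name, plain parameters and file names."""
    invocations = []
    for line in lines:
        match = JOB_LINE.search(line.rstrip("\n"))
        if match is None:
            continue  # not an invocation line
        args = match.group("args").split()
        invocations.append({
            "app": match.group("app"),
            "files": [a for a in args if FILE_LIKE.match(a)],
            "params": [a for a in args if not FILE_LIKE.match(a)],
        })
    return invocations


# Made-up sample lines in the assumed layout.
sample = [
    "2007-09-14 09:00:01 INFO starting run",
    "2007-09-14 09:00:02 JOB_START app=align args=-n 12 subject01.img out/subject01.aligned.img",
]
result = parse_log(sample)
```

With the sample above, the first line is skipped and the second yields
one record for the "align" invocation, with its two file names
separated from the "-n 12" parameters.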

Once all the information is gathered, a storage step saves the
metadata in a relational database and copies the files to a local
cache in the user's home directory, set aside for this purpose.
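
A storage step of this kind could look roughly like the sketch below.
It is only an illustration under stated assumptions: SQLite stands in
for whatever relational database is actually used, and the table
layout, the function name store_experiment and the cache location are
hypothetical, not taken from the real tool.

```python
import shutil
import sqlite3
from pathlib import Path


def store_experiment(db_path, cache_dir, name, params, files):
    """Record one experiment's metadata and copy its files into the cache."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commit on success, roll back on error
            # Hypothetical schema: one row per experiment, parameter and file.
            conn.execute("CREATE TABLE IF NOT EXISTS experiment "
                         "(id INTEGER PRIMARY KEY, name TEXT)")
            conn.execute("CREATE TABLE IF NOT EXISTS parameter "
                         "(experiment_id INTEGER, value TEXT)")
            conn.execute("CREATE TABLE IF NOT EXISTS file "
                         "(experiment_id INTEGER, cached_path TEXT)")
            exp_id = conn.execute(
                "INSERT INTO experiment (name) VALUES (?)", (name,)).lastrowid
            conn.executemany("INSERT INTO parameter VALUES (?, ?)",
                             [(exp_id, p) for p in params])
            # Each experiment gets its own subdirectory in the cache.
            target = cache_dir / str(exp_id)
            target.mkdir(exist_ok=True)
            for src in files:
                dest = target / Path(src).name
                shutil.copy2(src, dest)
                conn.execute("INSERT INTO file VALUES (?, ?)",
                             (exp_id, str(dest)))
    finally:
        conn.close()
    return exp_id
```

It would be called with a database path and a cache directory under
the user's home directory, for example a (hypothetical)
~/.swift_experiments. Files are stored under their base name only, so
two inputs with the same name from different directories would
overwrite each other in this sketch.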

The current code for this tool is here:
http://www.ci.uchicago.edu/trac/swift/browser/SwiftApps/Aphasia/saveExperiment.py
... and an example of what ends up in the database is here:
http://tp-neurodb.ci.uchicago.edu:8080/ExperimentManagement

This tool was built to address urgent needs of some of the researchers
using Swift, and it will be superseded by the redesigned and
reimplemented VDC (when it's done).

Tibi

On 9/14/07, Ben Clifford <benc at hawaga.org.uk> wrote:
>
> There's a long term plan to have something like a VDC but no
> implementation.
>
> Tibi has worked on an experiment management database for one specific
> project - maybe he will comment on how that overlaps here.
>
> The origins of the XML and KML files are roughly analogous to VDS1's XML
> form of VDL and to Condor DAGman graphs (respectively).
>
> The XML files are more or less the same description as the SwiftScript
> .swift files, but in an XML form. The KML files are lower-level,
> describing various things that need to happen inside the execution
> environment itself.
>
> The intention (or at least my intention) was/is that the XML would be
> stuff that would go into a VDC, with the KML files not being
> standardised/shareable.
>
>
>
> In addition, there's a utility called kickstart which will generate
> invocation information for runs on the actual sites that the jobs run on.
> Swift can be configured to always run that and bring the files back to
> the submitting system.
>
> kickstart was also present in VDS1 - it's the same executable for both.
>
> The invocation records from this are also in XML form.
>
>
> So the short answer is: it's easy to get a bunch of XML descriptions of
> both the high level workflow and actual invocations dumped into files in a
> directory. We haven't got anything in Swift that will do anything with
> those files.
>
> If I was hacking round with this, then given the XML nature of the
> invocation records and the XML intermediate form of SwiftScript, I'd be
> inclined to make something that would import them all into an XML database
> like Xindice and then play around making XPath queries against that.
>

-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/