[Swift-user] Virtual data schema / catalog?

Allen, M. David dmallen at mitre.org
Fri Sep 14 09:00:26 CDT 2007


I noticed the stop/restart feature when I read the user guide, and it's
something I'm very interested in.  One of the things on my to-do list
is to think about developing a Perl script that can allow workflows to
be intentionally stopped to allow human intervention.

In other words, right now I can use swift to chain a bunch of programs
together, but what if I want to take a work product, assign it to a
human to do something with it, and then use the output of the human's
effort to act as the input to the next step?  One way might be to
intentionally stop the workflow at the point where the human should
take the input.  (Alternatively, let it fail because the human's work
product doesn't exist yet.)  The human would then go off and do whatever
they're supposed to do, and subsequently upload the result to some
website.  The act of uploading that result would then restart the
workflow to continue processing with the human's result as an
intermediate work product.  Voilà.  Humans can be tasked to do
arbitrarily complex things that computers can't do, just as if they were
an invocation of "grep".  :)
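A minimal Python sketch of the pause-for-a-human idea, assuming each step writes a single output file and that a step counts as done once its file exists. The `Step` class, `run_pipeline`, and `writer` are hypothetical illustrations, not anything in Swift or Karajan: the run stops at the first manual step whose file is missing, and rerunning after the upload skips everything already produced.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    output: str                         # file this step produces
    run: Optional[Callable[[], None]]   # None marks a manual (human) step


def run_pipeline(steps: list[Step]) -> str:
    """Run steps in order, skipping any whose output file already exists.

    Returns "done" if every step completed, otherwise the name of the
    human step whose output is still missing (the pause point).
    """
    for step in steps:
        if os.path.exists(step.output):
            continue                    # produced earlier, by a program or a person
        if step.run is None:
            return step.name            # stop until someone uploads this file
        step.run()
    return "done"


def writer(path: str, text: str) -> Callable[[], None]:
    """Build a trivial automated step that writes `text` to `path`."""
    def _run() -> None:
        with open(path, "w") as f:
            f.write(text)
    return _run


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    a, b, c = (os.path.join(work, n) for n in ("a.txt", "b.txt", "c.txt"))
    steps = [
        Step("extract", a, writer(a, "raw data")),
        Step("human-review", b, None),  # hypothetical manual step
        Step("report", c, writer(c, "final report")),
    ]
    print(run_pipeline(steps))          # prints "human-review"
    writer(b, "reviewed")()             # stands in for the human's upload
    print(run_pipeline(steps))          # prints "done"
```

In a real setup, the website's upload handler would be the thing that writes the file and triggers the rerun.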

On a separate note (the XML schema): I will be using some of the
nightly builds.  Is the XML schema fairly stable?  How often does it
change?

-- David

-----Original Message-----
From: Ben Clifford [mailto:benc at hawaga.org.uk] 
Sent: Friday, September 14, 2007 9:47 AM
To: Allen, M. David
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Virtual data schema / catalog?


> Being able to look through a provenance chain would be good because it
> could allow selective regeneration of data sets.  I.e. in some cases I
> don't want to run the entire workflow, I just want software that will
> figure out which datasets in a workflow are missing, and then recompute
> only those pieces (and any others that depend on them).
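A minimal Python sketch of that selective-regeneration idea, assuming the provenance chain is available as a map from each dataset to the datasets it was derived from. `plan_regeneration` and the dataset names are hypothetical, not a Swift API. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) so that inputs are always ordered before whatever is derived from them.

```python
from graphlib import TopologicalSorter


def plan_regeneration(deps: dict[str, set[str]], missing: set[str]) -> list[str]:
    """Return the datasets to recompute, inputs first.

    deps maps each dataset to the datasets it is derived from. A dataset
    must be recomputed if it is missing, or if anything it depends on is
    itself being recomputed (so staleness propagates downstream).
    """
    order = list(TopologicalSorter(deps).static_order())
    stale: set[str] = set()
    for ds in order:
        if ds in missing or deps.get(ds, set()) & stale:
            stale.add(ds)
    return [ds for ds in order if ds in stale]


if __name__ == "__main__":
    provenance = {
        "raw": set(),
        "clean": {"raw"},
        "stats": {"clean"},
        "plot": {"stats"},
        "summary": {"clean"},
    }
    print(plan_regeneration(provenance, missing={"stats"}))  # ['stats', 'plot']
```

Here `summary` is left alone because it derives only from `clean`, which is still present.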

Also related to this, then:

Swift (or rather the Karajan workflow engine underneath it) has this
concept of restart logs. These are implemented at a lower level than the
XML.

Briefly, if you have a KML file, you can run part of it, have a failure,
let the system abort and write out a restart log; and then you can run
again using the restart log to ignore work already done.
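A minimal Python sketch of the restart-log behaviour described above, assuming the log is simply one completed task name per line. The function name and log format are hypothetical and are not Karajan's actual restart-log format. A task's name is appended only after it succeeds, so after an abort the rerun skips exactly the finished work.

```python
import os
import tempfile
from typing import Callable


def run_with_restart_log(tasks: dict[str, Callable[[], None]], log_path: str) -> list[str]:
    """Run tasks in order, appending each completed task name to log_path.

    On a rerun, tasks already listed in the log are skipped. Returns the
    names of the tasks executed in this run. If a task raises, the
    exception propagates and the log keeps everything finished so far.
    """
    done: set[str] = set()
    if os.path.exists(log_path):
        with open(log_path) as f:
            done = {line.strip() for line in f if line.strip()}
    ran = []
    for name, task in tasks.items():
        if name in done:
            continue                    # finished on an earlier run
        task()
        with open(log_path, "a") as f:
            f.write(name + "\n")        # record completion only after success
        ran.append(name)
    return ran


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "workflow.log")
    state = {"node_up": False}

    def flaky() -> None:
        if not state["node_up"]:
            raise RuntimeError("remote site went away overnight")

    tasks = {"stage1": lambda: None, "stage2": flaky, "stage3": lambda: None}
    try:
        run_with_restart_log(tasks, log)
    except RuntimeError as err:
        print("aborted:", err)
    state["node_up"] = True
    print(run_with_restart_log(tasks, log))   # ['stage2', 'stage3']
```

Because the log records only task names, it is meaningful only for the same task list; that mirrors the caveat below about reusing a restart log with a different version of the workflow.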

Because this happens at the KML level, I suspect it's not something that
you can really put in a database and come back to with a different
(version of the) workflow; it's more intended for "I set this day-long
workflow running; it died overnight; tomorrow I will restart it".

There's a brief section on this in the tutorial, at
http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2860757

"16. Starting and restarting"
