[Swift-user] Virtual data schema / catalog?

Fri Sep 14 09:11:51 CDT 2007

On Fri, 2007-09-14 at 10:00 -0400, Allen, M. David wrote:
[...]
> In other words, right now I can use swift to chain a bunch of programs
> together, but what if I want to take a work product, assign it to a
> human to do something with it, and then use the output of the human's
> effort to act as the input to the next step?  One way might be to
> intentionally stop the workflow at the point where the human should
> take the input.  (Alternately, to let it fail because the human work
> product doesn't exist yet)  The human would then go off and do whatever
> they're supposed to do, and subsequently upload the result to some
> website.  The act of uploading that result would then restart the
> workflow to continue processing with the human's result as an
> intermediate work product.  Voilà.  Humans can be tasked to do
> arbitrarily complex things that computers can't do, just like they were
> an invocation of "grep".  :)

Human processes are, to a large extent, like all other processes (albeit
somewhat nondeterministic). So you can model them as a process/app in
Swift. You would have to take care of designing the bridging, but I
don't think that differs much for Swift vs. other things.

To be more specific, you can have an application String
spellcheck(String) that is actually carried out by a person. The
executable might display an editable box on the screen and write the
output to a file when the user clicks "I'm done", or it might send an
email, wait for a reply, and write that reply to the file. Any other
reasonable user interface would do. To Swift it makes no difference. In
fact, this will work with any language/system.
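
As a rough illustration of such a bridging executable, here is a minimal
sketch in Python. Everything in it is an assumption made for the
example: the name wait_for_human, the idea that the person saves their
result at an agreed output path, and the polling interval. It is not
part of Swift; it only shows that, from the workflow's point of view, a
human step can look like a program that blocks until its output file
exists.

```python
import os
import sys
import time


def wait_for_human(input_path, output_path, poll_seconds=1.0, timeout=None):
    """Block until a person has produced output_path, then return its text.

    Hypothetical helper: assumes the person (or the upload page they use)
    writes the finished result to output_path in one step, e.g. by saving
    to a temporary name and renaming it, so a half-written file is never seen.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    print(f"Please process {input_path} and save the result as {output_path}")
    while not os.path.exists(output_path):
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"no human result at {output_path}")
        time.sleep(poll_seconds)
    with open(output_path, encoding="utf-8") as f:
        return f.read()


# Command-line use (hypothetical): python human_step.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    wait_for_human(sys.argv[1], sys.argv[2])
```

The workflow system would invoke this like any other application and
simply see a long-running job that eventually produces its output file.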

> 
[...]

Mihael

> 
> -- David
> 
> -----Original Message-----
> From: Ben Clifford [mailto:benc at hawaga.org.uk] 
> Sent: Friday, September 14, 2007 9:47 AM
> To: Allen, M. David
> Cc: swift-user at ci.uchicago.edu
> Subject: RE: [Swift-user] Virtual data schema / catalog?
> 
> 
> > Being able to look through a provenance chain would be good because it
> > could allow selective regeneration of data sets.  I.e. in some cases I
> > don't want to run the entire workflow, I just want software that will
> > figure out which datasets in a workflow are missing, and then recompute
> > only those pieces (and any others that depend on them).
> 
> Also related to this, then:
> 
> Swift (or rather the Karajan workflow engine underneath it) has this
> concept of restart logs. These are implemented at a lower level than the
> XML.
> 
> Briefly, if you have a KML file, you can run part of it, have a failure,
> let the system abort and write out a restart log; and then you can run
> again using the restart log to ignore work already done.
> 
> Because this happens at the KML level, I suspect it's not something that
> you can really put in a database and come back to with a different
> (version of the) workflow - it's more intended for "I set this day-long
> workflow running; it died overnight; tomorrow I will restart it".
> 
> There's a brief section on this in the tutorial, at
> http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2860757
> 
> "16. Starting and restarting"
>
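
The restart-log idea quoted above can be illustrated with a small,
generic sketch in Python. This is not Karajan's actual log format or
API; the function name run_with_restart_log, the one-step-name-per-line
log file, and the (name, function) step list are all assumptions made
for the example. It only shows the general pattern: record each
completed step, and on a rerun skip whatever is already recorded.

```python
import os


def run_with_restart_log(steps, log_path):
    """Run (name, func) steps in order, skipping names already in the log.

    Hypothetical sketch: the log is a plain text file with one completed
    step name per line. Returns the names executed in this run.
    """
    done = set()
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            done = {line.strip() for line in f if line.strip()}

    executed = []
    with open(log_path, "a", encoding="utf-8") as log:
        for name, func in steps:
            if name in done:
                continue  # work already done in an earlier run
            func()  # an exception here aborts the run; earlier entries stay logged
            log.write(name + "\n")
            log.flush()
            executed.append(name)
    return executed
```

Because the log refers to steps by name, it is only meaningful for the
same workflow it was written for, which matches the caveat above about
coming back to it with a different version of the workflow.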