[Swift-user] Virtual data schema / catalog?
Mihael Hategan
hategan at mcs.anl.gov
Fri Sep 14 09:11:51 CDT 2007
On Fri, 2007-09-14 at 10:00 -0400, Allen, M. David wrote:
[...]
> In other words, right now I can use swift to chain a bunch of programs
> together, but what if I want to take a work product, assign it to a
> human to do something with it, and then use the output of the human's
> effort to act as the input to the next step? One way might be to
> intentionally stop the workflow at the point where the human should
> take the input. (Alternately, to let it fail because the human work
> product doesn't exist yet) The human would then go off and do whatever
> they're supposed to do, and subsequently upload the result to some
> website. The act of uploading that result would then restart the
> workflow to continue processing with the human's result as an
> intermediate work product. Voilà. Humans can be tasked to do
> arbitrarily complex things that computers can't do, just like they were
> an invocation of "grep". :)
Human processes are, to a large extent, like all other processes (albeit
somewhat nondeterministic). So you can model them as a process/app in
Swift. You would have to take care of designing the bridging, but I
don't think that differs much for Swift vs. other things.
To be more specific, you can have an application String
spellcheck(String) that is actually carried out by a person. The
executable might display an editable box on the screen and write the
output to a file when the user clicks "I'm done", or it might send an
email, wait for a reply, and write that to the file, or use any other
reasonable user interface. To Swift it won't make a difference. In
fact, this will work with any language/system.
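
A minimal sketch of such a bridging executable, written in Python purely
for illustration. Everything here is an assumption rather than part of
Swift: the names run_human_task and ask_on_console are hypothetical, and
the console prompt stands in for whatever interface (GUI box, email
round trip, web upload) is actually used. The only contract it shows is
the one described above: read an input file, block until the person is
done, write an output file.

```python
import sys
from pathlib import Path
from typing import Callable


def ask_on_console(text: str) -> str:
    """Hypothetical user interface: show the text and read the
    person's corrected version from standard input."""
    print("Please correct the following text, then press Enter:")
    print(text)
    return input("> ")


def run_human_task(input_path: str, output_path: str,
                   ask: Callable[[str], str] = ask_on_console) -> None:
    """Read the input file, block until the human answers,
    then write the answer to the output file."""
    text = Path(input_path).read_text()
    result = ask(text)  # blocks until the person is done
    Path(output_path).write_text(result)


# Invoked like any other command-line program:
#   python spellcheck.py INPUT_FILE OUTPUT_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    run_human_task(sys.argv[1], sys.argv[2])
```

Because the caller only sees files going in and coming out, the ask
function can be swapped for a different interface without changing how
the workflow invokes the program.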
>
[...]
Mihael
>
> -- David
>
> -----Original Message-----
> From: Ben Clifford [mailto:benc at hawaga.org.uk]
> Sent: Friday, September 14, 2007 9:47 AM
> To: Allen, M. David
> Cc: swift-user at ci.uchicago.edu
> Subject: RE: [Swift-user] Virtual data schema / catalog?
>
>
> > Being able to look through a provenance chain would be good because it
> > could allow selective regeneration of data sets. I.e. in some cases I
> > don't want to run the entire workflow, I just want software that will
> > figure out which datasets in a workflow are missing, and then recompute
> > only those pieces (and any others that depend on them).
>
> Also related to this, then:
>
> Swift (or rather the Karajan workflow engine underneath it) has this
> concept of restart logs. These are implemented at a lower level than the
> XML.
>
> Briefly, if you have a KML file, you can run part of it, have a failure,
> let the system abort and write out a restart log; and then you can run
> again using the restart log to ignore work already done.
>
> Because this happens at the KML level, I suspect it's not something that
> you can really put in a database and come back to with a different
> (version of the) workflow - it's more intended for "I set this day-long
> workflow running; it died overnight; tomorrow I will restart it".
>
> There's a brief section on this in the tutorial, at
> http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2860757
>
> "16. Starting and restarting"
>