[Swift-user] Virtual data schema / catalog?
Veronika Nefedova
nefedova at mcs.anl.gov
Fri Sep 14 09:07:31 CDT 2007
I am not sure whether Swift has such a feature at present (Ben or
Mihael should comment on that), but a very simple workaround would be
to have two independent workflows. When the first one finishes, a
researcher would work with the data, upload it wherever it should go,
etc., and then, with just one extra push of a button, the second part of
the workflow could be started.
Of course, this won't work if you need to stop the workflow at a
different place each time, but if you need to stop it after one
particular step (the same one every time), the above solution would work.
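The two-workflow split can be driven by a small launcher. The following is a minimal Python sketch, not a Swift feature: it assumes each half of the workflow is wrapped in a callable (in practice each would invoke Swift on one of the two workflow files), and the part names `part1`/`part2` and the handoff file are hypothetical. The "button" is simply running `part2`, which refuses to start until the researcher's product is in place.

```python
import sys
from pathlib import Path
from typing import Callable, Dict


def launch(part: str, parts: Dict[str, Callable[[], None]], handoff: Path) -> int:
    """Run one half of a workflow that is split at a fixed human step.

    'part2' (hypothetical name) is refused until the researcher has placed
    the handoff file. Returns a process-style exit code:
    0 = launched, 1 = not ready yet, 2 = unknown part.
    """
    if part not in parts:
        print(f"unknown part {part!r}; expected one of {sorted(parts)}", file=sys.stderr)
        return 2
    if part == "part2" and not handoff.exists():
        # The human step has not finished: do not start the second workflow.
        print(f"not ready: {handoff} has not been uploaded yet", file=sys.stderr)
        return 1
    parts[part]()
    return 0
```

A tiny wrapper script could then end with `sys.exit(launch(sys.argv[1], parts, Path("curated.dat")))`, so the extra push of a button is one command.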
Nika
On Sep 14, 2007, at 9:00 AM, Allen, M. David wrote:
> I noticed the stop/restart feature when I read the user guide, and it's
> something I'm very interested in. One of the things on my to-do list
> is to think about developing a Perl script that can allow workflows to
> be intentionally stopped to allow human intervention.
>
> In other words, right now I can use Swift to chain a bunch of programs
> together, but what if I want to take a work product, assign it to a
> human to do something with it, and then use the output of the human's
> effort as the input to the next step? One way might be to
> intentionally stop the workflow at the point where the human should
> take the input (or, alternatively, to let it fail because the human work
> product doesn't exist yet). The human would then go off and do whatever
> they're supposed to do, and subsequently upload the result to some
> website. The act of uploading that result would then restart the
> workflow, which would continue processing with the human's result as an
> intermediate work product. Voilà. Humans can be tasked to do
> arbitrarily complex things that computers can't do, just as if they were
> an invocation of "grep". :)
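The upload-triggered restart described above could be approximated by polling for the human's work product and resuming once it appears. This is a minimal Python sketch under stated assumptions: the upload lands as a file at a known path, and `resume` is a hypothetical callable that launches the downstream part of the workflow. Nothing here is a Swift API.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def wait_for_file(path: Path, poll_seconds: float = 5.0,
                  timeout_seconds: Optional[float] = None) -> bool:
    """Block until `path` exists; return False if the timeout expires first."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while not path.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def resume_after_human_step(product: Path, resume: Callable[[Path], None],
                            poll_seconds: float = 5.0,
                            timeout_seconds: Optional[float] = None) -> bool:
    """Run the downstream part of the workflow once the human's product arrives.

    `resume` is a hypothetical hook (e.g. something that starts the second
    workflow with `product` as its input). Returns True if it was called.
    """
    if not wait_for_file(product, poll_seconds, timeout_seconds):
        return False
    resume(product)
    return True
```

Polling keeps the sketch dependency-free. A real upload handler should write the file under a temporary name and rename it into place, so the watcher never picks up a half-uploaded result.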
>
> On a separate note (the XML schema): I will be using some of the
> nightly builds. Is the XML schema fairly stable? How often does it
> change?
>
> -- David
>
> -----Original Message-----
> From: Ben Clifford [mailto:benc at hawaga.org.uk]
> Sent: Friday, September 14, 2007 9:47 AM
> To: Allen, M. David
> Cc: swift-user at ci.uchicago.edu
> Subject: RE: [Swift-user] Virtual data schema / catalog?
>
>
>> Being able to look through a provenance chain would be good because it
>> could allow selective regeneration of datasets. That is, in some cases I
>> don't want to run the entire workflow; I just want software that will
>> figure out which datasets in a workflow are missing, and then recompute
>> only those pieces (and any others that depend on them).
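Selective regeneration of this kind amounts to marking every missing derived dataset, plus everything downstream of it, as stale, and recomputing those in dependency order. A minimal Python sketch, assuming the provenance chain is available as a mapping from each derived dataset to the datasets it was computed from (dataset names are hypothetical); it uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def datasets_to_regenerate(inputs: Dict[str, Set[str]],
                           existing: Set[str]) -> List[str]:
    """Return derived datasets that must be recomputed, in a valid run order.

    `inputs` maps each derived dataset to the datasets it is computed from.
    A derived dataset is stale if it is missing or any of its inputs is stale.
    """
    # static_order() yields every node after all of its inputs.
    order = list(TopologicalSorter(inputs).static_order())
    stale: Set[str] = set()
    for name in order:
        if name not in inputs:
            # A raw input has no producer, so it cannot be regenerated here.
            if name not in existing:
                raise ValueError(f"raw input {name!r} is missing and has no producer")
            continue
        if name not in existing or any(dep in stale for dep in inputs[name]):
            stale.add(name)
    return [name for name in order if name in stale]
```

For example, `datasets_to_regenerate({"b": {"a"}, "c": {"b"}, "d": {"a"}}, {"a", "c", "d"})` returns `["b", "c"]`: `b` is missing and `c` depends on it, while `d` is left alone.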
>
> Also related to this, then:
>
> Swift (or rather the Karajan workflow engine underneath it) has a
> concept of restart logs. These are implemented at a lower level than
> the XML.
>
> Briefly, if you have a KML file, you can run part of it, have a
> failure, let the system abort and write out a restart log, and then run
> it again using the restart log to skip work already done.
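The restart-log idea can be illustrated with a toy version: record each task as it completes, and on the next run skip anything already recorded. This is a minimal Python sketch, not Karajan's actual log format or mechanism; the task identifiers and the one-id-per-line log file are assumptions.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_with_restart_log(tasks: Dict[str, Callable[[], None]],
                         order: List[str], log: Path) -> List[str]:
    """Run tasks in `order`, skipping any whose id is already in `log`.

    Each completed task id is appended to the log (one per line) and flushed
    immediately. If a task raises, the exception propagates and the log keeps
    only the tasks that finished. Returns the ids run during this call.
    """
    done = set(log.read_text().splitlines()) if log.exists() else set()
    ran: List[str] = []
    with log.open("a") as fh:
        for task_id in order:
            if task_id in done:
                continue  # completed in an earlier run
            tasks[task_id]()
            fh.write(task_id + "\n")
            fh.flush()
            ran.append(task_id)
    return ran
```

Flushing after each task means the log reflects completed work even if the process dies mid-run. Because the log records only task identifiers, it is meaningful only when rerunning the same workflow, which fits the "restart it tomorrow" use case rather than resuming a changed workflow.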
>
> Because this happens at the KML level, I suspect it's not something
> that you can really put in a database and come back to with a different
> (version of the) workflow; it's more intended for "I set this day-long
> workflow running; it died overnight; tomorrow I will restart it".
>
> There's a brief section on this in the tutorial, at
> http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2860757
>
> "16. Starting and restarting"
>
> --
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>