[Swift-user] Virtual data schema / catalog?

Allen, M. David dmallen at mitre.org
Fri Sep 14 09:13:35 CDT 2007


Two workflows could work, but that really would be a workaround.  The
goal would be to have a single workflow so that it could be understood
and maintained as a single workflow.

I don't necessarily think that this even should be done within swift.
It could probably be accomplished with a relatively simple perl script.
It could use one of two approaches:

(1) The script could intentionally cause an error to stop the workflow
after having sent notification to the human; when the human responds, a
separate script could restart the workflow to continue
(2) The script could send notification to the human, and then just
sleep indefinitely until signaled by some other thread to wake up when
the human responds.

All swift would know or care is that you're invoking a simple script
with a few simple parameters.  (That script might take 3 days to
complete, but that's another story...)
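
As a rough illustration of the two approaches, here is a minimal sketch
in Python rather than perl.  Everything in it is an assumption made for
illustration: the function names, the command-line arguments, and the
idea that the human's work product shows up as a file at a known path.
Approach (2) is done here by polling for that file instead of being
signaled by another thread, and notify_human is only a placeholder for
whatever notification mechanism would really be used.

```python
import os
import sys
import time


def notify_human(message):
    # Placeholder (assumption): a real script might send email instead.
    print(message, file=sys.stderr)


def fail_if_missing(result_path):
    """Approach (1): report failure (1) if the human's result is absent.

    Returning a nonzero status lets the caller stop the workflow; a later
    rerun succeeds once the file has been uploaded.
    """
    if os.path.exists(result_path):
        return 0
    notify_human("Human input needed: " + result_path)
    return 1


def wait_for_result(result_path, poll_seconds=60.0, timeout=None):
    """Approach (2): notify, then block until the result file appears.

    Returns True once the file exists, or False if `timeout` seconds
    pass first.  With timeout=None it waits indefinitely.
    """
    if not os.path.exists(result_path):
        notify_human("Human input needed: " + result_path)
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(result_path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


# Hypothetical command line: human_step.py {fail|wait} RESULT_PATH
if __name__ == "__main__" and len(sys.argv) == 3:
    mode, path = sys.argv[1], sys.argv[2]
    if mode == "fail":
        sys.exit(fail_if_missing(path))
    sys.exit(0 if wait_for_result(path) else 1)
```

Either way the workflow only sees an ordinary program with an exit
status; the difference is whether the wait happens inside the script
(approach 2) or between two runs of the workflow (approach 1).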

-- David

-----Original Message-----
From: Veronika Nefedova [mailto:nefedova at mcs.anl.gov] 
Sent: Friday, September 14, 2007 10:08 AM
To: Allen, M. David
Cc: Ben Clifford; swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Virtual data schema / catalog?

I am not sure if swift has such features at the present (Ben or  
Mihael should comment on that), but a very simple workaround would be  
to have 2 independent  workflows. When the first one finishes, a  
researcher would work with the data, upload it wherever it should go,  
etc, and then with just one extra push of a button the second part of  
the workflow could be started.
Of course this won't work if you need to stop the workflow each time  
at different places, but if you need to stop it after one particular  
step (every time the same) - the above solution would work.

Nika



On Sep 14, 2007, at 9:00 AM, Allen, M. David wrote:

> I noticed the stop/restart feature when I read the user guide, and it's
> something I'm very interested in.  One of the things on my to do list
> is to think about developing a perl script that can allow workflows to
> be intentionally stopped to allow human intervention.
>
> In other words, right now I can use swift to chain a bunch of programs
> together, but what if I want to take a work product, assign it to a
> human to do something with it, and then use the output of the human's
> effort to act as the input to the next step?  One way might be to
> intentionally stop the workflow at the point where the human should
> take the input.  (Alternatively, to let it fail because the human work
> product doesn't exist yet.)  The human would then go off and do whatever
> they're supposed to do, and subsequently upload the result to some
> website.  The act of uploading that result would then restart the
> workflow to continue processing with the human's result as an
> intermediate work product.  Voilà.  Humans can be tasked to do
> arbitrarily complex things that computers can't do, just like they were
> an invocation of "grep".  :)
>
> On a separate note, (the XML schema) - I will be using some of the
> nightly builds.  Is the XML schema fairly stable?  How often does it
> change?
>
> -- David
>
> -----Original Message-----
> From: Ben Clifford [mailto:benc at hawaga.org.uk]
> Sent: Friday, September 14, 2007 9:47 AM
> To: Allen, M. David
> Cc: swift-user at ci.uchicago.edu
> Subject: RE: [Swift-user] Virtual data schema / catalog?
>
>
>> Being able to look through a provenance chain would be good because it
>> could allow selective regeneration of data sets.  I.e. in some cases I
>> don't want to run the entire workflow, I just want software that will
>> figure out which datasets in a workflow are missing, and then recompute
>> only those pieces (and any others that depend on them).
>
> Also related to this, then:
>
> Swift (or rather the Karajan workflow engine underneath it) has this
> concept of restart logs. These are implemented at a lower level than the
> XML.
>
> Briefly, if you have a KML file, you can run part of it, have a failure,
> let the system abort and write out a restart log; and then you can run
> again using the restart log to ignore work already done.
>
> Because this happens at the KML level, I suspect it's not something that
> you can really put in a database and come back to with a different
> (version of the) workflow - it's more intended for "I set this day-long
> workflow running; it died overnight; tomorrow I will restart it".
>
> There's a brief section on this in the tutorial, at
> http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2860757
>
> "16. Starting and restarting"
>
> -- 
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>



