[Swift-devel] Interesting observation when running Swift

Veronika V. Nefedova nefedova at mcs.anl.gov
Tue Apr 10 12:54:37 CDT 2007


The problem is with the output -- you can't produce an unknown (to swift) 
number of files...

Nika
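
For readers of the archive, here is a rough sketch of the two points in this thread. It is plain Python, not SwiftScript, and it is not how Swift or Karajan is implemented. The names stage1, stage2, run_pipeline and the part*.txt file pattern are made up for the illustration. It assumes that "data driven" means stage 2 is started from the value stage 1 returns, and that stage 1 reports its own outputs after it runs, so the caller never has to declare how many files there will be.

```python
from __future__ import annotations

import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def stage1(workdir: Path) -> list[Path]:
    """Hypothetical first stage: writes a number of files the caller does not know."""
    for i in range(3):  # the count is an internal detail of this stage
        (workdir / f"part{i}.txt").write_text(f"chunk {i}\n")
    # Report whatever was actually produced instead of a pre-declared file list.
    return sorted(workdir.glob("part*.txt"))


def stage2(parts: list[Path]) -> str:
    """Hypothetical second stage: merges the stage 1 outputs."""
    return "".join(p.read_text() for p in parts)


def run_pipeline(workdir: Path) -> str:
    with ThreadPoolExecutor() as pool:
        outputs_stage1 = pool.submit(stage1, workdir)
        # The dependency is explicit in the code: .result() blocks until
        # stage1 has finished, so stage2 is only submitted once its inputs exist.
        merged = pool.submit(stage2, outputs_stage1.result())
        return merged.result()


with tempfile.TemporaryDirectory() as tmp:
    merged_text = run_pipeline(Path(tmp))
```

Calling stage1 and stage2 as two unrelated statements, with stage2 simply pointed at the expected file names, would give the runtime nothing to wait on. That is the situation described in the original report further down.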

At 12:46 PM 4/10/2007, Ian Foster wrote:
>I thought we could already do that. We can, I think, in the case of the 
>first stage: we can take all files in a directory as a dataset, say? But we 
>can't do that for later stages?
>
>
>-----Original Message-----
>From: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
>Date: Tue, 10 Apr 2007 12:12:12
>To:"Tiberiu Stef-Praun" <tiberius at ci.uchicago.edu>,  "Mihael Hategan" 
><hategan at mcs.anl.gov>
>Cc:swift-devel at ci.uchicago.edu
>Subject: Re: [Swift-devel] Interesting observation when running Swift
>
>I think that something like that would be useful:
>
>outputsStage1[]=stage1()
>outputsStage2[]=stage2(outputsStage1[])
>
>if you didn't have to specify the number or specific filenames for the
>outputs. Basically it would be good for the Workflow engine to understand
>this: "get all the produced files from stage 1 and use them as an input for
>Stage 2"
>
>(;
>
>Nika
>
>
>At 11:57 AM 4/10/2007, Tiberiu Stef-Praun wrote:
> >Interesting.
> >
> >Does anyone else think that monitoring the filesystem could be a useful idea?
> >
> >For instance it could help with file-driven dependencies, in scenarios
> >where we want to have continuous workflows, or compose independent
> >workflows. The filesystem would act as the publish-subscribe mechanism
> >for some workflow cases.
> >
> >Tibi
> >
> >On 4/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >>Swift doesn't monitor the file system.
> >>Data driven doesn't mean that it does magic in the background. It means
> >>that you have to express data dependencies in the code.
> >>
> >>On Tue, 2007-04-10 at 11:47 -0500, Tiberiu Stef-Praun wrote:
> >> > I have a workflow along these lines:
> >> >
> >> > // this one generates outputsStage1[]
> >> > stage1()
> >> > // this one merges the stage1 outputs
> >> > stage2(outputsStage1[])
> >> >
> >> > note that it is not outputsStage1=stage1()
> >> >
> >> > Since the outputsStage1 files had not been generated yet, I expected
> >> > Karajan to wait for them to be created before running stage2, but that
> >> > was not the case: stage2 was executed as soon as the workflow started,
> >> > failed, and caused the whole workflow to fail.
> >> >
> >> > I know how to fix the workflow, that is not the issue. The issue is
> >> > that I expected the workflow to be data-driven, but it seems to be
> >> > code driven. Explanation: it attempted to execute a section even if
> >> > its input files were not available.
> >> >
> >> > Correct me if I am wrong.
> >> > Tibi
> >> >
> >>
> >
> >
> >--
> >Tiberiu (Tibi) Stef-Praun, PhD
> >Research Staff, Computation Institute
> >5640 S. Ellis Ave, #405
> >University of Chicago
> >http://www-unix.mcs.anl.gov/~tiberius/
> >_______________________________________________
> >Swift-devel mailing list
> >Swift-devel at ci.uchicago.edu
> >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
