[Swift-devel] Interesting observation when running Swift

Mihael Hategan hategan at mcs.anl.gov
Tue Apr 10 12:56:55 CDT 2007


On Tue, 2007-04-10 at 17:46 +0000, Ian Foster wrote:
> I thought we could already do that. We can, I think, in the case of the first stage: we can take all files in a directory as a dataset, say? But we can't do that for later stages?

It has little to do with when the stages happen. It's more about knowing
what to stage out.

Right now mappers can tell which files are relevant in a given collection
(directory) that they manage, and the details are left to the mapper
implementations. What's needed is to give each mapper responsible for the
return values of an atomic proc the list of files the application
generated, let each mapper select the files relevant to it, then do the
stage-out and populate the Swift data structures accordingly.
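A minimal Python sketch of that flow, for illustration only. It assumes a hypothetical glob-based mapper (GlobMapper), a hypothetical collect_outputs driver, and a stage_out hook standing in for the real transfer step; none of these are Swift's actual mapper API. Each mapper sees the full list of produced files and claims only the ones it recognizes.

```python
from fnmatch import fnmatchcase


class GlobMapper:
    """Hypothetical mapper that claims produced files matching a glob pattern."""

    def __init__(self, pattern):
        self.pattern = pattern

    def select(self, produced_files):
        # Keep only the files this mapper recognizes, in a stable order.
        return sorted(f for f in produced_files if fnmatchcase(f, self.pattern))


def collect_outputs(produced_files, mappers, stage_out=lambda files: None):
    """Give each return-value mapper the full list of files the application
    produced, let it pick its own files, stage those out, and record them.

    `mappers` maps a return-value name to its mapper; `stage_out` is a
    hypothetical hook standing in for the real transfer step."""
    outputs = {}
    for name, mapper in mappers.items():
        selected = mapper.select(produced_files)
        stage_out(selected)
        outputs[name] = selected  # stands in for populating the Swift array
    return outputs


if __name__ == "__main__":
    produced = ["out_002.dat", "stage1.log", "out_001.dat"]
    result = collect_outputs(produced, {"outputsStage1": GlobMapper("out_*.dat")})
    print(result)  # {'outputsStage1': ['out_001.dat', 'out_002.dat']}
```

The point of the design is that the application does not need to declare its output count or names up front: the mapper decides membership after the fact, so something like outputsStage1[] could be filled with however many matching files stage1 happened to produce.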

> 
> 
> -----Original Message-----
> From: "Veronika  V. Nefedova" <nefedova at mcs.anl.gov>
> Date: Tue, 10 Apr 2007 12:12:12 
> To:"Tiberiu Stef-Praun" <tiberius at ci.uchicago.edu>,  "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc:swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] Interesting observation when running Swift
> 
> I think that something like that would be useful:
> 
> outputsStage1[]=stage1()
> outputsStage2[]=stage2(outputsStage1[])
> 
> if you didn't have to specify the number of outputs or their specific
> filenames. Basically, it would be good for the workflow engine to understand
> this: "get all the files produced by stage 1 and use them as input for
> stage 2".
> 
> (;
> 
> Nika
> 
> 
> At 11:57 AM 4/10/2007, Tiberiu Stef-Praun wrote:
> >Interesting.
> >
> >Does anyone else think that monitoring the filesystem could be a useful idea?
> >
> >For instance, it could help with file-driven dependencies in scenarios
> >where we want continuous workflows, or want to compose independent
> >workflows. The filesystem would act as the publish-subscribe mechanism
> >for some workflow cases.
> >
> >Tibi
> >
> >On 4/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >>Swift doesn't monitor the file system.
> >>Data-driven doesn't mean that it does magic in the background. It means
> >>that you have to express data dependencies in the code.
> >>
> >>On Tue, 2007-04-10 at 11:47 -0500, Tiberiu Stef-Praun wrote:
> >> > I have a workflow along these lines:
> >> >
> >> > // this one generates outputsStage1[]
> >> > stage1()
> >> > // this one merges the stage1 outputs
> >> > stage2(outputsStage1[])
> >> >
> >> > note that it is not outputsStage1=stage1()
> >> >
> >> > Since the outputsStage1 files had not been generated yet, I expected
> >> > Karajan to wait for them to be created before running stage2, but that
> >> > was not the case: stage2 was executed when the workflow started,
> >> > failed, and caused the whole workflow to fail.
> >> >
> >> > I know how to fix the workflow; that is not the issue. The issue is
> >> > that I expected the workflow to be data-driven, but it seems to be
> >> > code-driven: it attempted to execute a section even though its input
> >> > files were not available.
> >> >
> >> > Correct me if I am wrong.
> >> > Tibi
> >> >
> >>
> >
> >
> >--
> >Tiberiu (Tibi) Stef-Praun, PhD
> >Research Staff, Computation Institute
> >5640 S. Ellis Ave, #405
> >University of Chicago
> >http://www-unix.mcs.anl.gov/~tiberius/
> >_______________________________________________
> >Swift-devel mailing list
> >Swift-devel at ci.uchicago.edu
> >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
