[Swift-devel] Interesting observation when running Swift
Yong Zhao
yongzh at cs.uchicago.edu
Tue Apr 10 13:01:44 CDT 2007
Aren't these already addressed by remote mappers?
Yong.
On Tue, 10 Apr 2007, Mihael Hategan wrote:
> On Tue, 2007-04-10 at 12:12 -0500, Veronika V. Nefedova wrote:
> > I think that something like that would be useful:
> >
> > outputsStage1[]=stage1()
> > outputsStage2[]=stage2(outputsStage1[])
> >
> > if you didn't have to specify the number or specific filenames for the
> > outputs. Basically it would be good for the Workflow engine to understand
> > this: "get all the produced files from stage 1 and use them as an input for
> > Stage 2"
>
> Some applications are known to produce extra temporary files. Mappers
> are supposed to be able to extract arrays from a cluttered file system
> (assuming that there are no ambiguities in naming patterns). So yes, it
> would be useful, and I think it is one of the planned features, and I see no
> obvious problems besides the possible naming conflicts (which would
> apply to a local file system anyway, so not a new problem).
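
To make the mapper idea above concrete, here is a minimal sketch in Python (not SwiftScript). It only illustrates "extract an array from a cluttered directory by naming pattern"; the function name `map_outputs`, the file names, and the `stage1_*.out` pattern are all hypothetical and are not taken from Swift's actual mapper implementation.

```python
import fnmatch
import os
import tempfile


def map_outputs(directory, pattern):
    """Return the sorted paths in `directory` whose file names match `pattern`.

    Hypothetical stand-in for a mapper: files that do not match the
    pattern (temporary files, logs) are simply ignored.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name, pattern)
    )
    return [os.path.join(directory, name) for name in names]


def demo():
    """Create a cluttered directory and map only the stage-1 outputs."""
    with tempfile.TemporaryDirectory() as workdir:
        # Two real outputs plus one stray temporary file (names are made up).
        for name in ("stage1_0.out", "stage1_1.out", "scratch.tmp"):
            with open(os.path.join(workdir, name), "w") as handle:
                handle.write("data\n")
        mapped = map_outputs(workdir, "stage1_*.out")
        return [os.path.basename(path) for path in mapped]
```

Calling `demo()` yields only the two `stage1_*.out` names; the stray `scratch.tmp` is left out. The sort is lexicographic, so a real naming scheme would need zero-padded indices or a numeric sort key once there are ten or more outputs.
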
>
> >
> > (;
> >
> > Nika
> >
> >
> > At 11:57 AM 4/10/2007, Tiberiu Stef-Praun wrote:
> > >Interesting.
> > >
> > >Does anyone else think that monitoring the filesystem could be a useful idea?
> > >
> > >For instance it could help with file-driven dependencies, in scenarios
> > >where we want to have continuous workflows, or compose independent
> > >workflows. The filesystem would act as the publish-subscribe mechanism
> > >for some workflow cases.
> > >
> > >Tibi
> > >
> > >On 4/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > >>Swift doesn't monitor the file system.
> > >>Data-driven doesn't mean that it does magic in the background. It means
> > >>that you have to express data dependencies in the code.
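
The point about expressing data dependencies in the code can be sketched as follows. This is a Python analogy using futures, not SwiftScript and not how Karajan is implemented; `stage1`, `stage2`, and the file names are hypothetical placeholders for the two stages in the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def stage1():
    # Hypothetical stage: pretend to produce three output files.
    return ["out%d.dat" % i for i in range(3)]


def stage2(inputs):
    # Hypothetical merge step: needs every stage-1 output.
    return "+".join(inputs)


def run_workflow():
    with ThreadPoolExecutor() as pool:
        # Assigning the result of stage1 is what creates the dependency.
        outputs_stage1 = pool.submit(stage1)
        # stage2 receives that value, so it blocks until stage1 is done.
        merged = pool.submit(lambda: stage2(outputs_stage1.result()))
        return merged.result()
```

If `stage1()` were called without capturing its result, nothing would tie `stage2` to it, and the scheduler would be free to start both at once, which matches the failure described in the original message.
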
> > >>
> > >>On Tue, 2007-04-10 at 11:47 -0500, Tiberiu Stef-Praun wrote:
> > >> > I have a workflow along these lines:
> > >> >
> > >> > // this one generates outputsStage1[]
> > >> > stage1()
> > >> > // this one merges the stage1 outputs
> > >> > stage2(outputsStage1[])
> > >> >
> > >> > note that it is not outputsStage1=stage1()
> > >> >
> > >> > Since the outputsStage1 files had not been generated yet, I expected
> > >> > Karajan to wait for them to be created before running stage2, but that
> > >> > was not the case: stage2 was executed when the workflow started, failed,
> > >> > and caused the whole workflow to fail.
> > >> >
> > >> > I know how to fix the workflow, that is not the issue. The issue is
> > >> > that I expected the workflow to be data-driven, but it seems to be
> > >> > code-driven. Explanation: it attempted to execute a section even if
> > >> > its input files were not available.
> > >> >
> > >> > Correct me if I am wrong.
> > >> > Tibi
> > >> >
> > >>
> > >
> > >
> > >--
> > >Tiberiu (Tibi) Stef-Praun, PhD
> > >Research Staff, Computation Institute
> > >5640 S. Ellis Ave, #405
> > >University of Chicago
> > >http://www-unix.mcs.anl.gov/~tiberius/
> > >_______________________________________________
> > >Swift-devel mailing list
> > >Swift-devel at ci.uchicago.edu
> > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
>