[Swift-devel] Interesting observation when running Swift

Mihael Hategan hategan at mcs.anl.gov
Tue Apr 10 13:03:31 CDT 2007


Remote mappers?
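The mappers raised here and in the quoted messages are described as extracting an array of files from a cluttered file system by naming pattern, as long as the patterns are unambiguous. The following is a rough Python illustration of that idea only. It is not Swift's mapper implementation. The file-name convention `stage1_out_<n>.dat` and the helper `map_array` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming convention: stage-1 outputs are "stage1_out_<index>.dat".
# Anything else in the directory (scratch files, logs) is treated as clutter.
STAGE1_PATTERN = re.compile(r"stage1_out_(\d+)\.dat")

def map_array(directory, pattern=STAGE1_PATTERN):
    """Return the files in `directory` whose names fully match `pattern`,
    ordered by the integer captured in the pattern's first group."""
    slots = {}
    for entry in Path(directory).iterdir():
        m = pattern.fullmatch(entry.name)
        if m is None:
            continue  # not part of the array: skip temporaries and other clutter
        index = int(m.group(1))
        if index in slots:
            # e.g. "stage1_out_1.dat" and "stage1_out_01.dat": an ambiguous naming pattern
            raise ValueError(f"ambiguous names for index {index}: "
                             f"{slots[index].name}, {entry.name}")
        slots[index] = entry
    return [slots[i] for i in sorted(slots)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ["stage1_out_2.dat", "core.1234", "stage1_out_0.dat",
                     "stage1_out_1.dat", "stage1.log"]:
            (Path(d) / name).touch()
        print([p.name for p in map_array(d)])
        # ['stage1_out_0.dat', 'stage1_out_1.dat', 'stage1_out_2.dat']
```

Raising an error on a duplicate index, instead of silently choosing one file, is one way to surface the naming conflicts mentioned in the thread.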

On Tue, 2007-04-10 at 13:01 -0500, Yong Zhao wrote:
> aren't these already addressed by remote mappers?
> 
> Yong.
> 
> On Tue, 10 Apr 2007, Mihael Hategan wrote:
> 
> > On Tue, 2007-04-10 at 12:12 -0500, Veronika V. Nefedova wrote:
> > > I think that something like that would be useful:
> > >
> > > outputsStage1[]=stage1()
> > > outputsStage2[]=stage2(outputsStage1[])
> > >
> > > if you didn't have to specify the number or specific filenames for the
> > > outputs. Basically it would be good for the workflow engine to understand
> > > this: "get all the produced files from stage 1 and use them as an input for
> > > Stage 2"
> >
> > Some applications are known to produce extra temporary files. Mappers
> > are supposed to be able to extract arrays from a cluttered file system
> > (assuming that there are no ambiguities in naming patterns). So yes, it
> > would be useful, and I think it is one of the planned features, and I see no
> > obvious problems besides the possible naming conflicts (which would
> > apply to a local file system anyway, so not a new problem).
> >
> > >
> > > (;
> > >
> > > Nika
> > >
> > >
> > > At 11:57 AM 4/10/2007, Tiberiu Stef-Praun wrote:
> > > >Interesting.
> > > >
> > > >Does anyone else think that monitoring the filesystem could be a useful idea?
> > > >
> > > >For instance it could help with file-driven dependencies, in scenarios
> > > >where we want to have continuous workflows, or compose independent
> > > >workflows. The filesystem would act as the publish-subscribe mechanism
> > > >for some workflow cases.
> > > >
> > > >Tibi
> > > >
> > > >On 4/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > >>Swift doesn't monitor the file system.
> > > >>Data-driven doesn't mean that it does magic in the background. It means
> > > >>that you have to express data dependencies in the code.
> > > >>
> > > >>On Tue, 2007-04-10 at 11:47 -0500, Tiberiu Stef-Praun wrote:
> > > >> > I have a workflow along these lines:
> > > >> >
> > > >> > // this one generates outputsStage1[]
> > > >> > stage1()
> > > >> > // this one merges the stage1 outputs
> > > >> > stage2(outputsStage1[])
> > > >> >
> > > >> > note that it is not outputsStage1=stage1()
> > > >> >
> > > >> > Since the outputsStage1 files had not been generated yet, I expected
> > > >> > Karajan to wait for them to be created before running stage2, but that
> > > >> > was not the case: stage2 was executed when the workflow started (and
> > > >> > it failed) and caused the workflow to fail.
> > > >> >
> > > >> > I know how to fix the workflow; that is not the issue. The issue is
> > > >> > that I expected the workflow to be data-driven, but it seems to be
> > > >> > code-driven. Explanation: it attempted to execute a section even though
> > > >> > its input files were not available.
> > > >> >
> > > >> > Correct me if I am wrong.
> > > >> > Tibi
> > > >> >
> > > >>
> > > >
> > > >
> > > >--
> > > >Tiberiu (Tibi) Stef-Praun, PhD
> > > >Research Staff, Computation Institute
> > > >5640 S. Ellis Ave, #405
> > > >University of Chicago
> > > >http://www-unix.mcs.anl.gov/~tiberius/
> > > >_______________________________________________
> > > >Swift-devel mailing list
> > > >Swift-devel at ci.uchicago.edu
> > > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >
> > >
> >
> >
> 
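The thread turns on Mihael's point that Swift is data-driven only through dependencies written in the script. `outputsStage1 = stage1()` links the two calls, while a bare `stage1()` followed by `stage2(outputsStage1[])` does not. The toy model that follows is a hypothetical round-based scheduler written for illustration. It is not Swift's or Karajan's actual engine, and the names `Task`, `run`, and the `out*.dat` files are assumptions. It shows why, without the binding, stage2 is ready to start at once and fails on missing inputs.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    inputs: list = field(default_factory=list)   # dataset variables the task consumes
    outputs: list = field(default_factory=list)  # dataset variables the task binds
    reads: list = field(default_factory=list)    # files the task body opens
    writes: list = field(default_factory=list)   # files the task body creates

def run(tasks):
    """Toy dataflow scheduler (hypothetical, not Swift's engine).

    A task becomes ready as soon as every variable in `inputs` is bound.
    All ready tasks start in the same round, like concurrent jobs, so files
    written in a round are only visible from the next round on.
    """
    bound, files, pending, log = set(), set(), list(tasks), []
    round_no = 0
    while pending:
        ready = [t for t in pending if all(v in bound for v in t.inputs)]
        if not ready:
            raise RuntimeError("no task can run: unbound inputs remain")
        visible = set(files)  # file system as seen when this round's jobs start
        for t in ready:
            missing = [f for f in t.reads if f not in visible]
            if missing:
                raise FileNotFoundError(f"{t.name} started in round {round_no}; missing {missing}")
            log.append((round_no, t.name))
        for t in ready:
            files.update(t.writes)
            bound.update(t.outputs)
            pending.remove(t)
        round_no += 1
    return log

OUTS = ["out0.dat", "out1.dat", "out2.dat"]  # hypothetical stage-1 file names

# outputsStage1 = stage1(); stage2(outputsStage1): the dependency is explicit.
explicit = [
    Task("stage1", outputs=["outputsStage1"], writes=OUTS),
    Task("stage2", inputs=["outputsStage1"], reads=OUTS),
]

# stage1(); stage2(...): nothing links the two calls, so both are ready at once.
implicit = [
    Task("stage1", writes=OUTS),
    Task("stage2", reads=OUTS),
]

if __name__ == "__main__":
    print(run(explicit))  # [(0, 'stage1'), (1, 'stage2')]
    try:
        run(implicit)
    except FileNotFoundError as err:
        print("implicit:", err)
```

In this model the only thing that delays stage2 is an unbound input variable. Files appearing on disk play no role in scheduling, which mirrors the statement in the thread that Swift does not monitor the file system.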
