[Swift-devel] mapper problem or ...?
Mihael Hategan
hategan at mcs.anl.gov
Wed Mar 14 10:20:30 CDT 2007
On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote:
> Ok, now I think you hit the area in your explanations that I always had a
> problem with. So here is my understanding of things:
>
> if I have two apps that I need to chain together, I need to do this:
>
> file a <"a.txt">;
> file b <"b.txt">;
> file c <"c.txt">;
> a = APP1 (b);
> c = APP2 (a);
Yep. But if you don't care about what file a is put in, you can skip
mapping it. Although I gather it doesn't change things by much:
file a;
file b <"b.txt">;
file c <"c.txt">;
a = APP1 (b);
c = APP2 (a);
>
> I.e. the chaining of the programs happens on a 'logical' file level (a,b,c
> rather than a.txt, b.txt, c.txt). Is that a correct understanding?
Yes.
> I acted
> on this understanding and my workflow has been working fine (till now --
> but that's another story). Having to create all these logical files was a *big*
> pain (as I couldn't have the same logical names as physical filenames due
> to different file naming conventions in swift: no multiple ".", etc). It
> really would've been much easier for my workflow to have just this:
>
> a1.txt = APP1 (b.txt);
> a2.txt = APP2 (b.txt);
> c.txt = APP3 (a1.txt, a2.txt);
>
> as my applications produce an enormous amount of intermediate files with
> some specific naming conventions.
Swift needs to know about those. Any workflow system would need to know
about those. There is no way to automatically determine what set of
files an application invocation will need. It may be possible to
determine what set of files an application invocation produces (although
making it consistent may be difficult), but even then there is no
automatic way to tell which of those files are meaningful for your
workflow.
>
> Now back to my original problem - constructing and passing to my next
> application a collection of files. If I didn't have to do any mappers, it
> would've been just as easy as (for example):
>
> c.txt = APP3 (a*.txt);
There isn't much difference between that and c = APP3(a), where a is an
array. But I digress.
>
> Does it make sense at all ?
Of course.
However, I'm not convinced it would work well, for the reasons outlined
above.
So there are a number of operations and certain dependencies between
them, where the operations are job submissions and file transfers (let's
abstract away low-level, technology-dependent details for now). These need to
happen and are the result of executing a workflow (regardless of how
it's expressed). They represent the application that you are trying to
implement. If there is a way to infer all those operations and all the
dependencies from your specification model, then it would be ok. So in
the context of exploring a different way of expressing things, it would
be helpful to have a clear illustration of both the specification and
the resulting operations, and of the rules that get you from one to the
other.
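To make "inferring the operations and dependencies from a specification" a bit more concrete, here is a rough sketch in Python (not Swift). It is only an illustration under assumptions: the call table, the variable names and the two helper functions are hypothetical, and none of this reflects how Swift is actually implemented. It uses the a1/a2/c example from above and derives the job ordering purely from which variables each call reads and writes; the file name mappings are kept in a separate table that the inference never consults.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mini-model of a workflow: each call lists the logical
# variables it reads and the ones it writes.
calls = {
    "APP1": {"inputs": ["b"], "outputs": ["a1"]},
    "APP2": {"inputs": ["b"], "outputs": ["a2"]},
    "APP3": {"inputs": ["a1", "a2"], "outputs": ["c"]},
}

# Mappings from logical variables to file names. a1 and a2 are left
# unmapped on purpose; the functions below never look at this table.
mappings = {"b": "b.txt", "c": "c.txt"}


def infer_dependencies(calls):
    """Return {call: set of calls that must finish first}, using variables only."""
    producer = {var: name for name, c in calls.items() for var in c["outputs"]}
    return {
        name: {producer[v] for v in c["inputs"] if v in producer}
        for name, c in calls.items()
    }


def execution_order(calls):
    """Return one valid order in which the calls could be submitted."""
    return list(TopologicalSorter(infer_dependencies(calls)).static_order())


deps = infer_dependencies(calls)
order = execution_order(calls)
print(deps)
print(order)
```

APP3 ends up depending on APP1 and APP2 only because it reads the variables they write. Changing or removing entries in the mappings table would leave the inferred dependencies untouched.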
Mihael
>
> Thanks,
>
> Nika
>
>
>
> At 06:41 PM 3/13/2007, Mihael Hategan wrote:
> >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote:
> > > Hmmm. So here is how my files are produced (inside double loop over $s and
> > > $name):
> > >
> > > file $s9prt <"$name.prt">;
> > > file $s9wham <"$s9.wham">;
> > > file $s9crd <"$s9.crd">;
> > > file $s9out <"$s9.out">;
> > > file $s9done <"$s9donefile">;
> > >
> > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1",
> > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > "$rcut2");
> > >
> > > so if I change the mapping of the needed output file ($s9wham), everything
> > > should work?
> > >
> > > file whamfiles_$s[$i] <"$s9.wham">;
> >
> >That one won't work.
> >You need to let Swift map whamfiles_$s[] to what it wants. So you can't
> >map individual items in an array differently.
> >
> >I believe that you rely on the fact that whamfiles_xzy maps to the same
> >file names as some other variables. This won't work. You need to use the
> >same variable. The file names are irrelevant if the program doesn't make
> >sense for Swift.
> >So think about it this way: mentally remove all the mapper declarations
> >from the Swift program. If after that, the program makes sense, then you
> >should be good to go. If it doesn't then it's likely it won't work.
> >Remember, mapping is not something that can be used to hack things
> >because the workflow structure has nothing to do with the mappers and
> >Swift ignores mappers when figuring out the data flow.
> >
> >(dependent mappers notwithstanding)
> >
> > > i=`expr $i + 1`
> > >
> > > and call the function:
> > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm,
> > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt,
> > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt",
> > > "$rcut1", "$rcut2");
> > >
> > > Nika
> > >
> > > At 06:12 PM 3/13/2007, Mihael Hategan wrote:
> > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote:
> > > > > ok, here is in short what I need to do:
> > > > >
> > > > > at some point in the workflow N files are produced (in my case it's 68, but
> > > > > it could be any number). These files are each produced by a separate job (i.e.
> > > > > N jobs produce N files).
> > > > > The next job in the workflow needs to take those N files as an input.
> > > > >
> > > > > Question: how do I pass this unknown number of files as an input to an
> > > > > application? The array_mapper didn't work (or I didn't use it correctly).
> > > >
> > > >In this case you need some other kind of mapper that can deal with
> > > >unknown numbers of items. The default mapper (i.e. specifying no mapper)
> > > >should work.
> > > >
> > > >So you need to do:
> > > >
> > > >file whamfiles_002[];
> > > >
> > > >foreach v,k in someinput {
> > > >    whamfiles_002[k] = job(v);
> > > >}
> > > >
> > > >... = GENERATOR(whamfiles_002);
> > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Nika
> > > > >
> > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote:
> > > > > >On a third thought. This looks like, eventually, you are trying to do
> > > > > >the same thing that Yong did with the dependent mappers earlier. I think
> > > > > >he would have more insight on the topic.
> > > > > >
> > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote:
> > > > > > > I think I am confused. Sorry!
> > > > > > > What will be the type of 'whamfiles'? If it's a string - will Swift
> > > > > > > know to break it down into filenames and stage them all in?
> > > > > > > Also - is there a mapper (or whatever) that can map the list of *logical*
> > > > > > > file names to an array? (That's what I was trying to do.)
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > >
> > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > >Oh my :)
> > > > > > > >@whamfiles_m002 is known by the system at all times. That means
> > > > > > > >GENERATOR does not need to wait for the actual files to be there since
> > > > > > > >it knows very well what @whamfiles_m002 is (the list of names).
> > > > > > > >
> > > > > > > >You should try this instead:
> > > > > > > >...
> > > > > > > >... GENERATOR(whamfiles, str) {
> > > > > > > >    app {
> > > > > > > >        generator @whamfiles, str;
> > > > > > > >    }
> > > > > > > >}
> > > > > > > >
> > > > > > > >... = GENERATOR(whamfiles_m002, "m002")
> > > > > > > >
> > > > > > > >Mihael
> > > > > > > >
> > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I have a question:
> > > > > > > > >
> > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my
> > > > > > > > > application called GENERATOR. I need to use the mapper since the number of
> > > > > > > > > input files is unknown before the workflow starts. Here is how I use it:
> > > > > > > > > file whamfiles_m002[] <fixed_array_mapper;files=" solv_chg_a0_m002_wham,
> > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, <snip -- many files, you get
> > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">;
> > > > > > > > >
> > > > > > > > > These files are all generated by stage four of my workflow, each file is
> > > > > > > > > mapped to a physical filename, for example:
> > > > > > > > >
> > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">;
> > > > > > > > > and this particular file is produced this way:
> > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out,
> > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft,
> > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002,
> > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002",
> > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf",
> > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg",
> > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002");
> > > > > > > > >
> > > > > > > > > Then I call my application (the last stage of my workflow, stage five)
> > > > > > > > >
> > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham,
> > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham,
> > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham,
> > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham,
> > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR
> > > > > > > > > (@whamfiles_m002, "m002");
> > > > > > > > >
> > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am
> > > > > > > > > not sure why. Does the mapper look for the physical files on the disk and,
> > > > > > > > > when it finds them, start right away? I do have the needed files in the
> > > > > > > > > directory from my previous runs. Or is there something else wrong here?
> > > > > > > > >
> > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405
> > > > > > > > > RunID: b0n2liektep92
> > > > > > > > > pre_ch started <---------- that's the first stage
> > > > > > > > > generator_cat started <----------- not supposed to start now!
> > > > > > > > > generator_cat started
> > > > > > > > >
> > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on
> > > > > > > > > terminable.ci.uchicago, but it's pretty big...
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Nika
> > > > > > > > >
> > > > > > > > >