[Swift-devel] mapper problem or ...?

Veronika V. Nefedova nefedova at mcs.anl.gov
Wed Mar 14 11:31:35 CDT 2007


You got it!
I also would like to see this happen, for example:
(a*.txt, c1.txt) = APP2 (b.txt);
and
(s.txt) = APP3 (a*.txt);
(and the like)

Then one won't need to worry about any mappers (;

Thanks!

Nika
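
To make the filename-based notation above concrete, here is a minimal
Python sketch (not Swift, and not anything Swift implements) of how a
workflow system might infer that APP3 has to wait for APP2 purely from
the file-name patterns. The STATEMENTS list and the helpers
patterns_overlap and infer_dependencies are hypothetical names chosen
for illustration, and the overlap test is deliberately crude.

```python
from fnmatch import fnmatch

# Hypothetical statements modeled on the two examples in this message:
# (output patterns, application name, input patterns).
# This is an illustration only; it is not Swift syntax or a Swift API.
STATEMENTS = [
    (["a*.txt", "c1.txt"], "APP2", ["b.txt"]),
    (["s.txt"], "APP3", ["a*.txt"]),
]


def patterns_overlap(produced, consumed):
    """Crude overlap test (an assumption of this sketch): the two
    patterns are identical, or one of them, read as a literal file
    name, is matched by the other."""
    return (
        produced == consumed
        or fnmatch(consumed, produced)
        or fnmatch(produced, consumed)
    )


def infer_dependencies(statements):
    """Map each application to the applications whose outputs it reads."""
    deps = {}
    for _, consumer, inputs in statements:
        deps[consumer] = sorted({
            producer
            for outputs, producer, _ in statements
            if producer != consumer
            and any(patterns_overlap(o, i) for o in outputs for i in inputs)
        })
    return deps


if __name__ == "__main__":
    for app, needs in infer_dependencies(STATEMENTS).items():
        print(app, "depends on", needs)
```

Under these assumptions APP3 is ordered after APP2 because both mention
a*.txt. A real implementation would still need rules for patterns that
overlap only partially, and for which of the matching files are actually
staged.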


At 11:20 AM 3/14/2007, Mihael Hategan wrote:
>Ok, I see what you're saying. You're not suggesting "hiding" of
>dependencies.
>
>I guess it could be possible to come up with some syntactic sugar. We
>would then consider data like that to be singletons.
>Example:
><"a.txt"> = APP1(<"b.txt">);
><"c.txt"> = APP2(<"a.txt">);
>
>On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote:
> > Hi, Mihael:
> >
> > please see my comments below...
> >
> > Thanks,
> >
> > Nika
> >
> > At 10:20 AM 3/14/2007, Mihael Hategan wrote:
> > >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote:
> > > > Ok, now I think you hit the area in your explanations that I always had a
> > > > problem with. So here is my understanding of things:
> > > >
> > > > if I have two apps that I need to chain together, I need to do this:
> > > >
> > > > file a <"a.txt">;
> > > > file b <"b.txt">;
> > > > file c <"c.txt">;
> > > >   a = APP1 (b);
> > > >   c = APP2 (a);
> > >
> > >Yep. But if you don't care about what file a is put in, you can skip
> > >mapping it. Although I gather it doesn't change things by much:
> > >file a;
> > >file b <"b.txt">;
> > >file c <"c.txt">;
> > >   a = APP1 (b);
> > >   c = APP2 (a);
> >
> > I do care about "a.txt", but I do not care about "a". That's the main point.
> >
> > more below...
> >
> > > >
> > > > I.e. the chaining of the programs happens on a 'logical' file level (a, b, c
> > > > rather than a.txt, b.txt, c.txt). Is that a correct understanding?
> > >
> > >Yes.
> > >
> > > >  I acted
> > > > on this understanding and my workflow has been working fine (till now --
> > > > but that's another story). Having to create all these logical files was a *big*
> > > > pain (as I couldn't have the same logical names as physical filenames due
> > > > to different file naming conventions in Swift: no multiple ".", etc). It
> > > > really would've been much easier for my workflow to have just this:
> > > >
> > > > a1.txt = APP1 (b.txt);
> > > > a2.txt = APP2 (b.txt);
> > > > c.txt   = APP3 (a1.txt, a2.txt);
> > > >
> > > > as my applications produce an enormous number of intermediate files with
> > > > some specific naming conventions.
> > >
> > >Swift needs to know about those. Any workflow system would need to know
> > >about those. There is no way to automatically determine what set of
> > >files an application invocation will need. It may be possible to
> > >determine what set of files an application invocation produces (although
> > >making it consistent may be difficult), but even in that case the matter
> > >of distinguishing which of those are meaningful for your workflow is not
> > >quite possible.
> >
> > I do not agree. You can specify the files that you need (intermediate or
> > final) on the left side of the function call - exactly the way it is done
> > now (but use the actual file names) :
> >
> > (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt);
> >
> > where APP could be producing hundreds of a*.txt files (a1.txt - a100.txt)
> > and 10 c*.txt files (c1.txt - c10.txt). And only those 3 specified should be
> > cared for. Or it could be done even this way:
> >
> > (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and
> > only one c1.txt file
> >
> > Swift stages files just before the application starts. So it shouldn't
> > affect the workflow system at all (to my understanding). Just the number of
> > files that need to be staged in/out (alternatively, you can always zip all
> > files together and have just one file staged in/out for any application).
> >
> > Anyway, I am not saying all this is easy -- just suggesting some
> > alternatives to the current system that requires (in case of my
> > application) some tedious filename operations...
> >
> > more below...
> >
> > > >
> > > > Now back to my original problem - constructing and passing to my next
> > > > application a collection of files. If I didn't have to do any mappers, it
> > > > would've been just as easy as (for example):
> > > >
> > > > c.txt = APP3 (a*.txt);
> > >There isn't much difference between that and c = APP3(a), where a is an
> > >array. But I digress.
> >
> >
> > Ok. But how do I construct that array in a clean way? I thought that a
> > fixed_array_mapper would do that for me (if I pass a string of logical
> > filenames to it, shouldn't it create an array of files for me?). That's the
> > main point - I can't construct an array of logical filenames and pass it to
> > my application without re-writing the already-working code. Or am I just
> > missing something - and the answer is a one-line code change? (;
> >
> > more below...
> >
> > > >
> > > > Does it make sense at all ?
> > >
> > >Of course.
> > >However, I'm not convinced how well it would work, for reasons outlined
> > >above.
> > >So there is a number of operations and certain dependencies between
> > >them, where the operations are job submissions and file transfers (let's
> > >abstract low level, technology dependent things for now). These need to
> > >happen and are the result of executing a workflow (regardless of how
> > >it's expressed). They represent the application that you are trying to
> > >implement. If there is a way to infer all those operations and all the
> > >dependencies from your specification model, then it would be ok. So in
> > >the context of exploring a different way of expressing things, it would
> > >be helpful to have a clear illustration of both of them and the rules
> > >that get you from one to the other.
> >
> > It does sound like a good topic for the discussion! (-;
> >
> > Thanks again,
> >
> > Nika
> >
> >
> > >Mihael
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Nika
> > > >
> > > >
> > > >
> > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote:
> > > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote:
> > > > > > Hmmm. So here is how my files are produced (inside a double loop over $s and
> > > > > > $name):
> > > > > >
> > > > > > file $s9prt <"$name.prt">;
> > > > > > file $s9wham  <"$s9.wham">;
> > > > > > file $s9crd  <"$s9.crd">;
> > > > > > file $s9out <"$s9.out">;
> > > > > > file $s9done  <"$s9donefile">;
> > > > > >
> > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> > > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1",
> > > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > > "$rcut2");
> > > > > >
> > > > > > so if I change the mapping of the needed output file ($s9wham), everything
> > > > > > should work?
> > > > > >
> > > > > > file whamfiles_$s[$i]  <"$s9.wham">;
> > > > >
> > > > >That one won't work.
> > > > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't
> > > > >map individual items in an array differently.
> > > > >
> > > > >I believe that you rely on the fact that whamfiles_xzy maps to the same
> > > > >file names as some other variables. This won't work. You need to use the
> > > > >same variable. The file names are irrelevant if the program doesn't make
> > > > >sense for Swift.
> > > > >So think about it this way: mentally remove all the mapper declarations
> > > > >from the Swift program. If after that, the program makes sense, then you
> > > > >should be good to go. If it doesn't then it's likely it won't work.
> > > > >Remember, mapping is not something that can be used to hack things
> > > > >because the workflow structure has nothing to do with the mappers and
> > > > >Swift ignores mappers when figuring out the data flow.
> > > > >
> > > > >(dependent mappers notwithstanding)
> > > > >
> > > > > > i=`expr $i + 1`
> > > > > >
> > > > > > and call the function:
> > > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm,
> > > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt,
> > > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > > "$rcut2");
> > > > > >
> > > > > > Nika
> > > > > >
> > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote:
> > > > > > > > ok, here is in short what I need to do:
> > > > > > > >
> > > > > > > > at some point in the workflow N files are produced (in my case it's 68, but
> > > > > > > > it could be any number). These files are each produced by a separate job (i.e.
> > > > > > > > N jobs produce N files).
> > > > > > > > The next job in the workflow needs to take those N files as an input.
> > > > > > > >
> > > > > > > > Question: how do I pass this unknown number of files as an input to an
> > > > > > > > application? The array_mapper didn't work (or I didn't use it correctly).
> > > > > > >
> > > > > > >In this case you need some other kind of mapper that can deal with
> > > > > > >unknown numbers of items. The default mapper (i.e. specifying no mapper)
> > > > > > >should work.
> > > > > > >
> > > > > > >So you need to do:
> > > > > > >
> > > > > > >file whamfiles_002[];
> > > > > > >
> > > > > > >foreach v,k in someinput {
> > > > > > >   whamfiles_002[k] = job(v);
> > > > > > >}
> > > > > > >
> > > > > > >... = GENERATOR(whamfiles_002);
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Nika
> > > > > > > >
> > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > >On a third thought. This looks like, eventually, you are trying to do
> > > > > > > > >the same thing that Yong did with the dependent mappers earlier. I think
> > > > > > > > >he would have more insight on the topic.
> > > > > > > > >
> > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > I think I am confused. Sorry!
> > > > > > > > > > what will be the type of 'whamfiles'? If it's a string - will Swift
> > > > > > > > > > know to break it down to filenames and stage them all in?
> > > > > > > > > > Also - is there a mapper (or whatever) that can map the list of *logical*
> > > > > > > > > > file names to an array? (that's what I was trying to do).
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > Nika
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > > > >Oh my :)
> > > > > > > > > > >@whamfiles_m002 is known by the system at all times. That means
> > > > > > > > > > >GENERATOR does not need to wait for the actual files to be there since
> > > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of names).
> > > > > > > > > > >
> > > > > > > > > > >You should try this instead:
> > > > > > > > > > >...
> > > > > > > > > > >... GENERATOR(whamfiles, str) {
> > > > > > > > > > >    app {
> > > > > > > > > > >      generator @whamfiles, str;
> > > > > > > > > > >    }
> > > > > > > > > > >}
> > > > > > > > > > >
> > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002")
> > > > > > > > > > >
> > > > > > > > > > >Mihael
> > > > > > > > > > >
> > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I have a question:
> > > > > > > > > > > >
> > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my
> > > > > > > > > > > > application called GENERATOR. I need to use the mapper since the number of
> > > > > > > > > > > > input files is unknown before the workflow starts. Here is how I use it:
> > > > > > > > > > > > file whamfiles_m002[] <fixed_array_mapper;files="solv_chg_a0_m002_wham,
> > > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, <snip -- many files, you get
> > > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">;
> > > > > > > > > > > >
> > > > > > > > > > > > These files are all generated by stage four of my workflow, each file is
> > > > > > > > > > > > mapped to a physical filename, for example:
> > > > > > > > > > > >
> > > > > > > > > > > > file solv_chg_a0_m002_wham  <"solv_chg_a0_m002.wham">;
> > > > > > > > > > > > and this particular file is produced this way:
> > > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out,
> > > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft,
> > > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002,
> > > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002",
> > > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf",
> > > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg",
> > > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002");
> > > > > > > > > > > >
> > > > > > > > > > > > Then I call my application (the last stage of my workflow, stage five)
> > > > > > > > > > > >
> > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham,
> > > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham,
> > > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham,
> > > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham,
> > > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR
> > > > > > > > > > > > (@whamfiles_m002, "m002");
> > > > > > > > > > > >
> > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am
> > > > > > > > > > > > not sure why. Does the mapper look for the physical files on the disk and,
> > > > > > > > > > > > when it finds them, start right away? I do have the needed files in the
> > > > > > > > > > > > directory from my previous runs. Or is there something else wrong here?
> > > > > > > > > > > >
> > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405
> > > > > > > > > > > > RunID: b0n2liektep92
> > > > > > > > > > > > pre_ch started              <---------- that's the first stage
> > > > > > > > > > > > generator_cat started    <----------- not supposed to start now!
> > > > > > > > > > > > generator_cat started
> > > > > > > > > > > >
> > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on
> > > > > > > > > > > > terminable.ci.uchicago, but it's pretty big...
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Nika
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >




