[Swift-devel] mapper problem or ...?

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 14 11:47:06 CDT 2007


On Wed, 2007-03-14 at 11:31 -0500, Veronika V. Nefedova wrote:
> You got it!
> I also would like to see this happen, for example:
> (a*.txt, c1.txt) = APP2 (b.txt);
> and
> (s.txt) = APP3 (a*.txt);
> (and the like)

That's slightly different. I think we need to digest these issues.

> 
> Then one won't need to worry about any mappers (;
> 
> Thanks!
> 
> Nika
> 
> 
> At 11:20 AM 3/14/2007, Mihael Hategan wrote:
> >Ok, I see what you're saying. You're not suggesting "hiding" of
> >dependencies.
> >
> >I guess it could be possible to come up with some syntactic sugar. We
> >would then consider data like that to be singletons.
> >Example:
> ><"a.txt"> = APP1(<"b.txt">);
> ><"c.txt"> = APP2(<"a.txt">);
> >
> >On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote:
> > > Hi, Mihael:
> > >
> > > please see my comments below...
> > >
> > > Thanks,
> > >
> > > Nika
> > >
> > > At 10:20 AM 3/14/2007, Mihael Hategan wrote:
> > > >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote:
> > > > > > Ok, now I think you hit the area in your explanations that I always had a
> > > > > > problem with. So here is my understanding of things:
> > > > >
> > > > > if I have two apps that I need to chain together, I need to do this:
> > > > >
> > > > > file a <"a.txt">;
> > > > > file b <"b.txt">;
> > > > > file c <"c.txt">;
> > > > >   a = APP1 (b);
> > > > >   c = APP2 (a);
> > > >
> > > >Yep. But if you don't care about what file a is put in, you can skip
> > > >mapping it. Although I gather it doesn't change things by much:
> > > >file a;
> > > >file b <"b.txt">;
> > > >file c <"c.txt">;
> > > >   a = APP1 (b);
> > > >   c = APP2 (a);
> > >
> > > > I do care about "a.txt", but I do not care about "a". That's the main point.
> > >
> > > more below...
> > >
> > > > >
> > > > > > I.e. the chaining of the programs happens on a 'logical' file level (a, b, c
> > > > > > rather than a.txt, b.txt, c.txt). Is that a correct understanding?
> > > >
> > > >Yes.
> > > >
> > > > > >  I acted
> > > > > > on this understanding and my workflow has been working fine (till now --
> > > > > > but that's another story). Having to create all these logical files was a *big*
> > > > > > pain (as I couldn't have the same logical names as physical filenames due
> > > > > > to different file naming conventions in Swift: no multiple ".", etc). It
> > > > > > really would've been much easier for my workflow to have just this:
> > > > >
> > > > > a1.txt = APP1 (b.txt);
> > > > > a2.txt = APP2 (b.txt);
> > > > > c.txt   = APP3 (a1.txt, a2.txt);
> > > > >
> > > > > > as my applications produce an enormous amount of intermediate files with
> > > > > > some specific naming conventions.
> > > >
> > > >Swift needs to know about those. Any workflow system would need to know
> > > >about those. There is no way to automatically determine what set of
> > > >files an application invocation will need. It may be possible to
> > > >determine what set of files an application invocation produces (although
> > > >making it consistent may be difficult), but even in that case the matter
> > > >of distinguishing which of those are meaningful for your workflow is not
> > > >quite possible.
> > >
> > > I do not agree. You can specify the files that you need (intermediate or
> > > final) on the left side of the function call - exactly the way it is done
> > > now (but use the actual file names) :
> > >
> > > (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt);
> > >
> > > where APP could be producing hundreds of a.txt files (a1.txt - a100.txt)
> > > > and 10 c*.txt files (c1.txt - c10.txt). And only those 3 specified should be
> > > > cared for. Or it could be done even this way:
> > >
> > > (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and
> > > only one c1.txt file
> > >
> > > Swift stages files just before the application starts. So it shouldn't
> > > > affect the workflow system at all (to my understanding). Just the number of
> > > > files that need to be staged in/out (alternatively, you can always zip all
> > > files together and have just one file staged in/out for any application).
> > >
> > > Anyway, I am not saying all this is easy -- just suggesting some
> > > alternatives to the current system that requires (in case of my
> > > application) some tedious filename operations...
> > >
> > > more below...
> > >
> > > > >
> > > > > Now back to my original problem - constructing and passing to my next
> > > > > > application a collection of files. If I didn't have to do any mappers, it
> > > > > > would've been just as easy as (for example):
> > > > >
> > > > > c.txt = APP3 (a*.txt);
> > > >There isn't much difference between that and c = APP3(a), where a is an
> > > >array. But I digress.
> > >
> > >
> > > > Ok. But how do I construct that array in a clean way? I thought that a
> > > > fixed_array_mapper would do that for me (if I pass a string of logical
> > > > filenames to it, shouldn't it create an array of files for me?). That's the
> > > > main point - I can't construct an array of logical filenames and pass it to
> > > > my application without re-writing the already-working code. Or am I just
> > > > missing something - and the answer is a one-line code change? (;
> > >
> > > more below...
> > >
> > > > >
> > > > > Does it make sense at all ?
> > > >
> > > >Of course.
> > > >However, I'm not convinced how well it would work, for reasons outlined
> > > >above.
> > > > >So there are a number of operations and certain dependencies between
> > > >them, where the operations are job submissions and file transfers (let's
> > > >abstract low level, technology dependent things for now). These need to
> > > >happen and are the result of executing a workflow (regardless of how
> > > >it's expressed). They represent the application that you are trying to
> > > >implement. If there is a way to infer all those operations and all the
> > > >dependencies from your specification model, then it would be ok. So in
> > > >the context of exploring a different way of expressing things, it would
> > > >be helpful to have a clear illustration of both of them and the rules
> > > >that get you from one to the other.
> > >
> > > It does sound like a good topic for the discussion! (-;
> > >
> > > Thanks again,
> > >
> > > Nika
> > >
> > >
> > > >Mihael
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Nika
> > > > >
> > > > >
> > > > >
> > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote:
> > > > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote:
> > > > > > > > Hmmm. So here is how my files are produced (inside double loop over $s and
> > > > > > > > $name):
> > > > > > >
> > > > > > > file $s9prt <"$name.prt">;
> > > > > > > file $s9wham  <"$s9.wham">;
> > > > > > > file $s9crd  <"$s9.crd">;
> > > > > > > file $s9out <"$s9.out">;
> > > > > > > file $s9done  <"$s9donefile">;
> > > > > > >
> > > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> > > > > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1",
> > > > > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > > > > "$rcut2");
> > > > > > >
> > > > > > > > so if I change the mapping of the needed output file ($s9wham), everything
> > > > > > > > should work?
> > > > > > >
> > > > > > > file whamfiles_$s[$i]  <"$s9.wham">;
> > > > > >
> > > > > >That one won't work.
> > > > > > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't
> > > > > > >map individual items in an array differently.
> > > > > > >
> > > > > > >I believe that you rely on the fact that whamfiles_xzy maps to the same
> > > > > > >file names as some other variables. This won't work. You need to use the
> > > > > > >same variable. The file names are irrelevant if the program doesn't make
> > > > > > >sense for Swift.
> > > > > > >So think about it this way: mentally remove all the mapper declarations
> > > > > > >from the Swift program. If after that, the program makes sense, then you
> > > > > > >should be good to go. If it doesn't then it's likely it won't work.
> > > > > >Remember, mapping is not something that can be used to hack things
> > > > > >because the workflow structure has nothing to do with the mappers and
> > > > > >Swift ignores mappers when figuring out the data flow.
> > > > > >
> > > > > >(dependent mappers notwithstanding)
> > > > > >
> > > > > > > i=`expr $i + 1`
> > > > > > >
> > > > > > > and call the function:
> > > > > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm,
> > > > > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt,
> > > > > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > > > > "$rcut2");
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > ok, here is in short what I need to do:
> > > > > > > > >
> > > > > > > > > > at some point in the workflow N files are produced (in my case it's
> > > > > > > > > > 68, but it could be any number). These files are each produced by a
> > > > > > > > > > separate job (i.e. N jobs produce N files).
> > > > > > > > > > The next job in the workflow needs to take those N files as an input.
> > > > > > > > > >
> > > > > > > > > > Question: how do I pass this unknown number of files as an input to an
> > > > > > > > > > application? The array_mapper didn't work (or I didn't use it correctly).
> > > > > > > >
> > > > > > > >In this case you need some other kind of mapper that can deal with
> > > > > > > > >unknown numbers of items. The default mapper (i.e. specifying no mapper)
> > > > > > > > >should work.
> > > > > > > >
> > > > > > > >So you need to do:
> > > > > > > >
> > > > > > > >file whamfiles_002[];
> > > > > > > >
> > > > > > > >foreach v,k in someinput {
> > > > > > > >   whamfiles_002[k] = job(v);
> > > > > > > >}
> > > > > > > >
> > > > > > > >... = GENERATOR(whamfiles_002);
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Nika
> > > > > > > > >
> > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > > > >On a third thought. This looks like, eventually, you are trying to do
> > > > > > > > > > >the same thing that Yong did with the dependent mappers earlier. I think
> > > > > > > > > > >he would have more insight on the topic.
> > > > > > > > > >
> > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > > I think I am confused. Sorry!
> > > > > > > > > > > > What will be the type of 'whamfiles'? If it's a string - will Swift
> > > > > > > > > > > > know to break it down into filenames and stage them all in?
> > > > > > > > > > > > Also - is there a mapper (or whatever) that can map the list of *logical*
> > > > > > > > > > > > file names to an array? (That's what I was trying to do.)
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > Nika
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > > > > >Oh my :)
> > > > > > > > > > > > >@whamfiles_m002 is known by the system at all times. That means
> > > > > > > > > > > > >GENERATOR does not need to wait for the actual files to be there since
> > > > > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of names).
> > > > > > > > > > > >
> > > > > > > > > > > >You should try this instead:
> > > > > > > > > > > >...
> > > > > > > > > > > >... GENERATOR(whamfiles, str) {
> > > > > > > > > > > >    app {
> > > > > > > > > > > >      generator @whamfiles, str;
> > > > > > > > > > > >    }
> > > > > > > > > > > >}
> > > > > > > > > > > >
> > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002")
> > > > > > > > > > > >
> > > > > > > > > > > >Mihael
> > > > > > > > > > > >
> > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have a question:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my
> > > > > > > > > > > > > > application called GENERATOR. I need to use the mapper since the number of
> > > > > > > > > > > > > > input files is unknown before the workflow starts. Here is how I use it:
> > > > > > > > > > > > > > file whamfiles_m002[] <fixed_array_mapper;files="solv_chg_a0_m002_wham,
> > > > > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, <snip -- many files, you get
> > > > > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">;
> > > > > > > > > > > > >
> > > > > > > > > > > > > > These files are all generated by stage four of my workflow, each file is
> > > > > > > > > > > > > > mapped to a physical filename, for example:
> > > > > > > > > > > > >
> > > > > > > > > > > > > file solv_chg_a0_m002_wham  <"solv_chg_a0_m002.wham">;
> > > > > > > > > > > > > and this particular file is produced this way:
> > > > > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out,
> > > > > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft,
> > > > > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002,
> > > > > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002",
> > > > > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf",
> > > > > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg",
> > > > > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002");
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, stage five)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham,
> > > > > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham,
> > > > > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham,
> > > > > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham,
> > > > > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR
> > > > > > > > > > > > > > (@whamfiles_m002, "m002");
> > > > > > > > > > > > >
> > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am
> > > > > > > > > > > > > > not sure why. Does the mapper look for the physical files on the disk and,
> > > > > > > > > > > > > > when it finds them, start right away? I do have the needed files in the
> > > > > > > > > > > > > > directory from my previous runs. Or is there something else wrong here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405
> > > > > > > > > > > > > RunID: b0n2liektep92
> > > > > > > > > > > > > > pre_ch started              <---------- that's the first stage
> > > > > > > > > > > > > > generator_cat started    <----------- not supposed to start now!
> > > > > > > > > > > > > generator_cat started
> > > > > > > > > > > > >
> > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on
> > > > > > > > > > > > > terminable.ci.uchicago, but its pretty big...
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Nika
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > > > > 
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
> 
> 
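
The advice quoted above (mentally remove all the mapper declarations and check
whether the program still makes sense) can be illustrated with a toy model.
What follows is Python, not Swift syntax, and everything in it is an assumption
made for illustration: the Call record, the dependencies() function, and the
simplification that an array counts as one variable. It is a sketch of the
idea, not a description of how Swift is implemented.

```python
from typing import Dict, List, NamedTuple, Set


class Call(NamedTuple):
    """One invocation of the form outputs = APP(inputs). Hypothetical record."""
    app: str
    outputs: List[str]  # variable names on the left-hand side
    inputs: List[str]   # variable names passed as arguments


def dependencies(calls: List[Call]) -> Dict[str, Set[str]]:
    """Map each app to the apps it must wait for, using variable names only."""
    producer = {var: call.app for call in calls for var in call.outputs}
    return {
        call.app: {producer[var] for var in call.inputs if var in producer}
        for call in calls
    }


# Mappers live in a separate table that dependencies() never reads.
mappers = {
    "solv_chg_a0_m002_wham": "solv_chg_a0_m002.wham",
    "whamfiles_m002": "solv_chg_a0_m002.wham",  # same file name, different variable
}

# GENERATOR reads a variable that no call writes, so it has nothing to wait for.
unlinked = [
    Call("CHARMM2", ["solv_chg_a0_m002_wham"], ["standn"]),
    Call("GENERATOR", ["solv_chg_m002"], ["whamfiles_m002"]),
]

# Writing and reading the same variable is what creates the dependency.
linked = [
    Call("CHARMM2", ["whamfiles_m002"], ["standn"]),
    Call("GENERATOR", ["solv_chg_m002"], ["whamfiles_m002"]),
]

print(dependencies(unlinked)["GENERATOR"])  # set()
print(dependencies(linked)["GENERATOR"])    # {'CHARMM2'}
```

In this model two variables that happen to map to the same file name are still
unrelated, which matches the statement in the thread that the file names are
irrelevant to the data flow.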


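The foreach pattern suggested earlier in the thread (N jobs each fill one slot
of an unmapped array, and a final application consumes the whole array) can be
sketched in the same spirit. This is again Python rather than Swift, and the
bodies of job() and generator() are placeholders invented for the example; only
the shape of the computation is taken from the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def job(v: str) -> str:
    """Placeholder for one of the N producing jobs (hypothetical body)."""
    return f"wham({v})"


def generator(whamfiles: List[str]) -> str:
    """Placeholder for the final stage that needs all N results."""
    return " ".join(whamfiles)


def run(someinput: List[str]) -> str:
    # Start one job per input item; k plays the role of the array index.
    with ThreadPoolExecutor() as pool:
        futures = {k: pool.submit(job, v) for k, v in enumerate(someinput)}
        # The consumer can only be called once every slot has a value,
        # so collect the results in index order before calling it.
        whamfiles = [futures[k].result() for k in sorted(futures)]
    return generator(whamfiles)


print(run(["a0", "a1", "a10"]))  # wham(a0) wham(a1) wham(a10)
```

The number of inputs is not fixed anywhere in the sketch, which is the property
asked for in the thread: the consumer waits on the array as a whole, however
many items end up in it.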

