[Swift-devel] mapper problem or ...?

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 14 11:20:12 CDT 2007


Ok, I see what you're saying. You're not suggesting "hiding" of
dependencies.

I guess it would be possible to come up with some syntactic sugar. We
would then consider data like that to be singletons.
Example:
<"a.txt"> = APP1(<"b.txt">);
<"c.txt"> = APP2(<"a.txt">);
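
To make the data flow point concrete, here is a rough Python sketch of the idea that ordering comes from variables and not from mapped file names. It is only a toy model and not how Swift is implemented: the functions dependencies and run_order, the tuple layout of the calls, and the mappers table (including the extra variable wham) are all made up for illustration.

```python
# Illustrative model only: this is not Swift's implementation.
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dependencies(calls):
    """Map each app to the set of apps that produce its input variables."""
    producer = {var: app for app, outputs, _ in calls for var in outputs}
    return {
        app: {producer[var] for var in inputs if var in producer}
        for app, _, inputs in calls
    }


def run_order(calls):
    """Return one valid execution order derived from variable dependencies."""
    return list(TopologicalSorter(dependencies(calls)).static_order())


# Hypothetical mapper table. Neither function above ever reads it.
mappers = {"a": "a.txt", "b": "b.txt", "c": "c.txt", "wham": "a.txt"}

# Each call is (app, output variables, input variables).
# APP2 reads variable a, which APP1 writes, so APP1 must run first.
chained = [("APP2", ["c"], ["a"]), ("APP1", ["a"], ["b"])]
print(run_order(chained))  # ['APP1', 'APP2']

# APP2 reads a different variable that merely maps to the same file name.
# No dependency is found, so APP2 would be free to start right away.
aliased = [("APP2", ["c"], ["wham"]), ("APP1", ["a"], ["b"])]
print(dependencies(aliased))  # {'APP2': set(), 'APP1': set()}
```

The second case is the situation where two variables share a file name: in this model the shared name creates no ordering at all, which is why the same variable has to be used on both sides.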

On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote:
> Hi, Mihael:
> 
> please see my comments below...
> 
> Thanks,
> 
> Nika
> 
> At 10:20 AM 3/14/2007, Mihael Hategan wrote:
> >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote:
> > > Ok, now I think you hit the area in your explanations that I always had a
> > > problem with. So here is my understanding of things:
> > >
> > > if I have two apps that I need to chain together, I need to do this:
> > >
> > > file a <"a.txt">;
> > > file b <"b.txt">;
> > > file c <"c.txt">;
> > >   a = APP1 (b);
> > >   c = APP2 (a);
> >
> >Yep. But if you don't care about what file a is put in, you can skip
> >mapping it. Although I gather it doesn't change things by much:
> >file a;
> >file b <"b.txt">;
> >file c <"c.txt">;
> >   a = APP1 (b);
> >   c = APP2 (a);
> 
> I do care about "a.txt", but I do not care about "a". That's the main point.
> 
> more below...
> 
> > >
> > > I.e. the chaining of the programs happens on a 'logical' file level (a,b,c
> > > rather than a.txt, b.txt, c.txt). Is that a correct understanding?
> >
> >Yes.
> >
> > >  I acted
> > > on this understanding and my workflow has been working fine (till now --
> > > but that's another story). Having to create all these logical files was a *big*
> > > pain (as I couldn't use the same logical names as the physical filenames due
> > > to the different file naming conventions in Swift: no multiple ".", etc). It
> > > really would've been much easier for my workflow to have just this:
> > >
> > > a1.txt = APP1 (b.txt);
> > > a2.txt = APP2 (b.txt);
> > > c.txt   = APP3 (a1.txt, a2.txt);
> > >
> > > as my applications produce an enormous amount of intermediate files with
> > > some specific naming conventions.
> >
> >Swift needs to know about those. Any workflow system would need to know
> >about those. There is no way to automatically determine what set of
> >files an application invocation will need. It may be possible to
> >determine what set of files an application invocation produces (although
> >making it consistent may be difficult), but even in that case the matter
> >of distinguishing which of those are meaningful for your workflow is not
> >quite possible.
> 
> I do not agree. You can specify the files that you need (intermediate or 
> final) on the left side of the function call - exactly the way it is done 
> now (but use the actual file names) :
> 
> (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt);
> 
> where APP could be producing hundreds of a.txt files (a1.txt - a100.txt) 
> and 10 c*.txt files (c1.txt -c10.txt). And only those 3 specified should be 
> cared for. Or it could be done even this way:
> 
> (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and 
> only one c1.txt file
> 
> Swift stages files just before the application starts. So it shouldn't 
> affect the workflow system at all (to my understanding). Just the number of
> files that need to be staged in/out (alternatively, you can always zip all
> files together and have just one file staged in/out for any application).
> 
> Anyway, I am not saying all this is easy -- just suggesting some 
> alternatives to the current system that requires (in case of my 
> application) some tedious filename operations...
> 
> more below...
> 
> > >
> > > Now back to my original problem - constructing and passing to my next
> > > application a collection of files. If I didn't have to do any mappers, it
> > > would've been just as easy as (for example):
> > >
> > > c.txt = APP3 (a*.txt);
> >There isn't much difference between that and c = APP3(a), where a is an
> >array. But I digress.
> 
> 
> Ok. But how do I construct that array in a clean way ? I thought that a 
> fixed_array_mapper would do that for me (if I pass a string of logical 
> filenames to it, shouldn't it create an array of files for me?). That's the
> main point - I can't construct an array of logical filenames and pass it to
> my application without re-writing the already-working code. Or am I just
> missing something, and the answer is a one-line code change? (;
> 
> more below...
> 
> > >
> > > Does it make sense at all ?
> >
> >Of course.
> >However, I'm not convinced how well it would work, for reasons outlined
> >above.
> >So there is a number of operations and certain dependencies between
> >them, where the operations are job submissions and file transfers (let's
> >abstract low level, technology dependent things for now). These need to
> >happen and are the result of executing a workflow (regardless of how
> >it's expressed). They represent the application that you are trying to
> >implement. If there is a way to infer all those operations and all the
> >dependencies from your specification model, then it would be ok. So in
> >the context of exploring a different way of expressing things, it would
> >be helpful to have a clear illustration of both of them and the rules
> >that get you from one to the other.
> 
> It does sound like a good topic for the discussion! (-;
> 
> Thanks again,
> 
> Nika
> 
> 
> >Mihael
> >
> > >
> > > Thanks,
> > >
> > > Nika
> > >
> > >
> > >
> > > At 06:41 PM 3/13/2007, Mihael Hategan wrote:
> > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote:
> > > > > Hmmm. So here is how my files are produced (inside double loop over $s and
> > > > > $name):
> > > > >
> > > > > file $s9prt <"$name.prt">;
> > > > > file $s9wham  <"$s9.wham">;
> > > > > file $s9crd  <"$s9.crd">;
> > > > > file $s9out <"$s9.out">;
> > > > > file $s9done  <"$s9donefile">;
> > > > >
> > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1",
> > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > "$rcut2");
> > > > >
> > > > > so if I change the mapping of the needed output file ($s9wham), everything
> > > > > should work?
> > > > >
> > > > > file whamfiles_$s[$i]  <"$s9.wham">;
> > > >
> > > >That one won't work.
> > > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't
> > > >map individual items in an array differently.
> > > >
> > > >I believe that you rely on the fact that whamfiles_xzy maps to the same
> > > >file names as some other variables. This won't work. You need to use the
> > > >same variable. The file names are irrelevant if the program doesn't make
> > > >sense for Swift.
> > > >So think about it this way: mentally remove all the mapper declarations
> > > >from the Swift program. If after that, the program makes sense, then you
> > > >should be good to go. If it doesn't then it's likely it won't work.
> > > >Remember, mapping is not something that can be used to hack things
> > > >because the workflow structure has nothing to do with the mappers and
> > > >Swift ignores mappers when figuring out the data flow.
> > > >
> > > >(dependent mappers notwithstanding)
> > > >
> > > > > i=`expr $i + 1`
> > > > >
> > > > > and call the function:
> > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm,
> > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt,
> > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > > "$rcut2");
> > > > >
> > > > > Nika
> > > > >
> > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote:
> > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote:
> > > > > > > ok, here is in short what I need to do:
> > > > > > >
> > > > > > > at some point in the workflow N files are produced (in my case it's 68, but
> > > > > > > it could be any number). Each of these files is produced by a separate job
> > > > > > > (i.e. N jobs produce N files).
> > > > > > > The next job in the workflow needs to take those N files as input.
> > > > > > >
> > > > > > > Question: how do I pass this unknown number of files as input to an
> > > > > > > application? The array_mapper didn't work (or I didn't use it correctly).
> > > > > >
> > > > > >In this case you need some other kind of mapper that can deal with
> > > > > >unknown numbers of items. The default mapper (i.e. specifying no mapper)
> > > > > >should work.
> > > > > >
> > > > > >So you need to do:
> > > > > >
> > > > > >file whamfiles_002[];
> > > > > >
> > > > > >foreach v,k in someinput {
> > > > > >   whamfiles_002[k] = job(v);
> > > > > >}
> > > > > >
> > > > > >... = GENERATOR(whamfiles_002);
> > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > >On a third thought. This looks like, eventually, you are trying to do
> > > > > > > >the same thing that Yong did with the dependent mappers earlier. I think
> > > > > > > >he would have more insight on the topic.
> > > > > > > >
> > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > I think I am confused. Sorry!
> > > > > > > > > what will be the type of 'whamfiles'? If it's a string, will Swift
> > > > > > > > > know to break it down into filenames and stage them all in?
> > > > > > > > > Also - is there a mapper (or whatever) that can map the list of *logical*
> > > > > > > > > file names to an array? (that's what I was trying to do).
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Nika
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > > >Oh my :)
> > > > > > > > > >@whamfiles_m002 is known by the system at all times. That means
> > > > > > > > > >GENERATOR does not need to wait for the actual files to be there since
> > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of names).
> > > > > > > > > >
> > > > > > > > > >You should try this instead:
> > > > > > > > > >...
> > > > > > > > > >... GENERATOR(whamfiles, str) {
> > > > > > > > > >    app {
> > > > > > > > > >      generator @whamfiles, str;
> > > > > > > > > >    }
> > > > > > > > > >}
> > > > > > > > > >
> > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002")
> > > > > > > > > >
> > > > > > > > > >Mihael
> > > > > > > > > >
> > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I have a question:
> > > > > > > > > > >
> > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my
> > > > > > > > > > > application called GENERATOR. I need to use the mapper since the number of
> > > > > > > > > > > input files is unknown before the workflow starts. Here is how I use it:
> > > > > > > > > > > file whamfiles_m002[] <fixed_array_mapper;files="
> > > > > > > > > > > solv_chg_a0_m002_wham, solv_chg_a1_m002_wham, solv_chg_a10_m002_wham,
> > > > > > > > > > > <snip -- many files, you get the idea>, solv_repu_0_0DOT2_b1_m002_wham">;
> > > > > > > > > > >
> > > > > > > > > > > These files are all generated by stage four of my workflow, each file is
> > > > > > > > > > > mapped to a physical filename, for example:
> > > > > > > > > > >
> > > > > > > > > > > file solv_chg_a0_m002_wham  <"solv_chg_a0_m002.wham">;
> > > > > > > > > > > and this particular file is produced this way:
> > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out,
> > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft,
> > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002,
> > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002",
> > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf",
> > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg",
> > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002");
> > > > > > > > > > >
> > > > > > > > > > > Then I call my application (the last stage of my workflow,
> > > > > > stage five)
> > > > > > > > > > >
> > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham,
> > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham,
> > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham,
> > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham,
> > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR
> > > > > > > > > > > (@whamfiles_m002, "m002");
> > > > > > > > > > >
> > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am
> > > > > > > > > > > not sure why. Does the mapper look for the physical files on the disk and,
> > > > > > > > > > > when it finds them, start right away? I do have the needed files in the
> > > > > > > > > > > directory from my previous runs. Or is there something else wrong here?
> > > > > > > > > > >
> > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405
> > > > > > > > > > > RunID: b0n2liektep92
> > > > > > > > > > > pre_ch started              <---------- that's the first stage
> > > > > > > > > > > generator_cat started    <----------- not supposed to start now!
> > > > > > > > > > > generator_cat started
> > > > > > > > > > >
> > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on
> > > > > > > > > > > terminable.ci.uchicago, but its pretty big...
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Nika
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > _______________________________________________
> > > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
> 
> 



