[Swift-devel] mapper problem or ...?

Veronika V. Nefedova nefedova at mcs.anl.gov
Wed Mar 14 11:02:01 CDT 2007


Hi, Mihael:

please see my comments below...

Thanks,

Nika

At 10:20 AM 3/14/2007, Mihael Hategan wrote:
>On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote:
> > Ok, now I think you've hit on the part of your explanations that I have always had a
> > problem with. So here is my understanding of things:
> >
> > if I have two apps that I need to chain together, I need to do this:
> >
> > file a <"a.txt">;
> > file b <"b.txt">;
> > file c <"c.txt">;
> >   a = APP1 (b);
> >   c = APP2 (a);
>
>Yep. But if you don't care which file a is stored in, you can skip
>mapping it. Although I gather it doesn't change things much:
>file a;
>file b <"b.txt">;
>file c <"c.txt">;
>   a = APP1 (b);
>   c = APP2 (a);

I do care about "a.txt", but I do not care about "a". That's the main point.

more below...

> >
> > I.e. the chaining of the programs happens on a 'logical' file level (a, b, c
> > rather than a.txt, b.txt, c.txt). Is that a correct understanding?
>
>Yes.
>
> > I acted
> > on this understanding and my workflow has been working fine (till now --
> > but that's another story). Having to create all these logical files was a *big*
> > pain (as I couldn't use the same logical names as the physical filenames due
> > to different file naming conventions in Swift: no multiple ".", etc). It
> > really would've been much easier for my workflow to have just this:
> >
> > a1.txt = APP1 (b.txt);
> > a2.txt = APP2 (b.txt);
> > c.txt   = APP3 (a1.txt, a2.txt);
> >
> > as my applications produce an enormous number of intermediate files with
> > some specific naming conventions.
>
>Swift needs to know about those. Any workflow system would need to know
>about those. There is no way to automatically determine what set of
>files an application invocation will need. It may be possible to
>determine what set of files an application invocation produces (although
>making it consistent may be difficult), but even in that case the matter
>of distinguishing which of those are meaningful for your workflow is not
>quite possible.

I do not agree. You can specify the files that you need (intermediate or 
final) on the left side of the function call, exactly the way it is done 
now, but using the actual file names:

(a1.txt, a2.txt, a3.txt) = APP1 (b*.txt);

where APP1 could be producing many a*.txt files (say, a1.txt - a100.txt) 
and ten c*.txt files (c1.txt - c10.txt), and only the three specified should 
be kept track of. Or it could even be done this way:

(a*.txt, c1.txt) = APP2 (b.txt);

where I want to get all a*.txt files and only the single c1.txt file.

Swift stages files just before the application starts, so (to my 
understanding) it shouldn't affect the workflow system at all. It would only 
change the number of files that need to be staged in/out (alternatively, you 
can always zip all files together and have just one file staged in/out for 
any application).
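To make the proposal concrete, here is a minimal Python sketch of the output-selection step only: after an application finishes, keep just the files that match the names or shell-style globs written on the left-hand side of the call. `select_outputs` is a hypothetical helper for illustration, not a Swift feature, and it assumes the job's directory listing is available once the job ends.

```python
from fnmatch import fnmatchcase


def select_outputs(produced, patterns):
    """Return the produced files that match any declared output pattern.

    Hypothetical helper modeling the proposal; Swift does not provide it.
    Patterns are shell-style globs ("a*.txt") or exact names ("c1.txt").
    """
    return [name for name in produced
            if any(fnmatchcase(name, p) for p in patterns)]


# APP1 leaves a1.txt..a100.txt and c1.txt..c10.txt in its working directory.
produced = ([f"a{i}.txt" for i in range(1, 101)]
            + [f"c{i}.txt" for i in range(1, 11)])

# (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt);
print(select_outputs(produced, ["a1.txt", "a2.txt", "a3.txt"]))
# -> ['a1.txt', 'a2.txt', 'a3.txt']

# (a*.txt, c1.txt) = APP2 (b.txt);
print(len(select_outputs(produced, ["a*.txt", "c1.txt"])))
# -> 101
```

This covers only the stage-out side. Mihael's point above about determining which files an invocation *needs* before it runs is not addressed by a glob evaluated after the fact.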

Anyway, I am not saying all this is easy -- just suggesting some 
alternatives to the current system that requires (in case of my 
application) some tedious filename operations...

more below...

> >
> > Now back to my original problem - constructing and passing to my next
> > application a collection of files. If I didn't have to do any mappers, it
> > would've been just as easy as (for example):
> >
> > c.txt = APP3 (a*.txt);
>There isn't much difference between that and c = APP3(a), where a is an
>array. But I digress.


Ok. But how do I construct that array in a clean way? I thought that a 
fixed_array_mapper would do that for me (if I pass a string of logical 
filenames to it, shouldn't it create an array of files for me?). That's the 
main point: I can't construct an array of logical filenames and pass it to 
my application without rewriting already-working code. Or am I just 
missing something, and the answer is a one-line code change? (;

more below...

> >
> > Does it make sense at all?
>
>Of course.
>However, I'm not sure how well it would work, for the reasons outlined
>above.
>There are a number of operations and certain dependencies between
>them, where the operations are job submissions and file transfers (let's
>abstract away low-level, technology-dependent things for now). These need to
>happen and are the result of executing a workflow (regardless of how
>it's expressed). They represent the application that you are trying to
>implement. If there is a way to infer all those operations and all the
>dependencies from your specification model, then it would be ok. So in
>the context of exploring a different way of expressing things, it would
>be helpful to have a clear illustration of both of them and the rules
>that get you from one to the other.

It does sound like a good topic for the discussion! (-;
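Mihael's description of a workflow as operations plus the dependencies between them can be sketched in a few lines of Python. This is an illustrative model, not Swift's implementation: it covers only job submissions (no file transfers), assumes each logical variable is written by exactly one call, and uses `graphlib` from the Python 3.9+ standard library to order the calls. The statements mirror Nika's three-application example, written over logical variables (a1, a2, b, c) rather than file names.

```python
from graphlib import TopologicalSorter

# Each statement is (outputs, app, inputs) over logical variables, mirroring
#   a1 = APP1(b);  a2 = APP2(b);  c = APP3(a1, a2);
statements = [
    (("a1",), "APP1", ("b",)),
    (("a2",), "APP2", ("b",)),
    (("c",), "APP3", ("a1", "a2")),
]


def infer_dependencies(statements):
    """Map each app call to the set of app calls whose outputs it reads."""
    producer = {}
    for outs, app, _ in statements:
        for var in outs:
            producer[var] = app
    deps = {}
    for _, app, ins in statements:
        # Inputs with no producer (like b) are workflow inputs: no edge.
        deps[app] = {producer[v] for v in ins if v in producer}
    return deps


deps = infer_dependencies(statements)
print(sorted(deps["APP3"]))  # -> ['APP1', 'APP2']

# TopologicalSorter takes a mapping of node -> predecessors.
order = list(TopologicalSorter(deps).static_order())
print(order[-1])  # -> APP3
```

Whatever syntax is used to express a workflow, it has to let a rule like `infer_dependencies` recover these edges; that is the kind of illustration Mihael asks for.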

Thanks again,

Nika


>Mihael
>
> >
> > Thanks,
> >
> > Nika
> >
> >
> >
> > At 06:41 PM 3/13/2007, Mihael Hategan wrote:
> > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote:
> > > > Hmmm. So here is how my files are produced (inside a double loop over $s and
> > > > $name):
> > > >
> > > > file $s9prt <"$name.prt">;
> > > > file $s9wham  <"$s9.wham">;
> > > > file $s9crd  <"$s9.crd">;
> > > > file $s9out <"$s9.out">;
> > > > file $s9done  <"$s9donefile">;
> > > >
> > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1",
> > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > "$rcut2");
> > > >
> > > > so if I change the mapping of the needed output file ($s9wham), everything
> > > > should work?
> > > >
> > > > file whamfiles_$s[$i]  <"$s9.wham">;
> > >
> > >That one won't work.
> > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't
> > >map individual items in an array differently.
> > >
> > >I believe that you rely on the fact that whamfiles_xzy maps to the same
> > >file names as some other variables. This won't work. You need to use the
> > >same variable. The file names are irrelevant if the program doesn't make
> > >sense for Swift.
> > >So think about it this way: mentally remove all the mapper declarations
> > >from the Swift program. If after that, the program makes sense, then you
> > >should be good to go. If it doesn't then it's likely it won't work.
> > >Remember, mapping is not something that can be used to hack things
> > >because the workflow structure has nothing to do with the mappers and
> > >Swift ignores mappers when figuring out the data flow.
> > >
> > >(dependent mappers notwithstanding)
> > >
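Mihael's rule of thumb (mentally delete every mapper and check whether the program still makes sense) can be illustrated with a small Python model. It is a sketch under the assumption, stated in the message above, that dependencies are computed from variables alone; `waits_for` and the variable names are hypothetical stand-ins for the CHARMM3 and GENERATOR calls in this thread.

```python
# Hypothetical model: data flow is keyed by variable, never by the file name
# a mapper assigns. The mapping table below is deliberately never consulted.
mappings = {
    "s9wham": "solv_chg_a0_m002.wham",
    "whamfiles_xyz": "solv_chg_a0_m002.wham",  # same file, different variable
}


def waits_for(statements, app_name):
    """Return the calls that app_name must wait for.

    statements: list of (outputs, app, inputs) over variable names.
    Only variable names are compared; `mappings` plays no part.
    """
    producer = {v: app for outs, app, _ in statements for v in outs}
    for _, app, ins in statements:
        if app == app_name:
            return {producer[v] for v in ins if v in producer}
    raise KeyError(app_name)


# Relying on a shared file name: GENERATOR reads a different variable.
hacked = [
    (("s9wham",), "CHARMM3", ("s9prt",)),
    (("gen_out",), "GENERATOR", ("whamfiles_xyz",)),
]
# Passing the same variable: the dependency is visible.
fixed = [
    (("s9wham",), "CHARMM3", ("s9prt",)),
    (("gen_out",), "GENERATOR", ("s9wham",)),
]

print(waits_for(hacked, "GENERATOR"))  # -> set()
print(waits_for(fixed, "GENERATOR"))   # -> {'CHARMM3'}
```

In the first case GENERATOR has nothing to wait for, so in this model it is free to start as soon as the workflow does, whatever file names the mappers produce.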
> > > > i=`expr $i + 1`
> > > >
> > > > and call the function:
> > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm,
> > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt,
> > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1",
> > > > "$rcut2");
> > > >
> > > > Nika
> > > >
> > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote:
> > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote:
> > > > > > ok, here in short is what I need to do:
> > > > > >
> > > > > > at some point in the workflow N files are produced (in my case it's 68, but
> > > > > > it could be any number). Each of these files is produced by a separate job
> > > > > > (i.e. N jobs produce N files).
> > > > > > The next job in the workflow needs to take those N files as input.
> > > > > >
> > > > > > Question: how do I pass this unknown number of files as input to an
> > > > > > application? The array_mapper didn't work (or I didn't use it correctly).
> > > > >
> > > > >In this case you need some other kind of mapper that can deal with
> > > > >unknown numbers of items. The default mapper (i.e. specifying no mapper)
> > > > >should work.
> > > > >
> > > > >So you need to do:
> > > > >
> > > > >file whamfiles_002[];
> > > > >
> > > > >foreach v,k in someinput {
> > > > >   whamfiles_002[k] = job(v);
> > > > >}
> > > > >
> > > > >... = GENERATOR(whamfiles_002);
> > > > >
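The pattern Mihael suggests (an unmapped array filled inside a foreach, then passed whole to the next application) can be modeled in Python with futures. This sketch assumes a thread pool stands in for Swift's job submission and that `job` and `generator` are placeholders for the real applications; it does not show how Swift itself decides the array is complete, it simply waits for every submitted job.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def job(v):
    """Placeholder for the per-item application; returns the file it wrote."""
    return f"{v}.wham"


def generator(whamfiles):
    """Placeholder for GENERATOR: receives the complete set of inputs."""
    return sorted(whamfiles.values())


# N is not known when the workflow is written; 68 is just this run's count.
someinput = {k: f"solv_{k}" for k in range(68)}

with ThreadPoolExecutor() as pool:
    # foreach v,k in someinput { whamfiles_002[k] = job(v); }
    futures = {k: pool.submit(job, v) for k, v in someinput.items()}
    # GENERATOR(whamfiles_002) may only start once every element exists.
    wait(futures.values())
    whamfiles_002 = {k: f.result() for k, f in futures.items()}

result = generator(whamfiles_002)
print(len(result))  # -> 68
```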
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Nika
> > > > > >
> > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > >On third thought: this looks like, eventually, you are trying to do
> > > > > > >the same thing that Yong did with the dependent mappers earlier. I think
> > > > > > >he would have more insight on the topic.
> > > > > > >
> > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote:
> > > > > > > > I think I am confused. Sorry!
> > > > > > > > What will be the type of 'whamfiles'? If it's a string, will Swift
> > > > > > > > know to break it down into filenames and stage them all in?
> > > > > > > > Also, is there a mapper (or whatever) that can map a list of *logical*
> > > > > > > > file names to an array? (That's what I was trying to do.)
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Nika
> > > > > > > >
> > > > > > > >
> > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote:
> > > > > > > > >Oh my :)
> > > > > > > > >@whamfiles_m002 is known by the system at all times. That means
> > > > > > > > >GENERATOR does not need to wait for the actual files to be there, since
> > > > > > > > >it knows very well what @whamfiles_m002 is (the list of names).
> > > > > > > > >
> > > > > > > > >You should try this instead:
> > > > > > > > >...
> > > > > > > > >... GENERATOR(whamfiles, str) {
> > > > > > > > >    app {
> > > > > > > > >      generator @whamfiles, str;
> > > > > > > > >    }
> > > > > > > > >}
> > > > > > > > >
> > > > > > > > >... = GENERATOR(whamfiles_m002, "m002")
> > > > > > > > >
> > > > > > > > >Mihael
> > > > > > > > >
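Why GENERATOR started immediately can be modeled with one assumption: a call may start as soon as none of its arguments is still pending. In this Python sketch a `Future` stands in for the array variable `whamfiles_m002`, and a plain list of strings stands in for what `@whamfiles_m002` evaluates to. The names are illustrative; this models the behavior Mihael describes and is not Swift code.

```python
from concurrent.futures import Future


def ready(args):
    """A call may start once none of its arguments is an unfinished Future."""
    return all(not isinstance(a, Future) or a.done() for a in args)


# Stands for the array variable whamfiles_m002 (its producers have not run).
wham = Future()
# Stands for @whamfiles_m002: the mapped names, known before any job runs.
names = ["solv_chg_a0_m002.wham", "solv_chg_a1_m002.wham"]

print(ready([names, "m002"]))  # -> True: plain names never block
print(ready([wham, "m002"]))   # -> False: must wait for the producers
wham.set_result(names)
print(ready([wham, "m002"]))   # -> True
```

This is why the suggested fix passes the array variable itself to GENERATOR and applies `@` only inside the `app` block.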
> > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I have a question:
> > > > > > > > > >
> > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as input to my
> > > > > > > > > > application called GENERATOR. I need to use the mapper since the number of
> > > > > > > > > > input files is unknown before the workflow starts. Here is how I use it:
> > > > > > > > > > file whamfiles_m002[] <fixed_array_mapper;files="solv_chg_a0_m002_wham,
> > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, <snip -- many files, you get
> > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">;
> > > > > > > > > >
> > > > > > > > > > These files are all generated by stage four of my workflow; each file is
> > > > > > > > > > mapped to a physical filename, for example:
> > > > > > > > > >
> > > > > > > > > > file solv_chg_a0_m002_wham  <"solv_chg_a0_m002.wham">;
> > > > > > > > > > and this particular file is produced this way:
> > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out,
> > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft,
> > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002,
> > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002",
> > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf",
> > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg",
> > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002");
> > > > > > > > > >
> > > > > > > > > > Then I call my application (the last stage of my workflow, stage five):
> > > > > > > > > >
> > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham,
> > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham,
> > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham,
> > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham,
> > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR
> > > > > > > > > > (@whamfiles_m002, "m002");
> > > > > > > > > >
> > > > > > > > > > And then when I start my workflow, GENERATOR starts right away. I am
> > > > > > > > > > not sure why. Does the mapper look for the physical files on the disk and,
> > > > > > > > > > when it finds them, start right away? I do have the needed files in the
> > > > > > > > > > directory from my previous runs. Or is there something else wrong here?
> > > > > > > > > >
> > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405
> > > > > > > > > > RunID: b0n2liektep92
> > > > > > > > > > pre_ch started              <---------- that's the first stage
> > > > > > > > > > generator_cat started    <----------- not supposed to start now!
> > > > > > > > > > generator_cat started
> > > > > > > > > >
> > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on
> > > > > > > > > > terminable.ci.uchicago, but it's pretty big...
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Nika
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >




