[Swift-devel] duplicated job submission in swift-0.92?

Michael Wilde wilde at mcs.anl.gov
Wed Mar 30 22:10:09 CDT 2011


Yeah, I'm done for now. Except that I'm a little baffled as to why this is turning up now: I thought I was running much more complex scripts (generating hundreds of thousands of files) with no sign of behavior like this, on what I think is the same revision. And what I'm running is (I *think*) from before the merge you are talking about, unless I'm misunderstanding what you discovered in svn.
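
To check a captured run for repeated iterations without eyeballing it, something like the following could help. It is only a rough sketch: it assumes the console output has been saved into a string, and the helper name duplicated_traces is made up for illustration. It matches the "SwiftScript trace: " text anywhere in a line, so lines with a logger prefix or a fused "Progress:" prefix are still counted.

```python
from collections import Counter

# Hypothetical helper: counts identical trace messages in captured swift output.
def duplicated_traces(output: str) -> dict:
    """Return trace messages that appear more than once, with their counts."""
    prefix = "SwiftScript trace: "
    counts = Counter(
        line.split(prefix, 1)[1].strip()
        for line in output.splitlines()
        if prefix in line
    )
    return {msg: n for msg, n in counts.items() if n > 1}

# Sample taken from one of the zz5.swift runs quoted in this thread.
sample = """\
SwiftScript trace: arr, arr.$[]/2
SwiftScript trace: brr, brr.$[]/2
SwiftScript trace: for, 2
SwiftScript trace: for, 2
SwiftScript trace: for, 1
SwiftScript trace: for, 1
"""

print(duplicated_traces(sample))
```

An empty result would mean every trace fired once; any entry for a "for, N" message means the loop body ran more than once for that element.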

- Mike


----- Original Message -----
> I think at this point we should stop testing 0.92 until we figure out
> the reason for the merge.
> 
> Trunk contained a pretty dramatic change to the karajan engine and I
> would expect badness like that on a merge back to a stable branch. The
> previous behaviour (double iterations) alone is a sign of badness, and
> so this new thing doesn't surprise me.
> 
> Mihael
> 
> On Wed, 2011-03-30 at 21:54 -0500, Michael Wilde wrote:
> > There seems to be something non-deterministic about this script:
> >
> > com$ cat zz5.swift
> > int arr[];
> > int brr[];
> >
> > arr[0]=1;
> > arr[1]=2;
> >
> > brr = [1:2];
> >
> > trace("arr",arr);
> > trace("brr",brr);
> >
> > foreach a in arr {
> >   trace("for", a);
> > }
> >
> > com$
> >
> > (By the way, I'm seeing the same error on communicado)
> >
> > The script above sometimes prints 2, 3, or 4 instances of the
> > trace() inside the foreach. And sometimes it hangs on one of the two
> > trace statements outside the loop. In most cases, it prints all 6
> > traces, as in the original failing case.
> >
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-qf5anxr6
> > Progress: time:2
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 2
> > Final status: time:12
> > Time: 1.163, rate: 14087 j/s
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-kouc9zq3
> > Progress: time:3
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 1
> > Final status: time:16
> > Time: 1.214, rate: 13495 j/s
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-lksn2a17
> > Progress: time:2
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > Final status: time:17
> > Time: 1.227, rate: 13352 j/s
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-tl2xtxx6
> > Progress: time:1
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 1
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 2
> > Final status: time:14
> > Time: 1.224, rate: 13385 j/s
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-mk5aypbg
> > Progress: time:6
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 2
> > Final status: time:17
> > Time: 1.191, rate: 13756 j/s
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2148-hgcbaxga
> > Progress: time:2
> > SwiftScript trace: arr, arr.$[]/2
> > SwiftScript trace: for, 1
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > com$ swift zz5.swift
> > Swift svn swift-r4087 (swift modified locally) cog-r3051
> >
> > RunID: 20110330-2149-oaa0kuy8
> > Progress:SwiftScript trace: arr, arr.$[]/2
> >   time:9
> > SwiftScript trace: for, 2
> > SwiftScript trace: brr, brr.$[]/2
> > SwiftScript trace: for, 2
> > SwiftScript trace: for, 1
> > Final status: time:17
> > Time: 1.241, rate: 13202 j/s
> > com$
> >
> >
> > ----- Original Message -----
> > > On Wed, 2011-03-30 at 20:37 -0500, Michael Wilde wrote:
> > > > OK, what am I missing?
> > >
> > > Nothing. That shouldn't be happening.
> > >
> > > Here's what (approximately) you should get:
> > > Swift svn swift-r3526 (swift modified locally) cog-r656 (cog
> > > modified
> > > locally)
> > >
> > > RunID: 20110330-1839-y71gls4a
> > > Progress: time:0
> > > [Misc] WARN pool-1-thread-4 - SwiftScript trace: for, 2
> > > [Misc] WARN pool-1-thread-1 - SwiftScript trace: for, 1
> > > Final status: time:54
> > >
> > > >
> > > > int arr[];
> > > >
> > > > arr[0]=1;
> > > > arr[1]=2;
> > > >
> > > > foreach a in arr {
> > > >   trace("for", a);
> > > > }
> > > >
> > > > login1$ swift zz3.swift
> > > > Swift svn swift-r4157 cog-r3056
> > > >
> > > > RunID: 20110331-0134-bfzhkgaa
> > > > Progress:
> > > > SwiftScript trace: for, 2
> > > > SwiftScript trace: for, 1
> > > > SwiftScript trace: for, 1
> > > > SwiftScript trace: for, 2
> > > > Final status:
> > > >
> > > > When did the foreach loop become the twice-each loop?
> > > >
> > > > I need to try some other revision and hosts with this.
> > > >
> > > > - Mike
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > Wow, I didn't know we can do that! I treat the docs too
> > > > > canonically :P
> > > > >
> > > > > 2011/3/30 Michael Wilde <wilde at mcs.anl.gov>:
> > > > >
> > > > > > login1$ cat zz2.swift
> > > > > >
> > > > > > foreach a in [0:3] {
> > > > > >  trace("for", a);
> > > > > > }
> > > > > >
> > > > > > login1$ swift zz2.swift
> > > > > > Swift svn swift-r4157 cog-r3056
> > > > > >
> > > > > > RunID: 20110331-0057-huo8jei0
> > > > > > Progress:
> > > > > > SwiftScript trace: for, 1
> > > > > > SwiftScript trace: for, 3
> > > > > > SwiftScript trace: for, 2
> > > > > > SwiftScript trace: for, 0
> > > > > > Final status:
> > > > > > login1$
> > > > > >
> > > > > > I suspect we need to make this more clear in the user guide
> > > > > > and
> > > > > > tutorials :)
> > > > >
> > > > > I agree.
> > > > >
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >> Or just use the concurrent mapper to let swift handle the
> > > > > >> output
> > > > > >> naming itself. The resume files can't persist through
> > > > > >> multiple
> > > > > >> sessions though.
> > > > > >>
> > > > > >> 2011/3/30 Michael Wilde <wilde at mcs.anl.gov>:
> > > > > >> > The most common case for this error occurs when two
> > > > > >> > iterations
> > > > > >> > within a foreach loop map an output file to the same
> > > > > >> > physical
> > > > > >> > file
> > > > > >> > name. When swift runs and tries to put the output object
> > > > > >> > into
> > > > > >> > its
> > > > > > >> > site cache, it sees that a file of the same name is
> > > > > >> > already
> > > > > >> > in
> > > > > >> > the
> > > > > >> > cache, and its semantics do not allow that.
> > > > > >> >
> > > > > >> > I have not yet stared at this code long enough to see if
> > > > > >> > this
> > > > > >> > explains what is happening here.
> > > > > >> >
> > > > > > >> > I also don't know why it might work under one version and
> > > > > >> > fail
> > > > > >> > under
> > > > > >> > 0.92. If the above situation is occurring, perhaps there
> > > > > >> > is
> > > > > >> > some
> > > > > >> > randomness involved: loop iteration ordering; filename
> > > > > >> > generation
> > > > > >> > randomness or difference, etc.
> > > > > >> >
> > > > > >> > But I would debug with that in mind: make sure that all
> > > > > >> > *output*
> > > > > > >> > file
> > > > > >> > names mapped by the script are unique. Ideally, one
> > > > > >> > should be
> > > > > >> > able
> > > > > >> > to find the culprit by grepping the swift log for all the
> > > > > >> > mapped
> > > > > >> > file names and look for duplicates.
> > > > > >> >
> > > > > >> > - Mike
> > > > > >> >
> > > > > >> >
> > > > > >> > ----- Original Message -----
> > > > > >> >> Or maybe local variables are static? Maybe they mapped
> > > > > >> >> to
> > > > > >> >> different
> > > > > >> >> files but to the same cache object? But I have been
> > > > > >> >> doing
> > > > > >> >> local
> > > > > >> >> variables in my own workflows though.
> > > > > >> >>
> > > > > >> >> 2011/3/30 Jonathan Monette <jon.monette at gmail.com>:
> > > > > >> >> > Ok. I understand this error better. But shouldn't that
> > > > > >> >> > be
> > > > > >> >> > a
> > > > > >> >> > different
> > > > > >> >> > error then? Like a and b are mapped to the same file?
> > > > > >> >> > I
> > > > > >> >> > don't
> > > > > >> >> > know
> > > > > >> >> > if Swift
> > > > > >> >> > can know this but looking at the explanation and error
> > > > > >> >> > it
> > > > > >> >> > should
> > > > > >> >> > unless this
> > > > > >> >> > cache message has a deeper meaning.
> > > > > >> >> >
> > > > > >> >> > On Wed, Mar 30, 2011 at 6:21 PM, Allan Espinosa
> > > > > >> >> > <aespinosa at cs.uchicago.edu>
> > > > > >> >> > wrote:
> > > > > >> >> >>
> > > > > >> >> >> I had this error before when two output mapper
> > > > > >> >> >> objects
> > > > > >> >> >> mapped
> > > > > >> >> >> to
> > > > > >> >> >> the same
> > > > > >> >> >> file.
> > > > > >> >> >>
> > > > > >> >> >> $ swift bug_same.swift
> > > > > >> >> >> Swift svn swift-r4208 cog-r3073
> > > > > >> >> >>
> > > > > >> >> >> RunID: 20110330-1818-ygec7ppa
> > > > > >> >> >> Progress: time:0
> > > > > >> >> >> The cache already contains
> > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo.
> > > > > >> >> >>
> > > > > >> >> >> The cache already contains
> > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo.
> > > > > >> >> >>
> > > > > >> >> >> Progress: time:1960 Stage in:1 Finished
> > > > > >> >> >> successfully:1
> > > > > >> >> >> The cache already contains
> > > > > >> >> >> localhost:bug_same-20110330-1818-ygec7ppa/shared/foo.
> > > > > >> >> >>
> > > > > >> >> >> [aespinosa at communicado testing]$
> > > > > >> >> >> [aespinosa at communicado testing]$ cat bug_same.swift
> > > > > >> >> >> type file;
> > > > > >> >> >>
> > > > > >> >> >> app (file out) echo(string input) {
> > > > > >> >> >>  echo input stdout=@filename(out);
> > > > > >> >> >> }
> > > > > >> >> >>
> > > > > >> >> >> file a <"foo">;
> > > > > >> >> >> file b <"foo">;
> > > > > >> >> >>
> > > > > >> >> >> a = echo("hello world");
> > > > > >> >> >> b = echo("foo bar");
> > > > > >> >> >>
> > > > > > >> >> >> But I think you should be using other Swift mappers
> > > > > > >> >> >> that
> > > > > > >> >> >> do
> > > > > > >> >> >> auto-numbering of files by default.
> > > > > >> >> >>
> > > > > >> >> >> -Allan
> > > > > >> >> >>
> > > > > >> >> >> 2011/3/30 Zhao Zhang <zhaozhang at uchicago.edu>:
> > > > > >> >> >> > Hi guys,
> > > > > >> >> >> >
> > > > > > >> >> >> > I am seeing something weird in swift-0.92. Any idea
> > > > > >> >> >> > about
> > > > > >> >> >> > this?
> > > > > >> >> >> > The swift script is very simple:
> > > > > >> >> >> >
> > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> cat movies.swift
> > > > > >> >> >> > type Pickle {}
> > > > > >> >> >> > type History {}
> > > > > >> >> >> > type Image {}
> > > > > >> >> >> >
> > > > > >> >> >> > app (History historyout) movie_graph (int rerun,
> > > > > >> >> >> > int
> > > > > >> >> >> > epochs,
> > > > > >> >> >> > Pickle
> > > > > >> >> >> > picklefile)
> > > > > >> >> >> > {
> > > > > >> >> >> >   movie_graph rerun epochs;
> > > > > >> >> >> > }
> > > > > >> >> >> >
> > > > > >> >> >> > int arr[];
> > > > > >> >> >> > iterate i
> > > > > >> >> >> > {
> > > > > >> >> >> >  arr[i] = i+1;
> > > > > >> >> >> > }until(i == 1);
> > > > > >> >> >> >
> > > > > >> >> >> > int epochs;
> > > > > >> >> >> > epochs = 3;
> > > > > >> >> >> > Pickle picklefile <single_file_mapper;
> > > > > >> >> >> > file="for_movies.pickled">;
> > > > > >> >> >> > foreach a in arr{
> > > > > >> >> >> >  History historyout <single_file_mapper;
> > > > > >> >> >> >  file=@strcat("output/rerun", a,
> > > > > >> >> >> > "/histories.pickled-", a)>;
> > > > > >> >> >> >  historyout = movie_graph(a, epochs, picklefile);
> > > > > >> >> >> > }
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> > I ran the script with the latest 0.92 version,
> > > > > >> >> >> > which is
> > > > > >> >> >> > loaded
> > > > > >> >> >> > as
> > > > > >> >> >> > a
> > > > > >> >> >> > module
> > > > > > >> >> >> > on beagle. Then I saw this:
> > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file
> > > > > >> >> >> > ./tc.data
> > > > > >> >> >> > movies.swift
> > > > > >> >> >> > Variable epochs defined in scope 99878388 shadows
> > > > > >> >> >> > variable
> > > > > >> >> >> > of
> > > > > >> >> >> > same name
> > > > > >> >> >> > in
> > > > > >> >> >> > scope 1813605401
> > > > > >> >> >> > Variable picklefile defined in scope 99878388
> > > > > >> >> >> > shadows
> > > > > >> >> >> > variable
> > > > > >> >> >> > of
> > > > > >> >> >> > same
> > > > > >> >> >> > name
> > > > > >> >> >> > in scope 1813605401
> > > > > >> >> >> > Swift svn swift-r4157 cog-r3056
> > > > > >> >> >> >
> > > > > >> >> >> > RunID: 20110330-1636-ev8vm8gb
> > > > > >> >> >> > Progress:
> > > > > >> >> >> > Progress: Selecting site:3 Active:1
> > > > > >> >> >> > Progress: Selecting site:3 Checking status:1
> > > > > >> >> >> > Progress: Selecting site:2 Stage in:1 Finished
> > > > > >> >> >> > successfully:1
> > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished
> > > > > >> >> >> > successfully:1
> > > > > >> >> >> > Progress: Selecting site:2 Active:1 Finished
> > > > > >> >> >> > successfully:1
> > > > > >> >> >> > Progress: Selecting site:1 Stage in:1 Finished
> > > > > >> >> >> > successfully:2
> > > > > >> >> >> > Progress: Selecting site:1 Active:1 Finished
> > > > > >> >> >> > successfully:2
> > > > > >> >> >> > Progress: Selecting site:1 Checking status:1
> > > > > >> >> >> > Finished
> > > > > >> >> >> > successfully:2
> > > > > >> >> >> > The cache already contains
> > > > > >> >> >> >
> > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
> > > > > >> >> >> >
> > > > > >> >> >> > Execution failed:
> > > > > >> >> >> >        The cache already contains
> > > > > >> >> >> >
> > > > > >> >> >> > localhost:movies-20110330-1636-ev8vm8gb/shared/output/rerun1/histories.pickled-1.
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > > >> >> >> > Then I switched to an older version, and it worked
> > > > > >> >> >> > well.
> > > > > >> >> >> > zzhang at sandbox:~/workplace/Andrey> swift -tc.file
> > > > > >> >> >> > ./tc.data
> > > > > >> >> >> > movies.swift
> > > > > >> >> >> > Variable epochs defined in scope 212602028 shadows
> > > > > >> >> >> > variable
> > > > > >> >> >> > of
> > > > > >> >> >> > same name
> > > > > >> >> >> > in
> > > > > >> >> >> > scope 1538939834
> > > > > >> >> >> > Variable picklefile defined in scope 212602028
> > > > > >> >> >> > shadows
> > > > > >> >> >> > variable
> > > > > >> >> >> > of same
> > > > > >> >> >> > name
> > > > > >> >> >> > in scope 1538939834
> > > > > >> >> >> > Swift svn swift-r3291 (swift modified locally)
> > > > > >> >> >> > cog-r2750
> > > > > >> >> >> > (cog
> > > > > >> >> >> > modified
> > > > > >> >> >> > locally)
> > > > > >> >> >> >
> > > > > >> >> >> > RunID: 20110330-1639-gmbyz1qa
> > > > > >> >> >> > Progress:
> > > > > >> >> >> > Progress: Active:2
> > > > > >> >> >> > Progress: Active:1 Checking status:1
> > > > > >> >> >> > Final status: Finished successfully:2
> > > > > >
> > > >
> >
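
On the advice quoted above about making sure all mapped *output* file names are unique: once the mapped names have been pulled out of the swift log, a duplicate check could look roughly like this. This is a sketch under assumptions. The log line format is not shown in this thread, so the extraction step is left out and the input is just a list of path strings; the helper name duplicate_names is made up. Paths are normalised first so that "./foo" and "foo" count as the same file.

```python
import posixpath
from collections import Counter

# Hypothetical helper: the caller supplies names already extracted from the log.
def duplicate_names(mapped_names):
    """Return mapped output paths that occur more than once, after normalising."""
    counts = Counter(posixpath.normpath(name) for name in mapped_names)
    return sorted(name for name, n in counts.items() if n > 1)

# Illustrative input modelled on the paths in the movies.swift report.
names = [
    "output/rerun1/histories.pickled-1",
    "output/rerun2/histories.pickled-2",
    "./output/rerun1/histories.pickled-1",
]

print(duplicate_names(names))
```

If the script really maps each iteration to a distinct file, the list should come back empty; a non-empty list with distinct mappings in the script would point at the same iteration being run twice rather than at the mappers.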

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



