[Swift-devel] cache already contains error

Michael Wilde wilde at mcs.anl.gov
Mon Apr 1 22:20:27 CDT 2013


Ketan,

Can you post a pointer to your code, the Swift log, and the Swift stdout/err?

How are you mapping the file "...outdir/out_l0000_0000.0010.out"?

Does 35 correspond to any of the array bounds?

Does it fail if you use only one host? (I.e., my first thought was some kind of NFS sync error.)

Could you try it using local disk and provider staging with the N hosts?

- Mike


----- Original Message -----
> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Monday, April 1, 2013 9:02:24 PM
> Subject: Re: [Swift-devel] cache already contains error
> 
> 
> Thanks Mike, that fixed the cache issue. However, now I am seeing an
> unusual behavior from my Swift run:
> 
> 
> The ampl run crashes after completing a fixed number of jobs (35 to
> be precise).
> 
> 
> Some diagnostics:
> 
> 
> -- It runs to completion when I do a Swift resume. Once again only
> the next 35 jobs complete successfully. On the next resume the rest of
> them complete.
> 
> 
> -- It runs outside of Swift with a bash for-loop using the same
> parameters as in the Swift script.
> 
> 
> -- A catsn script of similar parameters runs to completion without
> any failures. So, nothing seems to be wrong with the OS parameters.
> 
> 
> I am using a single MCS workstation, no provider staging, no
> coasters.
> 
> 
> The error message is:
> 
> 
> Caused by: File not found:
> /nfs2/ketan/powergridapps/swiftscripts/swift.work/inference-20130401-2043-78i7o5m3/shared/outdir/out_l0000_0000.0010.out
> 
> 
> This is reflected in the logs as well as in the workdir's info
> files.
> 
> 
> Has anyone seen this kind of behavior? Any remedial suggestions?
> 
> 
> Thanks,
> Ketan
> 
> 
> 
> On Mon, Apr 1, 2013 at 5:59 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
> 
> 
> I think you need to make the "out" array 2-dimensional.
> 
> Your script is going to evaluate "out[j] = cat(data)" for both i=0
> and i=1.
> 
> The second of those evaluations is probably encountering the "cache
> already contains" for j=0.
> 
> If it didn't hit that (i.e., if you used the concurrent mapper) you'd
> likely then get an error that out[0] is already set.
> 
> - Mike
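
The collision described above can be illustrated with a small model. This is a Python sketch, not Swift code and not the mapper's actual implementation: the helper names `mapped_name` and `find_collisions` are hypothetical, and the naming scheme (prefix, zero-padded indices joined by ".", suffix) is an assumption chosen only to reproduce the "f.0000.out" name seen in the error message. It shows that indexing the output by j alone yields the same file name for i=0 and i=1, while indexing by both i and j does not.

```python
# Hypothetical model of index-based output naming; NOT Swift's mapper code.
def mapped_name(indices, prefix="f.", suffix=".out", width=4):
    """Build a file name from one or more array indices (assumed scheme)."""
    return prefix + ".".join(f"{i:0{width}d}" for i in indices) + suffix


def find_collisions(names):
    """Return names that are produced more than once, in first-seen order."""
    seen, dupes = set(), []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


# Swift's [0:1] range is inclusive, so both loops visit 0 and 1.
# One-dimensional out[j]: the outer index i never appears in the name.
one_dim = [mapped_name([j]) for i in range(2) for j in range(2)]
# Two-dimensional out[i][j]: each (i, j) pair gets its own name.
two_dim = [mapped_name([i, j]) for i in range(2) for j in range(2)]

print(find_collisions(one_dim))  # names written twice in the 1-D case
print(find_collisions(two_dim))  # no repeats in the 2-D case
```

In the one-dimensional case every name is produced once per value of i, which matches the point at which the second evaluation for j=0 would find the file already registered.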
> 
> 
> 
> 
> ----- Original Message -----
> > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > To: "Swift Devel" < swift-devel at ci.uchicago.edu >
> > Sent: Monday, April 1, 2013 5:49:36 PM
> > Subject: [Swift-devel] cache already contains error
> > 
> > 
> > 
> > 
> > Hi,
> > 
> > I am running into the "cache already contains" error when using a
> > nested loop with file mappers. Here is a simple reproduction of the
> > issue with a nested loop variant of catsn.swift:
> > 
> > 
> > 
> > type file;
> > app (file o) cat (file i){
> > cat @i stdout=@o;
> > }
> > 
> > 
> > #file out[];
> > #file out[]<concurrent_mapper; location="outdir", prefix="f.", suffix=".out">;
> > file out[]<simple_mapper; location="outdir", prefix="f.", suffix=".out">;
> > 
> > 
> > foreach i in [0:1] {
> > foreach j in [0:1]{
> > file data<"data.txt">;
> > out[j] = cat(data);
> > }
> > }
> > 
> > 
> > It runs into the cache error after completing a few tasks
> > successfully:
> > 
> > $ swift catsn.swift
> > Swift trunk swift-r6410 cog-r3648
> > 
> > 
> > RunID: 20130401-1745-7khkyrqc
> > Progress: time: Mon, 01 Apr 2013 17:45:59 -0500
> > Progress: time: Mon, 01 Apr 2013 17:46:00 -0500 Selecting site:1
> > Active:1 Finished successfully:2
> > Execution failed:
> > Exception in cat:
> > Arguments: [data.txt]
> > Host: localhost
> > Directory: catsn-20130401-1745-7khkyrqc/jobs/y/cat-yzf9fg7l
> > Caused by:
> > The cache already contains
> > localhost:catsn-20130401-1745-7khkyrqc/shared/outdir/f.0000.out.
> > cat, catsn.swift, line 14
> > 
> > 
> > The cause, I think, is that the nested loop triggers the same series
> > of random sequences in the mapper code, which collide. Both the
> > simple and the concurrent mappers fail with the same message.
> > 
> > 
> > Does anyone know of a workaround?
> > 
> > 
> > Thanks,
> > --
> > Ketan
> > 
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > 
> 
> 
> 
> 
> --
> Ketan
> 
> 


