[Swift-devel] cache already contains error
Ketan Maheshwari
ketancmaheshwari at gmail.com
Tue Apr 2 09:43:19 CDT 2013
Hi Mike,
It works with provider staging. So, it does look like an NFS sync issue.
Thanks,
Ketan
On Mon, Apr 1, 2013 at 10:20 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> Ketan,
>
> Can you post a pointer to your code, the Swift log, and the Swift
> stdout/err?
>
> How are you mapping the file "...outdir/out_l0000_0000.0010.out"?
>
> Does 35 correspond to any of the array bounds?
>
> Does it fail if you use only one host? (I.e. my first thought was some
> kind of NFS sync error).
>
> Could you try it using local disk and provider staging with the N hosts?
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Monday, April 1, 2013 9:02:24 PM
> > Subject: Re: [Swift-devel] cache already contains error
> >
> >
> > Thanks Mike, that fixed the cache issue. However, I am now seeing
> > unusual behavior from my Swift run:
> >
> >
> > The ampl run crashes after completing a fixed number of jobs (35 to
> > be precise).
> >
> >
> > Some diagnostics:
> >
> >
> > -- It runs to completion only after repeated Swift resumes. The first
> > resume again completes just the next 35 jobs, and a second resume
> > completes the rest.
> >
> >
> > -- It runs to completion outside of Swift, in a bash for-loop using the
> > same parameters as in the Swift script.
> >
> >
> > -- A catsn script with similar parameters runs to completion without
> > any failures. So, nothing seems to be wrong with the OS parameters.
> >
> >
> > I am using a single MCS workstation, no provider staging, no
> > coasters.
> >
> >
> > The error message is:
> >
> >
> > Caused by: File not found:
> > /nfs2/ketan/powergridapps/swiftscripts/swift.work/inference-20130401-2043-78i7o5m3/shared/outdir/out_l0000_0000.0010.out
> >
> >
> > Which is reflected in the logs as well as in the workdir's info
> > files.
> >
> >
> > Has anyone seen this kind of behavior? Any remedial suggestions?
> >
> >
> > Thanks,
> > Ketan
> >
> >
> >
> > On Mon, Apr 1, 2013 at 5:59 PM, Michael Wilde < wilde at mcs.anl.gov >
> > wrote:
> >
> >
> > I think you need to make the "out" array 2-dimensional.
> >
> > Your script is going to evaluate "out[j] = cat(data)" for both i=0
> > and i=1.
> >
> > The second of those evaluations is probably encountering the "cache
> > already contains" error for j=0.
> >
> > If it didn't hit that (i.e. if you used the concurrent mapper) you'd
> > likely then get an error that out[0] is already set.
> >
> > - Mike
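> >
> > A minimal sketch of that suggestion, applied to the catsn.swift
> > reproduction quoted below. It assumes simple_mapper can name the
> > elements of a 2-dimensional array; the exact file names it generates
> > are not shown in this thread. Only the declaration of out and the
> > assignment differ from the original script:
> >
> > ```swift
> > type file;
> >
> > app (file o) cat (file i) {
> >     cat @i stdout=@o;
> > }
> >
> > # Assumption: one array dimension per loop level, so that every
> > # (i, j) pair maps to its own output file under outdir.
> > file out[][]<simple_mapper; location="outdir",
> >              prefix="f.", suffix=".out">;
> >
> > foreach i in [0:1] {
> >     foreach j in [0:1] {
> >         file data<"data.txt">;
> >         # Each element is assigned exactly once, so no two
> >         # iterations write to the same array slot or file.
> >         out[i][j] = cat(data);
> >     }
> > }
> > ```
> >
> > With out indexed by both i and j, the i=1 pass no longer reassigns
> > the elements already written by the i=0 pass.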
> >
> >
> >
> >
> > ----- Original Message -----
> > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > > To: "Swift Devel" < swift-devel at ci.uchicago.edu >
> > > Sent: Monday, April 1, 2013 5:49:36 PM
> > > Subject: [Swift-devel] cache already contains error
> > >
> > >
> > >
> > >
> > > Hi,
> > >
> > > I am running into the "cache already contains" error when using a
> > > nested loop with file mappers. Here is a simple reproduction of the
> > > issue with a nested loop variant of catsn.swift:
> > >
> > >
> > >
> > > type file;
> > > app (file o) cat (file i) {
> > >     cat @i stdout=@o;
> > > }
> > >
> > >
> > > #file out[];
> > > #file out[]<concurrent_mapper; location="outdir",
> > > prefix="f.",suffix=".out">;
> > > file out[]<simple_mapper; location="outdir",
> > > prefix="f.",suffix=".out">;
> > >
> > >
> > > foreach i in [0:1] {
> > >     foreach j in [0:1] {
> > >         file data<"data.txt">;
> > >         out[j] = cat(data);
> > >     }
> > > }
> > >
> > >
> > > It runs into the cache error after completing a few tasks
> > > successfully:
> > >
> > > $ swift catsn.swift
> > > Swift trunk swift-r6410 cog-r3648
> > >
> > >
> > > RunID: 20130401-1745-7khkyrqc
> > > Progress: time: Mon, 01 Apr 2013 17:45:59 -0500
> > > Progress: time: Mon, 01 Apr 2013 17:46:00 -0500 Selecting site:1
> > > Active:1 Finished successfully:2
> > > Execution failed:
> > > Exception in cat:
> > > Arguments: [data.txt]
> > > Host: localhost
> > > Directory: catsn-20130401-1745-7khkyrqc/jobs/y/cat-yzf9fg7l
> > > Caused by:
> > > The cache already contains
> > > localhost:catsn-20130401-1745-7khkyrqc/shared/outdir/f.0000.out.
> > > cat, catsn.swift, line 14
> > >
> > >
> > > The cause, I think, is that the nested loop triggers the same series
> > > of random sequences in the mapper code, which then collide. Both the
> > > simple and the concurrent mappers fail with the same message.
> > >
> > >
> > > Does anyone know of a workaround?
> > >
> > >
> > > Thanks,
> > > --
> > > Ketan
> > >
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> >
> >
> >
> >
> > --
> > Ketan
> >
> >
>
--
Ketan