[Swift-devel] Kickstart runs on localhost are failing

Mihael Hategan hategan at mcs.anl.gov
Sun Nov 4 21:32:04 CST 2007


On Sun, 2007-11-04 at 21:26 -0600, Michael Wilde wrote:
> [resending to cc swift-devel]
> 
> On 11/4/07 9:07 PM, Mihael Hategan wrote:
> > On Sun, 2007-11-04 at 19:37 -0600, Michael Wilde wrote:
> >> I get job exceptions when I run with kickstart on localhost,
> >> regardless of whether clustered or not.
> >>
> >> The jobs seem to run (3x each) but fail each time. First time gets 
> >> "Application exception: Missing argument jobdir", 2nd & 3rd get 
> >> "Application exception: The cache already contains 
> >> localhost:awf4-20071104-1843-ds8hn11a..."
> > 
> > That probably shouldn't happen unless you're trying to assign to the
> > same variable twice. Does this work without kickstart?
> 
> Yes, it works without kickstart (r1453)
> Trying again on r1456.
> 
> It looked to me like the "cache already contains" error was a result of
> the first failure (which Ben thinks he's fixed in 1456 if I understand
> right) leaving the cache in a state where the retry gets confused.

I thought I had made sure, in some revision, that things are added to the
cache transactionally (i.e. only once it's known that nothing bad can
happen). Maybe I got something wrong.

> 
> I should note that in all these cases, I got all the output, so the job
> runs despite the first error, likely causing the duplicate cache entry
> problems.

Ah, I see. The failure occurs while dealing with kickstart, which happens
after the files are added to the cache. I did get something wrong.
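
For illustration, here is a minimal Python sketch of that ordering problem.
It is not Swift's actual code: FileCache, run_job_unsafe,
run_job_transactional and failing_kickstart_step are hypothetical names, and
the kickstart handling is reduced to a step that always fails. It only shows
why a failure after the cache add would make a retry report "The cache
already contains ...", and how rolling back the add would avoid that.

```python
class CacheError(Exception):
    """Raised when a file is registered in the cache twice."""


class FileCache:
    """Hypothetical stand-in for the per-site file cache."""

    def __init__(self):
        self._entries = set()

    def add(self, path):
        if path in self._entries:
            raise CacheError(f"The cache already contains {path}")
        self._entries.add(path)

    def remove(self, path):
        self._entries.discard(path)


def run_job_unsafe(cache, outputs, post_process):
    """Register outputs first, then run the later (kickstart-like) step."""
    for path in outputs:
        cache.add(path)
    post_process()  # if this raises, the entries above are left behind


def run_job_transactional(cache, outputs, post_process):
    """Register outputs, but undo the registration if a later step fails."""
    added = []
    try:
        for path in outputs:
            cache.add(path)
            added.append(path)
        post_process()
    except Exception:
        # Roll back only the entries this attempt added, then re-raise.
        for path in added:
            cache.remove(path)
        raise


def failing_kickstart_step():
    """Assumed stand-in for the step that fails after the cache add."""
    raise RuntimeError("Missing argument jobdir")


def attempt(run, cache, outputs, tries=2):
    """Run the job `tries` times and collect the error message of each try."""
    errors = []
    for _ in range(tries):
        try:
            run(cache, outputs, failing_kickstart_step)
        except Exception as exc:
            errors.append(str(exc))
    return errors
```

With the unsafe ordering, the first attempt fails with the original error and
the retry fails on the stale cache entry. With the rollback, every attempt
reports the original error, so the real cause is not masked.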

> 
> - Mike
> 
> > 
> >> Clustered run is in run137, unclustered in run138
> >> The latter log dir has a file swiftdata.find.out which lists all the 
> >> files in my data dir (has a local/ branch at the top for localhost jobs).
> >>
> >> Error in both cases is below.
> >>
> >> Next I will try kickstart both ways (clustered and unclustered) via GRAM.
> >>
> >> - Mike
> >>
> >> 2007-11-04 18:47:40,946-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-cgqcmmji - Application exception: Missing argument jobdir 
> >> for sys:element(rhost, wfdir, jobid, jobdir)
> >> 2007-11-04 18:47:41,085-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-2-1194223436415) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-cgqcmmji-stderr.txt not found.
> >> 2007-11-04 18:47:41,344-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-2-1194223436424) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-cgqcmmji-stdout.txt not found.
> >> 2007-11-04 18:47:41,503-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-bgqcmmji - Application exception: Missing argument jobdir 
> >> for sys:element(rhost, wfdir, jobid, jobdir)
> >> 2007-11-04 18:47:41,553-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436458) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-bgqcmmji-stderr.txt not found.
> >> 2007-11-04 18:47:41,638-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436467) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-bgqcmmji-stdout.txt not found.
> >> 2007-11-04 18:47:41,882-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-agqcmmji - Application exception: Missing argument jobdir 
> >> for sys:element(rhost, wfdir, jobid, jobdir)
> >> 2007-11-04 18:47:41,987-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-3-1194223436500) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-agqcmmji-stderr.txt not found.
> >> 2007-11-04 18:47:42,047-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-3-1194223436507) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-agqcmmji-stdout.txt not found.
> >> 2007-11-04 18:51:18,439-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-dgqcmmji - Application exception: The cache already 
> >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0000.angle.
> >> 2007-11-04 18:51:18,628-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-2-1194223436543) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-dgqcmmji-stderr.txt not found.
> >> 2007-11-04 18:51:18,762-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-2-1194223436550) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-dgqcmmji-stdout.txt not found.
> >> 2007-11-04 18:51:25,976-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-egqcmmji - Application exception: The cache already 
> >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
> >> 2007-11-04 18:51:26,401-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436585) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-egqcmmji-stderr.txt not found.
> >> 2007-11-04 18:51:26,726-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436592) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-egqcmmji-stdout.txt not found.
> >> 2007-11-04 18:51:28,040-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-fgqcmmji - Application exception: The cache already 
> >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0001.angle.
> >> 2007-11-04 18:51:28,492-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-3-1194223436627) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-fgqcmmji-stderr.txt not found.
> >> 2007-11-04 18:51:28,816-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-3-1194223436634) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-fgqcmmji-stdout.txt not found.
> >> 2007-11-04 18:54:44,088-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> >> jobid=angle4-hgqcmmji - Application exception: The cache already 
> >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
> >> 2007-11-04 18:54:44,440-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436670) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-hgqcmmji-stderr.txt not found.
> >> 2007-11-04 18:54:44,652-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> >> identity=urn:0-1-1194223436677) setting status to Failed 
> >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> >> angle4-hgqcmmji-stdout.txt not found.
> >> 2007-11-04 18:54:44,741-0600 DEBUG VDL2ExecutionContext Exception in angle4:
> >> Exception in angle4:
> >>          sys:exception @ vdl-int.k, line: 423
> >>          at 
> >> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
> >> 2007-11-04 18:54:46,190-0600 INFO  ExecutionContext Detailed exception:
> >> Exception in angle4:
> >>          sys:exception @ vdl-int.k, line: 423
> >>          at 
> >> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)



