[Swift-devel] Kickstart runs on localhost are failing

Michael Wilde wilde at mcs.anl.gov
Sun Nov 4 21:26:17 CST 2007


[resending to cc swift-devel]

On 11/4/07 9:07 PM, Mihael Hategan wrote:
> On Sun, 2007-11-04 at 19:37 -0600, Michael Wilde wrote:
>> I get job exceptions when I run with kickstart on localhost,
>> regardless of whether clustered or not.
>>
>> The jobs seem to run (3x each) but fail each time. The first attempt gets
>> "Application exception: Missing argument jobdir"; the 2nd and 3rd get
>> "Application exception: The cache already contains
>> localhost:awf4-20071104-1843-ds8hn11a..."
> 
> That probably shouldn't happen unless you're trying to assign to the
> same variable twice. Does this work without kickstart?

Yes, it works without kickstart (r1453).
I'm trying again now on r1456.

It looked to me like the "cache already contains" error was a result of
the first failure (which Ben thinks he's fixed in r1456, if I understand
right) leaving the cache in a state where the retry gets confused.

I should note that in all these cases, I got all the output, so the job
runs despite the first error, likely causing the duplicate cache entry
problems.
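
To make that guess concrete, here is a toy Python sketch of the failure
pattern. It is not Swift or Karajan code: FileCache, DuplicateEntryError,
run_attempt and run_with_retries are made-up names, and the idea that a
failed attempt leaves its cache entry behind is only my hypothesis from
the log. The first attempt registers the output file and then fails; if
nothing removes the entry, every retry trips over it.

```python
class DuplicateEntryError(Exception):
    """Raised when the same file is registered twice (hypothetical name)."""


class FileCache:
    """Toy stand-in for a per-site file cache; not Swift's real class."""

    def __init__(self):
        self._entries = set()

    def add(self, site, path):
        key = f"{site}:{path}"
        if key in self._entries:
            raise DuplicateEntryError(f"The cache already contains {key}")
        self._entries.add(key)

    def remove(self, site, path):
        self._entries.discard(f"{site}:{path}")


def run_attempt(cache, site, output, fail, cleanup_on_failure):
    """Register the output file, then optionally fail like the first attempt."""
    cache.add(site, output)
    try:
        if fail:
            raise RuntimeError("Missing argument jobdir")
    except RuntimeError:
        if cleanup_on_failure:
            # Undo the registration so a retry starts from a clean state.
            cache.remove(site, output)
        raise


def run_with_retries(cleanup_on_failure, attempts=3):
    """Run up to `attempts` tries; only the first one hits the jobdir error."""
    cache = FileCache()
    results = []
    for attempt in range(attempts):
        try:
            run_attempt(cache, "localhost", "shared/cf0000.angle",
                        fail=(attempt == 0),
                        cleanup_on_failure=cleanup_on_failure)
        except (RuntimeError, DuplicateEntryError) as exc:
            results.append(str(exc))
        else:
            results.append("ok")
            break
    return results
```

With cleanup_on_failure=False the sketch gives the same order of errors
as the log (one "Missing argument jobdir", then two "cache already
contains"); with cleanup_on_failure=True the first retry succeeds.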

- Mike

> 
>> Clustered run is in run137, unclustered in run138
>> The latter log dir has a file swiftdata.find.out which lists all the 
>> files in my data dir (has a local/ branch at the top for localhost jobs).
>>
>> Error in both cases is below.
>>
>> Next I will try running kickstart both ways via GRAM.
>>
>> - Mike
>>
>> 2007-11-04 18:47:40,946-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-cgqcmmji - Application exception: Missing argument jobdir 
>> for sys:element(rhost, wfdir, jobid, jobdir)
>> 2007-11-04 18:47:41,085-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-2-1194223436415) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-cgqcmmji-stderr.txt not found.
>> 2007-11-04 18:47:41,344-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-2-1194223436424) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-cgqcmmji-stdout.txt not found.
>> 2007-11-04 18:47:41,503-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-bgqcmmji - Application exception: Missing argument jobdir 
>> for sys:element(rhost, wfdir, jobid, jobdir)
>> 2007-11-04 18:47:41,553-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436458) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-bgqcmmji-stderr.txt not found.
>> 2007-11-04 18:47:41,638-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436467) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-bgqcmmji-stdout.txt not found.
>> 2007-11-04 18:47:41,882-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-agqcmmji - Application exception: Missing argument jobdir 
>> for sys:element(rhost, wfdir, jobid, jobdir)
>> 2007-11-04 18:47:41,987-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-3-1194223436500) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-agqcmmji-stderr.txt not found.
>> 2007-11-04 18:47:42,047-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-3-1194223436507) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-agqcmmji-stdout.txt not found.
>> 2007-11-04 18:51:18,439-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-dgqcmmji - Application exception: The cache already 
>> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0000.angle.
>> 2007-11-04 18:51:18,628-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-2-1194223436543) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-dgqcmmji-stderr.txt not found.
>> 2007-11-04 18:51:18,762-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-2-1194223436550) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-dgqcmmji-stdout.txt not found.
>> 2007-11-04 18:51:25,976-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-egqcmmji - Application exception: The cache already 
>> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
>> 2007-11-04 18:51:26,401-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436585) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-egqcmmji-stderr.txt not found.
>> 2007-11-04 18:51:26,726-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436592) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-egqcmmji-stdout.txt not found.
>> 2007-11-04 18:51:28,040-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-fgqcmmji - Application exception: The cache already 
>> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0001.angle.
>> 2007-11-04 18:51:28,492-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-3-1194223436627) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-fgqcmmji-stderr.txt not found.
>> 2007-11-04 18:51:28,816-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-3-1194223436634) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-fgqcmmji-stdout.txt not found.
>> 2007-11-04 18:54:44,088-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
>> jobid=angle4-hgqcmmji - Application exception: The cache already 
>> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
>> 2007-11-04 18:54:44,440-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436670) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-hgqcmmji-stderr.txt not found.
>> 2007-11-04 18:54:44,652-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
>> identity=urn:0-1-1194223436677) setting status to Failed 
>> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
>> angle4-hgqcmmji-stdout.txt not found.
>> 2007-11-04 18:54:44,741-0600 DEBUG VDL2ExecutionContext Exception in angle4:
>> Exception in angle4:
>>          sys:exception @ vdl-int.k, line: 423
>>          at 
>> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
>> 2007-11-04 18:54:46,190-0600 INFO  ExecutionContext Detailed exception:
>> Exception in angle4:
>>          sys:exception @ vdl-int.k, line: 423
>>          at 
>> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
> 
> 
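
For anyone who wants to check the "one jobdir error, then cache errors"
pattern against a log like the one quoted above, a small Python helper
along these lines might do. It is only a sketch: the function names are
made up, and the regular expressions assume the timestamp and
APPLICATION_EXCEPTION layout seen in this particular log, including the
mail client's line wrapping and ">" quote markers.

```python
import re
from collections import Counter

# A new log record starts with a timestamp like "2007-11-04 18:47:40,946-0600".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[-+]\d{4} ")
APP_EXC = re.compile(
    r"APPLICATION_EXCEPTION jobid=(\S+) - Application exception: (.*)")


def join_records(text):
    """Strip '>' quote markers and rejoin wrapped lines, one record per timestamp."""
    records = []
    for raw in text.splitlines():
        line = re.sub(r"^(>\s?)+", "", raw).strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def tally_app_exceptions(text):
    """Count application exceptions by message, ignoring the per-file suffix."""
    counts = Counter()
    for record in join_records(text):
        match = APP_EXC.search(record)
        if match:
            message = match.group(2)
            # Collapse the file name so all duplicate-entry errors share one key.
            key = re.sub(r"(already contains) .*", r"\1 <file>", message)
            counts[key] += 1
    return counts
```

Calling tally_app_exceptions() on the saved log text returns a Counter
keyed by error message, which makes it easy to see how many attempts
failed with each error.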



