[Swift-devel] Kickstart runs on localhost are failing

Mihael Hategan hategan at mcs.anl.gov
Sun Nov 4 21:37:17 CST 2007


> > 
> > It looked to me like the "cache already contains" error was a result of
> > the first failure (which Ben thinks he's fixed in 1456 if I understand
> > right) leaving the cache in a state where the retry gets confused.
> 
> I thought I made sure in some revision that things are added to the cache
> transactionally (i.e. when it's known that no bad things can happen).
> Maybe I got something wrong.
> 
> > 
> > I should note that in all these cases, I got all the output, so the job
> > runs despite the first error, likely causing the duplicate cache entry
> > problems.
> 
> Ah, I see. The failure occurs when dealing with kickstart which is after
> the files are added to the cache. I did get something wrong.

One solution would be to make kickstart transfer failures warnings
instead of throwing them as exceptions (easy).

The other would be to add the stageout files to the cache only as the
last step in the big try block of execute2 (very slightly harder).
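To make the ordering problem concrete, here is a minimal Python sketch of both options. It is only an illustration under stated assumptions: the real execute2 is Karajan code, and every name below (FileCache, CacheError, execute2_buggy, execute2_fixed, execute2_warn, run_with_retry, flaky_kickstart) is hypothetical. It models a job whose first kickstart fetch fails and whose retry succeeds.

```python
class CacheError(Exception):
    """Hypothetical: raised when a file is added to a cache that already holds it."""


class FileCache:
    """Hypothetical per-site cache of staged-out files (illustration only)."""

    def __init__(self):
        self._entries = set()

    def add_all(self, files):
        # Refuse duplicates, mirroring "The cache already contains ..." in the log.
        dup = [f for f in files if f in self._entries]
        if dup:
            raise CacheError(f"The cache already contains {dup[0]}")
        self._entries.update(files)

    def __contains__(self, f):
        return f in self._entries


def execute2_buggy(cache, outputs, fetch_kickstart):
    # Current ordering: outputs are cached before the kickstart record is fetched,
    # so a kickstart failure leaves entries behind and the retry hits CacheError.
    cache.add_all(outputs)
    fetch_kickstart()


def execute2_fixed(cache, outputs, fetch_kickstart):
    # Option 2: do every step that can fail first; add to the cache last.
    fetch_kickstart()
    cache.add_all(outputs)


def execute2_warn(cache, outputs, fetch_kickstart, warnings):
    # Option 1: a kickstart transfer failure is recorded as a warning, not raised.
    cache.add_all(outputs)
    try:
        fetch_kickstart()
    except FileNotFoundError as e:
        warnings.append(str(e))


def run_with_retry(attempt, tries=2):
    """Call attempt() up to `tries` times; return the last exception, or None on success."""
    last = None
    for _ in range(tries):
        try:
            attempt()
            return None
        except Exception as e:
            last = e
    return last


def flaky_kickstart(failures):
    """Return a fetch callable that raises FileNotFoundError `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise FileNotFoundError("kickstart record not found")

    return fetch
```

With `execute2_buggy`, the retry of a job whose first kickstart fetch failed ends in `CacheError`; with either `execute2_fixed` or `execute2_warn`, the retry (or the first attempt, in the warning case) completes and the output is cached exactly once. Moving the cache insertion last keeps the cache update transactional; turning the failure into a warning avoids the retry entirely but hides a missing kickstart record.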

Let me know which one you want.

Mihael

> 
> > 
> > - Mike
> > 
> > > 
> > >> Clustered run is in run137, unclustered in run138.
> > >> The latter log dir has a file swiftdata.find.out which lists all the 
> > >> files in my data dir (it has a local/ branch at the top for localhost jobs).
> > >>
> > >> Error in both cases is below.
> > >>
> > >> Next I will try running kickstart both ways via GRAM.
> > >>
> > >> - Mike
> > >>
> > >> 2007-11-04 18:47:40,946-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-cgqcmmji - Application exception: Missing argument jobdir 
> > >> for sys:element(rhost, wfdir, jobid, jobdir)
> > >> 2007-11-04 18:47:41,085-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-2-1194223436415) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-cgqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:47:41,344-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-2-1194223436424) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-cgqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:47:41,503-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-bgqcmmji - Application exception: Missing argument jobdir 
> > >> for sys:element(rhost, wfdir, jobid, jobdir)
> > >> 2007-11-04 18:47:41,553-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436458) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-bgqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:47:41,638-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436467) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-bgqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:47:41,882-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-agqcmmji - Application exception: Missing argument jobdir 
> > >> for sys:element(rhost, wfdir, jobid, jobdir)
> > >> 2007-11-04 18:47:41,987-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-3-1194223436500) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-agqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:47:42,047-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-3-1194223436507) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-agqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:51:18,439-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-dgqcmmji - Application exception: The cache already 
> > >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0000.angle.
> > >> 2007-11-04 18:51:18,628-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-2-1194223436543) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-dgqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:51:18,762-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-2-1194223436550) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-dgqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:51:25,976-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-egqcmmji - Application exception: The cache already 
> > >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
> > >> 2007-11-04 18:51:26,401-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436585) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-egqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:51:26,726-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436592) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-egqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:51:28,040-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-fgqcmmji - Application exception: The cache already 
> > >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/cf0001.angle.
> > >> 2007-11-04 18:51:28,492-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-3-1194223436627) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-fgqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:51:28,816-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-3-1194223436634) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-fgqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:54:44,088-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> > >> jobid=angle4-hgqcmmji - Application exception: The cache already 
> > >> contains localhost:awf4-20071104-1843-ds8hn11a/shared/of0002.angle.
> > >> 2007-11-04 18:54:44,440-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436670) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-hgqcmmji-stderr.txt not found.
> > >> 2007-11-04 18:54:44,652-0600 DEBUG TaskImpl Task(type=FILE_OPERATION, 
> > >> identity=urn:0-1-1194223436677) setting status to Failed 
> > >> org.globus.cog.abstraction.impl.file.FileNotFoundException: 
> > >> angle4-hgqcmmji-stdout.txt not found.
> > >> 2007-11-04 18:54:44,741-0600 DEBUG VDL2ExecutionContext Exception in angle4:
> > >> Exception in angle4:
> > >>          sys:exception @ vdl-int.k, line: 423
> > >>          at 
> > >> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
> > >> 2007-11-04 18:54:46,190-0600 INFO  ExecutionContext Detailed exception:
> > >> Exception in angle4:
> > >>          sys:exception @ vdl-int.k, line: 423
> > >>          at 
> > >> org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>
> > > 
> > > 
> > 
> 
> 



