[Swift-devel] restarts

Mihael Hategan hategan at mcs.anl.gov
Mon Oct 1 11:00:11 CDT 2007


This is caused by the addition of generalized files, so restarts are
effectively broken at this point. When I get some time, I'll work on
both the file management part and this.

On Mon, 2007-10-01 at 14:59 +0000, Ben Clifford wrote:
> I was looking at restarts a bit. If I run the SwiftApps badmonkey
> workflow and let it fail, I get this restart log:
> 
> $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog 
> # Log file created Mon Oct 01 14:04:22 BST 2007
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
> 
> If I then restart it with this command:
> 
> swift -tc.file ./tc.data -sites.file ./sites.xml -resume 
> ./badmonkey-20071001-1404-9cqjt7of.0.rlog badmonkey.swift
> 
> (which is the command used to start it in the first place with
> -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog added)
> 
> swift appears to run the goodmonkey jobs again, in addition to attempting 
> the broken badmonkey jobs, giving this restart log:
> 
> $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog 
> # Log file created Mon Oct 01 14:04:22 BST 2007
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
> # Log file updated Mon Oct 01 15:35:58 BST 2007
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0000.txt
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0001.txt
> 
> Is this caused by the presence of the unique run id (different for the two 
> runs) in the restart entry?
> 
> Also, will site selection interfere there? (soju.hawaga.org.uk is the
> site name in my sites.xml)
> 
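
As a rough illustration of the question about the run id, the sketch below
groups the quoted restart-log entries by output file and lists the run ids
recorded for each. It assumes each entry has the form
<file>/<site>/<run id>/<path>. That layout is inferred only from the lines
quoted above, not from Swift's source or documentation. The function names
are hypothetical.

```python
# Illustrative sketch only. The field layout (file/site/run-id/path) is an
# assumption inferred from the quoted log lines, not Swift's documented format.
from collections import defaultdict

RLOG = """\
# Log file created Mon Oct 01 14:04:22 BST 2007
outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
# Log file updated Mon Oct 01 15:35:58 BST 2007
outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0000.txt
outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0001.txt
"""


def parse_entries(text):
    """Yield (logical_name, site, run_id) for each non-comment log line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Log file ..." comments
        # Split into at most four parts; the remainder is the shared path.
        logical_name, site, run_id, _rest = line.split("/", 3)
        yield logical_name, site, run_id


def run_ids_by_file(text):
    """Map each logical file name to the run ids recorded for it, in order."""
    runs = defaultdict(list)
    for logical_name, _site, run_id in parse_entries(text):
        runs[logical_name].append(run_id)
    return dict(runs)


if __name__ == "__main__":
    for name, ids in sorted(run_ids_by_file(RLOG).items()):
        print(name, ids)
```

Under that assumed layout, each output file appears once per run with a
different run id, so an entry written by the first run would never compare
equal to the entry the resumed run would produce for the same file.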



