[Swift-devel] restarts
Mihael Hategan
hategan at mcs.anl.gov
Mon Oct 1 11:00:11 CDT 2007
It's caused by the addition of generalized files. So basically restarts
are broken at this point. When I get some time, I'll work on the file
management part and this.
On Mon, 2007-10-01 at 14:59 +0000, Ben Clifford wrote:
> I was looking at restarts a bit. If I run the SwiftApps badmonkey
> workflow and let it fail, I get this restart log:
>
> $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog
> # Log file created Mon Oct 01 14:04:22 BST 2007
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
>
> If I then restart it with this command:
>
> swift -tc.file ./tc.data -sites.file ./sites.xml -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog badmonkey.swift
>
> (which is the command used to start it in the first place with
> -resume ./badmonkey-20071001-1404-9cqjt7of.0.rlog added)
>
> swift appears to run the goodmonkey jobs again, in addition to attempting
> the broken badmonkey jobs, giving this restart log:
>
> $ cat badmonkey-20071001-1404-9cqjt7of.0.rlog
> # Log file created Mon Oct 01 14:04:22 BST 2007
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
> # Log file updated Mon Oct 01 15:35:58 BST 2007
> outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0000.txt
> outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1535-oqxh1t78/shared/outg.0001.txt
>
> Is this caused by the presence of the unique run id (different for the two
> runs) in the restart entry?
>
> And also will site selection interfere there? (soju.hawaga.org.uk is the
> site name in my sites.xml)
>
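The question above can be illustrated with a small hypothetical sketch. It is not Swift's restart code, and Mihael's reply attributes the actual breakage to the addition of generalized files. The sketch assumes each non-comment `.rlog` line has the shape `<logical-file>/<site>/<run-id>/shared/<path>`, which is inferred only from the sample log above. The names `parse_rlog`, `is_completed`, and `match_site_and_run` are made up for illustration. It shows why a lookup keyed on site plus run id can never match entries written by an earlier run, and why a site change would break matching in the same way.

```python
# Hypothetical sketch, not Swift's implementation.
# ASSUMPTION: each non-comment .rlog line looks like
#   <logical-file>/<site>/<run-id>/shared/<path>
# (inferred from the sample restart log in this thread).
from typing import NamedTuple


class RestartEntry(NamedTuple):
    logical_file: str
    site: str
    run_id: str
    path: str


def parse_rlog(text: str) -> list[RestartEntry]:
    """Parse rlog text, skipping blank lines and '#' comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most 5 fields; the last one keeps any further '/'.
        logical_file, site, run_id, shared, path = line.split("/", 4)
        if shared != "shared":
            raise ValueError(f"unexpected entry: {line}")
        entries.append(RestartEntry(logical_file, site, run_id, path))
    return entries


def is_completed(entries, logical_file, site, run_id, match_site_and_run=True):
    """Return True if the log records logical_file as done.

    With match_site_and_run=True the lookup also requires the same site
    and run id, so a resumed run (new run id) or a run that picks a
    different site never finds a match.
    """
    for e in entries:
        if e.logical_file != logical_file:
            continue
        if match_site_and_run and (e.site, e.run_id) != (site, run_id):
            continue
        return True
    return False


FIRST_RUN = """\
# Log file created Mon Oct 01 14:04:22 BST 2007
outg.0001.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0001.txt
outg.0000.txt/soju.hawaga.org.uk/badmonkey-20071001-1404-9cqjt7of/shared/outg.0000.txt
"""

if __name__ == "__main__":
    old = parse_rlog(FIRST_RUN)
    new_run = "badmonkey-20071001-1535-oqxh1t78"
    site = "soju.hawaga.org.uk"
    # Keyed on site + run id: the resumed run misses, so the job reruns.
    print(is_completed(old, "outg.0000.txt", site, new_run))
    # Keyed on the logical file only: the earlier result is found.
    print(is_completed(old, "outg.0000.txt", site, new_run, match_site_and_run=False))
```

Run as a script, the first check prints `False` and the second prints `True`, mirroring the goodmonkey jobs being rerun. Keying on the logical file alone is only one possible design; it trades away the ability to tell apart outputs staged to different sites.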