[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Ben Clifford benc at hawaga.org.uk
Sun Mar 30 20:35:59 CDT 2008


On Sun, 30 Mar 2008, Ioan Raicu wrote:

> > runam6 failed

> > Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi
> > stderr.txt: mkdir: cannot create directory `am.000000': File exists

I think when I've seen that error before, it's not been Swift-level retries 
that have been happening - when Swift retries a job, it gets a different 
identifier ('yvkudjqi' in the above). If a job gets partly executed and 
then retried by the underlying execution mechanism below Swift (e.g. any 
part of CoG downwards) then the above will happen.
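
To illustrate the difference, here is a rough Python sketch (not Swift's actual code). It assumes each attempt creates an output directory inside its job directory, as the `am.000000' error suggests; the run_attempt helper and the job identifiers are made up for the example.

```python
import os
import tempfile


def run_attempt(job_dir):
    """Simulate one attempt of a job (hypothetical helper).

    The job directory may already exist, but creating the application's
    output directory fails if an earlier attempt already made it.
    """
    os.makedirs(job_dir, exist_ok=True)
    os.mkdir(os.path.join(job_dir, "am.000000"))


root = tempfile.mkdtemp()

# Swift-level retry: each attempt gets a fresh identifier, so no collision.
run_attempt(os.path.join(root, "runam6-aaaaaaaa"))
run_attempt(os.path.join(root, "runam6-bbbbbbbb"))
swift_level_ok = True

# Retry below Swift: the same job directory is reused, so mkdir collides.
try:
    run_attempt(os.path.join(root, "runam6-aaaaaaaa"))
    lower_level_collided = False
except FileExistsError:
    lower_level_collided = True
```

The second case is the analogue of the "File exists" message in stderr.txt above.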

Does Falkon ever try to retry a job that it's been given if it thinks 
something went wrong? If so, that might cause a problem here - what needs 
to happen is that the failure gets reported all the way back to Swift for 
Swift to do a retry.

Another cause might be duplicate job IDs generated within Swift (the 
'yvkudjqi' string again) but that would be very unusual (as in, I've never 
seen that happen).

> 1) How do we disable the retry mechanism, to make sure that Swift won't retry
> failed jobs?

What Quan said - set execution.retries=0 in swift.properties

> 2) How do we configure Swift to continue sending all tasks it is able to (in
> our case, it should be all tasks, as we only have 1 for loop, with no data
> dependencies between iterations), although all tasks will eventually fail?

throttle.score.job.factor=off 

I think will do what you want.
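
Put together, the two settings mentioned in this message would look something like the fragment below. This is only a sketch: it assumes both properties go in the same swift.properties file, and any other properties your installation needs are left out.

```properties
# Question 1: do not retry failed jobs at the Swift level
execution.retries=0

# Question 2: suggested so that Swift keeps sending tasks even as they fail
throttle.score.job.factor=off
```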
