[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Ben Clifford
benc at hawaga.org.uk
Sun Mar 30 20:35:59 CDT 2008
On Sun, 30 Mar 2008, Ioan Raicu wrote:
> > runam6 failed
> > Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi
> > stderr.txt: mkdir: cannot create directory `am.000000': File exists
I think when I've seen that error before, it's not been swift-level retries
that have been happening - when Swift retries a job, it gets a different
identifier ('yvkudjqi' in the above). If a job gets partly executed and
then retried by the underlying execution mechanism below swift (eg. any
part of cog downwards) then the above will happen.
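As a rough illustration of the difference (a Python sketch only, not how
Swift or Falkon actually run jobs; the second identifier 'aaaaaaaa' and the
helper names are made up for the example): a retry that reuses the same job
directory trips over the directory left behind by the first attempt, while a
retry with a fresh identifier does not.

```python
import os
import tempfile

def run_job(job_dir):
    # Stand-in for the application: creates its output directory with a
    # plain mkdir, as the stderr above suggests.
    os.mkdir(os.path.join(job_dir, "am.000000"))

def attempt(job_dir):
    # Returns "ok" on success, or the error seen in stderr.txt on a clash.
    try:
        run_job(job_dir)
        return "ok"
    except FileExistsError:
        return "File exists"

root = tempfile.mkdtemp()

# First attempt: partly executed in its own job directory.
first_dir = os.path.join(root, "runam6-yvkudjqi")
os.mkdir(first_dir)
first = attempt(first_dir)

# Retry below swift: same identifier, so the same directory is reused.
lower_retry = attempt(first_dir)

# Swift-level retry: new (hypothetical) identifier, so a fresh directory.
second_dir = os.path.join(root, "runam6-aaaaaaaa")
os.mkdir(second_dir)
swift_retry = attempt(second_dir)

print(first, lower_retry, swift_retry)
```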
Does falkon ever try to retry a job that it's been given if it thinks
something went wrong? If so, that might cause a problem here - what needs
to happen is that the failure gets reported all the way back to swift for
swift to do a retry.
Another cause might be duplicate job IDs generated within swift (the
'yvkudjqi' string again) but that would be very unusual (as in, I've never
seen that happen).
> 1) How do we disable the retry mechanism, to make sure that Swift won't retry
> failed jobs?
What Quan said - set execution.retries=0 in swift.properties
> 2) How do we configure Swift to continue sending all tasks it is able to (in
> our case, it should be all tasks, as we only have 1 for loop, with no data
> dependencies between iterations), although all tasks will eventually fail?
throttle.score.job.factor=off
I think that will do what you want.
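Put together, the relevant part of swift.properties would look something
like this. Only the two settings come from this thread; the comments are my
understanding of what they do, so check them against the documentation for
your Swift version.

```
# do not retry failed jobs at the swift level
execution.retries=0
# do not limit job submission based on the site's score
# (the score drops as jobs fail)
throttle.score.job.factor=off
```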