[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Zhao Zhang
zhaozhang at uchicago.edu
Sun Mar 30 21:06:24 CDT 2008
Thanks, Ben
Ben Clifford wrote:
> On Sun, 30 Mar 2008, Ioan Raicu wrote:
>
>
>>> runam6 failed
>>>
>
>
>>> Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi
>>> stderr.txt: mkdir: cannot create directory `am.000000': File exists
>>>
>
> I think when I've seen that error before, it's not been Swift-level retries
> that have been happening - when Swift retries a job, it gets a different
> identifier ('yvkudjqi' in the above). If a job gets partly executed and
> then retried by the underlying execution mechanism below Swift (e.g. any
> part of cog downwards) then the above will happen.
>
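The failure mode described above can be reproduced outside Swift. This is only a minimal Python sketch, not Swift or Falkon code: it assumes the job's first step is a plain `mkdir am.000000` inside its job directory, and the `run_job` helper and the second identifier `abcdefgh` are hypothetical.

```python
import os
import tempfile


def run_job(job_dir):
    """Hypothetical stand-in for the job: its first step is `mkdir am.000000`."""
    os.makedirs(job_dir, exist_ok=True)           # the job directory may already exist
    os.mkdir(os.path.join(job_dir, "am.000000"))  # fails if an earlier attempt got this far


root = tempfile.mkdtemp()

# First attempt: gets far enough to create am.000000 (pretend it then died).
first = os.path.join(root, "runam6-yvkudjqi")
run_job(first)

# Retry below Swift: same identifier, same directory, so mkdir fails.
try:
    run_job(first)
    same_dir_retry = "ok"
except FileExistsError:
    same_dir_retry = "File exists"

# Swift-level retry: a different identifier means a fresh job directory.
run_job(os.path.join(root, "runam6-abcdefgh"))  # hypothetical new identifier
new_dir_retry = "ok"

print(same_dir_retry, new_dir_retry)
```

The retry that reuses the job directory hits the same "File exists" condition as in the stderr.txt above, while the retry under a new identifier does not.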
> Does Falkon ever try to retry a job that it's been given if it thinks
> something went wrong? If so, that might cause a problem here - what needs
> to happen is that the failure gets reported all the way back to Swift for
> Swift to do a retry.
>
nope, falkon doesn't do any retry for now.
> Another cause might be duplicate job IDs generated within swift (the
> 'yvkudjqi' string again) but that would be very unusual (as in, I've never
> seen that happen)
>
>
>> 1) How do we disable the retry mechanism, to make sure that Swift won't retry
>> failed jobs?
>>
>
> What Quan said - set execution.retries=0 in swift.properties
>
>
>> 2) How do we configure Swift to continue sending all tasks it is able to (in
>> our case, it should be all tasks, as we only have 1 for loop, with no data
>> dependencies between iterations), although all tasks will eventually fail?
>>
>
> throttle.score.job.factor=off
>
> I think will do what you want.
>
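For reference, the two settings suggested in this thread could be applied to a swift.properties file with a small script. This is only a sketch: `set_properties` is a hypothetical helper, the starting file content is made up, and the example works on a string rather than a real file so no file location is assumed. The property names and values are the ones given above.

```python
def set_properties(text, updates):
    """Return properties text with each key in `updates` set, replacing or appending."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if not line.lstrip().startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # overwrite the existing value
        else:
            out.append(line)                           # keep comments and other keys
    out.extend(f"{k}={v}" for k, v in remaining.items())  # append keys not yet present
    return "\n".join(out) + "\n"


# Made-up starting content, for illustration only.
existing = "# made-up starting content\nexecution.retries=2\n"

updated = set_properties(existing, {
    "execution.retries": "0",             # 1) no Swift-level retries
    "throttle.score.job.factor": "off",   # 2) keep submitting despite failures
})
print(updated, end="")
```

Whether `throttle.score.job.factor=off` alone is enough to keep all tasks flowing is, as said above, something to try rather than a guarantee.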
ok, I will try this.
best wishes
zhangzhao