[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Zhao Zhang zhaozhang at uchicago.edu
Sun Mar 30 21:06:24 CDT 2008


Thanks, Ben

Ben Clifford wrote:
> On Sun, 30 Mar 2008, Ioan Raicu wrote:
>
>   
>>> runam6 failed
>>>       
>
>   
>>> Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi
>>> stderr.txt: mkdir: cannot create directory `am.000000': File exists
>>>       
>
> I think when I've seen that error before, it's not been Swift-level retries
> that have been happening - when Swift retries a job, it gets a different
> identifier ('yvkudjqi' in the above). If a job gets partly executed and
> then retried by the underlying execution mechanism below Swift (e.g. any
> part of CoG downwards), then the error above will happen.
>
> Does Falkon ever retry a job that it's been given if it thinks
> something went wrong? If so, that might cause a problem here - what needs
> to happen is that the failure gets reported all the way back to Swift for
> Swift to do a retry.
>   
Nope, Falkon doesn't do any retries for now.
> Another cause might be duplicate job IDs generated within Swift (the
> 'yvkudjqi' string again), but that would be very unusual (as in, I've never
> seen that happen).
>
>   
>> 1) How do we disable the retry mechanism, to make sure that Swift won't retry
>> failed jobs?
>>     
>
> What Quan said - set execution.retries=0 in swift.properties
>
>   
>> 2) How do we configure Swift to continue sending all tasks it is able to (in
>> our case, it should be all tasks, as we only have 1 for loop, with no data
>> dependencies between iterations), although all tasks will eventually fail?
>>     
>
> throttle.score.job.factor=off 
>
> I think that will do what you want.
>   
ok, I will try this.
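For reference, the two settings suggested in this thread could sit together in swift.properties roughly like this. This is a minimal sketch, assuming every other property is left at its default; whether `throttle.score.job.factor=off` fully achieves "keep submitting everything" was only suggested above, not confirmed.

```properties
# Do not let Swift retry failed jobs (suggested in this thread)
execution.retries=0

# Suggested in this thread so Swift keeps submitting all independent jobs
# even though they fail (not yet confirmed by the poster)
throttle.score.job.factor=off
```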


best wishes
zhangzhao