[Swift-user] Swift run finished with errors

Thu Jul 3 07:21:24 CDT 2008

Thank you for the detailed explanation.

In addition, I would like to know which sites these 3 tries were
submitted to, and what happened with the replications, because I want
to explore the details of the scheduler's behavior.

Thanks,

Xi 

---- Original message ----
>Date: Thu, 3 Jul 2008 08:22:57 +0000 (GMT)
>From: Ben Clifford <benc at hawaga.org.uk>  
>Subject: Re: [Swift-user] Swift run finished with errors  
>To: lixi at uchicago.edu
>Cc: swift-user <swift-user at ci.uchicago.edu>
>
>
>That job failed 3 times. Sometimes that will happen.
>
>There are various things you can do to reduce the effect this has on your
>run:
>
>Turn on lazy.errors in swift.properties:
>    Normally when one job has failed (e.g. it has used up all of its
>    retries) then the whole run is immediately abandoned.
>    If you turn on lazy errors, then the rest of the run will attempt to
>    continue. This means that you might end up with a run in which only
>    that one job (or perhaps only a small number of jobs) has failed. The
>    restart log (*.rlog) should then let you run again to try that small
>    number again.
>
>Increase the number of retries in swift.properties - execution.retries.
>   This is set to 2 by default, meaning that a job will be executed up to
>   three times - once originally, and twice more as retries if there are
>   failures. You can increase this a small amount, e.g. to 5, to massively
>   reduce the probability of a job failing because of random job
>   failures. (E.g. if you have p=0.01 chance of a job submission failing,
>   then execution.retries=2 gives p^3 = 0.000001 chance of failure; but
>   execution.retries=5 gives p^6 = 0.000000000001 chance of failure.)
>
>   This does not help when the failures are caused by a broken job (such
>   as missing input files on the submit side); in such a case it will
>   increase load on remote systems and slow the run down.
>
>-- 
>
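
The retry arithmetic in the quoted message can be checked with a few lines of Python. This is only a sketch: it assumes every attempt fails independently with the same probability p, which is the simplification the message itself makes, and the function name failure_probability is made up for illustration (it is not part of Swift).

```python
def failure_probability(p: float, retries: int) -> float:
    """Chance that a job fails on its first attempt and on every retry.

    Assumes each attempt fails independently with probability p
    (an illustrative simplification, not a model of any real site).
    """
    attempts = retries + 1  # one original execution plus the retries
    return p ** attempts


if __name__ == "__main__":
    p = 0.01  # per-attempt failure chance used in the quoted example
    for retries in (2, 5):
        chance = failure_probability(p, retries)
        print(f"execution.retries={retries}: p^{retries + 1} = {chance:.0e}")
```

As the quoted message points out, this only describes random failures. A job that is broken (for example, one with missing input files) effectively has p = 1, so extra retries do not lower its chance of failing and only add load.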