[Swift-user] Swift run finished with errors

Ben Clifford benc at hawaga.org.uk
Thu Jul 3 03:22:57 CDT 2008


That job failed 3 times. Sometimes that will happen.

There are various things you can do to reduce the effect this has on your 
run:

Turn on lazy.errors in swift.properties:
    Normally, when one job has failed (ie. it has used up all of its 
    retries), the whole run is immediately abandoned.
    If you turn on lazy errors, the rest of the run will attempt to 
    continue. This means that you might end up with a run in which only 
    that one job (or perhaps only a small number of jobs) has failed. The 
    restart log (*.rlog) should then let you restart the run and retry 
    only that small number of failed jobs.
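
    As a rough sketch, the setting described above might look like this 
    in swift.properties. This assumes the usual key=value properties 
    syntax and that the option takes a boolean value; check the 
    swift.properties shipped with your installation for the exact form.

```
# Assumed syntax: keep going after a job has exhausted its retries,
# instead of abandoning the whole run at the first failed job.
lazy.errors=true
```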

Increase the number of retries in swift.properties - execution.retries.
   This is set to 2 by default, meaning that a job will be executed up to
   three times - once originally, and twice more as retries if there are
   failures. You can increase this a small amount, eg to 5, to massively 
   reduce the probability of a job failing outright because of random 
   job failures. (eg if each job submission independently has a p=0.01 
   chance of failing, then execution.retries=2 gives p^3 = 0.000001 
   chance of failure; but execution.retries=5 gives 
   p^6 = 0.000000000001 chance of failure.)

   This does not help when the failures are caused by a broken job (such 
   as missing input files on the submit side); in such a case it will 
   increase load on remote systems and slow the run down.
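
The retry arithmetic above can be checked with a few lines of Python. 
This is only a sketch: the function name job_failure_probability is made 
up for illustration, and it assumes that every attempt fails 
independently with the same probability p, which is an idealisation of 
real job failures.

```python
def job_failure_probability(p, retries):
    """Chance that a job fails on every attempt.

    A job is attempted once originally plus `retries` more times, so it
    only fails outright if all (retries + 1) attempts fail. Assumes the
    attempts fail independently with the same probability p.
    """
    attempts = retries + 1
    return p ** attempts


if __name__ == "__main__":
    p = 0.01  # the example per-submission failure probability used above
    for retries in (2, 5):
        prob = job_failure_probability(p, retries)
        print(f"execution.retries={retries}: chance of failure ~ {prob:.0e}")
```

Under the same assumption, a broken job is the case p = 1: the result 
stays at 1 no matter how many retries are configured, which is why 
extra retries only add load there.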
