[Swift-user] Swift run finished with errors
Ben Clifford
benc at hawaga.org.uk
Thu Jul 3 03:22:57 CDT 2008
That job failed 3 times. Sometimes that will happen.
There are various things you can do to reduce the effect this has on your
run:
Turn on lazy.errors in swift.properties:
Normally when one job has failed (i.e. it has used up all of its
retries), the whole run is immediately abandoned.
If you turn on lazy errors, the rest of the run will attempt to
continue. This means that you might end up with a run in which only
that one job (or perhaps only a small number of jobs) has failed. The
restart log (*.rlog) should then let you restart the run and retry just
that small number of failed jobs.
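
As a rough sketch, the relevant line in swift.properties would look
something like this (assuming the usual key=value properties syntax;
check the comments in your own swift.properties for the exact form):

```properties
# Keep running the rest of the workflow when a job has used up its retries
lazy.errors=true
```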
Increase the number of retries in swift.properties - execution.retries.
This is set to 2 by default, meaning that a job will be executed up to
three times - once originally, and twice more as retries if there are
failures. You can increase this a small amount, e.g. to 5, to massively
reduce the probability of a job failing because of random failures
(e.g. if you have a p=0.01 chance of a job submission failing, then
execution.retries=2 gives a p^3 = 0.000001 chance of every attempt
failing, but execution.retries=5 gives p^6 = 0.000000000001).
This does not help when the failures are caused by a broken job (such
as missing input files on the submit side); in such a case it will
increase load on remote systems and slow the run down.
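
The arithmetic above can be checked with a few lines of Python. This is
only an illustration, not part of Swift: the function name
job_failure_probability is made up for this example, and it assumes each
attempt fails independently with the same probability p, which will not
hold for a broken job.

```python
def job_failure_probability(p: float, retries: int) -> float:
    """Chance that a job fails on every attempt.

    Assumes each attempt fails independently with probability p.
    A job is attempted once originally, plus `retries` more times.
    """
    attempts = retries + 1
    return p ** attempts


if __name__ == "__main__":
    p = 0.01  # example per-attempt failure probability from the text
    for retries in (2, 5):
        prob = job_failure_probability(p, retries)
        print(f"execution.retries={retries}: p^{retries + 1} = {prob:.3g}")
```

For a job that always fails (p=1), the result stays at 1 however many
retries are configured, which is why extra retries only add load in that
case.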