[Swift-devel] jobs that go active forever, and their effect on multisite osg runs

Mats Rynge rynge at renci.org
Mon Dec 15 19:34:02 CST 2008


Ben Clifford wrote:
> On Mon, 15 Dec 2008, Mats Rynge wrote:
> 
>> When we use OSG MatchMaker, we have timeouts for all job states, and
>> that seem to work well.
> 
> What durations of timeouts do you use?


It depends a little bit on what model we are running, but here is an
example:

Submitting, Staging, other "quick" states - 10 minutes
Pending (sitting in the remote queue) - 30 minutes
Running - 2x the expected runtime

These are all handled on the local side. We set the wallclock time in
the RSL as well, but that is for giving the sites a better shot at job
scheduling, not for job failure detection/recovery.
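
As a rough illustration of the local-side check, here is a minimal sketch
in Python. It is not the actual MatchMaker code: the state names, the
function names, and the use of seconds are assumptions made up for this
example. Only the three durations come from the list above.

```python
from typing import Optional

# Durations from the example above, in seconds.
QUICK_STATE_TIMEOUT_S = 10 * 60   # submitting, staging, other "quick" states
PENDING_TIMEOUT_S = 30 * 60       # sitting in the remote queue

# Hypothetical state names; a real client would use its own.
QUICK_STATES = {"submitting", "staging"}


def state_timeout(state: str, expected_runtime_s: float) -> Optional[float]:
    """Return the timeout for a job state, or None if the state has none."""
    if state in QUICK_STATES:
        return QUICK_STATE_TIMEOUT_S
    if state == "pending":
        return PENDING_TIMEOUT_S
    if state == "running":
        # Running jobs get twice the expected runtime.
        return 2 * expected_runtime_s
    return None


def has_timed_out(state: str, entered_at_s: float, now_s: float,
                  expected_runtime_s: float) -> bool:
    """True if the job has been in `state` longer than that state's limit."""
    limit = state_timeout(state, expected_runtime_s)
    return limit is not None and (now_s - entered_at_s) > limit
```

A submit-side loop could call has_timed_out() for each tracked job and
cancel or resubmit the ones that return True, independent of whatever
wallclock limit was put in the RSL.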

-- 
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>


