[Swift-devel] jobs that go active forever, and their effect on multisite osg runs
Mats Rynge
rynge at renci.org
Mon Dec 15 19:34:02 CST 2008
Ben Clifford wrote:
> On Mon, 15 Dec 2008, Mats Rynge wrote:
>
>> When we use OSG MatchMaker, we have timeouts for all job states, and
>> that seems to work well.
>
> What durations of timeouts do you use?
It depends a little bit on what model we are running, but here is an
example:
Submitting, Staging, other "quick" states - 10 minutes
Pending (sitting in the remote queue) - 30 minutes
Running - 2x the expected runtime
These are all handled on the local side. We set the wallclock time in
the RSL as well, but that is for giving the sites a better shot at job
scheduling, not for job failure detection/recovery.
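
As a rough illustration of the local-side checks described above, here is a
minimal Python sketch. It is not the OSG MatchMaker code; the state names,
constants, and function names are all made up for this example, and only the
durations (10 minutes, 30 minutes, 2x the expected runtime) come from the
list above.

```python
import time
from typing import Optional

# Durations taken from the example above; the names are hypothetical.
QUICK_STATE_TIMEOUT = 10 * 60   # submitting, staging, other "quick" states
PENDING_TIMEOUT = 30 * 60       # sitting in the remote queue
RUNNING_FACTOR = 2.0            # multiple of the expected runtime


def state_timeout(state: str, expected_runtime: float) -> float:
    """Return the timeout in seconds for a job state (hypothetical state names)."""
    if state == "pending":
        return PENDING_TIMEOUT
    if state == "running":
        return RUNNING_FACTOR * expected_runtime
    # Everything else is treated as a "quick" state.
    return QUICK_STATE_TIMEOUT


def is_timed_out(state: str, entered_at: float, expected_runtime: float,
                 now: Optional[float] = None) -> bool:
    """True if the job has been in `state` longer than that state's timeout."""
    if now is None:
        now = time.time()
    return (now - entered_at) > state_timeout(state, expected_runtime)
```

A submit-side loop could call is_timed_out() for each tracked job and, when
it returns True, cancel the job and resubmit it elsewhere. The wallclock
limit in the RSL stays a separate scheduling hint.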
--
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>