[Swift-devel] jobs that go active forever, and their effect on multisite osg runs
Ben Clifford
benc at hawaga.org.uk
Sun Dec 14 22:03:10 CST 2008
During my experimentation last week with pointing Swift at the OSG Engage VO,
I repeatedly ran into a problem where jobs aimed at a particular small
subset of sites would go into the Active state and then never (for some
multiple-hours value of never) be reported as Completed or Failed.
This was the only site-misbehaviour problem that I encountered which
caused Swift runs to not complete and required manual intervention to
remove those sites before a run. Other site problems were dealt with by
various mechanisms already implemented in Swift (site scoring,
replication).
I'm desirous, then, of some way to get round this problem.
One approach we discussed previously was making maxwalltime enforced at
the client side.
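
As a rough illustration of that idea, here is a minimal sketch in Python (not Swift's actual code) of a client-side check that gives up on jobs which have sat in Active for longer than their maxwalltime. The Job record, the expire_overdue function and the grace period are all hypothetical names invented for this sketch; only maxwalltime and the Active/Failed states come from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical client-side record of one submitted job."""
    job_id: str
    site: str
    max_walltime_s: float                 # the job's declared maxwalltime, in seconds
    active_since: Optional[float] = None  # client clock time when Active was reported
    state: str = "Submitted"


def expire_overdue(jobs: List[Job], now: float, grace_s: float = 60.0) -> List[Job]:
    """Mark as Failed every Active job that has outlived maxwalltime plus a grace period.

    Returns the jobs that were expired on this pass.
    """
    expired = []
    for job in jobs:
        # Only jobs the site has reported as Active can be overdue.
        if job.state != "Active" or job.active_since is None:
            continue
        if now - job.active_since > job.max_walltime_s + grace_s:
            job.state = "Failed"
            expired.append(job)
    return expired


# Example: one job stuck in Active, one still within its limit, one not yet Active.
jobs = [
    Job("a", "site1", 600, active_since=0, state="Active"),
    Job("b", "site2", 600, active_since=1000, state="Active"),
    Job("c", "site3", 600),
]
stuck = expire_overdue(jobs, now=1200)
```

Reporting the timeout as an ordinary failure would presumably let the existing site scoring and replication mechanisms react to the misbehaving site, rather than needing a separate code path; that is an assumption of this sketch, not something tested.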