[Swift-devel] jobs that go active forever, and their effect on multisite osg runs

Ben Clifford benc at hawaga.org.uk
Sun Dec 14 22:03:10 CST 2008


During my experimentation last week with pointing Swift at the OSG Engage VO, 
I repeatedly ran into a problem where jobs aimed at a particular small 
subset of sites would go into the Active state and then never (for some 
multiple-hours value of never) be reported as Completed or Failed.

This was the only site-misbehaviour problem that I encountered which 
caused Swift runs to not complete and required manual intervention to 
remove those sites before a run. Other site problems were dealt with by 
various mechanisms already implemented in Swift (site scoring, 
replication).

I'm desirous, then, of some way to get round this problem.

One approach we discussed previously was making maxwalltime enforced at 
the client side.
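
To make that concrete, here is a minimal sketch of what a client-side 
walltime check might look like. It is written in Python for brevity and is 
not Swift code: the WalltimeWatchdog class, its method names and the grace 
period are all hypothetical, chosen only for illustration. The idea is that 
the client records when a job is reported Active and, on each poll, picks 
out any job that has been Active for longer than its maxwalltime plus some 
slack.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrackedJob:
    """Client-side record of one submitted job (hypothetical structure)."""
    job_id: str
    site: str
    maxwalltime_s: float
    active_since: Optional[float] = None  # None until the job is reported Active


class WalltimeWatchdog:
    """Tracks Active jobs and reports those that overrun maxwalltime.

    grace_s is an assumed allowance for slow status notifications; it is
    not a parameter taken from Swift.
    """

    def __init__(self, grace_s: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._grace_s = grace_s
        self._clock = clock
        self._jobs: Dict[str, TrackedJob] = {}

    def submitted(self, job_id: str, site: str, maxwalltime_s: float) -> None:
        """Start tracking a job that has been handed to a site."""
        self._jobs[job_id] = TrackedJob(job_id, site, maxwalltime_s)

    def active(self, job_id: str) -> None:
        """Record the moment the site reported the job as Active."""
        self._jobs[job_id].active_since = self._clock()

    def finished(self, job_id: str) -> None:
        """Stop tracking a job reported as Completed or Failed."""
        self._jobs.pop(job_id, None)

    def expire(self) -> List[TrackedJob]:
        """Return, and stop tracking, jobs Active beyond maxwalltime + grace."""
        now = self._clock()
        overdue = [
            job for job in self._jobs.values()
            if job.active_since is not None
            and now - job.active_since > job.maxwalltime_s + self._grace_s
        ]
        for job in overdue:
            del self._jobs[job.job_id]
        return overdue
```

The caller would poll expire() periodically and treat each returned job as 
Failed on the client side, which would presumably let the existing site 
scoring and replication mechanisms react to the misbehaving site in the 
same way as they do for other failures.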
