[Swift-devel] 0.9 release notes draft

Ben Clifford benc at hawaga.org.uk
Fri Apr 24 08:59:53 CDT 2009


On Fri, 24 Apr 2009, Michael Wilde wrote:

> > *** when replication is enabled, swift will locally kill jobs that have
> >     run for twice their specified walltime
> 
> what does "locally kill" mean?

Traditionally the maxwalltime setting was passed to the execution layer 
and would end up somewhere like PBS for enforcement deep in the stack. If 
something was broken or not implemented somewhere between Swift and that 
enforcer, then maxwalltimes would have no effect.

Local enforcement means that the Swift client will attempt to kill a job 
that has run for twice its specified maxwalltime, without relying on 
communication with anything outside of the Swift client.

This is tied in with replication in that it's behaviour that happens when 
the Swift client believes a job has been in a particular state for too 
long.

For queued state, the time is based on a simple analysis of existing queue 
times, and enforcement behaviour is to launch a replica.

For active (running) state, the time is based on maxwalltime, and 
enforcement behaviour is to kill the job, which will then cause a retry 
(or a too-many-retries failure).
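
As a rough illustration only (this is not Swift's actual code), the two 
per-state rules above could be sketched like this. All names here 
(JobState, Action, enforcement_action, queue_threshold_s) are made up 
for the example, queue_threshold_s stands in for whatever the queue-time 
analysis produces, and treating "twice maxwalltime" as a strict 
"greater than" comparison is an assumption.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    ACTIVE = "active"


class Action(Enum):
    NONE = "none"
    LAUNCH_REPLICA = "launch replica"
    KILL = "kill job"


def enforcement_action(state, seconds_in_state, maxwalltime_s, queue_threshold_s):
    """Pick the client-side action for a job that may have been in one state too long.

    Hypothetical sketch: queue_threshold_s is a placeholder for the result of
    the queue-time analysis, which is not modelled here.
    """
    if state is JobState.QUEUED and seconds_in_state > queue_threshold_s:
        # Queued too long: start another copy of the job.
        return Action.LAUNCH_REPLICA
    if state is JobState.ACTIVE and seconds_in_state > 2 * maxwalltime_s:
        # Running for more than twice maxwalltime: kill it locally. The kill
        # then results in a retry or a too-many-retries failure.
        return Action.KILL
    return Action.NONE
```

With a maxwalltime of 600 seconds, a job that has been active for 1300 
seconds would be killed under this sketch, while one active for 1000 
seconds would be left alone.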

> > *** Recompilation will happen if a .kml file was compiled with a different
> 
> This was an occasional problem: does this close the last known glitch in
> avoiding recompilation, or are there any other cases where a user could get
> tripped up on this? I.e., ideally should the user not need to know that Swift
> applies this heuristic?

I think this catches everything in this area that has been catching people 
in the past.
