[Swift-devel] Re: timing stats from run194

Ben Clifford benc at hawaga.org.uk
Fri Nov 9 08:24:50 CST 2007


On Fri, 9 Nov 2007, Michael Wilde wrote:

> Ah - a perfectly logical explanation, and a hard case to handle with retry.
> Perhaps the retry mechanism should be taught to recognize over-walltime errors
> and bump up the walltime for the failures based on per-application settings.

well, that's not really the semantics of maxwalltime - you as the 
application user assert in your maxwalltime spec that it is an error for 
your jobs to take longer than that.

it is perhaps bad to allow one job breaking that assertion to cause a 
clusterful of jobs to fail.

it may also be more sensible in the case of widely varying loads to 
specify the clusteriness in terms of jobs-per-cluster rather than the 
present maxwalltime based approach.
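
to make the comparison concrete, here is a minimal sketch of the two grouping strategies. this is not swift's actual clustering code: the function names, the `budget` parameter and the greedy packing rule are all assumptions made for illustration.

```python
from typing import List


def cluster_by_walltime(walltimes: List[int], budget: int) -> List[List[int]]:
    """Greedily pack declared per-job maxwalltimes (minutes) into clusters.

    Hypothetical rule: start a new cluster when adding the next job would
    push the summed maxwalltime past `budget`.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    total = 0
    for w in walltimes:
        if current and total + w > budget:
            clusters.append(current)
            current, total = [], 0
        current.append(w)
        total += w
    if current:
        clusters.append(current)
    return clusters


def cluster_by_count(walltimes: List[int], jobs_per_cluster: int) -> List[List[int]]:
    """Group jobs into fixed-size clusters, ignoring declared maxwalltimes."""
    return [
        walltimes[i:i + jobs_per_cluster]
        for i in range(0, len(walltimes), jobs_per_cluster)
    ]


if __name__ == "__main__":
    declared = [5, 5, 5, 5, 5]  # made-up maxwalltimes, in minutes
    print(cluster_by_walltime(declared, budget=10))
    print(cluster_by_count(declared, jobs_per_cluster=2))
```

in the walltime-based version the cluster size depends entirely on the declared maxwalltimes, so a job that runs over its declaration eats into the time budgeted for the rest of its cluster; the count-based version fixes the cluster size regardless of what is declared.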

an exciting option to also investigate in the future is estimating 
appropriate maxwalltimes for individual invocations, rather than for all 
invocations of an app, based (eg) on input file or other parameters.

-- 


