[Swift-devel] Re: timing stats from run194
Ben Clifford
benc at hawaga.org.uk
Fri Nov 9 08:24:50 CST 2007
On Fri, 9 Nov 2007, Michael Wilde wrote:
> Ah - a perfectly logical explanation, and a hard case to handle with retry.
> Perhaps the retry mechanism should be taught to recognize over-walltime errors
> and bump up the walltime for the failures based on per-application settings.
well, that's not really the semantics of maxwalltime - you as the
application user assert in your maxwalltime spec that it is an error for
your jobs to take longer than that.
it is perhaps bad, though, to allow one job that breaks that assertion to
cause a whole clusterful of jobs to fail.
it may also be more sensible in the case of widely varying loads to
specify the clusteriness in terms of jobs-per-cluster rather than the
present maxwalltime based approach.
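
to make the contrast concrete, here is a rough sketch of the two grouping policies. it is not Swift's actual clustering code: the function names, the greedy packing rule and the `cluster_budget` parameter are all assumptions made for illustration.

```python
from typing import List


def cluster_by_walltime(maxwalltimes: List[int], cluster_budget: int) -> List[List[int]]:
    """Hypothetical walltime-based policy: pack consecutive jobs into a
    cluster until the sum of their declared maxwalltimes would exceed
    cluster_budget. Returns lists of job indices."""
    clusters: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, walltime in enumerate(maxwalltimes):
        # Close the current cluster if this job would overflow the budget.
        if current and used + walltime > cluster_budget:
            clusters.append(current)
            current, used = [], 0
        current.append(index)
        used += walltime
    if current:
        clusters.append(current)
    return clusters


def cluster_by_count(n_jobs: int, jobs_per_cluster: int) -> List[List[int]]:
    """Hypothetical count-based policy: a fixed number of jobs per cluster,
    independent of any declared maxwalltime."""
    if jobs_per_cluster <= 0:
        raise ValueError("jobs_per_cluster must be positive")
    return [
        list(range(start, min(start + jobs_per_cluster, n_jobs)))
        for start in range(0, n_jobs, jobs_per_cluster)
    ]
```

in the first policy the cluster size depends entirely on the declared maxwalltimes, so a job that overruns its declaration eats into time budgeted for the rest of its cluster. in the second, the cluster size is fixed up front whatever the run times turn out to be.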
more exciting would be application-specific estimation of an appropriate
maxwalltime for each invocation, rather than one value for all invocations
of an app, based (e.g.) on the input file or other parameters. that is
also an option to investigate in the future.
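
as a sketch of what per-invocation estimation might look like: the linear model in input size, the parameter names and the safety factor below are all assumptions for illustration, not anything that exists in Swift.

```python
import math


def estimate_maxwalltime(
    input_bytes: int,
    base_seconds: float,
    seconds_per_mb: float,
    safety_factor: float = 1.5,
) -> int:
    """Hypothetical per-invocation estimate: a fixed startup cost plus a
    cost proportional to input size, padded by a safety factor and
    rounded up to whole seconds."""
    megabytes = input_bytes / (1024 * 1024)
    return math.ceil(safety_factor * (base_seconds + seconds_per_mb * megabytes))
```

an app would need its own `base_seconds` and `seconds_per_mb`, presumably fitted from earlier runs; whether a linear model is adequate for any given application is an open question.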