[Swift-devel] Re: Adjust site scores on job start not job end

Wed Aug 25 15:22:51 CDT 2010

On Wed, 2010-08-25 at 14:08 -0600, Michael Wilde wrote:
> We discussed in the Swift internals review meetings the desirability
> of adjusting the scheduler's site scores based more on job start
> events than on successful completion events.
> 
> The rationale was that for workloads consisting entirely of
> long-running jobs (on OSG, for example), this approach would much more
> quickly reward sites that have been starting jobs with additional
> jobs, until the start rate diminishes when the jobs start queuing up.

Right. The score should take into account multiple things, such as
overall throughput and queue throughput, rather than just the number of
jobs that finished OK.

> 
> Another approach we discussed (which was demonstrated by Dinah Sulakhe
> to be successful in VDS) was to keep sending jobs to sites until each
> site has some fixed threshold of jobs sitting in its queue, and to
> keep all the sites at some threshold (possibly a per-site threshold
> based on the site's throughput).

That threshold is currently the site score.
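
As a rough illustration of a score acting as a per-site queue threshold,
here is a minimal Python sketch. It assumes the score is used directly as
a cap on outstanding jobs; the real mapping from score to job limit in
Swift may differ, and the function name `jobs_to_submit` is made up for
this example.

```python
def jobs_to_submit(score: float, outstanding: int) -> int:
    """Return how many more jobs a site can take right now.

    Assumption for illustration: the score, truncated to an integer,
    is the maximum number of jobs allowed to sit at the site at once.
    """
    limit = max(0, int(score))
    return max(0, limit - outstanding)


# A site with score 4.7 and 3 outstanding jobs can take 1 more job;
# once it is at or above its limit, nothing more is sent.
print(jobs_to_submit(4.7, 3))
print(jobs_to_submit(4.7, 6))
```

Under this reading, raising a site's score when its jobs start (rather
than only when they finish) raises its threshold sooner.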

> 
> We're now at the point where a few users (Glen and Allan) would
> benefit from this change in scheduling algorithm.
> 
> Mihael, all, can you suggest where and how to explore such changes
> (module-wise) and what pitfalls are likely to be encountered?

Essentially the decision problem of how to distribute a number of jobs
to a number of sites (assuming hard constraints are resolved) only
requires one number for each site. So I think the score should be kept
because it is the right abstraction and makes it easy to sub-divide the
problem.
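
To make the "one number per site" point concrete, here is a small Python
sketch that splits a batch of jobs across sites in proportion to their
scores. The proportional rule, the largest-remainder rounding, and the
name `distribute` are assumptions for illustration, not what the Swift
scheduler does; scores are assumed to be non-negative.

```python
def distribute(n_jobs: int, scores: dict[str, float]) -> dict[str, int]:
    """Split n_jobs across sites proportionally to their scores."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one site needs a positive score")

    # Ideal (fractional) share of jobs for each site.
    shares = {site: n_jobs * s / total for site, s in scores.items()}
    # Start with the integer part of each share.
    alloc = {site: int(share) for site, share in shares.items()}

    # Hand the remaining jobs to the sites with the largest remainders.
    leftover = n_jobs - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda s: shares[s] - alloc[s], reverse=True)
    for site in by_remainder[:leftover]:
        alloc[site] += 1
    return alloc


print(distribute(10, {"siteA": 3.0, "siteB": 1.0, "siteC": 1.0}))
```

Whatever goes into the score, the distribution step only ever sees the
single number per site, which is what keeps the two problems separate.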

So I think somebody (or somebodies) needs to figure out exactly what the
formula for the score should be and why. That's the hard part. Then we
can add the various raw measures into the site properties and change
the score calculations according to those. That's probably easier.
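
A possible shape for that split, sketched in Python: raw per-site
counters live in one place and the score formula is a separate,
replaceable function. All field names and weights here are hypothetical,
and `placeholder_formula` is only a stand-in, since the actual formula is
the open question.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SiteMeasures:
    """Raw per-site counters (hypothetical field names)."""
    submitted: int = 0
    started: int = 0
    finished_ok: int = 0
    failed: int = 0

    @property
    def queued(self) -> int:
        # Jobs submitted to the site that have not started yet.
        return self.submitted - self.started


ScoreFormula = Callable[[SiteMeasures], float]


def placeholder_formula(m: SiteMeasures) -> float:
    """Stand-in formula with made-up weights: rewards job starts and
    successes, penalizes failures and jobs still waiting in the queue."""
    return 1.0 + 0.2 * m.started + 0.1 * m.finished_ok - 0.5 * m.failed - 0.1 * m.queued


def site_score(m: SiteMeasures, formula: ScoreFormula = placeholder_formula) -> float:
    """Compute a site's score from its raw measures using the given formula."""
    return formula(m)


m = SiteMeasures(submitted=10, started=6, finished_ok=2, failed=1)
print(site_score(m))
```

Swapping in a different formula then means passing another function to
`site_score`, without touching how the counters are collected.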

Mihael