[Swift-devel] Re: Adjust site scores on job start not job end

Michael Wilde wilde at mcs.anl.gov
Wed Aug 25 16:53:06 CDT 2010


Cool - what you say below makes good sense.

I suspect that once Allan and Glen start getting more data (logfiles) and experience on OSG we'll have some ideas on what score formulas are worth trying.

- Mike


----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:

> On Wed, 2010-08-25 at 14:08 -0600, Michael Wilde wrote:
> > We discussed in the Swift internals review meetings the desirability
> > of adjusting the scheduler's site scores more by job start events
> > than by successful completion events.
> > 
> > The rationale was that for workloads consisting entirely of long
> > running jobs, on for example OSG, this approach would much more
> > quickly reward sites that have been starting jobs with additional
> > jobs, until the start rate diminishes when the jobs start queuing up.
> 
> Right. The score should take into account multiple things, such as
> overall throughput and queue throughput rather than just number of jobs
> finished ok.
> 
> > 
> > Another approach we discussed (which was demonstrated by Dinah Sulakhe
> > to be successful in VDS) was to keep sending jobs to sites until each
> > site has some fixed threshold of jobs sitting in its queue, and to
> > keep all the sites at some threshold (possibly a per-site threshold
> > based on the site's throughput).
> 
> That threshold is currently the site score.
> 
> > 
> > We're now at the point where a few users (Glen and Allan) would
> > benefit from this change in scheduling algorithm.
> > 
> > Mihael, all, can you suggest where and how to explore such changes
> > (module-wise) and what pitfalls are likely to be encountered?
> 
> Essentially the decision problem of how to distribute a number of jobs
> to a number of sites (assuming hard constraints are resolved) only
> requires one number for each site. So I think the score should be kept
> because it is the right abstraction and makes it easy to sub-divide the
> problem.
> 
> So I think somebody (or somebodies) needs to figure out exactly what the
> formula for the score should be and why. That's the hard part. Then we
> can add the various raw measures into the sites properties and change
> the score calculations according to those. That's probably easier.
> 
> Mihael
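
As an illustration only, the two ideas in the quoted thread (rewarding a
site when a job starts rather than when it ends, and treating the site
score as the cap on jobs waiting in that site's queue) might be sketched
roughly as below. This is Python, not Swift's actual scheduler code, and
the Site class, its fields, the reward and penalty values, and pick_site
are all hypothetical names and numbers chosen for the example. The real
score formula is exactly the open question raised above.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Per-site bookkeeping. Every field and constant here is hypothetical."""
    name: str
    score: float = 1.0  # doubles as the cap on jobs waiting in the queue
    queued: int = 0     # submitted to the site but not yet started
    running: int = 0    # started but not yet finished

    def on_submit(self) -> None:
        self.queued += 1

    def on_start(self, reward: float = 0.5) -> None:
        # Reward the site as soon as a job starts, not when it finishes.
        self.queued -= 1
        self.running += 1
        self.score += reward

    def on_end(self, ok: bool, penalty: float = 1.0) -> None:
        # Completion only lowers the score on failure in this sketch.
        self.running -= 1
        if not ok:
            self.score = max(1.0, self.score - penalty)

    def free_slots(self) -> int:
        # Room left before the queue reaches the score-based threshold.
        return max(0, int(self.score) - self.queued)


def pick_site(sites):
    """Return the site with the most room under its threshold, or None."""
    best = max(sites, key=lambda s: s.free_slots(), default=None)
    if best is None or best.free_slots() == 0:
        return None
    return best
```

With this shape, a site that keeps starting jobs sees its score (and so
its queue cap) grow and is sent more work. Once jobs begin to sit in its
queue, start events stop arriving, the score stops growing, and
pick_site stops choosing it until the queue drains.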

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



