[Swift-devel] Fwd: [pads-notify] Changes to scheduling policy

Mon Jun 14 10:49:01 CDT 2010

This is relevant to our scheduling and configuration discussions...

- Mike

----- Forwarded Message -----
From: "Ti Leggett" <leggett at ci.uchicago.edu>
To: pads-notify at ci.uchicago.edu
Sent: Monday, June 14, 2010 10:37:00 AM GMT -06:00 US/Canada Central
Subject: [pads-notify] Changes to scheduling policy

After listening to the feedback from you about how jobs flow on the PADS cluster, and after some monitoring and observations by us, we've made some changes to the scheduling policy on PADS. The PADS wiki documentation should already be updated, but I'll explain it here as well. We welcome all feedback, so please don't be shy about praising or criticizing these changes. Our goal is to make PADS a useful development and analysis resource that accommodates a wide range of jobs fairly.

We've increased the number of nodes in the development reservation from 3 non-gpu compute nodes to 7, bringing the total of nodes available for "development" jobs - those that are less than 1 hour - to 8: 7 non-gpu nodes and 1 gpu node. This standing reservation is in place from 8am - 7pm Monday thru Friday.

Next we changed the priorities of the queues:

fast: 3120
short: 2880
long: 1440
extended: 0

What do these numbers really mean? Assuming all else is equal, and knowing that a job's priority increases by one for every minute it spends in the queue, a job submitted to the extended queue will take 1 day (1440 minutes) to reach the same priority as a job just submitted to the long queue. A job submitted to the long queue will likewise take 1 day to reach the priority of a job just submitted to the short queue, and a job submitted to the short queue will take 4 hours (240 minutes) to reach the priority of a job just submitted to the fast queue. In total, it will take 2 days, 4 hours (3120 minutes) before an extended job has the same priority as a job just submitted to the fast queue.

Keep in mind that these queue priorities are static and do not change based on how long the job sits idle in the queue, so their impact on a job's place in the queue diminishes the longer the job sits idle. The longer a job sits idle, the bigger the impact of the other 2 factors - queue time and fairshare. All these queue priorities do is give a shorter, smaller job a head start over bigger jobs, but they won't always preempt longer, bigger jobs if those jobs have been waiting in the queue for some time.
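For anyone who wants to check the catch-up times, here is a rough sketch of the arithmetic. It assumes only what is stated above: the four queue priorities and a gain of one priority point per minute of queue time. It ignores fairshare entirely, and the function names are made up for this example; this is not the scheduler's own code.

```python
# Static queue priorities as listed in the announcement.
QUEUE_PRIORITY = {"fast": 3120, "short": 2880, "long": 1440, "extended": 0}

MINUTES_PER_DAY = 24 * 60


def catch_up_minutes(lower, higher):
    """Minutes a job in `lower` must wait to match a job just submitted to `higher`.

    Assumes priority grows by exactly 1 per minute queued and that
    fairshare is identical for both jobs (hypothetical simplification).
    """
    return QUEUE_PRIORITY[higher] - QUEUE_PRIORITY[lower]


def format_wait(minutes):
    """Render a minute count as days, hours and minutes."""
    days, remainder = divmod(minutes, MINUTES_PER_DAY)
    hours, mins = divmod(remainder, 60)
    return f"{days}d {hours}h {mins}m"


if __name__ == "__main__":
    pairs = [
        ("extended", "long"),
        ("long", "short"),
        ("short", "fast"),
        ("extended", "fast"),
    ]
    for lower, higher in pairs:
        wait = catch_up_minutes(lower, higher)
        print(f"{lower} -> {higher}: {format_wait(wait)}")
```

The four pairs correspond to the four waits described above: 1 day, 1 day, 4 hours, and 2 days 4 hours.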

Which brings us to the topic of fairshare. We've also changed the fairshare window to be the last 7 days instead of 3.5 so more history will be used for determining fairshare usage.

Next, we've changed users' fairshare usage to be a target instead of a ceiling. Before, your job's priority would only be decreased if you exceeded fairshare. This is still the case, but now if you are under the fairshare target, your job priority will be increased.

We've now implemented a fairshare ceiling for project utilization. Now if your project, as a whole, exceeds its fairshare ceiling, your job's priority will be decreased, but not as much as if you, as a user, exceed your own fairshare target. The project ceiling is also higher than the per-user target.
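The difference between a per-user target and a project ceiling can be sketched roughly as follows. This is purely illustrative: the announcement does not give the actual targets, ceilings, or weights, so every number and name below is a hypothetical placeholder, and the linear formula is an assumption rather than the scheduler's real calculation.

```python
def fairshare_adjustment(
    user_usage,
    project_usage,
    user_target=0.10,      # hypothetical per-user target (fraction of the cluster)
    project_ceiling=0.30,  # hypothetical project ceiling, higher than the user target
    user_weight=100.0,     # hypothetical weight for the user term
    project_weight=50.0,   # hypothetical, smaller weight for the project term
):
    """Return an illustrative priority adjustment from fairshare usage.

    User usage is compared against a target: being under it raises
    priority, being over it lowers priority. Project usage is compared
    against a ceiling: only exceeding it lowers priority, and by less
    than an equal user overage would.
    """
    # Target: the sign follows whether the user is under or over.
    user_term = user_weight * (user_target - user_usage)
    # Ceiling: no bonus for being under, a penalty only when over.
    project_overage = max(0.0, project_usage - project_ceiling)
    project_term = -project_weight * project_overage
    return user_term + project_term


if __name__ == "__main__":
    print(fairshare_adjustment(user_usage=0.05, project_usage=0.20))  # under target
    print(fairshare_adjustment(user_usage=0.20, project_usage=0.20))  # over target
    print(fairshare_adjustment(user_usage=0.10, project_usage=0.40))  # project over ceiling
```

With these placeholder values, a user under the target gets a positive adjustment, a user over it gets a negative one, and a project overage of a given size costs less than a user overage of the same size.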

Before, it was common to see job priorities ranging from -85,000 to +6,000. This seemed a bit excessive. With these new policies in place, the range is -600 to +3,000 for the same queued jobs. We hope that these changes will make your PADS experience better overall.
_______________________________________________
pads-notify mailing list
pads-notify at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/pads-notify

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory