[Swift-devel] Coaster capabilities for release 0.9

Mihael Hategan hategan at mcs.anl.gov
Wed Apr 22 12:47:08 CDT 2009


On Wed, 2009-04-22 at 12:13 -0500, Michael Wilde wrote:
> On 4/22/09 11:51 AM, Mihael Hategan wrote:
> 
> >> Automation of the core scheduler has proven to be hard, but has made 
> >> good progress.
> > 
> > I don't think the scheduler has changed much, and I don't think it was
> > hard. The problem I've seen was a misunderstanding of the problem rather
> > than a problem with the solution.
> 
> I say this based on experience using the system and helping users use 
> the system.
> 
> The problems I observe, which persist, are:
> 
> --
> 
> 1) slow start starts too slow.

By what objective measure of how it should behave is it too slow? I'm
all for readjusting the defaults, but so far nobody has come up with a
set of values with a reasonable motivation behind them.

> Maybe the user needs a simple setting for
> how aggressively to schedule, where the automated default is somewhat more
> aggressive than the current default.
> 
> And it should be based on job starting, not job completion - I don't know
> if it is, but it *seemed* to me it was not. I might be mistaken.

Interesting point. I think both should count, to different degrees. But
I think increasing the score when a job goes from queued to active is a
good thing.
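
As a rough sketch of that idea (everything here is hypothetical: the
class name, the weights, and the mapping from score to allowed jobs are
made up for illustration and are not taken from the Swift scheduler
code), both events could feed the score with different weights:

```python
from dataclasses import dataclass


@dataclass
class SiteScore:
    """Hypothetical per-site score; not Swift's actual scheduler code."""

    score: float = 0.0
    # Assumed weights: a job starting earns less credit than a job completing.
    start_weight: float = 0.5
    completion_weight: float = 1.0

    def on_job_started(self) -> None:
        # Called when a job goes from queued to active.
        self.score += self.start_weight

    def on_job_completed(self) -> None:
        # Called when a job finishes successfully.
        self.score += self.completion_weight

    def allowed_jobs(self, base: int = 2, per_point: float = 1.0) -> int:
        # Assumed linear mapping from score to concurrent-job limit.
        return max(1, int(base + per_point * self.score))


site = SiteScore()
site.on_job_started()
site.on_job_completed()
print(site.score, site.allowed_jobs())
```

Raising start_weight would make the ramp-up react sooner to jobs
becoming active; what the values should actually be is exactly the
open question.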

> 
> 2) the throttle settings by which a user can seek to adjust things are 
> too complex for any of the users I have worked with to deal with, 
> including me. Some of that can be fixed by documentation.

Which particular throttle settings? All?

> 
> 3) the settings need to get tuned by experts on a per-site basis to be 
> considered "automated".

Which settings, specifically? 

> That's not a defect in the scheduler per se (in
> fact, it's a valuable feature that such tuning can be done).
> 
> BUT end users should not have to dicker with these settings for every
> site. WE need to provide site definitions for TG, OSG, etc. (i.e.
> "supported" sites) and we need documentation that tells a user (e.g. a
> "swift admin") how to do this for new sites.
> 
> --
> 
> I think if we do #3, then #1 and #2 are solved issues.
> 
> But part of the "misunderstanding of the problem" is understanding that 
> the feature is not done till it's working well for ordinary end users.

I think one problem is that we keep using unquantifiable measures like
"well", "ordinary", "too slow", "simple", etc, which are seen
differently by different people.

We need concrete suggestions and specific comments on what isn't as it
should be, and proposed solutions should also be detailed and mindful of
the problem(s) at hand.



