[Swift-devel] Coaster capabilities for release 0.9

Michael Wilde wilde at mcs.anl.gov
Wed Apr 22 11:42:08 CDT 2009


I agree - automated should be the default.

I prefer, Mihael, that you get the work now underway completed as you 
have it envisioned - with the caveat that if the automation process 
looks like it will exceed the ~10 day estimate you gave yesterday, that 
you raise a flag and discuss with the group what the difficulties and 
alternatives are.

My view is:

- manual system that works OK: good
- automated system that works poorly: bad
- automated system that works well: best

Automation of the core scheduler has proven to be hard, but has made 
good progress.  One fear I have is of similar difficulties automating 
the coaster scheduler/provisioner.

If that proves similarly problematic, I don't want to hold back users 
from getting work done with a manual system when the fully automated 
system takes a long development effort.

But either way, once it's working well, I feel automated should be the 
default.

- Mike

On 4/22/09 11:19 AM, Ian Foster wrote:
> Yes, perhaps the automated system should be default. I don't feel 
> strongly about that.
> 
> 
> On Apr 22, 2009, at 11:16 AM, Mihael Hategan wrote:
> 
>> On Wed, 2009-04-22 at 10:49 -0500, Ian Foster wrote:
>>>>>>
>>>>>>
>>>>>> What you say does beg for a couple of questions:
>>>>>> - if all work is done in a run but the allocation has more time left,
>>>>>> should the workers be shut down or not?
>>>
>>>
>>> Shut down.
>>
>> Ok.
>>
>>>
>>>>>>
>>>>>> - if more work remains to be done in a run after an explicit allocation
>>>>>> was used, should the system attempt to allocate more nodes? If not,
>>>>>> should it hang? Fail?
>>>
>>>
>>> Fail.
>>
>> I disagree. If the user didn't want the work to complete, they wouldn't
>> run it. It should be possible to force this mode, but I don't think it
>> should be the default.
>>
>>>
>>>>>>
>>>>>> - if the allocation is far in the distance from now, and a run is
>>>>>> started now, is allocating nodes now a matter of second-guessing or a
>>>>>> matter of trying to finish the work faster? What, besides alleged
>>>>>> complexity of the algorithm, would be the downside of doing so?
>>>
>>>
>>> Maybe someone has requested an allocation at 10am tomorrow because
>>> that is when they want to run the application.
>>
>> I'd assume then that they would start swift somewhere around 10am
>> tomorrow, not one or two days in advance.
>>
>>>
>>>
>>> Maybe they are benchmarking, and want things to run with a specified
>>> number of nodes.
>>>
>>
>> Being able to force a "use exactly these nodes for this amount of time
>> at this time" is a given. Making it the default I have issue with.
>>
>>>
>>> Maybe someone doesn't trust the clever algorithm, or finds that it
>>> fails for odd reasons.
>>>
>>
>> Right. Many people don't trust garbage collection either. I find it
>> funny that people insist that non-trivial things such as distributed
>> computing, GC, special relativity be entirely intuitive.
>>
>>>
>>> Having a more complex algorithm as well is great. I'm not saying this
>>> would not be wonderful. But it shouldn't be obligatory.
>>
>> That's far from the statement of making the other one the default.
>>
>> Let's ask our users though!
>>
> 


