[Swift-devel] bug 53

Michael Wilde wilde at mcs.anl.gov
Mon Sep 17 12:12:38 CDT 2007


Some comments on this thread:

- We need to agree on a rule of thumb for which workflow profiles will 
run OK on GRAM and which won't, and therefore need Falkon.  We could 
approximate an answer to this with a few calculations and assumptions.
Measuring this would not hurt. The dominant factor seems to be queuing 
delays.

I'm in favor of using only GRAM when that is effective, and of agreeing 
on a metric for when it is and when it is not. This will need more 
discussion. I think Mihael made a start on that in this thread; it needs 
to be developed further.
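
As a starting point for that discussion, here is a rough sketch of how 
such a rule of thumb might be computed. Everything in it is an 
assumption for illustration: the function names, the simple model (each 
job pays a fixed queuing/submission overhead on top of its run time), 
and the 0.8 efficiency cutoff are hypothetical, and the numbers are not 
measurements from any site.

```python
def gram_efficiency(job_runtime_s, per_job_overhead_s):
    """Fraction of a job's wall time spent on useful work.

    Assumed model: every job pays a fixed per-job overhead
    (queuing plus submission) in addition to its run time.
    """
    return job_runtime_s / (job_runtime_s + per_job_overhead_s)


def gram_is_effective(job_runtime_s, per_job_overhead_s, min_efficiency=0.8):
    """Hypothetical rule of thumb: plain GRAM is 'good enough' when the
    efficiency under the model above meets an agreed threshold."""
    return gram_efficiency(job_runtime_s, per_job_overhead_s) >= min_efficiency


if __name__ == "__main__":
    # Illustrative values only: 60 s of overhead per job.
    for runtime in (10, 60, 600):
        print(runtime, round(gram_efficiency(runtime, 60), 3),
              gram_is_effective(runtime, 60))
```

Under this model, short jobs are dominated by the overhead and long 
jobs are not; the real threshold and overhead would have to come from 
measurement.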

- I'm waiting to hear comments (and see action) on the thread I 
started, asking to test whether the existing Swift throttles will get 
past the last known blocker of MolDyn-244.

- We need to discuss on-list with Ioan when his Falkon-side error 
recovery logic will be available and when to test it, based in part on 
an assessment of whether it is necessary given the throttling. I think 
the main issue here is whether a Swift-level throttle override that does 
not deal with taking bad nodes out of service can be effective.

- We still need to determine a direction for support of Falkon vs 
development of a supportable alternative to "glide-in" provisioning.
Progress is being made on Falkon (both in its development and 
integration/support); it's a question of how much of whose development 
time to devote to the overall problem, and on what schedule.

- As Ioan goes on to develop data-aware methods in Falkon, we need to 
determine how to support the basic Swift needs and to isolate the two 
efforts from each other until such time as we decide they should be coupled.

Mike

Mihael Hategan wrote:
> Did you update cog?
> 
> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>> No, I've tried with r1740, it still hanged (timed out).
>> the log is on
>> viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>
>> NIka
>>
>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>
>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>
>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>> Ok, so there's something in.
>>>>> That something was throttling a bit too much (not just jobs, but all
>>>>> tasks on that site). I need to take a second look at it.
>>>> Is that fixed by cog r1740? It looks like that commit is intended to.
>>> It's an attempt to fix it, but it needs to be confirmed by Nika.
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>


