[Swift-devel] bug 53
Michael Wilde
wilde at mcs.anl.gov
Mon Sep 17 12:12:38 CDT 2007
Some comments on this thread:
- We need to agree on a rule of thumb for which workflow profiles will run
OK on GRAM and which won't, and so need Falkon. We could approximate an
answer to this with a few calculations and assumptions.
Measuring this would not hurt. The dominant factor seems to be queuing
delays.
I'm in favor of using only GRAM when that is effective, and of agreeing
on a metric of when that is, and when that is not. This will need more
discussion. I think Mihael made a start on that in this thread; it needs
to be developed further.
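
As a starting point for that metric, here is a minimal sketch of the kind of
calculation meant above. It assumes the per-job overhead on GRAM (queuing
delay plus submission cost) can be summarized as one number. The function
names, the example figures, and the 0.8 efficiency threshold are all
hypothetical placeholders, not measurements or anything agreed on the list.

```python
def gram_efficiency(job_runtime_s, per_job_overhead_s):
    """Fraction of a job's wall-clock time spent on useful work.

    per_job_overhead_s is an assumed lump sum for queuing delay plus
    submission overhead; real values would have to be measured.
    """
    if job_runtime_s < 0 or per_job_overhead_s < 0:
        raise ValueError("times must be non-negative")
    total = job_runtime_s + per_job_overhead_s
    return job_runtime_s / total if total > 0 else 0.0


def gram_is_effective(job_runtime_s, per_job_overhead_s, min_efficiency=0.8):
    """Rule of thumb: plain GRAM is 'effective' above a chosen efficiency.

    min_efficiency is a hypothetical threshold, to be agreed on.
    """
    return gram_efficiency(job_runtime_s, per_job_overhead_s) >= min_efficiency


if __name__ == "__main__":
    # Illustrative numbers only: 10-minute jobs vs. 10-second jobs,
    # both with an assumed 60 seconds of per-job overhead.
    print(gram_is_effective(600, 60))
    print(gram_is_effective(10, 60))
```

With numbers like these, long-running jobs would pass the threshold on GRAM
alone and very short jobs would not; the real question is what overhead and
threshold values we measure and agree on.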
- I'm waiting to hear comments (and see action) on the thread I
started, asking to test whether the existing Swift throttles will get past
the last known blocker of MolDyn-244.
- We need to discuss on-list with Ioan when his Falkon-side error
recovery logic will be available and when to test it, based in part on an
assessment of whether it is necessary given the throttling. I think the
main issue here is whether a Swift-level throttle override that does not
deal with taking bad nodes out-of-service can be effective or not.
- We still need to determine a direction for support of Falkon vs
development of a supportable alternative to "glide-in" provisioning.
Progress is being made on Falkon (both in its development and
integration/support); it's a question of how much of whose development
time to devote to the overall problem, on what schedule.
- As Ioan goes on to develop data-aware methods in Falkon, we need to
determine how to support the basic Swift needs and to isolate the two
efforts from each other until such time as we decide they should be coupled.
Mike
Mihael Hategan wrote:
> Did you update cog?
>
> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>> No, I've tried with r1740, it still hanged (timed out).
>> the log is on viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>
>> NIka
>>
>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>
>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>
>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>> Ok, so there's something in.
>>>>> That something was throttling a bit too much (not just jobs, but all
>>>>> tasks on that site). I need to take a second look at it.
>>>> Is that fixed by cog r1740? It looks like that commit is intended to.
>>> It's an attempt to fix it, but it needs to be confirmed by Nika.
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>