[Swift-devel] excessive rate throttling for apparently temporally-restricted failures

Mihael Hategan hategan at mcs.anl.gov
Sun Oct 28 16:42:02 CDT 2007


On Sun, 2007-10-28 at 15:05 -0500, Ioan Raicu wrote:
> But my argument was, and still is, if there is only one site to submit
> to, changing situations are almost irrelevant,

Missed that. It is not irrelevant. The speed/capacity of a service is
determined by the jobs you submit, the jobs others submit, the specific
type of hardware, and the load on the service node (plus other things
like network latency). The jobs others submit and the load on the service
node vary with time, and the bad thing about them is that it is hard to
predict how they affect things.
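
To make that concrete, here is a minimal Python sketch of a job limit that
is adjusted by feedback. It is not the scoring algorithm currently in Swift:
the class name AdaptiveLimit, the additive-increase / multiplicative-decrease
rule, and all the constants are assumptions chosen only for illustration. It
shows how a limit can follow conditions that a fixed setting cannot see.

```python
class AdaptiveLimit:
    """Hypothetical feedback-controlled cap on outstanding jobs.

    Not Swift's scheduler: an additive-increase / multiplicative-decrease
    rule picked purely for illustration.
    """

    def __init__(self, initial=8.0, floor=1.0, ceiling=256.0,
                 increase=1.0, decrease=0.5):
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling
        self.increase = increase
        self.decrease = decrease

    def job_succeeded(self):
        # A success hints at spare capacity, so grow the limit slowly.
        self.limit = min(self.ceiling, self.limit + self.increase)

    def job_failed(self):
        # A failure hints at overload (or a transient fault), so shrink fast.
        self.limit = max(self.floor, self.limit * self.decrease)

    def allowed(self):
        # Number of jobs that may be outstanding right now.
        return int(self.limit)


if __name__ == "__main__":
    limit = AdaptiveLimit()
    for _ in range(8):
        limit.job_succeeded()
    print(limit.allowed())  # 16
    for _ in range(3):
        limit.job_failed()
    print(limit.allowed())  # 2
```

With these made-up constants, three failures undo eight successes and more,
so how hard the limit reacts to a short burst of failures depends entirely
on the decrease factor.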

Furthermore, user-specified rates suffer fundamentally from the problem
that the user has to understand how the whole thing works and pick
good values. What I've observed is that this doesn't work very well.
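
For comparison, the two fixed throttles discussed in this thread (X
outstanding jobs, Y jobs/sec) can be sketched in Python as below. This is
only an illustration under assumptions: StaticThrottle, try_submit and
job_finished are hypothetical names, not Swift configuration or API.

```python
import time


class StaticThrottle:
    """Hypothetical fixed throttle.

    Allows at most max_outstanding jobs in flight (X) and at most
    jobs_per_sec submissions per second (Y). Both values are picked by
    the user and never change while the run is in progress.
    """

    def __init__(self, max_outstanding, jobs_per_sec, clock=time.monotonic):
        self.max_outstanding = max_outstanding
        self.min_interval = 1.0 / jobs_per_sec
        self.clock = clock
        self.outstanding = 0
        self.last_submit = None

    def try_submit(self):
        """Return True and count the job if both limits allow it now."""
        now = self.clock()
        if self.outstanding >= self.max_outstanding:
            return False  # X reached: too many jobs in flight
        if (self.last_submit is not None
                and now - self.last_submit < self.min_interval):
            return False  # Y reached: submitting too fast
        self.outstanding += 1
        self.last_submit = now
        return True

    def job_finished(self):
        # Called for successes and failures alike; the outcome is ignored.
        self.outstanding -= 1
```

Nothing in this sketch reacts to what the site is doing, so the quality of
the throttling rests entirely on the two numbers the user picked.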

>  as there are no options anyhow.  Give me one example, where you have
> only 1 site, set X and Y properly, yet you need site scores as an
> additional throttling mechanism!
> 
> Mihael Hategan wrote: 
> > On Sun, 2007-10-28 at 11:23 -0500, Ioan Raicu wrote:
> >   
> > > I mentioned 2 throttling mechanisms, one is to have X outstanding jobs
> > > at any given time (limits jobs in the queue), and Y jobs/sec
> > > submit rate (limits the rate of submission).  I believe both of these
> > > throttling mechanisms could exist without computing site scores,
> > > assuming the user knows what to set X and Y to.
> > >     
> > 
> > They do exist, but they don't deal with asymmetries between sites. Nor
> > do they deal with changing situations.
> > 
> >   
> > > Ioan
> > > 
> > > Mihael Hategan wrote: 
> > >     
> > > > On Sun, 2007-10-28 at 10:25 -0500, Ioan Raicu wrote:
> > > >   
> > > >       
> > > > > Assuming you have a single site to submit to, then I don't see why you
> > > > > don't want to disable the site scoring altogether?
> > > > >     
> > > > >         
> > > > Because having too many jobs on that one site may still cause problems.
> > > > 
> > > > That said, the algorithm currently there needs some work.
> > > > 
> > > >   
> > > >       
> > > > > Of course you still want throttling, but that is more on the level
> > > > > of X outstanding jobs at any given time (and possibly Y jobs/sec
> > > > > submit rate), so you don't overrun the LRM, but you would not want to
> > > > > lower X to some low value just because some jobs are failing.  Again,
> > > > > once you go to multi-site runs, you need the site scoring to decide
> > > > > among the different sites, but with a single site, I see no drawbacks
> > > > > to disabling the site scoring mechanism.  
> > > > > 
> > > > > Ioan
> > > > > 
> > > > > Ben Clifford wrote: 
> > > > >     
> > > > >         
> > > > > > On Sun, 28 Oct 2007, Ioan Raicu wrote:
> > > > > > 
> > > > > >   
> > > > > >       
> > > > > >           
> > > > > > > they were due to the stale NFS handle error.  I think Mihael outlined in an
> > > > > > > email a while back how to disable the task submission throttling due to a bad
> > > > > > > score, assuming that you have a single site to submit to anyways. 
> > > > > > >     
> > > > > > >         
> > > > > > >             
> > > > > > I know how to disable it. I don't particularly want it running rate free.
> > > > > > 
> > > > > > > What's happening here is that the feedback loop is feeding back too much / too 
> > > > > > > fast for the situation I am experiencing.
> > > > > > 
> > > > > > There's plenty of fun to be had experimenting there; and I suspect there 
> > > > > > will be no One True Rate Controller.
> > > > > > 
> > > > > >   
> > > > > >       
> > > > > >           
> > > > > -- 
> > > > > ============================================
> > > > > Ioan Raicu
> > > > > Ph.D. Student
> > > > > ============================================
> > > > > Distributed Systems Laboratory
> > > > > Computer Science Department
> > > > > University of Chicago
> > > > > 1100 E. 58th Street, Ryerson Hall
> > > > > Chicago, IL 60637
> > > > > ============================================
> > > > > Email: iraicu at cs.uchicago.edu
> > > > > Web:   http://www.cs.uchicago.edu/~iraicu
> > > > >        http://dsl.cs.uchicago.edu/
> > > > > ============================================
> > > > > ============================================
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > >     
> > > > >         
> > > > 
> > > >       
> > >     
> > 
> > 
> >   
> 



