[Swift-devel] Re: [Swft] Q about throttling

Mihael Hategan hategan at mcs.anl.gov
Sat Jun 23 15:48:45 CDT 2007


On Fri, 2007-06-22 at 15:32 -0500, Yong Zhao wrote:
> There is no delay for submitting retry jobs. However, these retry jobs may
> be queued after the 'ready' jobs that swift already processed, which could
> be held by swift, if there is job throttling.

Indeed. 2500 failed jobs may bring the score for the site down a bit.
But then it doesn't look like there was much throttling, since 6800
tasks were submitted in bulk.
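
To make the score argument concrete, here is a toy sketch of how a per-site
score could throttle submissions. This is not the actual Swift/Karajan
scheduler: the class name SiteScore, the deltas, the clamping bounds, and the
exponential mapping from score to job limit are all assumptions chosen only
for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteScore:
    """Toy per-site score. Every constant here is an illustrative assumption."""

    score: float = 0.0
    success_delta: float = 0.25   # hypothetical reward per successful job
    failure_delta: float = -0.5   # hypothetical penalty per failed job
    min_score: float = -10.0      # clamp so the score cannot run away
    max_score: float = 20.0
    base_limit: int = 2           # jobs allowed at score 0
    max_limit: int = 10000        # hard cap on concurrent jobs

    def record(self, succeeded: bool) -> None:
        """Adjust the score after a job finishes, keeping it within bounds."""
        delta = self.success_delta if succeeded else self.failure_delta
        self.score = max(self.min_score, min(self.max_score, self.score + delta))

    def job_limit(self) -> int:
        """Map the score to a concurrent-job limit (doubles per score point)."""
        limit = self.base_limit * 2 ** self.score
        return int(max(1, min(self.max_limit, limit)))


if __name__ == "__main__":
    site = SiteScore()
    for _ in range(4):
        site.record(True)
    print("limit after 4 successes:", site.job_limit())
    for _ in range(2500):
        site.record(False)
    print("limit after 2500 failures:", site.job_limit())
```

In a model like this, a long run of failures pushes the limit to its floor, so
retried jobs would trickle out slowly. A bulk submission of thousands of tasks
would only be possible while the score is high, which is why the 6800-task
burst suggests little throttling was in effect at that point.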

> 
> Yong.
> 
> On Fri, 22 Jun 2007, Mike Wilde wrote:
> 
> > [forgot to hit send on this - my apology if it's no longer relevant]
> >
> > OK, thanks, Yong.
> >
> > Regarding the retry delay, I phrased the question poorly. I meant:
> >
> > Is it possible that the 2500 failing jobs are being retried too slowly? I.e., that
> > Karajan delays each re-run after a failure, and thus can't keep Falkon fed with
> > retried jobs at a high rate?
> >
> > - Mike
> >
> >
> > Yong Zhao wrote, On 6/22/2007 9:45 AM:
> > > The retry mechanism is currently in some karajan script, and we can easily
> > > add some delay there.
> > >
> > > There is not a configuration option to disable pipelining. I did that
> > > manually (modified some code segment) to get a perf chart.
> > >
> > > Yong.
> > >
> > > On Fri, 22 Jun 2007, Mike Wilde wrote:
> > >
> > >> Is there a configurable retry delay after failure?
> > >>
> > >> I think you need to examine the overall workflow dependency structure.
> > >>
> > >> Also, I recall from older perf charts that there's an option to enable/disable
> > >> pipelining.  With pipelining disabled, it seems that Swift will wait for an
> > >> entire dataset/foreach or procedure to finish before starting any tasks that
> > >> depend on the foreach or procedure.
> > >>
> > >> Mihael, can you look at some of these issues when you are back online and rested?
> > >>
> > >> - Mike
> > >>
> > >> Ioan Raicu wrote, On 6/22/2007 9:06 AM:
> > >>> No, I didn't keep track of this info, unless Swift does this through
> > >>> some of its logs.
> > >>>
> > >>> Over the last week, my observations have been the following: Swift is
> > >>> more than capable and willing to send out many tasks as long as they are
> > >>> independent (as can be seen in this graph where probably 6800 tasks got
> > >>> submitted), but thereafter, it had no other burst of task submission,
> > >>> although I believe it could have sent out more.  For example, there were
> > >>> 2500+ tasks that failed in the middle of those 6800 tasks (which were
> > >>> all independent), why were 2500 tasks not resubmitted all at once...
> > >>> they were each about 200 seconds long, so most of them should have
> > >>> certainly shown up in the wait queue.
> > >>>
> > >>> Ioan
> > >>>
> > >>> Ben Clifford wrote:
> > >>>>> kept busy, and the Falkon queue length was relatively at 0... so this means
> > >>>>> that Swift was not submitting fast enough to keep all the executors busy.
> > >>>>>
> > >>>> interesting. though around t=1000 there is a rapid burst of submission
> > >>>> getting the queue length up to about 6000 in a few minutes.
> > >>>>
> > >>>> Do you know what the cpu time usage of the swift submitting JVM was over
> > >>>> that time period?
> > >>>>
> > >>>>
> > >>> --
> > >>> ============================================
> > >>> Ioan Raicu
> > >>> Ph.D. Student
> > >>> ============================================
> > >>> Distributed Systems Laboratory
> > >>> Computer Science Department
> > >>> University of Chicago
> > >>> 1100 E. 58th Street, Ryerson Hall
> > >>> Chicago, IL 60637
> > >>> ============================================
> > >>> Email: iraicu at cs.uchicago.edu
> > >>> Web:   http://www.cs.uchicago.edu/~iraicu
> > >>>        http://dsl.cs.uchicago.edu/
> > >>> ============================================
> > >>>
> > >> --
> > >> Mike Wilde
> > >> Computation Institute, University of Chicago
> > >> Math & Computer Science Division
> > >> Argonne National Laboratory
> > >> Argonne, IL   60439    USA
> > >> tel 630-252-7497 fax 630-252-1997
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>
> > >
> > >
> >
> > --
> > Mike Wilde
> > Computation Institute, University of Chicago
> > Math & Computer Science Division
> > Argonne National Laboratory
> > Argonne, IL   60439    USA
> > tel 630-252-7497 fax 630-252-1997
> >
> 