[Swift-devel] Re: [Swft] Q about throttling
Yong Zhao
yongzh at cs.uchicago.edu
Fri Jun 22 15:32:46 CDT 2007
There is no delay before retry jobs are submitted. However, these retry jobs may
be queued behind the 'ready' jobs that Swift has already processed, so they could
be held back by Swift if job throttling is in effect.
Yong.
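A minimal sketch of the queueing effect described above. It assumes a single FIFO queue of ready jobs, a fixed throttle on how many jobs are dispatched per round, and a failed job being re-enqueued at the tail with no delay. The function `dispatch_rounds` and its parameters are hypothetical; this is not Swift's or Karajan's actual scheduler code.

```python
from collections import deque


def dispatch_rounds(ready_jobs, fails_once, throttle):
    """Simulate a throttled FIFO job queue with immediate retries.

    Hypothetical model, not Swift/Karajan code: each round dispatches up to
    `throttle` jobs from the head of the queue; every job in `fails_once`
    fails on its first attempt and its retry is appended to the tail of the
    queue with no delay. Returns one list of (job, attempt) pairs per round.
    """
    if throttle < 1:
        raise ValueError("throttle must be at least 1")
    queue = deque((job, 1) for job in ready_jobs)
    rounds = []
    while queue:
        # Take at most `throttle` jobs from the head of the queue.
        batch = [queue.popleft() for _ in range(min(throttle, len(queue)))]
        rounds.append(batch)
        for job, attempt in batch:
            if attempt == 1 and job in fails_once:
                # No retry delay, but the retry lands behind every job
                # that was already waiting in the queue.
                queue.append((job, attempt + 1))
    return rounds


rounds = dispatch_rounds(["a", "b", "c", "d", "e", "f"], {"a", "b"}, throttle=2)
for i, batch in enumerate(rounds, start=1):
    print(i, batch)
```

Although the retries of `a` and `b` are enqueued immediately, they are dispatched only in round 4, after every job that was already ready. With thousands of ready jobs ahead of them and a throttle in place, retries would trickle out slowly, which is one possible explanation for the behavior discussed below.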
On Fri, 22 Jun 2007, Mike Wilde wrote:
> [forgot to hit send on this - my apologies if it's no longer relevant]
>
> OK, thanks, Yong.
>
> Regarding the retry delay, I phrased the question poorly. I meant:
>
> Is it possible that the 2500 failing jobs are being retried too slowly? I.e., that
> Karajan delays each re-run after a failure, and thus can't keep Falkon fed with
> retried jobs at a high rate?
>
> - Mike
>
>
> Yong Zhao wrote, On 6/22/2007 9:45 AM:
> > The retry mechanism is currently in some Karajan script, and we can easily
> > add some delay there.
> >
> > There is no configuration option to disable pipelining. I did that
> > manually (by modifying a code segment) to get a perf chart.
> >
> > Yong.
> >
> > On Fri, 22 Jun 2007, Mike Wilde wrote:
> >
> >> Is there a configurable retry delay after failure?
> >>
> >> I think you need to examine the overall workflow dependency structure.
> >>
> >> Also, I recall from older perf charts that there's an option to enable/disable
> >> pipelining. With pipelining disabled, it seems that Swift will wait for an
> >> entire dataset/foreach or procedure to finish before starting any tasks that
> >> depend on the foreach or procedure.
> >>
> >> Mihael, can you look at some of these issues when you are back online and rested?
> >>
> >> - Mike
> >>
> >> Ioan Raicu wrote, On 6/22/2007 9:06 AM:
> >>> No, I didn't keep track of this info, unless Swift does this through
> >>> some of its logs.
> >>>
> >>> Over the last week, my observations have been the following: Swift is
> >>> more than capable and willing to send out many tasks as long as they are
> >>> independent (as can be seen in this graph, where probably 6800 tasks got
> >>> submitted), but after that it had no other burst of task submission,
> >>> although I believe it could have sent out more. For example, there were
> >>> 2500+ tasks that failed in the middle of those 6800 tasks (which were
> >>> all independent); why weren't those 2500 tasks resubmitted all at once?
> >>> They were each about 200 seconds long, so most of them should
> >>> certainly have shown up in the wait queue.
> >>>
> >>> Ioan
> >>>
> >>> Ben Clifford wrote:
> >>>>> kept busy, and the Falkon queue length stayed close to 0... so this means
> >>>>> that Swift was not submitting fast enough to keep all the executors busy.
> >>>>>
> >>>> Interesting, though around t=1000 there is a rapid burst of submissions,
> >>>> getting the queue length up to about 6000 in a few minutes.
> >>>>
> >>>> Do you know what the CPU time usage of the Swift submitting JVM was over
> >>>> that time period?
> >>>>
> >>>>
> >>> --
> >>> ============================================
> >>> Ioan Raicu
> >>> Ph.D. Student
> >>> ============================================
> >>> Distributed Systems Laboratory
> >>> Computer Science Department
> >>> University of Chicago
> >>> 1100 E. 58th Street, Ryerson Hall
> >>> Chicago, IL 60637
> >>> ============================================
> >>> Email: iraicu at cs.uchicago.edu
> >>> Web: http://www.cs.uchicago.edu/~iraicu
> >>> http://dsl.cs.uchicago.edu/
> >>> ============================================
> >>>
> >> --
> >> Mike Wilde
> >> Computation Institute, University of Chicago
> >> Math & Computer Science Division
> >> Argonne National Laboratory
> >> Argonne, IL 60439 USA
> >> tel 630-252-7497 fax 630-252-1997
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>
> >
> >
>
>
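Mike's point about pipelining (with it disabled, Swift seems to wait for an entire foreach or procedure to finish before starting any task that depends on it) can be illustrated with a minimal sketch. It assumes unlimited executors, zero dispatch overhead, and that each dependent task B[i] needs only the output of A[i]. The function `dependent_start_times` is hypothetical and is not taken from Swift.

```python
def dependent_start_times(upstream_finish, pipelined):
    """Earliest start time of each dependent task B[i], where B[i] needs only A[i].

    Hypothetical model, not Swift code: unlimited executors, zero dispatch
    overhead. With pipelining, B[i] starts when A[i] finishes; without it,
    every B[i] waits for the whole foreach over A to finish.
    """
    if not upstream_finish:
        return []
    if pipelined:
        return list(upstream_finish)
    barrier = max(upstream_finish)  # the slowest A task gates the whole stage
    return [barrier] * len(upstream_finish)


finish = [10, 200, 30]
print(dependent_start_times(finish, pipelined=True))   # [10, 200, 30]
print(dependent_start_times(finish, pipelined=False))  # [200, 200, 200]
```

Without pipelining, a single slow upstream task delays every dependent task, which would show up as a gap in submissions until the slowest task of the stage completes.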