[Swift-devel] submitting jobs to the queue
Mihael Hategan
hategan at mcs.anl.gov
Fri Mar 9 11:38:42 CST 2007
The other thing that could be done is to give each site a different
initial score.
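
A minimal sketch of how per-site initial scores could interact with the probing behavior described further down in the quoted thread. Everything in it (the Site class, the limit rule, the step sizes) is a hypothetical illustration and not Swift's actual scheduler code:

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    score: float = 1.0   # initial score; the default of 1.0 is an assumption
    running: int = 0     # jobs currently submitted to this site

    def limit(self, base: int = 2) -> int:
        # Hypothetical rule: allowed concurrent jobs grow with the score.
        return max(1, int(base * self.score))

    def job_done(self, ok: bool) -> None:
        self.running -= 1
        if ok:
            self.score += 0.5                        # reward a completed job
        else:
            self.score = max(0.1, self.score / 2)    # penalize a failed job


def pick_site(sites):
    """Return the highest-scoring site that still has free capacity, or None."""
    free = [s for s in sites if s.running < s.limit()]
    return max(free, key=lambda s: s.score) if free else None


def submit(sites, n_jobs):
    """Submit as many of n_jobs as the limits allow; return how many stay queued."""
    queued = n_jobs
    while queued:
        site = pick_site(sites)
        if site is None:
            break            # every site is at its limit: wait for feedback
        site.running += 1
        queued -= 1
    return queued
```

With sites = [Site("fast", score=4.0), Site("slow")], the site known to be fast is allowed more jobs in the first round, while the unknown site is only probed with a couple of jobs until it reports results.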
On Fri, 2007-03-09 at 11:35 -0600, Mihael Hategan wrote:
> On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote:
> > I have been thinking that the system should be smarter in dealing with
> > such issues, without relying too much on a user's manual intervention. For
> > the job submission rate or the transfer rate, if we observe an abnormality
> > (for instance, FTP errors due to a high transfer rate), the system should
> > be able to slow down automatically. I am not quite sure how to detect that
> > jobs are going through a scheduler quickly, but if that is the case, the
> > submission rate should be increased automatically.
>
> It is increased automatically. But the problem is at the start. Do you
> send many jobs to a site without knowing anything about it? The
> site-selector that Luiz worked on would split the jobs equally to sites
> on the first round. That may be bad if you have highly asymmetrical
> sites.
>
> >
> > Yong.
> >
> > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote:
> >
> > > Knob means "while in progress"
> > > Is that doable? (Probably extending your rudimentary debugger would do it.)
> > > How about the following extension: can we easily create hooks
> > > (web services) into a running Swift engine that would allow this
> > > manipulation with an external client (the knob driver)?
> > > Having more interactivity with a running workflow is something that
> > > might be appealing for long-running or never-ending workflows, and
> > > would differentiate us from others in a nice way. You would not
> > > believe how many people are working on workflows: everybody and their
> > > brother at the OSG meeting had some offering labeled "workflow". (I'm
> > > exaggerating a bit here)
> > >
> > > Tibi
> > >
> > > On 3/9/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > Yes, although we need to come up with a nicer way to do it.
> > > > > In libexec/scheduler.xml, change
> > > > > <property name="jobThrottle" value="4"/>
> > > > > so that value is a large number (an actual number, not that phrase).
> > > >
> > > > Mihael
> > > >
> > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote:
> > > > > Hi, Mihael:
> > > > >
> > > > > > Is it possible to remove this feature in the one-site case? For example,
> > > > > > the queue is now almost empty on TG, but I have to wait 1.5 hours for
> > > > > > the rest of my jobs to be submitted (that's the average running time of my
> > > > > > job), and the queue might be full by that time...
> > > > >
> > > > > Nika
> > > > >
> > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote:
> > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be
> > > > > > > > submitted to the remote host simultaneously. Swift submits just 26 jobs
> > > > > > > > at first. I checked that several times, and it's always 26 jobs. Then,
> > > > > > > > when at least one job out of those 26 is finished, Swift goes ahead and
> > > > > > > > submits the rest (all of those left, 42 in my case).
> > > > > > > Is it a bug or a feature?
> > > > > >
> > > > > >Feature. Although it should probably be tamed down in the one site case.
> > > > > > >Each site has a score that changes based on how it behaves. If a site
> > > > > > >completes jobs ok, it gets a higher score over time. If jobs fail on it,
> > > > > > >it gets a lower score.
> > > > > > >
> > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow.
> > > > > > >With no scores and no limitations, half of the jobs would go to one, and
> > > > > > >half to the other. The workflow finishes only when the slow site finishes
> > > > > > >its half of the jobs.
> > > > > >What happens however, is that Swift limits the number of initial jobs,
> > > > > >and does "probing". This allows it to infer some stuff about the sites
> > > > > >by the time it gets to submit lots of jobs. It should yield better
> > > > > >performance on larger workflows with imbalanced sites, which is, I'm
> > > > > >guessing, our main scenario.
> > > > > >
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Swift-devel mailing list
> > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Tiberiu (Tibi) Stef-Praun, PhD
> > > Research Staff, Computation Institute
> > > 5640 S. Ellis Ave, #405
> > > University of Chicago
> > > http://www-unix.mcs.anl.gov/~tiberius/
> > >
> >
>
>
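
Yong's suggestion quoted above (slow down when errors appear, speed up when jobs go through) resembles an additive-increase/multiplicative-decrease policy. The sketch below illustrates that idea only; the class name, the step sizes, and the bounds are assumptions, and the starting value of 4 merely echoes the jobThrottle setting mentioned in the thread without claiming to match its semantics in Swift:

```python
class AdaptiveThrottle:
    """Hypothetical AIMD-style controller for a submission or transfer rate."""

    def __init__(self, rate=4.0, floor=1.0, ceiling=64.0):
        self.rate = rate          # current allowed rate (units are an assumption)
        self.floor = floor        # never drop below this
        self.ceiling = ceiling    # never climb above this

    def on_success(self):
        # Additive increase: a job or transfer went through, so probe a bit higher.
        self.rate = min(self.ceiling, self.rate + 1.0)

    def on_error(self):
        # Multiplicative decrease: an error (e.g. an FTP failure) halves the rate.
        self.rate = max(self.floor, self.rate / 2.0)
```

A caller would invoke on_success() or on_error() once per completed job or transfer and read rate before deciding how many new submissions to allow.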