[Swift-devel] submitting jobs to the queue

Mihael Hategan hategan at mcs.anl.gov
Fri Mar 9 11:46:42 CST 2007


Yeah, I think that's it: the ability to control the initial score, and
possibly to automate that a little by considering a total score that gets
divided by the number of sites. That would limit the number of jobs sent
initially across all the sites (and this total could be a much larger
number). In the one-site case, that larger number would belong exclusively
to the one site.
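
A minimal Python sketch of the proposal above. It assumes that a single user-supplied total (the hypothetical `total_budget`) is divided evenly among the sites to cap how many jobs each site receives before any scores exist. `initial_job_limits` is a hypothetical helper for illustration only and is not part of Swift or its `scheduler.xml` settings.

```python
from typing import Dict, List


def initial_job_limits(total_budget: int, sites: List[str]) -> Dict[str, int]:
    """Split a user-supplied initial job budget evenly across sites.

    Hypothetical helper: `total_budget` stands in for the "total score"
    proposed above and is not an existing Swift setting.
    """
    if not sites:
        return {}
    share, remainder = divmod(total_budget, len(sites))
    # Give the leftover jobs to the first `remainder` sites so that the
    # per-site limits always add up to the total budget.
    return {
        site: share + (1 if i < remainder else 0)
        for i, site in enumerate(sites)
    }
```

With a single site, `initial_job_limits(100, ["TG"])` hands the whole budget of 100 to that site. With three sites the same budget becomes 34, 33 and 33, so the initial round stays bounded no matter how many sites are configured.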

On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote:
> Right, for the first batches, a user-supplied hint would be more appropriate.
> 
> Yong.
> 
> On Fri, 9 Mar 2007, Mihael Hategan wrote:
> 
> > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote:
> > > I have been thinking that the system should be smarter in dealing with
> > > such issues, without relying too much on manual intervention by the user. For
> > > the job submission rate or transfer rate, if we observe abnormalities (for
> > > instance, FTP errors due to a high transfer rate), the system should be able
> > > to slow down automatically. I am not quite sure how to detect that jobs are
> > > going through to a scheduler quickly, but if that is the case, the
> > > submission rate should be increased automatically.
> >
> > It is increased automatically. The problem is at the start: do you
> > send many jobs to a site without knowing anything about it? The
> > site-selector that Luiz worked on would split the jobs equally among the
> > sites on the first round. That may be bad if you have highly asymmetrical
> > sites.
> >
> > >
> > > Yong.
> > >
> > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote:
> > >
> > > > By "knob" I mean something adjustable while the workflow is in progress.
> > > > Is that doable? (Extending your rudimentary debugger would probably do it.)
> > > > How about the following extension: can we easily create hooks
> > > > (web services) into a running Swift engine that would allow this
> > > > manipulation from an external client (the knob driver)?
> > > > Having more interactivity with a running workflow is something that
> > > > might be appealing for long-running or never-ending workflows, and
> > > > would differentiate us from others in a nice way. You would not
> > > > believe how many people are working on workflows: everybody and their
> > > > brother at the OSG meeting had some offering labeled "workflow". (I'm
> > > > exaggerating a bit here)
> > > >
> > > > Tibi
> > > >
> > > > On 3/9/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > Yes, although we need to come up with a nicer way to do it.
> > > > > In libexec/scheduler.xml, change <property name="jobThrottle"
> > > > > value="4"/> to value="large number" (not literally).
> > > > >
> > > > > Mihael
> > > > >
> > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote:
> > > > > > Hi, Mihael:
> > > > > >
> > > > > > Is it possible to remove this feature in the one-site case? For example,
> > > > > > the queue on TG is almost empty right now, but I have to wait 1.5 hours for
> > > > > > the rest of my jobs to be submitted (that's the average running time of my
> > > > > > jobs), and the queue might be full by then...
> > > > > >
> > > > > > Nika
> > > > > >
> > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote:
> > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I've noticed some very strange behavior. For example, I have 68 jobs to be
> > > > > > > > submitted to the remote host simultaneously. At first, Swift submits just
> > > > > > > > 26 jobs. I checked this several times, and it's always 26 jobs. Then, when
> > > > > > > > at least one of those 26 jobs has finished, Swift goes ahead and submits
> > > > > > > > all of the rest (42 in my case).
> > > > > > > > Is it a bug or a feature?
> > > > > > >
> > > > > > >Feature, although it should probably be toned down in the one-site case.
> > > > > > >Each site has a score that changes based on how it behaves. If a site
> > > > > > >completes jobs successfully, its score rises over time. If jobs fail on
> > > > > > >it, its score drops.
> > > > > > >
> > > > > > >Now, let's consider the following scenario: two sites, one fast and one
> > > > > > >slow. With no scores and no limitations, half of the jobs would go to
> > > > > > >one and half to the other, so the workflow would finish only when the
> > > > > > >slow site finishes its half of the jobs.
> > > > > > >What happens instead is that Swift limits the number of initial jobs
> > > > > > >and does "probing". This allows it to infer something about the sites
> > > > > > >by the time it gets to submit lots of jobs. It should yield better
> > > > > > >performance on larger workflows with imbalanced sites, which is, I'm
> > > > > > >guessing, our main scenario.
> > > > > > >
> > > > > > > >
> > > > > > > > Nika
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Swift-devel mailing list
> > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Tiberiu (Tibi) Stef-Praun, PhD
> > > > Research Staff, Computation Institute
> > > > 5640 S. Ellis Ave, #405
> > > > University of Chicago
> > > > http://www-unix.mcs.anl.gov/~tiberius/
> > > >
> > >
> >
> >
> 
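
The score-based probing described in the 2007-03-07 message can be illustrated with a small Python sketch. The thread does not give Swift's actual scoring formula, so everything here is an assumption for illustration: the hypothetical `Site` class starts every site at the same low score, maps the score to a concurrent-job limit, doubles the score on a completed job and halves it (with a floor) on a failure. The hypothetical `submit` function fills free slots on higher-scoring sites first. This is not Swift's implementation, only a sketch of the general idea of probing with a few jobs and then widening the limit for sites that behave well.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    name: str
    score: float = 1.0  # hypothetical starting score, same for every site
    running: int = 0

    def limit(self) -> int:
        # Hypothetical mapping from score to allowed concurrent jobs;
        # a low score keeps the site in "probing" mode with few jobs.
        return max(1, int(self.score))

    def on_success(self) -> None:
        self.running -= 1
        self.score *= 2  # reward a completed job (assumed factor)

    def on_failure(self) -> None:
        self.running -= 1
        self.score = max(1.0, self.score / 2)  # penalize a failure (assumed)


def submit(sites: List[Site], pending: List[int]) -> List[Tuple[int, str]]:
    """Hand out pending jobs to sites with free slots, preferring higher scores.

    Returns the (job, site name) pairs placed in this round.
    """
    placed = []
    for site in sorted(sites, key=lambda s: s.score, reverse=True):
        while pending and site.running < site.limit():
            job = pending.pop(0)
            site.running += 1
            placed.append((job, site.name))
    return placed
```

In a first round with two fresh sites, each receives a single probe job. Once the fast site reports a success, its score doubles and the next round gives it two jobs while the slow site, still busy, receives none. The effect is that later and larger batches drift toward the site that has proven itself, which is the behavior the thread argues pays off for imbalanced sites.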



