[Swift-devel] submitting jobs to the queue

Veronika V. Nefedova nefedova at mcs.anl.gov
Fri Mar 9 12:08:19 CST 2007


So how is the initial score determined? By the waiting time in the queue?
Or does it send a probing job (or run qstat) to check queue availability?
If we have two sites, one with an empty queue and the other with a full
queue, how will job submission to the two sites be handled?
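
To make the question concrete, here is a minimal Python sketch of the scheme
as I understand it from the quoted messages below: a total initial score
divided among the sites, a per-site score that rises when jobs complete and
falls when they fail, and a cap on outstanding jobs derived from the score.
All names (Site, make_sites, submit, record_result), the one-job-per-score-point
cap, and the step of 1.0 are my assumptions for illustration, not Swift's
actual scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    score: float
    running: int = 0

    def capacity(self) -> int:
        # Assumed rule: a site may hold about `score` jobs at once, never fewer than one.
        return max(1, int(self.score))


def make_sites(names, total_initial_score):
    # Divide one total initial score equally among all sites.
    share = total_initial_score / len(names)
    return [Site(name, share) for name in names]


def record_result(site, ok, step=1.0):
    # A finished job frees a slot; success raises the score, failure lowers it.
    site.running -= 1
    delta = step if ok else -step
    site.score = max(1.0, site.score + delta)


def submit(sites, pending):
    # Hand out up to `pending` jobs, best-scoring site first, within each cap.
    sent = {site.name: 0 for site in sites}
    for site in sorted(sites, key=lambda s: s.score, reverse=True):
        n = min(site.capacity() - site.running, pending)
        if n > 0:
            site.running += n
            sent[site.name] = n
            pending -= n
    return sent


if __name__ == "__main__":
    two = make_sites(["empty_queue", "full_queue"], 68.0)
    print(submit(two, 68))
```

Under these assumptions a single site with a total score of 68 gets all 68
jobs at once, while two sites get 34 each no matter how busy their queues
are, which is exactly the case I am asking about.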

Nika

At 11:46 AM 3/9/2007, Mihael Hategan wrote:
>Yeah. I think that's it. The ability to control the initial score. And
>possibly automate that a little by considering a total score that gets
>divided by the number of sites. That would limit the number of jobs sent
>initially to all the sites (and this could be a much larger number). In
>the one site case, that larger number would belong exclusively to the
>one site.
>
>On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote:
> > Right, for the first batches, a user-supplied hint would be more appropriate.
> >
> > Yong.
> >
> > On Fri, 9 Mar 2007, Mihael Hategan wrote:
> >
> > > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote:
> > > > > I have been thinking that the system should be smarter in dealing with
> > > > > such issues, without relying too much on a user's manual intervention.
> > > > > For job submission rate or transfer rate, if we observe an abnormality,
> > > > > for instance FTP errors due to a high transfer rate, the system should
> > > > > be able to slow down automatically. I am not quite sure how to detect
> > > > > that jobs go through quickly to a scheduler, but if that is the case,
> > > > > the submission rate should be increased automatically.
> > >
> > > It is increased automatically. But the problem is at the start. Do you
> > > send many jobs to a site without knowing anything about it? The
> > > site-selector that Luiz worked on would split the jobs equally to sites
> > > on the first round. That may be bad if you have highly asymmetrical
> > > sites.
> > >
> > > >
> > > > Yong.
> > > >
> > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote:
> > > >
> > > > > Knob means "while in progress"
> > > > > > Is that doable? (Probably extending your rudimentary debugger would
> > > > > > do it.)
> > > > > > How about the following extension: can we easily create hooks
> > > > > (webservices) into a running swift engine, that would allow this
> > > > > manipulation with an external client (the knob driver) ?
> > > > > Having more interactivity with a running workflow is something that
> > > > > might be appealing for long-running or never-ending workflows, and
> > > > > would differentiate us from others in a nice way. You would not
> > > > > believe how many people are working on workflows: everybody and their
> > > > > brother at the OSG meeting had some offering labeled "workflow". (I'm
> > > > > exaggerating a bit here)
> > > > >
> > > > > Tibi
> > > > >
> > > > > On 3/9/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > Yes, although we need to come up with a nicer way to do it.
> > > > > > In libexec/scheduler.xml, change <property name="jobThrottle"
> > > > > > value="4"/> to value="large number" (not literally).
> > > > > >
> > > > > > Mihael
> > > > > >
> > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote:
> > > > > > > Hi, Mihael:
> > > > > > >
> > > > > > > > Is it possible to remove this feature in the one site case? For
> > > > > > > > example, the queue is now almost empty on TG, but I have to wait
> > > > > > > > 1.5 hours for the rest of my jobs to be submitted (that's the
> > > > > > > > average running time of my job), and the queue might be full by
> > > > > > > > that time...
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote:
> > > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > I've noticed one very strange behavior. For example, I have 68
> > > > > > > > > > jobs to be submitted to the remote host simultaneously. Swift
> > > > > > > > > > at first submits just 26 jobs. I checked that several times:
> > > > > > > > > > it's always 26 jobs. Then, when at least one job out of those
> > > > > > > > > > 26 is finished, Swift goes ahead and submits the rest (all of
> > > > > > > > > > those left, 42 in my case).
> > > > > > > > > > Is it a bug or a feature?
> > > > > > > >
> > > > > > > > >Feature. Although it should probably be tamed down in the one site case.
> > > > > > > > >Each site has a score that changes based on how it behaves. If a site
> > > > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it,
> > > > > > > > >it gets a lower score.
> > > > > > > >
> > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow.
> > > > > > > > >With no scores and no limitations, half of the jobs would go to one, and
> > > > > > > > >half to the other. The workflow finishes when the slow site finishes
> > > > > > > > >half the jobs.
> > > > > > > > >What happens however, is that Swift limits the number of initial jobs,
> > > > > > > > >and does "probing". This allows it to infer some stuff about the sites
> > > > > > > > >by the time it gets to submit lots of jobs. It should yield better
> > > > > > > > >performance on larger workflows with imbalanced sites, which is, I'm
> > > > > > > > >guessing, our main scenario.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Nika
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Swift-devel mailing list
> > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Tiberiu (Tibi) Stef-Praun, PhD
> > > > > Research Staff, Computation Institute
> > > > > 5640 S. Ellis Ave, #405
> > > > > University of Chicago
> > > > > http://www-unix.mcs.anl.gov/~tiberius/
> > > > >
> > > >
> > >
> > >
> >
>