[Swift-devel] submitting jobs to the queue

Mihael Hategan hategan at mcs.anl.gov
Fri Mar 9 12:32:24 CST 2007


On Fri, 2007-03-09 at 12:08 -0600, Veronika V. Nefedova wrote:
> So how is the initial score determined?

The initial score is currently 1.
The limit to the number of jobs is score*jobThrottle + 2.
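
As a rough illustration of that formula, here is a minimal Python sketch.
The function name and the example inputs are assumptions made for this
note. Only the formula itself and the jobThrottle value of 4 (from the
scheduler.xml snippet quoted further down) come from this thread, and the
thread does not say which values were in effect when 26 jobs were observed.

```python
def initial_job_limit(score: int, job_throttle: int) -> int:
    """Job limit for one site, per the formula above: score * jobThrottle + 2.

    The function name is hypothetical; it is not taken from the Swift sources.
    """
    return score * job_throttle + 2


# Initial score of 1 with the jobThrottle of 4 quoted from scheduler.xml.
print(initial_job_limit(1, 4))
```

Raising either the score or jobThrottle raises the limit linearly, which is
why setting jobThrottle to a large number effectively removes the cap.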

> By the waiting time in the queue?
> Or does it send a probing job (or qstat) to check the queue availability?

That is a possibility, but it may require some assumptions about the
exact queuing system that is installed. I heard that MDS should provide
this information, but I have not yet been able to find any details on
that. Perhaps somebody has some pointers.

> If we have two sites, one with an empty queue and another with a full
> queue, how will the submission of jobs to the two sites be handled?

Initially both will get the same score and the same number of jobs. When
a job completes successfully, the score for that site is increased. In
the above case, the site with the empty queue will finish its jobs, which
will increase its score and cause it to get more jobs, while the site
with the full queue will still have only its initial jobs.
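
The feedback loop described above can be sketched as a toy simulation in
Python. This is not the Swift scheduler. The Site class, the simulate
function, the round-based model, and the score increment of 1 per completed
job are all assumptions made for illustration. Only the initial score of 1,
the limit formula, and the jobThrottle value of 4 come from this thread.

```python
from dataclasses import dataclass

JOB_THROTTLE = 4       # value quoted from scheduler.xml in this thread
SCORE_INCREMENT = 1.0  # hypothetical: the real increment is not given here


@dataclass
class Site:
    name: str
    finishes_jobs: bool  # False models a site whose queue is full
    score: float = 1.0   # initial score, as stated above
    running: int = 0
    submitted: int = 0

    def limit(self) -> int:
        # Limit formula from this thread: score * jobThrottle + 2.
        return int(self.score * JOB_THROTTLE + 2)


def simulate(sites, total_jobs, rounds):
    """Submit jobs in rounds and return the jobs submitted per site."""
    remaining = total_jobs
    for _ in range(rounds):
        # Fill every site up to its current limit.
        for site in sites:
            n = min(max(site.limit() - site.running, 0), remaining)
            site.running += n
            site.submitted += n
            remaining -= n
        # A site with an empty queue finishes its jobs and gains score.
        for site in sites:
            if site.finishes_jobs:
                site.score += SCORE_INCREMENT * site.running
                site.running = 0
    return {site.name: site.submitted for site in sites}


sites = [Site("empty", finishes_jobs=True), Site("full", finishes_jobs=False)]
result = simulate(sites, total_jobs=100, rounds=3)
print(result)
```

Under these assumptions both sites receive the same small first batch, and
every later job goes to the site that keeps completing work, while the
stalled site never gets more than its initial allotment.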

> 
> Nika
> 
> At 11:46 AM 3/9/2007, Mihael Hategan wrote:
> >Yeah. I think that's it. The ability to control the initial score. And
> >possibly automate that a little by considering a total score that gets
> >divided by the number of sites. That would limit the number of jobs sent
> >initially to all the sites (and this could be a much larger number). In
> >the one site case, that larger number would belong exclusively to the
> >one site.
> >
> >On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote:
> > > right, for first batches, a user supplied hint would be more appropriate.
> > >
> > > Yong.
> > >
> > > On Fri, 9 Mar 2007, Mihael Hategan wrote:
> > >
> > > > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote:
> > > > > > I have been thinking that the system should be smarter in dealing with
> > > > > > such issues, without relying too much on a user's manual intervention.
> > > > > > For job submission rate, or transfer rate, if we observe abnormality,
> > > > > > for instance: ftp errors due to high transfer rate, the system should
> > > > > > be able to slow down automatically. I am not quite sure about how to
> > > > > > detect that jobs go through quickly to a scheduler, but if that is the
> > > > > > case, the submission rate should be increased automatically.
> > > >
> > > > It is increased automatically. But the problem is at the start. Do you
> > > > send many jobs to a site without knowing anything about it? The
> > > > site-selector that Luiz worked on would split the jobs equally to sites
> > > > on the first round. That may be bad if you have highly asymmetrical
> > > > sites.
> > > >
> > > > >
> > > > > Yong.
> > > > >
> > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote:
> > > > >
> > > > > > Knob means "while in progress"
> > > > > > > Is that doable? (Probably extending your rudimentary debugger
> > > > > > > would do it).
> > > > > > How about the  following extension: can we easily create hooks
> > > > > > (webservices) into a running swift engine, that would allow this
> > > > > > manipulation with an external client (the knob driver) ?
> > > > > > Having more interactivity with a running workflow is something that
> > > > > > might be appealing for long-running or never-ending workflows, and
> > > > > > would differentiate us from others in a nice way. You would not
> > > > > > believe how many people are working on workflows: everybody and their
> > > > > > brother at the OSG meeting had some offering labeled "workflow". (I'm
> > > > > > exaggerating a bit here)
> > > > > >
> > > > > > Tibi
> > > > > >
> > > > > > On 3/9/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > > Yes, although we need to come up with a nicer way to do it.
> > > > > > > In libexec/scheduler.xml, change <property name="jobThrottle"
> > > > > > > value="4"/> to value="large number" (not literally).
> > > > > > >
> > > > > > > Mihael
> > > > > > >
> > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote:
> > > > > > > > Hi, Mihael:
> > > > > > > >
> > > > > > > > > Is it possible to remove this feature in the one site case? For
> > > > > > > > > example, the queue is now almost empty on TG, but I have to wait
> > > > > > > > > 1.5 hours for the rest of my jobs to be submitted (that's the
> > > > > > > > > average running time of my job), and the queue might be full by
> > > > > > > > > that time...
> > > > > > > >
> > > > > > > > Nika
> > > > > > > >
> > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote:
> > > > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68
> > > > > > > > > > > jobs to be submitted to the remote host simultaneously. Swift
> > > > > > > > > > > at first submits just 26 jobs. I checked that several times:
> > > > > > > > > > > it's always 26 jobs. Then, when at least one job out of those
> > > > > > > > > > > 26 has finished, Swift goes ahead and submits the rest (all of
> > > > > > > > > > > those left, 42 in my case).
> > > > > > > > > > > Is it a bug or a feature?
> > > > > > > > >
> > > > > > > > > >Feature. Although it should probably be tamed down in the one
> > > > > > > > > >site case. Each site has a score that changes based on how it
> > > > > > > > > >behaves. If a site completes jobs ok, it gets a higher score in
> > > > > > > > > >time. If jobs fail on it, it gets a lower score.
> > > > > > > > >
> > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one
> > > > > > > > > >slow. With no scores and no limitations, half of the jobs would go
> > > > > > > > > >to one, and half to the other. The workflow finishes when the slow
> > > > > > > > > >site finishes half the jobs.
> > > > > > > > > >What happens however, is that Swift limits the number of initial
> > > > > > > > > >jobs, and does "probing". This allows it to infer some stuff about
> > > > > > > > > >the sites by the time it gets to submit lots of jobs. It should
> > > > > > > > > >yield better performance on larger workflows with imbalanced
> > > > > > > > > >sites, which is, I'm guessing, our main scenario.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Nika
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Tiberiu (Tibi) Stef-Praun, PhD
> > > > > > Research Staff, Computation Institute
> > > > > > 5640 S. Ellis Ave, #405
> > > > > > University of Chicago
> > > > > > http://www-unix.mcs.anl.gov/~tiberius/
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> 
> 



