[Swift-devel] submitting jobs to the queue

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 7 23:03:08 CST 2007


On Wed, 2007-03-07 at 22:49 -0600, Ian Foster wrote:
> I think that all of these issues will go away soon, when we start
> using the dynamic provisioning code that Ioan is working on.

In theory, yes. Practice has a tendency to come up with new problems
though. I would not bet all my money on something complex that's not yet
there.

Mihael

>  So I wonder if they are worth worrying about too much?
> 
> Ian.
> 
> Mihael Hategan wrote: 
> > So this limit would have to be a per-site limit.
> > There is no such thing right now. You can limit the total number of
> > concurrent jobs, but it's not exposed through swift.properties.
> > 
> > In libexec/scheduler.xml, you can try adding the following thing inside
> > <scheduler>...</scheduler>:
> > 
> > <property name="maxSimultaneousJobs" value="384"/>
> > 
> > Mihael
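
[Editorial sketch, not part of the original message.] To illustrate what a cap such as maxSimultaneousJobs is meant to achieve, here is a minimal simulation in Python. It assumes a simple model in which jobs finish one at a time in submission order. The function name run_throttled is hypothetical and this is not Swift's scheduler code; the numbers (80 x 244 jobs, a cap of 384) come from the thread.

```python
from collections import deque


def run_throttled(num_jobs, limit):
    """Simulate submitting num_jobs while keeping at most `limit` in flight.

    Hypothetical model: the oldest active job always finishes next.
    Returns (peak number of simultaneous jobs, number of completed jobs).
    """
    pending = deque(range(num_jobs))
    active = deque()
    peak = 0
    completed = 0
    while pending or active:
        # Submit until the cap is reached or nothing is left to submit.
        while pending and len(active) < limit:
            active.append(pending.popleft())
        peak = max(peak, len(active))
        # Pretend the oldest job finishes, which frees one slot.
        active.popleft()
        completed += 1
    return peak, completed


if __name__ == "__main__":
    # Stage four of the workflow in the thread: 80 jobs for each of 244 molecules.
    peak, completed = run_throttled(80 * 244, 384)
    print(peak, completed)
```

Under this model the queue never holds more than 384 jobs, while all 19520 jobs still complete eventually.
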
> > 
> > On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote:
> >   
> > > Right. TeraGrid at NCSA has a limit of 384 queued or running jobs per user.
> > > 
> > > Nika
> > > 
> > > At 05:19 PM 3/7/2007, Mihael Hategan wrote:
> > >     
> > > > On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote:
> > > >       
> > > > > OK, here is another question.
> > > > > Teragrid allows the user to have 385 jobs in a queue. If I run my complete
> > > > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e.
> > > > > close to 20K). How do I limit the number of jobs submitted to the queue
> > > > > to 385? I remember that condor had a specific parameter to
> > > > > condor_submit that was managing exactly that...
> > > > >         
> > > > Is this 385 jobs per site?
> > > > 
> > > >       
> > > > > Nika
> > > > > 
> > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote:
> > > > >         
> > > > > > On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
> > > > > >           
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be
> > > > > > > submitted to the remote host simultaneously. Swift submits at first just 26
> > > > > > > jobs. I checked that several times - it's always 26 jobs. Then, when at
> > > > > > > least one job out of those 26 is finished - Swift goes ahead and submits
> > > > > > > the rest (all of those left - 42 in my case).
> > > > > > > Is it a bug or a feature?
> > > > > > Feature. Although it should probably be toned down in the single-site case.
> > > > > > Each site has a score that changes based on how it behaves. If a site
> > > > > > completes jobs ok, it gets a higher score in time. If jobs fail on it,
> > > > > > it gets a lower score.
> > > > > > 
> > > > > > Now, let's consider the following scenario: 2 sites, one fast one slow.
> > > > > > With no scores and no limitations, half of the jobs would go to one, and
> > > > > > half to the other. The workflow finishes only when the slow site finishes
> > > > > > its half of the jobs.
> > > > > > What happens however, is that Swift limits the number of initial jobs,
> > > > > > and does "probing". This allows it to infer some stuff about the sites
> > > > > > by the time it gets to submit lots of jobs. It should yield better
> > > > > > performance on larger workflows with imbalanced sites, which is, I'm
> > > > > > guessing, our main scenario.
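
[Editorial sketch, not part of the original message.] The score-and-probe idea described above can be illustrated roughly as follows. Everything specific here is an assumption made for illustration: the Site class, the doubling rule in job_limit, and the +1/-1 score steps are hypothetical and are not Swift's actual formula. Only the initial figure of 26 jobs is taken from the thread.

```python
class Site:
    """Hypothetical per-site state: a score and a job allowance derived from it."""

    def __init__(self, name, initial_limit=26):
        self.name = name
        self.score = 0.0              # rises on success, falls on failure
        self.initial_limit = initial_limit
        self.running = 0

    def job_limit(self):
        # Assumed rule: the allowance doubles per score point (illustrative only).
        return int(self.initial_limit * 2 ** self.score)

    def can_submit(self):
        return self.running < self.job_limit()

    def submit(self):
        self.running += 1

    def job_done(self, ok):
        # A finished job frees a slot and adjusts the site's score.
        self.running -= 1
        self.score += 1.0 if ok else -1.0


def submit_ready(site, pending):
    """Submit as many pending jobs as the site's current allowance permits."""
    sent = 0
    while pending > 0 and site.can_submit():
        site.submit()
        pending -= 1
        sent += 1
    return sent


if __name__ == "__main__":
    site = Site("example-site")
    first = submit_ready(site, 68)   # initial probing batch
    site.job_done(ok=True)           # one job completes successfully
    second = submit_ready(site, 68 - first)
    print(first, second)
```

With these assumed numbers the first batch is capped at the initial allowance, and a successful completion raises the allowance so that more jobs can be submitted. A failing site would see its allowance shrink instead, which is what steers work away from slow or broken sites.
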
> > > > > > 
> > > > > >           
> > > > > > > Nika
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > Swift-devel mailing list
> > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > > > 
> > > > > > >             
> > > > >         
> > >     
> > 
> 
> -- 
> 
>    Ian Foster, Director, Computation Institute
> Argonne National Laboratory & University of Chicago
> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
> Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
>       Globus Alliance: www.globus.org.



