[Swift-devel] submitting jobs to the queue

Ian Foster foster at mcs.anl.gov
Wed Mar 7 22:49:28 CST 2007


I think that all of these issues will go away soon, when we start using 
the dynamic provisioning code that Ioan is working on. So I wonder if 
they are worth worrying about too much?

Ian.

Mihael Hategan wrote:
> So this limit would have to be a per-site limit.
> There is no such thing right now. You can limit the total number of
> concurrent jobs, but it's not exposed through swift.properties.
>
> In libexec/scheduler.xml, you can try adding the following thing inside
> <scheduler>...</scheduler>:
>
> <property name="maxSimultaneousJobs" value="384"/>
>
> Mihael
>
> On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote:
>   
>> Right. Teragrid at NCSA has the limit of 384 queued or running jobs per user.
>>
>> Nika
>>
>> At 05:19 PM 3/7/2007, Mihael Hategan wrote:
>>     
>>> On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote:
>>>       
>>>> OK, here is another question.
>>>> Teragrid allows the user to have 385 jobs in a queue. If I run my complete
>>>> workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e.
>>>> close to 20K). How do I set the limit for the number of submitted jobs to
>>>> the queue to 385 ? I remember that condor had a specific parameter to
>>>> condor_submit that was managing exactly that...
>>>>         
>>> Is this 385 jobs per site?
>>>
>>>       
>>>> Nika
>>>>
>>>> At 04:36 PM 3/7/2007, Mihael Hategan wrote:
>>>>         
>>>>> On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote:
>>>>>           
>>>>>> Hi,
>>>>>>
>>>>>> I've noticed one very strange behavior. For example, I have 68 jobs to be
>>>>>> submitted to the remote host simultaneously. Swift submits at first just 26
>>>>>> jobs. I checked that several times - it's always 26 jobs. Then, when at
>>>>>> least one job out of those 26 is finished - Swift goes ahead and submits
>>>>>> the rest (all of those left - 42 in my case).
>>>>>> Is it a bug or a feature?
>>>>>>             
>>>>> Feature. Although it should probably be tamed down in the one site case.
>>>>> Each site has a score that changes based on how it behaves. If a site
>>>>> completes jobs ok, it gets a higher score in time. If jobs fail on it,
>>>>> it gets a lower score.
>>>>>
>>>>> Now, let's consider the following scenario: 2 sites, one fast one slow.
>>>>> With no scores and no limitations, half of the jobs would go to one, and
>>>>> half to the other. The workflow would then finish only when the slow site
>>>>> finishes its half of the jobs.
>>>>> What happens, however, is that Swift limits the number of initial jobs,
>>>>> and does "probing". This allows it to infer some stuff about the sites
>>>>> by the time it gets to submit lots of jobs. It should yield better
>>>>> performance on larger workflows with imbalanced sites, which is, I'm
>>>>> guessing, our main scenario.
>>>>>
>>>>>           
>>>>>> Nika
>>>>>>
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>
>>>>>>             
>>>>         
>>     
>
>   
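
For reference, the effect of a cap such as maxSimultaneousJobs in the quoted thread can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Swift's scheduler code: the function name run_throttled is hypothetical, and jobs are assumed to finish one at a time in submission order. It shows that roughly 20K jobs (80 times 244) can drain through a queue that never holds more than 384 of them.

```python
from collections import deque


def run_throttled(jobs, max_simultaneous):
    """Simulate submission with at most max_simultaneous jobs in flight.

    Returns (peak number of in-flight jobs, list of completed jobs).
    """
    pending = deque(jobs)
    in_flight = deque()
    completed = []
    peak = 0
    while pending or in_flight:
        # Fill every free slot, but never go past the cap.
        while pending and len(in_flight) < max_simultaneous:
            in_flight.append(pending.popleft())
        peak = max(peak, len(in_flight))
        # Assumption for the simulation: the oldest in-flight job
        # finishes next, which frees exactly one slot.
        completed.append(in_flight.popleft())
    return peak, completed


# Stage four of the workflow in the thread: 80 jobs for each of 244 molecules.
peak, done = run_throttled(range(80 * 244), max_simultaneous=384)
print(peak, len(done))  # prints: 384 19520
```

The cap only bounds how many jobs are queued or running at once; every job is still submitted eventually.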

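The site scoring described in the quoted thread (a site's score rises when jobs complete and falls when they fail, and only a few "probing" jobs are sent at first) might be modeled roughly as follows. This is a toy sketch, not Swift's actual implementation: the Site class, the starting score, the step of 1.0 per job, and the doubling rule in allowance() are all hypothetical choices made for illustration.

```python
class Site:
    """Toy model of a site whose job allowance follows its score."""

    def __init__(self, name, initial_allowance=2):
        self.name = name
        self.score = 0.0  # hypothetical starting score
        self.initial_allowance = initial_allowance
        self.running = 0

    def allowance(self):
        # Hypothetical rule: the allowance doubles for each point of
        # score and never drops below one job, so a poorly scoring
        # site can still be probed.
        return max(1, int(self.initial_allowance * 2 ** self.score))

    def can_accept(self):
        return self.running < self.allowance()

    def submit(self):
        self.running += 1

    def job_finished(self, ok):
        self.running -= 1
        # Success raises the score, failure lowers it.
        self.score += 1.0 if ok else -1.0


fast, slow = Site("fast"), Site("slow")
for _ in range(3):
    fast.submit()
    fast.job_finished(ok=True)
slow.submit()
slow.job_finished(ok=False)
print(fast.allowance(), slow.allowance())  # prints: 16 1
```

With a rule of this kind, a site that has proven itself receives most of the later jobs, while a failing site is limited to occasional probes.
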
-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.
