[Swift-devel] Why does SGE provider only start one job per second?

David Kelly davidk at ci.uchicago.edu
Tue Sep 10 12:05:05 CDT 2013


I ran similar tests with the slurm and pbs providers (0.94.1 RC3) and see a very similar rate of job submission: about one submit script per second.

Submit script times on midway (slurm): 

Modify: 2013-09-10 11:43:31.266499000 -0500 
Modify: 2013-09-10 11:43:32.287140000 -0500 
Modify: 2013-09-10 11:43:33.308472000 -0500 
Modify: 2013-09-10 11:43:34.328796000 -0500 
Modify: 2013-09-10 11:43:35.329563000 -0500 

Submit script times on raven (pbs): 

Modify: 2013-09-10 11:54:17.000000000 -0500 
Modify: 2013-09-10 11:54:18.000000000 -0500 
Modify: 2013-09-10 11:54:19.000000000 -0500 
Modify: 2013-09-10 11:54:20.000000000 -0500 
Modify: 2013-09-10 11:54:21.000000000 -0500 
Modify: 2013-09-10 11:54:22.000000000 -0500 
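
For anyone who wants to check the spacing on another system, here is a rough sketch that computes the gaps between consecutive timestamps. It assumes the input is the "Modify:" lines printed by stat, all with the same UTC offset; the function name modify_gaps is made up for illustration.

```python
from datetime import datetime

def modify_gaps(stat_lines):
    """Return the gaps in seconds between consecutive 'Modify:' timestamps."""
    times = []
    for line in stat_lines:
        if not line.startswith("Modify:"):
            continue
        # Fields: "Modify:", date, time with nanoseconds, UTC offset.
        # The offset is ignored, which is only valid if it is the same on every line.
        _, date, clock, _offset = line.split()
        # strptime's %f accepts at most 6 fractional digits, so trim the nanoseconds.
        clock = clock[:15]
        times.append(datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

slurm = [
    "Modify: 2013-09-10 11:43:31.266499000 -0500",
    "Modify: 2013-09-10 11:43:32.287140000 -0500",
    "Modify: 2013-09-10 11:43:33.308472000 -0500",
    "Modify: 2013-09-10 11:43:34.328796000 -0500",
    "Modify: 2013-09-10 11:43:35.329563000 -0500",
]
print(modify_gaps(slurm))
```

On the slurm timestamps above, every gap falls between 1.00 and 1.03 seconds.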

----- Original Message -----

> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, September 10, 2013 10:53:15 AM
> Subject: Re: [Swift-devel] Why does SGE provider only start one job
> per second?

> It does seem that something on the Swift side is throttling submission:
> only one submit file is being written per second.

> 2013-09-09 21:41:42,670-0500 DEBUG AbstractExecutor Wrote SGE script
> to /clhome/WILDE/.globus/scripts/SGE7757422543183944116.submit
> 2013-09-09 21:41:44,002-0500 DEBUG AbstractExecutor Wrote SGE script
> to /clhome/WILDE/.globus/scripts/SGE7192137066159443413.submit
> 2013-09-09 21:41:45,040-0500 DEBUG AbstractExecutor Wrote SGE script
> to /clhome/WILDE/.globus/scripts/SGE906390723669036085.submit
> 2013-09-09 21:41:46,075-0500 DEBUG AbstractExecutor Wrote SGE script
> to /clhome/WILDE/.globus/scripts/SGE4921900111206072571.submit

> It could be something with the SGE provider. I'll take a look and see
> if I can find anything.

> I also remember seeing the setting below in the user guide. I've
> never tried changing this value before, but it might be worth a
> quick try:

> throttle.host.submit

> Valid values: <int>, off

> Default value: 2

> Limits the number of concurrent submissions for any of the sites
> Swift will try to send jobs to. In other words it guarantees that no
> more than the value of this throttle jobs sent to any site will be
> concurrently in a state of being submitted.
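
To make the throttle's meaning concrete: it caps how many submissions are in flight at once, not how many start per second, so the resulting rate depends on how long each submission takes. The sketch below is a toy model and not Swift's implementation; submission_start_times, its parameters, and the 2-second duration are all assumptions chosen for illustration.

```python
import heapq

def submission_start_times(n_jobs, limit, duration):
    """Start time of each submission when at most `limit` are in flight at once.

    `duration` is a hypothetical fixed time (seconds) that one submission takes.
    """
    in_flight = []  # min-heap of finish times for submissions in progress
    starts = []
    now = 0.0
    for _ in range(n_jobs):
        if len(in_flight) >= limit:
            # All slots are busy: wait for the earliest submission to finish.
            now = max(now, heapq.heappop(in_flight))
        starts.append(now)
        heapq.heappush(in_flight, now + duration)
    return starts

# Limit of 2 (the documented default) with an assumed 2-second submission.
print(submission_start_times(6, limit=2, duration=2.0))
```

With a limit of 2 and a hypothetical 2-second submission, the average rate comes out to one per second, though in pairs rather than evenly spaced, so this alone may not explain the timestamps above.
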
> ----- Original Message -----

> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Tuesday, September 10, 2013 10:31:40 AM
> > Subject: [Swift-devel] Why does SGE provider only start one job per
> > second?

> > Running on the SGE system orthros, when I configure a sites file for
> > 256 single-core workers, I see the workers starting one per second.
> > It seems that the SGE jobs themselves are only emitted one per second.

> > A log of a typical run is in:
> > http://www.ci.uchicago.edu/~wilde/IndexStrain-20130909-2141-zq403y5f.log

> > It starts off like this:
> > 2013-09-09 21:41:40,195-0500 INFO RuntimeStats$ProgressTicker
> > 2013-09-09 21:41:43,078-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:99
> > 2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1
> > 2013-09-09 21:41:45,476-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:97 Active:2
> > 2013-09-09 21:41:46,500-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:96 Active:3
> > 2013-09-09 21:41:47,814-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:95 Active:4
> > 2013-09-09 21:41:49,643-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:93 Active:6
> > 2013-09-09 21:41:50,686-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:92 Active:7
> > 2013-09-09 21:41:51,704-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:91 Active:8
> > 2013-09-09 21:41:52,772-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:90 Active:9
> > 2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10
> 
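
The activation rate in the excerpt above can be estimated directly from the ProgressTicker records. This is only a sketch: it assumes each record sits on a single line (as in the log file itself, not the wrapped email quote), and the names TICK and activation_rate are invented for this example.

```python
import re
from datetime import datetime

# Timestamp, optional UTC offset, then a ProgressTicker record with an Active count.
TICK = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S* "
    r"INFO RuntimeStats\$ProgressTicker.*Active:(\d+)"
)

def activation_rate(log_lines):
    """Average number of apps becoming active per second between first and last sample."""
    samples = []
    for line in log_lines:
        m = TICK.search(line)
        if m:
            t = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((t, int(m.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two ProgressTicker records with an Active count")
    (t0, a0), (t1, a1) = samples[0], samples[-1]
    return (a1 - a0) / (t1 - t0).total_seconds()

excerpt = [
    "2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1",
    "2013-09-09 21:41:49,643-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:93 Active:6",
    "2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10",
]
print(round(activation_rate(excerpt), 2))
```

For these three records the estimate is a little under one app per second.
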

> > In this case the Swift script consists of 100 independent apps
> > emitted right away from a foreach loop.

> > As you can see above and in the log, the 100 apps become active at a
> > 1-per-second rate.

> > Can anyone tell me what's throttling the app launches in this fashion?

> > I don't recall this happening elsewhere, so perhaps it's an artifact of
> > the SGE provider.

> > The sites.xml file is below.

> > Thanks,

> > - Mike

> > <config>

> > <pool handle="localhost">
> > <execution provider="local"/>
> > <filesystem provider="local"/>
> > <workdirectory>{env.HOME}/swiftwork</workdirectory>
> > </pool>

> > <pool handle="cluster">
> > <execution provider="coaster" jobmanager="local:sge"/>

> > <!-- Set partition and account here: -->
> > <profile namespace="globus" key="queue">sec1all.q</profile>
> > <profile namespace="globus" key="pe">sec1_all</profile>
> > <profile namespace="globus" key="ppn">1</profile>
> > <!-- <profile namespace="globus" key="project">pi-wilde</profile> -->

> > <!-- Set number of jobs and nodes per job here: -->
> > <profile namespace="globus" key="slots">320</profile>
> > <profile namespace="globus" key="maxnodes">1</profile>
> > <profile namespace="globus" key="nodegranularity">1</profile>
> > <profile namespace="globus" key="jobsPerNode">1</profile> <!-- apps per node! -->
> > <profile namespace="karajan" key="jobThrottle">4.00</profile> <!-- eg .11 -> 12 -->

> > <!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
> > <profile namespace="globus" key="maxWalltime">00:10:00</profile>
> > <profile namespace="globus" key="maxtime">200000</profile> <!-- in seconds! -->

> > <!-- Set data staging model and work dir here: -->
> > <filesystem provider="local"/>
> > <workdirectory>/tmp/wilde/swiftwork</workdirectory>

> > <!-- Typically leave these constant: -->
> > <!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
> > <profile namespace="globus" key="highOverAllocation">100</profile>
> > <profile namespace="globus" key="lowOverAllocation">100</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > </pool>

> > </config>
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
