[Swift-devel] Why does SGE provider only start one job per second?

David Kelly davidk at ci.uchicago.edu
Tue Sep 10 10:53:15 CDT 2013


It does seem to be something on the Swift side that is throttling submissions somehow: only one submit file is being written per second. 

2013-09-09 21:41:42,670-0500 DEBUG AbstractExecutor Wrote SGE script to /clhome/WILDE/.globus/scripts/SGE7757422543183944116.submit 
2013-09-09 21:41:44,002-0500 DEBUG AbstractExecutor Wrote SGE script to /clhome/WILDE/.globus/scripts/SGE7192137066159443413.submit 
2013-09-09 21:41:45,040-0500 DEBUG AbstractExecutor Wrote SGE script to /clhome/WILDE/.globus/scripts/SGE906390723669036085.submit 
2013-09-09 21:41:46,075-0500 DEBUG AbstractExecutor Wrote SGE script to /clhome/WILDE/.globus/scripts/SGE4921900111206072571.submit 
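
To put a number on "one per second", the gaps between the four `Wrote SGE script` timestamps above can be computed directly. This is a minimal Python sketch that assumes the `YYYY-MM-DD HH:MM:SS,mmm-ZZZZ` timestamp format shown in these log lines. `submit_intervals` is a hypothetical helper, not part of Swift:

```python
from datetime import datetime

# Timestamps of the four "Wrote SGE script" log lines quoted above.
STAMPS = [
    "2013-09-09 21:41:42,670-0500",
    "2013-09-09 21:41:44,002-0500",
    "2013-09-09 21:41:45,040-0500",
    "2013-09-09 21:41:46,075-0500",
]


def submit_intervals(stamps):
    """Seconds elapsed between consecutive log timestamps."""
    # %f accepts the 3-digit millisecond field; %z parses the -0500 offset.
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f%z") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


print(submit_intervals(STAMPS))  # → [1.332, 1.038, 1.035]
```

Every gap is slightly over one second, which matches the one-file-per-second pattern.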

It could be something with the SGE provider. I'll take a look and see if I can find anything. 

I also remember seeing the setting below in the user guide. I've never tried changing this value before, but it might be worth a quick try: 

throttle.host.submit 

Valid values: <int>, off 

Default value: 2 

Limits the number of concurrent submissions for any of the sites
Swift will try to send jobs to. In other words, it guarantees that,
for any one site, no more than this many jobs will be in the process
of being submitted at the same time. 
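
The guarantee described above behaves like a counting semaphore around each submission. The following Python sketch only illustrates that idea. It is not Swift's implementation, and `run_throttled` and its parameters are hypothetical. The `time.sleep` stands in for one submission, for example writing the script and running qsub:

```python
import threading
import time


def run_throttled(n_jobs, limit, submit_seconds):
    """Run n_jobs simulated submissions, at most `limit` of them at once.

    Returns the peak number of submissions that were in progress simultaneously.
    """
    gate = threading.Semaphore(limit)  # the counting semaphore plays the role of the throttle
    lock = threading.Lock()            # protects the two counters below
    in_flight = 0
    peak = 0

    def submit():
        nonlocal in_flight, peak
        with gate:  # blocks while `limit` submissions are already in progress
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(submit_seconds)  # stand-in for one submission (hypothetical timing)
            with lock:
                in_flight -= 1

    threads = [threading.Thread(target=submit) for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


start = time.monotonic()
peak = run_throttled(n_jobs=8, limit=2, submit_seconds=0.1)
elapsed = time.monotonic() - start
print(peak, round(elapsed, 1))  # peak never exceeds 2; 8 jobs need at least 4 rounds of 0.1 s
```

Each slot is held for the whole submission, so throughput is bounded by `limit / submit_seconds`. With the default of 2, the throttle alone would cap the rate at one job per second only if each submission took about two seconds.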
----- Original Message -----

> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, September 10, 2013 10:31:40 AM
> Subject: [Swift-devel] Why does SGE provider only start one job per second?

> Running on the SGE system orthros, when I configure a sites file for
> 256 single-core workers, I see the workers starting one per second.
> It seems that the SGE jobs themselves are only emitted one per
> second.

> A log of a typical run is in:
> http://www.ci.uchicago.edu/~wilde/IndexStrain-20130909-2141-zq403y5f.log

> It starts off like this:

> 2013-09-09 21:41:40,195-0500 INFO RuntimeStats$ProgressTicker
> 2013-09-09 21:41:43,078-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:99
> 2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1
> 2013-09-09 21:41:45,476-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:97 Active:2
> 2013-09-09 21:41:46,500-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:96 Active:3
> 2013-09-09 21:41:47,814-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:95 Active:4
> 2013-09-09 21:41:49,643-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:93 Active:6
> 2013-09-09 21:41:50,686-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:92 Active:7
> 2013-09-09 21:41:51,704-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:91 Active:8
> 2013-09-09 21:41:52,772-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:90 Active:9
> 2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10

> In this case the Swift script consists of 100 independent apps
> emitted right away from a foreach loop.

> As you can see above and in the log, the 100 apps become active at a
> 1-per-second rate.
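
The ramp in the ProgressTicker output can be quantified from its first and last `Active` counts. This is a minimal Python sketch that assumes the log line format shown above. `parse` and `activation_rate` are hypothetical helpers:

```python
import re
from datetime import datetime

# First and last ProgressTicker lines with an Active count, copied from the log above.
FIRST = "2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1"
LAST = "2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10"


def parse(line):
    """Return (timestamp, active_count) for one ProgressTicker line."""
    stamp = datetime.strptime(" ".join(line.split()[:2]), "%Y-%m-%d %H:%M:%S,%f%z")
    active = int(re.search(r"Active:(\d+)", line).group(1))
    return stamp, active


def activation_rate(first, last):
    """Apps that became active per second between two ProgressTicker lines."""
    (t0, a0), (t1, a1) = parse(first), parse(last)
    return (a1 - a0) / (t1 - t0).total_seconds()


rate = activation_rate(FIRST, LAST)
print(round(rate, 2))     # → 0.96
print(round(256 / rate))  # seconds to bring up 256 workers at this rate → 267
```

At roughly one activation per second, starting all 256 single-core workers would take about four and a half minutes.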

> Can anyone tell me what's throttling the app launches in this fashion?

> I don't recall this happening elsewhere, so perhaps it's an artifact of
> the SGE provider.

> The sites.xml file is below.

> Thanks,

> - Mike

> <config>

> <pool handle="localhost">
> <execution provider="local"/>
> <filesystem provider="local"/>
> <workdirectory>{env.HOME}/swiftwork</workdirectory>
> </pool>

> <pool handle="cluster">
> <execution provider="coaster" jobmanager="local:sge"/>

> <!-- Set partition and account here: -->
> <profile namespace="globus" key="queue">sec1all.q</profile>
> <profile namespace="globus" key="pe">sec1_all</profile>
> <profile namespace="globus" key="ppn">1</profile>
> <!-- <profile namespace="globus" key="project">pi-wilde</profile> -->

> <!-- Set number of jobs and nodes per job here: -->
> <profile namespace="globus" key="slots">320</profile>
> <profile namespace="globus" key="maxnodes">1</profile>
> <profile namespace="globus" key="nodegranularity">1</profile>
> <profile namespace="globus" key="jobsPerNode">1</profile> <!-- apps per node! -->
> <profile namespace="karajan" key="jobThrottle">4.00</profile> <!-- eg .11 -> 12 -->

> <!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
> <profile namespace="globus" key="maxWalltime">00:10:00</profile>
> <profile namespace="globus" key="maxtime">200000</profile> <!-- in seconds! -->

> <!-- Set data staging model and work dir here: -->
> <filesystem provider="local"/>
> <workdirectory>/tmp/wilde/swiftwork</workdirectory>

> <!-- Typically leave these constant: -->
> <!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
> <profile namespace="globus" key="highOverAllocation">100</profile>
> <profile namespace="globus" key="lowOverAllocation">100</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> </pool>

> </config>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
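
The comment on the `jobThrottle` line in the sites.xml above (`.11 -> 12`) implies that the number of jobs Swift runs concurrently on a site is `100 * jobThrottle + 1`. Below is a small Python check of that rule. It assumes the rule holds as the comment suggests. `max_concurrent_jobs` is a hypothetical name. Rounding to the nearest integer is a choice made here to sidestep floating-point error, and Swift's own rounding may differ:

```python
def max_concurrent_jobs(job_throttle):
    """Concurrent-job limit implied by a karajan jobThrottle value (100 * t + 1)."""
    # Round before adding 1: 0.11 * 100 evaluates to 11.000000000000002 in floating point.
    return round(job_throttle * 100) + 1


print(max_concurrent_jobs(0.11))  # → 12, matching the ".11 -> 12" comment
print(max_concurrent_jobs(4.00))  # → 401, the value set in this sites.xml
```

By this rule, a `jobThrottle` of 4.00 allows about 401 concurrent jobs. That is well above the 100 apps in this run, so this setting would not by itself account for a one-per-second ramp.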

