[Swift-devel] Why does SGE provider only start one job per second?
Michael Wilde
wilde at mcs.anl.gov
Tue Sep 10 10:31:40 CDT 2013
Running on the SGE system orthros, when I configure a sites file for 256 single-core workers, I see the workers start at a rate of one per second. It seems that the SGE jobs themselves are only being submitted at one per second.
A log of a typical run is in:
http://www.ci.uchicago.edu/~wilde/IndexStrain-20130909-2141-zq403y5f.log
It starts off like this:
2013-09-09 21:41:40,195-0500 INFO RuntimeStats$ProgressTicker
2013-09-09 21:41:43,078-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:99
2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1
2013-09-09 21:41:45,476-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:97 Active:2
2013-09-09 21:41:46,500-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:96 Active:3
2013-09-09 21:41:47,814-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:95 Active:4
2013-09-09 21:41:49,643-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:93 Active:6
2013-09-09 21:41:50,686-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:92 Active:7
2013-09-09 21:41:51,704-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:91 Active:8
2013-09-09 21:41:52,772-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:90 Active:9
2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10
In this case the Swift script consists of 100 independent apps emitted right away from a foreach loop.
As you can see above and in the log, the 100 apps become active at a 1-per-second rate.
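To put a number on that rate, here is a minimal Python sketch that parses the ProgressTicker lines quoted above and averages the growth of the Active count over time. The helper names (`active_samples`, `activation_rate`) are hypothetical, and the regex assumes the exact line layout shown in this excerpt.

```python
import re
from datetime import datetime

# ProgressTicker lines copied from the log excerpt above.
LOG_EXCERPT = """\
2013-09-09 21:41:43,078-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:99
2013-09-09 21:41:44,428-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:98 Active:1
2013-09-09 21:41:45,476-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:97 Active:2
2013-09-09 21:41:46,500-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:96 Active:3
2013-09-09 21:41:47,814-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:95 Active:4
2013-09-09 21:41:49,643-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:93 Active:6
2013-09-09 21:41:50,686-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:92 Active:7
2013-09-09 21:41:51,704-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:91 Active:8
2013-09-09 21:41:52,772-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:90 Active:9
2013-09-09 21:41:53,807-0500 INFO RuntimeStats$ProgressTicker Stage in:1 Submitted:89 Active:10
"""

# Timestamp (including UTC offset) at the start of the line, Active count at the end.
# Lines without an "Active:" field (e.g. the first one) do not match and are skipped.
LINE_RE = re.compile(r"^(\S+ \S+) INFO RuntimeStats\$ProgressTicker .*Active:(\d+)")


def active_samples(text):
    """Return (timestamp, active_count) pairs for lines that report Active."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f%z")
            samples.append((ts, int(m.group(2))))
    return samples


def activation_rate(samples):
    """Average number of newly active apps per second between first and last sample."""
    (t0, a0), (t1, a1) = samples[0], samples[-1]
    return (a1 - a0) / (t1 - t0).total_seconds()


samples = active_samples(LOG_EXCERPT)
print(f"{activation_rate(samples):.2f} apps/s")  # prints "0.96 apps/s"
```

Nine additional apps become active over about 9.4 seconds, which matches the roughly one-per-second ramp described here.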
Can anyone tell me what's throttling the app launches in this fashion?
I don't recall this happening elsewhere, so perhaps it's an artifact of the SGE provider.
The sites.xml file is below.
Thanks,
- Mike
<config>
<pool handle="localhost">
<execution provider="local"/>
<filesystem provider="local"/>
<workdirectory>{env.HOME}/swiftwork</workdirectory>
</pool>
<pool handle="cluster">
<execution provider="coaster" jobmanager="local:sge"/>
<!-- Set partition and account here: -->
<profile namespace="globus" key="queue">sec1all.q</profile>
<profile namespace="globus" key="pe">sec1_all</profile>
<profile namespace="globus" key="ppn">1</profile>
<!-- <profile namespace="globus" key="project">pi-wilde</profile> -->
<!-- Set number of jobs and nodes per job here: -->
<profile namespace="globus" key="slots">320</profile>
<profile namespace="globus" key="maxnodes">1</profile>
<profile namespace="globus" key="nodegranularity">1</profile>
<profile namespace="globus" key="jobsPerNode">1</profile> <!-- apps per node! -->
<profile namespace="karajan" key="jobThrottle">4.00</profile> <!-- eg .11 -> 12 -->
<!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
<profile namespace="globus" key="maxWalltime">00:10:00</profile>
<profile namespace="globus" key="maxtime">200000</profile> <!-- in seconds! -->
<!-- Set data staging model and work dir here: -->
<filesystem provider="local"/>
<workdirectory>/tmp/wilde/swiftwork</workdirectory>
<!-- Typically leave these constant: -->
<!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
</pool>
</config>
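As a rough sanity check on the static limits in this file, the sketch below computes the caps they imply. It assumes two relations that are not stated outright here: that the karajan `jobThrottle` cap is `jobThrottle * 100 + 1` (as the comment `eg .11 -> 12` suggests), and that coaster worker capacity is `slots * maxnodes * jobsPerNode`. Both helper functions are hypothetical.

```python
def max_concurrent_jobs(job_throttle):
    """Hypothetical helper: concurrency cap implied by karajan jobThrottle,
    assuming the relation jobThrottle * 100 + 1 from the config comment."""
    # round() guards against float error, e.g. 0.11 * 100 == 11.000000000000002
    return int(round(job_throttle * 100)) + 1


def worker_capacity(slots, max_nodes, jobs_per_node):
    """Hypothetical helper: assumed upper bound on concurrently running apps
    from the coaster block settings (blocks x nodes per block x apps per node)."""
    return slots * max_nodes * jobs_per_node


print(max_concurrent_jobs(0.11))      # 12, matching the comment in sites.xml
print(max_concurrent_jobs(4.00))      # 401
print(worker_capacity(320, 1, 1))     # 320
```

Under those assumptions, both caps are far above the 100 apps in this run. So the values in this file would not, by themselves, explain a launch rate of one app per second.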