[Swift-devel] Why does SGE provider only start one job per second?

Michael Wilde wilde at mcs.anl.gov
Tue Sep 10 10:31:40 CDT 2013


Running on the SGE system orthros, with a sites file configured for 256 single-core workers, I see the workers starting one per second. It seems that the SGE jobs themselves are only submitted one per second.

A log of a typical run is in:
  http://www.ci.uchicago.edu/~wilde/IndexStrain-20130909-2141-zq403y5f.log

It starts off like this:

2013-09-09 21:41:40,195-0500 INFO  RuntimeStats$ProgressTicker 
2013-09-09 21:41:43,078-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:99
2013-09-09 21:41:44,428-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:98  Active:1
2013-09-09 21:41:45,476-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:97  Active:2
2013-09-09 21:41:46,500-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:96  Active:3
2013-09-09 21:41:47,814-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:95  Active:4
2013-09-09 21:41:49,643-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:93  Active:6
2013-09-09 21:41:50,686-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:92  Active:7
2013-09-09 21:41:51,704-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:91  Active:8
2013-09-09 21:41:52,772-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:90  Active:9
2013-09-09 21:41:53,807-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:89  Active:10

In this case the Swift script consists of 100 independent apps emitted right away from a foreach loop.

As you can see above and in the log, the 100 apps become active at a 1-per-second rate.
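
To put a number on that rate, here is a minimal Python sketch that parses the ProgressTicker lines quoted above and computes how many apps become active per second. The regular expression and the helper name parse_active are my own assumptions for illustration, not part of Swift; the sample lines are copied from the excerpt.

```python
import re
from datetime import datetime

# Sample ProgressTicker lines copied from the log excerpt in this message.
LOG = """\
2013-09-09 21:41:43,078-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:99
2013-09-09 21:41:44,428-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:98  Active:1
2013-09-09 21:41:45,476-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:97  Active:2
2013-09-09 21:41:46,500-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:96  Active:3
2013-09-09 21:41:47,814-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:95  Active:4
2013-09-09 21:41:49,643-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:93  Active:6
2013-09-09 21:41:50,686-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:92  Active:7
2013-09-09 21:41:51,704-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:91  Active:8
2013-09-09 21:41:52,772-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:90  Active:9
2013-09-09 21:41:53,807-0500 INFO  RuntimeStats$ProgressTicker   Stage in:1  Submitted:89  Active:10
"""

# Assumed line layout: "<date> <time>,<millis><utc offset> INFO ...ProgressTicker <counters>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[+-]\d{4}\s+INFO\s+\S*ProgressTicker(.*)$"
)


def parse_active(log_text):
    """Return a list of (timestamp, active_count) pairs, one per ticker line."""
    samples = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a ProgressTicker line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        active = re.search(r"Active:(\d+)", match.group(2))
        # A ticker line without an "Active:" field means no app is active yet.
        samples.append((stamp, int(active.group(1)) if active else 0))
    return samples


samples = parse_active(LOG)
elapsed = (samples[-1][0] - samples[0][0]).total_seconds()
rate = (samples[-1][1] - samples[0][1]) / elapsed
print(f"{samples[-1][1]} apps active after {elapsed:.3f} s: {rate:.2f} apps/s")
```

Running the same function over the full log linked above would show whether the rate stays near one per second for the whole ramp-up.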

Can anyone tell me what's throttling the app launches in this fashion?

I don't recall this happening elsewhere, so perhaps it's an artifact of the SGE provider.

The sites.xml file is below.

Thanks,

- Mike


<config>

 <pool handle="localhost">
   <execution provider="local"/>
   <filesystem provider="local"/>
   <workdirectory>{env.HOME}/swiftwork</workdirectory>
 </pool>

 <pool handle="cluster">
   <execution provider="coaster" jobmanager="local:sge"/>

   <!-- Set partition and account here: -->
   <profile namespace="globus" key="queue">sec1all.q</profile>
   <profile namespace="globus" key="pe">sec1_all</profile>
   <profile namespace="globus" key="ppn">1</profile>
   <!-- <profile namespace="globus" key="project">pi-wilde</profile> -->

   <!-- Set number of jobs and nodes per job here: -->
   <profile namespace="globus" key="slots">320</profile>
   <profile namespace="globus" key="maxnodes">1</profile>
   <profile namespace="globus" key="nodegranularity">1</profile>
   <profile namespace="globus" key="jobsPerNode">1</profile> <!-- apps per node! -->
   <profile namespace="karajan" key="jobThrottle">4.00</profile> <!-- eg .11 -> 12 -->

   <!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
   <profile namespace="globus" key="maxWalltime">00:10:00</profile>
   <profile namespace="globus" key="maxtime">200000</profile>  <!-- in seconds! -->

   <!-- Set data staging model and work dir here: -->
   <filesystem provider="local"/>
   <workdirectory>/tmp/wilde/swiftwork</workdirectory>

   <!-- Typically leave these constant: -->
   <!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
   <profile namespace="globus" key="highOverAllocation">100</profile>
   <profile namespace="globus" key="lowOverAllocation">100</profile>
   <profile namespace="karajan" key="initialScore">10000</profile>
 </pool>

</config>


