[Swift-devel] Intentional change to behavior of high/lowOverAllocation parameter?

Michael Wilde wilde at mcs.anl.gov
Wed Sep 14 18:10:32 CDT 2011


Mihael, two logs that show the "job never fits in a block" behavior with the overallocations both set to 0.999 are on the CI net at:

$ grep 0.999 *.log
catsn-20110914-1304-7mgf77k8.log:       lowOverallocation = 0.999
catsn-20110914-1304-7mgf77k8.log:       highOverallocation = 0.999
catsn-20110914-1326-6ma0ple4.log:       lowOverallocation = 0.999
catsn-20110914-1326-6ma0ple4.log:       highOverallocation = 0.999

The script is a single catsn job. The older log has no maxwalltime specified; the more recent log has a maxwalltime of 30 seconds specified in sites.xml:

<pool handle="localhost">
  <execution jobmanager="local:pbs" provider="coaster" url="none"/>
  <profile namespace="globus" key="maxtime">3600</profile>
  <profile namespace="globus" key="maxwalltime">00:00:30</profile>
  <profile namespace="globus" key="jobsPerNode">6</profile>
  <!-- <profile namespace="globus" key="workersPerNode">1</profile> -->
  <profile namespace="globus" key="slots">6</profile>
  <profile namespace="globus" key="nodeGranularity">1</profile>
  <profile namespace="globus" key="maxNodes">1</profile>
  <profile namespace="globus" key="queue">shared</profile>
  <profile namespace="karajan" key="jobThrottle">5.99</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <profile namespace="globus" key="project">parvis</profile>
  <profile namespace="globus" key="lowOverAllocation">0.999</profile>
  <profile namespace="globus" key="highOverAllocation">0.999</profile>
  <filesystem provider="local"/>
  <workdirectory>/home/wilde/amwg/run01</workdirectory>
</pool>



----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, September 14, 2011 5:57:42 PM
> Subject: [Swift-devel] Intentional change to behavior of high/lowOverAllocation parameter?
> Mihael,
> 
> For quite a while now I have been telling people to set the coaster
> parameters lowOverAllocation and highOverAllocation to 100 to force
> coasters to set the time allocation of every block to maxTime. (This
> was based on your advice around the time we started running on Beagle,
> to enable the user to force a specific and constant PBS job walltime).
> 
> Cog rev 3225 seems to have introduced a change that insists that these
> two parameters have a value < 1.0:
> + checkLessThan("lowOverallocation", 1);
> + checkLessThan("highOverallocation", 1);
> 
> If not, the job fails with an exception:
> 
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> lowOverallocation must be < 1.0 (currently 100.0)
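
For readers without the CoG source at hand, here is a minimal sketch of what a range check of this kind might look like. It is not the actual r3225 code: the class name `OverallocationCheck`, the three-argument signature, and the use of `IllegalArgumentException` are assumptions made for illustration (the real check takes two arguments and surfaces as a `TaskSubmissionException`). Only the message format is taken from the exception quoted above.

```java
public class OverallocationCheck {
    // Hypothetical stand-in for the validation added in CoG r3225.
    // Rejects any value that is not strictly below the limit.
    static void checkLessThan(String name, double value, double limit) {
        if (!(value < limit)) {
            throw new IllegalArgumentException(
                name + " must be < " + limit + " (currently " + value + ")");
        }
    }

    public static void main(String[] args) {
        // 0.999 is below the limit, so this call returns normally.
        checkLessThan("lowOverallocation", 0.999, 1);
        try {
            // 100 was the previously recommended setting; it is now rejected.
            checkLessThan("lowOverallocation", 100, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, a value of 100 is rejected with the same message text as the exception above, while 0.999 passes the check.
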
> 
> In addition, when I *do* set both parameters to 0.9999 (to try to
> achieve the same effect as they gave before), I encounter the
> phenomenon of coasters starting with a ~10 minute walltime, but my app
> job doesn't seem to "fit" into the coaster block, and hence Swift just
> idles, making no progress. If I try to reduce my app maxwalltime, then
> the walltime of the PBS job is lowered, but the app job still doesn't "fit"
> and never gets run. I can send a log if the cause of this is not obvious
> to you.
> 
> Can you explain what the intended behavior is here, and whether you
> think the new check (circa Aug 7) introduced a bug?
> 
> Thanks,
> 
> - Mike
> 
> 
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



