[Swift-devel] Intentional change to behavior of high/lowOverAllocation parameter?
Michael Wilde
wilde at mcs.anl.gov
Wed Sep 14 18:10:32 CDT 2011
Mihael, two logs that show the "job never fits in a block" behavior with the overallocations both set to 0.999 are on the CI net at:
$ grep 0.999 *.log
catsn-20110914-1304-7mgf77k8.log: lowOverallocation = 0.999
catsn-20110914-1304-7mgf77k8.log: highOverallocation = 0.999
catsn-20110914-1326-6ma0ple4.log: lowOverallocation = 0.999
catsn-20110914-1326-6ma0ple4.log: highOverallocation = 0.999
The script is a single catsn job. The older log is from a run with no maxwalltime specified; the more recent log is from a run with a maxwalltime of 30 seconds specified in sites.xml:
<pool handle="localhost">
  <execution jobmanager="local:pbs" provider="coaster" url="none"/>
  <profile namespace="globus" key="maxtime">3600</profile>
  <profile namespace="globus" key="maxwalltime">00:00:30</profile>
  <profile namespace="globus" key="jobsPerNode">6</profile>
  <!-- <profile namespace="globus" key="workersPerNode">1</profile> -->
  <profile namespace="globus" key="slots">6</profile>
  <profile namespace="globus" key="nodeGranularity">1</profile>
  <profile namespace="globus" key="maxNodes">1</profile>
  <profile namespace="globus" key="queue">shared</profile>
  <profile namespace="karajan" key="jobThrottle">5.99</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <profile namespace="globus" key="project">parvis</profile>
  <profile namespace="globus" key="lowOverAllocation">0.999</profile>
  <profile namespace="globus" key="highOverAllocation">0.999</profile>
  <filesystem provider="local"/>
  <workdirectory>/home/wilde/amwg/run01</workdirectory>
</pool>
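
For reference, here is a rough sketch of the kind of range check that would reject the old value of 100 while accepting 0.999. It is only an illustration: the class name, the three-argument signature, and the exception type are my assumptions, not the actual CoG code. The real check in rev 3225 takes a name and a limit, and the failure surfaces as a TaskSubmissionException, as quoted in the original message below.

```java
// Hypothetical sketch only; not the actual CoG implementation.
class OverallocationCheck {
    // Rejects any value that is not strictly below the limit.
    // The message format mirrors the one seen in the exception text.
    static void checkLessThan(String name, double value, double limit) {
        if (!(value < limit)) {
            throw new IllegalArgumentException(
                name + " must be < " + limit + " (currently " + value + ")");
        }
    }

    public static void main(String[] args) {
        // Values from the sites.xml above pass the check.
        checkLessThan("lowOverallocation", 0.999, 1.0);
        checkLessThan("highOverallocation", 0.999, 1.0);
        try {
            // The previously recommended setting of 100 is rejected.
            checkLessThan("lowOverallocation", 100.0, 1.0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that a check like this only explains the exception for 100; it says nothing about why the job fails to fit in a block at 0.999.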
----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, September 14, 2011 5:57:42 PM
> Subject: [Swift-devel] Intentional change to behavior of high/lowOverAllocation parameter?
> Mihael,
>
> For quite a while now I have been telling people to set the coaster
> parameters lowOverAllocation and highOverAllocation to 100 to force
> coasters to set the time allocation of every block to maxTime. (This
> was based on your advice around the time we started running on Beagle,
> to enable the user to force a specific and constant PBS job walltime).
>
> Cog rev 3225 seems to have introduced a change that insists that these
> two parameters have a value < 1.0:
> + checkLessThan("lowOverallocation", 1);
> + checkLessThan("highOverallocation", 1);
>
> If not, the job fails with an exception:
>
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> lowOverallocation must be < 1.0 (currently 100.0)
>
> In addition, when I *do* set both parameters to 0.9999 (to try to
> achieve the same effect as they gave before) then I encounter the
> phenomenon of coasters starting with a ~10 minute walltime, but my app
> job doesn't seem to "fit" into the coaster block, and hence Swift just
> idles, making no progress. If I try to reduce my app maxwalltime, then
> the walltime of the PBS job is lowered, but the app job still doesn't
> "fit" and never gets run. I can send a log if the cause for this is not
> obvious to you.
>
> Can you explain what the intended behavior is here, and whether you
> think the new check (circa Aug 7) introduced a bug?
>
> Thanks,
>
> - Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory