[Swift-devel] Re: Need precise throttle on local provider

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 28 22:16:26 CDT 2010


On Tue, 2010-09-28 at 18:29 -0600, wilde at mcs.anl.gov wrote:
> Mihael,
> 
> I have the need (for the Swift R interface) to either throttle the
> local execution provider to run *exactly* one job at a time, or to
> enhance the provider to set a SWIFT_JOB_SLOT env var to a value that
> signifies a virtual "slot number" for N concurrent jobs being run by
> the provider.
> 
> I use this env var to associate Swift jobs with persistent R
> evaluation servers that need to run serially: they can handle only one
> job at a time.
> 
> I've modified the coaster worker.pl script to do this and it works very well.
> 
> I'm now trying to get the same behavior from the local execution
> provider, and rather than tackle inserting this into the Java provider
> code, I tried the shortcut of configuring a small set of local
> provider pool entries, each with the throttle set to what I *thought*
> would guarantee me no more than one job at a time running on each
> "pool":
> 

Try <profile namespace="karajan" key="jobsPerCpu">1</profile>

>     <profile key="jobThrottle" namespace="karajan">-0.001</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
> 
> I thought that the correct value for jobThrottle would be 0.0 to
> ensure 1 job, but from experimentation I found that I needed to set it
> to a slightly negative value, as above (-0.001).

Right. There's a +2 there somewhere in the formula.

> 
> But it seems like even this is not sufficient: under heavy load, I'm
> seeing a second job start on the same pool before the prior job has
> completed (I use "mkdir" as a pseudo-mutex, and I'm running on a local
> filesystem under /tmp).

Explain "seeing" in the above sentence. But before that, try jobsPerCpu.

> 
> So, my first question is: Is there some set of throttling or other
> sites.xml entries that will ensure <= 1 job per local provider pool?
> 
> Second question: If you can point me to the right place, Justin or I
> could do this the "right" way by modifying the local execution
> provider to set "SLOT" numbers.  I initially thought the current hack
> would be easier, and it seemed to work under standalone testing, but
> seems to be failing now in the live setting.

The right way, I would think, is to modify the relevant throttling
parameters for the scheduler for that site. That is, the local provider
should not have anything to do with this. Luckily there already is a
parameter to limit the number of concurrent jobs (jobsPerCpu, mentioned
above).
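
For illustration only, a pool entry using that parameter might look
roughly like the sketch below. The handle name and work directory are
made-up placeholders, the surrounding elements assume the usual sites.xml
layout for a local pool, and whether this alone yields exactly one job
per pool still has to be checked against your setup:

```xml
<!-- Hypothetical pool entry: "rserver-0" and the work directory are placeholders -->
<pool handle="rserver-0">
  <execution provider="local" url="none"/>
  <filesystem provider="local" url="none"/>
  <!-- Scheduler-level limit on concurrent jobs for this site -->
  <profile namespace="karajan" key="jobsPerCpu">1</profile>
  <workdirectory>/tmp/swiftwork-0</workdirectory>
</pool>
```

One such entry per R evaluation server, each with its own handle, would
correspond to the "small set of local provider pool entries" described
above.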

Mihael



