[Swift-devel] Need precise throttle on local provider

Michael Wilde wilde at mcs.anl.gov
Tue Sep 28 20:04:26 CDT 2010


I read a bit more on this in the User Guide property section. I tried throttle.host.submit=1 (along with the jobThrottle element in sites.xml) but this did not solve the problem.

The documentation on the throttle confused me, as I've always thought that the throttle behavior was to run (n * 100)+1 jobs, but the doc says +2.
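
To make the two readings concrete, here is a small arithmetic sketch in Python. It only evaluates (n * 100)+1 and (n * 100)+2 for the jobThrottle values mentioned in this thread; the function name max_jobs is made up for illustration, and nothing here says how Swift actually rounds or enforces a fractional limit.

```python
def max_jobs(job_throttle: float, offset: int = 1) -> float:
    """Nominal concurrent-job limit under the (n * 100) + offset reading.

    offset=1 is the reading assumed in this message; offset=2 is what the
    documentation is reported to say. The result is left unrounded because
    the rounding rule is not stated in this thread.
    """
    return job_throttle * 100 + offset


if __name__ == "__main__":
    # Values of jobThrottle discussed in the thread.
    for n in (0.0, -0.001):
        print(f"jobThrottle={n}: +1 reading -> {max_jobs(n, 1)}, "
              f"+2 reading -> {max_jobs(n, 2)}")
```

Under the +1 reading a jobThrottle of 0.0 gives a nominal limit of 1, while under the +2 reading it gives 2 and only a slightly negative value pushes the nominal limit below 2.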

Regardless, I implemented a workaround with a simple sh mutex, so this is low priority for now, and I'll probably try the provider enhancement route rather than counting on throttle behavior for concurrency control.
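
For readers who want to see the idea, the following is a minimal Python sketch of a mkdir-based mutex that also hands out a slot number. It is not the sh workaround itself: the names claim_slot, job_env and lock_root are hypothetical, and it assumes a local filesystem where creating a directory is atomic. Only the SWIFT_JOB_SLOT variable name comes from the quoted message.

```python
import os
from contextlib import contextmanager


@contextmanager
def claim_slot(lock_root, n_slots):
    """Claim the first free slot by creating its lock directory.

    os.mkdir fails with FileExistsError if the directory already exists,
    so at most one caller can hold a given slot at a time.
    """
    slot = None
    for i in range(n_slots):
        try:
            os.mkdir(os.path.join(lock_root, f"slot-{i}"))
        except FileExistsError:
            continue  # slot i is held by another job; try the next one
        slot = i
        break
    if slot is None:
        raise RuntimeError("no free slot")
    try:
        yield slot
    finally:
        # Release the slot so a later job can claim it.
        os.rmdir(os.path.join(lock_root, f"slot-{slot}"))


def job_env(slot):
    """Copy of the current environment with SWIFT_JOB_SLOT set to the slot."""
    return dict(os.environ, SWIFT_JOB_SLOT=str(slot))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        with claim_slot(root, n_slots=2) as slot:
            print("running in slot", job_env(slot)["SWIFT_JOB_SLOT"])
```

A job wrapper would hold the slot for the whole lifetime of the job and pass job_env(slot) to the process it launches, so that each persistent R server sees at most one job at a time.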

- Mike



----- wilde at mcs.anl.gov wrote:

> Mihael,
> 
> I have the need (for the Swift R interface) to either throttle the
> local execution provider to run *exactly* one job at a time, or to
> enhance the provider to set a SWIFT_JOB_SLOT env var to a value that
> signifies a virtual "slot number" for N concurrent jobs being run by
> the provider.
> 
> I use this env var to associate Swift jobs with persistent R
> evaluation servers that need to run serially: they can handle only one
> job at a time.
> 
> I've modified the coaster worker.pl script to do this and it works very
> well.
> 
> I'm now trying to get the same behavior from the local execution
> provider, and rather than tackle inserting this into the Java provider
> code, I tried the shortcut of configuring a small set of local
> provider pool entries, each with the throttle set to what I *thought*
> would guarantee me no more than one job at a time running on each
> "pool":
> 
>     <profile key="jobThrottle" namespace="karajan">-0.001</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
> 
> I thought that the correct value for jobThrottle would be 0.0 to
> ensure 1 job, but from experimentation I found that I needed to set it
> to a slightly negative value, as above (-0.001).
> 
> But it seems like even this is not sufficient: under heavy load, I'm
> seeing a second job start on the same pool before the prior job has
> completed (I use "mkdir" as a pseudo-mutex, and I'm running on a local
> filesystem under /tmp).
> 
> So, my first question is: Is there some set of throttling or other
> sites.xml entries that will ensure <= 1 job per local provider pool?
> 
> Second question: If you can point me to the right place, Justin or I
> could do this the "right" way by modifying the local execution
> provider to set "SLOT" numbers.  I initially thought the current hack
> would be easier, and it seemed to work under standalone testing, but
> seems to be failing now in the live setting.
> 
> Thanks,
> 
> Mike
> 
> 
> 
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 




