[Swift-devel] Try coaster on BG/P ?

Michael Wilde wilde at mcs.anl.gov
Thu Jun 19 18:56:50 CDT 2008


Cool - very helpful info.  It will still be 1-2 weeks before I can try 
this. Perhaps Zhao can try it sooner.

Re:
 > The basic pattern is as long as there are more jobs ready to be run than
 > there are workers, then more workers will be started

This is where, as you suggested earlier, for BG/P-like systems it would 
be best if the worker count were static and this automatic growth were 
disabled.

What Ioan did in Falkon when he went to the multiple-server architecture 
is relevant here: the client load-shares among all the servers, 
round-robin, only sending a job to a server when it knows that the 
server has a free cpu slot. In this way, no queues build up on the 
servers, and it avoids having a job wait in any server's queue when a 
free cpu might be available on some other server.
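
To make the dispatch rule above concrete, here is a minimal sketch of it, not Falkon's actual code. The names (Server, RoundRobinDispatcher, dispatch, complete) are hypothetical, and it assumes the client tracks each server's free slots itself.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for one server and its CPU slots."""
    name: str
    slots: int      # total CPU slots on this server
    busy: int = 0   # slots currently running a job

    def has_free_slot(self) -> bool:
        return self.busy < self.slots


class RoundRobinDispatcher:
    """Client-side load sharing: rotate over the servers, but only hand a
    job to a server that is known to have a free slot."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index where the next round-robin scan starts

    def dispatch(self, job):
        """Return the server the job went to, or None if every server is
        full. In that case the job stays queued at the client."""
        n = len(self.servers)
        for offset in range(n):
            idx = (self._next + offset) % n
            server = self.servers[idx]
            if server.has_free_slot():
                server.busy += 1
                self._next = (idx + 1) % n
                return server
        return None

    def complete(self, server):
        """Record that a job finished, freeing one slot on that server."""
        server.busy -= 1


if __name__ == "__main__":
    d = RoundRobinDispatcher([Server("a", 2), Server("b", 1)])
    for job in range(4):
        target = d.dispatch(job)
        print(job, target.name if target else "held at client")
```

Because a job is only sent when a slot is known to be free, the only queue is the one at the client, so no job sits behind a busy server while another server has an idle cpu.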

- Mike


On 6/19/08 6:45 PM, Ben Clifford wrote:
> more notes on running this on a normal site:
> 
> Set the jobThrottle parameter for a site based on the mechanism used to 
> submit coasters - that is, for a site that is submitting through GRAM2, 
> set the throttle to 0.2, which will limit you to 20 jobs at once, and 
> likely cause just under 20 coaster workers to run. When GRAM4 works, you 
> should be able to use a jobThrottle of 4 and get 400 workers to run (at 
> least as far as the coaster-submission side of things).
> 
> I haven't done any tests (though Mihael might have) about how many coaster 
> workers can run sensibly at once - most of my testing has been poking 
> round at the low end of things.
> 
> When you start running jobs, the timings will look a bit different - 
> you'll see a much longer delay than usual when the first job goes into 
> execute state, as this is when the coaster master starts up on the remote 
> site and submits a worker into the queue. But once workers start actually 
> executing, subsequent job executions will be faster (in as much as there 
> should be no GRAM latency and no LRM latency).
> 
> The basic pattern is as long as there are more jobs ready to be run than 
> there are workers, then more workers will be started (but will be subject 
> to LRM delays in starting up).
> 
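
As a rough illustration of the throttle arithmetic in the quoted notes above: the factor of 100 jobs per throttle unit is only inferred from the two figures given there (0.2 for about 20 jobs, 4 for about 400 workers), and the function name is hypothetical. Swift's exact formula may differ slightly.

```python
def approx_concurrent_jobs(job_throttle: float, jobs_per_unit: int = 100) -> int:
    """Approximate number of jobs allowed at once for a given jobThrottle.

    jobs_per_unit=100 is an assumption inferred from the figures in the
    quoted message, not taken from Swift's source or documentation.
    """
    return round(job_throttle * jobs_per_unit)


if __name__ == "__main__":
    for throttle in (0.2, 4):
        print(throttle, approx_concurrent_jobs(throttle))
```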


