[Swift-user] Tuning parameters of coaster execution

Tue Oct 20 10:55:47 CDT 2009

On Mon, 2009-10-19 at 23:35 -0400, Andriy Fedorov wrote:
> Hi,
> 
> I am trying to understand how to set correctly the coaster-related
> parameters to optimize execution of my workflow. A single task I have
> takes around 1-2 minutes. I set maxWalltime to 2 minutes, and there are 40
> of these tasks in my toy workflow. Coasters are configured as
> gt2:gt2:pbs. When I run it with the default parameters, the workflow
> completes (this is great!).
> 
> Now I am trying to understand what's going on and how to improve the
> performance. Looking at the scheduler queue, I see that two jobs are
> submitted in the beginning of the execution for 18 min each, one with
> 1 node, and one with 2 nodes. All of the execution is happening in
> these two jobs (the number of jobs submitted is just two, for 40 tasks,
> so looks like things work). First question: why does it happen this
> way? (two jobs, 18 minutes each, specific node allocation) I assume
> only one of them (2-node) is executing worker tasks, but in this case
> why is the allocation time 18 minutes, not 20 (each worker walltime is 2
> min)?
> 
> Second question: how do I make coasters request more nodes? I tried
> to increase nodeGranularity to 10. This resulted in only one (!) job
> with 10 nodes and 20 min walltime showing up on the scheduler. But it
> looks like the jobs are still executed 2 at a time!

You need a more recent version of the code.

A few weeks ago the "parallelism" option was added. By default it's set
to try to allocate as many nodes as there are jobs (parallelism=0.0),
whereas the behavior you see would have parallelism=1.0. I should change
the way the numbers are specified. It's not exactly intuitive unless you
look at how it works.

Anyway, it boils down to the notion of job size and block size. The
block size is defined as workers*bwalltime^parallelism, while the job
size is jwalltime^parallelism. At any given time you can fit roughly
workers*bwalltime^parallelism/jwalltime^parallelism jobs in a block.

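You can see that with parallelism=0 the walltime terms drop out: each
job has size 1 and the block size is just workers, so it reduces to
comparing workers against count(jobs).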

Conversely, with parallelism=1 the job size is jwalltime, and if your
block has a walltime of bwalltime you can fit workers*bwalltime/jwalltime
jobs in it.
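
To make the arithmetic concrete, here is a rough Python sketch of the two
sizes and their ratio. This is not the coaster code; the function names are
made up for illustration, and the numbers (2 workers, a 20 minute block,
2 minute jobs) are only loosely based on the question above.

```python
def block_size(workers, bwalltime, parallelism):
    # Block size as described above: workers * bwalltime^parallelism.
    return workers * bwalltime ** parallelism


def job_size(jwalltime, parallelism):
    # Job size as described above: jwalltime^parallelism.
    return jwalltime ** parallelism


def jobs_per_block(workers, bwalltime, jwalltime, parallelism):
    # Rough number of jobs that fit in one block at any given time.
    return block_size(workers, bwalltime, parallelism) / job_size(jwalltime, parallelism)


# Illustrative values only (walltimes in minutes).
print(jobs_per_block(2, 20, 2, 1.0))  # 20.0: walltime counts, jobs stack up in time
print(jobs_per_block(2, 20, 2, 0.0))  # 2.0: walltime ignored, one job per worker
```

With parallelism=1.0 a narrow block can absorb many short jobs one after
another; with parallelism=0.0 the only way to fit more jobs is more workers.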

At the same time, bwalltime is controlled by the overallocation factors.
Once the block walltime is decided, the width (number of workers) is
picked based on the job sizes that need to be fit (according to the
above scheme).
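
A simplified sketch of that last step, again in Python. It assumes a single
block and that the width is simply the total job size divided by the
per-worker share of the block, rounded up. The real allocator does more
(multiple blocks, nodeGranularity, overallocation), so treat block_width as
a hypothetical helper, not the actual algorithm.

```python
import math


def block_width(jwalltimes, bwalltime, parallelism):
    # Assumption: one block must hold every queued job.
    # Total size of the queued jobs, each counted as jwalltime^parallelism.
    total_job_size = sum(w ** parallelism for w in jwalltimes)
    # Each worker contributes bwalltime^parallelism to the block size.
    return math.ceil(total_job_size / bwalltime ** parallelism)


# 40 jobs of 2 minutes each, block walltime of 20 minutes (illustrative).
jobs = [2] * 40
print(block_width(jobs, 20, 0.0))  # 40: one worker per job
print(block_width(jobs, 20, 1.0))  # 4: 80 job-minutes / 20 minutes per worker
```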

Anyway, to sum it up, use a more recent version.