[Swift-user] Coaster jobs are not running with expected parallelism

Mihael Hategan hategan at mcs.anl.gov
Tue Jan 19 13:55:33 CST 2010


On Tue, 2010-01-19 at 13:49 -0600, Michael Wilde wrote:
> 
> On 1/19/10 1:44 PM, Mihael Hategan wrote:
> > On Tue, 2010-01-19 at 13:38 -0600, Michael Wilde wrote:
> >> On 1/19/10 1:32 PM, Mihael Hategan wrote:
> >>> Maybe PBS is lying about that 18 node job. 
> >> I would be surprised if that's the case. But even if it had *1* node you 
> >> would think it would run at least 8 jobs in parallel.
> > 
> > I see. Though not with your current setup. You should use
> > "workersPerNode" instead of "coastersPerNode".
> 
> Thanks!  I'll fix that and try again. This makes more sense now, if it's 
> assuming 1 worker per node.
> 
> Still doesn't explain why it's not starting more jobs, since it allocated 
> abundant nodes (even assuming 1 worker per node).

Trunk or branch?

> 
> 
> > 
> >> I'm confused why it has started three jobs, two with only one core and 
> >> one with 18 nodes.
> > 
> > It does that. It spreads out the block sizes to exploit non-linearities
> > in queuing times.
> > 
> >> But the 18 node job just hit its wall time limit; now coasters seems to 
> >> have started a 10 node job:
> > 
> > Don't know about that. Logs please.
> > 
> 
> Here are the logs from that dir for this run. I don't understand why the 
> coasters.log file in that directory has not been written to since Jan 13.

If you run Swift on the head node and the coaster bootstrap provider is
"local", then the coaster service runs in the same JVM as Swift, and it
writes to the same log as Swift.

> 
> login2$ more *0119-090116*

[...]

Seems fine so far. Swift log then.



