on coaster accounting (was Re: [Swift-devel] current workers < 0 ?)

Mihael Hategan hategan at mcs.anl.gov
Thu Feb 26 13:03:18 CST 2009


There are 50 running workers and 61 somewhere between being submitted
and contacting the service. What's the question?
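
As a rough check of that accounting, here is a minimal sketch. It assumes that "Current workers" is simply the running workers plus the outstanding requests; the function name is made up for illustration and is not taken from the coaster code.

```python
# Figures taken from the quoted LRM listing and WorkerManager log below.
running = 50    # jobs the LRM reports as Running (one coaster per node)
requested = 61  # "Requested: 61" in the WorkerManager log


def current_workers(running, requested):
    """Hypothetical helper: workers that are up plus requests still in flight."""
    return running + requested


total = current_workers(running, requested)
print(total)  # 111, matching "Current workers: 111" in the log
```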

On Thu, 2009-02-26 at 12:44 -0600, Allan Espinosa wrote:
> Here I reverted to the 1 coaster per node configuration. Here are the
> contents of the LRM queue:
> 
> JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
> ================================================================================
> 561497    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
> 561498    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
> 561499    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
> ....
> ....
> ...
> 561547    data       tg802895      Running 16     00:23:42  Thu Feb 26 12:39:01
> 
>     50 active jobs :   50 of 3896 hosts (  1.28 %)
> 
> 
> Total jobs: 50    Active Jobs: 50    Waiting Jobs: 0     Dep/Unsched Jobs: 0
> 
> Here are the current workers:
> 
> 2009-02-26 12:38:50,412-0600 INFO  WorkerManager Current workers: 111
> 2009-02-26 12:38:50,412-0600 INFO  CoasterQueueProcessor Coaster queue: [org.glo
> 2009-02-26 12:38:50,413-0600 INFO  WorkerManager Ready: 0 {}
> 2009-02-26 12:38:50,413-0600 INFO  WorkerManager Busy: 0 [Worker[-1480006551], W
> 2009-02-26 12:38:50,413-0600 INFO  WorkerManager Requested: 61 {2109491608=Worke
> 2009-02-26 12:38:50,414-0600 INFO  WorkerManager Starting: 32 [Task(type=JOB_SUB
> 2009-02-26 12:38:50,414-0600 INFO  WorkerManager Ids: 13 {1104104218=Worker[1104
> 2009-02-26 12:38:50,414-0600 INFO  WorkerManager AllocationR: [org.globus.cog.ab
> 
> 
> 
> On Wed, Feb 25, 2009 at 11:27 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > I suspect the issue was introduced by the addition of multiple coasters
> > per node. The manager expects one worker, but gets 16 instead.
> >
> > On Wed, 2009-02-25 at 22:29 -0600, Allan Espinosa wrote:
> >> It still has the same issues.  It subtracts too much when a task is finished.
> >>
> >> Also, observing the LRM queue, I see Swift creating 18-20 "make
> >> coaster" requests (4 at start, then 16-18 after 5 mins).  With 16
> >> coastersPerNode you get a 320-processor allocation.  This is more than
> >> MAX_WORKERS~256 and the max score possible from my sites.xml (102 max).
> >
> > Regarding MAX_WORKERS, that probably suffers from the same problem, in
> > that it may request less than 256 workers, but given that each request
> > means 16 workers, the end result may be different than what's expected.
> >
> > However, MAX_WORKERS was introduced merely to limit damage in case the
> > code is bad and it doesn't otherwise put an upper bound on the limit of
> > worker requests (/jobs in the queue).
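
The arithmetic behind the 320-processor figure can be sketched as follows. This is only an illustration: it assumes a cap that is checked against the number of requests rather than the number of workers each request starts, which is a guess at the failure mode and not a description of the actual WorkerManager code. The function names are hypothetical.

```python
COASTERS_PER_NODE = 16  # coastersPerNode from the quoted report
MAX_WORKERS = 256       # approximate cap mentioned in the thread


def workers_started(requests, per_request=COASTERS_PER_NODE):
    """Workers that actually come up if every request starts per_request workers."""
    return requests * per_request


def cap_allows(requests, cap=MAX_WORKERS):
    """Hypothetical check that counts each request as a single worker."""
    return requests < cap


def max_requests_within_cap(cap=MAX_WORKERS, per_request=COASTERS_PER_NODE):
    """Requests that would fit if the cap counted workers instead."""
    return cap // per_request


print(workers_started(20))        # 320 workers from 20 requests
print(cap_allows(20))             # True: 20 requests look well under 256
print(max_requests_within_cap())  # 16 requests would already reach 256 workers
```

Under that assumption, 20 requests pass the check even though they yield 320 workers, which is consistent with the allocation exceeding MAX_WORKERS.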



