on coaster accounting (was Re: [Swift-devel] current workers < 0 ?)
Mihael Hategan
hategan at mcs.anl.gov
Thu Feb 26 13:03:18 CST 2009
There are 50 running workers and 61 somewhere between being submitted
and contacting the service. What's the question?
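
To make the arithmetic explicit: the "Current workers: 111" line in the quoted log is consistent with adding the 50 jobs the LRM shows as running to the 61 workers in the "Requested" state. A minimal sketch, with illustrative variable names that are not the actual WorkerManager fields:

```python
# Hypothetical tally; the names are assumptions, not WorkerManager internals.
running = 50     # jobs the LRM reports as Running
requested = 61   # "Requested: 61" in the log: submitted, not yet in contact

current_workers = running + requested
print(current_workers)  # 111
```
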
On Thu, 2009-02-26 at 12:44 -0600, Allan Espinosa wrote:
> Here I reverted to the 1 coaster per node configuration. Here is the
> content of the LRM:
>
> JOBID   JOBNAME  USERNAME  STATE    CORE  REMAINING  STARTTIME
> ================================================================================
> 561497  data     tg802895  Running  16    00:21:26   Thu Feb 26 12:36:45
> 561498  data     tg802895  Running  16    00:21:26   Thu Feb 26 12:36:45
> 561499  data     tg802895  Running  16    00:21:26   Thu Feb 26 12:36:45
> ....
> ....
> ...
> 561547  data     tg802895  Running  16    00:23:42   Thu Feb 26 12:39:01
>
> 50 active jobs : 50 of 3896 hosts ( 1.28 %)
>
>
> Total jobs: 50 Active Jobs: 50 Waiting Jobs: 0 Dep/Unsched Jobs: 0
>
> Here are the current worker counts:
>
> 2009-02-26 12:38:50,412-0600 INFO WorkerManager Current workers: 111
> 2009-02-26 12:38:50,412-0600 INFO CoasterQueueProcessor Coaster queue: [org.glo
> 2009-02-26 12:38:50,413-0600 INFO WorkerManager Ready: 0 {}
> 2009-02-26 12:38:50,413-0600 INFO WorkerManager Busy: 0 [Worker[-1480006551], W
> 2009-02-26 12:38:50,413-0600 INFO WorkerManager Requested: 61 {2109491608=Worke
> 2009-02-26 12:38:50,414-0600 INFO WorkerManager Starting: 32 [Task(type=JOB_SUB
> 2009-02-26 12:38:50,414-0600 INFO WorkerManager Ids: 13 {1104104218=Worker[1104
> 2009-02-26 12:38:50,414-0600 INFO WorkerManager AllocationR: [org.globus.cog.ab
>
>
>
> On Wed, Feb 25, 2009 at 11:27 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > I suspect the issue was introduced by the addition of multiple coasters
> > per node. The manager expects one worker, but gets 16 instead.
> >
> > On Wed, 2009-02-25 at 22:29 -0600, Allan Espinosa wrote:
> >> It still has the same issues. It subtracts too much when a task is finished.
> >>
> >> Also, observing the LRM queue, I see Swift creating 18-20 "make
> >> coaster" requests (4 at start, then 16-18 after 5 mins). With 16
> >> coastersPerNode you get a 320-processor allocation. This is more than
> >> MAX_WORKERS~256 and the max score possible from my sites.xml (102 max).
> >
> > Regarding MAX_WORKERS, that probably suffers from the same problem, in
> > that it may request less than 256 workers, but given that each request
> > means 16 workers, the end result may be different than what's expected.
> >
> > However, MAX_WORKERS was introduced merely to limit damage in case the
> > code is bad and doesn't otherwise put an upper bound on the number of
> > worker requests (/jobs in the queue).
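
The mismatch discussed in the quoted messages can be sketched as follows. This is only an illustration of how a one-worker-per-request assumption could overshoot a cap and drive a counter negative; it is not the actual coaster code. The function names are hypothetical, the value 256 is the approximate MAX_WORKERS figure from the thread, and 20 requests is the upper end of the 18-20 range reported there.

```python
COASTERS_PER_NODE = 16   # workers started per request (from the thread)
MAX_WORKERS = 256        # approximate cap mentioned in the thread

def workers_started(requests, per_request=COASTERS_PER_NODE):
    """Workers that actually start when each request yields several."""
    return requests * per_request

def naive_count_after(requests, finished_workers):
    """Hypothetical counter: +1 per request, -1 per finished worker."""
    return requests - finished_workers

total = workers_started(20)
print(total, total > MAX_WORKERS)    # 320 True
print(naive_count_after(20, total))  # -300
```

Under these assumptions, a limit applied to the number of requests does not bound the number of workers, and a counter incremented once per request goes below zero as the individual workers finish.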