on coaster accounting (was Re: [Swift-devel] current workers < 0 ?)

Allan Espinosa aespinosa at cs.uchicago.edu
Thu Feb 26 12:44:34 CST 2009


Here I reverted to the one-coaster-per-node configuration. Here is the
content of the LRM queue:

JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
================================================================================
561497    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
561498    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
561499    data       tg802895      Running 16     00:21:26  Thu Feb 26 12:36:45
....
....
...
561547    data       tg802895      Running 16     00:23:42  Thu Feb 26 12:39:01

    50 active jobs :   50 of 3896 hosts (  1.28 %)


Total jobs: 50    Active Jobs: 50    Waiting Jobs: 0     Dep/Unsched Jobs: 0

Here is the current worker status:

2009-02-26 12:38:50,412-0600 INFO  WorkerManager Current workers: 111
2009-02-26 12:38:50,412-0600 INFO  CoasterQueueProcessor Coaster queue: [org.glo
2009-02-26 12:38:50,413-0600 INFO  WorkerManager Ready: 0 {}
2009-02-26 12:38:50,413-0600 INFO  WorkerManager Busy: 0 [Worker[-1480006551], W
2009-02-26 12:38:50,413-0600 INFO  WorkerManager Requested: 61 {2109491608=Worke
2009-02-26 12:38:50,414-0600 INFO  WorkerManager Starting: 32 [Task(type=JOB_SUB
2009-02-26 12:38:50,414-0600 INFO  WorkerManager Ids: 13 {1104104218=Worker[1104
2009-02-26 12:38:50,414-0600 INFO  WorkerManager AllocationR: [org.globus.cog.ab
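A quick consistency check, as a minimal sketch: with one coaster per node, the worker count the manager reports should roughly match the number of running LRM jobs times the coasters per node. The figures below are copied from the output above (Current workers: 111, Active Jobs: 50); the helper names and regexes are hypothetical and not part of Swift or the CoG coaster code.

```python
import re

# Log line copied from the WorkerManager output above.
LOG_LINE = "2009-02-26 12:38:50,412-0600 INFO  WorkerManager Current workers: 111"
# Summary line copied from the LRM output above.
LRM_SUMMARY = "Total jobs: 50    Active Jobs: 50    Waiting Jobs: 0     Dep/Unsched Jobs: 0"


def reported_workers(line: str) -> int:
    """Extract the 'Current workers' count (possibly negative) from a log line."""
    match = re.search(r"Current workers: (-?\d+)", line)
    if match is None:
        raise ValueError("no 'Current workers' field in line")
    return int(match.group(1))


def active_jobs(summary: str) -> int:
    """Extract the 'Active Jobs' count from the LRM summary line."""
    match = re.search(r"Active Jobs: (\d+)", summary)
    if match is None:
        raise ValueError("no 'Active Jobs' field in summary")
    return int(match.group(1))


def expected_workers(jobs: int, coasters_per_node: int) -> int:
    """Hypothetical model: each running block job starts coasters_per_node workers."""
    return jobs * coasters_per_node


reported = reported_workers(LOG_LINE)
expected = expected_workers(active_jobs(LRM_SUMMARY), coasters_per_node=1)
print(f"reported={reported} expected={expected} diff={reported - expected}")
# reported=111 expected=50 diff=61
```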



On Wed, Feb 25, 2009 at 11:27 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> I suspect the issue was introduced by the addition of multiple coasters
> per node. The manager expects one worker, but gets 16 instead.
>
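To illustrate the suspected mismatch, here is a minimal model, not the actual WorkerManager code: if the bookkeeping adds one worker per block request but decrements once for every worker that actually ran, a node with 16 coasters drives the count negative. `WorkerCounter` and its fields are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class WorkerCounter:
    """Hypothetical model of the manager's worker bookkeeping.

    count_per_request: workers the manager *assumes* one block request brings.
    coasters_per_node: workers that actually start for that request.
    """
    count_per_request: int
    coasters_per_node: int
    current: int = 0

    def request_block(self) -> None:
        # The manager adds what it believes one request will produce.
        self.current += self.count_per_request

    def finish_block(self) -> None:
        # Each worker that actually ran is removed individually.
        for _ in range(self.coasters_per_node):
            self.current -= 1


buggy = WorkerCounter(count_per_request=1, coasters_per_node=16)
buggy.request_block()
buggy.finish_block()
print(buggy.current)  # -15

fixed = WorkerCounter(count_per_request=16, coasters_per_node=16)
fixed.request_block()
fixed.finish_block()
print(fixed.current)  # 0
```

The point of the model is only that both sides of the ledger must use the same unit (workers, not requests).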
> On Wed, 2009-02-25 at 22:29 -0600, Allan Espinosa wrote:
>> It still has the same issues. It subtracts too much when a task is finished.
>>
>> Also, observing the LRM queue, I see Swift creating 18-20 "make
>> coaster" requests (4 at start, then 16-18 after 5 minutes). With 16
>> coastersPerNode you get a 320-processor allocation. This is more than
>> MAX_WORKERS (~256) and the maximum score possible from my sites.xml (102).
>
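The arithmetic behind the 320 figure, as a small sketch assuming each "make coaster" request becomes one LRM block job that starts coastersPerNode workers. The request counts (4, 18, 20), coastersPerNode = 16 and the ~256 limit are taken from the message; `workers_allocated` is a hypothetical helper.

```python
MAX_WORKERS = 256        # limit quoted in the message (~256)
COASTERS_PER_NODE = 16   # coastersPerNode setting from the message


def workers_allocated(block_requests: int, coasters_per_node: int) -> int:
    """Assumed model: every block request yields coasters_per_node workers."""
    return block_requests * coasters_per_node


for requests in (4, 18, 20):
    total = workers_allocated(requests, COASTERS_PER_NODE)
    status = "over" if total > MAX_WORKERS else "within"
    print(f"{requests:2d} requests -> {total:3d} workers ({status} MAX_WORKERS)")
```

Even the low end of the observed range (18 requests, 288 workers) already exceeds the limit once each request counts as 16 workers.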
> Regarding MAX_WORKERS, that probably suffers from the same problem, in
> that it may request fewer than 256 workers, but given that each request
> means 16 workers, the end result may differ from what's expected.
>
> However, MAX_WORKERS was introduced merely to limit damage in case the
> code is bad and doesn't otherwise put an upper bound on the number of
> worker requests (/jobs in the queue).
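One way a MAX_WORKERS-style cap could account for multiple coasters per node, sketched under the assumption that the guard counts block requests: divide the worker limit by coastersPerNode to get a request limit. `allowed_new_requests` is a hypothetical helper, not the Swift/CoG implementation.

```python
MAX_WORKERS = 256


def allowed_new_requests(outstanding_requests: int, coasters_per_node: int,
                         max_workers: int = MAX_WORKERS) -> int:
    """How many more block requests fit under max_workers when every
    request is worth coasters_per_node workers (hypothetical guard)."""
    if coasters_per_node < 1:
        raise ValueError("coasters_per_node must be >= 1")
    max_requests = max_workers // coasters_per_node
    return max(0, max_requests - outstanding_requests)


print(allowed_new_requests(0, 16))   # 16
print(allowed_new_requests(4, 16))   # 12
print(allowed_new_requests(20, 16))  # 0
print(allowed_new_requests(0, 1))    # 256
```

Integer division rounds down, so the cap never admits a partial node's worth of workers beyond the limit.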


