on coaster accounting (was Re: [Swift-devel] current workers < 0 ?)
Allan Espinosa
aespinosa at cs.uchicago.edu
Thu Feb 26 12:44:34 CST 2009
Here I reverted to the one-coaster-per-node configuration. This is the
content of the LRM queue:
JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
================================================================================
561497 data tg802895 Running 16 00:21:26 Thu Feb 26 12:36:45
561498 data tg802895 Running 16 00:21:26 Thu Feb 26 12:36:45
561499 data tg802895 Running 16 00:21:26 Thu Feb 26 12:36:45
....
....
...
561547 data tg802895 Running 16 00:23:42 Thu Feb 26 12:39:01
50 active jobs : 50 of 3896 hosts ( 1.28 %)
Total jobs: 50 Active Jobs: 50 Waiting Jobs: 0 Dep/Unsched Jobs: 0
Here are the current worker counts:
2009-02-26 12:38:50,412-0600 INFO WorkerManager Current workers: 111
2009-02-26 12:38:50,412-0600 INFO CoasterQueueProcessor Coaster queue: [org.glo
2009-02-26 12:38:50,413-0600 INFO WorkerManager Ready: 0 {}
2009-02-26 12:38:50,413-0600 INFO WorkerManager Busy: 0 [Worker[-1480006551], W
2009-02-26 12:38:50,413-0600 INFO WorkerManager Requested: 61 {2109491608=Worke
2009-02-26 12:38:50,414-0600 INFO WorkerManager Starting: 32 [Task(type=JOB_SUB
2009-02-26 12:38:50,414-0600 INFO WorkerManager Ids: 13 {1104104218=Worker[1104
2009-02-26 12:38:50,414-0600 INFO WorkerManager AllocationR: [org.globus.cog.ab
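Mihael's hypothesis in the quoted reply is that the manager expects one worker but receives 16. The small Python model below makes that accounting error concrete. It assumes the counter goes up once per allocation request but goes down once per worker that exits. `WorkerCounter`, `request_started`, `worker_finished`, and `simulate` are hypothetical names. They are not the actual WorkerManager code.

```python
# Hypothetical model of coaster worker accounting. The class and method
# names are illustrative only; they are not the CoG/Swift WorkerManager API.


class WorkerCounter:
    def __init__(self, per_request_accounting: bool) -> None:
        # True models the suspected bug: one worker assumed per request.
        self.per_request_accounting = per_request_accounting
        self.current = 0

    def request_started(self, coasters_per_node: int) -> None:
        # Buggy variant adds 1 per request; fixed variant adds every worker
        # that the request will actually start on the node.
        self.current += 1 if self.per_request_accounting else coasters_per_node

    def worker_finished(self) -> None:
        # Every worker that exits decrements the counter once.
        self.current -= 1


def simulate(requests: int, coasters_per_node: int,
             per_request_accounting: bool) -> int:
    """Start `requests` allocations, let every worker finish, return the count."""
    counter = WorkerCounter(per_request_accounting)
    for _ in range(requests):
        counter.request_started(coasters_per_node)
    for _ in range(requests * coasters_per_node):
        counter.worker_finished()
    return counter.current


if __name__ == "__main__":
    print(simulate(20, 16, per_request_accounting=True))   # -300
    print(simulate(20, 16, per_request_accounting=False))  # 0
```

Take 20 requests at 16 coasters per node. The per-request variant ends at -300, which mirrors the "current workers < 0" symptom in the subject line. Counting every worker returns the counter to zero.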
On Wed, Feb 25, 2009 at 11:27 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> I suspect the issue was introduced by the addition of multiple coasters
> per node. The manager expects one worker, but gets 16 instead.
>
> On Wed, 2009-02-25 at 22:29 -0600, Allan Espinosa wrote:
>> It still has the same issues. It subtracts too much when a task is finished.
>>
>> Also, observing the LRM queue, I see Swift creating 18-20 "make
>> coaster" requests (4 at start, then 16-18 after 5 minutes). With
>> coastersPerNode set to 16, that is a 320-processor allocation. This is
>> more than MAX_WORKERS (~256) and the maximum score possible from my
>> sites.xml (102 max).
>
> Regarding MAX_WORKERS, that probably suffers from the same problem, in
> that it may request fewer than 256 workers, but given that each request
> means 16 workers, the end result may differ from what's expected.
>
> However, MAX_WORKERS was introduced merely to limit damage in case the
> code is bad and doesn't otherwise put an upper bound on the number of
> worker requests (/jobs in the queue).
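The MAX_WORKERS point can be modeled the same way. The cap check below is hypothetical: its names and structure are assumptions, not the Swift code. It compares a cap that counts requests against one that counts workers. It uses the figures from the thread: MAX_WORKERS of 256, coastersPerNode of 16, and 20 requests.

```python
# Hypothetical cap check; names are illustrative, not Swift's actual code.
MAX_WORKERS = 256       # limit mentioned in the thread
COASTERS_PER_NODE = 16  # coastersPerNode value from the thread


def can_submit(queued_requests: int, coasters_per_node: int,
               max_workers: int, count_workers: bool) -> bool:
    """Return True if one more allocation request stays within max_workers."""
    if count_workers:
        # Each request (one LRM job) starts coasters_per_node workers.
        return (queued_requests + 1) * coasters_per_node <= max_workers
    # Suspected behavior: each request counted as a single worker.
    return queued_requests + 1 <= max_workers


def workers_started(demand_requests: int, coasters_per_node: int,
                    max_workers: int, count_workers: bool) -> int:
    """Submit requests until demand is met or the cap refuses; return workers."""
    queued = 0
    while queued < demand_requests and can_submit(
            queued, coasters_per_node, max_workers, count_workers):
        queued += 1
    return queued * coasters_per_node


if __name__ == "__main__":
    print(workers_started(20, COASTERS_PER_NODE, MAX_WORKERS, count_workers=False))  # 320
    print(workers_started(20, COASTERS_PER_NODE, MAX_WORKERS, count_workers=True))   # 256
```

Counting requests lets all 20 through, which starts 320 workers. Counting workers stops at 16 requests, or 256 workers.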