[Swift-devel] current workers < 0 ?

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Feb 25 22:29:14 CST 2009


It still has the same issues.  It subtracts too much when a task is finished.

Also, observing the LRM queue, I see Swift creating 18-20 "make
coaster" requests (4 at start, then 16-18 after 5 mins).  With a
coastersPerNode of 16, 20 requests give a 320-processor allocation.
This is more than MAX_WORKERS (~256) and more than the max score
possible from my sites.xml (102 max):

    <profile namespace="karajan" key="initialScore">1</profile>
    <profile namespace="karajan" key="jobThrottle">1</profile>
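
As a rough check of the arithmetic above, here is a small sketch. The
class and method names are made up for illustration and are not part of
Swift or CoG; 256 is the approximate MAX_WORKERS value mentioned above,
and 18 and 20 are the request counts seen in the LRM queue.

```java
// Hypothetical helper for illustration only; not Swift or CoG code.
final class AllocationCheck {
    // Processors requested = "make coaster" requests * coastersPerNode.
    static int requestedProcessors(int coasterRequests, int coastersPerNode) {
        return coasterRequests * coastersPerNode;
    }

    public static void main(String[] args) {
        int coastersPerNode = 16; // value used in this run
        int maxWorkers = 256;     // approximate MAX_WORKERS (assumption)
        for (int requests : new int[] {18, 20}) {
            int procs = requestedProcessors(requests, coastersPerNode);
            System.out.println(requests + " requests -> " + procs
                    + " processors, over limit: " + (procs > maxWorkers));
        }
    }
}
```

Both ends of the observed range land above the ~256 limit, so the
over-allocation does not depend on which request count is taken.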


2009-02-25 20:31:15,590-0600 INFO  Worker Worker stderr: null
2009-02-25 20:31:15,590-0600 WARN  WorkerManager Worker terminated:
Worker[-1909333457]
2009-02-25 20:31:15,590-0600 WARN  Worker Worker 335457820 status
change: Completed
2009-02-25 20:31:15,590-0600 INFO  Worker Worker stdout: Job You has completed.
Writing job STDOUT and STDERR to cache files.
Returning job success.

2009-02-25 20:31:15,590-0600 INFO  Worker Worker stderr: null
2009-02-25 20:31:15,590-0600 WARN  WorkerManager Worker terminated:
Worker[335457820]
******2009-02-25 20:31:15,742-0600 INFO  WorkerManager Current workers: -32****
2009-02-25 20:31:15,745-0600 INFO  WorkerManager Ready: {}
2009-02-25 20:31:15,745-0600 INFO  WorkerManager Busy:
[Worker[-1260987422], Worker[2142641145], Worker[2053757208
2009-02-25 20:31:15,751-0600 INFO  WorkerManager Requested:
{640597733=Worker[640597733], -692025578=Worker[-69202
2009-02-25 20:31:15,751-0600 INFO  WorkerManager Starting:
[Task(type=JOB_SUBMISSION, identity=urn:1235615211813-1
2009-02-25 20:31:15,752-0600 INFO  WorkerManager Ids:
{1078934147=Worker[1078934147], 264613139=Worker[264613139],
2009-02-25 20:31:15,753-0600 INFO  WorkerManager AllocationR:
[org.globus.cog.abstraction.coaster.service.job.mana
2009-02-25 20:31:15,873-0600 INFO  AbstractKarajanChannel SC-null REQ:
Handler(JOBSTATUS)
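
A counter like "Current workers" can go negative if increments and
decrements race, or if the decrement runs more than once for the same
worker. I have not confirmed which of these happens here. The sketch
below guards against both by deriving the count from a set of live
worker ids under a lock. WorkerCounter and its methods are hypothetical
names; this is not the actual WorkerManager code in CoG.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the CoG WorkerManager implementation.
final class WorkerCounter {
    // Ids of workers currently counted as running.
    private final Set<Integer> live = new HashSet<>();

    // Adding an id that is already present has no effect on the count.
    synchronized void workerStarted(int id) {
        live.add(id);
    }

    // Removing an unknown or already-removed id has no effect, so repeated
    // "terminated" notifications cannot push the count below zero.
    synchronized void workerTerminated(int id) {
        live.remove(id);
    }

    // The count is the size of the set, so it is never negative.
    synchronized int currentWorkers() {
        return live.size();
    }
}
```

With this shape, calling workerTerminated twice for the same id leaves
the count unchanged the second time, at the cost of holding one id per
live worker.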


On Wed, Feb 25, 2009 at 9:39 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> ----- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>
>> ----- Allan Espinosa <aespinosa at cs.uchicago.edu> wrote:
>> > Ooops. I copy pasted the wrong line.  It should be:
>> >
>> > 2009-02-25 15:33:14,665-0600 INFO  WorkerManager Current workers: -110
>>
>> Heh. Yes. That increment should be synchronized. I guess I didn't bother
>> because it was only there for informal reasons.
>>
>
> cog r2306 should fix this. Let me know if it works or if I screwed up
> something else.
>
>



-- 
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


