[Swift-devel] current workers < 0 ?
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Feb 25 22:29:14 CST 2009
It still has the same issues. It subtracts too much when a task is finished.
Also, observing the LRM queue, I see Swift creating 18-20 "make
coaster" requests (4 at start, then 16-18 after 5 minutes). With
coastersPerNode set to 16, you get up to a 320-processor allocation.
This is more than MAX_WORKERS (~256) and more than the maximum score
possible from my sites.xml (102 max):
<profile namespace="karajan" key="initialScore">1</profile>
<profile namespace="karajan" key="jobThrottle">1</profile>
2009-02-25 20:31:15,590-0600 INFO Worker Worker stderr: null
2009-02-25 20:31:15,590-0600 WARN WorkerManager Worker terminated:
Worker[-1909333457]
2009-02-25 20:31:15,590-0600 WARN Worker Worker 335457820 status
change: Completed
2009-02-25 20:31:15,590-0600 INFO Worker Worker stdout: Job You has completed.
Writing job STDOUT and STDERR to cache files.
Returning job success.
2009-02-25 20:31:15,590-0600 INFO Worker Worker stderr: null
2009-02-25 20:31:15,590-0600 WARN WorkerManager Worker terminated:
Worker[335457820]
******2009-02-25 20:31:15,742-0600 INFO WorkerManager Current workers: -32****
2009-02-25 20:31:15,745-0600 INFO WorkerManager Ready: {}
2009-02-25 20:31:15,745-0600 INFO WorkerManager Busy:
[Worker[-1260987422], Worker[2142641145], Worker[2053757208
2009-02-25 20:31:15,751-0600 INFO WorkerManager Requested:
{640597733=Worker[640597733], -692025578=Worker[-69202
2009-02-25 20:31:15,751-0600 INFO WorkerManager Starting:
[Task(type=JOB_SUBMISSION, identity=urn:1235615211813-1
2009-02-25 20:31:15,752-0600 INFO WorkerManager Ids:
{1078934147=Worker[1078934147], 264613139=Worker[264613139],
2009-02-25 20:31:15,753-0600 INFO WorkerManager AllocationR:
[org.globus.cog.abstraction.coaster.service.job.mana
2009-02-25 20:31:15,873-0600 INFO AbstractKarajanChannel SC-null REQ:
Handler(JOBSTATUS)
On Wed, Feb 25, 2009 at 9:39 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> ----- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>
>> ----- Allan Espinosa <aespinosa at cs.uchicago.edu> wrote:
>> > Ooops. I copy pasted the wrong line. It should be:
>> >
>> > 2009-02-25 15:33:14,665-0600 INFO WorkerManager Current workers: -110
>>
>> Heh. Yes. That increment should be synchronized. I guess I didn't bother
>> because it was only there for informal reasons.
>>
>
> cog r2306 should fix this. Let me know if it works or if I screwed up
> something else.
>
>
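For reference, the race described in the quoted reply (an unsynchronized increment on a counter shared between threads) can be sketched as follows. This is only an illustration under assumptions: the class WorkerCountSketch and its methods are hypothetical and are not the actual WorkerManager code or the r2306 change, and it does not explain why the count still goes negative after that fix.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a "current workers" counter; not the CoG WorkerManager.
public class WorkerCountSketch {
    private int unsafeCount = 0;                                // plain int, updated without locking
    private final AtomicInteger safeCount = new AtomicInteger(); // atomic alternative

    void unsafeAdd(int delta) { unsafeCount += delta; }          // read-modify-write, can lose updates
    void safeAdd(int delta) { safeCount.addAndGet(delta); }      // single atomic update

    int unsafeValue() { return unsafeCount; }
    int safeValue() { return safeCount.get(); }

    // Starts `threads` threads; each performs `iterations` matched +1/-1 updates
    // on both counters, so the correct final value of each counter is 0.
    static WorkerCountSketch run(int threads, int iterations) {
        WorkerCountSketch c = new WorkerCountSketch();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    c.unsafeAdd(1);  c.safeAdd(1);   // worker started
                    c.unsafeAdd(-1); c.safeAdd(-1);  // worker terminated
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        WorkerCountSketch c = run(8, 100000);
        // The unsynchronized value may drift above or below zero between runs.
        System.out.println("unsynchronized: " + c.unsafeValue());
        System.out.println("atomic:         " + c.safeValue());
    }
}
```

The atomic counter always ends at 0 because every increment is matched by a decrement. The plain int may end at a nonzero or negative value when updates from different threads interleave. A synchronized method around the update would work equally well; AtomicInteger is just the shorter way to write it here.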
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>