[Swift-devel] Re: high load on tg-grid1

Michael Wilde wilde at mcs.anl.gov
Mon Nov 5 22:38:37 CST 2007


I ran it with the options in the swift.properties file of that log dir 
run153.

I've been using these for a bit; you'll need to check there to see what 
the settings were.

If you suggest new ones I'll try them now when I set up a clustered run.

Any suggestion on clustering size?

Also: this, unlike previous runs, is running a dummy (sleep) angle job.
What should I set that simulated run time to?  The real angle run time 
is O(60 seconds).  Want it "real" or "faster"?

- Mike


On 11/5/07 10:04 PM, Ben Clifford wrote:
> Did you run this with swift default throttling? If so, I'm interested to 
> see the swift site scores.
> 
> On Mon, 5 Nov 2007, Michael Wilde wrote:
> 
>> Joe, I started a workflow with 1000 jobs - most likely that's what caused this.
>> I need to check the throttles on this workflow - it's possible they were open
>> too wide.
>>
>> Another possibility - not sure if this was cause or effect - was that I got
>> hundreds of messages from PBS (job aborted messages) of the form that I
>> reported to help at tg yesterday.
>>
>> I'm about to investigate the logs, but all my jobs are out of the queue now,
>> and the workflow has completed.
>>
>> (Ben: I'll be filing the log momentarily after I do an initial check of it. Of
>> 1000 jobs I got about 533 result datasets returned. This was w/o clustering).
>> I got 396 emails from PBS.
>>
>> - Mike
>>
>> (Ti: responding to tg-support as that's where Joe sent this...)
>>
>> On 11/5/07 9:15 PM, joseph insley wrote:
>>> I'm not sure what was causing this, but the load on tg-grid1 spiked at over
>>> 200 a short while ago.  It's coming back down now, but while it was high I
>>> tried to submit a job through GRAM (pre-WS) and after a long wait I got the
>>> error "GRAM Job submission failed because an I/O operation failed (error
>>> code 3)"
>>>
>>> At the time there were a number of globus-job-manager processes belonging to
>>> Mike Wilde, but only on the order of ~30something.. it doesn't seem like
>>> this should cause such a high load, so I don't know what was up...
>>>
>>> joe.
>>>
>>> ===================================================
>>> joseph a. insley
>>> insley at mcs.anl.gov
>>> mathematics & computer science division       (630) 252-5649
>>> argonne national laboratory                               (630) 252-5986 (fax)
>>>
>>>
>>>
>>
> 
> 


