[Swift-devel] Re: high load on tg-grid1

Michael Wilde wilde at mcs.anl.gov
Mon Nov 5 23:02:53 CST 2007


The job throttles were all set to off - that's the way I had set them for 
Falkon, and I forgot to change them for PBS.

The data throttles were set to the defaults.

I'll start the next run, with clustering, with all throttles set to 
default, unless you suggest otherwise (in time).

Note that my swift.properties is a subset of the full file (I only 
include the properties I plan to change).

- Mike


On 11/5/07 10:04 PM, Ben Clifford wrote:
> Did you run this with swift default throttling? If so, I'm interested to 
> see the swift site scores.
> 
> On Mon, 5 Nov 2007, Michael Wilde wrote:
> 
>> Joe, I started a workflow with 1000 jobs - most likely that's what caused this.
>> I need to check the throttles on this workflow - it's possible they were open
>> too wide.
>>
>> Another possibility - not sure if this was cause or effect - was that I got
>> hundreds of messages from PBS (job aborted messages) of the form that I
>> reported to help at tg yesterday.
>>
>> I'm about to investigate the logs, but all my jobs are out of the queue now,
>> and the workflow has completed.
>>
>> (Ben: I'll be filing the log momentarily after I do an initial check of it. Of
>> 1000 jobs I got about 533 result datasets returned. This was w/o clustering).
>> I got 396 emails from PBS.
>>
>> - Mike
>>
>> (Ti: responding to tg-support as that's where Joe sent this...)
>>
>> On 11/5/07 9:15 PM, joseph insley wrote:
>>> I'm not sure what was causing this, but the load on tg-grid1 spiked at over
>>> 200 a short while ago.  It's coming back down now, but while it was high I
>>> tried to submit a job through GRAM (pre-WS) and after a long wait I got the
>>> error "GRAM Job submission failed because an I/O operation failed (error
>>> code 3)"
>>>
>>> At the time there were a number of globus-job-manager processes belonging to
>>> Mike Wilde, but only on the order of ~30something.. it doesn't seem like
>>> this should cause such a high load, so I don't know what was up...
>>>
>>> joe.
>>>
>>> ===================================================
>>> joseph a. insley
>>> insley at mcs.anl.gov
>>> mathematics & computer science division       (630) 252-5649
>>> argonne national laboratory                               (630) 252-5986
>>> (fax)
>>>
>>>
>>>
>>
> 
> 


