[Swift-devel] Re: high load on tg-grid1

Ben Clifford benc at hawaga.org.uk
Mon Nov 5 22:04:24 CST 2007


Did you run this with Swift's default throttling? If so, I'm interested to 
see the Swift site scores.

On Mon, 5 Nov 2007, Michael Wilde wrote:

> Joe, I started a workflow with 1000 jobs - most likely that's what caused this.
> I need to check the throttles on this workflow - it's possible they were open
> too wide.
> 
> Another possibility - not sure if this was cause or effect - was that I got
> hundreds of messages from PBS (job aborted messages) of the form that I
> reported to help at tg yesterday.
> 
> I'm about to investigate the logs, but all my jobs are out of the queue now,
> and the workflow has completed.
> 
> (Ben: I'll be filing the log momentarily after I do an initial check of it. Of
> 1000 jobs I got about 533 result datasets returned. This was w/o clustering).
> I got 396 emails from PBS.
> 
> - Mike
> 
> (Ti: responding to tg-support as that's where Joe sent this...)
> 
> On 11/5/07 9:15 PM, joseph insley wrote:
> > I'm not sure what was causing this, but the load on tg-grid1 spiked at over
> > 200 a short while ago.  It's coming back down now, but while it was high I
> > tried to submit a job through GRAM (pre-WS) and after a long wait I got the
> > error "GRAM Job submission failed because an I/O operation failed (error
> > code 3)"
> > 
> > At the time there were a number of globus-job-manager processes belonging to
> > Mike Wilde, but only 30 or so. It doesn't seem like
> > this should cause such a high load, so I don't know what was up...
> > 
> > joe.
> > 
> > ===================================================
> > joseph a. insley
> > insley at mcs.anl.gov
> > mathematics & computer science division       (630) 252-5649
> > argonne national laboratory                   (630) 252-5986 (fax)
> > 
> > 
> > 
> 
> 


