[Swift-devel] Re: high load on tg-grid1
Ben Clifford
benc at hawaga.org.uk
Mon Nov 5 22:04:24 CST 2007
Did you run this with swift default throttling? If so, I'm interested to
see the swift site scores.
On Mon, 5 Nov 2007, Michael Wilde wrote:
> Joe, I started a workflow with 1000 jobs - most likely that's what caused this.
> I need to check the throttles on this workflow - it's possible they were open
> too wide.
>
> Another possibility - not sure if this was cause or effect - was that I got
> hundreds of messages from PBS (job aborted messages) of the form that I
> reported to help at tg yesterday.
>
> I'm about to investigate the logs, but all my jobs are out of the queue now,
> and the workflow has completed.
>
> (Ben: I'll be filing the log momentarily, after I do an initial check of it. Of
> 1000 jobs, I got about 533 result datasets returned. This was without clustering.)
> I got 396 emails from PBS.
>
> - Mike
>
> (Ti: responding to tg-support, as that's where Joe sent this...)
>
> On 11/5/07 9:15 PM, joseph insley wrote:
> > I'm not sure what was causing this, but the load on tg-grid1 spiked at over
> > 200 a short while ago. It's coming back down now, but while it was high I
> > tried to submit a job through GRAM (pre-WS) and after a long wait I got the
> > error "GRAM Job submission failed because an I/O operation failed (error
> > code 3)"
> >
> > At the time there were a number of globus-job-manager processes belonging to
> > Mike Wilde, but only on the order of 30-something. It doesn't seem like
> > this should cause such a high load, so I don't know what was up...
> >
> > joe.
> >
> > ===================================================
> > joseph a. insley
> > insley at mcs.anl.gov
> > mathematics & computer science division   (630) 252-5649
> > argonne national laboratory               (630) 252-5986 (fax)
> >
> >
> >
>
>