[Swift-devel] Re: high load on tg-grid1
Michael Wilde
wilde at mcs.anl.gov
Mon Nov 5 21:43:51 CST 2007
Joe, I started a workflow with 1000 jobs - most likely that's what caused
this. I need to check the throttles on this workflow - it's possible they
were open too wide.
Another possibility - not sure if this was cause or effect - was that I
got hundreds of messages from PBS (job aborted messages) of the form
that I reported to help at tg yesterday.
I'm about to investigate the logs, but all my jobs are out of the queue
now, and the workflow has completed.
(Ben: I'll be filing the log momentarily after I do an initial check of
it. Of 1000 jobs I got about 533 result datasets returned. This was w/o
clustering). I got 396 emails from PBS.
- Mike
(Ti: responding to tg-support as that's where Joe sent this...)
On 11/5/07 9:15 PM, joseph insley wrote:
> I'm not sure what was causing this, but the load on tg-grid1 spiked at
> over 200 a short while ago. It's coming back down now, but while it was
> high I tried to submit a job through GRAM (pre-WS) and after a long wait
> I got the error "GRAM Job submission failed because an I/O operation
> failed (error code 3)"
>
> At the time there were a number of globus-job-manager processes
> belonging to Mike Wilde, but only on the order of ~30something.. it
> doesn't seem like this should cause such a high load, so I don't know
> what was up...
>
> joe.
>
> ===================================================
> joseph a. insley
> insley at mcs.anl.gov
> mathematics & computer science division (630) 252-5649
> argonne national laboratory (630) 252-5986 (fax)