[Swift-devel] Re: high load on tg-grid1

Michael Wilde wilde at mcs.anl.gov
Mon Nov 5 21:43:51 CST 2007


Joe, I started a workflow with 1000 jobs - most likely that's what caused 
this. I need to check the throttles on this workflow - it's possible they 
were open too wide.

Another possibility - not sure if this was cause or effect - was that I 
got hundreds of messages from PBS (job aborted messages) of the form 
that I reported to help at tg yesterday.

I'm about to investigate the logs, but all my jobs are out of the queue 
now, and the workflow has completed.

(Ben: I'll be filing the log momentarily, after I do an initial check of 
it. Of 1000 jobs I got about 533 result datasets returned. This was 
without clustering.) I got 396 emails from PBS.

- Mike

(Ti: responding to tg-support as that's where Joe sent this...)

On 11/5/07 9:15 PM, joseph insley wrote:
> I'm not sure what was causing this, but the load on tg-grid1 spiked at 
> over 200 a short while ago.  It's coming back down now, but while it was 
> high I tried to submit a job through GRAM (pre-WS) and after a long wait 
> I got the error "GRAM Job submission failed because an I/O operation 
> failed (error code 3)"
> 
> At the time there were a number of globus-job-manager processes 
> belonging to Mike Wilde, but only on the order of ~30something.. it 
> doesn't seem like this should cause such a high load, so I don't know 
> what was up...
> 
> joe.
> 
> ===================================================
> joseph a. insley                              insley at mcs.anl.gov
> mathematics & computer science division       (630) 252-5649
> argonne national laboratory                   (630) 252-5986 (fax)
> 
> 
> 


