[Swift-devel] Swift jobs on UC/ANL TG

joseph insley insley at mcs.anl.gov
Mon Jan 28 17:15:41 CST 2008


Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)  
became unresponsive and had to be rebooted.  I am now seeing slow  
response times from the Gatekeeper there again.  Authenticating to  
the gatekeeper should only take a second or two, but it is  
periodically taking up to 16 seconds:

insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
GRAM Authentication test successful
real    0m16.096s
user    0m0.060s
sys     0m0.020s
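
Since the slowdown is intermittent, a single timing can miss it. A minimal sketch for sampling the authentication time repeatedly is below. The function names `elapsed_seconds` and `sample_latency` are made up for illustration, and timings are whole seconds only because it relies on `date +%s`:

```shell
# Print the wall-clock seconds a command takes (whole-second resolution).
# The command's output and exit status are ignored; only timing is measured.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s)
  echo $((end - start))
}

# Run a probe command COUNT times, sleeping PAUSE seconds between attempts,
# and report each timing plus the slowest one seen.
sample_latency() {
  count=$1
  pause=$2
  shift 2
  worst=0
  i=1
  while [ "$i" -le "$count" ]; do
    t=$(elapsed_seconds "$@")
    echo "attempt $i: ${t}s"
    if [ "$t" -gt "$worst" ]; then
      worst=$t
    fi
    i=$((i + 1))
    if [ "$i" -le "$count" ]; then
      sleep "$pause"
    fi
  done
  echo "worst: ${worst}s"
}
```

For example, `sample_latency 5 10 globusrun -a -r tg-grid.uc.teragrid.org` would run the same authentication test five times, ten seconds apart.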

Looking at the load on tg-grid, it is rather high:

top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie

And there appear to be a large number of processes owned by kubal:

insley at tg-grid1:~> ps -ef | grep kubal | wc -l
     380

I assume that Mike is using Swift to do the job submission.  Could
the rate at which jobs are submitted to the gatekeeper be throttled
to lighten this load?  (Or has that already been done since earlier
today?)  The current response times are not unacceptable, but I'm
hoping to avoid having the machine grind to a halt as it did earlier
today.
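
To illustrate the kind of throttling meant here, below is a generic client-side pacing sketch. It is not Swift's own mechanism, and `paced_submit` is a hypothetical helper name. It runs one command per input line and waits a fixed interval before starting the next, so the gatekeeper never sees a burst of submissions:

```shell
# Read one command per line from stdin and run them one at a time,
# sleeping INTERVAL seconds after each so submissions are spread out.
paced_submit() {
  interval=$1
  while IFS= read -r job; do
    # Skip blank lines in the job list.
    if [ -z "$job" ]; then
      continue
    fi
    # Run the job with stdin detached so it cannot consume the job list.
    sh -c "$job" </dev/null || echo "failed: $job" >&2
    sleep "$interval"
  done
}
```

For example, `paced_submit 5 < jobs.txt` would start the commands listed in a (hypothetical) `jobs.txt` at least five seconds apart.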

Thanks,
joe.


===================================================
joseph a. insley                              insley at mcs.anl.gov
mathematics & computer science division       (630) 252-5649
argonne national laboratory                   (630) 252-5986 (fax)

