[Swift-devel] Re: Swift jobs on UC/ANL TG

Ti Leggett leggett at mcs.anl.gov
Sun Feb 3 15:36:57 CST 2008


I should add that I killed all of your processes running on tg-grid1, so
your jobs will most likely fail.

On Feb 3, 2008, at 3:34 PM, Ti Leggett wrote:

> Mike, you're killing tg-grid1 again. Can someone work with Mike to
> find some Swift settings that don't kill our server?
>
> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>
>> Yes, I'm submitting molecular dynamics simulations
>> using Swift.
>>
>> Is there a default wall-time limit for jobs on tg-uc?
>>
>>
>>
>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>
>>> Actually, these numbers are now escalating...
>>>
>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>>
>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>    479
>>>
>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>> GRAM Authentication test successful
>>> real    0m26.134s
>>> user    0m0.090s
>>> sys     0m0.010s
>>>
>>>
>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>
>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>> became unresponsive and had to be rebooted.  I am now seeing slow
>>>> response times from the Gatekeeper there again.  Authenticating to
>>>> the gatekeeper should only take a second or two, but it is
>>>> periodically taking up to 16 seconds:
>>>>
>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>> GRAM Authentication test successful
>>>> real    0m16.096s
>>>> user    0m0.060s
>>>> sys     0m0.020s
>>>>
>>>> looking at the load on tg-grid, it is rather high:
>>>>
>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>>
>>>> And there appear to be a large number of processes owned by kubal:
>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>   380
>>>>
>>>> I assume that Mike is using swift to do the job submission.  Is
>>>> there some throttling of the rate at which jobs are submitted to
>>>> the gatekeeper that could be done that would lighten this load
>>>> some?  (Or has that already been done since earlier today?)  The
>>>> current response times are not unacceptable, but I'm hoping to
>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>
>>>> Thanks,
>>>> joe.
>>>>
>>>>
>>>>
>>>> ===================================================
>>>> joseph a. insley                          insley at mcs.anl.gov
>>>> mathematics & computer science division   (630) 252-5649
>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>
>>>>
>>>
>>> ===================================================
>>> joseph a. insley                          insley at mcs.anl.gov
>>> mathematics & computer science division   (630) 252-5649
>>> argonne national laboratory               (630) 252-5986 (fax)
>>>
>>>
>>>
>>
>>
>>
>>
>
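
On the throttling question raised in the quoted thread: one generic way to
lighten the load on a gatekeeper is to enforce a minimum gap between
successive submissions on the client side. The following is only a minimal
sketch of that idea in Python. It is not Swift's own throttling mechanism or
configuration, and the names SubmitThrottle, wait_turn, submit_all and the
submit callback are hypothetical, introduced purely for illustration.

```python
import time


class SubmitThrottle:
    """Enforce a minimum gap between successive job submissions."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the pacing can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next submission may start

    def wait_turn(self):
        """Block until a submission is allowed; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval_s
        return waited


def submit_all(jobs, submit, throttle):
    """Submit each job through the `submit` callback, paced by `throttle`."""
    results = []
    for job in jobs:
        throttle.wait_turn()
        results.append(submit(job))
    return results
```

Here submit stands in for whatever actually hands a job to the gatekeeper.
A suitable interval would have to be found by watching the load on the
server; the sketch does not suggest a value.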



