[Swift-devel] Re: Swift jobs on UC/ANL TG
Ti Leggett
leggett at mcs.anl.gov
Sun Feb 3 15:36:57 CST 2008
I should say I killed all your processes running on tg-grid1, so your
jobs are most likely going to fail.
On Feb 3, 2008, at 3:34 PM, Ti Leggett wrote:
> Mike, you're killing tg-grid1 again. Can someone work with Mike to
> get some Swift settings that don't kill our server?
>
> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>
>> Yes, I'm submitting molecular dynamics simulations
>> using Swift.
>>
>> Is there a default wall-time limit for jobs on tg-uc?
>>
>>
>>
>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>
>>> Actually, these numbers are now escalating...
>>>
>>> top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
>>> Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
>>>
>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>> 479
>>>
>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>> GRAM Authentication test successful
>>> real 0m26.134s
>>> user 0m0.090s
>>> sys 0m0.010s
>>>
>>>
>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>
>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>> became unresponsive and had to be rebooted. I am now seeing slow
>>>> response times from the Gatekeeper there again. Authenticating to
>>>> the gatekeeper should only take a second or two, but it is
>>>> periodically taking up to 16 seconds:
>>>>
>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>> GRAM Authentication test successful
>>>> real 0m16.096s
>>>> user 0m0.060s
>>>> sys 0m0.020s
>>>>
>>>> looking at the load on tg-grid, it is rather high:
>>>>
>>>> top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
>>>> Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
>>>>
>>>> And there appear to be a large number of processes owned by kubal:
>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>> 380
>>>>
>>>> I assume that Mike is using Swift to do the job submission. Is
>>>> there some throttling of the rate at which jobs are submitted to
>>>> the gatekeeper that could be done that would lighten this load
>>>> some? (Or has that already been done since earlier today?) The
>>>> current response times are not unacceptable, but I'm hoping to
>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>
>>>> Thanks,
>>>> joe.
>>>>
>>>>
>>>>
>>>> ===================================================
>>>> joseph a. insley                          insley at mcs.anl.gov
>>>> mathematics & computer science division   (630) 252-5649
>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>
>>>>
>>>
>>
>