[Swift-devel] Re: Swift jobs on UC/ANL TG
Ti Leggett
leggett at mcs.anl.gov
Tue Jan 29 08:11:06 CST 2008
The default walltime is 15 minutes. Are you doing fork jobs or PBS
jobs? You shouldn't be doing fork jobs at all. Mike W, I thought there
were throttles in place in Swift to prevent this type of overrun? Mike
K, I'll need you to either stop these types of jobs until Mike W can
verify the throttling, or submit only a few tens of jobs at a time.
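
To illustrate what "only a few tens of jobs at a time" means on the client side, here is a minimal sketch that caps the number of jobs in flight. It is not Swift's own throttle and does not use any Swift or Globus API: `run_job`, `run_throttled`, `PeakCounter` and the default limit of 20 are all hypothetical names and values chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, run_job, max_in_flight=20):
    """Run every job, but never more than max_in_flight at once.

    run_job is a hypothetical callable standing in for whatever submits
    one job and waits for it to finish (not a Swift or Globus API).
    Results are returned in the same order as jobs.
    """
    # The pool size is the throttle: a worker only picks up the next job
    # after it has finished its current one.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(run_job, jobs))


class PeakCounter:
    """Records the highest number of jobs running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False
```

Wrapping the body of `run_job` in `with counter:` (for some `counter = PeakCounter()`) lets you check afterwards that `counter.peak` never exceeded the limit you asked for.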
On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> Yes, I'm submitting molecular dynamics simulations
> using Swift.
>
> Is there a default wall-time limit for jobs on tg-uc?
>
>
>
> --- joseph insley <insley at mcs.anl.gov> wrote:
>
>> Actually, these numbers are now escalating...
>>
>> top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
>> Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
>>
>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>> 479
>>
>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>> GRAM Authentication test successful
>> real 0m26.134s
>> user 0m0.090s
>> sys 0m0.010s
>>
>>
>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>
>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>> became unresponsive and had to be rebooted. I am now seeing slow
>>> response times from the Gatekeeper there again. Authenticating to
>>> the gatekeeper should only take a second or two, but it is
>>> periodically taking up to 16 seconds:
>>>
>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>> GRAM Authentication test successful
>>> real 0m16.096s
>>> user 0m0.060s
>>> sys 0m0.020s
>>>
>>> looking at the load on tg-grid, it is rather high:
>>>
>>> top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
>>> Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
>>>
>>> And there appear to be a large number of processes owned by kubal:
>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>> 380
>>>
>>> I assume that Mike is using swift to do the job submission. Is
>>> there some throttling of the rate at which jobs are submitted to
>>> the gatekeeper that could be done that would lighten this load
>>> some? (Or has that already been done since earlier today?) The
>>> current response times are not unacceptable, but I'm hoping to
>>> avoid having the machine grind to a halt as it did earlier today.
>>>
>>> Thanks,
>>> joe.
>>>
>>>
>>>
>>> ===================================================
>>> joseph a. insley                         insley at mcs.anl.gov
>>> mathematics & computer science division  (630) 252-5649
>>> argonne national laboratory              (630) 252-5986 (fax)
>>>
>>>
>>
>> ===================================================
>> joseph a. insley                         insley at mcs.anl.gov
>> mathematics & computer science division  (630) 252-5649
>> argonne national laboratory              (630) 252-5986 (fax)
>>
>>
>>