[Swift-devel] Re: Swift jobs on UC/ANL TG

Ti Leggett leggett at mcs.anl.gov
Tue Jan 29 08:11:06 CST 2008


The default walltime is 15 minutes. Are you doing fork jobs or PBS
jobs? You shouldn't be doing fork jobs at all. Mike W, I thought there
were throttles in place in Swift to prevent this type of overrun? Mike
K, I'll need you to either stop these types of jobs until Mike W can
verify throttling, or submit only a few tens of jobs at a time.
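
For illustration, the "few tens of jobs at a time" limit can be sketched
outside Swift as a bounded worker pool. This is not Swift's own throttling
mechanism, only a minimal Python sketch of the idea; run_throttled,
max_in_flight, and fake_job are hypothetical names, and the job body is a
placeholder rather than a real GRAM submission.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, max_in_flight=20):
    """Run callables with at most max_in_flight active at once.

    Returns (results, peak), where peak is the highest number of jobs
    observed running at the same time.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(job):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return job()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the throttle: extra jobs wait in the queue
    # instead of all reaching the submission host at once.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        results = list(pool.map(tracked, jobs))
    return results, state["peak"]


if __name__ == "__main__":
    import time

    def fake_job(i):
        # Stand-in for a real submission (hypothetical; no GRAM call here).
        def run():
            time.sleep(0.01)
            return i
        return run

    results, peak = run_throttled([fake_job(i) for i in range(100)], 20)
    print(len(results), "jobs done, peak in flight:", peak)
```

With a cap of 20, the remaining jobs simply wait their turn, so the number
of processes on the submission host stays bounded no matter how many jobs
are queued.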

On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:

> Yes, I'm submitting molecular dynamics simulations
> using Swift.
>
> Is there a default wall-time limit for jobs on tg-uc?
>
>
>
> --- joseph insley <insley at mcs.anl.gov> wrote:
>
>> Actually, these numbers are now escalating...
>>
>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>
>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>     479
>>
>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>> GRAM Authentication test successful
>> real    0m26.134s
>> user    0m0.090s
>> sys     0m0.010s
>>
>>
>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>
>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>> became unresponsive and had to be rebooted.  I am now seeing slow
>>> response times from the Gatekeeper there again.  Authenticating to
>>> the gatekeeper should only take a second or two, but it is
>>> periodically taking up to 16 seconds:
>>>
>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>> GRAM Authentication test successful
>>> real    0m16.096s
>>> user    0m0.060s
>>> sys     0m0.020s
>>>
>>> looking at the load on tg-grid, it is rather high:
>>>
>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>
>>> And there appear to be a large number of processes owned by kubal:
>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>    380
>>>
>>> I assume that Mike is using swift to do the job submission.  Is
>>> there some throttling of the rate at which jobs are submitted to
>>> the gatekeeper that could be done that would lighten this load
>>> some?  (Or has that already been done since earlier today?)  The
>>> current response times are not unacceptable, but I'm hoping to
>>> avoid having the machine grind to a halt as it did earlier today.
>>>
>>> Thanks,
>>> joe.
>>>
>>>
>>>
>>> ===================================================
>>> joseph a. insley                          insley at mcs.anl.gov
>>> mathematics & computer science division   (630) 252-5649
>>> argonne national laboratory               (630) 252-5986 (fax)
>>>
>>>
>>
>
>
>
>



