[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Ian Foster foster at mcs.anl.gov
Tue Jan 29 13:15:51 CST 2008


Hi,

I've CCed Stuart Martin--I'd greatly appreciate some insights into what 
is causing this. I assume that you are using GRAM4 (aka WS-GRAM)?

Ian.

Michael Wilde wrote:
> [ was Re: Swift jobs on UC/ANL TG ]
>
> Hi. I'm at O'Hare and will be flying soon.
> Ben or Mihael, if you are online, can you investigate?
>
> Yes, there are significant throttles turned on by default, and the 
> system opens those very gradually.
>
> MikeK, can you post to the swift-devel list your swift.properties 
> file, command line options, and your swift source code?
>
> Thanks,
>
> MikeW
>
>
> On 1/29/08 8:11 AM, Ti Leggett wrote:
>> The default walltime is 15 minutes. Are you doing fork jobs or pbs 
>> jobs? You shouldn't be doing fork jobs at all. Mike W, I thought 
>> there were throttles in place in Swift to prevent this type of 
>> overrun? Mike K, I'll need you to either stop these types of jobs 
>> until Mike W can verify throttling or only submit a few 10s of jobs 
>> at a time.
>>
>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>>
>>> Yes, I'm submitting molecular dynamics simulations
>>> using Swift.
>>>
>>> Is there a default wall-time limit for jobs on tg-uc?
>>>
>>>
>>>
>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>
>>>> Actually, these numbers are now escalating...
>>>>
>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>>>
>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>     479
>>>>
>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>> GRAM Authentication test successful
>>>> real    0m26.134s
>>>> user    0m0.090s
>>>> sys     0m0.010s
>>>>
>>>>
>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>
>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
>>>>> response times from the Gatekeeper there again.  Authenticating to
>>>>> the gatekeeper should only take a second or two, but it is
>>>>> periodically taking up to 16 seconds:
>>>>>
>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>> GRAM Authentication test successful
>>>>> real    0m16.096s
>>>>> user    0m0.060s
>>>>> sys     0m0.020s
>>>>>
>>>>> looking at the load on tg-grid, it is rather high:
>>>>>
>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>>>
>>>>> And there appear to be a large number of processes owned by kubal:
>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>    380
>>>>>
>>>>> I assume that Mike is using Swift to do the job submission.  Is
>>>>> there some throttling of the rate at which jobs are submitted to
>>>>> the gatekeeper that could be done that would lighten this load
>>>>> some?  (Or has that already been done since earlier today?)  The
>>>>> current response times are not unacceptable, but I'm hoping to
>>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>>
>>>>> Thanks,
>>>>> joe.
>>>>>
>>>>>
>>>>>
>>>>> ===================================================
>>>>> joseph a. insley                          insley at mcs.anl.gov
>>>>> mathematics & computer science division   (630) 252-5649
>>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>>
>>>>>
>>>>
>>>> ===================================================
>>>> joseph a. insley                          insley at mcs.anl.gov
>>>> mathematics & computer science division   (630) 252-5649
>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


