[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Michael Wilde wilde at mcs.anl.gov
Tue Jan 29 20:02:40 CST 2008


MikeK, no attachment.

I've narrowed the cc list, and need to read back through the email thread
on this to see what Mihael observed.

- MikeW

On 1/29/08 8:00 PM, Mike Kubal wrote:
> The attachment contains the swift script, tc file,
> sites file and swift.properties file.
> 
> I didn't provide any additional command line
> arguments.
> 
> MikeK
> 
> 
> --- Michael Wilde <wilde at mcs.anl.gov> wrote:
> 
>> [ was Re: Swift jobs on UC/ANL TG ]
>>
>> Hi. I'm at O'Hare and will be flying soon. Ben or Mihael, if you are
>> online, can you investigate?
>>
>> Yes, there are significant throttles turned on by default, and the
>> system opens those very gradually.
>>
>> MikeK, can you post to the swift-devel list your swift.properties file,
>> command line options, and your swift source code?
>>
>> Thanks,
>>
>> MikeW
>>
>>
>> On 1/29/08 8:11 AM, Ti Leggett wrote:
>>> The default walltime is 15 minutes. Are you doing fork jobs or PBS jobs?
>>> You shouldn't be doing fork jobs at all. Mike W, I thought there were
>>> throttles in place in Swift to prevent this type of overrun? Mike K,
>>> I'll need you to either stop these types of jobs until Mike W can verify
>>> throttling, or only submit a few tens of jobs at a time.
>>> On Jan 28, 2008, at 01/28/08 07:13 PM, Mike Kubal wrote:
>>>> Yes, I'm submitting molecular dynamics simulations using Swift.
>>>>
>>>> Is there a default wall-time limit for jobs on tg-uc?
>>>>
>>>>
>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>>
>>>>> Actually, these numbers are now escalating...
>>>>>
>>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>>>>
>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>     479
>>>>>
>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>> GRAM Authentication test successful
>>>>> real    0m26.134s
>>>>> user    0m0.090s
>>>>> sys     0m0.010s
>>>>>
>>>>>
>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>>> became unresponsive and had to be rebooted. I am now seeing slow
>>>>>> response times from the gatekeeper there again. Authenticating to
>>>>>> the gatekeeper should only take a second or two, but it is
>>>>>> periodically taking up to 16 seconds:
>>>>>>
>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>> GRAM Authentication test successful
>>>>>> real    0m16.096s
>>>>>> user    0m0.060s
>>>>>> sys     0m0.020s
>>>>>>
>>>>>> Looking at the load on tg-grid, it is rather high:
>>>>>>
>>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>>>>
>>>>>> And there appear to be a large number of processes owned by kubal:
>>>>>>
>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>>    380
>>>>>>
>>>>>> I assume that Mike is using Swift to do the job submission. Is
>>>>>> there some throttling of the rate at which jobs are submitted to
>>>>>> the gatekeeper that could be done to lighten this load some? (Or
>>>>>> has that already been done since earlier today?) The current
>>>>>> response times are not unacceptable, but I'm hoping to avoid
>>>>>> having the machine grind to a halt as it did earlier today.
>>>>>> Thanks,
>>>>>> joe.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ===================================================
>>>>>> joseph a. insley                        insley at mcs.anl.gov
>>>>>> mathematics & computer science division (630) 252-5649
>>>>>> argonne national laboratory             (630) 252-5986 (fax)
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>


