[Swift-devel] Support request: Swift jobs flooding uc-teragrid?
Michael Wilde
wilde at mcs.anl.gov
Tue Jan 29 20:02:40 CST 2008
MikeK, no attachment.
I've narrowed the cc list, and need to read back through the email thread
on this to see what Mihael observed.
- MikeW
On 1/29/08 8:00 PM, Mike Kubal wrote:
> The attachment contains the swift script, tc file,
> sites file and swift.properties file.
>
> I didn't provide any additional command line
> arguments.
>
> MikeK
>
>
> --- Michael Wilde <wilde at mcs.anl.gov> wrote:
>
>> [ was Re: Swift jobs on UC/ANL TG ]
>>
>> Hi. I'm at O'Hare and will be flying soon.
>> Ben or Mihael, if you are online, can you
>> investigate?
>>
>> Yes, there are significant throttles turned on by default, and the
>> system opens those very gradually.
>>
>> MikeK, can you post to the swift-devel list your swift.properties file,
>> command line options, and your swift source code?
>>
>> Thanks,
>>
>> MikeW
>>
>>
>> On 1/29/08 8:11 AM, Ti Leggett wrote:
>>> The default walltime is 15 minutes. Are you doing fork jobs or pbs jobs?
>>> You shouldn't be doing fork jobs at all. Mike W, I thought there were
>>> throttles in place in Swift to prevent this type of overrun? Mike K,
>>> I'll need you to either stop these types of jobs until Mike W can verify
>>> throttling or only submit a few 10s of jobs at a time.
>>> On Jan 28, 2008, at 01/28/08 07:13 PM, Mike Kubal wrote:
>>>> Yes, I'm submitting molecular dynamics simulations
>>>> using Swift.
>>>>
>>>> Is there a default wall-time limit for jobs on tg-uc?
>>>>
>>>>
>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>>
>>>>> Actually, these numbers are now escalating...
>>>>>
>>>>> top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
>>>>> Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
>>>>>
>>>>> insley@tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>> 479
>>>>>
>>>>> insley@tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>> GRAM Authentication test successful
>>>>> real 0m26.134s
>>>>> user 0m0.090s
>>>>> sys 0m0.010s
>>>>>
>>>>>
>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>>> became unresponsive and had to be rebooted. I am now seeing slow
>>>>>> response times from the Gatekeeper there again. Authenticating to
>>>>>> the gatekeeper should only take a second or two, but it is
>>>>>> periodically taking up to 16 seconds:
>>>>>>
>>>>>> insley@tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>> GRAM Authentication test successful
>>>>>> real 0m16.096s
>>>>>> user 0m0.060s
>>>>>> sys 0m0.020s
>>>>>>
>>>>>> looking at the load on tg-grid, it is rather high:
>>>>>> top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
>>>>>> Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
>>>>>> And there appear to be a large number of processes owned by kubal:
>>>>>> insley@tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>> 380
>>>>>>
>>>>>> I assume that Mike is using swift to do the job submission. Is
>>>>>> there some throttling of the rate at which jobs are submitted to
>>>>>> the gatekeeper that could be done that would lighten this load
>>>>>> some? (Or has that already been done since earlier today?) The
>>>>>> current response times are not unacceptable, but I'm hoping to
>>>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>>> Thanks,
>>>>>> joe.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ===================================================
>>>>>> joseph a. insley                          insley at mcs.anl.gov
>>>>>> mathematics & computer science division   (630) 252-5649
>>>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>>>
>>>>>
>>>>> ===================================================
>>>>> joseph a. insley                          insley at mcs.anl.gov
>>>>> mathematics & computer science division   (630) 252-5649
>>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
>