[Swift-devel] Re: Swift jobs on UC/ANL TG
Ti Leggett
leggett at mcs.anl.gov
Mon Feb 4 09:58:40 CST 2008
The Inca tests were timing out after 5 minutes and the load on the
machine was ~27. How are you determining when things aren't acceptable?
On Feb 4, 2008, at 9:30 AM, Mihael Hategan wrote:
> That's odd. Clearly if that's not acceptable from your perspective, yet
> I thought 130 are fine, there's a disconnect between what you think is
> acceptable and what I think is acceptable.
>
> What was it that prompted you to conclude things are bad?
>
> On Mon, 2008-02-04 at 07:16 -0600, Ti Leggett wrote:
>> Around 80.
>>
>> On Feb 4, 2008, at 12:14 AM, Mihael Hategan wrote:
>>
>>>
>>> On Sun, 2008-02-03 at 22:11 -0800, Mike Kubal wrote:
>>>> Sorry for killing the server. I'm pushing to get
>>>> results to guide the selection of compounds for
>>>> wet-lab testing.
>>>>
>>>> I had set the throttle.score.job.factor to 1 in the
>>>> swift.properties file.
>>>
>>> Hmm. Ti, at the time of the massacre, how many did you kill?
>>>
>>> Mihael
>>>
>>>>
>>>> I certainly appreciate everyone's efforts and
>>>> responsiveness.
>>>>
>>>> Let me know what to try next, before I kill again.
>>>>
>>>> Cheers,
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>>>
>>>>> So I was trying some stuff on Friday night. I guess I've found the
>>>>> strategy on when to run the tests: when nobody else has jobs there
>>>>> (besides Buzz doing gridftp tests, Ioan having some Falkon workers
>>>>> running, and the occasional Inca tests).
>>>>>
>>>>> In any event, the machine jumps to about 100% utilization at around 130
>>>>> jobs with pre-ws gram. So Mike, please set throttle.score.job.factor to
>>>>> 1 in swift.properties.
>>>>>
>>>>> There's still more work I need to do test-wise.
>>>>>
>>>>> On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
>>>>>> Mike, You're killing tg-grid1 again. Can someone work with Mike to get
>>>>>> some swift settings that don't kill our server?
>>>>>>
>>>>>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>>>>>>
>>>>>>> Yes, I'm submitting molecular dynamics simulations
>>>>>>> using Swift.
>>>>>>>
>>>>>>> Is there a default wall-time limit for jobs on tg-uc?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> Actually, these numbers are now escalating...
>>>>>>>>
>>>>>>>> top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
>>>>>>>> Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
>>>>>>>>
>>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>>>> 479
>>>>>>>>
>>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>>>> GRAM Authentication test successful
>>>>>>>> real 0m26.134s
>>>>>>>> user 0m0.090s
>>>>>>>> sys 0m0.010s
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>>>>>
>>>>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>>>>>> became unresponsive and had to be rebooted. I am now seeing slow
>>>>>>>>> response times from the Gatekeeper there again. Authenticating to
>>>>>>>>> the gatekeeper should only take a second or two, but it is
>>>>>>>>> periodically taking up to 16 seconds:
>>>>>>>>>
>>>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>>>>> GRAM Authentication test successful
>>>>>>>>> real 0m16.096s
>>>>>>>>> user 0m0.060s
>>>>>>>>> sys 0m0.020s
>>>>>>>>>
>>>>>>>>> looking at the load on tg-grid, it is rather high:
>>>>>>>>>
>>>>>>>>> top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
>>>>>>>>> Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
>>>>>>>>>
>>>>>>>>> And there appear to be a large number of processes owned by kubal:
>>>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>>>>> 380
>>>>>>>>>
>>>>>>>>> I assume that Mike is using swift to do the job submission. Is
>>>>>>>>> there some throttling of the rate at which jobs are submitted to
>>>>>>>>> the gatekeeper that could be done that would lighten this load
>>>>>>>>> some? (Or has that already been done since earlier today?) The
>>>>>>>>> current response times are not unacceptable, but I'm hoping to
>>>>>>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> joe.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ===================================================
>>>>>>>>> joseph a. insley                          insley at mcs.anl.gov
>>>>>>>>> mathematics & computer science division   (630) 252-5649
>>>>>>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ===================================================
>>>>>>>> joseph a. insley                          insley at mcs.anl.gov
>>>>>>>> mathematics & computer science division   (630) 252-5649
>>>>>>>> argonne national laboratory               (630) 252-5986 (fax)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
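
For reference, the throttle setting discussed in this thread would look
roughly like the following in swift.properties. This is only a sketch: the
property name and the value 1 come from the messages above, the comments
are explanatory, and no other settings are implied.

```properties
# swift.properties (fragment)
# Value Mihael asked Mike to use in this thread, with the aim of keeping
# the load on tg-grid1 down. Whether 1 is low enough is still an open
# question here.
throttle.score.job.factor=1
```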