[Swift-devel] Re: Swift jobs on UC/ANL TG

Ti Leggett leggett at mcs.anl.gov
Mon Feb 4 07:16:38 CST 2008


Around 80.

On Feb 4, 2008, at 12:14 AM, Mihael Hategan wrote:

>
> On Sun, 2008-02-03 at 22:11 -0800, Mike Kubal wrote:
>> Sorry for killing the server. I'm pushing to get
>> results to guide the selection of compounds for
>> wet-lab testing.
>>
>> I had set the throttle.score.job.factor to 1 in the
>> swift.properties file.
>
> Hmm. Ti, at the time of the massacre, how many did you kill?
>
> Mihael
>
>>
>> I certainly appreciate everyone's efforts and
>> responsiveness.
>>
>> Let me know what to try next, before I kill again.
>>
>> Cheers,
>>
>> Mike
>>
>>
>>
>> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>
>>> So I was trying some stuff on Friday night. I guess I've found the
>>> strategy on when to run the tests: when nobody else has jobs there
>>> (besides Buzz doing gridftp tests, Ioan having some Falkon workers
>>> running, and the occasional Inca tests).
>>>
>>> In any event, the machine jumps to about 100% utilization at around 130
>>> jobs with pre-ws gram. So Mike, please set throttle.score.job.factor to
>>> 1 in swift.properties.
>>>
>>> There's still more work I need to do test-wise.
>>>
>>> On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
>>>> Mike, You're killing tg-grid1 again. Can someone work with Mike to get
>>>> some swift settings that don't kill our server?
>>>>
>>>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>>>>
>>>>> Yes, I'm submitting molecular dynamics simulations
>>>>> using Swift.
>>>>>
>>>>> Is there a default wall-time limit for jobs on tg-uc?
>>>>>
>>>>>
>>>>>
>>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>>>
>>>>>> Actually, these numbers are now escalating...
>>>>>>
>>>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>>>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>>>>>
>>>>>> insley@tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>>    479
>>>>>>
>>>>>> insley@tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>> GRAM Authentication test successful
>>>>>> real    0m26.134s
>>>>>> user    0m0.090s
>>>>>> sys     0m0.010s
>>>>>>
>>>>>>
>>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>>>
>>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
>>>>>>> response times from the Gatekeeper there again.  Authenticating to
>>>>>>> the gatekeeper should only take a second or two, but it is
>>>>>>> periodically taking up to 16 seconds:
>>>>>>>
>>>>>>> insley@tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>>>> GRAM Authentication test successful
>>>>>>> real    0m16.096s
>>>>>>> user    0m0.060s
>>>>>>> sys     0m0.020s
>>>>>>>
>>>>>>> looking at the load on tg-grid, it is rather high:
>>>>>>>
>>>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>>>>>
>>>>>>> And there appear to be a large number of processes owned by kubal:
>>>>>>> insley@tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>>>   380
>>>>>>>
>>>>>>> I assume that Mike is using swift to do the job submission.  Is
>>>>>>> there some throttling of the rate at which jobs are submitted to
>>>>>>> the gatekeeper that could be done that would lighten this load
>>>>>>> some?  (Or has that already been done since earlier today?)  The
>>>>>>> current response times are not unacceptable, but I'm hoping to
>>>>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> joe.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ===================================================
>>>>>>> joseph a. insley                         insley at mcs.anl.gov
>>>>>>> mathematics & computer science division  (630) 252-5649
>>>>>>> argonne national laboratory              (630) 252-5986 (fax)
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ===================================================
>>>>>> joseph a. insley                         insley at mcs.anl.gov
>>>>>> mathematics & computer science division  (630) 252-5649
>>>>>> argonne national laboratory              (630) 252-5986 (fax)
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>
>>
>



