[Swift-devel] Re: Swift jobs on UC/ANL TG

Ian Foster foster at mcs.anl.gov
Sun Feb 3 21:12:08 CST 2008


Mihael:

Is there any chance you can try GRAM4, as was requested early last week?

Ian.

Mihael Hategan wrote:
> So I was trying some stuff on Friday night. I think I've found a
> strategy for when to run the tests: when nobody else has jobs there
> (besides Buzz doing GridFTP tests, Ioan having some Falkon workers
> running, and the occasional Inca tests).
>
> In any event, the machine jumps to about 100% utilization at around 130
> jobs with pre-WS GRAM. So Mike, please set throttle.score.job.factor to
> 1 in swift.properties.
>
> There's still more work I need to do test-wise.
>
> On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
>
>> Mike, you're killing tg-grid1 again. Can someone work with Mike to get
>> some Swift settings that don't kill our server?
>>
>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
>>
>>     
>>> Yes, I'm submitting molecular dynamics simulations
>>> using Swift.
>>>
>>> Is there a default wall-time limit for jobs on tg-uc?
>>>
>>>
>>>
>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>
>>>
>>>> Actually, these numbers are now escalating...
>>>>
>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
>>>>
>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>     479
>>>>
>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>> GRAM Authentication test successful
>>>> real    0m26.134s
>>>> user    0m0.090s
>>>> sys     0m0.010s
>>>>
>>>>
>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
>>>>
>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
>>>>> response times from the Gatekeeper there again.  Authenticating to
>>>>> the gatekeeper should only take a second or two, but it is
>>>>> periodically taking up to 16 seconds:
>>>>>
>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
>>>>> GRAM Authentication test successful
>>>>> real    0m16.096s
>>>>> user    0m0.060s
>>>>> sys     0m0.020s
>>>>>
>>>>> looking at the load on tg-grid, it is rather high:
>>>>>
>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
>>>>>
>>>>> And there appear to be a large number of processes owned by kubal:
>>>>>
>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
>>>>>    380
>>>>>
>>>>> I assume that Mike is using swift to do the job submission.  Is
>>>>> there some throttling of the rate at which jobs are submitted to
>>>>> the gatekeeper that could be done that would lighten this load
>>>>> some?  (Or has that already been done since earlier today?)  The
>>>>> current response times are not unacceptable, but I'm hoping to
>>>>> avoid having the machine grind to a halt as it did earlier today.
>>>>>
>>>>> Thanks,
>>>>> joe.
>>>>>
>>>>> ===================================================
>>>>> joseph a. insley
>>>>> insley at mcs.anl.gov
>>>>> mathematics & computer science division       (630) 252-5649
>>>>> argonne national laboratory                   (630) 252-5986 (fax)
>>>
>>>       
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>     
>