[Swift-devel] Re: Swift jobs on UC/ANL TG

Mihael Hategan hategan at mcs.anl.gov
Mon Feb 4 10:18:36 CST 2008


On Mon, 2008-02-04 at 09:58 -0600, Ti Leggett wrote:
> The inca tests were timing out after 5 minutes and the load on the
> machine was ~27. How are you deciding when things aren't acceptable?

It's got 2 cpus. So to me an average load of under 100 and the SSH
session being responsive looks fine.

The fact that the inca tests are timing out may be because inca has too
low a tolerance for things.
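
To put numbers on the comparison, here is a rough sketch in Python that
divides a load average by the CPU count. The function names are made up
for illustration and nothing here was run on the TG hosts; the 2-CPU
count and the load of ~27 are the figures quoted in this thread.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def current_load_per_cpu():
    """Per-CPU 1-minute load of the local machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

With the figures above, load_per_cpu(27.0, 2) comes to 13.5 runnable
processes per CPU. Whether that counts as acceptable is exactly the
judgment call being argued here; the sketch only does the arithmetic.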

> 
> On Feb 4, 2008, at 9:30 AM, Mihael Hategan wrote:
> 
> > That's odd. Clearly, if that's not acceptable from your perspective,
> > yet I thought 130 was fine, there's a disconnect between what you think
> > is acceptable and what I think is acceptable.
> >
> > What was it that prompted you to conclude things are bad?
> >
> > On Mon, 2008-02-04 at 07:16 -0600, Ti Leggett wrote:
> >> Around 80.
> >>
> >> On Feb 4, 2008, at 12:14 AM, Mihael Hategan wrote:
> >>
> >>>
> >>> On Sun, 2008-02-03 at 22:11 -0800, Mike Kubal wrote:
> >>>> Sorry for killing the server. I'm pushing to get
> >>>> results to guide the selection of compounds for
> >>>> wet-lab testing.
> >>>>
> >>>> I had set the throttle.score.job.factor to 1 in the
> >>>> swift.properties file.
> >>>
> >>> Hmm. Ti, at the time of the massacre, how many did you kill?
> >>>
> >>> Mihael
> >>>
> >>>>
> >>>> I certainly appreciate everyone's efforts and
> >>>> responsiveness.
> >>>>
> >>>> Let me know what to try next, before I kill again.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Mike
> >>>>
> >>>>
> >>>>
> >>>> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >>>>
> >>>>> So I was trying some stuff on Friday night. I guess I've found the
> >>>>> strategy on when to run the tests: when nobody else has jobs there
> >>>>> (besides Buzz doing gridftp tests, Ioan having some Falkon workers
> >>>>> running, and the occasional Inca tests).
> >>>>>
> >>>>> In any event, the machine jumps to about 100% utilization at around 130
> >>>>> jobs with pre-ws gram. So Mike, please set throttle.score.job.factor to
> >>>>> 1 in swift.properties.
> >>>>>
> >>>>> There's still more work I need to do test-wise.
> >>>>>
> >>>>> On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
> >>>>>> Mike, You're killing tg-grid1 again. Can someone work with Mike to get
> >>>>>> some swift settings that don't kill our server?
> >>>>>>
> >>>>>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> >>>>>>
> >>>>>>> Yes, I'm submitting molecular dynamics simulations
> >>>>>>> using Swift.
> >>>>>>>
> >>>>>>> Is there a default wall-time limit for jobs on tg-uc?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
> >>>>>>>
> >>>>>>>> Actually, these numbers are now escalating...
> >>>>>>>>
> >>>>>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
> >>>>>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
> >>>>>>>>
> >>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>>>   479
> >>>>>>>>
> >>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>>>> GRAM Authentication test successful
> >>>>>>>> real    0m26.134s
> >>>>>>>> user    0m0.090s
> >>>>>>>> sys     0m0.010s
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> >>>>>>>>
> >>>>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> >>>>>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
> >>>>>>>>> response times from the Gatekeeper there again.  Authenticating to
> >>>>>>>>> the gatekeeper should only take a second or two, but it is
> >>>>>>>>> periodically taking up to 16 seconds:
> >>>>>>>>>
> >>>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>>>>> GRAM Authentication test successful
> >>>>>>>>> real    0m16.096s
> >>>>>>>>> user    0m0.060s
> >>>>>>>>> sys     0m0.020s
> >>>>>>>>>
> >>>>>>>>> looking at the load on tg-grid, it is rather high:
> >>>>>>>>>
> >>>>>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
> >>>>>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
> >>>>>>>>>
> >>>>>>>>> And there appear to be a large number of processes owned by kubal:
> >>>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>>>>  380
> >>>>>>>>>
> >>>>>>>>> I assume that Mike is using swift to do the job submission.  Is
> >>>>>>>>> there some throttling of the rate at which jobs are submitted to
> >>>>>>>>> the gatekeeper that could be done that would lighten this load
> >>>>>>>>> some?  (Or has that already been done since earlier today?)  The
> >>>>>>>>> current response times are not unacceptable, but I'm hoping to
> >>>>>>>>> avoid having the machine grind to a halt as it did earlier today.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> joe.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ===================================================
> >>>>>>>>> joseph a. insley                          insley at mcs.anl.gov
> >>>>>>>>> mathematics & computer science division   (630) 252-5649
> >>>>>>>>> argonne national laboratory               (630) 252-5986 (fax)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> ===================================================
> >>>>>>>> joseph a. insley                          insley at mcs.anl.gov
> >>>>>>>> mathematics & computer science division   (630) 252-5649
> >>>>>>>> argonne national laboratory               (630) 252-5986 (fax)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Swift-devel mailing list
> >>>>>> Swift-devel at ci.uchicago.edu
> >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> 



