[Swift-devel] Re: Swift jobs on UC/ANL TG

Mihael Hategan hategan at mcs.anl.gov
Mon Feb 4 09:30:54 CST 2008


That's odd. If that's not acceptable from your perspective, yet I thought
130 were fine, then there's clearly a disconnect between what you and I
consider acceptable.

What was it that prompted you to conclude things were bad?
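
For reference, the setting being discussed in the thread below lives in
Swift's swift.properties file. A minimal sketch, assuming the usual
key=value properties syntax; the comment is a paraphrase of this thread,
not of the Swift documentation:

  # Reduce the per-site job throttle so Swift submits fewer concurrent
  # jobs through pre-ws GRAM on tg-grid (value suggested in this thread).
  throttle.score.job.factor=1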

On Mon, 2008-02-04 at 07:16 -0600, Ti Leggett wrote:
> Around 80.
> 
> On Feb 4, 2008, at 12:14 AM, Mihael Hategan wrote:
> 
> >
> > On Sun, 2008-02-03 at 22:11 -0800, Mike Kubal wrote:
> >> Sorry for killing the server. I'm pushing to get
> >> results to guide the selection of compounds for
> >> wet-lab testing.
> >>
> >> I had set the throttle.score.job.factor to 1 in the
> >> swift.properties file.
> >
> > Hmm. Ti, at the time of the massacre, how many did you kill?
> >
> > Mihael
> >
> >>
> >> I certainly appreciate everyone's efforts and
> >> responsiveness.
> >>
> >> Let me know what to try next, before I kill again.
> >>
> >> Cheers,
> >>
> >> Mike
> >>
> >>
> >>
> >> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >>
> >>> So I was trying some stuff on Friday night. I guess I've found the
> >>> strategy on when to run the tests: when nobody else has jobs there
> >>> (besides Buzz doing gridftp tests, Ioan having some Falkon workers
> >>> running, and the occasional Inca tests).
> >>>
> >>> In any event, the machine jumps to about 100% utilization at around
> >>> 130 jobs with pre-ws gram. So Mike, please set
> >>> throttle.score.job.factor to 1 in swift.properties.
> >>>
> >>> There's still more work I need to do test-wise.
> >>>
> >>> On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
> >>>> Mike, You're killing tg-grid1 again. Can someone work with Mike to
> >>>> get some swift settings that don't kill our server?
> >>>>
> >>>> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> >>>>
> >>>>> Yes, I'm submitting molecular dynamics simulations using Swift.
> >>>>>
> >>>>> Is there a default wall-time limit for jobs on tg-uc?
> >>>>>
> >>>>>
> >>>>>
> >>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
> >>>>>
> >>>>>> Actually, these numbers are now escalating...
> >>>>>>
> >>>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
> >>>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
> >>>>>>
> >>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>    479
> >>>>>>
> >>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>> GRAM Authentication test successful
> >>>>>> real    0m26.134s
> >>>>>> user    0m0.090s
> >>>>>> sys     0m0.010s
> >>>>>>
> >>>>>>
> >>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> >>>>>>
> >>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> >>>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
> >>>>>>> response times from the Gatekeeper there again.  Authenticating to
> >>>>>>> the gatekeeper should only take a second or two, but it is
> >>>>>>> periodically taking up to 16 seconds:
> >>>>>>>
> >>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>>> GRAM Authentication test successful
> >>>>>>> real    0m16.096s
> >>>>>>> user    0m0.060s
> >>>>>>> sys     0m0.020s
> >>>>>>>
> >>>>>>> looking at the load on tg-grid, it is rather high:
> >>>>>>>
> >>>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
> >>>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
> >>>>>>>
> >>>>>>> And there appear to be a large number of processes owned by kubal:
> >>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>>   380
> >>>>>>>
> >>>>>>> I assume that Mike is using swift to do the job submission.  Is
> >>>>>>> there some throttling of the rate at which jobs are submitted to
> >>>>>>> the gatekeeper that could be done that would lighten this load
> >>>>>>> some?  (Or has that already been done since earlier today?)  The
> >>>>>>> current response times are not unacceptable, but I'm hoping to
> >>>>>>> avoid having the machine grind to a halt as it did earlier today.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> joe.
> >>>>>>>
> >>>>>>> ===================================================
> >>>>>>> joseph a. insley                          insley at mcs.anl.gov
> >>>>>>> mathematics & computer science division   (630) 252-5649
> >>>>>>> argonne national laboratory               (630) 252-5986 (fax)
> >>>>>>
> >>>>>> ===================================================
> >>>>>> joseph a. insley                          insley at mcs.anl.gov
> >>>>>> mathematics & computer science division   (630) 252-5649
> >>>>>> argonne national laboratory               (630) 252-5986 (fax)
> >>>>>>
> >
> 



