[Swift-devel] Re: Swift jobs on UC/ANL TG
Mihael Hategan
hategan at mcs.anl.gov
Mon Feb 4 00:14:09 CST 2008
On Sun, 2008-02-03 at 22:11 -0800, Mike Kubal wrote:
> Sorry for killing the server. I'm pushing to get
> results to guide the selection of compounds for
> wet-lab testing.
>
> I had set the throttle.score.job.factor to 1 in the
> swift.properties file.
Hmm. Ti, at the time of the massacre, how many did you kill?
Mihael
>
> I certainly appreciate everyone's efforts and
> responsiveness.
>
> Let me know what to try next, before I kill again.
>
> Cheers,
>
> Mike
>
>
>
> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> > So I was trying some stuff on Friday night. I guess I've found the
> > strategy on when to run the tests: when nobody else has jobs there
> > (besides Buzz doing gridftp tests, Ioan having some Falkon workers
> > running, and the occasional Inca tests).
> >
> > In any event, the machine jumps to about 100% utilization at around 130
> > jobs with pre-ws gram. So Mike, please set throttle.score.job.factor to
> > 1 in swift.properties.
> >
> > There's still more work I need to do test-wise.
> >
> > On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
> > > Mike, you're killing tg-grid1 again. Can someone work with Mike to get
> > > some swift settings that don't kill our server?
> > >
> > > On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> > >
> > > > Yes, I'm submitting molecular dynamics simulations
> > > > using Swift.
> > > >
> > > > Is there a default wall-time limit for jobs on tg-uc?
> > > >
> > > >
> > > >
> > > > --- joseph insley <insley at mcs.anl.gov> wrote:
> > > >
> > > >> Actually, these numbers are now escalating...
> > > >>
> > > >> top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
> > > >> Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
> > > >>
> > > >> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >> 479
> > > >>
> > > >> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >> GRAM Authentication test successful
> > > >> real 0m26.134s
> > > >> user 0m0.090s
> > > >> sys 0m0.010s
> > > >>
> > > >>
> > > >> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> > > >>
> > > >>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> > > >>> became unresponsive and had to be rebooted. I am now seeing slow
> > > >>> response times from the Gatekeeper there again. Authenticating to
> > > >>> the gatekeeper should only take a second or two, but it is
> > > >>> periodically taking up to 16 seconds:
> > > >>>
> > > >>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >>> GRAM Authentication test successful
> > > >>> real 0m16.096s
> > > >>> user 0m0.060s
> > > >>> sys 0m0.020s
> > > >>>
> > > >>> looking at the load on tg-grid, it is rather high:
> > > >>>
> > > >>> top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
> > > >>> Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
> > > >>>
> > > >>> And there appear to be a large number of processes owned by kubal:
> > > >>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >>> 380
> > > >>>
> > > >>> I assume that Mike is using swift to do the job submission. Is
> > > >>> there some throttling of the rate at which jobs are submitted to
> > > >>> the gatekeeper that could be done that would lighten this load
> > > >>> some? (Or has that already been done since earlier today?) The
> > > >>> current response times are not unacceptable, but I'm hoping to
> > > >>> avoid having the machine grind to a halt as it did earlier today.
> > > >>>
> > > >>> Thanks,
> > > >>> joe.
> > > >>>
> > > >>>
> > > >>>
> > > >>> ===================================================
> > > >>> joseph a. insley                          insley at mcs.anl.gov
> > > >>> mathematics & computer science division   (630) 252-5649
> > > >>> argonne national laboratory               (630) 252-5986 (fax)
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >
> >
> >
> >
>
>
>
>