[Swift-devel] Re: Swift jobs on UC/ANL TG
Mihael Hategan
hategan at mcs.anl.gov
Sun Feb 3 21:53:51 CST 2008
If you want to prioritize things differently, then please do so from the
beginning instead of pointing out the priorities were wrong after a
while. So please stop doing this. It is frustrating and it is not what I
signed up for.
Mihael
On Sun, 2008-02-03 at 21:23 -0600, Ian Foster wrote:
> Mihael:
>
> The motivation for doing the tests is so that we can provide
> appropriate advice to Mike, our super-high-priority Swift user who we
> want to help as much and as quickly as possible. I'm concerned that we
> don't seem to feel any sense of urgency in doing this. I'd like to
> emphasize that the sole reason for anyone funding work on Swift is
> because they believe us when we say that Swift can help people make
> more effective use of high-performance computing systems (parallel and
> grid). Mike K. is our most engaged and committed user, and if he is
> successful, will bring us fame and fortune (and fun, I think, to
> provide three Fs!). It shouldn't take a week for us to get back to him
> with information on how to run his application efficiently on TG.
>
> Ian.
>
> Mihael Hategan wrote:
> > On Sun, 2008-02-03 at 21:12 -0600, Ian Foster wrote:
> >
> > > Mihael:
> > >
> > > Is there any chance you can try GRAM4, as was requested early last
> > > week?
> > >
> >
> > For the tests, sure. That's a big part of why I'm doing them.
> >
> > If we're talking about the workflow that seems to be repeatedly killing
> > tg-grid1, then Mike Kubal would be the right person to ask.
> >
> >
> > > Ian.
> > >
> > > Mihael Hategan wrote:
> > >
> > > > So I was trying some stuff on Friday night. I guess I've found the
> > > > strategy on when to run the tests: when nobody else has jobs there
> > > > (besides Buzz doing gridftp tests, Ioan having some Falkon workers
> > > > running, and the occasional Inca tests).
> > > >
> > > > In any event, the machine jumps to about 100% utilization at around 130
> > > > jobs with pre-ws gram. So Mike, please set throttle.score.job.factor to
> > > > 1 in swift.properties.
> > > >
> > > > There's still more work I need to do test-wise.
> > > >
> > > > On Sun, 2008-02-03 at 15:34 -0600, Ti Leggett wrote:
> > > >
> > > >
> > > > > Mike, you're killing tg-grid1 again. Can someone work with Mike to get
> > > > > some Swift settings that don't kill our server?
> > > > >
> > > > > On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> > > > >
> > > > >
> > > > >
> > > > > > Yes, I'm submitting molecular dynamics simulations
> > > > > > using Swift.
> > > > > >
> > > > > > Is there a default wall-time limit for jobs on tg-uc?
> > > > > >
> > > > > >
> > > > > >
> > > > > > --- joseph insley <insley at mcs.anl.gov> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Actually, these numbers are now escalating...
> > > > > > >
> > > > > > > top - 17:18:54 up 2:29, 1 user, load average: 149.02, 123.63, 91.94
> > > > > > > Tasks: 469 total, 4 running, 465 sleeping, 0 stopped, 0 zombie
> > > > > > >
> > > > > > > insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > > > > > 479
> > > > > > >
> > > > > > > insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > > > > > GRAM Authentication test successful
> > > > > > > real 0m26.134s
> > > > > > > user 0m0.090s
> > > > > > > sys 0m0.010s
> > > > > > >
> > > > > > >
> > > > > > > On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> > > > > > > > became unresponsive and had to be rebooted. I am now seeing slow
> > > > > > > > response times from the Gatekeeper there again. Authenticating to
> > > > > > > > the gatekeeper should only take a second or two, but it is
> > > > > > > > periodically taking up to 16 seconds:
> > > > > > > >
> > > > > > > > insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > > > > > > GRAM Authentication test successful
> > > > > > > > real 0m16.096s
> > > > > > > > user 0m0.060s
> > > > > > > > sys 0m0.020s
> > > > > > > >
> > > > > > > > looking at the load on tg-grid, it is rather high:
> > > > > > > >
> > > > > > > > top - 16:55:26 up 2:06, 1 user, load average: 89.59, 78.69, 62.92
> > > > > > > > Tasks: 398 total, 20 running, 378 sleeping, 0 stopped, 0 zombie
> > > > > > > >
> > > > > > > > And there appear to be a large number of processes owned by kubal:
> > > > > > > >
> > > > > > > > insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > > > > > > 380
> > > > > > > >
> > > > > > > > I assume that Mike is using swift to do the job submission. Is
> > > > > > > > there some throttling of the rate at which jobs are submitted to
> > > > > > > > the gatekeeper that could be done that would lighten this load
> > > > > > > > some? (Or has that already been done since earlier today?) The
> > > > > > > > current response times are not unacceptable, but I'm hoping to
> > > > > > > > avoid having the machine grind to a halt as it did earlier today.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > joe.
> > > > > > > >
> > > > > > > > ===================================================
> > > > > > > > joseph a. insley                          insley at mcs.anl.gov
> > > > > > > > mathematics & computer science division   (630) 252-5649
> > > > > > > > argonne national laboratory               (630) 252-5986 (fax)
> > > > > > > >
> > > > > > > ===================================================
> > > > > > > joseph a. insley                          insley at mcs.anl.gov
> > > > > > > mathematics & computer science division   (630) 252-5649
> > > > > > > argonne national laboratory               (630) 252-5986 (fax)
> > > > > > >
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > >
> > > > >
> > > > >
> >
> >