[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Mihael Hategan hategan at mcs.anl.gov
Tue Jan 29 20:42:52 CST 2008


You may want to try lowering throttle.score.job.factor from 4 to 1. That
will cap the number of concurrent jobs at ~100 instead of ~400.
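
For reference, a minimal sketch of that change, assuming the setting is made
in the swift.properties file used for the run (the file mentioned elsewhere
in this thread). Only the property name and the values 4 and 1 come from this
message; everything else is illustrative:

```properties
# swift.properties (sketch; which copy of the file Swift reads is an assumption)
# Lower the score-based job throttle factor from 4 to 1, which should cap
# concurrent jobs at roughly 100 instead of roughly 400.
throttle.score.job.factor=1
```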

Mihael

On Tue, 2008-01-29 at 18:31 -0800, Mike Kubal wrote:
> sorry, long day : )
> 
> 
> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
> 
> > 
> > On Tue, 2008-01-29 at 20:02 -0600, Michael Wilde wrote:
> > > MikeK, no attachment.
> > > 
> > > I've narrowed the cc list, and need to read back through the email
> > > thread on this to see what Mihael observed.
> > 
> > Let me summarize: too many gt2 gram jobs running concurrently = too many
> > job manager processes = high load on gram node. Not a new issue.
> > 
> > > 
> > > - MikeW
> > > 
> > > On 1/29/08 8:00 PM, Mike Kubal wrote:
> > > > The attachment contains the swift script, tc file,
> > > > sites file and swift.properties file.
> > > > 
> > > > I didn't provide any additional command line arguments.
> > > > 
> > > > MikeK
> > > > 
> > > > 
> > > > --- Michael Wilde <wilde at mcs.anl.gov> wrote:
> > > > 
> > > >> [ was Re: Swift jobs on UC/ANL TG ]
> > > >>
> > > >> Hi. I'm at O'Hare and will be flying soon.
> > > >> Ben or Mihael, if you are online, can you investigate?
> > > >>
> > > >> Yes, there are significant throttles turned on by default, and the
> > > >> system opens those very gradually.
> > > >>
> > > >> MikeK, can you post to the swift-devel list your swift.properties file,
> > > >> command line options, and your swift source code?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> MikeW
> > > >>
> > > >>
> > > >> On 1/29/08 8:11 AM, Ti Leggett wrote:
> > > >>> The default walltime is 15 minutes. Are you doing fork jobs or pbs jobs?
> > > >>> You shouldn't be doing fork jobs at all. Mike W, I thought there were
> > > >>> throttles in place in Swift to prevent this type of overrun? Mike K,
> > > >>> I'll need you to either stop these types of jobs until Mike W can verify
> > > >>> throttling or only submit a few 10s of jobs at a time.
> > > >>> On Jan 28, 2008, at 01/28/08 07:13 PM, Mike Kubal wrote:
> > > >>>> Yes, I'm submitting molecular dynamics simulations
> > > >>>> using Swift.
> > > >>>>
> > > >>>> Is there a default wall-time limit for jobs on tg-uc?
> > > >>>>
> > > >>>>
> > > >>>> --- joseph insley <insley at mcs.anl.gov> wrote:
> > > >>>>
> > > >>>>> Actually, these numbers are now escalating...
> > > >>>>>
> > > >>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
> > > >>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
> > > >>>>>
> > > >>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >>>>>     479
> > > >>>>>
> > > >>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >>>>> GRAM Authentication test successful
> > > >>>>> real    0m26.134s
> > > >>>>> user    0m0.090s
> > > >>>>> sys     0m0.010s
> > > >>>>>
> > > >>>>>
> > > >>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> > > >>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> > > >>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
> > > >>>>>> response times from the Gatekeeper there again.  Authenticating to
> > > >>>>>> the gatekeeper should only take a second or two, but it is
> > > >>>>>> periodically taking up to 16 seconds:
> > > >>>>>>
> > > >>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >>>>>> GRAM Authentication test successful
> > > >>>>>> real    0m16.096s
> > > >>>>>> user    0m0.060s
> > > >>>>>> sys     0m0.020s
> > > >>>>>>
> > > >>>>>> looking at the load on tg-grid, it is rather high:
> > > >>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
> > > >>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
> > > >>>>>> And there appear to be a large number of processes owned by kubal:
> > > >>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >>>>>>    380
> > > >>>>>>
> > > >>>>>> I assume that Mike is using swift to do the job submission.  Is
> > > >>>>>> there some throttling of the rate at which jobs are submitted to
> > > >>>>>> the gatekeeper that could be done that would lighten this load
> > > >>>>>> some?  (Or has that already been done since earlier today?)  The
> > > >>>>>> current response times are not unacceptable, but I'm hoping to
> > > >>>>>> avoid having the machine grind to a halt as it did earlier today.
> > > >>>>>> Thanks,
> > > >>>>>> joe.
> > > >>>>>>
> > > >>>>>> ===================================================
> > > >>>>>> joseph a. insley                         insley at mcs.anl.gov
> > > >>>>>> mathematics & computer science division  (630) 252-5649
> > > >>>>>> argonne national laboratory              (630) 252-5986 (fax)
> > > >>>>>>
> > > >>>>>
> > > >>>>> ===================================================
> > > >>>>> joseph a. insley                         insley at mcs.anl.gov
> > > >>>>> mathematics & computer science division  (630) 252-5649
> > > >>>>> argonne national laboratory              (630) 252-5986 (fax)
> > > >>>>>
> > > >>>>
> > 
> === message truncated ===
> 
> 



