[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Mihael Hategan hategan at mcs.anl.gov
Tue Jan 29 13:42:45 CST 2008


On Tue, 2008-01-29 at 13:39 -0600, Mihael Hategan wrote:
> On Tue, 2008-01-29 at 13:31 -0600, Mihael Hategan wrote:
> > Ah, I was seeing the 16 second submission on Teraport (a cluster at UC),
> > right after an upgrade of sorts. I can ask more about this upgrade...
> 
> So Teraport uses VDT, which makes it odd. Whatever change triggers this
> is in both VDT and that SDCTTTRWSC thing TeraGrid uses.

Hmm. So it turns out VDT came with the concept of "managed-fork" which
seems to go through a condor queue. I hope this doesn't apply to TG?
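
For anyone who wants to watch the gatekeeper latency over time, here is a
minimal sketch that repeats the `time globusrun -a -r <host>` check quoted
further down and flags the slow runs. The helper names (`time_command`,
`slow_runs`), the 2-second threshold (taken from "a second or two" below) and
the run count are my assumptions, not anything Swift or Globus provides.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5, pause=0.0):
    """Run cmd `runs` times; return the wall-clock seconds of each run."""
    durations = []
    for i in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
        if pause and i < runs - 1:
            time.sleep(pause)  # spacing so the check adds little load itself
    return durations


def slow_runs(durations, threshold=2.0):
    """Return the durations that exceed `threshold` seconds (assumed limit)."""
    return [d for d in durations if d > threshold]


if __name__ == "__main__":
    # Command and host are the ones used in the quoted messages.
    cmd = ["globusrun", "-a", "-r", "tg-grid.uc.teragrid.org"]
    times = time_command(cmd, runs=5, pause=10.0)
    for t in times:
        print(f"{t:.3f}s")
    slow = slow_runs(times)
    print(f"{len(slow)} of {len(times)} runs exceeded the threshold")
    sys.exit(1 if slow else 0)
```

The pause between runs is deliberate: the gatekeeper is already overloaded,
so the probe should not contribute to the problem it is measuring.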

> 
> > 
> > On Tue, 2008-01-29 at 13:15 -0600, Ian Foster wrote:
> > > Hi,
> > > 
> > > I've CCed Stuart Martin--I'd greatly appreciate some insights into what 
> > > is causing this. I assume that you are using GRAM4 (aka WS-GRAM)?
> > > 
> > > Ian.
> > > 
> > > Michael Wilde wrote:
> > > > [ was Re: Swift jobs on UC/ANL TG ]
> > > >
> > > > Hi. I'm at O'Hare and will be flying soon.
> > > > Ben or Mihael, if you are online, can you investigate?
> > > >
> > > > Yes, there are significant throttles turned on by default, and the 
> > > > system opens those very gradually.
> > > >
> > > > MikeK, can you post to the swift-devel list your swift.properties 
> > > > file, command line options, and your swift source code?
> > > >
> > > > Thanks,
> > > >
> > > > MikeW
> > > >
> > > >
> > > > On 1/29/08 8:11 AM, Ti Leggett wrote:
> > > >> The default walltime is 15 minutes. Are you doing fork jobs or pbs 
> > > >> jobs? You shouldn't be doing fork jobs at all. Mike W, I thought 
> > > >> there were throttles in place in Swift to prevent this type of 
> > > >> overrun? Mike K, I'll need you to either stop these types of jobs 
> > > >> until Mike W can verify throttling or only submit a few tens of jobs 
> > > >> at a time.
> > > >>
> > > >> On Jan 28, 2008, at 7:13 PM, Mike Kubal wrote:
> > > >>
> > > >>> Yes, I'm submitting molecular dynamics simulations
> > > >>> using Swift.
> > > >>>
> > > >>> Is there a default wall-time limit for jobs on tg-uc?
> > > >>>
> > > >>>
> > > >>>
> > > >>> --- joseph insley <insley at mcs.anl.gov> wrote:
> > > >>>
> > > >>>> Actually, these numbers are now escalating...
> > > >>>>
> > > >>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
> > > >>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
> > > >>>>
> > > >>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >>>>     479
> > > >>>>
> > > >>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >>>> GRAM Authentication test successful
> > > >>>> real    0m26.134s
> > > >>>> user    0m0.090s
> > > >>>> sys     0m0.010s
> > > >>>>
> > > >>>>
> > > >>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> > > >>>>
> > > >>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> > > >>>>> became unresponsive and had to be rebooted.  I am now seeing slow
> > > >>>>> response times from the Gatekeeper there again.  Authenticating to
> > > >>>>> the gatekeeper should only take a second or two, but it is
> > > >>>>> periodically taking up to 16 seconds:
> > > >>>>>
> > > >>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> > > >>>>> GRAM Authentication test successful
> > > >>>>> real    0m16.096s
> > > >>>>> user    0m0.060s
> > > >>>>> sys     0m0.020s
> > > >>>>>
> > > >>>>> looking at the load on tg-grid, it is rather high:
> > > >>>>>
> > > >>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
> > > >>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
> > > >>>>>
> > > >>>>> And there appear to be a large number of processes owned by kubal:
> > > >>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> > > >>>>>    380
> > > >>>>>
> > > >>>>> I assume that Mike is using swift to do the job submission.  Is
> > > >>>>> there some throttling of the rate at which jobs are submitted to
> > > >>>>> the gatekeeper that could be done that would lighten this load
> > > >>>>> some?  (Or has that already been done since earlier today?)  The
> > > >>>>> current response times are not unacceptable, but I'm hoping to
> > > >>>>> avoid having the machine grind to a halt as it did earlier today.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> joe.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> ===================================================
> > > >>>>> joseph a. insley                          insley at mcs.anl.gov
> > > >>>>> mathematics & computer science division   (630) 252-5649
> > > >>>>> argonne national laboratory               (630) 252-5986 (fax)
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>> ===================================================
> > > >>>> joseph a. insley                          insley at mcs.anl.gov
> > > >>>> mathematics & computer science division   (630) 252-5649
> > > >>>> argonne national laboratory               (630) 252-5986 (fax)
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > >



