[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Mihael Hategan hategan at mcs.anl.gov
Tue Jan 29 21:38:15 CST 2008


I'm becoming confused now. Last time I spoke to Yong about WS-GRAM, it
was less scalable and slower (although that varied) than gt2 gram.

So unless I see some numbers, I personally won't believe either of the
statements.

On Tue, 2008-01-29 at 21:25 -0600, Ioan Raicu wrote:
> Yong and I ran most of our tests (from Swift) using WS-GRAM (aka GRAM4) 
> on UC/ANL TG, and I use Falkon on the same cluster using only WS-GRAM.  
> If I am not mistaken, all TG sites support WS-GRAM.
> 
> Ioan
> 
> Michael Wilde wrote:
> > MikeK, this may be obvious but just in case:
> >
> > On 1/29/08 8:47 PM, Mihael Hategan wrote:
> >> That and/or try using ws-gram:
> >> <jobmanager universe="vanilla" url="tg-grid1.uc.teragrid.org" major="4"
> >> minor="0" patch="0"/>
> >
> > (this goes in the sites.xml file)
> >
> > Q for the group: is ws-gram supported on uc.teragrid?
> >
> >>
> >>
> >> On Tue, 2008-01-29 at 20:42 -0600, Mihael Hategan wrote:
> >>> You may want to try to lower throttle.score.job.factor from 4 to 1.
> >>> That will cap the number of jobs at ~100 instead of ~400.
> >>>
> >>> Mihael
> >
> > for info on setting Swift properties, see "Swift Engine Configuration" 
> > in the users guide at:
> >
> > http://www.ci.uchicago.edu/swift/guides/userguide.php#properties
> >
> > - MikeW
> >
> >>>
> >>> On Tue, 2008-01-29 at 18:31 -0800, Mike Kubal wrote:
> >>>> sorry, long day : )
> >>>>
> >>>>
> >>>> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >>>>
> >>>>> On Tue, 2008-01-29 at 20:02 -0600, Michael Wilde wrote:
> >>>>>> MikeK, no attachment.
> >>>>>>
> >>>>>> I've narrowed the cc list, and need to read back through the email
> >>>>>> thread on this to see what Mihael observed.
> >>>>> Let me summarize: too many gt2 gram jobs running concurrently = too many
> >>>>> job manager processes = high load on gram node. Not a new issue.
> >>>>>
> >>>>>> - MikeW
> >>>>>>
> >>>>>> On 1/29/08 8:00 PM, Mike Kubal wrote:
> >>>>>>> The attachment contains the swift script, tc file,
> >>>>>>> sites file and swift.properties file.
> >>>>>>>
> >>>>>>> I didn't provide any additional command line
> >>>>>>> arguments.
> >>>>>>>
> >>>>>>> MikeK
> >>>>>>>
> >>>>>>>
> >>>>>>> --- Michael Wilde <wilde at mcs.anl.gov> wrote:
> >>>>>>>
> >>>>>>>> [ was Re: Swift jobs on UC/ANL TG ]
> >>>>>>>>
> >>>>>>>> Hi. I'm at O'Hare and will be flying soon.
> >>>>>>>> Ben or Mihael, if you are online, can you investigate?
> >>>>>>>>
> >>>>>>>> Yes, there are significant throttles turned on by
> >>>>>>>> default, and the system opens those very gradually.
> >>>>>>>>
> >>>>>>>> MikeK, can you post to the swift-devel list your swift.properties
> >>>>>>>> file, command line options, and your swift source code?
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> MikeW
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 1/29/08 8:11 AM, Ti Leggett wrote:
> >>>>>>>>> The default walltime is 15 minutes. Are you doing fork jobs or pbs jobs?
> >>>>>>>>> You shouldn't be doing fork jobs at all. Mike W, I thought there were
> >>>>>>>>> throttles in place in Swift to prevent this type of overrun? Mike K,
> >>>>>>>>> I'll need you to either stop these types of jobs until Mike W can verify
> >>>>>>>>> throttling or only submit a few 10s of jobs at a time.
> >>>>>>>>> On Jan 28, 2008, at 01/28/08 07:13 PM, Mike Kubal wrote:
> >>>>>>>>>> Yes, I'm submitting molecular dynamics simulations
> >>>>>>>>>> using Swift.
> >>>>>>>>>>
> >>>>>>>>>> Is there a default wall-time limit for jobs on tg-uc?
> >>>>>>>>>>
> >>>>>>>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Actually, these numbers are now escalating...
> >>>>>>>>>>> top - 17:18:54 up  2:29,  1 user,  load average: 149.02, 123.63, 91.94
> >>>>>>>>>>> Tasks: 469 total,   4 running, 465 sleeping,   0 stopped,   0 zombie
> >>>>>>>>>>>
> >>>>>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>>>>>>     479
> >>>>>>>>>>>
> >>>>>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>>>>>>> GRAM Authentication test successful
> >>>>>>>>>>> real    0m26.134s
> >>>>>>>>>>> user    0m0.090s
> >>>>>>>>>>> sys     0m0.010s
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Jan 28, 2008, at 5:15 PM, joseph insley wrote:
> >>>>>>>>>>>> Earlier today tg-grid.uc.teragrid.org (the UC/ANL TG GRAM host)
> >>>>>>>>>>>> became unresponsive and had to be rebooted.  I am now seeing slow
> >>>>>>>>>>>> response times from the Gatekeeper there again.  Authenticating to
> >>>>>>>>>>>> the gatekeeper should only take a second or two, but it is
> >>>>>>>>>>>> periodically taking up to 16 seconds:
> >>>>>>>>>>>>
> >>>>>>>>>>>> insley at tg-viz-login1:~> time globusrun -a -r tg-grid.uc.teragrid.org
> >>>>>>>>>>>> GRAM Authentication test successful
> >>>>>>>>>>>> real    0m16.096s
> >>>>>>>>>>>> user    0m0.060s
> >>>>>>>>>>>> sys     0m0.020s
> >>>>>>>>>>>>
> >>>>>>>>>>>> looking at the load on tg-grid, it is rather high:
> >>>>>>>>>>>> top - 16:55:26 up  2:06,  1 user,  load average: 89.59, 78.69, 62.92
> >>>>>>>>>>>> Tasks: 398 total,  20 running, 378 sleeping,   0 stopped,   0 zombie
> >>>>>>>>>>>> And there appear to be a large number of processes owned by kubal:
> >>>>>>>>>>>> insley at tg-grid1:~> ps -ef | grep kubal | wc -l
> >>>>>>>>>>>>    380
> >>>>>>>>>>>>
> >>>>>>>>>>>> I assume that Mike is using swift to do the job submission.  Is
> >>>>>>>>>>>> there some throttling of the rate at which jobs are submitted to
> >>>>>>>>>>>> the gatekeeper that could be done that would lighten this load
> >>>>>>>>>>>> some?  (Or has that already been done since earlier today?)  The
> >>>>>>>>>>>> current response times are not unacceptable, but I'm hoping to
> >>>>>>>>>>>> avoid having the machine grind to a halt as it did earlier today.
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> joe.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> ===================================================
> >>>>>>>>>>>> joseph a. insley                         insley at mcs.anl.gov
> >>>>>>>>>>>> mathematics & computer science division  (630) 252-5649
> >>>>>>>>>>>> argonne national laboratory              (630) 252-5986 (fax)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>> ===================================================
> >>>>>>>>>>> joseph a. insley                         insley at mcs.anl.gov
> >>>>>>>>>>> mathematics & computer science division  (630) 252-5649
> >>>>>>>>>>> argonne national laboratory              (630) 252-5986 (fax)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>      
> >>>> === message truncated ===
> >>>>
> >>>>
> >>> _______________________________________________
> >>> Swift-devel mailing list
> >>> Swift-devel at ci.uchicago.edu
> >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>
> >>
> >>
> >>
> >
> 
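
A note on the throttle figures quoted in the thread (~100 jobs with
throttle.score.job.factor set to 1, versus ~400 at the current value of 4):
they are consistent with a per-site limit of the form
2 + score * throttle.score.job.factor, with the site score topping out at 100.
That formula and the score ceiling are assumptions based on my reading of the
Swift user guide, not something stated in this thread, and job_cap is a
hypothetical helper name. The sketch below only illustrates the arithmetic; it
is not Swift code.

```python
# Illustrative arithmetic only; not part of Swift.
# Assumption: allowed concurrent jobs per site = 2 + score * factor,
# where the site score can grow to at most 100.
MAX_SITE_SCORE = 100.0  # assumed upper bound on a site's score
BASE_JOBS = 2           # assumed minimum number of jobs always allowed


def job_cap(factor, score=MAX_SITE_SCORE):
    """Return the assumed upper bound on concurrent jobs for one site."""
    return int(BASE_JOBS + score * factor)


if __name__ == "__main__":
    for factor in (4, 1):
        print(f"throttle.score.job.factor={factor}: "
              f"up to {job_cap(factor)} concurrent jobs")
```

Under these assumptions the cap drops from 402 to 102, which matches the
"~400" and "~100" in Mihael's suggestion. Each gt2 GRAM job keeps its own job
manager process on the gram node, so the cap also roughly bounds the number of
such processes.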



