[Swift-devel] Support request: Swift jobs flooding uc-teragrid?

Wed Jan 30 12:37:39 CST 2008

I'm confused. Why would you want to test GRAM scalability while
introducing additional biasing elements, such as Condor-G?

On Wed, 2008-01-30 at 11:21 -0600, Stuart Martin wrote:
> All,
> 
> I wanted to chime in with a number of things being discussed here.
> 
> There is a GRAM RFT Core reliability group focused on ensuring the  
> GRAM service stays up and functional in spite of an onslaught from a  
> client.  http://confluence.globus.org/display/CDIGS/GRAM-RFT-Core+Reliability+Tiger+Team
> 
> The ultimate goal here is that a client may get a timeout and that  
> would be the signal to back off some.
> 
> -----
> 
> OSG - VO testing: We worked with Terrence (CMS) recently and here are  
> his test results.
> 	http://hepuser.ucsd.edu/twiki/bin/view/UCSDTier2/WSGramTests
> 
> GRAM2 handled this 2000 jobs x 2 condor-g clients to the same GRAM  
> service better than GRAM4.  But again, this is with the condor-g  
> tricks.  Without the tricks, GRAM2 will handle the load better.
> 
> OSG VTB testing: These were using globusrun-ws and also condor-g.
> 	https://twiki.grid.iu.edu/twiki/bin/view/Integration/WSGramValidation
> 
> Clients in these tests got a variety of errors depending on the jobs  
> run: timeouts, GridFTP authentication errors, client-side OOM, ...   
> GRAM4 functions pretty well, but it was not able to handle Terrence's  
> scenario.  But it handled 1000 jobs x 1 condor-g client just fine.
> 
> -----
> 
> It would be very interesting to see how swift does with GRAM4.  This  
> would make for a nice comparison to condor-g.
> 
> As far as having functioning GRAM4 services on TG, things have  
> improved.  LEAD is using GRAM4 exclusively and we've been working with  
> them to make sure the GRAM4 services are up and functioning.  INCA has  
> been updated to more effectively test and monitor GRAM4 and GridFTP  
> services that LEAD is targeting.  This could be extended for any hosts  
> that swift would like to test against.  Here are some interesting  
> charts from INCA - http://cuzco.sdsc.edu:8085/cgi-bin/lead.cgi
> 
> -Stu
> 
> On Jan 30, 2008, at 10:00 AM, Ti Leggett wrote:
> 
> >
> > On Jan 30, 2008, at 09:48 AM, Ben Clifford wrote:
> >
> > [snip]
> >
> >> No. The default behaviour when working with a user who is "just  
> >> trying to
> >> get their stuff to run" is "screw this, use GRAM2 because it works".
> >>
> >> It's a self-reinforcing feedback loop, that will be broken at the  
> >> point
> >> that it becomes easier for people to stick with GRAM4 than default  
> >> back to
> >> GRAM2. I guess we need to keep trying every now and then and hope  
> >> that one
> >> time it sticks ;-)
> >>
> >> -- 
> >
> > Well this works to a point, but if falling back to a technology that  
> > is known to not be scalable for your sizes results in killing a  
> > machine, I, as a site admin, will eventually a) deny you  
> > service, b) shut down the poorly performing service, or c) all of the  
> > above. So it's in your best interest to find and use those  
> > technologies that are best suited to the task at hand so the users  
> > of your software don't get nailed by (a).
> >
> > In this case it seems to me that using WS-GRAM, extending WS-GRAM  
> > and/or MDS to report site statistics, and/or modifying WS-GRAM to  
> > throttle itself (think of how Apache reports "Server busy. Try again  
> > later") is the best path forward. For the short term, it seems that  
> > the Swift developers should manually find those limits for sites  
> > that the users use regularly for them to use, *and* educate their  
> > users on how to identify that they could be adversely affecting a  
> > resource and throttle themselves till the ideal, automated method is  
> > a usable reality.
> >
> 