[Swift-devel] Status of coasters

Ian Foster foster at anl.gov
Fri Feb 13 09:17:38 CST 2009


Mike:

What is the scalability problem WRT GT2 GRAM sites?

Ian.


On Feb 13, 2009, at 8:59 AM, Michael Wilde wrote:

> Here's my understanding of status, issues and needs on coasters.
>
> Some side discussion with Mihael on various coaster issues is  
> summarized here as well; clarifications welcome.
>
> Work in progress:
>
> - Mihael has a good handle on the bootstrap issues and is working on
> improvements. This is not working in trunk at the moment, but will
> likely be fixed soon. We think this will fix known issues in:
> command line length for Condor; spaces, quotes, newlines and other
> offending argument issues; location of Java and tools (wget/curl and
> md5sum).
>
> - still to do on above: sites.xml attribute to explicitly specify  
> location of tools, or at least of Java.
>
> - Ben has a patch, still to be integrated, that runs the coaster
> service on a worker node. Question: this is only usable when workers
> have sufficient IP access, correct?
>
> - The scalability problem submitting to GT2 GRAM sites still exists.  
> Potential solutions are:
>
> -- Service submits workers via PBS (using jobmanager=gt2:pbs). Valid
> only on PBS sites. Not yet tested.
>
> -- Service submits workers via Condor-G (using
> jobmanager=gt2:condor). Mihael feels this requires a new Condor
> provider, the one in the current code base being insufficient and
> untested (really more of a prototype developed by a student).
>
> -- Service submits via WS-GRAM. This would use
> jobmanager=gt2:gt4:{pbs/condor/sge}, and needs to be tested on sites
> where WS-GRAM is working.
> For sites where WS-GRAM is not functional, I suggested we consider
> configuring our own non-root WS-GRAM, ideally using already-installed
> GT4 software, e.g., from the OSG package on OSG and TG sites
> where it's installed. Mihael thought this would be considerable work.
> I agree, but it might be a stable solution with fewer unknowns and
> support from the GRAM group. We can bring in the latest GT4 as needed
> if that provides a better solution than some older installed GT4
> that we have no control over and that won't change until upcoming
> releases of, say, OSG or TG packages.
>
> Doing the above should then enable large-scale testing of user  
> workflows across many OSG and TG sites, without need to throttle  
> back the *number* of jobs waiting or running.
>
> Lastly: it seems that a Condor-G provider might be a powerful
> capability (as one configuration option) to be able to submit all
> Swift jobs via Condor-G (e.g., for non-coaster runs as well). Please
> comment on the value of such a capability.
>
> - Mike
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
