[Swift-user] propagating the properties channel to outside the scheduler.

wilde at mcs.anl.gov wilde at mcs.anl.gov
Tue Oct 5 15:01:24 CDT 2010


Allan, while you are debugging this, it would also be good to do full end-to-end site testing in Swift, using a simple cat script as we discussed.

One ugly but effective way to do this is to run one cat job per site: define N identical cat apps, cat01 through catN (where N is the number of sites to test), and then dynamically create a tc.data file that maps each catNN app to a specific Grid site.

So your script would, on OSG for example, need to run swift-osg-ress and, from its output, create the tc.data file and a testosg.swift file.
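The generation step could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the site handles are passed in as a plain list (extracting them from the swift-osg-ress output is not shown), the six-column tc.data layout and the INTEL32::LINUX platform string are what I recall from the sample tc.data shipped with Swift, and the names make_tc_data, make_swift_script, data.txt and catNN.out are all made up for the example. Check the generated files against your Swift release before relying on them.

```python
# Sketch only: build tc.data and testosg.swift text with one cat app per site.
# The tc.data column layout and the SwiftScript syntax below are assumptions
# to be checked against your Swift release; all names here are hypothetical.

def app_names(sites):
    """Pair each site handle with an app name: cat01, cat02, ..."""
    width = max(2, len(str(len(sites))))
    return [(f"cat{i:0{width}d}", site) for i, site in enumerate(sites, 1)]

def make_tc_data(sites, cat_path="/bin/cat"):
    """Return tc.data text mapping each catNN app to exactly one site."""
    lines = []
    for app, site in app_names(sites):
        # Assumed columns: site, transformation, path, status, platform, profiles
        lines.append("\t".join(
            [site, app, cat_path, "INSTALLED", "INTEL32::LINUX", "null"]))
    return "\n".join(lines) + "\n"

def make_swift_script(sites, input_name="data.txt"):
    """Return SwiftScript text that runs every catNN app once."""
    out = ["type file;", "", f'file src <"{input_name}">;', ""]
    for app, _site in app_names(sites):
        out.append(f"app (file o) {app} (file i) {{ {app} @i stdout=@o; }}")
        out.append(f'file out_{app} <"{app}.out">;')
        out.append(f"out_{app} = {app}(src);")
        out.append("")
    return "\n".join(out)
```

For example, writing make_tc_data(["BNL-ATLAS"]) to tc.data and make_swift_script(["BNL-ATLAS"]) to testosg.swift gives a single cat01 app that can only be scheduled on BNL-ATLAS.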

Then set swift.properties for the desired level of retries (perhaps 0) and so on, e.g.:
sitedir.keep=true
execution.retries=0
lazy.errors=false

Then let this run for as long as it takes for most of the jobs to either run or fail; a few will likely hang waiting in queues or in Condor-G retry/hold states.
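Once the run has gone on long enough, a quick tally of which sites produced output can be sketched like this. It assumes, purely for illustration, that each catNN job writes its stdout to a file named catNN.out in one directory and that you kept a mapping from app name to site handle; both are hypothetical conventions, not something Swift provides. A missing or empty file only says the job has not succeeded so far; it does not distinguish a failed job from one still held in a queue.

```python
# Sketch only: split sites into "produced output" and "no output yet".
# The <app>.out naming convention and the apps mapping are assumptions.
import os

def classify_sites(apps, out_dir="."):
    """apps maps app name (e.g. 'cat01') to site handle.

    Returns (ok, missing): sites whose output file exists and is non-empty,
    and sites whose output file is absent or empty.
    """
    ok, missing = [], []
    for app, site in sorted(apps.items()):
        path = os.path.join(out_dir, app + ".out")
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            ok.append(site)
        else:
            missing.append(site)
    return ok, missing
```

The sites in the second list are the ones worth checking by hand in the Swift log and in the Condor queue.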

The Karajan script is a lower-level test that is likely useful for diagnostics as well, but it doesn't replace a full Swift end-to-end test.

- Mike

 
----- "Allan Espinosa" <aespinosa at cs.uchicago.edu> wrote:

> Hi,
> 
> I'm writing this OSG site tester script that submits condor-g jobs.
> It seems that the property elements are not being used in my
> task:execute() call.
> 
> Here's the script:
> 
> import("task.k")
> import("sys.k")
> 
> element(pool, [handle, ..., optional(workdir), channel(properties)]
>   host(name = handle
>     each(...)
>     to(properties
>       each(properties)
>     )
>   )
> )
> 
> element(servicelist, [type, provider, url]
>   service(type, provider=provider, url=url)
> )
> 
> element(gridftp, [url, optional(storage), optional(major), optional(minor), optional(patch)]
>   if(
>     url == "local://localhost"
>     servicelist("file", "local", "")
>     servicelist("file", "gsiftp", url)
>   )
> )
> 
> element(execution, [provider, url]
>   servicelist(type="execution", provider=provider, url=url)
> )
> 
> element(filesystem, [provider, url, optional(storage)]
>   servicelist(type="file", provider=provider, url=url)
> )
> 
> element(profile, [namespace, key, value]
>   if(
>     namespace == "karajan"
>     property("{key}", value)
>     property("{namespace}:{key}", value)
>   )
> )
> 
> element(workdirectory, [dir]
>   property("workdir", dir)
> )
> 
> sitesFile := "condor_osg.xml"
> sites := list(executeFile(sitesFile))
> 
> for(site, sites
>   print(site)
>   task:execute("/bin/hostname",
>     stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}",
>     provider="condor", host=site)
> )
> 
> 
> sample generated condor submit file:
> $ cat *.submit
> universe = vanilla
> output = file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS
> error = /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr
> 
> executable = /bin/hostname
> 
> notification = Never
> leave_in_queue = TRUE
> queue
> 
> 
> a pool entry:
>   <pool handle="BNL-ATLAS">
>     <execution provider="condor" url="none"/>
> 
>     <profile namespace="globus" key="jobType">grid</profile>
>     <profile namespace="globus" key="gridResource">gt2 gridgk02.racf.bnl.gov/jobmanager-condor</profile>
> 
>     <profile namespace="karajan" key="initialScore">20.0</profile>
>     <profile namespace="karajan" key="jobThrottle">0.95</profile>
> 
>     <gridftp  url="gsiftp://gridgk02.racf.bnl.gov"/>
>
>     <workdirectory>/usatlas/prodjob/share/engage-scec/swift_scratch</workdirectory>
>   </pool>
> 
> 
> -- 
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



