OSG site tester (was Re: [Swift-user] propagating the properties channel to outside the scheduler.)
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Oct 6 10:52:11 CDT 2010
The cat script generator as suggested by Mike:
http://gist.github.com/613551
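
A minimal sketch of the idea in Python, following Mike's description below. This is not the contents of the gist. The function names, the second site name, the input file name, and the tc.data column layout (site, app, path, status, platform, profile) are assumptions for illustration; in practice the site list would come from the swift-osg-ress output.

```python
# Sketch only: names and the tc.data column layout are assumptions.

def make_tc_data(sites, cat_path="/bin/cat"):
    """Return tc.data text that maps one catNN app to each site."""
    lines = []
    for i, site in enumerate(sites, start=1):
        app = "cat%02d" % i
        # Assumed columns: site, app name, path, status, platform, profile
        lines.append("\t".join(
            [site, app, cat_path, "INSTALLED", "INTEL32::LINUX", "null"]))
    return "\n".join(lines) + "\n"


def make_swift_script(sites, input_name="data.txt"):
    """Return a Swift script that runs each catNN app once on one input file."""
    out = ["type file;", ""]
    # One app declaration per site, all identical except for the name.
    for i, _ in enumerate(sites, start=1):
        app = "cat%02d" % i
        out.append("app (file o) %s (file i) { %s @i stdout=@o; }" % (app, app))
    out.append("")
    out.append('file data<"%s">;' % input_name)
    # One output file and one call per app, so each site gets exactly one job.
    for i, _ in enumerate(sites, start=1):
        app = "cat%02d" % i
        out.append('file out%02d<"%s.out">;' % (i, app))
        out.append("out%02d = %s(data);" % (i, app))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example_sites = ["BNL-ATLAS", "EXAMPLE-SITE"]  # second name is hypothetical
    print(make_tc_data(example_sites))
    print(make_swift_script(example_sites))
```

The two strings would be written out as tc.data and testosg.swift before invoking swift.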
2010/10/5 <wilde at mcs.anl.gov>:
> Allan, while you are debugging this, it would also be good to do a full end-to-end site test in Swift, using a simple cat script as we discussed.
>
> One ugly but effective way to do this is to run one cat job per site: define identical cat apps cat01 through catNN (where NN is the number of sites to test), and then dynamically create a tc.data file that maps each catNN app to a specific Grid site.
>
> So on OSG, for example, your script would need to run swift-osg-ress and, from its output, create the tc.data file and a testosg.swift file.
>
> Then set swift.properties for the desired level of retries (perhaps 0), etc., e.g.:
> sitedir.keep=true
> execution.retries=0
> lazy.errors=false
>
> Then let this run for as long as it takes for most of the jobs to either run or fail; a few will likely hang waiting in queues or in Condor-G retry/hold states.
>
> The Karajan script is a lower-level test that is likely useful for diagnostics as well, but it doesn't replace a full Swift end-to-end test.
>
> - Mike
>
>
> ----- "Allan Espinosa" <aespinosa at cs.uchicago.edu> wrote:
>
>> Hi,
>>
>> I'm writing this OSG site tester script that submits condor-g jobs.
>> It seems that the property elements are not being used in my
>> task:execute() call.
>>
>> Here's the script:
>>
>> import("task.k")
>> import("sys.k")
>>
>> element(pool, [handle, ..., optional(workdir), channel(properties)]
>>   host(name = handle
>>     each(...)
>>     to(properties
>>       each(properties)
>>     )
>>   )
>> )
>>
>> element(servicelist, [type, provider, url]
>>   service(type, provider=provider, url=url)
>> )
>>
>> element(gridftp, [url, optional(storage), optional(major),
>>     optional(minor), optional(patch)]
>>   if(
>>     url == "local://localhost"
>>       servicelist("file", "local", "")
>>     servicelist("file", "gsiftp", url)
>>   )
>> )
>>
>> element(execution, [provider, url]
>>   servicelist(type="execution", provider=provider, url=url)
>> )
>>
>> element(filesystem, [provider, url, optional(storage)]
>>   servicelist(type="file", provider=provider, url=url)
>> )
>>
>> element(profile, [namespace, key, value]
>>   if(
>>     namespace == "karajan"
>>       property("{key}", value)
>>     property("{namespace}:{key}", value)
>>   )
>> )
>>
>> element(workdirectory, [dir]
>>   property("workdir", dir)
>> )
>>
>> sitesFile := "condor_osg.xml"
>> sites := list(executeFile(sitesFile))
>>
>> for(site, sites
>>   print(site)
>>   task:execute("/bin/hostname",
>>     stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}",
>>     provider="condor", host=site)
>> )
>>
>>
>> sample generated condor submit file:
>> $ cat *.submit
>> universe = vanilla
>> output = file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS
>> error = /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr
>>
>> executable = /bin/hostname
>>
>> notification = Never
>> leave_in_queue = TRUE
>> queue
>>
>>
>> a pool entry:
>> <pool handle="BNL-ATLAS">
>> <execution provider="condor" url="none"/>
>>
>> <profile namespace="globus" key="jobType">grid</profile>
>> <profile namespace="globus" key="gridResource">gt2 gridgk02.racf.bnl.gov/jobmanager-condor</profile>
>>
>> <profile namespace="karajan" key="initialScore">20.0</profile>
>> <profile namespace="karajan" key="jobThrottle">0.95</profile>
>>
>> <gridftp url="gsiftp://gridgk02.racf.bnl.gov"/>
>>
>> <workdirectory>/usatlas/prodjob/share/engage-scec/swift_scratch</workdirectory>
>> </pool>
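
For comparison against the generated submit file, the naming rule in the `profile` and `workdirectory` elements above can be mirrored in a few lines of Python to list the properties this pool entry should contribute to the host. This is only a sketch of that rule, not part of Swift or Karajan; the function name `pool_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The pool entry quoted above, with the email line wrap in gridResource joined.
POOL_XML = """<pool handle="BNL-ATLAS">
  <execution provider="condor" url="none"/>
  <profile namespace="globus" key="jobType">grid</profile>
  <profile namespace="globus" key="gridResource">gt2 gridgk02.racf.bnl.gov/jobmanager-condor</profile>
  <profile namespace="karajan" key="initialScore">20.0</profile>
  <profile namespace="karajan" key="jobThrottle">0.95</profile>
  <gridftp url="gsiftp://gridgk02.racf.bnl.gov"/>
  <workdirectory>/usatlas/prodjob/share/engage-scec/swift_scratch</workdirectory>
</pool>"""


def pool_properties(xml_text):
    """Return (handle, properties) using the same naming rule as the script:
    karajan profiles keep the bare key, others become "namespace:key"."""
    pool = ET.fromstring(xml_text)
    props = {}
    for prof in pool.findall("profile"):
        ns, key = prof.get("namespace"), prof.get("key")
        name = key if ns == "karajan" else "%s:%s" % (ns, key)
        # Collapse whitespace so values wrapped across lines still compare equal.
        props[name] = " ".join((prof.text or "").split())
    workdir = pool.find("workdirectory")
    if workdir is not None:
        props["workdir"] = (workdir.text or "").strip()
    return pool.get("handle"), props


if __name__ == "__main__":
    handle, props = pool_properties(POOL_XML)
    print(handle)
    for name in sorted(props):
        print("  %s = %s" % (name, props[name]))
```

The sample submit file above shows `universe = vanilla` and no grid resource line, which is consistent with `globus:jobType` and `globus:gridResource` not reaching the job.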