OSG site tester (was Re: [Swift-user] propagating the properties channel to outside the scheduler.)

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Oct 6 10:52:11 CDT 2010


Here is the cat script generator that Mike suggested:

http://gist.github.com/613551
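The sketch below is not the script at the gist link. It is a hypothetical generator written under a few assumptions to illustrate the scheme Mike describes:

- The sites file contains `<pool handle="...">` entries like the one quoted at the end of this thread.
- tc.data is tab-separated as site, transformation, path, install status, platform and profiles. `INTEL32::LINUX` and `null` are placeholders.
- The first word of a Swift app body is the transformation name that Swift looks up in tc.data.

Each site gets its own identical app (`cat01`, `cat02`, ...). That app is mapped to exactly one site, so every site receives one job. The function names and the `data.txt` input file are illustrative only.

```python
"""Hypothetical per-site cat test generator (a sketch, not the gist's script)."""
import sys
import xml.etree.ElementTree as ET

# Assumed tc.data layout (tab-separated): site, transformation, path,
# installation status, platform, profiles. Platform/profile are placeholders.
TC_LINE = "{site}\tcat{n:02d}\t/bin/cat\tINSTALLED\tINTEL32::LINUX\tnull"

# The settings Mike suggested: keep site dirs, no retries, no lazy errors.
PROPERTIES = "sitedir.keep=true\nexecution.retries=0\nlazy.errors=false\n"


def site_handles(sites_xml):
    """Return the handle of every <pool> element that has one."""
    root = ET.fromstring(sites_xml)
    return [p.get("handle") for p in root.iter("pool") if p.get("handle")]


def make_tc_data(handles):
    """Map app catNN to the NN-th site: one tc.data line per site."""
    lines = [TC_LINE.format(site=h, n=i) for i, h in enumerate(handles, 1)]
    return "\n".join(lines) + "\n"


def make_swift_script(handles, input_name="data.txt"):
    """Declare N identical cat apps and call each once, one output per site."""
    lines = ["type file;", "", f'file infile <"{input_name}">;', ""]
    for i, _ in enumerate(handles, 1):
        app = f"cat{i:02d}"
        # Assumed: the first word of the app body is the tc.data name.
        lines.append(f"app (file o) {app} (file i) {{ {app} @i stdout=@o; }}")
    lines.append("")
    for i, handle in enumerate(handles, 1):
        out = f"out{i:02d}"
        lines.append(f'file {out} <"{handle}.out">;')  # output named after site
        lines.append(f"{out} = cat{i:02d}(infile);")
    return "\n".join(lines) + "\n"


def main(sites_path):
    """Read a sites file and write tc.data, testosg.swift, swift.properties."""
    with open(sites_path, "rb") as f:
        handles = site_handles(f.read())
    outputs = {
        "tc.data": make_tc_data(handles),
        "testosg.swift": make_swift_script(handles),
        "swift.properties": PROPERTIES,
    }
    for name, text in outputs.items():
        with open(name, "w") as f:
            f.write(text)
    print(f"generated tests for {len(handles)} sites")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For example, run `python gen_site_test.py sites.xml` on a sites file produced by swift-osg-ress, then run Swift on `testosg.swift` with the generated tc.data. Each `<site>.out` that appears marks a site that ran the job end to end. The script name is again hypothetical.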

2010/10/5  <wilde at mcs.anl.gov>:
> Allan, while you are debugging this, it would also be good to do a full end-to-end site test in Swift, using a simple cat script as we discussed.
>
> One ugly but effective way to do this is to run one cat job per site by defining N identical cat apps, cat01 through catN (where N is the number of sites to test), and then dynamically creating a tc.data file that maps each catNN app to a specific Grid site.
>
> On OSG, for example, your script would need to run swift-osg-ress and, from its output, create the tc.data file and a testosg.swift file.
>
> Then set swift.properties for the desired number of retries (perhaps 0), etc., e.g.:
> sitedir.keep=true
> execution.retries=0
> lazy.errors=false
>
> Then let this run until most of the jobs have either run or failed; a few will likely hang, waiting in queues or in Condor-G retry/hold states.
>
> The Karajan script is a lower-level test that is likely also useful for diagnostics, but it doesn't replace a full Swift end-to-end test.
>
> - Mike
>
>
> ----- "Allan Espinosa" <aespinosa at cs.uchicago.edu> wrote:
>
>> Hi,
>>
>> I'm writing an OSG site tester script that submits Condor-G jobs.
>> It seems that the property elements are not being used in my
>> task:execute() call.
>>
>> Here's the script:
>>
>> import("task.k")
>> import("sys.k")
>>
>> element(pool, [handle, ..., optional(workdir), channel(properties)]
>>   host(name = handle
>>     each(...)
>>     to(properties
>>       each(properties)
>>     )
>>   )
>> )
>>
>> element(servicelist, [type, provider, url]
>>   service(type, provider=provider, url=url)
>> )
>>
>> element(gridftp, [url, optional(storage), optional(major), optional(minor), optional(patch)]
>>   if(
>>     url == "local://localhost"
>>     servicelist("file", "local", "")
>>     servicelist("file", "gsiftp", url)
>>   )
>> )
>>
>> element(execution, [provider, url]
>>   servicelist(type="execution", provider=provider, url=url)
>> )
>>
>> element(filesystem, [provider, url, optional(storage)]
>>   servicelist(type="file", provider=provider, url=url)
>> )
>>
>> element(profile, [namespace, key, value]
>>   if(
>>     namespace == "karajan"
>>     property("{key}", value)
>>     property("{namespace}:{key}", value)
>>   )
>> )
>>
>> element(workdirectory, [dir]
>>   property("workdir", dir)
>> )
>>
>> sitesFile := "condor_osg.xml"
>> sites := list(executeFile(sitesFile))
>>
>> for(site, sites
>>   print(site)
>>   task:execute("/bin/hostname",
>>     stdout="file:///home/aespinosa/workflows/pool_coaster/site_test/{site}",
>>     provider="condor", host=site)
>> )
>>
>>
>> Sample generated Condor submit file:
>> $ cat *.submit
>> universe = vanilla
>> output = file:///home/aespinosa/workflows/pool_coaster/site_test/BNL-ATLAS
>> error = /home/aespinosa/.globus/scripts/Condor8392886088313119280.submit.stderr
>>
>> executable = /bin/hostname
>>
>> notification = Never
>> leave_in_queue = TRUE
>> queue
>>
>>
>> A pool entry:
>>   <pool handle="BNL-ATLAS">
>>     <execution provider="condor" url="none"/>
>>
>>     <profile namespace="globus" key="jobType">grid</profile>
>>     <profile namespace="globus" key="gridResource">gt2 gridgk02.racf.bnl.gov/jobmanager-condor</profile>
>>
>>     <profile namespace="karajan" key="initialScore">20.0</profile>
>>     <profile namespace="karajan" key="jobThrottle">0.95</profile>
>>
>>     <gridftp  url="gsiftp://gridgk02.racf.bnl.gov"/>
>>
>>     <workdirectory>/usatlas/prodjob/share/engage-scec/swift_scratch</workdirectory>
>>   </pool>
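To diagnose the missing properties, it helps to know which values the pool entry above should put on the `properties` channel. The sketch below reproduces, in Python, the rules of the script's `profile` and `workdirectory` elements:

- A profile in the `karajan` namespace becomes a bare property name.
- A profile in any other namespace becomes `namespace:key`.
- The workdirectory value becomes `workdir`.

It only illustrates that mapping, assuming the elements behave as written. It does not model Karajan channels, and `pool_properties` is a hypothetical name.

```python
"""Sketch: the properties the quoted pool entry should yield (illustrative only)."""
import xml.etree.ElementTree as ET


def pool_properties(pool_xml):
    """Apply the script's profile/workdirectory rules to one <pool> entry."""
    pool = ET.fromstring(pool_xml)
    props = {}
    for prof in pool.iter("profile"):
        ns = prof.get("namespace")
        key = prof.get("key")
        value = (prof.text or "").strip()
        # Same rule as the profile element: karajan keys bare, others "ns:key".
        props[key if ns == "karajan" else f"{ns}:{key}"] = value
    workdir = pool.find("workdirectory")
    if workdir is not None and workdir.text:
        props["workdir"] = workdir.text.strip()  # mirrors property("workdir", dir)
    return pool.get("handle"), props


# The pool entry quoted above, with its mail-wrapped lines rejoined.
BNL = """<pool handle="BNL-ATLAS">
  <execution provider="condor" url="none"/>
  <profile namespace="globus" key="jobType">grid</profile>
  <profile namespace="globus" key="gridResource">gt2 gridgk02.racf.bnl.gov/jobmanager-condor</profile>
  <profile namespace="karajan" key="initialScore">20.0</profile>
  <profile namespace="karajan" key="jobThrottle">0.95</profile>
  <gridftp url="gsiftp://gridgk02.racf.bnl.gov"/>
  <workdirectory>/usatlas/prodjob/share/engage-scec/swift_scratch</workdirectory>
</pool>"""

if __name__ == "__main__":
    handle, props = pool_properties(BNL)
    for name, value in sorted(props.items()):
        print(f"{handle}: {name} = {value}")
```

For BNL-ATLAS this gives `globus:jobType = grid` and `globus:gridResource = gt2 gridgk02.racf.bnl.gov/jobmanager-condor`, among others. The generated submit file above still shows `universe = vanilla` and carries no grid resource. That is consistent with these values never reaching `task:execute()`.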


