[Swift-user] question about xsede

Michael Wilde wilde at anl.gov
Mon Sep 29 12:23:04 CDT 2014


Justin, I see that Yadu provided tested configurations for running with 
Swift 0.94.1, 0.95RC7, and trunk, in his email to this list on 9/25 
(pasted below).

He pointed you to this directory for a sites.xml example for 0.94 and 0.95:
http://users.rcc.uchicago.edu/~yadunand/blacklight-sanity/0.94configs/sites.xml

The config he provided for 0.94.1 should also work for 0.95RC6 (which I 
see you are using).

(Note that we will be posting a 0.95 final release, or an RC7, by the 
end of this week.)

Based on this discussion so far, I think the following is a good base 
for a sites entry for running on Blacklight:

<config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
  <pool handle="blacklight">
    <execution provider="coaster" jobmanager="local:pbs"/>
    <profile namespace="globus" key="queue">debug</profile>
    <profile namespace="karajan" key="jobThrottle">.320</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="globus" key="jobsPerNode">16</profile>
    <profile namespace="globus" key="maxtime">900</profile>
    <profile namespace="globus" key="maxwalltime">00:10:00</profile>
    <profile namespace="globus" key="ppn">16</profile>
    <workdirectory>/usr/users/8/yadunand/swiftwork</workdirectory>
    <filesystem provider="local"/>
  </pool>
</config>

For the "queue" profile, use either debug or batch.

For the "workdirectory" tag, specify a fully qualified directory 
pathname that you can write to or create. Note that your $HOME 
"users/N" directory might differ from Yadu's.
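
As a rough sketch (not part of Swift, and not tested on Blacklight), the two notes above can be checked before a run with a few lines of Python. The function names check_sites and can_write_or_create are made up for illustration. The allowed queue names are the two mentioned in this message, and the namespace string is the one from the example config. The script is assumed to run on the same machine and under the same account as Swift, since it tests local filesystem permissions.

```python
import os
import xml.etree.ElementTree as ET

# Namespace copied from the <config> element of the example sites entry.
SWIFT_NS = "http://www.ci.uchicago.edu/swift/SwiftSites"


def can_write_or_create(path):
    """True if path, or its nearest existing ancestor, is a writable directory."""
    probe = path
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # nothing on the path exists
            return False
        probe = parent
    return os.path.isdir(probe) and os.access(probe, os.W_OK | os.X_OK)


def check_sites(xml_text, allowed_queues=("debug", "batch")):
    """Return a list of problems found in a sites.xml string (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}config" % SWIFT_NS:
        problems.append("root element is not <config> in the SwiftSites namespace")
        return problems
    for pool in root.findall("{%s}pool" % SWIFT_NS):
        handle = pool.get("handle", "?")
        # Look for the globus/queue profile inside this pool.
        queue = None
        for prof in pool.findall("{%s}profile" % SWIFT_NS):
            if prof.get("namespace") == "globus" and prof.get("key") == "queue":
                queue = (prof.text or "").strip()
        if queue not in allowed_queues:
            problems.append("%s: queue %r is not one of %s"
                            % (handle, queue, ", ".join(allowed_queues)))
        # The work directory must be absolute and writable (or creatable).
        workdir = (pool.findtext("{%s}workdirectory" % SWIFT_NS) or "").strip()
        if not os.path.isabs(workdir):
            problems.append("%s: workdirectory %r is not a fully qualified path"
                            % (handle, workdir))
        elif not can_write_or_create(workdir):
            problems.append("%s: workdirectory %r cannot be written or created"
                            % (handle, workdir))
    return problems
```

Calling check_sites(open("sites.xml").read()) returns an empty list when nothing looks wrong. It only inspects the queue and workdirectory settings, so a clean result does not guarantee that the job will be accepted by PBS.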

Please double-check against the latest configs at the link Yadu provided, 
to make sure I did not get anything wrong above.

Regards,

- Mike


-------- Original Message --------
Subject: 	Re: [Swift-user] question about xsede
Date: 	Thu, 25 Sep 2014 17:25:01 -0500
From: 	Yadu Nand <yadudoc1729 at gmail.com>
To: 	Michael Wilde <wilde at anl.gov>
CC: 	Swift User <swift-user at ci.uchicago.edu>



Hi Justin,

Here are some tested configs and a small README from running the sanity 
test on Blacklight:
http://users.rcc.uchicago.edu/~yadunand/blacklight-sanity/

There's an example each of configs for Swift 0.94, Swift 0.95 and the 
configs we would use going
forward (Swift 0.96 and current trunk) in that folder.

In the example, I've used ppn=16 (or any multiple of 16) which seems to 
work as a substitute for ncpus.

Hope that helps!

-Yadu


On 9/29/14, 9:24 AM, Michael Wilde wrote:
> Justin, in your sites entry below, this line looks suspect:
>
> <workdirectory>/brashear/usrname</workdirectory>
>
> That needs to be the name of a writable directory.
>
> - Mike
>
>
> On 9/28/14, 6:03 PM, Justin bbt wrote:
>> changing the queue to "batch", I get this
>> Execution failed:
>> Exception in simulate:
>>     Arguments: [--timesteps, 1, --range, 100, --nvalues, 5]
>>     Host: black
>>   Directory: p4-run026/jobs/5/simulate-58d6x2yl
>> exception @ swift-int-staging.k, line: 181
>> Caused by:
>> exception @ swift-int-staging.k, line: 177
>> Caused by: null
>> Caused by: 
>> org.globus.cog.abstraction.impl.common.execution.JobException: Job 
>> failed with an exit code of 1
>>
>> ????
>>
>> On Sat, Sep 27, 2014 at 5:17 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>
>>     Hi Justin,
>>
>>     Is there a queue named "normal" on that machine (qstat -q should tell
>>     you)?
>>
>>     Mihael
>>
>>     On Sat, 2014-09-27 at 15:41 -0400, Justin bbt wrote:
>>     > So, I am using this config
>>     >
>>     >
>>     > <pool handle="black">
>>     >     <execution provider="coaster" jobmanager="local:pbs" URL="none"/>
>>     >     <profile namespace="env" key="PATHPREFIX">{env.PWD}/../app</profile>
>>     >     <profile namespace="globus" key="jobsPerNode">1</profile>
>>     >     <profile namespace="globus" key="queue">normal</profile>
>>     >     <profile namespace="globus" key="ppn">16</profile>
>>     >     <profile namespace="globus" key="maxWallTime">00:01:00</profile>
>>     >     <profile namespace="globus" key="maxTime">3600</profile>
>>     >     <profile namespace="globus" key="lowOverAllocation">100</profile>
>>     >     <profile namespace="globus" key="highOverAllocation">100</profile>
>>     >     <profile namespace="globus" key="slots">2</profile>
>>     >     <profile namespace="globus" key="maxNodes">1</profile>
>>     >     <profile namespace="globus" key="nodeGranularity">1</profile>
>>     >     <profile namespace="karajan" key="jobThrottle">.320</profile>
>>     >     <profile namespace="karajan" key="initialScore">10000</profile>
>>     >     <profile namespace="globus" key="slots">1</profile>
>>     >     <profile namespace="globus" key="project">TG-CCR134513</profile>
>>     >     <workdirectory>/brashear/usrname</workdirectory>
>>     >     <profile namespace="swift" key="stagingMethod">local</profile>
>>     >     <filesystem provider="local" />
>>     > </pool>
>>     >
>>     > but, it does not work and give this error :
>>     >
>>     > Swift 0.95 RC6 swift-r7900 cog-r3908
>>     > RunID: run011
>>     > Warning: The @ syntax for function invocation is deprecated
>>     > [Error] sites.xml, line 2, col 10: cvc-elt.1: Cannot find the declaration of element 'config'.
>>     > Progress: Sat, 27 Sep 2014 15:35:52-0400
>>     >
>>     > Could not submit job (qsub reported an exit code of 170).
>>     > qsub: Unknown queue MSG=cannot locate queue
>>     >
>>     > Execution failed:
>>     > Exception in simulate:
>>     >     Arguments: [--timesteps, 1, --range, 100, --nvalues, 5]
>>     >     Host: black
>>     >     Directory: p4-run011/jobs/s/simulate-sdm041yl
>>     > exception @ swift-int-staging.k, line: 181
>>     > Caused by:
>>     > exception @ swift-int-staging.k, line: 177
>>     > Caused by: Block task failed: Error submitting block task
>>     >
>>     > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 170).
>>     > qsub: Unknown queue MSG=cannot locate queue
>>     >
>>     >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>>     >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
>>     >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61)
>>     >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:70)
>>     > Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 170).
>>     > qsub: Unknown queue MSG=cannot locate queue
>>     >
>>     >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113)
>>     >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>>     >     ... 3 more
>>     >
>>     >
>>     >
>>     > Also, in some configs I saw this, but I don't know what it is or
>>     > what values I should set it to:
>>     >
>>     >     <profile namespace="globus" key="providerAttributes">pbs.aprun;pbs.mpp;depth=32</profile>
>>     >
>>     >
>>     > On Tue, Sep 23, 2014 at 6:22 PM, Justin bbt <justinbbt at gmail.com> wrote:
>>     >
>>     > > Thank you very much.
>>     > >
>>     > > I am actually using the Blacklight, which I guess is PBS based.
>>     > > So, should I use the Crays tutorial and setting ?
>>     > > http://swift-lang.org/tutorials/cray/tutorial.html
>>     > >
>>     > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
>>     > >
>>     > >> Hi Justin,
>>     > >>
>>     > >> If you are using XSEDE Stampede regular nodes (non Xeon Phi), here is a
>>     > >> site configuration that has worked for me in the past, connecting over
>>     > >> ssh to slurm:
>>     > >>
>>     > >> <?xml version="1.0" encoding="UTF-8"?>
>>     > >> <config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
>>     > >>   <pool handle="stampede">
>>     > >>     <execution provider="coaster" jobmanager="ssh-cl:slurm" url="stampede.tacc.utexas.edu"/>
>>     > >>     <filesystem provider="local" />
>>     > >>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>     > >>     <profile namespace="globus" key="ppn">1</profile>
>>     > >>     <profile namespace="globus" key="maxTime">7500</profile>
>>     > >>     <profile namespace="globus" key="maxwalltime">00:10:00</profile>
>>     > >>     <profile namespace="globus" key="lowOverallocation">100</profile>
>>     > >>     <profile namespace="globus" key="highOverallocation">100</profile>
>>     > >>     <profile namespace="globus" key="queue">normal</profile>
>>     > >>     <profile namespace="globus" key="nodeGranularity">1</profile>
>>     > >>     <profile namespace="globus" key="maxNodes">1</profile>
>>     > >>     <profile namespace="globus" key="slots">1</profile>
>>     > >>     <profile namespace="globus" key="project">TG-EAR130015</profile>
>>     > >>     <profile namespace="karajan" key="jobThrottle">.3199</profile>
>>     > >>     <profile namespace="karajan" key="initialScore">10000</profile>
>>     > >>     <workdirectory>/tmp/{env.USER}/swift.work</workdirectory>
>>     > >>   </pool>
>>     > >> </config>
>>     > >>
>>     > >> You will need to replace project id with yours.
>>     > >>
>>     > >> Thanks,
>>     > >> Ketan
>>     > >>
>>     > >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt <justinbbt at gmail.com> wrote:
>>     > >>
>>     > >>> Hi
>>     > >>>
>>     > >>> If I want to use resources on the xsede
>>     > >>> https://www.xsede.org/overview
>>     > >>> which site config should I use ?
>>     > >>>
>>     > >>>
>>     > >>> _______________________________________________
>>     > >>> Swift-user mailing list
>>     > >>> Swift-user at ci.uchicago.edu
>>     > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>     > >>>
>>     > >>
>>     > >>
>>     > >
>>
>>
>>
>>
>>
>
> -- 
> Michael Wilde
> Mathematics and Computer Science          Computation Institute
> Argonne National Laboratory               The University of Chicago

-- 
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago
