[Swift-user] Setting up Swift at Stanford
Michael Wilde
wilde at mcs.anl.gov
Mon Jun 3 22:27:45 CDT 2013
Hi Robert,
To run Swift from a workstation that can ssh to one or more cluster head nodes, use a sites file like this:
<pool handle="vsp-compute">
<execution provider="coaster" jobmanager="ssh-cl:pbs" url="vsp-compute-01.stanford.edu"/>
<profile namespace="globus" key="jobsPerNode">1</profile>
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="maxtime">3600</profile>
<profile namespace="globus" key="maxWalltime">00:05:00</profile>
<profile namespace="globus" key="queue">default</profile>
<profile namespace="globus" key="slots">5</profile>
<profile namespace="globus" key="maxnodes">1</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
<profile namespace="karajan" key="jobThrottle">1.00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<filesystem provider="local"/>
<workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
</pool>
This specifies that Swift should:
- use the "coaster" provider, which enables Swift to ssh to another system and qsub from there:
<execution provider="coaster" jobmanager="ssh-cl:pbs" url="vsp-compute-01.stanford.edu"/>
- run up to 100 Swift app() tasks in parallel on the remote system:
<profile namespace="karajan" key="jobThrottle">1.00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
- app() tasks should be limited to 5 minutes walltime:
<profile namespace="globus" key="maxWalltime">00:05:00</profile>
- app() tasks will be run within PBS coaster "pilot" jobs. Each PBS job should request a walltime of 3600 seconds (matching the pool's maxtime above):
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="maxtime">3600</profile>
- Up to 5 concurrent PBS coaster jobs each asking for 1 node will be submitted to the default queue:
<profile namespace="globus" key="queue">default</profile>
<profile namespace="globus" key="slots">5</profile>
<profile namespace="globus" key="maxnodes">1</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
- Swift should run only one app() task at a time within each PBS job slot:
<profile namespace="globus" key="jobsPerNode">1</profile>
- On the remote PBS cluster, create per-run directories under this work directory:
<workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
- And stage data to the site by using local copy operations:
<filesystem provider="local"/>
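Putting the numbers from the walkthrough above together, here is a small illustrative calculation (the values are the ones from this example pool; the throttle rule of thumb is from the user guide):

```python
# Concurrency implied by the pool above (a sketch using the values shown)
slots = 5          # concurrent PBS coaster pilot jobs
max_nodes = 1      # nodes requested per pilot job
jobs_per_node = 1  # app() tasks run concurrently per node

running = slots * max_nodes * jobs_per_node
print(running)  # 5 app() tasks actually executing at once

# jobThrottle allows roughly jobThrottle * 100 tasks in flight, so with
# jobThrottle=1.00 Swift keeps ~100 tasks submitted; the coaster service
# queues the ones that exceed the 5 running slots.
job_throttle = 1.00
in_flight = int(job_throttle * 100)
print(in_flight)  # ~100
```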
You can make the sites.xml entry more user-independent using, e.g.:
<workdirectory>/scratch/{env.USER}/swiftwork</workdirectory>
The overall sites entry above assumes:
- That /scratch/rmcgibbo is mounted on both the Swift run host and on the remote PBS system.
If there is no common shared filesystem, Swift can use a data transport technique called "coaster provider staging" to move the data for you. This is specified in the swift.properties file.
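A minimal swift.properties fragment to enable provider staging might look like this (property names as used in the 0.94-era documentation; treat this as a sketch to adapt, not a complete config):

```
use.provider.staging=true
provider.staging.pin.swiftfiles=false
```

With this enabled, the coaster workers move input and output files themselves, so no shared filesystem or filesystem provider entry is needed in sites.xml.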
In many cases, with a shared filesystem between the Swift client host and the execution cluster, it's desirable to turn off staging altogether. This is done using a mode called "direct" data management (see http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_collective_data_management). This is being simplified for future releases.
- That each PBS job is given one CPU core, not one full node.
The PBS ppn attribute can be specified to request a specific number of cores (processors) per node:
<profile namespace="globus" key="ppn">16</profile>
...and then that each coaster pilot job should run up to 16 Swift app() tasks at once:
<profile namespace="globus" key="jobsPerNode">16</profile>
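As a quick sanity check on the sizing (a sketch using the example numbers above, not measurements): with 5 slots, 1 node per pilot job, and jobsPerNode=16, the site can run 80 app() tasks at once, so jobThrottle needs to be at least 0.80 for Swift to keep it fully loaded:

```python
slots = 5           # concurrent coaster pilot jobs
nodes_per_job = 1   # maxnodes
jobs_per_node = 16  # app() tasks per node (matches ppn=16 here)

site_capacity = slots * nodes_per_job * jobs_per_node
print(site_capacity)        # 80 concurrent app() tasks

# jobThrottle allows roughly jobThrottle * 100 concurrent tasks,
# so the minimum throttle to saturate the site is:
min_throttle = site_capacity / 100.0
print(min_throttle)         # jobThrottle >= 0.80
```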
For more info on coasters, see:
http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_coasters
and: http://www.ci.uchicago.edu/swift/papers/UCC-coasters.pdf
For more examples on site configurations, see:
http://www.ci.uchicago.edu/swift/guides/trunk/siteguide/siteguide.html
And lastly, note that in your initial sites.xml below:
- Omitting the filesystem provider tag is typically only done when "use.provider.staging" is specified in the swift.properties config file
- The stagingMethod tag only applies to provider staging.
We're working hard to document all this better and to provide a better set of illustrated examples and templates for common site configurations. In the meantime, we'll help you create a set of useful configurations for your site(s).
Regards,
- Mike
> We just heard about the swift project from some colleagues at U
> Chicago, and we're interested in trying it out with some of our
> compute resources at Stanford to run parallel molecular dynamics and
> x-ray scattering simulations. Currently, I'm most interested in
> setting up the environment such that I can submit my swift script on
> a local workstation, with execution on a few different clusters. The
> head nodes of our local clusters are accessible via ssh, and then
> job execution is scheduled with pbs.
>
> When I run swift, it can't seem to find qsub on the cluster.
>
> rmcgibbo@Roberts-MacBook-Pro-2 ~/projects/swift
> $ swift -sites.file sites.xml hello.swift -tc.file tc.data
> Swift 0.94 swift-r6492 cog-r3658
>
> RunID: 20130603-1704-5xii8svc
> Progress: time: Mon, 03 Jun 2013 17:04:10 -0700
> 2013-06-03 17:04:10.735 java[77051:1f07] Loading Maximizer into
> bundle: com.apple.javajdk16.cmd
> 2013-06-03 17:04:11.410 java[77051:1f07] Maximizer: Unsupported
> window created of class: CocoaAppWindow
> Progress: time: Mon, 03 Jun 2013 17:04:13 -0700 Stage in:1
> Execution failed:
> Exception in uname:
> Arguments: [-a]
> Host: vsp-compute
> Directory: hello-20130603-1704-5xii8svc/jobs/y/uname-ydyn5fal
> Caused by:
> Cannot submit job: Cannot run program "qsub": error=2, No such file
> or directory
> uname, hello.swift, line 8
>
> When I switch the execution provider from pbs to ssh, the job runs
> successfully, but only on the head node of the vsp-compute cluster.
> I'd like to run instead using the cluster's pbs queue. Any help
> would be greatly appreciated.
>
> -Robert
> Graduate Student, Pande Lab
> Stanford University, Department of Chemistry
>
> p.s.
>
> My sites.xml file is
> ```
> <config>
> <pool handle="vsp-compute">
> <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
> <execution provider="pbs" jobmanager="ssh:pbs" url="vsp-compute-01.stanford.edu"/>
>
> <profile namespace="globus" key="maxtime">750</profile>
> <profile namespace="globus" key="jobsPerNode">1</profile>
> <profile namespace="globus" key="queue">default</profile>
> <profile namespace="swift" key="stagingMethod">file</profile>
>
> <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
> </pool>
>
> <!-- End -->
> </config>
> ```
>
> My SwiftScript is
> ```
> #hello.swift
> type file;
>
> app (file o) uname() {
> uname "-a" stdout=@o;
> }
> file outfile <"uname.txt">;
>
> outfile = uname();
> ```
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user