[Swift-user] Setting up Swift at Stanford

Michael Wilde wilde at mcs.anl.gov
Mon Jun 3 22:27:45 CDT 2013


Hi Robert,

To run swift from a workstation that can ssh to one or more cluster head nodes, use a sites file like this:

  <pool handle="vsp-compute">
    <execution provider="coaster" jobmanager="ssh-cl:pbs" url="vsp-compute-01.stanford.edu"/>
    <filesystem provider="local"/>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile namespace="globus" key="lowOverAllocation">100</profile>
    <profile namespace="globus" key="highOverAllocation">100</profile>
    <profile namespace="globus" key="maxtime">3600</profile>
    <profile namespace="globus" key="maxWalltime">00:05:00</profile>
    <profile namespace="globus" key="queue">default</profile>
    <profile namespace="globus" key="slots">5</profile>
    <profile namespace="globus" key="maxnodes">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="karajan" key="jobThrottle">1.00</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
  </pool>

This specifies that Swift should:

- use the "coaster" provider, which enables Swift to ssh to another system and qsub from there:

  <execution provider="coaster" jobmanager="ssh-cl:pbs" url="vsp-compute-01.stanford.edu"/>

- run up to 100 Swift app() tasks in parallel on the remote system (the karajan jobThrottle permits roughly jobThrottle x 100 concurrent tasks, and the large initialScore disables the slow ramp-up Swift would otherwise apply):

  <profile namespace="karajan" key="jobThrottle">1.00</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>

- app() tasks should be limited to 5 minutes walltime:

  <profile namespace="globus" key="maxWalltime">00:05:00</profile>

- app() tasks will be run within PBS coaster "pilot" jobs. Each PBS job should have a walltime of 750 seconds (with both overallocation settings at 100, coaster blocks simply request the full maxtime rather than sizing their walltime to individual tasks):

  <profile namespace="globus" key="lowOverAllocation">100</profile>
  <profile namespace="globus" key="highOverAllocation">100</profile>
  <profile namespace="globus" key="maxtime">750</profile>

- Up to 5 concurrent PBS coaster jobs each asking for 1 node will be submitted to the default queue:

  <profile namespace="globus" key="queue">default</profile>
  <profile namespace="globus" key="slots">5</profile>
  <profile namespace="globus" key="maxnodes">1</profile>
  <profile namespace="globus" key="nodeGranularity">1</profile>

- Swift should run only one app() task at a time within each PBS job slot:

  <profile namespace="globus" key="jobsPerNode">1</profile>

- On the remote PBS cluster, create per-run directories under this work directory:

  <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>

- And stage data to the site by using local copy operations:

  <filesystem provider="local"/>

You can make the sites.xml entry more user-independent using, e.g.:

    <workdirectory>/scratch/{env.USER}/swiftwork</workdirectory>

The overall sites entry above assumes:

- That /scratch/rmcgibbo is mounted on both the Swift run host and on the remote PBS system.

If there is no common shared filesystem, Swift can use a data transport technique called "coaster provider staging" to move the data for you. This is specified in the swift.properties file.
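
For example, provider staging can be enabled with a couple of lines in swift.properties (a minimal sketch; see the user guide for the full set of staging properties):

  use.provider.staging=true
  provider.staging.pin.swiftfiles=false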

In many cases, with a shared filesystem between the Swift client host and the execution cluster, it's desirable to turn off staging altogether. This is done using a mode called "direct" data management (see http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_collective_data_management); this mechanism is being simplified for future releases.
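
If I recall the syntax correctly, direct data management is requested through a CDM policy file passed to swift with the -cdm.file option, containing a rule such as the following (a sketch from memory; verify it against the user guide section linked above):

  rule .* DIRECT /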

- That each PBS job is given one CPU core, not one full node.

The PBS ppn attribute can be specified to request a specific number of cores (processors) per node:

  <profile namespace="globus" key="ppn">16</profile>

...and then that each coaster pilot job should run up to 16 Swift app() tasks at once:

  <profile namespace="globus" key="jobsPerNode">16</profile>

With 5 slots of one 16-core node each, up to 80 app() tasks can then run concurrently; make sure the karajan jobThrottle allows that many (a jobThrottle of 0.80 permits roughly 80).

For more info on coasters, see:
  http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_coasters
  and: http://www.ci.uchicago.edu/swift/papers/UCC-coasters.pdf

For more examples on site configurations, see:

  http://www.ci.uchicago.edu/swift/guides/trunk/siteguide/siteguide.html

And lastly, note that in your initial sites.xml below:

- Omitting the filesystem provider tag is typically done only when "use.provider.staging" is enabled in the swift.properties config file.

- The stagingMethod tag only applies to provider staging.

We're working hard to improve the documentation and to provide a better set of illustrated examples and templates for common site configurations. In the meantime, we'll help you create a set of useful configurations for your site(s).

Regards,

- Mike

> We just heard about the swift project from some colleagues at U
> Chicago, and we're interested in trying it out with some of our
> compute resources at Stanford to run parallel molecular dynamics and
> x-ray scattering simulations. Currently, I'm most interested in
> setting up the environment such that I can submit my swift script on
> a local workstation, with execution on a few different clusters. The
> head nodes of our local clusters are accessible via ssh, and then
> job execution is scheduled with pbs.
> 
> When I run swift, it can't seem to find qsub on the cluster.
> 
> rmcgibbo at Roberts-MacBook-Pro-2 ~/projects/swift
> $ swift -sites.file sites.xml hello.swift -tc.file tc.data
> Swift 0.94 swift-r6492 cog-r3658
> 
> RunID: 20130603-1704-5xii8svc
> Progress: time: Mon, 03 Jun 2013 17:04:10 -0700
> 2013-06-03 17:04:10.735 java[77051:1f07] Loading Maximizer into
> bundle: com.apple.javajdk16.cmd
> 2013-06-03 17:04:11.410 java[77051:1f07] Maximizer: Unsupported
> window created of class: CocoaAppWindow
> Progress: time: Mon, 03 Jun 2013 17:04:13 -0700 Stage in:1
> Execution failed:
> Exception in uname:
> Arguments: [-a]
> Host: vsp-compute
> Directory: hello-20130603-1704-5xii8svc/jobs/y/uname-ydyn5fal
> Caused by:
> Cannot submit job: Cannot run program "qsub": error=2, No such file
> or directory
> uname, hello.swift, line 8
> 
> When I switch the execution provider from pbs to ssh, the job runs
> successfully, but only on the head node of the vsp-compute cluster.
> I'd like to run instead using the cluster's pbs queue. Any help
> would be greatly appreciated.
> 
> -Robert
> Graduate Student, Pande Lab
> Stanford University, Department of Chemistry
> 
> p.s.
> 
> My sites.xml file is
> ```
> <config>
>   <pool handle="vsp-compute">
>     <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
>     <execution provider="pbs" jobmanager="ssh:pbs" url="vsp-compute-01.stanford.edu"/>
>
>     <profile namespace="globus" key="maxtime">750</profile>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile namespace="globus" key="queue">default</profile>
>     <profile namespace="swift" key="stagingMethod">file</profile>
>
>     <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>   </pool>
>
>   <!-- End -->
> </config>
> ```
> 
> My SwiftScript is
> ```
> # hello.swift
> type file;
>
> app (file o) uname() {
>   uname "-a" stdout=@o;
> }
>
> file outfile <"uname.txt">;
>
> outfile = uname();
> ```


