[Swift-user] Setting up Swift at Stanford

Robert McGibbon rmcgibbo at gmail.com
Mon Jun 3 22:59:02 CDT 2013


Thanks for your help, Michael! With your suggestions, I got Swift working. Here's my config: http://rmcgibbo.github.io/blog/2013/06/03/setting-up-swift/

One thing I couldn't get working is <filesystem provider="local"/>. When I have that in my sites.xml, I get 

$ swift uname.swift
Swift 0.94 swift-r6492 cog-r3658

RunID: 20130603-2056-7octkf3a
Progress:  time: Mon, 03 Jun 2013 20:56:13 -0700
Execution failed:
	Could not initialize shared directory on vsp-compute
Caused by:
	org.globus.cog.abstraction.impl.file.FileResourceException: Failed to create directory: /home/rmcgibbo/.swiftwork/uname-20130603-2056-7octkf3a/shared
	uname, uname.swift, line 10

But with <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>, it seems to work just fine.
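
In case it's useful to anyone else, the pool entry I ended up with looks roughly like this (the coaster settings are copied from your example below; the only change is the ssh filesystem provider):

```
<pool handle="vsp-compute">
  <execution provider="coaster" jobmanager="ssh-cl:pbs"
             url="vsp-compute-01.stanford.edu"/>
  <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
  <profile namespace="globus" key="jobsPerNode">1</profile>
  <profile namespace="globus" key="lowOverAllocation">100</profile>
  <profile namespace="globus" key="highOverAllocation">100</profile>
  <profile namespace="globus" key="maxtime">3600</profile>
  <profile namespace="globus" key="maxWalltime">00:05:00</profile>
  <profile namespace="globus" key="queue">default</profile>
  <profile namespace="globus" key="slots">5</profile>
  <profile namespace="globus" key="maxnodes">1</profile>
  <profile namespace="globus" key="nodeGranularity">1</profile>
  <profile namespace="karajan" key="jobThrottle">1.00</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
</pool>
```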

-Robert

On Jun 3, 2013, at 8:38 PM, Michael Wilde wrote:

> I forgot to also mention:  the example below with the "ssh-cl" ("ssh command line") provider also assumes that you can do a password-less ssh command from your workstation to your PBS head node, i.e., that you have ssh keys in place on the head node and that you're using an ssh agent.
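> 
> In case it helps, a minimal way to set that up and check it from the workstation looks something like this (assuming OpenSSH, and assuming your login on the head node is rmcgibbo; adjust to taste):
> 
>   ssh-keygen -t rsa                                  # create a keypair, if you don't already have one
>   ssh-copy-id rmcgibbo@vsp-compute-01.stanford.edu   # install the public key on the head node
>   eval $(ssh-agent)                                  # start an agent in this shell
>   ssh-add                                            # load the key; prompts once for the passphrase
>   ssh vsp-compute-01.stanford.edu /bin/true          # should now succeed with no password prompt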
> 
> The standard Swift ssh provider (e.g. using provider=coaster jobmanager=ssh:pbs) uses a file called $HOME/.ssh/auth.defaults to specify ssh passwords or passphrases, or, for better security, Swift will prompt for these.
> 
> We tend to use and recommend the newer ssh-cl for both security and convenience.
> 
> - Mike
> 
> 
> ----- Original Message -----
>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>> To: "Robert McGibbon" <rmcgibbo at gmail.com>
>> Cc: swift-user at ci.uchicago.edu
>> Sent: Monday, June 3, 2013 10:27:45 PM
>> Subject: Re: [Swift-user] Setting up Swift at Stanford
>> 
>> Hi Robert,
>> 
>> To run swift from a workstation that can ssh to one or more cluster
>> head nodes, use a sites file like this:
>> 
>>  <pool handle="vsp-compute">
>>    <execution provider="coaster" jobmanager="ssh-cl:pbs"
>>               url="vsp-compute-01.stanford.edu"/>
>>    <profile namespace="globus" key="jobsPerNode">1</profile>
>>    <profile namespace="globus" key="lowOverAllocation">100</profile>
>>    <profile namespace="globus" key="highOverAllocation">100</profile>
>>    <profile namespace="globus" key="maxtime">3600</profile>
>>    <profile namespace="globus" key="maxWalltime">00:05:00</profile>
>>    <profile namespace="globus" key="queue">default</profile>
>>    <profile namespace="globus" key="slots">5</profile>
>>    <profile namespace="globus" key="maxnodes">1</profile>
>>    <profile namespace="globus" key="nodeGranularity">1</profile>
>>    <profile namespace="karajan" key="jobThrottle">1.00</profile>
>>    <profile namespace="karajan" key="initialScore">10000</profile>
>>    <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>>  </pool>
>> 
>> This specifies that Swift should:
>> 
>> - use the "coaster" provider, which enables Swift to ssh to another
>> system and qsub from there:
>> 
>>  <execution provider="coaster" jobmanager="ssh-cl:pbs"
>>  url="vsp-compute-01.stanford.edu"/>
>> 
>> - run up to 100 Swift app() tasks in parallel on the remote system
>> (the concurrent-task limit is roughly jobThrottle x 100, and the
>> large initialScore makes Swift apply that limit right away instead
>> of ramping up slowly):
>> 
>>  <profile namespace="karajan" key="jobThrottle">1.00</profile>
>>  <profile namespace="karajan" key="initialScore">10000</profile>
>> 
>> - app() tasks should be limited to 5 minutes walltime:
>> 
>>  <profile namespace="globus" key="maxWalltime">00:05:00</profile>
>> 
>> - app() tasks will be run within PBS coaster "pilot" jobs. Each PBS
>> job should have a walltime of 3600 seconds:
>> 
>>  <profile namespace="globus" key="lowOverAllocation">100</profile>
>>  <profile namespace="globus" key="highOverAllocation">100</profile>
>>  <profile namespace="globus" key="maxtime">3600</profile>
>> 
>> - Up to 5 concurrent PBS coaster jobs each asking for 1 node will be
>> submitted to the default queue:
>> 
>>  <profile namespace="globus" key="queue">default</profile>
>>  <profile namespace="globus" key="slots">5</profile>
>>  <profile namespace="globus" key="maxnodes">1</profile>
>>  <profile namespace="globus" key="nodeGranularity">1</profile>
>> 
>> - Swift should run only one app() task at a time within each PBS job
>> slot:
>> 
>>  <profile namespace="globus" key="jobsPerNode">1</profile>
>> 
>> - On the remote PBS cluster, create per-run directories under this
>> work directory:
>> 
>>  <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>> 
>> - And stage data to the site by using local copy operations:
>> 
>>  <filesystem provider="local"/>
>> 
>> You can make the sites.xml entry more user-independent using, e.g.:
>> 
>>    <workdirectory>/scratch/{env.USER}/swiftwork</workdirectory>
>> 
>> The overall sites entry above assumes:
>> 
>> - That /scratch/rmcgibbo is mounted on both the Swift run host and on
>> the remote PBS system.
>> 
>> If there is no common shared filesystem, Swift can use a data
>> transport technique called "coaster provider staging" to move the
>> data for you. This is specified in the swift.properties file.
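>> 
>> For example, a minimal swift.properties setting to enable it might be
>> (this is just the on/off switch; see the user guide for the related
>> staging options):
>> 
>>   # in swift.properties on the Swift client host
>>   use.provider.staging=true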
>> 
>> In many cases, with a shared filesystem between the Swift client host
>> and the execution cluster, it's desirable to turn off staging
>> altogether. This is done using a mode called "direct" data
>> management (see
>> http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_collective_data_management).
>> This is being simplified for future releases.
>> 
>> - That each PBS job is given one CPU core, not one full node.
>> 
>> The PBS ppn attribute can be specified to request a specific number
>> of cores (processors) per node:
>> 
>>  <profile namespace="globus" key="ppn">16</profile>
>> 
>> ...and then that each coaster pilot job should run up to 16 Swift
>> app() tasks at once:
>> 
>>  <profile namespace="globus" key="jobsPerNode">16</profile>
>> 
>> For more info on coasters, see:
>>  http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_coasters
>>  and: http://www.ci.uchicago.edu/swift/papers/UCC-coasters.pdf
>> 
>> For more examples on site configurations, see:
>> 
>>  http://www.ci.uchicago.edu/swift/guides/trunk/siteguide/siteguide.html
>> 
>> And lastly, note that in your initial sites.xml below:
>> 
>> - Omitting the filesystem provider tag is typically only done when
>> "use.provider.staging" is specified in the swift.properties config
>> file.
>> 
>> - The stagingMethod tag only applies to provider staging.
>> 
>> We're working hard to document all this better and provide a better
>> set of illustrated examples and templates for common site
>> configurations.  In the meantime, we'll help you create a set of
>> useful configurations for your site(s).
>> 
>> Regards,
>> 
>> - Mike
>> 
>>> We just heard about the swift project from some colleagues at U
>>> Chicago, and we're interested in trying it out with some of our
>>> compute resources at Stanford to run parallel molecular dynamics
>>> and
>>> x-ray scattering simulations. Currently, I'm most interested in
>>> setting up the environment such that I can submit my swift script
>>> on
>>> a local workstation, with execution on a few different clusters.
>>> The
>>> head nodes of our local clusters are accessible via ssh, and then
>>> job execution is scheduled with pbs.
>>> 
>>> When I run Swift, it can't seem to find qsub on the cluster.
>>> 
>>> rmcgibbo at Roberts-MacBook-Pro-2 ~/projects/swift
>>> $ swift -sites.file sites.xml hello.swift -tc.file tc.data
>>> Swift 0.94 swift-r6492 cog-r3658
>>> 
>>> RunID: 20130603-1704-5xii8svc
>>> Progress: time: Mon, 03 Jun 2013 17:04:10 -0700
>>> 2013-06-03 17:04:10.735 java[77051:1f07] Loading Maximizer into
>>> bundle: com.apple.javajdk16.cmd
>>> 2013-06-03 17:04:11.410 java[77051:1f07] Maximizer: Unsupported
>>> window created of class: CocoaAppWindow
>>> Progress: time: Mon, 03 Jun 2013 17:04:13 -0700 Stage in:1
>>> Execution failed:
>>> Exception in uname:
>>> Arguments: [-a]
>>> Host: vsp-compute
>>> Directory: hello-20130603-1704-5xii8svc/jobs/y/uname-ydyn5fal
>>> Caused by:
>>> Cannot submit job: Cannot run program "qsub": error=2, No such file
>>> or directory
>>> uname, hello.swift, line 8
>>> 
>>> When I switch the execution provider from pbs to ssh, the job runs
>>> successfully, but only on the head node of the vsp-compute cluster.
>>> I'd like to run instead using the cluster's pbs queue. Any help
>>> would be greatly appreciated.
>>> 
>>> -Robert
>>> Graduate Student, Pande Lab
>>> Stanford University, Department of Chemistry
>>> 
>>> p.s.
>>> 
>>> My sites.xml file is
>>> ```
>>> <config>
>>> <pool handle="vsp-compute">
>>>   <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
>>>   <execution provider="pbs" jobmanager="ssh:pbs"
>>>              url="vsp-compute-01.stanford.edu"/>
>>> 
>>>   <profile namespace="globus" key="maxtime">750</profile>
>>>   <profile namespace="globus" key="jobsPerNode">1</profile>
>>>   <profile namespace="globus" key="queue">default</profile>
>>>   <profile namespace="swift" key="stagingMethod">file</profile>
>>> 
>>>   <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>>> </pool>
>>> 
>>> <!-- End -->
>>> </config>
>>> ```
>>> 
>>> My SwiftScript is
>>> ```
>>> #hello.swift
>>> type file;
>>> 
>>> app (file o) uname() {
>>>   uname "-a" stdout=@o;
>>> }
>>> 
>>> file outfile <"uname.txt">;
>>> outfile = uname();
>>> ```



