[Swift-user] Setting up Swift at Stanford
Robert McGibbon
rmcgibbo at gmail.com
Mon Jun 3 22:59:02 CDT 2013
Thanks for your help, Michael! With your suggestions, I got Swift working. Here's my config: http://rmcgibbo.github.io/blog/2013/06/03/setting-up-swift/
One thing I couldn't get working is <filesystem provider="local"/>. When I have that in my sites.xml, I get
$ swift uname.swift
Swift 0.94 swift-r6492 cog-r3658
RunID: 20130603-2056-7octkf3a
Progress: time: Mon, 03 Jun 2013 20:56:13 -0700
Execution failed:
Could not initialize shared directory on vsp-compute
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException: Failed to create directory: /home/rmcgibbo/.swiftwork/uname-20130603-2056-7octkf3a/shared
uname, uname.swift, line 10
But with <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>, it seems to work just fine.
-Robert
On Jun 3, 2013, at 8:38 PM, Michael Wilde wrote:
> I forgot to also mention: the example below with the "ssh-cl" ("ssh command line") provider also assumes that you can do a password-less ssh from your workstation to your PBS head node, i.e. that you have ssh keys in place on the head node and that you're using an ssh agent.
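>
> For example, here is a minimal sketch of that setup from the workstation (the key type and the use of ssh-copy-id are just one common choice, not something Swift requires):
>
> $ ssh-keygen -t rsa                         # create a key pair if you don't already have one
> $ ssh-copy-id vsp-compute-01.stanford.edu   # or append ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the head node
> $ eval $(ssh-agent) && ssh-add              # start an agent and load the key
> $ ssh vsp-compute-01.stanford.edu true      # should return without prompting for a password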
>
> The standard Swift ssh provider (e.g. using provider=coaster jobmanager=ssh:pbs) uses a file called $HOME/.ssh/auth.defaults to specify ssh passwords or passphrases, or, for better security, Swift will prompt for these.
>
> We tend to use and recommend the newer ssh-cl for both security and convenience.
>
> - Mike
>
>
> ----- Original Message -----
>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>> To: "Robert McGibbon" <rmcgibbo at gmail.com>
>> Cc: swift-user at ci.uchicago.edu
>> Sent: Monday, June 3, 2013 10:27:45 PM
>> Subject: Re: [Swift-user] Setting up Swift at Stanford
>>
>> Hi Robert,
>>
>> To run swift from a workstation that can ssh to one or more cluster
>> head nodes, use a sites file like this:
>>
>> <pool handle="vsp-compute">
>>   <execution provider="coaster" jobmanager="ssh-cl:pbs"
>>              url="vsp-compute-01.stanford.edu"/>
>>   <profile namespace="globus" key="jobsPerNode">1</profile>
>>   <profile namespace="globus" key="lowOverAllocation">100</profile>
>>   <profile namespace="globus" key="highOverAllocation">100</profile>
>>   <profile namespace="globus" key="maxtime">3600</profile>
>>   <profile namespace="globus" key="maxWalltime">00:05:00</profile>
>>   <profile namespace="globus" key="queue">default</profile>
>>   <profile namespace="globus" key="slots">5</profile>
>>   <profile namespace="globus" key="maxnodes">1</profile>
>>   <profile namespace="globus" key="nodeGranularity">1</profile>
>>   <profile namespace="karajan" key="jobThrottle">1.00</profile>
>>   <profile namespace="karajan" key="initialScore">10000</profile>
>>   <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>> </pool>
>>
>> This specifies that Swift should:
>>
>> - use the "coaster" provider, which enables Swift to ssh to another
>> system and qsub from there:
>>
>> <execution provider="coaster" jobmanager="ssh-cl:pbs"
>>            url="vsp-compute-01.stanford.edu"/>
>>
>> - run up to 100 Swift app() tasks in parallel on the remote system:
>>
>> <profile namespace="karajan" key="jobThrottle">1.00</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
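>>
>> The limit scales roughly as jobThrottle x 100, so, for example, to allow
>> about 400 concurrent tasks (a sketch; adjust to what the cluster can
>> absorb) you would use:
>>
>> <profile namespace="karajan" key="jobThrottle">4.00</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>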
>>
>> - app() tasks should be limited to 5 minutes walltime:
>>
>> <profile namespace="globus" key="maxWalltime">00:05:00</profile>
>>
>> - app() tasks will be run within PBS coaster "pilot" jobs. Each PBS
>> job should have a walltime of 750 seconds:
>>
>> <profile namespace="globus" key="lowOverAllocation">100</profile>
>> <profile namespace="globus" key="highOverAllocation">100</profile>
>> <profile namespace="globus" key="maxtime">750</profile>
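>>
>> (maxtime is given in seconds, so a 750-second pilot job has room for
>> two 5-minute app() tasks plus some startup overhead.)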
>>
>> - Up to 5 concurrent PBS coaster jobs each asking for 1 node will be
>> submitted to the default queue:
>>
>> <profile namespace="globus" key="queue">default</profile>
>> <profile namespace="globus" key="slots">5</profile>
>> <profile namespace="globus" key="maxnodes">1</profile>
>> <profile namespace="globus" key="nodeGranularity">1</profile>
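>>
>> If you prefer fewer, larger pilot jobs, raise maxnodes and
>> nodeGranularity together; for example (a sketch, not tuned for your
>> site), 4 slots of 2 nodes each would be:
>>
>> <profile namespace="globus" key="slots">4</profile>
>> <profile namespace="globus" key="maxnodes">2</profile>
>> <profile namespace="globus" key="nodeGranularity">2</profile>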
>>
>> - Swift should run only one app() task at a time within each PBS job
>> slot:
>>
>> <profile namespace="globus" key="jobsPerNode">1</profile>
>>
>> - On the remote PBS cluster, create per-run directories under this
>> work directory:
>>
>> <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>>
>> - And stage data to the site by using local copy operations:
>>
>> <filesystem provider="local"/>
>>
>> You can make the sites.xml entry more user-independent using, e.g.:
>>
>> <workdirectory>/scratch/{env.USER}/swiftwork</workdirectory>
>>
>> The overall sites entry above assumes:
>>
>> - That /scratch/rmcgibbo is mounted on both the Swift run host and on
>> the remote PBS system.
>>
>> If there is no common shared filesystem, Swift can use a data
>> transport technique called "coaster provider staging" to move the
>> data for you. This is specified in the swift.properties file.
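>>
>> For example, you would add something like the following to
>> swift.properties (this is the use.provider.staging property mentioned
>> further below):
>>
>> use.provider.staging=true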
>>
>> In many cases, with a shared filesystem between the Swift client host
>> and the execution cluster, it's desirable to turn off staging
>> altogether. This is done using a mode called "direct" data management
>> (see
>> http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_collective_data_management).
>> This is being simplified for future releases.
>>
>> - That each PBS job is given one CPU core, not one full node.
>>
>> The PBS ppn attribute can be specified to request a specific number
>> of cores (processors) per node:
>>
>> <profile namespace="globus" key="ppn">16</profile>
>>
>> ...and then that each coaster pilot job should run up to 16 Swift
>> app() tasks at once:
>>
>> <profile namespace="globus" key="jobsPerNode">16</profile>
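>>
>> Together, on a cluster with 16-core nodes, those two settings would
>> appear in the pool entry as (adjust the counts to your node size):
>>
>> <profile namespace="globus" key="ppn">16</profile>
>> <profile namespace="globus" key="jobsPerNode">16</profile>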
>>
>> For more info on coasters, see:
>> http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_coasters
>> and: http://www.ci.uchicago.edu/swift/papers/UCC-coasters.pdf
>>
>> For more examples on site configurations, see:
>>
>> http://www.ci.uchicago.edu/swift/guides/trunk/siteguide/siteguide.html
>>
>> And lastly, note that in your initial sites.xml below:
>>
>> - Omitting the filesystem provider tag is typically only done when
>> "use.provider.staging" is specified in the swift.properties config
>> file.
>>
>> - The stagingMethod tag only applies to provider staging.
>>
>> We're working hard to document all this better and provide a better
>> set of illustrated examples and templates for common site
>> configurations. In the meantime, we'll help you create a set of
>> useful configurations for your site(s).
>>
>> Regards,
>>
>> - Mike
>>
>>> We just heard about the Swift project from some colleagues at U
>>> Chicago, and we're interested in trying it out with some of our
>>> compute resources at Stanford to run parallel molecular dynamics and
>>> x-ray scattering simulations. Currently, I'm most interested in
>>> setting up the environment such that I can submit my Swift script on
>>> a local workstation, with execution on a few different clusters. The
>>> head nodes of our local clusters are accessible via ssh, and job
>>> execution is scheduled with PBS.
>>>
>>> When I run Swift, it can't seem to find qsub on the cluster.
>>>
>>> rmcgibbo at Roberts-MacBook-Pro-2 ~/projects/swift
>>> $ swift -sites.file sites.xml hello.swift -tc.file tc.data
>>> Swift 0.94 swift-r6492 cog-r3658
>>>
>>> RunID: 20130603-1704-5xii8svc
>>> Progress: time: Mon, 03 Jun 2013 17:04:10 -0700
>>> 2013-06-03 17:04:10.735 java[77051:1f07] Loading Maximizer into
>>> bundle: com.apple.javajdk16.cmd
>>> 2013-06-03 17:04:11.410 java[77051:1f07] Maximizer: Unsupported
>>> window created of class: CocoaAppWindow
>>> Progress: time: Mon, 03 Jun 2013 17:04:13 -0700 Stage in:1
>>> Execution failed:
>>> Exception in uname:
>>> Arguments: [-a]
>>> Host: vsp-compute
>>> Directory: hello-20130603-1704-5xii8svc/jobs/y/uname-ydyn5fal
>>> Caused by:
>>> Cannot submit job: Cannot run program "qsub": error=2, No such file
>>> or directory
>>> uname, hello.swift, line 8
>>>
>>> When I switch the execution provider from pbs to ssh, the job runs
>>> successfully, but only on the head node of the vsp-compute cluster.
>>> I'd like to run instead using the cluster's PBS queue. Any help
>>> would be greatly appreciated.
>>>
>>> -Robert
>>> Graduate Student, Pande Lab
>>> Stanford University, Department of Chemistry
>>>
>>> p.s.
>>>
>>> My sites.xml file is
>>> ```
>>> <config>
>>>   <pool handle="vsp-compute">
>>>     <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
>>>     <execution provider="pbs" jobmanager="ssh:pbs"
>>>                url="vsp-compute-01.stanford.edu"/>
>>>
>>>     <profile namespace="globus" key="maxtime">750</profile>
>>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>>     <profile namespace="globus" key="queue">default</profile>
>>>     <profile namespace="swift" key="stagingMethod">file</profile>
>>>
>>>     <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
>>>   </pool>
>>>
>>>   <!-- End -->
>>> </config>
>>> ```
>>>
>>> My SwiftScript is
>>> ```
>>> #hello.swift
>>> type file;
>>>
>>> app (file o) uname() {
>>>     uname "-a" stdout=@o;
>>> }
>>>
>>> file outfile <"uname.txt">;
>>>
>>> outfile = uname();
>>> ```