[Swift-user] Setting up Swift at Stanford
TJ Lane
tjlane at stanford.edu
Mon Jun 3 23:28:07 CDT 2013
Mike,
Is there support for other schedulers? Specifically, I have a cluster
running LSF that I'd like to farm jobs out to as well. Maybe this is
documented somewhere? I haven't been able to locate it yet.
Thanks a lot for your help! Looking forward to playing more w/Swift.
TJ
On Mon, Jun 3, 2013 at 8:38 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> I forgot to also mention: the example below with the "ssh-cl" ("ssh
> command line") provider also assumes that you can do a password-less ssh
> command from your workstation to your PBS head node. I.e., that you have
> ssh keys in place on the head node and that you're using an ssh agent.
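>
> If you need to set that up, a minimal sketch from your workstation looks
> like the following ("yourid" is a placeholder; the head node is the one
> from the sites file below):
>
> ssh-keygen -t rsa                               # create a key pair if needed
> ssh-copy-id yourid@vsp-compute-01.stanford.edu  # or append ~/.ssh/id_rsa.pub
>                                                 #   to ~/.ssh/authorized_keys
> eval $(ssh-agent) && ssh-add                    # start an agent, load the key
> ssh yourid@vsp-compute-01.stanford.edu hostname # should not prompt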
>
> The standard Swift ssh provider (e.g. using provider=coaster
> jobmanager=ssh:pbs) uses a file called $HOME/.ssh/auth.defaults to specify
> ssh passwords or passphrases; or, for better security, Swift will prompt
> for these.
>
> We tend to use and recommend the newer ssh-cl for both security and
> convenience.
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Robert McGibbon" <rmcgibbo at gmail.com>
> > Cc: swift-user at ci.uchicago.edu
> > Sent: Monday, June 3, 2013 10:27:45 PM
> > Subject: Re: [Swift-user] Setting up Swift at Stanford
> >
> > Hi Robert,
> >
> > To run swift from a workstation that can ssh to one or more cluster
> > head nodes, use a sites file like this:
> >
> > <pool handle="vsp-compute">
> >   <execution provider="coaster" jobmanager="ssh-cl:pbs"
> >              url="vsp-compute-01.stanford.edu"/>
> >   <filesystem provider="local"/>
> >   <profile namespace="globus" key="jobsPerNode">1</profile>
> >   <profile namespace="globus" key="lowOverAllocation">100</profile>
> >   <profile namespace="globus" key="highOverAllocation">100</profile>
> >   <profile namespace="globus" key="maxtime">3600</profile>
> >   <profile namespace="globus" key="maxWalltime">00:05:00</profile>
> >   <profile namespace="globus" key="queue">default</profile>
> >   <profile namespace="globus" key="slots">5</profile>
> >   <profile namespace="globus" key="maxnodes">1</profile>
> >   <profile namespace="globus" key="nodeGranularity">1</profile>
> >   <profile namespace="karajan" key="jobThrottle">1.00</profile>
> >   <profile namespace="karajan" key="initialScore">10000</profile>
> >   <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
> > </pool>
> >
> > This specifies that Swift should:
> >
> > - use the "coaster" provider, which enables Swift to ssh to another
> > system and qsub from there:
> >
> > <execution provider="coaster" jobmanager="ssh-cl:pbs"
> > url="vsp-compute-01.stanford.edu"/>
> >
> > - run up to 100 Swift app() tasks in parallel on the remote system
> > (a jobThrottle of 1.00 allows roughly 100 concurrent tasks, and the high
> > initialScore starts the throttle at that limit rather than ramping up
> > slowly):
> >
> > <profile namespace="karajan" key="jobThrottle">1.00</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> >
> > - app() tasks should be limited to 5 minutes walltime:
> >
> > <profile namespace="globus" key="maxWalltime">00:05:00</profile>
> >
> > - app() tasks will be run within PBS coaster "pilot" jobs. Each PBS
> > job should request a walltime of 3600 seconds (maxtime, which must be
> > at least as long as maxWalltime):
> >
> > <profile namespace="globus" key="lowOverAllocation">100</profile>
> > <profile namespace="globus" key="highOverAllocation">100</profile>
> > <profile namespace="globus" key="maxtime">3600</profile>
> >
> > - Up to 5 concurrent PBS coaster jobs each asking for 1 node will be
> > submitted to the default queue:
> >
> > <profile namespace="globus" key="queue">default</profile>
> > <profile namespace="globus" key="slots">5</profile>
> > <profile namespace="globus" key="maxnodes">1</profile>
> > <profile namespace="globus" key="nodeGranularity">1</profile>
> >
> > - Swift should run only one app() task at a time within each PBS job
> > slot:
> >
> > <profile namespace="globus" key="jobsPerNode">1</profile>
> >
> > - On the remote PBS cluster, create per-run directories under this
> > work directory:
> >
> > <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
> >
> > - And stage data to the site by using local copy operations:
> >
> > <filesystem provider="local"/>
> >
> > You can make the sites.xml entry more user-independent using, e.g.:
> >
> > <workdirectory>/scratch/{env.USER}/swiftwork</workdirectory>
> >
> > The overall sites entry above assumes:
> >
> > - That /scratch/rmcgibbo is mounted on both the Swift run host and on
> > the remote PBS system.
> >
> > If there is no common shared filesystem, Swift can use a data
> > transport technique called "coaster provider staging" to move the
> > data for you. This is enabled in the swift.properties file (a brief
> > sketch appears below).
> >
> > In many cases, with a shared filesystem between the Swift client host
> > and the execution cluster, it's desirable to turn off staging
> > altogether. This is done using a mode called "direct" data
> > management (see
> > http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_collective_data_management
> > ; this is being simplified for future releases).
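> >
> > A minimal swift.properties sketch for the no-shared-filesystem case just
> > enables provider staging (other settings are left at their defaults):
> >
> > # swift.properties: move data over the coaster channel instead of a
> > # shared filesystem
> > use.provider.staging=true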
> >
> > - That each PBS job is given one CPU core, not one full node.
> >
> > The PBS ppn attribute can be specified to request a specific number
> > of cores (processors) per node:
> >
> > <profile namespace="globus" key="ppn">16</profile>
> >
> > ...and jobsPerNode can then be raised so that each coaster pilot job
> > runs up to 16 Swift app() tasks at once:
> >
> > <profile namespace="globus" key="jobsPerNode">16</profile>
> >
> > For more info on coasters, see:
> >
> > http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_coasters
> > and: http://www.ci.uchicago.edu/swift/papers/UCC-coasters.pdf
> >
> > For more examples on site configurations, see:
> >
> > http://www.ci.uchicago.edu/swift/guides/trunk/siteguide/siteguide.html
> >
> > And lastly, note that in your initial sites.xml below:
> >
> > - Omitting the filesystem provider tag is typically only done when
> > "use.provider.staging" is specified in the swift.properties config
> > file
> >
> > - The stagingMethod tag only applies to provider staging.
> >
> > We're working hard to document all this better and to provide a better
> > set of illustrated examples and templates for common site
> > configurations. In the meantime, we'll help you create a set of
> > useful configurations for your site(s).
> >
> > Regards,
> >
> > - Mike
> >
> > > We just heard about the Swift project from some colleagues at U
> > > Chicago, and we're interested in trying it out with some of our
> > > compute resources at Stanford to run parallel molecular dynamics and
> > > x-ray scattering simulations. Currently, I'm most interested in
> > > setting up the environment such that I can submit my Swift script on
> > > a local workstation, with execution on a few different clusters. The
> > > head nodes of our local clusters are accessible via ssh, and job
> > > execution is then scheduled with PBS.
> > >
> > > When I run swift, it can't seem to find qsub on the cluster.
> > >
> > > rmcgibbo at Roberts-MacBook-Pro-2 ~/projects/swift
> > > $ swift -sites.file sites.xml hello.swift -tc.file tc.data
> > > Swift 0.94 swift-r6492 cog-r3658
> > >
> > > RunID: 20130603-1704-5xii8svc
> > > Progress: time: Mon, 03 Jun 2013 17:04:10 -0700
> > > 2013-06-03 17:04:10.735 java[77051:1f07] Loading Maximizer into
> > > bundle: com.apple.javajdk16.cmd
> > > 2013-06-03 17:04:11.410 java[77051:1f07] Maximizer: Unsupported
> > > window created of class: CocoaAppWindow
> > > Progress: time: Mon, 03 Jun 2013 17:04:13 -0700 Stage in:1
> > > Execution failed:
> > > Exception in uname:
> > > Arguments: [-a]
> > > Host: vsp-compute
> > > Directory: hello-20130603-1704-5xii8svc/jobs/y/uname-ydyn5fal
> > > Caused by:
> > > Cannot submit job: Cannot run program "qsub": error=2, No such file
> > > or directory
> > > uname, hello.swift, line 8
> > >
> > > When I switch the execution provider from pbs to ssh, the job runs
> > > successfully, but only on the head node of the vsp-compute cluster.
> > > I'd like to run instead using the cluster's PBS queue. Any help
> > > would be greatly appreciated.
> > >
> > > -Robert
> > > Graduate Student, Pande Lab
> > > Stanford University, Department of Chemistry
> > >
> > > p.s.
> > >
> > > My sites.xml file is
> > > ```
> > > <config>
> > > <pool handle="vsp-compute">
> > > <filesystem provider="ssh" url="vsp-compute-01.stanford.edu"/>
> > > <execution provider="pbs" jobmanager="ssh:pbs" url="vsp-compute-01.stanford.edu"/>
> > >
> > > <profile namespace="globus" key="maxtime">750</profile>
> > > <profile namespace="globus" key="jobsPerNode">1</profile>
> > > <profile namespace="globus" key="queue">default</profile>
> > > <profile namespace="swift" key="stagingMethod">file</profile>
> > >
> > > <workdirectory>/scratch/rmcgibbo/swiftwork</workdirectory>
> > > </pool>
> > >
> > > <!-- End -->
> > > </config>
> > > ```
> > >
> > > My SwiftScript is
> > > ```
> > > #hello.swift
> > > type file;
> > >
> > > app (file o) uname() {
> > >     uname "-a" stdout=@o;
> > > }
> > > file outfile <"uname.txt">;
> > >
> > > outfile = uname();
> > > ```
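> > >
> > > and my tc.data has a single entry mapping uname for this site, roughly
> > > the standard one-line form (the path here is the usual /bin/uname):
> > > ```
> > > vsp-compute   uname   /bin/uname   INSTALLED   INTEL32::LINUX   null
> > > ```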
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>