[Swift-commit] r7195 - SwiftTutorials/swift-cloud-tutorial/doc

davidk at ci.uchicago.edu
Mon Oct 21 18:17:05 CDT 2013


Author: davidk
Date: 2013-10-21 18:17:04 -0500 (Mon, 21 Oct 2013)
New Revision: 7195

Modified:
   SwiftTutorials/swift-cloud-tutorial/doc/README
Log:
Updates to readme to make it less OSDC-specific


Modified: SwiftTutorials/swift-cloud-tutorial/doc/README
===================================================================
--- SwiftTutorials/swift-cloud-tutorial/doc/README	2013-10-21 22:55:17 UTC (rev 7194)
+++ SwiftTutorials/swift-cloud-tutorial/doc/README	2013-10-21 23:17:04 UTC (rev 7195)
@@ -1,34 +1,10 @@
-Swift Tutorial for Open Science Data Cloud Resources
-====================================================
+Swift Tutorial for Cloud and Ad hoc Resources
+=============================================
 
-//// 
-
-This is the asciidoc input file.
-Its content is viewable as a plain-text README file.
-
-////
-
 This tutorial is viewable at:
-http://swiftlang.org/tutorials/osdc/tutorial.html
+http://swiftlang.org/tutorials/cloud
 
-////
 
-Tutorial Outline:
-
-Introductory example, running apps locally on login node:
-
-  p1 - Run an application under Swift
-  p2 - Parallel loops with foreach
-  p3 - Merging/reducing the results of a parallel foreach loop
-
-Compute-node exercises, running apps via qsub and aprun:
-
-  p4 - Running apps on OSDC compute nodes
-  p5 - Running on multiple pools of compute nodes
-  p6 - Running a more complex workflow pattern
-
-////
-
 Introduction: Why Parallel Scripting?
 ------------------------------------
 
@@ -48,37 +24,56 @@
 clusters, clouds, grids, and supercomputers.
 
 In this tutorial, you'll be able to first try a few Swift examples
-(parts 1-3) on an OSDC login host, to get a sense of the
-language. Then in parts 4-6 you'll run similar workflows on OSDC
+(parts 1-3) on your local machine, to get a sense of the
+language. Then in parts 4-6 you'll run similar workflows on cloud
 compute nodes, and see how more complex workflows can be expressed
 with Swift scripts.
 
-Swift tutorial setup
---------------------
-To begin, start a few virtual machines running the Ubuntu image.
-Once the instances have started, run the following commands to
-install the tutorial scripts on an OSDC login host:
+Swift installation
+------------------
+To install and set up Swift, please refer to the Swift quickstart
+guide at http://swiftlang.org/guides/quickstart.html.
 
+Swift tutorial installation
+---------------------------
+If you are running on cloud resources, please start your instances now.
+For both ad hoc and cloud resources, you will need to collect a list of
+the IP addresses you will be using.
+
+Run the following commands to download and extract the tutorial scripts.
+
 -----
 $ cd $HOME
-$ wget http://swiftlang.org/tutorials/osdc/swift-osdc-tutorial.tar.gz
-$ tar xvfz swift-osdc-tutorial.tgz
-$ cd swift-osdc-tutorial
-$ source setup.sh   # You must run this with "source" !
+$ wget http://swiftlang.org/tutorials/cloud/swift-cloud-tutorial.tar.gz
+$ tar xvfz swift-cloud-tutorial.tar.gz
+$ cd swift-cloud-tutorial
 -----
 
-Verify your environment
-~~~~~~~~~~~~~~~~~~~~~~~
+Next, edit the file scs/coaster-service.conf. There are three lines you will
+need to edit.
 
-To verify that Swift (and the Java environment it requires) are working, do:
+WORKER_LOCATION defines the path that will be used for copying files to the remote
+node. Usually you can set this to your home directory on the worker node.
+-----
+export WORKER_LOCATION=/home/ubuntu
+----
 
+Set WORKER_HOSTS to be the space-separated list of IP addresses that will be
+used as workers.
 -----
-$ java -version   # verify that you have Java (ideally Oracle JAVA 1.6 or later)
-$ swift -version  # verify that you have Swift 0.94.1
+export WORKER_HOSTS="10.1.2.10 10.1.2.11"
 -----
 
-NOTE: If you re-login or open new ssh sessions, you must re-run `source setup.sh` in each ssh shell/window.
+Set WORKER_USERNAME to be the username to use on the worker nodes.
+-----
+export WORKER_USERNAME=ubuntu
+-----
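+
+Taken together, the edited section of scs/coaster-service.conf would then
+look like this (a sketch assuming two Ubuntu worker instances at the
+example addresses used above):
+
+-----
+export WORKER_LOCATION=/home/ubuntu
+export WORKER_HOSTS="10.1.2.10 10.1.2.11"
+export WORKER_USERNAME=ubuntu
+-----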
 
+Next, run the tutorial setup script:
+-----
+$ source setup.sh   # You must run this with "source" !
+-----
+
 To check out the tutorial scripts from SVN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -86,10 +81,10 @@
 the Swift Subversion repository, do:
 
 -----
-$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/swift-osdc-tutorial OSDC-Swift
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/swift-cloud-tutorial
 -----
 
-This will create a directory called "OSDC-Swift" which contains all of the
+This will create a directory called "swift-cloud-tutorial" which contains all of the
 files used in this tutorial.
 
 
@@ -481,14 +476,14 @@
 on the site's compute nodes.
 
 
-Running applications on OSDC compute nodes with Swift
------------------------------------------------------
+Running applications on compute nodes with Swift
+------------------------------------------------
 
-Part 4: Running a parallel ensemble on OSDC compute nodes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Part 4: Running a parallel ensemble on compute nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 `p4.swift` will run our mock "simulation"
-applications on OSDC compute nodes.  The script is similar to as
+applications on compute nodes.  The script is similar to
 `p3.swift`, but specifies that each simulation app invocation should
 additionally return the log file which the application writes to
 `stderr`.
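 
 For illustration, an app declaration along these lines could return both
 the output and the stderr log as mapped files (a sketch of the idea only;
 the app and argument names here are hypothetical, and the exact
 declaration in `p4.swift` may differ):
 
 -----
 app (file out, file log) simulation (int steps)
 {
   // capture both stdout and stderr of the app as returned files
   simulate steps stdout=filename(out) stderr=filename(log);
 }
 -----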
@@ -512,7 +507,7 @@
 `sim_N.log`.  The log files provide data on the runtime environment of
 each app invocation. For example:
 
-FIXME: The output below needs to get recaptured for OSDC nodes
+FIXME: The output below needs to get recaptured for worker nodes
 
 -----
 $ cat output/sim_0.log
@@ -612,135 +607,6 @@
 nid00008
 -----
 
-Swift's `sites.xml` configuration file allows many parameters to
-specify how jobs should be run on a given cluster.
-
-FIXME: Translate this concept from Cray to OSDC:
-
-Consider for example that Raven has several queues, each with
-limitiations on the size of jobs that can be run in them.  All Raven
-queues will only run 2 jobs per user at one. The Raven queue "small"
-will only allow up to 4 nodes per job and 1 hours of walltime per job.
-The following site.xml parameters will allow us to match this:
-
------
-  <profile namespace="globus" key="queue">small</profile>
-  <profile namespace="globus" key="slots">2</profile>
-  <profile namespace="globus" key="maxNodes">4</profile>
-  <profile namespace="globus" key="nodeGranularity">4</profile>
------
-
-To run large jobs, we can specify:
-
------
-  <profile namespace="globus" key="slots">2</profile>
-  <profile namespace="globus" key="maxNodes">8</profile>
-  <profile namespace="globus" key="nodeGranularity">8</profile>
-  <profile namespace="karajan" key="jobThrottle">50.0</profile>
-  <profile namespace="globus" key="maxTime">21600</profile>
-  <profile namespace="globus" key="lowOverAllocation">10000</profile>
-  <profile namespace="globus" key="highOverAllocation">10000</profile>
------
-
-This will enable 512 Swift apps (2 x 8 x 32) to run concurrently
-within 2 8-node jobs on Raven's 32-core nodes.  It results in the
-following two PBS jobs submitted by Swift to "provision" compute nodes
-to run thousands of apps, 512 at a time:
-
------
-$ qstat -u $USER
-
-Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
-288637.sdb      p01532   medium   B0827-2703    --    8 256    --  05:59 Q   -- 
-288638.sdb      p01532   medium   B0827-2703    --    8 256    --  05:59 Q   -- 
------
-
-The following section is a summary of the important `sites.xml`
-attributes for running apps on Cray systems. Many of these attributes
-can be set the same for all Swift users of a given system; only a few
-of the attributes need be overridden by users. We explain these
-attributes in detail here to show the degree of control afforded by
-Swift over application execution.  Most users will use templates for a
-given Cray system, only changing a few parameters to meet any unique
-needs of their application workflows.
-
-////
-.sites.xml
------
-sys::[egrep -v '<.xml|<config|</config' ../part04/sites.xml | cat -n ]
------
-////
-
-The additional attributes in the `sites.xml` file (described here
-without their XML formatting) specify that Swift should run
-applications on Raven in the following manner:
-
-`execution provider coaster, jobmanager local:pbs` specifies that
-Swift should run apps using its "coaster" provider, which submits
-"pilot jobs" using qsub. These pilot jobs hold on to compute nodes and
-allow Swift to run many app invocations within a single job. This
-mechanism is described in
-http://www.swift-lang.org/papers/UCC-coasters.pdf[this paper from UCC-2011].
-
-`profile` tags specify additional attributes for the execution
-provider. (A "provider" is like a driver which knows how to handle
-site-specific aspects of app execution). The attributes are grouped
-into various "namespaces", but we can ignore this for now.
-
-The `env` key `PATHPREFIX` specifies that our tutorial `app` directory
-(`../app`) will be placed at the front of PATH to locate the app on
-the compute node.
-
-`queue small` specifies that pilot (coaster) jobs to run apps will be
-submitted to Raven's `small` queue.
-
-`providerAttributes pbs.aprun;pbs.mpp;depth=32` specifies some
-Cray-specific attributes: that jobs should use Cray-specific PBS "mpp"
-resource attributes (eg `mppwidth` and `mppnppn`) and an mppdepth of
-32 (because we will be running one coaster process per node, and
-Raven's XE6 dual IL-16 nodes have a depth of 32 processing elements
-(PEs)).
-
-`jobsPerNode 32` tells Swift that each coaster should run up to 32
-concurrent apps. This can be reduced to place fewer apps per node, e.g.
-if each app needs more memory (or, rarely, raised above 32, e.g. if the
-apps are IO-bound, or for benchmark experiments, etc.).
-
-`slots 2` specifies that Swift will run up to 2 concurrent PBS jobs,
-and `maxNodes 1` specifies that each of these jobs will request only 1
-compute node.
-
-`maxWallTime 00:01:00` specifies that Swift should allow each app to
-run for up to one minute of walltime within the larger pilot job. In
-this example Swift will dynamically determine the total PBS walltime
-needed for the pilot job, but this can be specified manually using
-attributes `maxTime` along with `highOverAllocation` and
-`lowOverAllocation`.
-
-`jobThrottle 3.20` specifies that Swift should allow up to 320 apps to
-run on the `raven` site at once.  This is typically set to a number
-greater than or equal to the number of slots x compute nodes x apps
-per node (`jobsPerNode` attribute).
-
-`initialscore 10000` is specified to override Swift's automatic
-throttling, and forces an actual throttle value of approximately
-(specifically one more than) `jobThrottle` * 100 to be used.
-
-The last two attributes specify where and how Swift should perform
-data management.  `workdirectory /lus/scratch/{env.USER}/swiftwork`
-specifies where the Swift "application execution sandbox directory"
-used for each app will be located. In some situations this can be a
-directory local to the compute node (e.g., for Cray systems, `/dev/shm`
-or `/tmp`, if those are writable by user jobs and the nodes have
-sufficient space in these RAM-based filesystems).
-
-Finally, `stagingMethod sfs` specifies that Swift will copy data
-between the shared file system and the application sandbox
-directories.
-
-////
-
 Performing larger Swift runs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -752,52 +618,6 @@
 $ swift p6.swift -steps=5 -nsim=1000
 -----
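 
 If larger runs need more workers, you can grow the pool by adding
 instance IP addresses to WORKER_HOSTS in scs/coaster-service.conf and
 sourcing setup.sh again so the coaster service picks up the new nodes
 (a sketch extending the example addresses from the setup section; the
 re-source step is an assumption about the setup script):
 
 -----
 export WORKER_HOSTS="10.1.2.10 10.1.2.11 10.1.2.12 10.1.2.13"
 -----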
 
-FIXME: Adjust from Cray to OSDC concepts: (perhaps show how to add more nodes dynamically)
-
-The other change required is to the sites.xml to control how many nodes you request.
-Change the "maxnodes" values to control the number of nodes to request.
-
------
-<profile namespace="globus" key="maxNodes">2</profile>
------
-
-////
-
-Plotting run activity
-~~~~~~~~~~~~~~~~~~~~~
-
-The tutorial `bin` directory in your `PATH` provides a script
-`plot.sh` to plot the progress of a Swift script.  It generates two
-image files: `activeplot.png`, which shows the number of active jobs
-over time, and `cumulativeplot.png`, which shows the total number of
-app calls completed as the Swift script progresses.
-
-After each swift run, a log file will be created called
-partNN-<YYYYmmdd>-<hhmm>-<random>.log.  Once you have identified the
-log file name, run the command `./plot.sh` <logfile>` (where logfile
-is the most recent Swift run log) to generate the plots for that
-specific run. For example:
-
------
-$ ls -lt *.log | head
--rw-r--r-- 1 p01532 61532 2237693 Aug 26 12:45 p4-20130826-1244-kmos0d87.log
--rw-r--r-- 1 p01532 61532    1008 Aug 26 12:44 swift.log
--rw-r--r-- 1 p01532 61532 5345345 Aug 26 12:44 p4-20130826-1243-10u2qdbd.log
--rw-r--r-- 1 p01532 61532  357687 Aug 26 12:00 p4-20130826-1159-j01p4lu0.log
-...
-$ plot.sh p4-20130826-1244-kmos0d87.log
------
-
-This yields plots like:
-
-image::activeplot.png[width=700,align=center]
-image::cumulativeplot.png[width=700,align=center]
-
-NOTE: Because systems like Raven are often firewalled, you may need to
-use scp to pull these image files back to a system on which you can
-view them with a browser or preview tool.
-
-
 Part 5: Controlling the compute-node pools where applications run
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 