[Swift-commit] r6981 - in SwiftTutorials/OSG-Swift: app doc
wilde at ci.uchicago.edu
Fri Aug 23 13:15:39 CDT 2013
Author: wilde
Date: 2013-08-23 13:15:39 -0500 (Fri, 23 Aug 2013)
New Revision: 6981
Modified:
SwiftTutorials/OSG-Swift/app/simulate.sh
SwiftTutorials/OSG-Swift/app/stats.sh
SwiftTutorials/OSG-Swift/doc/README
Log:
Moved last changes from CIC to here.
Modified: SwiftTutorials/OSG-Swift/app/simulate.sh
===================================================================
--- SwiftTutorials/OSG-Swift/app/simulate.sh 2013-08-23 18:13:01 UTC (rev 6980)
+++ SwiftTutorials/OSG-Swift/app/simulate.sh 2013-08-23 18:15:39 UTC (rev 6981)
@@ -22,7 +22,7 @@
printf "Running as user: "; /usr/bin/id
printparams
printf "\nEnvironment:\n\n"
- /bin/env | /bin/sort
+ printenv | sort
}
addsims() {
Modified: SwiftTutorials/OSG-Swift/app/stats.sh
===================================================================
--- SwiftTutorials/OSG-Swift/app/stats.sh 2013-08-23 18:13:01 UTC (rev 6980)
+++ SwiftTutorials/OSG-Swift/app/stats.sh 2013-08-23 18:15:39 UTC (rev 6981)
@@ -7,3 +7,17 @@
END { printf("%d\n",sum/NR) }
' $*
+log() {
+ printf "\nCalled as: $0: $cmdargs\n\n"
+ printf "Start time: "; /bin/date
+ printf "Running on node: "; /bin/hostname
+ printf "Running as user: "; /usr/bin/id
+ printf "\nEnvironment:\n\n"
+ printenv | sort
+}
+
+log 1>&2
+
+
+
+
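The pattern added to stats.sh above (result on stdout, diagnostics on stderr) can be sketched as a standalone script. This is a minimal illustration, not the committed code: the function names `average_files` and `log_env` are hypothetical, and the sketch assumes each input line holds one integer. Because `$cmdargs` does not appear to be set in the part of stats.sh visible in this diff, the sketch passes the arguments to the logging function explicitly.

```shell
#!/bin/sh
# Sketch only: names below are hypothetical, not taken from the commit.

average_files() {
    # Sum the first field of every input line across all files given;
    # %d truncates the mean to an integer, as in the END block above.
    awk '{ sum += $1 } END { if (NR > 0) printf("%d\n", sum / NR) }' "$@"
}

log_env() {
    # Inside a function, "$*" is the function's own argument list,
    # so the caller must forward the script arguments explicitly.
    printf '\nCalled as: %s: %s\n\n' "$0" "$*"
    printf 'Start time: '; date
    printf 'Running on node: '; hostname
    printf 'Running as user: '; id
    printf '\nEnvironment:\n\n'
    printenv | sort
}
```

A script built from these would call `average_files "$@"` followed by `log_env "$@" 1>&2`, so that redirecting stdout captures only the average while the log can be sent elsewhere with `2>log`.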
Modified: SwiftTutorials/OSG-Swift/doc/README
===================================================================
--- SwiftTutorials/OSG-Swift/doc/README 2013-08-23 18:13:01 UTC (rev 6980)
+++ SwiftTutorials/OSG-Swift/doc/README 2013-08-23 18:15:39 UTC (rev 6981)
@@ -1,11 +1,11 @@
-OSGconnect Swift Tutorial - 2013.0827
-=====================================
+Tutorial: Swift parallel scripting on OSG Connect
+================================================
////
Outline
-Introductory exercises
+* Introductory exercises
p1 - Run an application under Swift
@@ -19,47 +19,96 @@
p6 - Add additional apps for generating seeds remotely
+* Advanced exercises
+
+Running R and BLAST
+
+Running on multiple resources
+
+Using OSG Connect and Globus Data Services
+
////
+Introduction: Why Parallel Scripting?
+------------------------------------
+
+Swift is a simple scripting language for executing many instances of
+ordinary application programs on distributed parallel resources.
+Swift scripts run many copies of ordinary programs concurrently, using
+statements like this:
+
+-----
+foreach protein in proteinList {
+ runBLAST(protein);
+}
+-----
+
+Swift acts like a structured "shell" language. It runs programs
+concurrently as soon as their inputs are available, reducing the need
+for complex parallel programming. Swift expresses your workflow
+in a portable fashion: The same script runs on grids like OSG, as well
+as on multicore computers, clusters, clouds, and supercomputers.
+
+In this tutorial, you'll first try a few Swift examples on
+the OSG Connect login host to get a sense of the language. Then
+you'll run similar workflows on distributed OSG resources and see how
+more complex workflows can be expressed as scripts.
+
Workflow tutorial setup
-----------------------
-Check out scripts from SVN
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-To checkout the most recent CIC tutorial scripts from SVN, run the following
-command:
+To get started, do:
-----
-$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/CIC_2013-08-09 tutorial
+$ cd $HOME
+$ tutorial osg-swift
+$ cd osg-swift
+-----
+
+Verify your environment
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To verify that Swift (and the Java environment it requires) are working, do:
+
-----
+$ java -version # verify that you have Oracle Java 1.6 or later
+$ swift -version # verify that you have Swift 0.94.1 (RC2 revision)
+-----
-This will create a directory called "tutorial" which contains all of the
-scripts mentioned in this document.
+NOTE: If you re-login or open new ssh sessions, you will need to
+re-run `source setup.sh` in each ssh window:
-Run setup
-~~~~~~~~~
-Once the scripts are checked out, run the following commands to perform
-the initial setup.
+-----
+$ cd $HOME/osg-swift # change to the newly created tutorial directory
+$ source setup.sh # sets PATH and swift config files
+-----
+To check out the tutorial scripts from SVN
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you later want to check out the most recent Swift tutorial scripts for OSG Connect from SVN, do:
+
-----
-$ cd tutorial # change to the newly created tutorial directory
-$ source setup.sh # sets swift config files in $HOME/.swift
-$ java -version # verify that you have Oracle JAVA (prefered; 1.6 or later)
-$ swift -version # verify that Swift 0.94 is in your $PATH and functional
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG-Swift
-----
-NOTE: If you re-login, you will need to re-run source setup.sh.
+This will create a directory called "OSG-Swift" which contains all of the
+files used in this tutorial.
Simple "science applications" for the workflow tutorial
-------------------------------------------------------
-There are two shell scripts included that serve a very simple stand-ins for science application:
-simulation.sh and stats.sh
+There are two shell scripts included that serve as very simple
+stand-ins for science applications: simulate.sh and stats.sh.
+
simulate.sh
~~~~~~~~~~~
-The simulation.sh script serves as a trivial substitute for a complex scientific simulation application. It generates and prints a set of one or more random integers in the range [0-2^32) as controlled by its optional arguments, which are:
+The simulate.sh script serves as a trivial substitute for a complex
+scientific simulation application. It generates and prints a set of
+one or more random integers in the range [0, 2^32), as controlled by its
+optional arguments, which are:
+
-----
$ ./app/simulate.sh --help
./app/simulate.sh: usage:
@@ -90,7 +139,10 @@
|=======================
////
-With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer. By default it logs information about its execution environment to stderr. Here's some examples of its usage:
+With no arguments, simulate.sh prints 1 number in the range of
+1-100. Otherwise it generates n numbers of the form (R*scale)+bias
+where R is a random integer. By default it logs information about its
+execution environment to stderr. Here are some examples of its usage:
-----
$ simulate.sh 2>log
@@ -110,24 +162,23 @@
USER=wilde
$
-$ simulate.sh -n 3 -r 1000000 2>log
+$ simulate.sh -n 4 -r 1000000 2>log
+ 239454
386702
- 239454
13849
+ 873526
$ simulate.sh -n 3 -r 1000000 -x 100 2>log
6643700
62182300
5230600
-$ simulate.sh -n 3 -r 1000 -x 1000 2>log
+$ simulate.sh -n 2 -r 1000 -x 1000 2>log
565000
636000
- 477000
-$ time simulate.sh -n 3 -r 1000 -x 1000 -t 3 2>log
+$ time simulate.sh -n 2 -r 1000 -x 1000 -t 3 2>log
336000
- 20000
320000
real 0m3.012s
user 0m0.005s
@@ -137,22 +188,41 @@
-----
-
stats.sh
~~~~~~~
-The stats.sh script reads a file containing n numbers and prints the average
-of those numbers to stdout.
+The stats.sh script serves as a trivial model of an "analysis" program.
+It reads N files, each containing M integers, and prints the average of
+all those numbers to stdout. Like simulate.sh, it logs information about
+its execution environment to stderr.
+
+-----
+$ ls f*
+f1 f2 f3 f4
+
+$ cat f*
+25
+60
+40
+75
+
+$ stats.sh f*
+50
+-----
+
+
OSG Connect exercises
---------------------
-Parts 1-3 (p1.swift - p3.swift) run on your login host and serve as examples of the Swift language.
-Parts 4-6 (p4.swift - p6.swift) submit jobs to OSG Connect resources.
+Parts 1-3 (p1.swift - p3.swift) run on your login host and serve as
+examples of the Swift language. Parts 4-6 (p4.swift - p6.swift)
+submit jobs to OSG Connect resources.
+
Part 1: Run a single application under Swift
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The first swift script, p1.swift, runs simulate.sh to generate a single random
-number. It writes the number to a file.
+
+
+The first swift script, p1.swift, runs simulate.sh to generate a
+single random number. It writes the number to a file.
+
image::part01.png["p1 workflow",align="center"]
.p1.swift
@@ -160,18 +230,21 @@
sys::[cat -n ../part01/p1.swift]
-----
-The sites.xml file included in each part directory gives Swift information about the machines we will be running on.
-It defines things like the work directory, the scheduler to use, and how to control parallelism. The sites.xml file
-below will tell Swift to run on the local machine only, and run just 1 task at a time.
+The sites.xml file included in each part directory gives Swift
+information about the machines we will be running on. It defines
+things like the work directory, the scheduler to use, and how to
+control parallelism. The sites.xml file below will tell Swift to run
+on the local machine only, and run just 1 task at a time.
.sites.xml
-----
sys::[cat -n ../part01/sites.xml]
-----
-The app file translates from a Swift app function to the path of an executable on the file system.
-In this case, it translates from "simulate" to simulate.sh and assumes that simulate.sh will
-be available in your $PATH.
+The app file translates from a Swift app function to the path of an
+executable on the file system. In this case, it translates from
+"simulate" to simulate.sh and assumes that simulate.sh will be
+available in your $PATH.
.apps
-----
@@ -216,9 +289,11 @@
Part 3: Analyzing results of a parallel ensemble
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p3.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
-created by simulation.sh will be averaged by stats.sh.
+p3.swift introduces a postprocessing step. After all the parallel
+simulations have completed, the files created by simulate.sh will be
+averaged by stats.sh.
+
image::part03.png[align="center"]
.p3.swift
@@ -256,6 +331,27 @@
Output files will be named output/sim_N.out.
+In order to run on OSG compute nodes, sites.xml was modified. Here is
+the new sites.xml we are using for this example. Note the difference
+between the sites.xml file in this example, which specifies "execution
+provider=condor", and the sites.xml file in part 1, which runs locally
+by specifying "execution provider=local".
+
+.sites.xml
+-----
+sys::[cat -n ../part06/sites.xml]
+-----
+
+Below is the updated apps file. Since Swift is staging shell scripts
+remotely to nodes on the cluster, the only application you need to
+define here is the shell.
+
+.apps
+-----
+sys::[cat -n ../part06/apps]
+-----
+
+
Part 5: Controlling where applications run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -289,25 +385,7 @@
sys::[cat -n ../part06/p6.swift]
----
-In order to run on the cluster, sites.xml needed to be modified. Here
-is the new sites.xml we are using for this example. Note the changes
-between the sites.xml file in this example which uses condor, and the
-sites.xml file in part 1, which runs locally.
-.sites.xml
------
-sys::[cat -n ../part06/sites.xml]
------
-
-Below is the updated apps file. Since Swift is staging shell scripts
-remotely to nodes on the cluster, the only application you need to
-define here is the shell.
-
-.apps
------
-sys::[cat -n ../part06/apps]
------
-
Use the command below to specify the time for each simulation.
----
$ cd ../part06