[Swift-commit] r6980 - in SwiftTutorials/CIC_2013-08-09: app doc

wilde at ci.uchicago.edu
Fri Aug 23 13:13:01 CDT 2013


Author: wilde
Date: 2013-08-23 13:13:01 -0500 (Fri, 23 Aug 2013)
New Revision: 6980

Modified:
   SwiftTutorials/CIC_2013-08-09/app/simulate.sh
   SwiftTutorials/CIC_2013-08-09/app/stats.sh
   SwiftTutorials/CIC_2013-08-09/doc/README
Log:
Last changes for this version. Now switching to OSG-Swift.

Modified: SwiftTutorials/CIC_2013-08-09/app/simulate.sh
===================================================================
--- SwiftTutorials/CIC_2013-08-09/app/simulate.sh	2013-08-23 18:00:44 UTC (rev 6979)
+++ SwiftTutorials/CIC_2013-08-09/app/simulate.sh	2013-08-23 18:13:01 UTC (rev 6980)
@@ -22,7 +22,7 @@
   printf "Running as user: "; /usr/bin/id
   printparams
   printf "\nEnvironment:\n\n"
-  /bin/env | /bin/sort
+  printenv | sort
 }
 
 addsims() {

Modified: SwiftTutorials/CIC_2013-08-09/app/stats.sh
===================================================================
--- SwiftTutorials/CIC_2013-08-09/app/stats.sh	2013-08-23 18:00:44 UTC (rev 6979)
+++ SwiftTutorials/CIC_2013-08-09/app/stats.sh	2013-08-23 18:13:01 UTC (rev 6980)
@@ -7,3 +7,17 @@
 END { printf("%d\n",sum/NR) }
 ' $*
 
+log() {
+  printf "\nCalled as: $0: $cmdargs\n\n"
+  printf "Start time: "; /bin/date
+  printf "Running on node: "; /bin/hostname
+  printf "Running as user: "; /usr/bin/id
+  printf "\nEnvironment:\n\n"
+  printenv | sort
+}
+
+log 1>&2
+
+
+
+

Modified: SwiftTutorials/CIC_2013-08-09/doc/README
===================================================================
--- SwiftTutorials/CIC_2013-08-09/doc/README	2013-08-23 18:00:44 UTC (rev 6979)
+++ SwiftTutorials/CIC_2013-08-09/doc/README	2013-08-23 18:13:01 UTC (rev 6980)
@@ -1,11 +1,11 @@
-OSGconnect Swift Tutorial - 2013.0827
-=====================================
+Tutorial: Swift parallel scripting on OSG Connect
+================================================
 
 ////
 
 Outline
 
-Introductory exercises
+* Introductory exercises
 
 p1 - Run an application under Swift
 
@@ -19,47 +19,96 @@
 
 p6 - Add additional apps for generating seeds remotely 
 
+* Advanced exercises
+
+Running R and BLAST
+
+Running on multiple resources
+
+Using OSG Connect and Globus Data Services
+
 ////
 
 
+Introduction: Why Parallel Scripting?
+------------------------------------
+
+Swift is a simple scripting language for executing many instances of
+ordinary application programs on distributed parallel resources.
+Swift scripts run many copies of ordinary programs concurrently, using
+statements like this:
+
+-----
+foreach protein in proteinList {
+  runBLAST(protein);
+}
+-----
+
+Swift acts like a structured "shell" language. It runs programs
+concurrently as soon as their inputs are available, reducing the need
+for complex parallel programming.  Swift expresses your workflow
+in a portable fashion: The same script runs on grids like OSG, as well
+as on multicore computers, clusters, clouds, and supercomputers.
+
+In this tutorial, you'll be able to first try a few Swift examples on
+the OSG Connect login host, to get a sense of the language. Then
+you'll run similar workflows on distributed OSG resources, and see how
+more complex workflows can be expressed as scripts.
+
 Workflow tutorial setup
 -----------------------
 
-Check out scripts from SVN
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-To checkout the most recent CIC tutorial scripts from SVN, run the following
-command:
+To get started, do:
 
 -----
-$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/CIC_2013-08-09 tutorial
+$ cd $HOME
+$ tutorial osg-swift
+$ cd osg-swift
+-----
+Verify your environment
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To verify that Swift (and the Java environment it requires) are working, do:
+
 -----
+$ java -version   # verify that you have Oracle Java 1.6 or later
+$ swift -version  # verify that you have Swift 0.94.1 (RC2 revision)
+-----
 
-This will create a directory called "tutorial" which contains all of the
-scripts mentioned in this document.
+NOTE: If you re-login or open new ssh sessions, you will need to
+re-run `source setup.sh` in each ssh window:
 
-Run setup
-~~~~~~~~~
-Once the scripts are checked out, run the following commands to perform
-the initial setup.
+-----
+$ cd $HOME/osg-swift     # change to the newly created tutorial directory
+$ source setup.sh        # sets PATH and swift config files
+-----
 
+To check out the tutorial scripts from SVN
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you later want to check out the most recent Swift tutorial scripts for OSG Connect from SVN, do:
+
 -----
-$ cd tutorial            # change to the newly created tutorial directory
-$ source setup.sh        # sets swift config files in $HOME/.swift
-$ java -version          # verify that you have Oracle JAVA (prefered; 1.6 or later)
-$ swift -version         # verify that Swift 0.94 is in your $PATH and functional
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG-Swift 
 -----
 
-NOTE: If you re-login, you will need to re-run source setup.sh.
+This will create a directory called "OSG-Swift" which contains all of the
+files used in this tutorial.
 
 Simple "science applications" for the workflow tutorial
 -------------------------------------------------------
-There are two shell scripts included that serve a very simple stand-ins for science application:
-simulation.sh and stats.sh
 
+There are two shell scripts included that serve as very simple
+stand-ins for science applications: simulate.sh and stats.sh
+
 simulation.sh
 ~~~~~~~~~~~~
-The simulation.sh script serves as a trivial substitute for a complex scientific simulation application. It generates and prints a set of one or more random integers in the range [0-2^32) as controlled by its optional arguments, which are:
 
+The simulate.sh script serves as a trivial substitute for a complex
+scientific simulation application. It generates and prints a set of
+one or more random integers in the range [0, 2^32), as controlled by
+its optional arguments, which are:
+
 -----
 $ ./app/simulate.sh --help
 ./app/simulate.sh: usage:
@@ -90,7 +139,10 @@
 |=======================
 ////
 
-With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer. By default it logs information about its execution environment to stderr.  Here's some examples of its usage:
+With no arguments, simulate.sh prints 1 number in the range of
+1-100. Otherwise it generates n numbers of the form (R*scale)+bias
+where R is a random integer. By default it logs information about its
+execution environment to stderr.  Here are some examples of its usage:
 
 -----
 $ simulate.sh 2>log
@@ -110,24 +162,23 @@
 USER=wilde
 $ 
 
-$ simulate.sh -n 3 -r 1000000 2>log
+$ simulate.sh -n 4 -r 1000000 2>log
+  239454
   386702
-  239454
    13849
+  873526
 
 $ simulate.sh -n 3 -r 1000000 -x 100 2>log
  6643700
 62182300
  5230600
 
-$ simulate.sh -n 3 -r 1000 -x 1000 2>log
+$ simulate.sh -n 2 -r 1000 -x 1000 2>log
   565000
   636000
-  477000
 
-$ time simulate.sh -n 3 -r 1000 -x 1000 -t 3 2>log
+$ time simulate.sh -n 2 -r 1000 -x 1000 -t 3 2>log
   336000
-   20000
   320000
 real    0m3.012s
 user    0m0.005s
@@ -137,22 +188,41 @@
 
 -----
 
-
 stats.sh
 ~~~~~~~
-The stats.sh script reads a file containing n numbers and prints the average
-of those numbers to stdout.
+The stats.sh script serves as a trivial model of an "analysis" program. It reads N files, each containing M integers, and prints the average
+of all those numbers to stdout. Like simulate.sh, it logs information about its execution environment to stderr.
 
+-----
+$ ls f*
+f1  f2	f3  f4
+
+$ cat f*
+25
+60
+40
+75
+
+$ stats.sh f*
+50
+-----
+
+
 OSG Connect exercises
 ---------------------
-Parts 1-3 (p1.swift - p3.swift) run on your login host and serve as examples of the Swift language.
-Parts 4-6 (p4.swift - p6.swift) submit jobs to OSG Connect resources.
 
+Parts 1-3 (p1.swift - p3.swift) run on your login host and serve as
+examples of the Swift language.  Parts 4-6 (p4.swift - p6.swift)
+submit jobs to OSG Connect resources.
+
 Part 1: Run a single application under Swift
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The first swift script, p1.swift, runs simulate.sh to generate a single random
-number. It writes the number to a file.
 
+
+
+The first swift script, p1.swift, runs simulate.sh to generate a
+single random number. It writes the number to a file.
+
 image::part01.png["p1 workflow",align="center"]
 
 .p1.swift
@@ -160,18 +230,21 @@
 sys::[cat -n ../part01/p1.swift]
 -----
 
-The sites.xml file included in each part directory gives Swift information about the machines we will be running on.
-It defines things like the work directory, the scheduler to use, and how to control parallelism. The sites.xml file
-below will tell Swift to run on the local machine only, and run just 1 task at a time.
+The sites.xml file included in each part directory gives Swift
+information about the machines we will be running on.  It defines
+things like the work directory, the scheduler to use, and how to
+control parallelism. The sites.xml file below will tell Swift to run
+on the local machine only, and run just 1 task at a time.
 
 .sites.xml
 -----
 sys::[cat -n ../part01/sites.xml]
 -----
 
-The app file translates from a Swift app function to the path of an executable on the file system. 
-In this case, it translates from "simulate" to simulate.sh and assumes that simulate.sh will 
-be available in your $PATH.
+The app file translates from a Swift app function to the path of an
+executable on the file system.  In this case, it translates from
+"simulate" to simulate.sh and assumes that simulate.sh will be
+available in your $PATH.
 
 .apps
 -----
@@ -216,9 +289,11 @@
 
 Part 3: Analyzing results of a parallel ensemble
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p3.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
-created by simulation.sh will be averaged by stats.sh.
 
+p3.swift introduces a postprocessing step. After all the parallel
+simulations have completed, the files created by simulate.sh will be
+averaged by stats.sh.
+
 image::part03.png[align="center"]
 
 .p3.swift
@@ -256,6 +331,27 @@
 
 Output files will be named output/sim_N.out.
 
+In order to run on OSG compute nodes, sites.xml was modified. Here is
+the new sites.xml we are using for this example. Note the changes
+between the sites.xml file in this example which specifies "execution
+provider=condor", and the sites.xml file in part 1, which runs locally
+by specifying "execution provider=local".
+
+.sites.xml
+-----
+sys::[cat -n ../part06/sites.xml]
+-----
+
+Below is the updated apps file. Since Swift is staging shell scripts
+remotely to nodes on the cluster, the only application you need to
+define here is the shell.
+
+.apps
+-----
+sys::[cat -n ../part06/apps]
+-----
+
+
 Part 5: Controlling where applications run
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -289,25 +385,7 @@
 sys::[cat -n ../part06/p6.swift]
 ----
 
-In order to run on the cluster, sites.xml needed to be modified. Here
-is the new sites.xml we are using for this example. Note the changes
-between the sites.xml file in this example which uses condor, and the
-sites.xml file in part 1, which runs locally.
 
-.sites.xml
------
-sys::[cat -n ../part06/sites.xml]
------
-
-Below is the updated apps file. Since Swift is staging shell scripts
-remotely to nodes on the cluster, the only application you need to
-define here is the shell.
-
-.apps
------
-sys::[cat -n ../part06/apps]
------
-
 Use the command below to specify the time for each simulation.
 ----
 $ cd ../part06



