[Swift-commit] r6747 - SwiftTutorials/ATPESC_2013-08-06

wilde at ci.uchicago.edu wilde at ci.uchicago.edu
Mon Aug 5 12:22:33 CDT 2013


Author: wilde
Date: 2013-08-05 12:22:32 -0500 (Mon, 05 Aug 2013)
New Revision: 6747

Modified:
   SwiftTutorials/ATPESC_2013-08-06/README
Log:
Adjusted section names, added cloud README material, made sections for Swift/T and MPI.

Modified: SwiftTutorials/ATPESC_2013-08-06/README
===================================================================
--- SwiftTutorials/ATPESC_2013-08-06/README	2013-08-05 15:51:37 UTC (rev 6746)
+++ SwiftTutorials/ATPESC_2013-08-06/README	2013-08-05 17:22:32 UTC (rev 6747)
@@ -1,8 +1,8 @@
 ATPESC 2013 Workflow Tutorial Exercises
 =======================================
 
-Installing ATPESC tutorial
---------------------------
+Workflow tutorial setup
+-----------------------
 
 Check out scripts from SVN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -29,28 +29,28 @@
 
 NOTE: If you disconnect from the machine, you will need to re-run source setup.sh.
 
-Overview of the applications
-----------------------------
-There are two shell scripts included that act as a mock science application: 
+Mock "science applications" for the workflow tutorial
+-----------------------------------------------------
+There are two shell scripts included that serve as very simple stand-ins for science applications: 
 simulation.sh and stats.sh
 
 simulation.sh
 ~~~~~~~~~~~~~
-The simulation.sh script generates and prints a random number. It optionally
-takes the following arguments:
+The simulation.sh script is a simple substitute for a scientific simulation application. It generates and prints a set of one or more random integers in the range 0-29,999 as controlled by its optional arguments, which are:
 
 .simulation.sh arguments
 [options="header"]
 |=======================
 |Argument number|Description
 |1    |runtime. Set how long simulation.sh should run, in seconds.   
-|2    |range. Limit random numbers to a given range.
-|3    |biasfile. Look a number contained within this file to set bias.
-|4    |scale. Scale random number by this factor.
+|2    |range. Limit random numbers to the range [0,range-1]
+|3    |biasfile. Adds the integer contained in this file to each random number generated.
+|4    |scale. Multiplies each random number by this integer argument.
 |5    |n. Generate n number of random numbers.
 |=======================
 
-With no arguments, simulate.sh prints 1 number in the range of 1-100.
+With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form R * scale + bias.
+
 -----
 $ ./simulate.sh 
 96
@@ -59,15 +59,15 @@
 stats.sh
 ~~~~~~~~
 The stats.sh script reads a file containing n numbers and prints the average
-of those numbers.
+of those numbers to stdout.
 
 Introductory exercises 
 ----------------------
-Parts 1-6 run locally and serve as examples of the Swift language.
-Parts 7-9 submit jobs via Condor to ATPESC resources
+Parts 1-6 (p1.swift - p6.swift) run locally and serve as examples of the Swift language.
+Parts 7-9 (p7.swift - p9.swift) submit jobs via Cobalt to the Tukey data analysis and visualization cluster.
 
-part01
-~~~~~~
+p1 - Run an application under Swift
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The first swift script, p1.swift, runs simulate.sh to generate a single random 
 number. It writes the number to a file.
 
@@ -102,8 +102,8 @@
 $ ./cleanup.sh
 ------
 
-part02
-~~~~~~
+p2 - Mapping (naming) output files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The second swift script shows an example of naming the file. The output is now 
 in a file called sim.out.
 
@@ -128,8 +128,8 @@
 $ swift p2.swift
 -----
 
-part03
-~~~~~~
+p3 - Parallel loops with foreach
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The p3.swift script introduces the foreach loop. This script runs many 
 simulations. Output files are named here by Swift and will get created 
 in the _concurrent directory.
@@ -156,9 +156,9 @@
 $ swift p3.swift
 ----
 
-part04
-~~~~~~
-Part 4 gives an example of naming multiple files within a foreach loop.
+p4 - Mapping arrays to files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p4.swift gives an example of naming multiple files within a foreach loop.
 
 image:p4.png[]
 
@@ -184,10 +184,10 @@
 
 Output files will be named output/sim_N.out.
 
-part05
-~~~~~~
-Part 5 introduces a postprocessing step. After many simulations have run, the files
-created by simulation.sh will be sent to stats.sh for averaging.
+p5 - Merging/reducing the results of a parallel foreach loop
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p5.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
+created by simulation.sh will be averaged by stats.sh.
 
 image:p5.png[]
 
@@ -224,9 +224,9 @@
 $ swift p5.swift
 ----
 
-part06
-~~~~~~
-Part 6 introduces command line arguments. The script sets a variable called 
+p6 - Sending arguments to applications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p6.swift introduces command line arguments. The script sets a variable called 
 "steps" here, which determines the length of time that the simulation.sh 
 will run for. It also defines a variable called nsim, which determines the
 number of simulations to run. 
@@ -267,12 +267,12 @@
 $ swift p6.swift -steps=3  # each simulation takes 3 seconds
 ----
 
-part07
-~~~~~~
-Part 7 is the first script that will submit jobs to ATPESC via Condor.
-It is similar to earlier scripts, with a few minor exceptions. Since
-there is not a shared filesystems when using OSG, the application simulate.sh
-will get transferred to the worker node by Swift.
+p7 - Running on the Tukey analysis cluster compute nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p7.swift is the first script that will submit jobs to Tukey via the Cobalt scheduler.
+It is similar to earlier scripts, with a few minor exceptions. To generalize the script
+for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh
+will get transferred to the worker node by Swift, in the same manner as any other input data file.
 
 image:p7.png[]
 
@@ -314,9 +314,9 @@
 $ swift p7.swift
 ----
 
-part08
-~~~~~~
-Part 8 will also stage in and run stats.sh to calculate averages. It adds a
+p8 - Running the stats summary step on the Tukey cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p8.swift will also stage in and run stats.sh to calculate averages. It adds a
 trace statement so you can see the order in which things execute.
 
 image:p8.png[]
@@ -372,9 +372,9 @@
 $ swift p8.swift
 ----
 
-part09
-~~~~~~
-Part 9 adds another app function called genrand. Genrand will produce a random
+p9 - A more complex workflow pattern: multiple parallel pipelines
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p9.swift adds another app function called genrand. Genrand will produce a random
 number that will be used to determine how long each simulation app will run.
 
 image:p9.png[]
@@ -440,33 +440,68 @@
 $ swift p9.swift
 ----
 
-part10
-~~~~~~
-p10.swift is exactly the same as p9.swift. Instead of the swift script,
-take a look at the sites.xml configuration file.
-The sites.xml file determines where swift runs its job at. Here the
-line with the condor requirement to select nodes from the ATPESC seeder
-cluster is left un-commented to select that site.
+Running Swift scripts on Cloud resources
+----------------------------------------
 
+Setting up the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On the head node, these are preconfigured:
+* Java runtime environment (JRE)
+* Swift
+* the Java and Swift bin directories on the PATH
+
+Required if you run your own app:
+* your app input data (if any)
+
+On the worker node, these are preconfigured:
+* perl
+
+Required if you run your own app:
+* your app executables and dependencies (if any)
+
+Run the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~
+
+* Change to cloud dir:
 -----
-<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
+   cd ~/cloud
 -----
+* Copy the private key tutorial.pem to your .ssh dir:
+-----
+   cp tutorial.pem ~/.ssh/
+-----
+* Source the setup script on command line:
+-----
+   source ./setup
+-----
+* Run the catsn Swift script:
+-----
+   ./run.catsn
+-----
+* Run the Cloud versions of the Swift scripts p7.swift, p8.swift, and p9.swift:
+-----
+   swift -sites.file sites.xml -config cf -tc.file tc p7.swift
+   swift -sites.file sites.xml -config cf -tc.file tc p8.swift
+   swift -sites.file sites.xml -config cf -tc.file tc p9.swift
+-----
+* Finally, to clean up the log files, kill the agent, and shut down the coaster service:
+-----
+   ./cleanme
+-----
 
-The condor requirements for selecting nodes from ATPESC seeder, ITS
-Virtualization lab, Open Science Grid and Atlas Midwest Tier 2 (at UC,
-IU, UIUC) are present in the sites.xml file.
-To choose any of these sites, simply uncomment the requirement line
-for the target system and run the swift script as:
+Notes on the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The run.catsn shell script contains the full command line to call Swift scripts with configuration files. This script runs swift as follows:
 
-To run:
-----
-$ cd part10
-$ swift p10.swift
-----
+swift -sites.file sites.xml -tc.file tc -config cf catsn.swift -n=10
 
-Once the script completes, run the script find_host.sh to find where
-the jobs were run.
+To learn more about the configuration files, see the Swift User Guide:
+http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html
 
------
-./find_host.sh
------
+
+Running Swift/T on Vesta with Python and R integration
+------------------------------------------------------
+
+Running MPI apps under Swift
+----------------------------
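
The argument handling that the revised README describes for the two mock applications (runtime, range, biasfile, scale, n, with each value of the form R * scale + bias, and an average printed to stdout) could be sketched roughly as follows. This is not the tutorial's actual simulate.sh or stats.sh: the function names mock_simulate and mock_stats, the default values, and the one-number-per-line file layout are assumptions made for illustration only.

```shell
#!/bin/bash
# Hypothetical sketch; NOT the tutorial's simulate.sh / stats.sh.

# mock_simulate RUNTIME RANGE BIASFILE SCALE N
# The defaults below are assumptions chosen for illustration.
mock_simulate() {
    local runtime=${1:-0} range=${2:-100} biasfile=${3:-} scale=${4:-1} n=${5:-1}
    local bias=0 i=0
    # Read the bias from the file if one was given and is readable.
    if [ -n "$biasfile" ] && [ -r "$biasfile" ]; then
        read -r bias < "$biasfile"
    fi
    bias=${bias:-0}
    sleep "$runtime"
    while [ "$i" -lt "$n" ]; do
        # bash's RANDOM is in 0..32767; the modulo limits it to [0, range-1].
        echo $(( (RANDOM % range) * scale + bias ))
        i=$(( i + 1 ))
    done
}

# mock_stats FILE: print the average of the numbers in FILE (one per line).
mock_stats() {
    awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }' "$1"
}
```

For example, after sourcing these definitions, `mock_simulate 0 10 "" 1 5 > sim.out` followed by `mock_stats sim.out` would write five values in [0,9] to sim.out and then print their average.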



