[Swift-commit] r6747 - SwiftTutorials/ATPESC_2013-08-06
wilde at ci.uchicago.edu
Mon Aug 5 12:22:33 CDT 2013
Author: wilde
Date: 2013-08-05 12:22:32 -0500 (Mon, 05 Aug 2013)
New Revision: 6747
Modified:
SwiftTutorials/ATPESC_2013-08-06/README
Log:
Adjusted section names, added cloud README material, made sections for Swift/T and MPI.
Modified: SwiftTutorials/ATPESC_2013-08-06/README
===================================================================
--- SwiftTutorials/ATPESC_2013-08-06/README 2013-08-05 15:51:37 UTC (rev 6746)
+++ SwiftTutorials/ATPESC_2013-08-06/README 2013-08-05 17:22:32 UTC (rev 6747)
@@ -1,8 +1,8 @@
ATPESC 2013 Workflow Tutorial Exercises
=======================================
-Installing ATPESC tutorial
---------------------------
+Workflow tutorial setup
+-----------------------
Check out scripts from SVN
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -29,28 +29,28 @@
NOTE: If you disconnect from the machine, you will need to re-run source setup.sh.
-Overview of the applications
-----------------------------
-There are two shell scripts included that act as a mock science application:
+Mock "science applications" for the workflow tutorial
+-----------------------------------------------------
+There are two shell scripts included that serve as very simple stand-ins for science applications:
simulation.sh and stats.sh
simulation.sh
~~~~~~~~~~~~~
-The simulation.sh script generates and prints a random number. It optionally
-takes the following arguments:
+The simulation.sh script is a simple substitute for a scientific simulation application. It generates and prints a set of one or more random integers in the range 0-29,999 as controlled by its optional arguments, which are:
.simulation.sh arguments
[options="header"]
|=======================
|Argument number|Description
|1 |runtime. Set how long simulation.sh should run, in seconds.
-|2 |range. Limit random numbers to a given range.
-|3 |biasfile. Look a number contained within this file to set bias.
-|4 |scale. Scale random number by this factor.
+|2 |range. Limit random numbers to the range [0,range-1].
+|3 |biasfile. Adds the integer contained in this file to each random number generated.
+|4 |scale. Multiplies each random number by this integer argument.
|5 |n. Generate n number of random numbers.
|=======================
-With no arguments, simulate.sh prints 1 number in the range of 1-100.
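+With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form R * scale + bias, where R is a random integer limited by the range argument and bias is the integer read from biasfile.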
+
-----
$ ./simulate.sh
96
@@ -59,15 +59,15 @@
stats.sh
~~~~~~~~
The stats.sh script reads a file containing n numbers and prints the average
-of those numbers.
+of those numbers to stdout.
Introductory exercises
----------------------
-Parts 1-6 run locally and serve as examples of the Swift language.
-Parts 7-9 submit jobs via Condor to ATPESC resources
+Parts 1-6 (p1.swift - p6.swift) run locally and serve as examples of the Swift language.
+Parts 7-9 (p7.swift - p9.swift) submit jobs via Cobalt to the Tukey data analysis and visualization cluster.
-part01
-~~~~~~
+p1 - Run an application under Swift
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first swift script, p1.swift, runs simulate.sh to generate a single random
number. It writes the number to a file.
@@ -102,8 +102,8 @@
$ ./cleanup.sh
------
-part02
-~~~~~~
+p2 - Mapping (naming) output files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The second swift script shows an example of naming the file. The output is now
in a file called sim.out.
@@ -128,8 +128,8 @@
$ swift p2.swift
-----
-part03
-~~~~~~
+p3 - Parallel loops with foreach
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The p3.swift script introduces the foreach loop. This script runs many
simulations. Output files are named here by Swift and will get created
in the _concurrent directory.
@@ -156,9 +156,9 @@
$ swift p3.swift
----
-part04
-~~~~~~
-Part 4 gives an example of naming multiple files within a foreach loop.
+p4 - Mapping arrays to files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p4.swift gives an example of naming multiple files within a foreach loop.
image:p4.png[]
@@ -184,10 +184,10 @@
Output files will be named output/sim_N.out.
-part05
-~~~~~~
-Part 5 introduces a postprocessing step. After many simulations have run, the files
-created by simulation.sh will be sent to stats.sh for averaging.
+p5 - Merging/reducing the results of a parallel foreach loop
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p5.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
+created by simulation.sh will be averaged by stats.sh.
image:p5.png[]
@@ -224,9 +224,9 @@
$ swift p5.swift
----
-part06
-~~~~~~
-Part 6 introduces command line arguments. The script sets a variable called
+p6 - Sending arguments to applications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p6.swift introduces command line arguments. The script sets a variable called
"steps" here, which determines the length of time that the simulation.sh
will run for. It also defines a variable called nsim, which determines the
number of simulations to run.
@@ -267,12 +267,12 @@
$ swift p6.swift -steps=3 # each simulation takes 3 seconds
----
-part07
-~~~~~~
-Part 7 is the first script that will submit jobs to ATPESC via Condor.
-It is similar to earlier scripts, with a few minor exceptions. Since
-there is not a shared filesystems when using OSG, the application simulate.sh
-will get transferred to the worker node by Swift.
+p7 - Running on the Tukey analysis cluster compute nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p7.swift is the first script that will submit jobs to Tukey via the Cobalt scheduler.
+It is similar to earlier scripts, with a few minor exceptions. To generalize the script
+for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh
+will get transferred to the worker node by Swift, in the same manner as any other input data file.
image:p7.png[]
@@ -314,9 +314,9 @@
$ swift p7.swift
----
-part08
-~~~~~~
-Part 8 will also stage in and run stats.sh to calculate averages. It adds a
+p8 - Running the stats summary step on the Tukey cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p8.swift will also stage in and run stats.sh to calculate averages. It adds a
trace statement so you can see the order in which things execute.
image:p8.png[]
@@ -372,9 +372,9 @@
$ swift p8.swift
----
-part09
-~~~~~~
-Part 9 adds another app function called genrand. Genrand will produce a random
+p9 - A more complex workflow pattern: multiple parallel pipelines
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p9.swift adds another app function called genrand. Genrand will produce a random
number that will be used to determine how long each simulation app will run.
image:p9.png[]
@@ -440,33 +440,68 @@
$ swift p9.swift
----
-part10
-~~~~~~
-p10.swift is exactly the same as p9.swift. Instead of the swift script,
-take a look at the sites.xml configuration file.
-The sites.xml file determines where swift runs its job at. Here the
-line with the condor requirement to select nodes from the ATPESC seeder
-cluster is left un-commented to select that site.
+Running Swift scripts on Cloud resources
+----------------------------------------
+Setting up the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On the head node, these are preconfigured:
+* Java runtime environment (jre)
+* Swift
+* The Java and Swift bin directories on your PATH
+
+Required if you run your own app:
+* your app input data (if any)
+
+On the worker nodes, these are preinstalled:
+* perl
+
+Required if you run your own app:
+* your app executables and dependencies (if any)
+
+Run the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~
+
+* Change to the cloud directory:
-----
-<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
+ cd ~/cloud
-----
+* Copy the private key tutorial.pem to your .ssh dir:
+-----
+ cp tutorial.pem ~/.ssh/
+-----
+* Source the setup script on the command line:
+-----
+ source ./setup
+-----
+* Run the catsn Swift script:
+-----
+ ./run.catsn
+-----
+* Run the Cloud versions of the Swift scripts p7, p8, and p9.swift:
+-----
+ swift -sites.file sites.xml -config cf -tc.file tc p7.swift
+ swift -sites.file sites.xml -config cf -tc.file tc p8.swift
+ swift -sites.file sites.xml -config cf -tc.file tc p9.swift
+-----
+* Finally, to clean up the log files, kill the agent, and shut down the coaster service:
+-----
+ ./cleanme
+-----
-The condor requirements for selecting nodes from ATPESC seeder, ITS
-Virtualization lab, Open Science Grid and Atlas Midwest Tier 2 (at UC,
-IU, UIUC) are present in the sites.xml file.
-To choose any of these sites, simply uncomment the requirement line
-for the target system and run the swift script as:
+Notes on the Cloud exercises
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The run.catsn shell script contains the full command line needed to run a Swift script with its configuration files. It runs Swift as follows:
-To run:
-----
-$ cd part10
-$ swift p10.swift
-----
+swift -sites.file sites.xml -tc.file tc -config cf catsn.swift -n=10
-Once the script completes, run the script find_host.sh to find where
-the jobs were run.
+To learn more about the configuration files, see the Swift User Guide:
+http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html
------
-./find_host.sh
------
+
+Running Swift/T on Vesta with Python and R integration
+------------------------------------------------------
+
+Running MPI apps under Swift
+----------------------------
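
Editor's sketch (not part of the commit): the revised README text above describes the simulation script's output as n numbers of the form R * scale + bias, with R limited to [0,range-1]. The snippet below is a minimal illustration of that formula only. The function name gen_values and its argument order are hypothetical, and it is not the tutorial's actual simulation.sh, which also takes a runtime argument and reads the bias from a file.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not the tutorial's script).
# Usage: gen_values RANGE BIAS SCALE N
# Prints N integers of the form R * SCALE + BIAS, with R drawn from [0, RANGE-1].
gen_values() {
  awk -v range="$1" -v bias="$2" -v scale="$3" -v n="$4" 'BEGIN {
    srand()                      # seed from the clock; calls within the same second repeat
    for (i = 0; i < n; i++) {
      r = int(rand() * range)    # rand() is in [0,1), so r is in [0, range-1]
      print r * scale + bias
    }
  }'
}

# Example: 5 values, each a multiple of 10 offset by 1000
gen_values 100 1000 10 5
```

With these example arguments every printed value lies between 1000 and 1990 and differs from 1000 by a multiple of 10, which is a quick way to check that scale and bias are applied in the stated order.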