[Swift-commit] r6985 - SwiftTutorials/swift-cray-tutorial/doc
davidk at ci.uchicago.edu
Fri Aug 23 15:45:04 CDT 2013
Author: davidk
Date: 2013-08-23 15:45:04 -0500 (Fri, 23 Aug 2013)
New Revision: 6985
Modified:
SwiftTutorials/swift-cray-tutorial/doc/README
Log:
Crayify readme
Modified: SwiftTutorials/swift-cray-tutorial/doc/README
===================================================================
--- SwiftTutorials/swift-cray-tutorial/doc/README 2013-08-23 19:40:06 UTC (rev 6984)
+++ SwiftTutorials/swift-cray-tutorial/doc/README 2013-08-23 20:45:04 UTC (rev 6985)
@@ -1,5 +1,5 @@
-Swift CIC Tutorial - 2013.0827
-==============================
+Swift Cray Tutorial
+===================
////
@@ -9,22 +9,16 @@
p1 - Run an application under Swift
-p2 - Mapping (naming) output files
+p2 - Parallel loops with foreach
-p3 - Parallel loops with foreach
+p3 - Merging/reducing the results of a parallel foreach loop
-p4 - Mapping arrays to files
+p4 - Running on the remote site nodes
-p5 - merging/reducing the results of a parallel foreach loop
+p5 - Running the stats summary step on the remote site
-p6 - Sending arguments to applications
+p6 - Add additional apps for generating seeds remotely
-p7 - Running on the remote site nodes
-
-p8 - Running the stats summary step on the remote site
-
-p9 - A more complex workflow pattern: multiple parallel pipelines
-
////
@@ -33,28 +27,35 @@
Workflow tutorial setup
-----------------------
-Check out scripts from SVN
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-To checkout the most recent ATPESC tutorial scripts from SVN, run the following
-command:
+Installing scripts on Raven
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you are installing these scripts on Raven, run the following command to extract the tutorial scripts:
-----
-$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/CIC_2013-08-09 tutorial
+$ tar xfz /home/users/p01537/swift-cray-tutorial.tar.gz
-----
This will create a directory called "swift-cray-tutorial", which contains all of the
scripts mentioned in this document.
+Installing scripts on other systems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you are running on a machine other than Raven, you can install the scripts via SVN.
+
+-----
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/swift-cray-tutorial
+-----
+
Run setup
~~~~~~~~~
Once the scripts are installed, run the following commands to perform
the initial setup.
-----
-$ cd tutorial # change to the newly created tutorial directory
-$ source setup.sh <SITE> # sets swift config files in $HOME/.swift
-$ java -version # verify that you have Oracle JAVA (prefered; 1.6 or later)
-$ swift -version # verify that Swift 0.94 is in your $PATH and functional
+$ cd swift-cray-tutorial # change to the newly created tutorial directory
+$ source setup.sh # sets Swift config files in $HOME/.swift
+$ java -version # verify that you have Oracle Java (preferred; 1.6 or later)
+$ swift -version # verify that Swift 0.94 is in your $PATH and functional
-----
NOTE: If you log in again, you will need to re-run source setup.sh.
@@ -65,37 +66,96 @@
simulate.sh and stats.sh
simulate.sh
-~~~~~~~~~~~~~
+~~~~~~~~~~~~
The simulate.sh script serves as a trivial substitute for a complex scientific simulation application. It generates and prints a set of one or more random integers in the range [0, 2^32), as controlled by its optional arguments, which are:
+-----
+$ ./app/simulate.sh --help
+./app/simulate.sh: usage:
+ -b|--bias offset bias: add this integer to all results
+ -B|--biasfile file of integer biases to add to results
+ -l|--log generate a log in stderr if not null
+ -n|--nvalues print this many values per simulation
+ -r|--range range (limit) of generated results
+ -s|--seed use this integer [0..32767] as a seed
+ -S|--seedfile use this file (containing integer seeds [0..32767]) one per line
+ -t|--timesteps number of simulated "timesteps" in seconds (determines runtime)
+ -x|--scale scale the results by this integer
+ -h|-?|?|--help print this help
+$
+-----
+
+////
.simulation.sh arguments
[width="80%",cols="^2,10",options="header"]
|=======================
-|Argument number|Description
+|Argument|Short|Description
|1 |runtime: sets run time of simulation.sh in seconds
|2 |range: limits generated values to the range [0,range-1]
|3 |biasfile: add the integer contained in this file to each value generated
|4 |scale: multiplies each generated value by this integer
|5 |count: number of values to generate in the simulation
|=======================
+////
-With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer.
+With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias, where R is a random integer. By default it logs information about its execution environment to stderr. Here are some examples of its usage:
-----
-$ ./simulate.sh
-96
+$ simulate.sh 2>log
+ 5
+$ head -5 log
+
+Called as: /home/users/p01537/swift-cray-tutorial/app/simulate.sh:
+
+Start time: Fri Aug 23 15:07:16 CDT 2013
+Running on node: raven
+
+$ tail -5 log
+SSH_CLIENT=67.173.156.31 46887 22
+SSH_CONNECTION=67.173.156.31 46887 128.135.158.173 22
+SSH_TTY=/dev/pts/9
+TERM=xterm-color
+USER=wilde
+$
+
+$ simulate.sh -n 3 -r 1000000 2>log
+ 386702
+ 239454
+ 13849
+
+$ simulate.sh -n 3 -r 1000000 -x 100 2>log
+ 6643700
+62182300
+ 5230600
+
+$ simulate.sh -n 3 -r 1000 -x 1000 2>log
+ 565000
+ 636000
+ 477000
+
+$ time simulate.sh -n 3 -r 1000 -x 1000 -t 3 2>log
+ 336000
+ 20000
+ 320000
+real 0m3.012s
+user 0m0.005s
+sys 0m0.006s
+$
+
+
-----
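
The value formula described above, (R*scale)+bias, can be illustrated with a short bash sketch. This is a hypothetical stand-in, not the source of simulate.sh. The function name `gen_values` is invented here. The sketch also assumes bash's built-in `$RANDOM` (0..32767) as the random source, so unlike simulate.sh it cannot produce R values above 32767.

```shell
#!/bin/bash
# Hypothetical sketch of the (R*scale)+bias formula; NOT the actual simulate.sh code.
# Assumption: bash's $RANDOM (0..32767) is the random source, so R never exceeds 32767.
# Arguments: count, range, scale, bias
gen_values() {
  local n=$1 range=$2 scale=$3 bias=$4
  local i r
  for ((i = 0; i < n; i++)); do
    r=$(( RANDOM % range ))          # R in [0, range-1]
    echo $(( r * scale + bias ))     # value = (R*scale)+bias
  done
}
```

For example, `gen_values 3 1000 1000 0` prints three multiples of 1000 below 1000000. This loosely mirrors `simulate.sh -n 3 -r 1000 -x 1000` above.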
+
stats.sh
-~~~~~~~~
+~~~~~~~
The stats.sh script reads a file containing n numbers and prints the average
of those numbers to stdout.
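
For illustration, the averaging that stats.sh performs can be approximated with awk. This is a hypothetical sketch, not the tutorial's stats.sh. The function name `average` is invented here. The sketch assumes one number per line and accepts one or more files.

```shell
#!/bin/bash
# Hypothetical stand-in for stats.sh: prints the average of every number
# found in the given files (assumption: one number per line).
average() {
  awk '
    NF > 0 { sum += $1; n++ }        # skip blank lines
    END {
      if (n == 0) exit 1             # no data: fail rather than divide by zero
      print sum / n
    }' "$@"
}
```

Usage might look like `average output/sim_*.out`.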
Introductory exercises
----------------------
-Parts 1-6 (p1.swift - p6.swift) run locally and serve as examples of the Swift language.
-Parts 7-9 (p7.swift - p9.swift) submit jobs to the site specified the setup stage
+Parts 1-3 (p1.swift - p3.swift) run locally and serve as examples of the Swift language.
+Parts 4-6 (p4.swift - p6.swift) submit jobs to the remote site described by the sites.xml file in each part's directory.
p1 - Run an application under Swift
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -106,16 +166,27 @@
.p1.swift
-----
-type file;
+sys::[cat -n ../part01/p1.swift]
+-----
-app (file o) mysim ()
-{
- simulate stdout=@filename(o);
-}
+The sites.xml file included in each part directory gives Swift information about the machines we will be running on.
+It defines things like the work directory, the scheduler to use, and how to control parallelism. The sites.xml file
+below will tell Swift to run on the local machine only, and run just 1 task at a time.
-file f = mysim();
+.sites.xml
-----
+sys::[cat -n ../part01/sites.xml]
+-----
+The apps file maps the name of a Swift app function to the path of an executable on the file system.
+In this case, it maps "simulate" to simulate.sh and assumes that simulate.sh will
+be available in your $PATH.
+
+.apps
+-----
+sys::[cat -n ../part01/apps]
+-----
+
To run this script, run the following command:
-----
$ cd part01
@@ -133,24 +204,17 @@
$ ./clean.sh
------
-p2 - Mapping (naming) output files
+p2 - Parallel loops with foreach
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The second swift script shows an example of naming the file. The output is now
-in a file called sim.out.
+p2.swift introduces the foreach loop, which runs many simulations in parallel.
+It also shows an example of naming output files: the output files are now
+called sim_N.out.
image:p2.png[]
.p2.swift
-----
-type file;
-
-app (file o) mysim ()
-{
- simulate stdout=@filename(o);
-}
-
-file f <"sim.out">;
-f = mysim();
+sys::[cat -n ../part02/p2.swift]
-----
To run the script:
@@ -159,26 +223,16 @@
$ swift p2.swift
-----
-p3 - Parallel loops with foreach
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The p3.swift script introduces the foreach loop. This script runs many
-simulations. Output files are named here by Swift and will get created
-in the _concurrent directory.
+p3 - Merging/reducing the results of a parallel foreach loop
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p3.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
+created by simulate.sh are averaged by stats.sh.
image:p3.png[]
.p3.swift
----
-type file;
-
-app (file o) mysim ()
-{
- simulate stdout=@filename(o);
-}
-
-foreach i in [0:9] {
- file f = mysim();
-}
+sys::[cat -n ../part03/p3.swift]
----
To run:
@@ -187,25 +241,18 @@
$ swift p3.swift
----
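
The foreach-then-reduce pattern of p3 can be compared with a plain bash version. In bash, you must manage the parallelism and the wait for all simulations yourself. This is only an analogy under stated assumptions:

* `fake_sim` and `run_and_reduce` are hypothetical names.
* `fake_sim` stands in for simulate.sh.
* Everything runs as local background jobs.

Swift, by contrast, derives the same ordering automatically from the data dependency between the simulation outputs and the stats step.

```shell
#!/bin/bash
# Hypothetical bash analogy of p3: run simulations in parallel, then reduce.
# fake_sim stands in for simulate.sh (assumption); it prints one number in [0, 99].
fake_sim() { echo $(( RANDOM % 100 )); }

# Arguments: number of simulations, output directory
run_and_reduce() {
  local nsim=$1 outdir=$2 i
  mkdir -p "$outdir"
  for ((i = 0; i < nsim; i++)); do
    fake_sim > "$outdir/sim_$i.out" &    # fan out: one background job per simulation
  done
  wait                                   # fan in: block until every simulation has exited
  awk '{ sum += $1; n++ } END { print sum / n }' \
    "$outdir"/sim_*.out > "$outdir/average.out"
}
```

For example, `run_and_reduce 10 output` writes output/sim_0.out through output/sim_9.out and then output/average.out.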
-p4 - Mapping arrays to files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p4.swift gives an example of naming multiple files within a foreach loop.
+p4 - Running on the remote site nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p4.swift is the first script that submits jobs to remote site nodes.
+It is similar to the earlier scripts, with a few minor exceptions. To generalize the script
+for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), Swift transfers the application
+simulate.sh to the worker node in the same manner as any other input data file.
image:p4.png[]
.p4.swift
----
-type file;
-
-app (file o) mysim ()
-{
- simulate stdout=@filename(o);
-}
-
-foreach i in [0:9] {
- file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- f = mysim();
-}
+sys::[cat -n ../part04/p4.swift]
----
To run:
@@ -215,39 +262,17 @@
Output files will be named output/sim_N.out.
-p5 - merging/reducing the results of a parallel foreach loop
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p5 - Running the stats summary step on the remote site
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
p5.swift adds the same postprocessing step as p3: after all the parallel simulations have completed, the files
-created by simulation.sh will be averaged by stats.sh.
+created by simulate.sh are averaged by stats.sh. Unlike p3, all app invocations
+run on remote nodes, with Swift managing the file transfers.
image:p5.png[]
.p5.swift
----
-type file;
-
-app (file o) mysim ()
-{
- simulate stdout=@filename(o);
-}
-
-app (file o) analyze (file s[])
-{
- stats @filenames(s) stdout=@filename(o);
-}
-
-file sims[];
-
-int nsim = @toInt(@arg("nsim","10"));
-
-foreach i in [0:nsim-1] {
- file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- simout = mysim();
- sims[i] = simout;
-}
-
-file stats<"output/average.out">;
-stats = analyze(sims);
+sys::[cat -n ../part05/p5.swift]
----
To run:
@@ -255,227 +280,41 @@
$ swift p5.swift
----
-p6 - Sending arguments to applications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p6.swift introduces command line arguments. The script sets a variable called
-"steps" here, which determines the length of time that the simulation.sh
-will run for. It also defines a variable called nsim, which determines the
-number of simulations to run.
+p6 - Add additional apps and randomness
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p6.swift builds on p5.swift, but adds new apps for generating a random
+seed and a random bias value.
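
As a rough illustration of what a seed-generating app might produce, the sketch below writes seeds in the format simulate.sh's `-S` option accepts. Per its help text, that format is integer seeds in [0..32767], one per line. The function name `gen_seeds` is hypothetical, and this is not the app used by p6.swift.

```shell
#!/bin/bash
# Hypothetical seed generator; not the app p6.swift actually uses.
# Prints n seeds, one per line. bash's $RANDOM already yields 0..32767,
# the seed range listed in simulate.sh's help for -s/-S.
gen_seeds() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    echo "$RANDOM"
  done
}
```

For example, `gen_seeds 10 > seeds.txt` creates a file that could then be passed as `simulate.sh -S seeds.txt`.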
image:p6.png[]
.p6.swift
----
-type file;
-
-app (file o) mysim (int timesteps)
-{
- simulate timesteps stdout=@filename(o);
-}
-
-app (file o) analyze (file s[])
-{
- stats @filenames(s) stdout=@filename(o);
-}
-
-file sims[];
-int nsim = @toInt(@arg("nsim","10"));
-int steps = @toInt(@arg("steps","1"));
-
-foreach i in [0:nsim-1] {
- file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- simout = mysim(steps);
- sims[i] = simout;
-}
-
-file stats<"output/average.out">;
-stats = analyze(sims);
+sys::[cat -n ../part06/p6.swift]
----
-Use the command below to specify the time for each simulation.
-----
-$ cd ../part06
-$ swift p6.swift -steps=3 # each simulation takes 3 seconds
-----
+To run on the cluster, sites.xml must be modified. Here is
+the sites.xml used for this example. Note the differences between this sites.xml file,
+which uses Condor, and the sites.xml file in part 1, which runs locally.
-p7 - Running on the remote site nodes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p7.swift is the first script that will submit jobs to remote site nodes for analysis.
-It is similar to earlier scripts, with a few minor exceptions. To generalize the script
-for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh
-will get transferred to the worker node by Swift, in the same manner as any other input data file.
-
-image:p7.png[]
-
-.p7.swift
+.sites.xml
-----
-type file;
-
-# Application to be called by this script
-
-file simulation_script <"simulate.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) simulation (file script, int timesteps, int sim_range)
-{
- sh @filename(script) timesteps sim_range stdout=@filename(out);
-}
-
-# Command line params to this script:
-
-int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
-int range = @toInt(@arg("range", "100")); # range of the generated random numbers
-
-# Main script and data
-
-int steps=3;
-
-tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);
-
-foreach i in [0:nsim-1] {
- file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- simout = simulation(simulation_script, steps, range);
-}
+sys::[cat -n ../part06/sites.xml]
-----
-To run:
-----
-$ cd ../part07
-$ swift p7.swift
-----
+Below is the updated apps file. Since Swift stages the shell scripts to the cluster nodes itself,
+the only application that needs to be defined here is the shell.
-p8 - Running the stats summary step on the remote site
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p8.swift will also stage in and run stats.sh to calculate averages. It adds a
-trace statement so you can see the order in which things execute.
-
-image:p8.png[]
-
-.p8.swift
+.apps
-----
-type file;
-
-# Applications to be called by this script
-
-file simulation_script <"simulate.sh">;
-file analysis_script <"stats.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
-{
- sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
-}
-
-app (file out) analyze (file script, file s[])
-{
- sh @script @filenames(s) stdout=@filename(out);
-}
-
-# Command line params to this script:
-
-int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
-int steps = @toInt(@arg("steps", "1")); # number of "steps" each simulation (==seconds of runtime)
-int range = @toInt(@arg("range", "100")); # range of the generated random numbers
-int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation
-
-# Main script and data
-
-tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);
-
-file sims[]; # Array of files to hold each simulation output
-file bias<"bias.dat">; # Input data file to "bias" the numbers:
- # 1 line: scale offset ( N = n*scale + offset)
-foreach i in [0:nsim-1] {
- file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- simout = simulation(simulation_script, steps, range, bias, 100000, count);
- sims[i] = simout;
-}
-
-file stats<"output/stats.out">; # Final output file: average of all "simulations"
-stats = analyze(analysis_script,sims);
+sys::[cat -n ../part06/apps]
-----
-To run:
+Use the command below to specify the time for each simulation.
----
-$ cd ../part08
-$ swift p8.swift
+$ cd ../part06
+$ swift p6.swift -steps=3 # each simulation takes 3 seconds
----
-p9 - A more complex workflow pattern: multiple parallel pipelines
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p9.swift adds another app function called genrand. Genrand will produce a random
-number that will be used to determine how long each simulation app will run.
-
-image:p9.png[]
-
-.p9.swift
------
-type file;
-
-# Applications to be called by this script
-
-file simulation_script <"simulate.sh">;
-file analysis_script <"stats.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) genrand (file script, int timesteps, int sim_range)
-{
- sh @filename(script) timesteps sim_range stdout=@filename(out);
-}
-
-app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
-{
- sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
-}
-
-app (file out) analyze (file script, file s[])
-{
- sh @script @filenames(s) stdout=@filename(out);
-}
-
-# Command line params to this script:
-int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
-int range = @toInt(@arg("range", "100")); # range of the generated random numbers
-int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation
-
-# Main script and data
-
-tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);
-
-file bias<"dynamic_bias.dat">; # Dynamically generated bias for simulation ensemble
-
-bias = genrand(simulation_script, 1, 1000);
-
-file sims[]; # Array of files to hold each simulation output
-
-foreach i in [0:nsim-1] {
-
- int steps = readData(genrand(simulation_script, 1, 5));
- tracef(" for simulation[%i] steps=%i\n", i, steps+1);
-
- file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
- simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
- sims[i] = simout;
-}
-
-file stats<"output/stats.out">; # Final output file: average of all "simulations"
-stats = analyze(analysis_script,sims);
------
-
-To run:
-----
-$ cd ../part09
-$ swift p9.swift
-----
-
-
-
-
-
-
Running Swift scripts on Cloud resources
----------------------------------------