[Swift-commit] r6985 - SwiftTutorials/swift-cray-tutorial/doc

davidk at ci.uchicago.edu
Fri Aug 23 15:45:04 CDT 2013


Author: davidk
Date: 2013-08-23 15:45:04 -0500 (Fri, 23 Aug 2013)
New Revision: 6985

Modified:
   SwiftTutorials/swift-cray-tutorial/doc/README
Log:
Crayify readme


Modified: SwiftTutorials/swift-cray-tutorial/doc/README
===================================================================
--- SwiftTutorials/swift-cray-tutorial/doc/README	2013-08-23 19:40:06 UTC (rev 6984)
+++ SwiftTutorials/swift-cray-tutorial/doc/README	2013-08-23 20:45:04 UTC (rev 6985)
@@ -1,5 +1,5 @@
-Swift CIC Tutorial - 2013.0827
-==============================
+Swift Cray Tutorial
+===================
 
 ////
 
@@ -9,22 +9,16 @@
 
 p1 - Run an application under Swift
 
-p2 - Mapping (naming) output files
+p2 - Parallel loops with foreach
 
-p3 - Parallel loops with foreach
+p3 - Merging/reducing the results of a parallel foreach loop
 
-p4 - Mapping arrays to files
+p4 - Running on the remote site nodes
 
-p5 - merging/reducing the results of a parallel foreach loop
+p5 - Running the stats summary step on the remote site
 
-p6 - Sending arguments to applications
+p6 - Add additional apps for generating seeds remotely 
 
-p7 - Running on the remote site nodes
-
-p8 - Running the stats summary step on the remote site
-
-p9 - A more complex workflow pattern: multiple parallel pipelines
-
 ////
 
 
@@ -33,28 +27,35 @@
 Workflow tutorial setup
 -----------------------
 
-Check out scripts from SVN
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-To checkout the most recent ATPESC tutorial scripts from SVN, run the following
-command:
+Installing scripts on Raven
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you are installing these scripts on Raven, run the following command to extract the tutorial scripts:
 
 -----
-$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/CIC_2013-08-09 tutorial
+$ tar xfz /home/users/p01537/swift-cray-tutorial.tar.gz
 -----
 
 This will create a directory called "tutorial" which contains all of the
 scripts mentioned in this document.
 
+Installing scripts on other systems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you are running on a machine other than Raven, you can install the scripts via SVN.
+
+-----
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/swift-cray-tutorial
+-----
+
 Run setup
 ~~~~~~~~~
 Once the scripts are checked out, run the following commands to perform
 the initial setup.
 
 -----
-$ cd tutorial            # change to the newly created tutorial directory
-$ source setup.sh <SITE> # sets swift config files in $HOME/.swift
-$ java -version          # verify that you have Oracle JAVA (prefered; 1.6 or later)
-$ swift -version         # verify that Swift 0.94 is in your $PATH and functional
+$ cd swift-cray-tutorial   # change to the newly created tutorial directory
+$ source setup.sh          # sets swift config files in $HOME/.swift
+$ java -version            # verify that you have Oracle JAVA (preferred; 1.6 or later)
+$ swift -version           # verify that Swift 0.94 is in your $PATH and functional
 -----
 
 NOTE: If you re-login, you will need to re-run source setup.sh.
@@ -65,37 +66,96 @@
 simulation.sh and stats.sh
 
 simulation.sh
-~~~~~~~~~~~~~
+~~~~~~~~~~~~
 The simulation.sh script serves as a trivial substitute for a complex scientific simulation application. It generates and prints a set of one or more random integers in the range [0-2^32) as controlled by its optional arguments, which are:
 
+-----
+$ ./app/simulate.sh --help
+./app/simulate.sh: usage:
+    -b|--bias       offset bias: add this integer to all results
+    -B|--biasfile   file of integer biases to add to results
+    -l|--log        generate a log in stderr if not null
+    -n|--nvalues    print this many values per simulation            
+    -r|--range      range (limit) of generated results
+    -s|--seed       use this integer [0..32767] as a seed
+    -S|--seedfile   use this file (containing integer seeds [0..32767]) one per line
+    -t|--timesteps  number of simulated "timesteps" in seconds (determines runtime)
+    -x|--scale      scale the results by this integer
+    -h|-?|?|--help  print this help
+$ 
+-----
+
+////
 .simulation.sh arguments
 [width="80%",cols="^2,10",options="header"]
 
 |=======================
-|Argument number|Description
+|Argument|Short|Description
 |1    |runtime: sets run time of simulation.sh in seconds
 |2    |range: limits generated values to the range [0,range-1]
 |3    |biasfile: add the integer contained in this file to each value generated
 |4    |scale: multiplies each generated value by this integer
 |5    |count: number of values to generate in the simulation
 |=======================
+////
 
-With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer. 
+With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer. By default it logs information about its execution environment to stderr.  Here are some examples of its usage:
 
 -----
-$ ./simulate.sh
-96
+$ simulate.sh 2>log
+       5
+$ head -5 log
+
+Called as: /home/users/p01537/swift-cray-tutorial/app/simulate.sh: 
+
+Start time: Fri Aug 23 15:07:16 CDT 2013
+Running on node: raven
+
+$ tail -5 log
+SSH_CLIENT=67.173.156.31 46887 22
+SSH_CONNECTION=67.173.156.31 46887 128.135.158.173 22
+SSH_TTY=/dev/pts/9
+TERM=xterm-color
+USER=wilde
+$ 
+
+$ simulate.sh -n 3 -r 1000000 2>log
+  386702
+  239454
+   13849
+
+$ simulate.sh -n 3 -r 1000000 -x 100 2>log
+ 6643700
+62182300
+ 5230600
+
+$ simulate.sh -n 3 -r 1000 -x 1000 2>log
+  565000
+  636000
+  477000
+
+$ time simulate.sh -n 3 -r 1000 -x 1000 -t 3 2>log
+  336000
+   20000
+  320000
+real    0m3.012s
+user    0m0.005s
+sys     0m0.006s
+$ 
+
+
 -----
 
+
 stats.sh
-~~~~~~~~
+~~~~~~~
 The stats.sh script reads a file containing n numbers and prints the average
 of those numbers to stdout.
 
 Introductory exercises
 ----------------------
-Parts 1-6 (p1.swift - p6.swift) run locally and serve as examples of the Swift language.
-Parts 7-9 (p7.swift - p9.swift) submit jobs to the site specified the setup stage
+Parts 1-3 (p1.swift - p3.swift) run locally and serve as examples of the Swift language.
+Parts 4-6 (p4.swift - p6.swift) submit jobs to the site specified in the setup stage.
 
 p1 - Run an application under Swift
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -106,16 +166,27 @@
 
 .p1.swift
 -----
-type file;
+sys::[cat -n ../part01/p1.swift]
+-----
 
-app (file o) mysim ()
-{
-  simulate stdout=@filename(o);
-}
+The sites.xml file included in each part directory gives Swift information about the machines we will be running on.
+It defines things like the work directory, the scheduler to use, and how to control parallelism. The sites.xml file
+below will tell Swift to run on the local machine only, and run just 1 task at a time.
 
-file f = mysim();
+.sites.xml
 -----
+sys::[cat -n ../part01/sites.xml]
+-----
 
+The app file translates from a Swift app function to the path of an executable on the file system. 
+In this case, it translates from "simulate" to simulate.sh and assumes that simulate.sh will 
+be available in your $PATH.
+
+.apps
+-----
+sys::[cat -n ../part01/apps]
+-----
+
 To run this script, run the following command:
 -----
 $ cd part01
@@ -133,24 +204,17 @@
 $ ./clean.sh
 ------
 
-p2 - Mapping (naming) output files
+p2 - Parallel loops with foreach
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The second swift script shows an example of naming the file. The output is now
-in a file called sim.out.
+The p2.swift script introduces the foreach loop. This script runs many
+simulations. The script also shows an example of naming the files. The output files
+are now called sim_N.out.
 
 image:p2.png[]
 
 .p2.swift
 -----
-type file;
-
-app (file o) mysim ()
-{
-  simulate stdout=@filename(o);
-}
-
-file f <"sim.out">;
-f = mysim();
+sys::[cat -n ../part02/p2.swift]
 -----
 
 To run the script:
@@ -159,26 +223,16 @@
 $ swift p2.swift
 -----
 
-p3 - Parallel loops with foreach
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The p3.swift script introduces the foreach loop. This script runs many
-simulations. Output files are named here by Swift and will get created
-in the _concurrent directory.
+p3 - Merging/reducing the results of a parallel foreach loop
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p3.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
+created by simulation.sh will be averaged by stats.sh.
 
 image:p3.png[]
 
 .p3.swift
 ----
-type file;
-
-app (file o) mysim ()
-{
-  simulate stdout=@filename(o);
-}
-
-foreach i in [0:9] {
-  file f = mysim();
-}
+sys::[cat -n ../part03/p3.swift]
 ----
 
 To run:
@@ -187,25 +241,18 @@
 $ swift p3.swift
 ----
 
-p4 - Mapping arrays to files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p4.swift gives an example of naming multiple files within a foreach loop.
+p4 - Running on the remote site nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p4.swift is the first script that will submit jobs to remote site nodes for analysis.
+It is similar to earlier scripts, with a few minor exceptions. To generalize the script
+for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh
+will get transferred to the worker node by Swift, in the same manner as any other input data file.
 
 image:p4.png[]
 
 .p4.swift
 ----
-type file;
-
-app (file o) mysim ()
-{
-  simulate stdout=@filename(o);
-}
-
-foreach i in [0:9] {
-  file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  f = mysim();
-}
+sys::[cat -n ../part04/p4.swift]
 ----
 
 To run:
@@ -215,39 +262,17 @@
 
 Output files will be named output/sim_N.out.
 
-p5 - merging/reducing the results of a parallel foreach loop
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p5 - Running the stats summary step on the remote site
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 p5.swift introduces a postprocessing step. After all the parallel simulations have completed, the files
-created by simulation.sh will be averaged by stats.sh.
+created by simulation.sh will be averaged by stats.sh. This is similar to p3, but all app invocations 
+are done on remote nodes with Swift managing file transfers. 
 
 image:p5.png[]
 
 .p5.swift
 ----
-type file;
-
-app (file o) mysim ()
-{
-  simulate stdout=@filename(o);
-}
-
-app (file o) analyze (file s[])
-{
-  stats @filenames(s) stdout=@filename(o);
-}
-
-file sims[];
-
-int nsim = @toInt(@arg("nsim","10"));
-
-foreach i in [0:nsim-1] {
-  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  simout = mysim();
-  sims[i] = simout;
-}
-
-file stats<"output/average.out">;
-stats = analyze(sims);
+sys::[cat -n ../part05/p5.swift]
 ----
 
 To run:
@@ -255,227 +280,41 @@
 $ swift p5.swift
 ----
 
-p6 - Sending arguments to applications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p6.swift introduces command line arguments. The script sets a variable called
-"steps" here, which determines the length of time that the simulation.sh
-will run for. It also defines a variable called nsim, which determines the
-number of simulations to run.
+p6 - Add additional apps and randomness
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+p6.swift builds on p5.swift, but adds new apps for generating a random
+seed and a random bias value. 
 
 image:p6.png[]
 
 .p6.swift
 ----
-type file;
-
-app (file o) mysim (int timesteps)
-{
-  simulate timesteps stdout=@filename(o);
-}
-
-app (file o) analyze (file s[])
-{
-  stats @filenames(s) stdout=@filename(o);
-}
-
-file sims[];
-int  nsim = @toInt(@arg("nsim","10"));
-int steps = @toInt(@arg("steps","1"));
-
-foreach i in [0:nsim-1] {
-  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  simout = mysim(steps);
-  sims[i] = simout;
-}
-
-file stats<"output/average.out">;
-stats = analyze(sims);
+sys::[cat -n ../part06/p6.swift]
 ----
 
-Use the command below to specify the time for each simulation.
-----
-$ cd ../part06
-$ swift p6.swift -steps=3  # each simulation takes 3 seconds
-----
+In order to run on the cluster, sites.xml needed to be modified. Here is 
+the new sites.xml we are using for this example. Note the changes between the sites.xml file
+in this example which uses condor, and the sites.xml file in part 1, which runs locally.
 
-p7 - Running on the remote site nodes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p7.swift is the first script that will submit jobs to remote site nodes for analysis.
-It is similar to earlier scripts, with a few minor exceptions. To generalize the script
-for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh
-will get transferred to the worker node by Swift, in the same manner as any other input data file.
-
-image:p7.png[]
-
-.p7.swift
+.sites.xml
 -----
-type file;
-
-# Application to be called by this script
-
-file simulation_script <"simulate.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) simulation (file script, int timesteps, int sim_range)
-{
-  sh @filename(script) timesteps sim_range stdout=@filename(out);
-}
-
-# Command line params to this script:
-
-int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
-int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
-
-# Main script and data
-
-int steps=3;
-
-tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);
-
-foreach i in [0:nsim-1] {
-  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  simout = simulation(simulation_script, steps, range);
-}
+sys::[cat -n ../part06/sites.xml]
 -----
 
-To run:
-----
-$ cd ../part07
-$ swift p7.swift
-----
+Below is the updated apps file. Since Swift is staging shell scripts remotely to nodes on the cluster,
+the only application it needs defined here is the shell.
 
-p8 - Running the stats summary step on the remote site
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p8.swift will also stage in and run stats.sh to calculate averages. It adds a
-trace statement so you can see the order in which things execute.
-
-image:p8.png[]
-
-.p8.swift
+.apps
 -----
-type file;
-
-# Applications to be called by this script
-
-file simulation_script <"simulate.sh">;
-file analysis_script   <"stats.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
-{
-  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
-}
-
-app (file out) analyze (file script, file s[])
-{
-  sh @script @filenames(s) stdout=@filename(out);
-}
-
-# Command line params to this script:
-
-int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
-int  steps = @toInt(@arg("steps", "1"));   # number of "steps" each simulation (==seconds of runtime)
-int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
-int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation
-
-# Main script and data
-
-tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);
-
-file sims[];                               # Array of files to hold each simulation output
-file bias<"bias.dat">;                     # Input data file to "bias" the numbers:
-                                           # 1 line: scale offset ( N = n*scale + offset)
-foreach i in [0:nsim-1] {
-  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  simout = simulation(simulation_script, steps, range, bias, 100000, count);
-  sims[i] = simout;
-}
-
-file stats<"output/stats.out">;         # Final output file: average of all "simulations"
-stats = analyze(analysis_script,sims);
+sys::[cat -n ../part06/apps]
 -----
 
-To run:
+Use the command below to specify the time for each simulation.
 ----
-$ cd ../part08
-$ swift p8.swift
+$ cd ../part06
+$ swift p6.swift -steps=3  # each simulation takes 3 seconds
 ----
 
-p9 - A more complex workflow pattern: multiple parallel pipelines
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-p9.swift adds another app function called genrand. Genrand will produce a random
-number that will be used to determine how long each simulation app will run.
-
-image:p9.png[]
-
-.p9.swift
------
-type file;
-
-# Applications to be called by this script
-
-file simulation_script <"simulate.sh">;
-file analysis_script   <"stats.sh">;
-
-# app() functions for application programs to be called:
-
-app (file out) genrand (file script, int timesteps, int sim_range)
-{
-  sh @filename(script) timesteps sim_range stdout=@filename(out);
-}
-
-app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
-{
-  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
-}
-
-app (file out) analyze (file script, file s[])
-{
-  sh @script @filenames(s) stdout=@filename(out);
-}
-
-# Command line params to this script:
-int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
-int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
-int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation
-
-# Main script and data
-
-tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);
-
-file bias<"dynamic_bias.dat">;        # Dynamically generated bias for simulation ensemble
-
-bias = genrand(simulation_script, 1, 1000);
-
-file sims[];                               # Array of files to hold each simulation output
-
-foreach i in [0:nsim-1] {
-
-  int steps = readData(genrand(simulation_script, 1, 5));
-  tracef("  for simulation[%i] steps=%i\n", i, steps+1);
-
-  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
-  simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
-  sims[i] = simout;
-}
-
-file stats<"output/stats.out">;            # Final output file: average of all "simulations"
-stats = analyze(analysis_script,sims);
------
-
-To run:
-----
-$ cd ../part09
-$ swift p9.swift
-----
-
-
-
-
-
-
 Running Swift scripts on Cloud resources
 ----------------------------------------
 

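The README text in this revision describes simulate.sh as generating n values of the form (R*scale)+bias, and stats.sh as printing the average of the numbers it reads. The source of those two scripts is not part of this diff, so the following is only a minimal shell sketch of the described behavior. The function names `sim_values` and `average` and their positional arguments are hypothetical and do not reflect the real scripts' options.

```shell
#!/bin/sh
# Hypothetical sketch only: NOT the tutorial's simulate.sh or stats.sh.

# sim_values N RANGE SCALE BIAS SEED
# Print N values of the form (R*SCALE)+BIAS, where R is a pseudo-random
# integer in [0, RANGE). The argument order is an assumption for this sketch.
sim_values() {
    awk -v n="$1" -v range="$2" -v scale="$3" -v bias="$4" -v seed="$5" '
        BEGIN {
            srand(seed)
            for (i = 0; i < n; i++) {
                r = int(rand() * range)
                print r * scale + bias
            }
        }'
}

# average [FILE...]
# Print the mean of the numbers (one per line) in the given files,
# or from stdin when no file is given.
average() {
    awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }' "$@"
}
```

For example, `sim_values 3 1000 1000 0 42 > sim_0.out` followed by `average sim_0.out` would mimic one simulation and one summary step locally, without Swift.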


