[Swift-commit] r6710 - SwiftTutorials/tukey
davidk at ci.uchicago.edu
Fri Aug 2 11:46:15 CDT 2013
Author: davidk
Date: 2013-08-02 11:46:15 -0500 (Fri, 02 Aug 2013)
New Revision: 6710
Added:
SwiftTutorials/tukey/README
Log:
Start of readme asciidoc - needs to be converted from uc3 to tukey
Added: SwiftTutorials/tukey/README
===================================================================
--- SwiftTutorials/tukey/README (rev 0)
+++ SwiftTutorials/tukey/README 2013-08-02 16:46:15 UTC (rev 6710)
@@ -0,0 +1,472 @@
+Swift UC3 mini-tutorial
+=======================
+
+Installing UC3 tutorial
+-----------------------
+
+Check out scripts from SVN
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+To check out the most recent UC3 tutorial scripts from SVN, run the following
+command:
+
+-----
+$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/uc3
+-----
+
+This will create a directory called uc3 which contains all of the scripts
+mentioned in this document.
+
+Run setup
+~~~~~~~~~
+Once the scripts are checked out, run the following commands to perform
+the initial setup.
+
+-----
+$ cd uc3 # change to the newly created uc3 directory
+$ source setup.sh # sets swift config files in $HOME/.swift
+$ swift -version # verify that Swift 0.94 is in your $PATH and functional
+-----
+
+NOTE: If you disconnect from the machine, you will need to re-run source setup.sh.
+
+Overview of the applications
+----------------------------
+Two shell scripts are included that together act as a mock science
+application: simulate.sh and stats.sh.
+
+simulate.sh
+~~~~~~~~~~~
+The simulate.sh script generates and prints a random number. It optionally
+takes the following arguments:
+
+.simulate.sh arguments
+[options="header"]
+|=======================
+|Argument number|Description
+|1 |runtime. Set how long simulate.sh should run, in seconds.
+|2 |range. Limit random numbers to a given range.
+|3 |biasfile. Look up a number contained in this file to set the bias.
+|4 |scale. Scale the random number by this factor.
+|5 |n. Generate n random numbers.
+|=======================
+
+With no arguments, simulate.sh prints 1 number in the range of 1-100.
+-----
+$ ./simulate.sh
+96
+-----
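
As a rough illustration of what the range and n arguments control, here is a minimal shell sketch. It is not the tutorial's simulate.sh: the function name mock_simulate and its two-argument order (range, then n) are assumptions made for this example only.

```shell
# Hypothetical sketch, NOT the tutorial's simulate.sh.
# Usage: mock_simulate [range] [n]
# Prints n random integers, one per line, each between 1 and range.
mock_simulate() {
    range=${1:-100}   # upper bound of the generated numbers (default 100)
    n=${2:-1}         # how many numbers to print (default 1)
    # The seed mixes the shell PID and the current time, so two calls made
    # within the same second from the same shell repeat the same numbers.
    awk -v range="$range" -v n="$n" -v seed="$(( $$ + $(date +%s) ))" 'BEGIN {
        srand(seed)
        for (i = 0; i < n; i++) print int(rand() * range) + 1
    }'
}
```

Calling `mock_simulate` with no arguments prints a single number between 1 and 100, which mirrors the default behavior described above; `mock_simulate 10 5` prints five numbers between 1 and 10.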
+
+stats.sh
+~~~~~~~~
+The stats.sh script reads a file containing n numbers and prints the average
+of those numbers.
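
The averaging step can be pictured with a short shell sketch. This is an assumption about how such a script could work, not the contents of the tutorial's stats.sh; the function name average_files is hypothetical.

```shell
# Hypothetical stand-in for the averaging step, NOT the tutorial's stats.sh.
# Prints the arithmetic mean of the first field of every non-empty line
# in the given files (or standard input when no file is given).
average_files() {
    awk 'NF { sum += $1; n++ } END { if (n > 0) print sum / n }' "$@"
}
```

For example, given a file holding 10, 20 and 30 on separate lines, `average_files` prints 20. Several files can be passed at once, which is the shape of the call made when many simulation outputs are averaged together.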
+
+Overview of the Swift scripts
+------------------------------
+Parts 1-6 run locally and serve as examples of the Swift language.
+Parts 7-10 submit jobs via Condor to UC3 resources.
+
+part01
+~~~~~~
+The first swift script, p1.swift, runs simulate.sh to generate a single random
+number. It writes the number to a file.
+
+image:p1.png[]
+
+.p1.swift
+-----
+type file;
+
+app (file o) mysim ()
+{
+ simulate stdout=@filename(o);
+}
+
+file f = mysim();
+-----
+
+To run this script, run the following command:
+-----
+$ cd part01
+$ swift p1.swift
+-----
+
+The simulate application gets translated to simulate.sh within the 'apps' file.
+
+NOTE: Since the output file is not given a name, Swift will generate a random
+name for it in a directory called _concurrent. To view the created
+output, run "cat _concurrent/*".
+
+To clean up the directory and remove all outputs, run:
+-----
+$ ./cleanup.sh
+-----
+
+part02
+~~~~~~
+The second Swift script shows how to name the output file. The output is now
+written to a file called sim.out.
+
+image:p2.png[]
+
+.p2.swift
+-----
+type file;
+
+app (file o) mysim ()
+{
+ simulate stdout=@filename(o);
+}
+
+file f <"sim.out">;
+f = mysim();
+-----
+
+To run the script:
+-----
+$ cd part02
+$ swift p2.swift
+-----
+
+part03
+~~~~~~
+The p3.swift script introduces the foreach loop. This script runs many
+simulations. Because the output files are not named explicitly, Swift names
+them and creates them in the _concurrent directory.
+
+image:p3.png[]
+
+.p3.swift
+----
+type file;
+
+app (file o) mysim ()
+{
+ simulate stdout=@filename(o);
+}
+
+foreach i in [0:9] {
+ file f = mysim();
+}
+----
+
+To run:
+----
+$ cd part03
+$ swift p3.swift
+----
+
+part04
+~~~~~~
+Part 4 gives an example of naming multiple files within a foreach loop.
+
+image:p4.png[]
+
+.p4.swift
+----
+type file;
+
+app (file o) mysim ()
+{
+ simulate stdout=@filename(o);
+}
+
+foreach i in [0:9] {
+ file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ f = mysim();
+}
+----
+
+To run:
+----
+$ cd part04
+$ swift p4.swift
+----
+
+Output files will be named output/sim_N.out.
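
The [0:9] range in the foreach loop is inclusive, so the loop body runs for i = 0 through 9 and ten files are produced. The shell sketch below only prints the names that the mapper expression would generate; the function name sim_output_names is hypothetical and is not part of the tutorial.

```shell
# Hypothetical helper, not part of the tutorial: lists the file names that
# @strcat("output/sim_",i,".out") yields for i = 0 .. last (inclusive).
sim_output_names() {
    last=${1:-9}      # final index of the inclusive range (default 9)
    i=0
    while [ "$i" -le "$last" ]; do
        printf 'output/sim_%s.out\n' "$i"
        i=$((i + 1))
    done
}
```

With the default argument, the first name printed is output/sim_0.out and the last is output/sim_9.out.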
+
+part05
+~~~~~~
+Part 5 introduces a postprocessing step. After all of the simulations have run,
+the files created by simulate.sh are passed to stats.sh for averaging.
+
+image:p5.png[]
+
+.p5.swift
+----
+type file;
+
+app (file o) mysim ()
+{
+ simulate stdout=@filename(o);
+}
+
+app (file o) analyze (file s[])
+{
+ stats @filenames(s) stdout=@filename(o);
+}
+
+file sims[];
+
+int nsim = @toInt(@arg("nsim","10"));
+
+foreach i in [0:nsim-1] {
+ file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ simout = mysim();
+ sims[i] = simout;
+}
+
+file stats<"output/average.out">;
+stats = analyze(sims);
+----
+
+To run:
+----
+$ cd part05
+$ swift p5.swift
+----
+
+part06
+~~~~~~
+Part 6 passes a command line argument through to the application. The script
+defines a variable called "steps", which determines how long each run of
+simulate.sh takes, in seconds. It also defines a variable called nsim, which
+determines the number of simulations to run.
+
+image:p6.png[]
+
+.p6.swift
+----
+type file;
+
+app (file o) mysim (int timesteps)
+{
+ simulate timesteps stdout=@filename(o);
+}
+
+app (file o) analyze (file s[])
+{
+ stats @filenames(s) stdout=@filename(o);
+}
+
+file sims[];
+int nsim = @toInt(@arg("nsim","10"));
+int steps = @toInt(@arg("steps","1"));
+
+foreach i in [0:nsim-1] {
+ file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ simout = mysim(steps);
+ sims[i] = simout;
+}
+
+file stats<"output/average.out">;
+stats = analyze(sims);
+----
+
+Use the command below to specify the time for each simulation.
+----
+$ cd part06
+$ swift p6.swift -steps=3 # each simulation takes 3 seconds
+----
+
+part07
+~~~~~~
+Part 7 is the first script that submits jobs to UC3 via Condor.
+It is similar to the earlier scripts, with a few minor differences. Since
+there is no shared filesystem when using OSG, Swift transfers the
+simulate.sh application to the worker node.
+
+image:p7.png[]
+
+.p7.swift
+-----
+type file;
+
+# Application to be called by this script
+
+file simulation_script <"simulate.sh">;
+
+# app() functions for application programs to be called:
+
+app (file out) simulation (file script, int timesteps, int sim_range)
+{
+ sh @filename(script) timesteps sim_range stdout=@filename(out);
+}
+
+# Command line params to this script:
+
+int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
+int range = @toInt(@arg("range", "100")); # range of the generated random numbers
+
+# Main script and data
+
+int steps=3;
+
+tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);
+
+foreach i in [0:nsim-1] {
+ file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ simout = simulation(simulation_script, steps, range);
+}
+-----
+
+To run:
+----
+$ cd part07
+$ swift p7.swift
+----
+
+part08
+~~~~~~
+Part 8 will also stage in and run stats.sh to calculate averages. It adds a
+trace statement so you can see the order in which things execute.
+
+image:p8.png[]
+
+.p8.swift
+-----
+type file;
+
+# Applications to be called by this script
+
+file simulation_script <"simulate.sh">;
+file analysis_script <"stats.sh">;
+
+# app() functions for application programs to be called:
+
+app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
+{
+ sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
+}
+
+app (file out) analyze (file script, file s[])
+{
+ sh @script @filenames(s) stdout=@filename(out);
+}
+
+# Command line params to this script:
+
+int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
+int steps = @toInt(@arg("steps", "1")); # number of "steps" per simulation (== seconds of runtime)
+int range = @toInt(@arg("range", "100")); # range of the generated random numbers
+int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation
+
+# Main script and data
+
+tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);
+
+file sims[]; # Array of files to hold each simulation output
+file bias<"bias.dat">; # Input data file to "bias" the numbers:
+ # 1 line: scale offset ( N = n*scale + offset)
+foreach i in [0:nsim-1] {
+ file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ simout = simulation(simulation_script, steps, range, bias, 100000, count);
+ sims[i] = simout;
+}
+
+file stats<"output/stats.out">; # Final output file: average of all "simulations"
+stats = analyze(analysis_script,sims);
+-----
+
+To run:
+----
+$ cd part08
+$ swift p8.swift
+----
+
+part09
+~~~~~~
+Part 9 adds another app function called genrand. Genrand will produce a random
+number that will be used to determine how long each simulation app will run.
+
+image:p9.png[]
+
+.p9.swift
+-----
+type file;
+
+# Applications to be called by this script
+
+file simulation_script <"simulate.sh">;
+file analysis_script <"stats.sh">;
+
+# app() functions for application programs to be called:
+
+app (file out) genrand (file script, int timesteps, int sim_range)
+{
+ sh @filename(script) timesteps sim_range stdout=@filename(out);
+}
+
+app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
+{
+ sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
+}
+
+app (file out) analyze (file script, file s[])
+{
+ sh @script @filenames(s) stdout=@filename(out);
+}
+
+# Command line params to this script:
+int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
+int range = @toInt(@arg("range", "100")); # range of the generated random numbers
+int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation
+
+# Main script and data
+
+tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);
+
+file bias<"dynamic_bias.dat">; # Dynamically generated bias for simulation ensemble
+
+bias = genrand(simulation_script, 1, 1000);
+
+file sims[]; # Array of files to hold each simulation output
+
+foreach i in [0:nsim-1] {
+
+ int steps = readData(genrand(simulation_script, 1, 5));
+ tracef(" for simulation[%i] steps=%i\n", i, steps+1);
+
+ file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
+ simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
+ sims[i] = simout;
+}
+
+file stats<"output/stats.out">; # Final output file: average of all "simulations"
+stats = analyze(analysis_script,sims);
+-----
+
+To run:
+----
+$ cd part09
+$ swift p9.swift
+----
+
+part10
+~~~~~~
+p10.swift is exactly the same as p9.swift. Instead of the Swift script,
+take a look at the sites.xml configuration file.
+The sites.xml file determines where Swift runs its jobs. Here, the
+line with the Condor requirement that selects nodes from the UC3 seeder
+cluster is left uncommented, so that site is used.
+
+-----
+<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
+-----
+
+The Condor requirements for selecting nodes from the UC3 seeder, the ITS
+Virtualization lab, the Open Science Grid, and the Atlas Midwest Tier 2 (at UC,
+IU, UIUC) are all present in the sites.xml file.
+To choose one of these sites, uncomment the requirement line
+for the target system and run the Swift script.
+
+To run:
+----
+$ cd part10
+$ swift p10.swift
+----
+
+Once the script completes, run the script find_host.sh to find where
+the jobs were run.
+
+-----
+./find_host.sh
+-----