[Swift-commit] r4377 - www/cookbook

ketan at ci.uchicago.edu ketan at ci.uchicago.edu
Fri Apr 15 10:31:59 CDT 2011


Author: ketan
Date: 2011-04-15 10:31:59 -0500 (Fri, 15 Apr 2011)
New Revision: 4377

Modified:
   www/cookbook/cookbook-asciidoc.html
   www/cookbook/cookbook-asciidoc.txt
Log:
more content: need to refine

Modified: www/cookbook/cookbook-asciidoc.html
===================================================================
--- www/cookbook/cookbook-asciidoc.html	2011-04-15 15:12:56 UTC (rev 4376)
+++ www/cookbook/cookbook-asciidoc.html	2011-04-15 15:31:59 UTC (rev 4377)
@@ -581,11 +581,21 @@
 <div class="sect1">
 <h2 id="_overview">1. Overview</h2>
 <div class="sectionbody">
-<div class="paragraph"><p>Swift cookbook overview. Goals of this cookbook. Organization of this cookbook. Benefits of cookbook.</p></div>
-<div class="paragraph"><p>This cookbook covers various recipes involving running Swift under diverse configurations based on the application requirements and underlying infrastructures. The SwiftScript language and the Swift runtim system. For introductory material, consult the Swift tutorial.</p></div>
-<div class="paragraph"><p>Swift is a data-oriented coarse grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and procedural composition.</p></div>
+<div class="paragraph"><p>Swift cookbook overview. Goals of this cookbook. Organization of this
+cookbook. Benefits of cookbook.</p></div>
+<div class="paragraph"><p>This cookbook covers various recipes for running Swift under diverse
+configurations, based on application requirements and the underlying
+infrastructure. It addresses both the SwiftScript language and the Swift runtime system. For
+introductory material, consult the Swift tutorial.</p></div>
+<div class="paragraph"><p>Swift is a data-oriented, coarse-grained scripting language that supports
+dataset typing and mapping, dataset iteration, conditional branching, and
+procedural composition.</p></div>
 <div class="paragraph"><p>Swift programs (or workflows) are written in a language called SwiftScript.</p></div>
-<div class="paragraph"><p>SwiftScript programs are dataflow oriented - they are primarily concerned with processing (possibly large) collections of data files, by invoking programs to do that processing. Swift handles execution of such programs on remote sites by choosing sites, handling the staging of input and output files to and from the chosen sites and remote execution of program code.</p></div>
+<div class="paragraph"><p>SwiftScript programs are dataflow oriented - they are primarily concerned with
+processing (possibly large) collections of data files, by invoking programs to
+do that processing. Swift handles execution of such programs on remote sites
+by choosing sites, handling the staging of input and output files to and from
+the chosen sites and remote execution of program code.</p></div>
 </div>
 </div>
 <div class="sect1">
@@ -594,7 +604,34 @@
 <div class="sect2">
 <h3 id="_installation">2.1. Installation</h3>
 <div class="paragraph"><p>Installation instructions</p></div>
+<div class="sect3">
+<h4 id="_prerequisites">2.1.1. Prerequisites</h4>
+<div class="paragraph"><p>Check your Java.
+Swift is a Java application. Make sure you’re running Java 5 or higher. If
+Java is listed in your $HOME/.soft file, the softenv system
+(<a href="http://www.ci.uchicago.edu/wiki/bin/view/Resources/Softenv">http://www.ci.uchicago.edu/wiki/bin/view/Resources/Softenv</a>) will set it up
+for you. To run Java 6:</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>$ grep java $HOME/.soft
+#+java-sun # Gives you Java 5
++java-1.6.0_03-sun-r1
+$ which java
+/soft/java-1.6.0_11-sun-r1/bin/java
+$ java -version
+java version "1.6.0_11"
+Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
+Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)</tt></pre>
+</div></div>
+<div class="paragraph"><p>Setting up to run Swift.
+This is simple. We’ll be using a version of the Swift stable SVN branch,
+compiled for this class. Make sure you have a suitable Java set up. The examples were tested with
+Java version 1.6. Make sure you don’t already have Swift in your PATH. If you do, remove it,
+or remove any +swift or @swift lines from your $HOME/.soft file. Then do: PATH=$PATH:/home/wilde/bigdata/swift/bin
+ Do NOT set SWIFT_HOME or CLASSPATH in your environment unless you fully
+understand how these will affect Swift’s execution.</p></div>
 </div>
+</div>
 <div class="sect2">
 <h3 id="_environment_setup">2.2. Environment Setup</h3>
 <div class="paragraph"><p>Setting up the environment</p></div>
@@ -607,6 +644,23 @@
 distribution and trunk.</td>
 </tr></table>
 </div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>    To execute your Swift script on the login host ("localhost") use this
+command:</tt></pre>
+</div></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>swift -tc.file tc modis.swift</tt></pre>
+</div></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>To execute your Swift script on the PADS cluster use this command:</tt></pre>
+</div></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>swift -tc.file tc -sites.file pbs.xml modis.swift</tt></pre>
+</div></div>
 <div class="sect3">
 <h4 id="_setting_transformation_catalog">2.2.1. Setting transformation catalog</h4>
 <div class="paragraph"><p>tc</p></div>
@@ -615,11 +669,148 @@
 <h4 id="_setting_swift_configuration">2.2.2. Setting swift configuration</h4>
 <div class="paragraph"><p>cf</p></div>
 </div>
+<div class="sect3">
+<h4 id="_mappers">2.2.3. Mappers</h4>
+<div class="paragraph"><p>SimpleMapper</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>com$ cat swiftapply.swift
+type RFile;
+trace("hi 1");
+app (RFile result) RunR (RFile rcall)
+{
+  RunR @rcall @result;
+}
+trace("hi 2");
+RFile rcalls[] ;
+RFile results[] ;
+trace("start");
+foreach c, i in rcalls {
+  trace("c",i,@c);
+  trace("r",i,@filename(results[i]));
+  results[i] = RunR(c);
+}
+com$ ls calldir resdir
+calldir:
+rcall.1.Rdata  rcall.2.Rdata  rcall.3.Rdata  rcall.4.Rdata
+resdir:
+result.1.Rdata result.2.Rdata result.3.Rdata result.4.Rdata
+com$</tt></pre>
+</div></div>
+<div class="paragraph"><p>Notes:</p></div>
+<div class="ulist"><ul>
+<li>
+<p>
+how the .'s match
+</p>
+</li>
+<li>
+<p>
+prefix and suffix don’t span dirs
+</p>
+</li>
+<li>
+<p>
+intervening pattern must be digits
+</p>
+</li>
+<li>
+<p>
+these digits become the array indices
+</p>
+</li>
+<li>
+<p>
+explain how padding= arg works & helps (including padding=0)
+</p>
+</li>
+<li>
+<p>
+figure out and explain differences between simple_mapper and
+filesys_mapper
+</p>
+</li>
+</ul></div>
+<div class="paragraph"><p>FIXME: Use the "filesys_mapper" and its "location=" parameter to map the
+input data from /home/wilde/bigdata/*</p></div>
+<div class="paragraph"><p>Abbreviations for SingleFileMapper
+Notes:</p></div>
+<div class="paragraph"><p>Within <> you can only have a literal string, as in <"filename">, not an
+expression. Someday we will fix this to make <> accept a general expression.
+You can use @filenames( ) (note: plural) to pull off a list of filenames.</p></div>
+<div class="paragraph"><p>writeData()</p></div>
+<div class="paragraph"><p>example here</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>$ cat writedata.swift
+type file;
+file f <"filea">;
+file nf <"filenames">;
+nf = writeData(@f);
+$ swift writedata.swift
+Swift svn swift-r3264 (swift modified locally) cog-r2730 (cog modified
+locally)
+RunID: 20100319-2002-s9vpo0pe
+Progress:
+Final status:
+$ cat filenames
+filea$
+$</tt></pre>
+</div></div>
+<div class="paragraph"><p>StructuredRegexpMapper
+(IN PROGRESS). This mapper can be used to base the mapped filenames of an output
+array on the mapped filenames of an existing array: landuse outputfiles[]
+<structured_regexp_mapper; source=inputfiles,
+location="./output",match="(.)*tif", transform="\\1histogram">;</p></div>
+<div class="paragraph"><p>Use the undocumented "structured_regexp_mapper" to name the output
+filenames based on the input filenames:</p></div>
+<div class="paragraph"><p>For example:</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>login2$ ls /home/wilde/bigdata/data/sample
+h11v04.histogram  h11v05.histogram  h12v04.histogram  h32v08.histogram
+h11v04.tif        h11v05.tif        h12v04.tif        h32v08.tif
+login2$
+
+login2$ cat regexp2.swift
+type tif;
+type mytype;
+
+tif  images[]<filesys_mapper;
+location="/home/wilde/bigdata/data/sample", prefix="h", suffix=".tif">;
+
+mytype of[] <structured_regexp_mapper; source=images, match="(h..v..)",
+transform="output/myfile.\\1.mytype">;
+
+foreach image, i in images {
+   trace(i,@filename(images));
+   trace(i,@filename(of[i]));
+}
+login2$
+
+login1$ swift regexp2.swift
+Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
+locally)
+
+RunID: 20100310-1105-4okarq08
+Progress:
+SwiftScript trace: 1, output/myfile.h11v04.mytype
+SwiftScript trace: 2, home/wilde/bigdata/data/sample/h11v05.tif
+SwiftScript trace: 3, home/wilde/bigdata/data/sample/h12v04.tif
+SwiftScript trace: 0, output/myfile.h32v08.mytype
+SwiftScript trace: 0, home/wilde/bigdata/data/sample/h32v08.tif
+SwiftScript trace: 3, output/myfile.h12v04.mytype
+SwiftScript trace: 1, home/wilde/bigdata/data/sample/h11v04.tif
+SwiftScript trace: 2, output/myfile.h11v05.mytype
+Final status:
+login1$</tt></pre>
+</div></div>
 </div>
+</div>
 <div class="sect2">
 <h3 id="_first_swiftscript">2.3. First SwiftScript</h3>
 <div class="paragraph"><p>Your first SwiftScript
 Hello Swift-World!</p></div>
+<div class="paragraph"><p>A good sanity check that Swift is set up and running OK locally is this:</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>$ which swift
+
+/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift
+
+$ echo 'trace("Hello, Swift world!");' >hello.swift
+
+$ swift hello.swift
+
+Swift svn swift-r3202 cog-r2682
+
+RunID: 20100115-1240-6xhzxuz3
+
+Progress:
+
+SwiftScript trace: Hello, Swift world!
+
+Final status:
+
+$</tt></pre>
+</div></div>
+<div class="paragraph"><p>A good first tutorial in using Swift is at:
+<a href="http://www.ci.uchicago.edu/swift/guides/tutorial.php">http://www.ci.uchicago.edu/swift/guides/tutorial.php</a>. Follow the steps in that
+tutorial to learn how to run a few simple scripts on the login host.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_second_swiftscript">2.4. second SwiftScript</h3>
@@ -631,8 +822,109 @@
 <div class="paragraph"><p>A description of Swift Commandline Options</p></div>
 <div class="paragraph"><p>Also includes a description of Swift inputs and outputs.</p></div>
 </div>
+<div class="sect2">
+<h3 id="_resuming_a_stopped_or_crashed_swift_run">2.6. Resuming a stopped or crashed Swift Run</h3>
+<div class="paragraph"><p>I had a .rlog file from a Swift run that ran out of time. I kicked it off
+using the -resume flag described in section 16.2 of the Swift User Guide and
+it picked up where it left off. Then I killed it because I wanted to make
+changes to my sites file.</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>. . .
+Progress:  Selecting site:1150  Stage in:55  Active:3  Checking status:1
+Stage out:37  Finished in previous run:2462  Finished successfully:96
+Progress:  Selecting site:1150  Stage in:55  Active:2  Checking status:1
+Stage out:38  Finished in previous run:2462  Finished successfully:96
+Cleaning up...
+Shutting down service at https://192.5.86.6:54813
+Got channel MetaChannel: 1293358091 -> null
++ Done
+Canceling job 9297.svc.pads.ci.uchicago.edu</tt></pre>
+</div></div>
+<div class="paragraph"><p>No new rlog file was emitted, but Swift did recognize the progress that had been
+made (the 96 tasks that finished successfully above) and resumed with 2558 tasks
+finished.</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>[nbest@login2 files]$ pwd
+/home/nbest/bigdata/files
+[nbest@login2 files]$
+~wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift \
+> -tc.file tc -sites.file pbs.xml ~/scripts/mcd12q1.swift -resume
+> mcd12q1-20100310-1326-ptxe1x1d.0.rlog
+Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
+locally)
+RunID: 20100311-1027-148caf0a
+Progress:
+Progress:  uninitialized:4
+Progress:  Selecting site:671  Initializing site shared directory:1  Finished
+in previous run:1864
+Progress:  uninitialized:1  Selecting site:576  Stage in:96  Finished in
+previous run:1864
+Progress:  Selecting site:1150  Stage in:94  Submitting:2  Finished in
+previous run:2558
+Progress:  Selecting site:1150  Stage in:94  Submitted:2  Finished in previous
+run:2558
+Progress:  Selecting site:1150  Stage in:93  Submitting:1  Submitted:2
+Finished in previous run:2558
+Progress:  Selecting site:1150  Stage in:90  Submitting:1  Submitted:5
+Finished in previous run:2558
+Progress:  Selecting site:1150  Stage in:90  Submitted:5  Active:1  Finished
+in previous run:2558</tt></pre>
+</div></div>
+<div class="paragraph"><p>From Neil: A comment about that section of the user guide: It says "In order
+to restart from a restart log file, the -resume logfile argument can be used
+after the SwiftScript program file name." and then puts the -resume logfile
+argument before the script file name. I’m sure the order doesn’t matter but
+the contradiction is confusing.</p></div>
+<div class="paragraph"><p>Notes to add (from Mike):</p></div>
+<div class="ulist"><ul>
+<li>
+<p>
+explain what aspects of a Swift script make it restartable, and which
+  aspects are not restartable. E.g., if your mappers can return different data at
+different times, what happens? What other non-deterministic behavior would
+cause unpredictable, unexpected, or undesired behavior on resumption?
+</p>
+</li>
+<li>
+<p>
+explain what changes you can make in the execution environment (eg
+  increasing or reducing CPUs to run on or throttles, etc); fixing tc.data
+entries, env vars, or apps, etc.
+</p>
+</li>
+<li>
+<p>
+note that resume will again retry failed app() calls. Explain if the retry
+  count starts over or not.
+</p>
+</li>
+<li>
+<p>
+explain how to resume after multiple failures and resumes - i.e. if a .rlog
+  is generated on each run, which one should you resume from? Do you have a
+choice of resuming from any of them, and what happens if you go backwards to
+an older resume file?
+</p>
+</li>
+<li>
+<p>
+what happens when you kill (e.g. with ^C) a running swift script? Is the
+  signal caught, and the resume file written out at that point? Or written out
+all along? (Note the case in which a script was running for hours, then hit ^C, but
+the resume file was short (54 bytes) and swift showed no sign of doing a resume.
+Did it silently ignore the resume file instead of acknowledging that it found one
+with no useful resume state in it?) Swift should clearly state that it is
+resuming and what its resume state is.
+</p>
+</li>
+</ul></div>
+<div class="paragraph"><p><tt>swift -resume ftdock-[id].0.rlog [rest of the exact command line from initial
+run]</tt></p></div>
 </div>
 </div>
+</div>
 <div class="sect1">
 <h2 id="_swift_on_8230">3. Swift on …</h2>
 <div class="sectionbody">
@@ -650,6 +942,8 @@
 <div class="paragraph"><p><strong>step 4.</strong> Start using Swift! To get started with and example, copy the folder at
 /lustre/beagle/ketan/labs/catsn.works to your home and follow the instructions
 in README of that folder.</p></div>
+<div class="paragraph"><p>Note: Running from a sandbox node or requesting 1 hour of walltime for up to 3 nodes
+will get fast, prioritized execution. This is good for small tests.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_intrepid_bg_p">3.2. Intrepid-BG/P</h3>
@@ -666,17 +960,17 @@
 <div class="sect2">
 <h3 id="_bionimbus">3.5. Bionimbus</h3>
 <div class="paragraph"><p>Swift on Bionimbus
-.VERY IMPORTANT TO READ THIS</p></div>
-<div class="paragraph"><p>CONTAINS INFORMATION ON HOW TO CONNECT BACK TO GATEWAY FROM VIRTUAL MACHINES
-USING REVERSE SSH TUNNELING</p></div>
-<div class="paragraph"><p>TO OPEN A REVERSE SSH TUNNEL DO THE FOLLOWING:
-.FROM THE GATEWAY PROMPT</p></div>
+.Very important to read this</p></div>
+<div class="paragraph"><p>Contains information on how to connect back to the gateway from virtual machines
+using reverse SSH tunneling.</p></div>
+<div class="paragraph"><p>To open a reverse SSH tunnel, do the following:
+.From the gateway prompt</p></div>
 <div class="paragraph"><p><tt>ssh -R *:5000:localhost:5000 <a href="mailto:root at 10.101.8.50">root at 10.101.8.50</a> sleep 999</tt></p></div>
 <div class="paragraph"><p>WHERE:
 *=network interface, should remain the same on all cases
 localhost=the gateway host, should remain the same</p></div>
 <div class="paragraph"><p>5000(LEFT OF localhost)=the port number on localhost to listen to **THIS WILL
-VARY DEPENDING UPON WHICH PORT YOU WANT TO LISTEN TO</p></div>
+vary depending upon which port you want to listen to</p></div>
 <div class="paragraph"><p>5000(RIGHT OF localhost)=the port on target host that you want to forward</p></div>
 <div class="paragraph"><p><a href="mailto:root at 10.101.8.50">root at 10.101.8.50</a>=the ip of the Virtual Machine on bionimbus cloud, this will
 vary based on what ip you get for your Virtual Machine instance</p></div>
@@ -696,16 +990,90 @@
 <h2 id="_coasters">4. Coasters</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>Describe coasters mechanisms.
-Include neat diagrams.</p></div>
+<strong>Include neat diagrams.</strong></p></div>
+<div class="paragraph"><p>A nice coasters setup case-study:</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt>Your main sites.xml coaster settings were:
+
+<execution provider="coaster" jobmanager="local:pbs"/>
+<profile namespace="globus" key="project">CI-CCR000013</profile>
+<profile namespace="globus" key="ppn">24:cray:pack</profile>
+<profile namespace="globus" key="workersPerNode">24</profile>
+<profile namespace="globus" key="maxTime">100000</profile>
+<profile namespace="globus" key="lowOverallocation">100</profile>
+<profile namespace="globus" key="highOverallocation">100</profile>
+<profile namespace="globus" key="slots">20</profile>
+<profile namespace="globus" key="nodeGranularity">5</profile>
+<profile namespace="globus" key="maxNodes">5</profile>
+<profile namespace="karajan" key="jobThrottle">20.00</profile>
+<profile namespace="karajan" key="initialScore">10000</profile>
+
+Your tc entry (shortened here) was:
+
+pbs modftdock /.../modftdock.sh null null GLOBUS::maxwalltime="02:00:00"
+
+And you said you saw in PBS: 13 jobs of 24 hours and 4 jobs of 22 hours. I
+suspect this was after the script had been running a while, and many jobs had
+been completed.
+
+Based on your settings, I think you should have had at one time about 17
+coaster block jobs running, because the throttle on your coaster pool was set
+to 20 (which would cause Swift to try to run about 2000 apps at once - 2001 to
+be precise). Since each job should have requested exactly 5 nodes (based on
+your maxNodes=nodeGranularity=5 setting above), Swift would have had to run 17
+jobs to accommodate 2000 apps: 17 * (5*24) = 2040 apps. 24 comes from your
+workersPerNode setting, which is a poorly-named parameter that we are renaming
+to what it really specifies: appsPerNode for concurrent application calls per
+node.
+
+I also suspect that when this workflow started, coasters was requesting
+blocks of time closer to the 100,000 seconds that you specified for maxTime
+(that's ~27 hours). I think the qstat snapshot you provided showed fewer than
+17 jobs and job times shorter than 27 hours (24 and 22 hours) because there
+were no longer enough apps remaining to run to require those higher values. But
+it was still going to try to run all the remaining jobs - probably fewer than
+2000 jobs remained when you ran the enclosed qstat. In fact the number of jobs remaining
+at the time was likely less than:
+
+13*5*(24/2) + 4*5*(22/2) = 65*12 + 20*11 = 1000, signifying <= 1000 jobs remaining
+
+Since the maxwalltime estimate for your app in tc.data was 2 hours, I think
+coasters will pick a wall time that is the min(time needed for jobs remaining,
+time needed for max throttle jobs based on maxtime and high/low overallocation
+settings).
+
+A note here to Swift developers: we need to first clarify the behavior of
+coasters in detail in the User Guide; then we need to build suitable templates
+that *greatly* simplify the settings and end-user parameters, and explain
+those simpler settings for use by all but the most sophisticated users with
+complex needs.
+
+We also need to do much more experimentation to see if coasters will run OK
+with far less parameter-override specification, and see if its automation and
+algorithmic intelligence will do the right thing in almost all cases.
+
+Most of the time in current use we specify overrides for almost all settings
+so that we get a precise shape and number of jobs submitted. Doing that
+assumes we know better than coasters and forces the user to understand how to
+override all the settings.
+
+It's a very interesting question, and a hard but critically important one to
+answer to make usage simpler.</tt></pre>
+</div></div>
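+<div class="paragraph"><p>The job-count arithmetic in the case study above can be checked with shell
+integer arithmetic. This is only a sketch: the variable names are chosen here
+for illustration, and the rule that a jobThrottle of 20 allows 20*100+1
+concurrent apps is taken from the text above rather than from the coaster code.</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><tt># Values copied from the sites.xml settings quoted above
+throttle=20; nodes=5; apps_per_node=24
+apps=$((throttle * 100 + 1))                # 2001 concurrent apps
+per_job=$((nodes * apps_per_node))          # 120 apps per coaster job
+jobs=$(( (apps + per_job - 1) / per_job ))  # ceiling division gives 17
+echo "$apps $per_job $jobs"</tt></pre>
+</div></div>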
 <div class="sect2">
 <h3 id="_for_beginners">4.1. For Beginners</h3>
 <div class="paragraph"><p>Coasters for beginners. Usage of existing, prebuilt templates.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_for_intermediate_users">4.2. For Intermediate Users</h3>
-<div class="paragraph"><p>Coasters for intermediate users. Usage of gensites to generate your own sites
+<div class="paragraph"><p>Coasters for intermediate users.</p></div>
+<div class="sect3">
+<h4 id="_using_gensites">4.2.1. Using gensites</h4>
+<div class="paragraph"><p>Usage of gensites to generate your own sites
 configurations.</p></div>
 </div>
+</div>
 <div class="sect2">
 <h3 id="_for_advanced_users">4.3. For Advanced Users</h3>
 <div class="paragraph"><p>Coasters for advanced users. Getting your hands dirty.</p></div>
@@ -730,14 +1098,88 @@
 <div class="paragraph"><p>A tabular representations of highlights of different coaster setups</p></div>
 <div class="paragraph"><p>Data Management</p></div>
 </div>
+<div class="sect2">
+<h3 id="_debugging_swift">4.6. Debugging Swift</h3>
+<div class="paragraph"><p>Note: The Swift installation at /home/wilde/bigdata/swift includes a
+swift.properties file that has been modified to override the defaults with
+the following property values that are useful for script debugging:</p></div>
+<div class="paragraph"><p>execution.retries=0</p></div>
+<div class="paragraph"><p>sitedir.keep=true</p></div>
+<div class="paragraph"><p>status.mode=provider</p></div>
+<div class="paragraph"><p>wrapperlog.always.transfer=true</p></div>
+<div class="admonitionblock">
+<table><tr>
+<td class="icon">
+<div class="title">Note</div>
+</td>
+<td class="content">How to clone copies of swift.properties and things to watch for
+(like the macros at the start, and unexpected properties in ~/.swift).</td>
+</tr></table>
 </div>
+<div class="paragraph"><p>Swift errors are logged in several places:</p></div>
+<div class="olist arabic"><ol class="arabic">
+<li>
+<p>
+all text from standard output and standard error produced by running the
+swift command
+</p>
+</li>
+<li>
+<p>
+The .log file from this run. It will be named swiftscript.uniqueID.log
+where "swiftscript" is the name of your *.swift script source file, and
+uniqueID is a long unique id which starts with the date and time you ran the
+swift command.
+</p>
+</li>
+<li>
+<p>
+$HOME/.globus/coasters directory on remote machines on which you are
+running coasters
+</p>
+</li>
+<li>
+<p>
+$HOME/.globus/scripts directory on the host on which you run the Swift
+command, when swift is submitting to a local scheduler (Condor, PBS, SGE,
+Cobalt)
+</p>
+</li>
+<li>
+<p>
+$HOME/.globus/??? on remote systems that you access via Globus
+</p>
+</li>
+</ol></div>
 </div>
+<div class="sect2">
+<h3 id="_problem_reporting">4.7. Problem Reporting</h3>
+<div class="paragraph"><p>When reporting problems to <a href="mailto:swift-user at ci.uchicago.edu">swift-user at ci.uchicago.edu</a>, please attach the
+following files and information:</p></div>
+<div class="ulist"><ul>
+<li>
+<p>
+tc.data and sites.xml (or whatever you named these files)
+</p>
+</li>
+<li>
+<p>
+your .swift source file and any .swift files it imports
+</p>
+</li>
+<li>
+<p>
+any external mapper scripts called by your .swift script
+</p>
+</li>
+<li>
+<p>
+all text from standard output and standard error produced by running the
+swift command
+</p>
+</li>
+<li>
+<p>
+the .log file from this run. It will be named swiftscript.uniqueID.log
+where "swiftscript" is the name of your *.swift script source file, and
+uniqueID is a long unique id which starts with the date and time you ran the
+swift command.
+</p>
+</li>
+<li>
+<p>
+the swift command line you invoked
+</p>
+</li>
+<li>
+<p>
+any swift.properties entries you overrode ($HOME/.swift/swift.properties,
+-config.file argument properties file, any changes to etc/swift.properties from
+your swift distribution)
+</p>
+</li>
+<li>
+<p>
+which swift distribution you are running (release; svn revisions; other
+local changes you may have made or included)
+</p>
+</li>
+</ul></div>
 </div>
+</div>
+</div>
+</div>
 <div id="footnotes"><hr /></div>
 <div id="footer">
 <div id="footer-text">
 Version 0.92<br />
-Last updated 2011-04-13 17:54:29 CDT
+Last updated 2011-04-15 10:21:37 CDT
 </div>
 </div>
 </body>

Modified: www/cookbook/cookbook-asciidoc.txt
===================================================================
--- www/cookbook/cookbook-asciidoc.txt	2011-04-15 15:12:56 UTC (rev 4376)
+++ www/cookbook/cookbook-asciidoc.txt	2011-04-15 15:31:59 UTC (rev 4377)
@@ -8,23 +8,62 @@
 
 Overview
 --------
-Swift cookbook overview. Goals of this cookbook. Organization of this cookbook. Benefits of cookbook.
+Swift cookbook overview. Goals of this cookbook. Organization of this
+cookbook. Benefits of cookbook.
 
-This cookbook covers various recipes involving running Swift under diverse configurations based on the application requirements and underlying infrastructures. The SwiftScript language and the Swift runtim system. For introductory material, consult the Swift tutorial.
+This cookbook covers various recipes for running Swift under diverse
+configurations, based on application requirements and the underlying
+infrastructure. It addresses both the SwiftScript language and the Swift runtime system. For
+introductory material, consult the Swift tutorial.
 
-Swift is a data-oriented coarse grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and procedural composition.
+Swift is a data-oriented, coarse-grained scripting language that supports
+dataset typing and mapping, dataset iteration, conditional branching, and
+procedural composition.
 
 Swift programs (or workflows) are written in a language called SwiftScript.
 
-SwiftScript programs are dataflow oriented - they are primarily concerned with processing (possibly large) collections of data files, by invoking programs to do that processing. Swift handles execution of such programs on remote sites by choosing sites, handling the staging of input and output files to and from the chosen sites and remote execution of program code.
+SwiftScript programs are dataflow oriented - they are primarily concerned with
+processing (possibly large) collections of data files, by invoking programs to
+do that processing. Swift handles execution of such programs on remote sites
+by choosing sites, handling the staging of input and output files to and from
+the chosen sites and remote execution of program code.
 
 Swift Basics
 ------------
 
 Installation
 ~~~~~~~~~~~~
+
 Installation instructions
 
+Prerequisites
+^^^^^^^^^^^^^^
+Check your Java.
+Swift is a Java application. Make sure you're running Java 5 or higher. If
+Java is listed in your $HOME/.soft file, the softenv system
+(http://www.ci.uchicago.edu/wiki/bin/view/Resources/Softenv) will set it up
+for you. To run Java 6:
+
+----
+$ grep java $HOME/.soft
+#+java-sun # Gives you Java 5
++java-1.6.0_03-sun-r1
+$ which java
+/soft/java-1.6.0_11-sun-r1/bin/java
+$ java -version
+java version "1.6.0_11"
+Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
+Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
+----
+Setting up to run Swift.
+This is simple. We'll be using a version of the Swift stable SVN branch,
+compiled for this class. Make sure you have a suitable Java set up. The examples were tested with
+Java version 1.6. Make sure you don't already have Swift in your PATH. If you do, remove it,
+or remove any +swift or @swift lines from your $HOME/.soft file. Then do: PATH=$PATH:/home/wilde/bigdata/swift/bin
+ Do NOT set SWIFT_HOME or CLASSPATH in your environment unless you fully
+understand how these will affect Swift's execution.
+
+
 Environment Setup
 ~~~~~~~~~~~~~~~~~
 
@@ -33,6 +72,16 @@
 The environment will be different when using Swift from prebuilt
 distribution and trunk.
 
+To execute your Swift script on the login host ("localhost") use this
+command:
+
+     swift -tc.file tc modis.swift
+
+To execute your Swift script on the PADS cluster use this command:
+
+     swift -tc.file tc -sites.file pbs.xml modis.swift
+
+
 Setting transformation catalog
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -42,11 +91,157 @@
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 cf
 
+Mappers
+^^^^^^^^
+SimpleMapper
+
+----
+com$ cat swiftapply.swift
+type RFile;
+trace("hi 1");
+app (RFile result) RunR (RFile rcall)
+{
+  RunR @rcall @result;
+}
+trace("hi 2");
+RFile rcalls[] ;
+RFile results[] ;
+trace("start");
+foreach c, i in rcalls {
+  trace("c",i,@c);
+  trace("r",i,@filename(results[i]));
+  results[i] = RunR(c);
+}
+com$ ls calldir resdir
+calldir:
+rcall.1.Rdata  rcall.2.Rdata  rcall.3.Rdata  rcall.4.Rdata
+resdir:
+result.1.Rdata result.2.Rdata result.3.Rdata result.4.Rdata
+com$ 
+----
+
+Notes:
+
+* how the .'s match
+* prefix and suffix don't span dirs
+* intervening pattern must be digits
+* these digits become the array indices
+* explain how padding= arg works & helps (including padding=0)
+* figure out and explain differences between simple_mapper and
+  filesys_mapper
+
+FIXME: Use the "filesys_mapper" and its "location=" parameter to map the
+input data from /home/wilde/bigdata/*
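+
+As a rough illustration of the rule noted above, that the digits between
+prefix and suffix become the array index, here is a shell sketch. It is not
+Swift's implementation: index_of is a hypothetical helper, and the prefix
+"rcall." and suffix ".Rdata" are assumed from the directory listing, since the
+mapper declaration itself is not shown.
+
+----
+# Strip the assumed prefix and suffix; what remains must be all digits.
+index_of() {
+  name=${1#rcall.}
+  name=${name%.Rdata}
+  case $name in
+    ''|*[!0-9]*) return 1 ;;        # empty or non-digit: no array index
+    *) printf '%s\n' "$name" ;;     # the digits are the index
+  esac
+}
+index_of rcall.3.Rdata
+----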
+
+Abbreviations for SingleFileMapper
+Notes:
+
+Within <> you can only have a literal string, as in <"filename">, not an
+expression. Someday we will fix this to make <> accept a general expression.
+You can use @filenames( ) (note: plural) to pull off a list of filenames.
+
+writeData()
+
+example here 
+----
+$ cat writedata.swift
+type file;
+file f <"filea">;
+file nf <"filenames">;
+nf = writeData(@f);
+$ swift writedata.swift
+Swift svn swift-r3264 (swift modified locally) cog-r2730 (cog modified
+locally)
+RunID: 20100319-2002-s9vpo0pe
+Progress:
+Final status:
+$ cat filenames
+filea$ 
+$ 
+----
+
+StructuredRegexpMapper
+(IN PROGRESS). This mapper can be used to base the mapped filenames of an output
+array on the mapped filenames of an existing array: landuse outputfiles[]
+<structured_regexp_mapper; source=inputfiles,
+location="./output",match="(.)*tif", transform="\\1histogram">;
+
+Use the undocumented "structured_regexp_mapper" to name the output
+filenames based on the input filenames: 
+
+For example:
+
+----
+login2$ ls /home/wilde/bigdata/data/sample
+h11v04.histogram  h11v05.histogram  h12v04.histogram  h32v08.histogram
+h11v04.tif        h11v05.tif        h12v04.tif        h32v08.tif
+login2$
+
+login2$ cat regexp2.swift
+type tif;
+type mytype;
+
+tif  images[]<filesys_mapper; 
+location="/home/wilde/bigdata/data/sample", prefix="h", suffix=".tif">;
+
+mytype of[] <structured_regexp_mapper; source=images, match="(h..v..)", 
+transform="output/myfile.\\1.mytype">;
+
+foreach image, i in images {
+   trace(i,@filename(images));
+   trace(i,@filename(of[i]));
+}
+login2$
+
+login1$ swift regexp2.swift
+Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
+locally)
+
+RunID: 20100310-1105-4okarq08
+Progress:
+SwiftScript trace: 1, output/myfile.h11v04.mytype
+SwiftScript trace: 2, home/wilde/bigdata/data/sample/h11v05.tif
+SwiftScript trace: 3, home/wilde/bigdata/data/sample/h12v04.tif
+SwiftScript trace: 0, output/myfile.h32v08.mytype
+SwiftScript trace: 0, home/wilde/bigdata/data/sample/h32v08.tif
+SwiftScript trace: 3, output/myfile.h12v04.mytype
+SwiftScript trace: 1, home/wilde/bigdata/data/sample/h11v04.tif
+SwiftScript trace: 2, output/myfile.h11v05.mytype
+Final status:
+login1$ 
+----
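
The renaming that match= and transform= perform in the example above can be tried outside Swift. The following is a minimal sketch that imitates it with sed. The helper name map_name is hypothetical, and sed is only a stand-in here, not how Swift implements the mapper. It assumes, as the trace output suggests, that the whole source name is replaced by the transform string.

```shell
# Imitate match="(h..v..)" with transform="output/myfile.\\1.mytype" using sed.
# map_name is a made-up helper for illustration, not part of Swift.
map_name() {
    # $1: a mapped source file name
    printf '%s\n' "$1" | sed -E 's#.*(h..v..).*#output/myfile.\1.mytype#'
}

# The four .tif names from the directory listing above.
for f in h11v04.tif h11v05.tif h12v04.tif h32v08.tif; do
    map_name "/home/wilde/bigdata/data/sample/$f"
done
```

Each printed name should line up with one of the output/myfile.*.mytype entries in the trace output.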
+
 First SwiftScript
 ~~~~~~~~~~~~~~~~~
 Your first SwiftScript
 Hello Swift-World!
 
+A good sanity check that Swift is set up and running OK locally is this:
+
+----
+$ which swift
+
+/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift
+
+$ echo 'trace("Hello, Swift world!");' >hello.swift
+
+$ swift hello.swift
+
+Swift svn swift-r3202 cog-r2682
+
+RunID: 20100115-1240-6xhzxuz3
+
+Progress:
+
+SwiftScript trace: Hello, Swift world!
+
+Final status:
+
+$ 
+----
+A good first tutorial in using Swift is at:
+http://www.ci.uchicago.edu/swift/guides/tutorial.php. Follow the steps in that
+tutorial to learn how to run a few simple scripts on the login host.
+
 second SwiftScript
 ~~~~~~~~~~~~~~~~~~~
 Your second SwiftScript
@@ -58,6 +253,97 @@
 
 Also includes a description of Swift inputs and outputs.
 
+Resuming a stopped or crashed Swift Run
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+I had a .rlog file from a Swift run that ran out of time. I kicked it off
+using the -resume flag described in section 16.2 of the Swift User Guide and
+it picked up where it left off. Then I killed it because I wanted to make
+changes to my sites file.
+
+----
+. . .
+Progress:  Selecting site:1150  Stage in:55  Active:3  Checking status:1
+Stage out:37  Finished in previous run:2462  Finished successfully:96
+Progress:  Selecting site:1150  Stage in:55  Active:2  Checking status:1
+Stage out:38  Finished in previous run:2462  Finished successfully:96
+Cleaning up...
+Shutting down service at https://192.5.86.6:54813
+Got channel MetaChannel: 1293358091 -> null
++ Done
+Canceling job 9297.svc.pads.ci.uchicago.edu
+----
+
+No new rlog file was emitted, but it did recognize the progress that had been
+made: the 96 tasks that finished successfully above were counted, and it
+resumed from 2558 tasks finished.
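
The 2558 figure is the sum of two counters in the last progress line before the run was killed. The sketch below checks that arithmetic. The helper count_of is hypothetical, and the progress line is pasted in as a single string rather than read from a real log.

```shell
# Last progress line before the run was killed, joined onto one line.
line='Progress:  Selecting site:1150  Stage in:55  Active:2  Checking status:1  Stage out:38  Finished in previous run:2462  Finished successfully:96'

# Print the number that follows a given label in a progress line.
# count_of is a made-up helper for illustration, not part of Swift.
count_of() {
    printf '%s\n' "$2" | sed -n "s/.*$1:\([0-9][0-9]*\).*/\1/p"
}

prev=$(count_of 'Finished in previous run' "$line")
finished_now=$(count_of 'Finished successfully' "$line")

echo "tasks a resume should treat as finished: $((prev + finished_now))"
```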
+
+----
+[nbest@login2 files]$ pwd
+/home/nbest/bigdata/files
+[nbest@login2 files]$
+~wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift \
+> -tc.file tc -sites.file pbs.xml ~/scripts/mcd12q1.swift -resume
+> mcd12q1-20100310-1326-ptxe1x1d.0.rlog
+Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
+locally)
+RunID: 20100311-1027-148caf0a
+Progress:
+Progress:  uninitialized:4
+Progress:  Selecting site:671  Initializing site shared directory:1  Finished
+in previous run:1864
+Progress:  uninitialized:1  Selecting site:576  Stage in:96  Finished in
+previous run:1864
+Progress:  Selecting site:1150  Stage in:94  Submitting:2  Finished in
+previous run:2558
+Progress:  Selecting site:1150  Stage in:94  Submitted:2  Finished in previous
+run:2558
+Progress:  Selecting site:1150  Stage in:93  Submitting:1  Submitted:2
+Finished in previous run:2558
+Progress:  Selecting site:1150  Stage in:90  Submitting:1  Submitted:5
+Finished in previous run:2558
+Progress:  Selecting site:1150  Stage in:90  Submitted:5  Active:1  Finished
+in previous run:2558
+----
+
+
+From Neil: A comment about that section of the user guide: It says "In order
+to restart from a restart log file, the -resume logfile argument can be used
+after the SwiftScript program file name." and then puts the -resume logfile
+argument before the script file name. I'm sure the order doesn't matter but
+the contradiction is confusing.
+
+Notes to add (from Mike):
+
+- explain what aspects of a Swift script make it restartable, and which
+  aspects are not restartable. E.g., if your mappers can return different data
+at different times, what happens? What other non-deterministic behavior would
+cause unpredictable, unexpected, or undesired behavior on resumption?
+
+- explain what changes you can make in the execution environment (e.g.
+  increasing or reducing CPUs to run on or throttles, etc.); fixing tc.data
+entries, env vars, or apps, etc.
+
+- note that resume will again retry failed app() calls. Explain if the retry
+  count starts over or not.
+
+- explain how to resume after multiple failures and resumes - i.e. if a .rlog
+  is generated on each run, which one should you resume from? Do you have a
+choice of resuming from any of them, and what happens if you go backwards to
+an older resume file?
+
+- what happens when you kill (e.g. with ^C) a running swift script? Is the
+  signal caught, and the resume file written out at that point? Or written out
+all along? (Note the case in which a script was running for hours, then hit ^C,
+but the resume file was short (54 bytes) and swift showed no sign of doing a
+resume. It silently ignored the resume file instead of acknowledging that it
+found one with no useful resume state in it???) Swift should clearly state
+that it's resuming and what its resume state is. 
+
++swift -resume ftdock-[id].0.rlog \[rest of the exact command line from initial
+run\]+
+
+
 Swift on ...
 ------------
 
@@ -80,6 +366,10 @@
 /lustre/beagle/ketan/labs/catsn.works to your home and follow the instructions
 in README of that folder. 
 
+Note: Running from a sandbox node or requesting 1 hour of walltime for up to 3
+nodes will get fast, prioritized execution. This is good for small tests.
+
+
 Intrepid-BG/P
 ~~~~~~~~~~~~~
 Swift on Intrepid-BG/P
@@ -95,13 +385,13 @@
 Bionimbus
 ~~~~~~~~~
 Swift on Bionimbus
-.VERY IMPORTANT TO READ THIS 
+.very important to read this 
 
-CONTAINS INFORMATION ON HOW TO CONNECT BACK TO GATEWAY FROM VIRTUAL MACHINES
-USING REVERSE SSH TUNNELING
+contains information on how to connect back to gateway from virtual machines
+using reverse ssh tunneling
 
-TO OPEN A REVERSE SSH TUNNEL DO THE FOLLOWING:
-.FROM THE GATEWAY PROMPT
+to open a reverse ssh tunnel do the following:
+.from the gateway prompt
 
 +ssh -R *:5000:localhost:5000 root@10.101.8.50 sleep 999+
 
@@ -110,7 +400,7 @@
 localhost=the gateway host, should remain the same
 
 5000(LEFT OF localhost)=the port number on localhost to listen to **THIS WILL
-VARY DEPENDING UPON WHICH PORT YOU WANT TO LISTEN TO
+vary depending upon which port you want to listen to
 
 5000(RIGHT OF localhost)=the port on target host that you want to forward
 
@@ -132,18 +422,94 @@
 Coasters
 --------
 Describe coasters mechanisms.
-Include neat diagrams.
+**Include neat diagrams.**
 
+A nice coasters setup case-study:
+----
+Your main sites.xml coaster settings were:
 
+<execution provider="coaster" jobmanager="local:pbs"/>
+<profile namespace="globus" key="project">CI-CCR000013</profile>
+<profile namespace="globus" key="ppn">24:cray:pack</profile>
+<profile namespace="globus" key="workersPerNode">24</profile>
+<profile namespace="globus" key="maxTime">100000</profile>
+<profile namespace="globus" key="lowOverallocation">100</profile>
+<profile namespace="globus" key="highOverallocation">100</profile>
+<profile namespace="globus" key="slots">20</profile>
+<profile namespace="globus" key="nodeGranularity">5</profile>
+<profile namespace="globus" key="maxNodes">5</profile>
+<profile namespace="karajan" key="jobThrottle">20.00</profile>
+<profile namespace="karajan" key="initialScore">10000</profile>
+
+Your tc entry (shortened here) was:
+
+pbs modftdock /.../modftdock.sh null null GLOBUS::maxwalltime="02:00:00"
+
+And you said you saw in PBS: 13 jobs of 24 hours and 4 jobs of 22 hours. I
+suspect this was after the script had been running a while, and many jobs had
+been completed.
+
+Based on your settings, I think you should have had at one time about 17
+coaster block jobs running, because the throttle on your coaster pool was set
+to 20 (which would cause Swift to try to run about 2000 apps at once - 2001 to
+be precise). Since each job should have requested exactly 5 nodes (based on
+your maxNodes=nodeGranularity=5 setting above), Swift would have had to run 17
+jobs to accommodate 2000 apps: 17 * (5*24) = 2040 apps. The 24 comes from your
+workersPerNode setting, which is a poorly-named parameter that we are renaming
+to what it really specifies: appsPerNode for concurrent application calls per
+node.
+
+I also suspect that when this workflow started, coasters was requesting
+blocks of time closer to the 100,000 seconds that you specified for maxTime
+(that's ~27 hours). I think the qstat snapshot you provided showed fewer than
+17 jobs and job times shorter than 27 hours (24 and 22 hours) because there
+were no longer enough apps remaining to run to require those higher values. But
+it was still going to try to run all the remaining jobs - probably fewer than
+2000 jobs remained when you ran the enclosed qstat. In fact the number of jobs
+remaining at the time was likely less than:
+
+13*5*(24/2) + 4*5*(22/2) = 65*12 + 20*11 = 1000, signifying <= 1000 jobs remaining
+
+Since the maxwalltime estimate for your app in tc.data was 2 hours, I think
+coasters will pick a wall time that is the min(time needed for jobs remaining,
+time needed for max throttle jobs based on maxtime and high/low overallocation
+settings).
+
+A note here to Swift developers: we need to first clarify the behavior of
+coasters in detail in the User Guide; then we need to build suitable templates
+that *greatly* simplify the settings and end-user parameters, and explain
+those simpler settings for use by all but the most sophisticated users with
+complex needs.
+
+We also need to do much more experimentation to see if coasters will run OK
+with far less parameter-override specification, and see if its automation and
+algorithmic intelligence will do the right thing in almost all cases.
+
+Most of the time in current use we specify overrides for almost all settings
+so that we get a precise shape and number of jobs submitted. Doing that
+assumes we know better than coasters and forces the user to understand how to
+override all the settings.
+
+It's a very interesting question, and a hard but critically important one to
+answer to make usage simpler.
+----
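
The sizing arithmetic in the case study above can be reproduced in a few lines of shell. This is only a sketch of the reasoning in that message, not of what coasters computes internally: the "throttle * 100 + 1" rule is inferred from the "2001 to be precise" remark, and the variable names are chosen for this example.

```shell
job_throttle=20        # karajan jobThrottle (20.00 in the sites.xml above)
nodes_per_block=5      # maxNodes = nodeGranularity = 5
workers_per_node=24    # workersPerNode
max_time=100000        # maxTime, in seconds

# Apps Swift tries to keep running at once, per the case study's remark.
apps_at_once=$((job_throttle * 100 + 1))

# App slots offered by one coaster block job.
apps_per_block=$((nodes_per_block * workers_per_node))

# Ceiling division: blocks needed to hold all concurrent apps.
blocks=$(( (apps_at_once + apps_per_block - 1) / apps_per_block ))

echo "apps Swift tries to run at once: $apps_at_once"
echo "app slots per coaster block:     $apps_per_block"
echo "coaster blocks needed:           $blocks"
echo "maxTime in whole hours:          $((max_time / 3600))"
```

With the settings quoted above this gives 2001 apps, 120 slots per block, 17 blocks, and 27 whole hours, which agrees with the figures in the message.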
+
+
 For Beginners
 ~~~~~~~~~~~~~~
 Coasters for beginners. Usage of existing, prebuilt templates.
 
 For Intermediate Users
 ~~~~~~~~~~~~~~~~~~~~~~~
-Coasters for intermediate users. Usage of gensites to generate your own sites
+Coasters for intermediate users. 
+
+Using gensites
+^^^^^^^^^^^^^^^
+Usage of gensites to generate your own sites
 configurations.
 
+
+
 For Advanced Users
 ~~~~~~~~~~~~~~~~~~~
 Coasters for advanced users. Getting your hands dirty.
@@ -171,3 +537,56 @@
 
 Data Management 
 
+Debugging Swift
+~~~~~~~~~~~~~~~~
+Note: The Swift installation at /home/wilde/bigdata/swift includes a
+swift.properties file that has been modified to override the defaults with
+the following property values that are useful for script debugging:
+
+execution.retries=0
+
+sitedir.keep=true
+
+status.mode=provider
+
+wrapperlog.always.transfer=true
+
+NOTE: How to clone copies of swift.properties and things to watch for
+(like the macros at the start, and unexpected properties in ~/.swift).
+
+Swift errors are logged in several places:
+
+. all text from standard output and standard error produced by running the
+swift command
+. The .log file from this run. It will be named swiftscript.uniqueID.log
+where "swiftscript" is the name of your *.swift script source file, and
+uniqueID is a long unique id which starts with the date and time you ran the
+swift command.
+. $HOME/.globus/coasters directory on remote machines on which you are
+running coasters
+. $HOME/.globus/scripts directory on the host on which you run the Swift
+command, when swift is submitting to a local scheduler (Condor, PBS, SGE,
+Cobalt)
+   . $HOME/.globus/??? on remote systems that you access via Globus 
+
+Problem Reporting
+~~~~~~~~~~~~~~~~~
+When reporting problems to swift-user at ci.uchicago.edu, please attach the
+following files and information:
+
+- tc.data and sites.xml (or whatever you named these files)
+- your .swift source file and any .swift files it imports
+- any external mapper scripts called by your .swift script
+- all text from standard output and standard error produced by running the
+  swift command
+- the .log file from this run. It will be named swiftscript.uniqueID.log
+  where "swiftscript" is the name of your *.swift script source file, and
+  uniqueID is a long unique id which starts with the date and time you ran the
+  swift command.
+- the swift command line you invoked
+- any swift.properties entries you overrode ($HOME/.swift/swift.properties,
+  -config.file argument properties file, any changes to etc/swift.properties
+  from your swift distribution)
+- which swift distribution you are running (release; svn revisions; other
+  local changes you may have made or included) 
+
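
Gathering the files named in the problem-reporting list can be scripted. The sketch below is one possible way to do it. bundle_report is a hypothetical helper, not a Swift command, and it assumes the default names tc.data and sites.xml, a run directory that holds the script and its logs, and file names without spaces.

```shell
# bundle_report is a hypothetical helper, not a Swift command.
# Usage: bundle_report RUN_DIR /absolute/path/to/report.tgz
bundle_report() {
    run_dir=$1    # directory the swift command was run from
    out=$2        # tarball to create (absolute path, since we cd below)
    (
        cd "$run_dir" || exit 1
        files=""
        for f in tc.data sites.xml *.swift *.log; do
            # Skip names that do not exist, including unmatched globs.
            if [ -f "$f" ]; then files="$files $f"; fi
        done
        # Unquoted on purpose: this sketch assumes file names without spaces.
        tar -czf "$out" $files
    )
}

# Example call (commented out so that defining the helper has no side effects):
# bundle_report "$PWD" /tmp/swift-report.tgz
```

The command line, the console output, the swift.properties overrides, and the distribution details still have to be added to the message by hand.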



