[Swift-commit] r4478 - in branches/release-0.92: . docs

davidk at ci.uchicago.edu
Mon May 16 14:29:37 CDT 2011


Author: davidk
Date: 2011-05-16 14:29:01 -0500 (Mon, 16 May 2011)
New Revision: 4478

Removed:
   branches/release-0.92/docs/Makefile
   branches/release-0.92/docs/README.txt
   branches/release-0.92/docs/build-chunked-userguide.sh
   branches/release-0.92/docs/buildguides.sh
   branches/release-0.92/docs/formatting/
   branches/release-0.92/docs/historical/
   branches/release-0.92/docs/log-processing.xml
   branches/release-0.92/docs/plot-tour.xml
   branches/release-0.92/docs/plot-tour/
   branches/release-0.92/docs/provenance.xml
   branches/release-0.92/docs/quickstartguide.xml
   branches/release-0.92/docs/reallyquickstartguide.xml
   branches/release-0.92/docs/swift-site-model.fig
   branches/release-0.92/docs/swift-site-model.png
   branches/release-0.92/docs/tutorial-live.xml
   branches/release-0.92/docs/tutorial.xml
   branches/release-0.92/docs/type-hierarchy.fig
   branches/release-0.92/docs/type-hierarchy.png
   branches/release-0.92/docs/userguide-rotated.jpeg
   branches/release-0.92/docs/userguide-shane.jpeg
   branches/release-0.92/docs/userguide.xml
Modified:
   branches/release-0.92/build.xml
Log:
First draft of updated document formats and scripts


Modified: branches/release-0.92/build.xml
===================================================================
--- branches/release-0.92/build.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/build.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -55,6 +55,8 @@
 
 		fixeol:
 			change newlines to the unix standard.
+		docs:
+			Build Swift documentation from asciidoc txt files
 		</echo>
 	</target>
 
@@ -260,6 +262,15 @@
 	</target>
 
 
+	<!-- ================================================ -->
+	<!-- Docs					      -->
+	<!-- ================================================ -->
+	<target name="docs">
+		<property name="docs.out" value="docs/"/>
+		<exec executable="docs/build_docs.sh">
+			<arg value="${docs.out}"/>
+		</exec>
+	</target>
 
 	<!-- ================================================ -->
 	<!-- Compile                                          -->

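The new ant target shells out to docs/build_docs.sh, which is not included in this commit. A minimal sketch of what such a wrapper could look like — assuming asciidoc(1) is on the PATH and the sources live at docs/*.txt, neither of which is verified against the real script:

```shell
#!/bin/sh
# Hypothetical sketch of docs/build_docs.sh; the real script is not part
# of this commit. Takes the output directory as its first argument, as
# passed by the "docs" ant target via ${docs.out}.
out="${1:-docs/}"
mkdir -p "$out" || exit 1
for src in docs/*.txt; do
    [ -e "$src" ] || continue        # glob did not match: nothing to build
    base=$(basename "$src" .txt)
    # render one HTML page per asciidoc source file
    asciidoc -o "$out/$base.html" "$src"
done
```

Run from the top of the checkout, this mirrors how `ant docs` invokes the script with `docs/` as the output directory.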
Deleted: branches/release-0.92/docs/Makefile
===================================================================
--- branches/release-0.92/docs/Makefile	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/Makefile	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,35 +0,0 @@
-
-# FOP = fop
-FOP = fop/fop.sh
-
-all: phps pdfs
-
-phps: userguide.php tutorial.php tutorial-live.php quickstartguide.php reallyquickstartguide.php provenance.php historical/languagespec.php historical/languagespec-0.6.php log-processing.php plot-tour.php
-
-htmls: userguide.html tutorial.html tutorial-live.html quickstartguide.html reallyquickstartguide.html provenance.html historical/languagespec.html historical/languagespec-0.6.html log-processing.html plot-tour.html
-
-pdfs: userguide.pdf tutorial.pdf tutorial-live.pdf quickstartguide.pdf reallyquickstartguide.pdf provenance.pdf historical/languagespec.pdf historical/languagespec-0.6.pdf log-processing.pdf
-
-GUIDE_PHP=$(shell find userguide -name "*.php" )
-GUIDE_HTML=$(patsubst %.php,%.html,$(GUIDE_PHP))
-
-guide_html: $(GUIDE_HTML)
-
-chunked-userguide: userguide.xml
-	./build-chunked-userguide.sh
-
-%.php: %.xml formatting/swiftsh_html.xsl
-	xsltproc --nonet formatting/swiftsh_html.xsl $<
-	sed -e "s/index.html#/#/g" index.html >$@
-	chmod a+rx $@
-
-%.pdf: %.xml formatting/vdl2_fo.xsl
-	$(FOP) -xsl formatting/vdl2_fo.xsl -xml $< -pdf $@
-	chmod a+rx $@
-
-%.html: %.php
-	cp $< $@
-	chmod a+rx $@
-
-clean:
-	rm -fv *.php userguide/*.php

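The least obvious step in the deleted `%.php` rule is the sed rewrite: xsltproc emits a single index.html whose internal links are written as cross-page references (`index.html#anchor`), and the rule folds them into same-page anchors before renaming the file. A small reproduction of just that step (the anchor name here is illustrative):

```shell
# Reproduce the sed step from the deleted %.php rule: links of the form
# index.html#anchor become plain #anchor so they resolve within one page.
printf '<a href="index.html#profile.env">env profiles</a>\n' > index.html
sed -e "s/index.html#/#/g" index.html > userguide.php
cat userguide.php
# -> <a href="#profile.env">env profiles</a>
```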
Deleted: branches/release-0.92/docs/README.txt
===================================================================
--- branches/release-0.92/docs/README.txt	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/README.txt	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,69 +0,0 @@
-Swift Documentation
-===================
-
-General principles:
-
- - sections and subsections are ordered from <sect1>, <sect2>, or using
-   arbitrary depth <section> tags
- - code samples are given inside <programlisting> tags, which will cause
-   syntax highlighting to happen automatically
- - user interactions / screen output are given inside <screen> tags.
- be careful to escape in-text "<" to "&lt;"
- - there are some conventions for using id attributes at various
-   places in the documents - for example, some tutorial sections use
-   'tutorial.<something>'; profile entries in the user guide use
-   'profile.<namespace>.<key>'. Try to keep id attributes unique across
-   the entire document set.
-
-The first time guides are built in a particular checkout, it is necessary
-to place the docbook formatting stylesheets under the formatting/docbook/
-directory. This can be done with a symlink if docbook is installed elsewhere.
-
-For example:
-
-A) On the CI network, /home/hategan/docbook contains a docbook installation that
-can be linked like this:
-
-$ cd formatting
-$ ln -s /home/hategan/docbook/ docbook
-
-
-B) on benc's os x machine:
-
-# install docbook from DarwinPorts
-$ sudo port install docbook-xsl
-
-# setup links
-$ cd formatting
-$ ln -s /opt/local/share/xsl/docbook-xsl/ docbook
-
-C) in general:
-
- 1) Install Apache
-
- 2) Install PHP (cf. http://dan.drydog.com/apache2php.html)
-
- 3) Add these lines to Apache's httpd.conf:
-
-    AddHandler php5-script php
-    AddType text/html       php
-
- 4) Update httpd.conf, adding index.php:
-
-    DirectoryIndex index.html index.php
-
- 5) Make sure perms are correct
-
- 6) Create formatting/docbook link (see above) 
-
- 7) Create fop link
-
-Once the links are set up, the buildguides.sh script will build all guides
-as php.  Run it with no parameters, like this:
-
-$ ./buildguides.sh
-
-or use make to get HTML documents like this:
-
-$ make userguide.html
-

Deleted: branches/release-0.92/docs/build-chunked-userguide.sh
===================================================================
--- branches/release-0.92/docs/build-chunked-userguide.sh	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/build-chunked-userguide.sh	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,11 +0,0 @@
-#!/bin/sh
-
-mkdir -p userguide/ || exit 1
-cd userguide/ || exit 2
-rm -f *.html *.php
-cp ../*.png .
-cp ../*.jpeg .
-
-xsltproc --nonet ../formatting/swiftsh_html_chunked.xsl ../userguide.xml
-chmod a+r *.php 
-

Deleted: branches/release-0.92/docs/buildguides.sh
===================================================================
--- branches/release-0.92/docs/buildguides.sh	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/buildguides.sh	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,5 +0,0 @@
-#!/bin/sh
-
-make all chunked-userguide
-./build-chunked-userguide.sh
-

Deleted: branches/release-0.92/docs/log-processing.xml
===================================================================
--- branches/release-0.92/docs/log-processing.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/log-processing.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,162 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-	<articleinfo revision="0.1">
-		<title>Swift log processing tools</title>
-		<subtitle>Source control $LastChangedRevision$</subtitle>
-	</articleinfo>
-
-	<section>
-		<title>Overview</title>
-		<para>
-There is a package of Swift log processing utilities.
-		</para>
-
-	</section>
-	<section><title>Prerequisites</title>
-<para>
-gnuplot 4.0, gnu m4, gnu textutils, perl
-</para>
-
-	</section>
-	<section><title>Web page about a run</title>
-		<para>
-			<screen>
-swift-plot-log /path/to/readData-20080304-0903-xgqf5nhe.log 
-			</screen>
-This will create a web page, report-readData-20080304-0903-xgqf5nhe.
-If the above command is used before a run is completed, the web page will
-report information about the workflow progress so far.
-		</para>
-
-	</section>
-	<section><title>CEDPS logs</title>
-		<para>
-The log processing tools can output transition streams in
-CEDPS logging format:
-			<screen>
-swift-plot-log /path/to/readData-20080304-0903-xgqf5nhe.log execute.cedps
-			</screen>
-		</para>
-	</section>
-	<section><title>Event/transition channels</title>
-		<para>
-Various event channels are extracted from the log files and made available
-as <filename>.event</filename> and <filename>.transition</filename> files.
-These roughly correspond to processes within the Swift runtime environment.
-		</para>
-		<para>These streams are then used to provide the data for the various
-output formats, such as graphs, web pages and CEDPS log format.</para>
-<para>The available streams are:
-
-<table>
- <tgroup cols="2">
-  <thead><row><entry>Stream name</entry><entry>Description</entry></row></thead>
-  <tbody>
-    <row><entry>execute</entry><entry>Swift procedure invocations</entry></row>
-    <row><entry>execute2</entry><entry>individual execution attempts</entry></row>
-    <row><entry>kickstart</entry><entry>kickstart records (not available as transitions)</entry></row>
-    <row><entry>karatasks</entry><entry> karajan level tasks, available as transitions (there are also four substreams karatasks.FILE_OPERATION,  karatasks.FILE_TRANSFER and karatasks.JOB_SUBMISSION available as events but not transitions)</entry></row>
-    <row><entry>workflow</entry><entry>a single event representing the entire workflow</entry></row>
-    <row><entry>dostagein</entry><entry>stage-in operations for execute2s</entry></row>
-    <row><entry>dostageout</entry><entry>stage-out operations for execute2s</entry></row>
-  </tbody>
- </tgroup>
-</table>
-
-</para>
-<para>
-Streams are generated from their source log files either as .transitions
-or .event files, for example by <literal>swift-plot-log whatever.log foo.event</literal>.
-</para>
-<para>
-Various plots are available based on different streams:
-
-<table>
- <tgroup cols="2">
-  <thead><row><entry>Makefile target</entry><entry>Description</entry></row></thead>
-  <tbody>
-    <row><entry>foo.png</entry><entry>Plots the foo event stream</entry></row>
-    <row><entry>foo-total.png</entry><entry>Plots how many foo events are in progress at any time</entry></row>
-    <row><entry>foo.sorted-start.png</entry><entry>Plot like foo.png but ordered by start time</entry></row>
-  </tbody>
- </tgroup>
-</table>
-
-</para>
-<para>
-Text-based statistics are also available with <literal>make foo.stats</literal>.
-</para>
-<para>
-Event streams are nested something like this:
-
-<screen>
-workflow
-  execute
-    execute2
-      dostagein
-        karatasks (fileops and filetrans)
-      clustering (optional)
-        karatasks (execution)
-          cluster-log (optional)
-            wrapper log (optional)
-              kickstart log
-      dostageout
-        karatasks (fileops and filetrans)
-</screen>
-
-</para>
-	</section>
-	<section><title>Internal file formats</title>
-<para>The log processing code generates a number of internal files that
-follow a standard format. These are used for communication between the
-modules that parse various log files to extract relevant information; and
-the modules that generate plots and other summary information.</para>
-<screen>
-need an event file format of one event per line, with that line
-containing start time and duration and other useful data.
-
-col1 = start, col2 = duration, col3 onwards = event specific data - for
-some utilities for now should be column based, but later will maybe
-move to attribute based.
-
-between col 1 and col 2 exactly one space
-between col 2 and col 3 exactly one space
-
-start time is in seconds since unix epoch. start time should *not* be
-normalised to start of workflow
-
-event files should not (for now) be assumed to be in order
-
-different event streams can be stored in different files. each event
-stream should use the extension  .event
-</screen>
-
-<screen>
-.coloured-event files
-=====================
-third column is a colour index
-first two columns as per .event (thus a coloured-event is a specific
-form of .event)
-</screen>
-
-	</section>
-
-	<section><title>hacky scripts</title>
-<para>There are a couple of hacky scripts that aren't made into proper
-commandline tools. These are in the libexec/log-processing/ directory:
-
-<screen>
-  ./execute2-status-from-log [logfile]
-     lists every (execute2) job and its final status
-
-  ./execute2-summary-from-log [logfile]
-     lists the counts of each final job status in log
-</screen>
-</para>
-	</section>
-</article>
-
-

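The internal .event format described in the deleted log-processing.xml above — column 1 the start time, column 2 the duration, exactly one space between columns, remaining columns event-specific — is simple enough to process with standard tools. A small sketch with made-up timestamps (real start times are seconds since the Unix epoch, not normalised to the start of the workflow):

```shell
# Two fake events in .event format: col1 = start, col2 = duration,
# remaining columns = event-specific data, single spaces between columns.
cat > demo.event <<'EOF'
10.0 5.0 execute thread-0
12.5 3.2 execute thread-1
EOF
# End time of the latest-finishing event; event files need not be sorted.
awk '{ end = $1 + $2; if (end > max) max = end } END { print max }' demo.event
# -> 15.7
```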
Deleted: branches/release-0.92/docs/plot-tour.xml
===================================================================
--- branches/release-0.92/docs/plot-tour.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/plot-tour.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,311 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-	<articleinfo revision="$LastChangedRevision$">
-		<title>Swift log plotting and some of the internal mechanics of Swift</title>
-	</articleinfo>
-
-	<section id="overview">
-		<title>Overview</title>
-<para>
-This document attempts to explain some of the meaning of the Swift
-log-processing plots, giving an explanation of how some of Swift's
-execution mechanism works and of some of the terminology used.
-</para>
-	</section>
-
-<section id="execute"><title>'execute' - SwiftScript app {} block invocations</title>
-
-<para>
-When a SwiftScript program invokes an application procedure (one with an
-app {} block), an 'execute' appears in the log file in START state. When
-all attempts at execution have finished (either successfully or unsuccessfully)
-then the execute will go into END_SUCCESS or END_FAILURE state. A workflow
-is successful if and only if all invocations end in END_SUCCESS.
-</para>
-
-<para>
-The execute states represent progress
-through the karajan procedure defined in
-<filename>libexec/execute-default.k</filename>.
-</para>
-<para>State changes for execute logs are defined by karajan log calls throughout
-this file.
-</para>
-<inlinemediaobject><imageobject><imagedata fileref="execute.png"></imagedata></imageobject></inlinemediaobject>
-
-<para>An execute consists of multiple attempts to perform
-<link linkend="execute2">execute2</link>s, with retries and replication
-as appropriate. Retries and replication are not exposed through the states
-of 'execute's.
-</para>
-<para>
-Executes are uniquely identified within a run by their karajan thread ID,
-which is present in the log files as the thread= parameter on execute
-log messages.
-</para>
-
-<para>
-Here is a simple SwiftScript program which runs a foreach loop (<filename>few.swift</filename>):
-<programlisting>
-p() { 
-    app {
-        sleep "10s";
-    }
-}
-
-foreach i in [1:8] {
-    p();
-}
-</programlisting>
-
-</para>
-
-<para>
-Using the <command>swift-plot-log</command> from the log processing module,
-this graph gets generated to summarise execute state transitions:
-</para>
-<para>
-<inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/execute.png"></imagedata></imageobject></inlinemediaobject>
-</para>
-<para>
-In this graph, the foreach loop calls p() eight times. Because there are no
-dependencies between those eight invocations, they are all invoked at the same
-time, around 1s into the run. This is shown on the graph by the JOB_START line
-going from zero up to eight at around x=1s. As time passes, the sleep jobs
-complete, and as they do so the number of jobs in END_SUCCESS state increases.
-When all eight jobs are in END_SUCCESS state, the run is over.
-</para>
-<para>Here is a program with some data dependencies between invocations (<filename>dep.swift</filename>):
-
-<programlisting>
-$ cat dep.swift 
-type file;
-
-p(file f) { 
-    app {
-        sleep "10s";
-    }
-}
-
-(file o) q() {
-    app {
-        touch @o;
-    }
-}
-
-file intermediate = q();
-p(intermediate);
-</programlisting>
-
-</para>
-
-<para>
-Here is a plot of the execute states for this program:
-</para>
-<para><inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/execute-dep.png"></imagedata></imageobject></inlinemediaobject>
-</para>
-<para>
-In this run, one invocation (of q()) starts fairly quickly,
-but the other invocation (of p()) does not - instead, it does not start until
-approximately the time that the q() invocation has reached END_SUCCESS.
-</para>
-
-<para>
-Finally in this section on 'execute', here is a demonstration of how the above
-two patterns fit together in one program (<filename>few2.swift</filename>):
-<programlisting>
-type file;
-
-(file o) p(file i) { 
-    app {
-        sleepcopy @i @o;
-    }
-}
-
-file input <"input">;
-file output[];
-
-foreach i in [1:8] {
-    file intermediate;
-    intermediate = p(input);
-    output[i] = p(intermediate);
-}
-</programlisting>
-</para>
-
-
-<para>
-In total the program has 16 invocations of p(), dependent on each other in
-pairs. The dependencies can be plotted like this:
-
-<screen>
-$ <userinput>swift -pgraph few2.dot few2.swift</userinput>
-$ dot -Tpng -o few2.png few2.dot 
-</screen>
-
-yielding this graph:
-</para>
-
-<para><inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/few2.png"></imagedata></imageobject></inlinemediaobject> </para>
-
-<para>
-When this program is run, the first row of 8 invocations can all start at the
-beginning of the program, because they have no dependencies (aside from on
-the input file). This can be seen around t=4 when the start line jumps up to 8.
-The other 8 invocations can only begin when the invocations they are dependent
-on have finished. This can be seen in the graph - every time one of the first
-invocations reaches END_SUCCESS, a new invocation enters START.
-</para>
-
-<para><inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/execute-many-dep.png"></imagedata></imageobject></inlinemediaobject> </para>
-
-</section>
-<section id="execute2"><title>execute2 - one attempt at running an execute</title>
-<para>
-An execute2 is one attempt to execute an app procedure. execute2s are invoked
-by <link linkend="execute">execute</link>, once for each retry or replication
-attempt.
-</para>
-<para>The states of an execute2 represent progress through the execute2 karajan
-procedure defined in <filename>libexec/vdl-int.k</filename>
-</para>
-<inlinemediaobject><imageobject><imagedata fileref="execute2.png"></imagedata></imageobject></inlinemediaobject>
-<para>
-Before an execute2 makes its first state log entry, it chooses a site to run on.
-Then at the start of file stage-in, the execute2 goes into THREAD_ASSOCIATION
-state. Once stagein is completed, the JOB_START state is entered, indicating
-that execution of the job executable will now be attempted. Following that,
-STAGING_OUT indicates that the output files are being staged out. If everything
-is completed successfully, the job will enter JOB_END state.
-</para>
-<para>There are two exceptions to the above sequence: JOB_CANCELLED indicates that
-the replication mechanism has cancelled this job because a different execute2
-began actual execution on a site for the same execute. APPLICATION_EXCEPTION
-indicates that there was an error somewhere in the attempt to stage in,
-actually execute or stage out. If a job goes into APPLICATION_EXCEPTION state
-then it will generally be retried (up to a certain number of times defined
-by the "execution.retries" parameter) by the containing <link linkend="execute">execute</link>.
-</para>
-
-<para>
-In this example, we use a large input file to slow down file staging so that
-it is visible on an execute2 graph (<filename>big-files.swift</filename>):
-<programlisting>
-type file;  
-  
-(file o) p(file i) {   
-    app {  
-        sleepcopy @i @o;  
-    }  
-}  
-  
-file input <"biginput">;  
-file output[];  
-  
-foreach i in [1:8] {  
-    output[i] = p(input);  
-}  
-</programlisting>
-</para>
-
-<para>
-<inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/execute2.png"></imagedata></imageobject></inlinemediaobject></para>
-
-<para>
-There is an initial large input file that must be staged in. This causes the first
-jobs to be in stagein state for a period of time (the space between the
-ASSOCIATED and JOB_START lines at the lower left corner of the graph). All
-invocations share a single input file, so it is only staged in once and
-shared between all subsequent invocations - once the file has staged in at the
-start, there is no space later on between the ASSOCIATED and JOB_START lines
-because of this.
-</para>
-<para>
-Conversely, each invocation generates a large output file without there being
-any sharing. Each of those output files must be staged back to the submit
-side, which in this application takes some time. This can be seen by the large
-amount of space between the STAGING_OUT and JOB_END lines.
-</para>
-<para>
-The remaining large space on the graph is between the JOB_START and STAGING_OUT
-lines. This represents the time taken to queue and execute the application
-executable (and surrounding Swift worker-side wrapper, which can sometimes
-have non-negligible execution times - this can be seen in the
-<link linkend="info">info section</link>).
-</para>
-
-</section>
-
-<section id="info"><title>wrapper info logs</title>
-<para>
-When a job runs, it is wrapped by a Swift shell script on the remote site that
-prepares the job environment, creating a temporary directory and moving
-input and output files around. Each wrapper invocation corresponds to a single
-application execution. For each invocation of the wrapper, a log file is created.
-Sometimes that log file is moved back to the submission side (when there is
-an error during execution, or when the setting 
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#engineconfiguration">wrapper.always.transfer</ulink>=true
-is set) and placed in a <filename>*.d/</filename> directory corresponding in
-name to the main log file.
-</para>
-
-<inlinemediaobject><imageobject><imagedata fileref="info.png"></imagedata></imageobject></inlinemediaobject>
-<para>The states of the info logs represent progress through the wrapper
-script, <filename>libexec/wrapper.sh</filename>.
-</para>
-
-<para>
-For the same run of <filename>big-files.swift</filename> as shown in the
-<link linkend="execute2">execute2 section</link>, here is a plot of states
-in wrapper info log files:
-</para>
-
-<para>
-<inlinemediaobject><imageobject><imagedata fileref="plot-tour/pregenerated/info.png"></imagedata></imageobject></inlinemediaobject></para>
-
-<para>
-The trace lines on this graph fit entirely within the space between JOB_START 
-and STAGING_OUT on the corresponding execute2 graph, because the Swift worker node
-wrapper script does not run until the submit side of Swift has submitted a
-job for execution and that job has begun running.
-</para>
-
-<para>
-Many of the lines on this plot are very close together, because many of the
-operations take minimal time. The main space between lines is between
-EXECUTE and EXECUTE_DONE, where the actual application executable is executing;
-and between COPYING_OUTPUTS and RM_JOBDIR, where the large output files are
-copied from a job specific working directory to the site-specific shared
-directory. It is quite hard to distinguish on the graph where overlapping
-lines are plotted together.
-</para>
-
-<para>
-Note also that minimal time is spent copying input files into the job-specific
-directory in the wrapper script; that is because in this run, the wrapper
-script is using the default behaviour of making symbolic links in the job-specific
-directory; symbolic links are usually cheap to create compared to copying file
-content. However, if the <ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#envvars">SWIFT_JOBDIR_PATH</ulink> parameter is set, then Swift will
-copy the input file to the specified job directory instead of linking. This
-will generally result in much more time being spent preparing the job directory
-in the Swift wrapper, but in certain circumstances this time is overwhelmingly
-offset by increased performance of the actual application executable (so on
-this chart, this would be seen as an increased job directory preparation time,
-but a reduced-by-more application executable time).
-</para>
-
-
-</section>
-
-<section><title>Relation of logged entities to each other</title>
-<para>Here is a simple diagram of how some of the above log channels along
-with other pieces fit together:</para>
-<inlinemediaobject><imageobject><imagedata fileref="logrelations.png"></imagedata></imageobject></inlinemediaobject>
-
-</section>
-
-</article>
-

Deleted: branches/release-0.92/docs/provenance.xml
===================================================================
--- branches/release-0.92/docs/provenance.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/provenance.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,2592 +0,0 @@
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<!-- build this doc by symlinking to it from your swift/docs directory
-and typing make provenance.php -->
-
-<article>
-<title>provenance working notes, benc</title>
-<screen>$Id$</screen>
-<section><title>Goal of this present work</title>
-<para>
-The goal of the work described in this document is to investigate
-<emphasis>retrospective provenance</emphasis> and
-<emphasis>metadata handling</emphasis> in Swift, with an emphasis
-on effective querying of the data, rather than on collection of the data.
-</para>
-
-<para>
-The motivating examples are queries of the kinds discussed in section 4 of
-<ulink url="http://www.ci.uchicago.edu/swift/papers/VirtualDataProvenance.pdf"
->'Applying the Virtual Data Provenance Model'</ulink>;
-the queries and metadata in the <ulink url="http://twiki.ipaw.info/bin/view/Challenge/FirstProvenanceChallenge"
->First Provenance Challenge</ulink>; and the metadata database used by
-i2u2 cosmic.
-</para>
-
-<para>
-I am attempting to scope this so that it can be implemented in a few
-months; more expensive features, though desirable, are relegated to the
-'What this work does not address' section. Features which appear fairly
-orthogonal to the main aims are also omitted.
-</para>
-
-<para>
-This document is a combination of working notes and an on-going status report
-regarding my provenance work; as such it has quite a lot of opinion in it,
-some of it not justified in the text.
-</para>
-
-</section>
-<section id="owndb"><title>Running your own provenance database</title>
-<para>This section details running your own SQL-based provenance database on
-servers of your own control.</para>
-
-<section><title>Check out the latest SVN code</title>
-
-<para>
-Use the following command to check out the <literal>provenancedb</literal>
-module:
-
-<screen>
-svn co https://svn.ci.uchicago.edu/svn/vdl2/provenancedb                      
-</screen>
-</para>
-
-</section>
-
-
-<section><title>Configuring your SQL database</title>
-<para>
-Follow the instructions in one of the following sections, to configure your
-database either for sqlite3 or for postgres.
-</para>
-<section><title>Configuring your sqlite3 SQL database</title>
-<para>
-This section describes configuring the SQL scripts to use 
-<ulink url="http://www.sqlite.org/">sqlite</ulink>, which is
-appropriate for a single-user installation.
-</para>
-<para>Install or find sqlite3. On
-<literal>communicado.ci.uchicago.edu</literal>, it is installed and can be
-accessed by adding the line <literal>+sqlite3</literal> to your ~/.soft file
-and typing <literal>resoft</literal>. Alternatively, on OS X with MacPorts, this command works:
-<screen>
-$ <userinput>sudo port install sqlite3</userinput>
-</screen>
-Similar commands using <literal>apt</literal> or <literal>yum</literal> will
-probably work under Linux.
-</para>
-<para>
-In the next section, you will create a <literal>provenance.config</literal>
-file. In that, you should configure the use of sqlite3 by specifying:
-<screen>
-export SQLCMD="sqlite3 provdb "
-</screen>
-(note the trailing space before the closing quote)
-</para>
-</section>
-
-<section><title>Configuring your own postgres 8.3 SQL database</title>
-<para>
-This section describes configuring a postgres 8.3 database, which is
-appropriate for a large installation (where large means lots of log
-files or multiple users)
-</para>
-<para>
-First install and start postgres as appropriate for your platform
-(using <command>apt-get</command> or <command>port</command> for example).
-</para>
-<para>
-As user <literal>postgres</literal>, create a database:
-<screen>
-$ <userinput>/opt/local/lib/postgresql83/bin/createdb provtest1</userinput>
-</screen>
-</para>
-<para>
-Check that you can connect and see the empty database:
-<screen>
-$ <userinput>psql83 -d provtest1 -U postgres</userinput>
-Welcome to psql83 8.3.6, the PostgreSQL interactive terminal.
-
-Type:  \copyright for distribution terms
-       \h for help with SQL commands
-       \? for help with psql commands
-       \g or terminate with semicolon to execute query
-       \q to quit
-
-provtest1=# <userinput>\dt</userinput>
-No relations found.
-provtest1=# <userinput>\q</userinput>
-</screen>
-</para>
-<para>
-In the next section, when configuring <literal>provenance.config</literal>,
-specify the use of postgres like this:
-<screen>
-export SQLCMD="psql83 -d provtest1 -U postgres "
-</screen>
-Note the trailing space before the final quote. Also, note that if you
-fiddled the above test command line to make it work, you will have to make
-similar fiddles in the <literal>SQLCMD</literal> configuration line.
-</para>
-</section>
-</section>
-
-<section><title>Import your logs</title>
-<para>
-Now create a <filename>etc/provenance.config</filename> file to define local
-configuration. An example that I use on my laptop is present in
-<filename>provenance.config.soju</filename>.
-The <literal>SQLCMD</literal> indicates which command to run for SQL
-access. This is used by other scripts to access the database. The
-<literal>LOGREPO</literal> and <literal>IDIR</literal> variables should
-point to the directory under which you collect your Swift logs.
-</para>
-<para>
-Now import your logs for the first time like this:
-<screen>
-$ <userinput>./swift-prov-import-all-logs rebuild</userinput>
-</screen>
-</para>
-
-</section>
-
-<section><title>Querying the newly generated database</title>
-<para>
-You can use <command>swift-about-*</command> commands, described in
-the <link linkend="commands">commands section</link>.
-</para>
-<para>
-If you're using the SQLite database, you can get an interactive SQL
-session to query your new provenance database like this:
-<screen>
-$ <userinput>sqlite3 provdb</userinput>
-SQLite version 3.6.11
-Enter ".help" for instructions
-Enter SQL statements terminated with a ";"
-sqlite> 
-</screen>
-
-</para>
-</section>
-
-</section>
-
-<section id="commands"><title>swift-about-* commands</title>
-<para>There are several swift-about- commands:
-</para>
-<para>swift-about-filename - returns the global dataset IDs for the specified
-filename. Several runs may have output the same filename; the provenance
-database cannot tell which run (if any) produced any file with that name
-that exists now.
-</para>
-<para>Example: this looks for information about
-<filename>001-echo.out</filename> which is the output of the first
-test in the language-behaviour test suite:
-<screen>
-$ <userinput>./swift-about-filename 001-echo.out</userinput>
-Dataset IDs for files that have name file://localhost/001-echo.out
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080114-1353-g1y3moc0:720000000001
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080107-1440-67vursv4:720000000001
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080107-2146-ja2r2z5f:720000000001
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080107-1608-itdd69l6:720000000001
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080303-1011-krz4g2y0:720000000001
- tag:benc at ci.uchicago.edu,2008:swift:dataset:20080303-1100-4in9a325:720000000001
-</screen>
-Six different datasets in the provenance database have had that filename
-(because six language behaviour test runs have been uploaded to the
-database).
-</para>
-
-<para>swift-about-dataset - returns information about a dataset, given
-that dataset's URI. The returned information includes the IDs of any
-containing dataset, of datasets contained within this dataset, and of
-executions that used this dataset as input or output.
-</para>
-<para>Example:
-<screen>
-$ <userinput>./swift-about-dataset tag:benc at ci.uchicago.edu,2008:swift:dataset:20080114-1353-g1y3moc0:720000000001</userinput>
-About dataset tag:benc at ci.uchicago.edu,2008:swift:dataset:20080114-1353-g1y3moc0:720000000001
-That dataset has these filename(s):
- file://localhost/001-echo.out
-
-That dataset is part of these datasets:
-
-That dataset contains these datasets:
-
-That dataset was input to the following executions (as the specified named parameter):
-
-That dataset was output from the following executions (as the specified return parameter):
- tag:benc at ci.uchicago.edu,2008:swiftlogs:execute:001-echo-20080114-1353-n7puv429:0                                                | t     
-</screen>
-This shows that this dataset is not part of a more complicated dataset
-structure, and was produced as the output parameter t of an execution.
-</para>
-<para>swift-about-execution - gives information about an execution, given
-an execution ID:
-<screen>
-$ <userinput>./swift-about-execution tag:benc at ci.uchicago.edu,2008:swiftlogs:execute:001-echo-20080114-1353-n7puv429:0</userinput>
-About execution tag:benc at ci.uchicago.edu,2008:swiftlogs:execute:001-echo-20080114-1353-n7puv429:0
-                                                                id                                                                |   starttime    |     duration      |                                                            finalstate                                                            |                                                               app                                                                |                                                             scratch                                                              
-----------------------------------------------------------------------------------------------------------------------------------+----------------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------
- tag:benc at ci.uchicago.edu,2008:swiftlogs:execute:001-echo-20080114-1353-n7puv429:0                                                | 1200318839.393 | 0.743000030517578 | 0                                                                                                                                | END_SUCCESS                                                                                                                      | echo                                                                                                                            
-(1 row)
-</screen>
-This shows some basic information about the execution: the start time,
-the duration, the name of the application, and the final status.
-</para>
-
-</section>
-
-
-<section><title>What this work does not address</title>
-
-<para>This work explicitly excludes a number of uses which traditionally
-have been associated with the VDS1 Virtual Data Catalog - either as real
-or as imagined functionality.</para>
-
-<para>
-Much of this is open to debate; especially regarding which features are the
-most important to implement after the first round of implementation has
-occurred.
-</para>
-
-<variablelist>
-
-<varlistentry>
-<term>Namespaces and versioning</term>
-<listitem> <para>
-The need for these is somewhat orthogonal to the work here.
-</para>
-<para>Namespaces and versions provide a richer identifier but don't
-fundamentally change the nature of the identifier.</para>
-<para>So for now I mostly ignore them, as they are (I think) fairly
-straightforward drudgework to implement, rather than being fundamentally
-part of how queries are formed. Global namespaces are used a little for
-identifying datasets between runs (see the tag URI section).
-</para> </listitem>
-</varlistentry>
-
-<varlistentry>
-<term>Prospective provenance</term>
-<listitem> <para>
-SwiftScript source programs don't have as close a similarity to their
-retrospective structure as in VDL1, so a fair amount of thought is required
-here. Is this required? Is it different from the SwiftScript
-program-library point?
-</para> </listitem>
-</varlistentry>
-<varlistentry> <term>A database of all logged information</term>
-<listitem>
-<para>Excluded, though it would be interesting to see what could be done
-there. It would be straightforward to import e.g. .event and/or .transition
-files from the log parsers into the DB.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry> <term>Replica management</term>
-<listitem>
-<para>
-No specific replica location or management support. However see sections
-on general metadata handling (in as much as general metadata can support
-replica location as a specific metadata usecase); and also the section on
-global naming in the to-be-discussed section. This ties in with the
-Logical File Names concept somehow.
-</para>
-</listitem>
-</varlistentry>
-
-
-<varlistentry> <term>A library for SwiftScript code</term>
-<listitem>
-<para>This needs better use cases, and some indication that a more
-conventional version control system is not more appropriate.
-</para>
-<para>
-Also included in this exclusion is storage of type definitions.
-It's straightforward to store type names, but the definitions are
-per-execution. More use cases would be useful here to figure out what sort
-of query people want to make.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry> <term>Live status updates of in-progress workflows</term>
-<listitem>
-<para>
-This may happen if data goes into the DB during the run rather than at the
-end (which may or may not happen). We also need to deal with slightly
-different data - for example, execute2s that ran but failed (which is not
-direct lineage provenance?).
-</para>
-<para>
-So: one successful invocation has one execute, one execute2 (the most
-recent), and maybe one kickstart record. It doesn't track execute2s and
-kickstarts for failed execution attempts (nor, perhaps, for failed
-workflows at all...).
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry>
-<term>Deleting or otherwise modifying provenance data</term>
-<listitem>
-<para>
-This is excluded; though deleting/modifying other metadata should be taken
-into account.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry> <term>Security</term>
-<listitem>
-<para>There are several approaches
-here. The most native approach is to use the security model of the
-underlying database (which will vary depending on which database is used).
-</para>
-<para>This is a non-trivial area, especially to do with any richness.
-Trust relationships between the various parties accessing the database
-should be taken into account.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry><term>A new metadata or provenance query language</term>
-<listitem>
-<para>Designing a useful - i.e. usable and performant - database
-and query language is a non-trivial exercise (on the order of years).
-</para>
-<para>
-For now, use existing query languages and their implementations. Boilerplate
-queries can be developed around those languages.
-</para>
-<para>
-One property of this is that there will not be a uniform query language
-across all prototypes. This is in contrast to the VDS1 VDC, which had a
-language that was then mapped to at least SQL and perhaps some XML query
-language too.
-</para>
-<para>
-An intermediate / alternative to something language-like is a much more
-tightly constrained set of library / template queries with a very constrained
-set of parameters.
-</para>
-<para>
-Related to this is avoiding, as much as possible, the mixing of models,
-where one query language is needed for one part of a query and another
-query language for another part. An example of this in practice is the
-storage of XML kickstart records as blobs inside a relational database in
-the VDS1 VDC. SQL could be used to query the containing records, whilst an
-XML query language had to be used to query the inner records. No benefit
-could be derived there from query-language-level joining and query
-optimisation; instead the join had to be implemented, poorly, by hand.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry><term>An elegant collection mechanism for provenance or
-metadata</term>
-<listitem> <para>
-The prototypes here collect their information through log stripping. This
-may or may not be the best way to collect the data. For example, hooks
-inside the code might be a better way.
-</para></listitem>
-</varlistentry>
-
-</variablelist>
-</section>
-
-<section><title>Data model</title>
-<section><title>Introduction to the data model</title>
-<para>
-All of the prototypes use a basic data model that is strongly
-related to the structure of data in the log files; much of the naming here
-comes from names in the log files, which in turn often comes from source
-code procedure names.
-</para>
-<para>
-The data model consists of the following data objects:
-</para>
-<para>execute - an execute represents a procedure call in a
-SwiftScript program.
-</para>
-<para>execute2 - an execute2 is an attempt to actually execute an
-'execute' object.</para>
-<para>dataset - a dataset is data used by a Swift program. This might be
-a file, an array, a structure, or a simple value.</para>
-<para>workflow - a workflow is an execution of an entire SwiftScript
-program.</para>
-</section>
-
-<section><title>execute</title>
-
-<para>
-<firstterm>execute</firstterm> - an 'execute' is an execution of a
-procedure call in a SwiftScript program. Every procedure call in a
-SwiftScript program corresponds to either
-one execute (if the execution was attempted) or zero (if the workflow was
-abandoned before an execution was attempted). An 'execute' may encompass
-a number of attempts to run the appropriate procedure, possibly on different
-sites. Those attempts are contained within an execute as execute2 entities.
-Each execute is related to zero or more datasets - those passed as inputs
-and those that are produced as outputs.
-</para>
-</section>
-<section><title>execute2</title>
-<para>
-<firstterm>execute2</firstterm> - an 'execute2' is an attempt to run a
-program on some grid site. It consists of staging in input files, running the
-program, and staging out the output files. Each execute2 belongs to exactly
-one execute. If the database is storing only successful workflows and
-successful executions, then each execute will be associated with
-exactly one execute2. If storing data about unsuccessful workflows or
-executions, then each execute may have zero or more execute2s.
-</para>
-</section>
-<section><title>dataset</title>
-<para>
-A dataset represents data within a
-SwiftScript program. A dataset can be an array, a structure, a file or
-a simple value. Depending on the nature of the dataset it may have some
-of the following attributes: a value (for example, if the dataset
-represents an integer); a filename (if the dataset represents a file);
-child datasets (if the dataset represents a structure or an array); and
-parent dataset (if the dataset is contained with a structure or an
-array).
-</para>
-
-<para>
-At present, each dataset corresponds to exactly one in-memory DSHandle
-object in the Swift runtime environment; however this might not continue
-to be the case - see the discussion section on cross-dataset run
-identification.
-</para>
-
-<para>Datasets may be related to executes, either as datasets
-taken as inputs by an execute, or as datasets produced by an execute.
-A dataset may be produced as an output by at most one execute. If it is not
-produced by any execute, it is an <firstterm>input to the workflow</firstterm>
-and has been produced through some other mechanism. Multiple datasets may
-have the same filename - for example, at present, each time the same file
-is used as an input in different workflows, a different dataset appears in
-the database. This might change. Multiple workflows might (and commonly do)
-output files with the same name. At present these are different datasets,
-and they are likely to remain so to some extent - if the contents of the
-files differ then the datasets should be regarded as distinct.
-</para>
-</section>
-<section><title>workflow</title>
-<para>
-<firstterm>workflow</firstterm> - a 'workflow' is an execution of an
-entire SwiftScript program. Each execute belongs to exactly one workflow. At
-present, each dataset also belongs to exactly one workflow (though the
-discussion section talks about how that should not necessarily be the case).
-</para>
-</section>
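The four object types above, and the relationships between them (a workflow contains executes; an execute contains execute2 attempts and is related to datasets), can be sketched as plain data structures. This is an illustrative sketch only - the class and field names are chosen for exposition and are not the actual Swift runtime classes:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Dataset:
    id: str                          # globally unique dataset identifier
    filename: Optional[str] = None   # set if the dataset maps to a file
    value: Optional[str] = None      # set if the dataset is a simple value
    children: list = field(default_factory=list)  # contained datasets

@dataclass
class Execute2:
    id: str           # one attempt to run the program on some site
    site: str
    final_state: str  # e.g. "END_SUCCESS"

@dataclass
class Execute:
    id: str
    procedure_name: str
    attempts: list = field(default_factory=list)  # Execute2 attempts
    inputs: list = field(default_factory=list)    # Dataset inputs
    outputs: list = field(default_factory=list)   # Dataset outputs

@dataclass
class Workflow:
    id: str           # one execution of an entire SwiftScript program
    executes: list = field(default_factory=list)  # Execute objects

# A successful run: one execute with exactly one execute2 attempt,
# producing one output dataset.
out = Dataset(id="ds-1", filename="file://localhost/001-echo.out")
echo = Execute(id="exec-1", procedure_name="echo",
               attempts=[Execute2(id="exec2-1", site="localhost",
                                  final_state="END_SUCCESS")],
               outputs=[out])
wf = Workflow(id="run-001", executes=[echo])
```

In the successful-workflows-only case described under execute2, each attempts list would hold exactly one entry.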
-
-<para>TODO: diagram of the dataset model (similar to the one in the
-provenance paper but probably different). design so that in the XML
-model, the element containment hierarchies can be easily marked in a
-different colour</para>
-</section>
-
-<section><title>Prototype Implementations</title>
-<para>
-I have made a few prototype implementations to explore ways of storing
-and querying provenance data.
-</para>
-<para>
-The basic approach is: build on the log-processing code, which knows how
-to pull out lots of information from the log files and store it in a
-structured text format; extend Swift to log additional information as
-needed;
-write import code which knows
-how to take the log-processing structured files and put them into whatever
-database/format is needed by the particular prototype.
-</para>
-<para>
-If it is desirable to support more than one of these storage/query
-mechanisms (perhaps because they have unordered values of usability vs
-query expressibility), then perhaps there should be core provenance output
-code which is somewhat agnostic to the storage system (equivalent to the
-post-log-processing text files at the moment), and then some relatively
-straightforward set of importers which do little more than change syntax
-(cf. it was easy to adapt the SQL import code to produce prolog code
-instead).
-</para>
-
-<para>
-The script <command>import-all</command> will import into the
-basic SQL and eXist XML databases.
-</para>
-
-<section><title>Relational, using SQL</title>
-<para>There are a couple of approaches based around relational databases
-using SQL. The plain SQL approach allows many queries to be answered, but
-does not provide particularly easy querying for transitive relations
-(such as the 'precedes' relation mentioned elsewhere); ameliorating this
-problem is the point of the second model.
-</para>
-<section>
-<title>Plain SQL</title>
-<para>In this model, the provenance model is mapped to a relational
-schema, stored in sqlite3 and queried with SQL.
-</para>
-
-<para>
-This prototype uses sqlite3 on my laptop. The <command>import-all</command>
-script will initialise and import into this database (and also into the
-XML DB).
-</para>
-
-<para>
-Example query - count how many times each procedure has been called:
-<screen>
-sqlite> select procedure_name, count(procedure_name) from executes, invocation_procedure_names where executes.id = invocation_procedure_names.execute_id group by procedure_name;
-align|4
-average|1
-convert|3
-slicer|3
-</screen>
-</para>
-<para>
-This needs an SQL database. sqlite is easy to get (from standard OS software
-repositories, and from the Globus toolkit), so this is not as onerous as it
-seems; the setup requirements for sqlite are minimal.
-</para>
-<para>
-Metadata: one way is to handle it as SQL relations. This allows it
-to be queried using SQL quite nicely, and to be indexed and joined on
-quite easily.
-</para>
-<para>
-Provenance query 1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
-</para>
-
-<section><title>Description of tables</title>
-<para>
-Executions are stored in a table called 'executes'. Each execution has the
-following fields: id - a globally unique ID for that execution; starttime -
-the start time of the execution attempt, in seconds since the unix epoch
-(this is roughly the time that swift decides to submit the task, *not* the
-time that a worker node started executing the task); duration - in seconds
-(the time from the start time to swift finishing the execution, not the
-actual on-worker execution time); finalstate - likely to always be
-END_SUCCESS, as the present import code ignores failed tasks, but in future
-this may include details of failures; app - the symbolic name of the
-application.
-</para>
-<para>
-Details of datasets are stored in three tables: dataset_filenames,
-dataset_usage and dataset_containment.
-</para>
-<para>
-dataset_filenames maps filenames (or more generally URIs) to unique dataset
-identifiers.
-</para>
-<para>
-dataset_usage maps from unique dataset identifiers to the execution
-unique identifiers for executions that take those datasets as inputs
-and outputs. execute_id and dataset_id identify the execution and the
-procedure which are related. direction indicates whether this dataset
-was used as an input or an output. param_name is the name of the parameter
-in the SwiftScript source file.
-</para>
-<para>
-dataset_containment indicates which datasets are contained within others,
-for example within a structure or array. An array or structure is a dataset
-with its own unique identifier; and each member of the array or structure
-is again a dataset with its own unique identifier. The outer_dataset_id and
-inner_dataset_id fields in each row indicate respectively the
-containing and contained dataset.
-</para>
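The tables described above can be sketched with Python's built-in sqlite3 module. The table and column names follow the text; the inserted rows are made-up examples, exact column types in the real schema may differ, and the query is the procedure-count query shown earlier in this section:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE executes (
    id TEXT PRIMARY KEY,   -- globally unique execution ID
    starttime REAL,        -- seconds since the unix epoch
    duration REAL,         -- seconds until swift finished the execution
    finalstate TEXT,       -- e.g. END_SUCCESS
    app TEXT               -- symbolic application name
);
CREATE TABLE invocation_procedure_names (
    execute_id TEXT,
    procedure_name TEXT
);
CREATE TABLE dataset_filenames (dataset_id TEXT, filename TEXT);
CREATE TABLE dataset_usage (
    execute_id TEXT,
    dataset_id TEXT,
    direction TEXT,        -- input or output
    param_name TEXT        -- parameter name in the SwiftScript source
);
CREATE TABLE dataset_containment (
    outer_dataset_id TEXT, -- containing dataset
    inner_dataset_id TEXT  -- contained dataset
);
""")

# Two made-up successful executions of the same procedure.
db.executemany("INSERT INTO executes VALUES (?,?,?,?,?)",
               [("e1", 0.0, 1.0, "END_SUCCESS", "align"),
                ("e2", 1.0, 1.0, "END_SUCCESS", "align")])
db.executemany("INSERT INTO invocation_procedure_names VALUES (?,?)",
               [("e1", "align"), ("e2", "align")])

# The procedure-count query from earlier in this section.
counts = db.execute("""
    SELECT procedure_name, COUNT(procedure_name)
    FROM executes, invocation_procedure_names
    WHERE executes.id = invocation_procedure_names.execute_id
    GROUP BY procedure_name
""").fetchall()
```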
-
-</section>
-
-</section>
-
-<section>
-<title>SQL with Pre-generated Transitive Closures</title>
-
-<para>Plain SQL (without recursive query extensions) does not allow the
-expression of transitive relations. This causes a problem for some of the
-queries.</para>
-<para>Work has previously been done (cite) to work on pre-generating
-transitive closures over relations. This is similar in concept to the
-pregenerated indices that SQL databases traditionally provide.
-</para>
-<para>In the pre-generated transitive closure model, a transitive closure
-table is pregenerated (and can be incrementally maintained as data is added
-to the database). Queries are then made against this table instead of
-against the ground table.
-</para>
-
-<para>All of the data available in the earlier SQL model is available, in
-addition to the additional closures generated here.</para>
-
-<para>
-Prototype code: there is a script called
-<literal>prov-sql-generate-transitive-closures.sh</literal> that generates
-the closure of the precedes relation and places it in a table called
-<literal>trans</literal>:
-<screen>
-$ prov-sql-generate-transitive-closures.sh 
-Previous: 0 Now: 869
-Previous: 869 Now: 1077
-Previous: 1077 Now: 1251
-Previous: 1251 Now: 1430
-Previous: 1430 Now: 1614
-Previous: 1614 Now: 1848
-Previous: 1848 Now: 2063
-Previous: 2063 Now: 2235
-Previous: 2235 Now: 2340
-Previous: 2340 Now: 2385
-Previous: 2385 Now: 2396
-Previous: 2396 Now: 2398
-Previous: 2398 Now: 2398
-</screen>
-</para>
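The iteration the script reports (each Previous/Now line is one round) can be sketched as a naive fixpoint computation. This is an illustrative reimplementation in Python, not the script itself: it repeatedly joins the closure-so-far against the base 'precedes' relation until no new pairs appear:

```python
def transitive_closure(base):
    """base: a set of (a, b) pairs meaning 'a precedes b'.
    Returns the transitive closure as a set of pairs."""
    closure = set(base)
    while True:
        previous = len(closure)
        # One round of joining: if (a, b) in the closure and (b, c)
        # in the base relation, then (a, c) belongs in the closure.
        new_pairs = {(a, d)
                     for (a, b) in closure
                     for (c, d) in base
                     if b == c}
        closure |= new_pairs
        if len(closure) == previous:  # fixpoint reached: nothing new added
            return closure

# A small precedes chain: 1 -> 2 -> 3 -> 4
chain = {(1, 2), (2, 3), (3, 4)}
```

For the chain above the closure grows from 3 pairs to 6; the real script performs the same joining in SQL, storing the result in the trans table.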
-<para>A note on timing: constructing the closure of 869 base relations,
-leading to 2398 relations in the closure, takes 48s with no indices; adding
-an index on a column in the transitive relations table takes this down to
-1.6s. This is interesting as an example of how a decent understanding of
-the data structure, used to produce properly optimised queries and the
-like, is very helpful in scaling up - and an argument against implementing
-a poor 'inner system'.
-</para>
-<para>Now we can reformulate some of the queries from the SQL section
-making use of this table.
-</para>
-<para>
-There are some papers around about transitive closures in SQL:
-
-<ulink url="http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/554/http:zSzzSzsdmc.krdl.org.sgzSzkleislizSzpsZzSzdlsw-ijit97-9.pdf/dong99maintaining.pdf">'Maintaining transitive closure of graphs in SQL'</ulink>
-and
-<ulink url="http://willets.org/sqlgraphs.html"
->http://willets.org/sqlgraphs.html</ulink>
-</para>
-
-<para>
-Open questions: how expensive is doing this? How much cheaper are the
-queries? How much more expensive is adding data? And how does it scale (in
-both time and size, e.g. row count) as we put in more rows (e.g. at i2u2
-scale)? Exponentially, perhaps? Though the theoretical limit will be
-influenced by our usage pattern, which I think will mostly be lots of
-disjoint graphs. We get to index the transitive closure table, which we
-don't get to do when constructing the closure at query time.
-</para>
-
-<para>We don't have the path(s) between nodes, but we could store those in
-the closure table too if we wanted (though multiple paths would then be
-more expensive, as there are more unique rows to go in the closure table).</para>
-
-<para>
-This is a violation of normalisation, which traditional relational people
-would say is bad, but OLAP people would say is OK.
-</para>
-
-<para>
-How much easier does it make queries? For queries back to the root it
-should be much easier (querying over the transitive table almost as if over
-the base table), but queries such as 'go back n steps then stop' and the
-like become harder.
-</para>
-
-<para>
-Keyword: 'incremental evaluation system' (to maintain transitive closure).
-</para>
-
-<para>
-The difference between plain SQL and SQL-with-transitive-closures is that
-in plain SQL mode, closure construction occurs at query time, and the query
-needs to specify that construction. In transitive-closure mode,
-construction occurs at data insertion time, with increased expense there
-and in the size of the db, but cheaper queries (I think).
-</para>
-<para>
-Sample numbers: the fmri example has 50 rows in the base causal relation
-table, and 757 in the table with the transitive closure.
-</para>
-
-<para>
-If running entirely separate workflows, both those numbers will scale linearly
-with the number of workflows; however, if there is some crossover between
-subsequent workflows in terms of shared data files then the transitive
-graph will grow super-linearly.
-</para>
-
-</section>
-</section>
-<section><title>XML</title>
-
-<para>In this XML approach, provenance data and metadata are represented
-as a set of XML documents.</para>
-
-<para>Each document is stored in some kind of document store.
-Two different document stores are used: 
-the posix filesystem and eXist. XPath and XQuery are investigated as
-query languages.</para>
-
-<para>Semi-structuredness allows structured metadata without having to
-declare its schema in advance (which I think is one of the desired
-properties that turns people off using plain SQL tables to reflect the
-metadata schema). But we won't get indexing without some configuration of
-structure, so whilst going schema-less will be nice for small DBs, schema
-declarations may be necessary to scale up. (That in itself isn't a problem:
-it allows a gentle start without schema declarations, with declarations
-added later on to scale up - which fits in with the scripting style.) The
-semi-structured form of XML lines up very well with the desire to have
-semi-structured metadata. Compare the ease of converting other things
-(e.g. fmri showheader output) to loose XML - field relabelling without
-having to know what the fields actually are - with how this needs to be
-done in SQL.
-</para>
-
-<para>
-The hierarchical structure of XML is perhaps better for dataset
-containment, because we can use the // operator, which is transitive down
-the tree.
-</para>
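As an illustration of that transitive descent, Python's xml.etree.ElementTree supports a .// step that plays the same role as XPath's // operator; the nested dataset document below is a made-up miniature mimicking the containment shape shown later in this section:

```python
import xml.etree.ElementTree as ET

# A dataset containing two child datasets, as in the false-filename
# example later in this section.
doc = ET.fromstring("""
<provenance>
  <dataset identifier="10682109">
    <dataset identifier="12735302">
      <dataset identifier="7080341"/>
    </dataset>
  </dataset>
</provenance>
""")

outer = doc.find("dataset[@identifier='10682109']")
# Transitive descent: .// selects dataset descendants at any depth,
# so nested containment needs no explicit join or recursion.
contained = [d.get("identifier") for d in outer.findall(".//dataset")]
```

Expressing the same query in SQL over the dataset_containment table would require one join per level of nesting (or the pre-generated closure described earlier).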
-<para>
-XML provides a more convenient export format than SQL or the other formats
-in terms of an easily parseable file format. There are lots of
-tools around for processing XML files in various different ways (for example,
-treating as text-like documents; deserialising into Java in-memory objects
-based on an XML Schema definition), and XML is one of the most familiar
-structured data file formats.
-</para>
-<para>
-It is not clear what a DAG representation would look like here. Many (one
-per arc) small documents? Is that a problem for the DBs? More likely many
-small elements rather than many small documents - roughly one document per
-workflow.
-</para>
-
-<section><title>xml metadata</title>
-<para>In the XML model, there are two different ways of putting in
-metadata: as descendants of the appropriate objects (e.g. dataset metadata
-under the relevant datasets), which is most XML-like in the sense that it
-is strongly hierarchical; or as separate elements at a higher level (e.g.
-separate documents in the XML db). The two ways are compatible to the
-extent that some metadata can be stored one way and some the other,
-although the way of querying each will be different.
-</para>
-<para>
-Way (i): at the time of converting provenance data into XML, insert
-metadata at the appropriate slots (though if the XML storage medium allows,
-it could be inserted later on).
-</para>
-<para>
-modified <command>prov-to-xml.sh</command> to put that info in for
-the appropriate datasets (identified using the below descripted false-filename
-method</para>
-<para>
-We can now make queries such as 'tell me the datasets which have header metadata':
-<screen>
-cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '//dataset[headermeta]'
-</screen>
-</para>
-<para>
-Way (ii): we need to figure out what the dataset IDs for the volumes are.
-At the moment, the filename field for (some) mapped dataset parents still
-has a filename even though that file never exists, as below. This depends
-on the mapper being able to invent a filename for such a dataset. Mappers
-aren't guaranteed to be able to do that - e.g. where filenames are not
-formed as a function of the parameters and path, but rely on e.g. what is
-in the directory at initialisation (like the filesystem mapper).
-<screen>
-&lt;dataset identifier="10682109"&gt;
-&lt;filename&gt;file://localhost/0001.in&lt;/filename&gt;
-&lt;dataset identifier="12735302"&gt;
-&lt;filename&gt;file://localhost/0001.h.in&lt;/filename&gt;
-&lt;/dataset&gt;
-&lt;dataset identifier="7080341"&gt;
-&lt;filename&gt;file://localhost/0001.v.in&lt;/filename&gt;
-&lt;/dataset&gt;
-</screen>
-So we can perhaps use that: the mapped filename here provides a dataset
-identification (by chance, not by design), so we can take advantage
-of it:
-<screen>
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance//dataset[filename="file://localhost/0001.in"]/@identifier'
-10682109
-</screen>
-</para>
-<para>I think metadata in XML is more flexible than metadata in a
-relational model, in terms of not having to define a schema and not having
-to stick to one. However, how will it stand up to the challenge of
-scalability? We need to get a big DB. It's OK to say that indices need to
-be made - I don't dispute that. What's nice is that you can operate at the
-low end without them. So we need to get this stuff imported into e.g.
-eXist (maybe the prototype XML processing should look like: XML doc(s) on
-disk -> whatever xmldb, in order to facilitate prototyping and
-pluggability).
-</para>
-
-</section>
-
-<section><title>XPath query language</title>
-
-<para>
-XPath queries can be run either against the posix file system store or
-against the eXist database. When using eXist, there is the opportunity for
-more optimised query processing (and indeed, the eXist query processing
-model appears to evaluate queries in an initially surprising and
-unintuitive way to gain speed), compared to the filesystem, where XML is
-stored in serialised form and must be parsed for each query.
-</para>
-
-<para>
-XML generation:
-<screen>
-./prov-to-xml.sh > /tmp/prov.xml
-</screen>
-and basic querying with xpathtool (http://www.semicomplete.com/projects/xpathtool/):
-<screen>
-cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/provenance/execute[thread="0-4-1"]' 
-</screen>
-</para>
-<para>
-Query 1:
-<screen>
-cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/provenance//dataset[filename="file://localhost/0001.jpeg"]'       
-&lt;toplevel&gt;
-  &lt;dataset identifier="14976260"&gt;
-    &lt;filename&gt;file://localhost/0001.jpeg&lt;/filename&gt;
-  &lt;/dataset&gt;
-&lt;/toplevel&gt;
-</screen>
-or we can get the identifier like this:
-<screen>
- cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance//dataset[filename="file://localhost/0001.jpeg"]/@identifier' 
-14976260
-</screen>
-We can also request IDs for multiple datasets, like this:
-<screen>
-cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance//dataset[filename="file://localhost/0001.jpeg"]/@identifier|/provenance//dataset[filename="file://localhost/0002.jpeg"]/@identifier' 
-</screen>
-</para>
-<para>
-We can find the threads that use this dataset like this:
-<screen>
- cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/provenance/tie[dataset=/provenance//dataset[filename="file://localhost/0001.jpeg"]/@identifier]' 
-&lt;toplevel&gt;
-  &lt;tie&gt;
-    &lt;thread&gt;0-4-3&lt;/thread&gt;
-    &lt;direction&gt;output&lt;/direction&gt;
-    &lt;dataset&gt;14976260&lt;/dataset&gt;
-    &lt;param&gt;j&lt;/param&gt;
-    &lt;value&gt;org.griphyn.vdl.mapping.DataNode hashCode 14976260 with no value at dataset=final path=[1]&lt;/value&gt;
-  &lt;/tie&gt;
-</screen>
-</para>
-<para>
-Now we can iterate as in the SQL example:
-<screen>
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[thread="0-4-3"][direction="input"]/dataset'
-4845856
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[dataset="4845856"][direction="output"]/thread'
-0-3-3
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[thread="0-3-3"][direction="input"]/dataset'
-3354850
-6033476
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[dataset="3354850"][direction="output"]/thread'
-0-2
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[thread="0-2"][direction="input"]/dataset'
-4436324
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[dataset="4436324"][direction="output"]/thread'
-</screen>
-</para>
-<para>So now we've exhausted the tie relation - dataset 4436324 comes from
-elsewhere...</para>
-<para>
-So we say this:
-<screen>
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/dataset[@identifier="4436324"]//dataset/@identifier'
-11153746
-7202698
-12705705
-7202698
-12705705
-655223
-2088036
-13671126
-2088036
-13671126
-5169861
-14285084
-12896050
-14285084
-12896050
-6487148
-5772360
-4910675
-5772360
-4910675
-</screen>
-which gives us (non-unique) datasets contained within dataset 4436324. We can
-uniquify outside of the language:
-<screen>
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/dataset[@identifier="4436324"]//dataset/@identifier' | sort |uniq
-11153746
-12705705
-12896050
-13671126
-14285084
-2088036
-4910675
-5169861
-5772360
-6487148
-655223
-7202698
-</screen>
-and now we need to find what produced all of those... iterate everything again.
-probably we can do it integrated with the previous query so that we
-don't have to iterate externally:
-<screen>
-$ cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/provenance/tie[dataset=/provenance/dataset[@identifier="4436324"]//dataset/@identifier]'
-<?xml version="1.0"?>
-<toplevel>
-  <tie>
-    <thread>0-1-3</thread>
-    <direction>output</direction>
-    <dataset>5169861</dataset>
-    <param>o</param>
-    <value>org.griphyn.vdl.mapping.DataNode hashCode 5169861 with no value at dataset=aligned path=[4]</value>
-  </tie>
-  <tie>
-    <thread>0-1-4</thread>
-    <direction>output</direction>
-    <dataset>6487148</dataset>
-    <param>o</param>
-    <value>org.griphyn.vdl.mapping.DataNode hashCode 6487148 with no value at dataset=aligned path=[1]</value>
-  </tie>
-  <tie>
-    <thread>0-1-2</thread>
-    <direction>output</direction>
-    <dataset>655223</dataset>
-    <param>o</param>
-    <value>org.griphyn.vdl.mapping.DataNode hashCode 655223 with no value at dataset=aligned path=[2]</value>
-  </tie>
-  <tie>
-    <thread>0-1-1</thread>
-    <direction>output</direction>
-    <dataset>11153746</dataset>
-    <param>o</param>
-    <value>org.griphyn.vdl.mapping.DataNode hashCode 11153746 with no value at dataset=aligned path=[3]</value>
-  </tie>
-</toplevel>
-</screen>
-which reveals only 4 ties to procedures from those datasets - the elements
-of the aligned array. We can get just the thread IDs for that by adding
-/thread onto the end:
-<screen>
- cat /tmp/prov.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh '/provenance/tie[dataset=/provenance/dataset[@identifier="4436324"]//dataset/@identifier]/thread'
-0-1-3
-0-1-4
-0-1-2
-0-1-1
-</screen>
-so now we need to iterate over those four threads as before using the same
-process.
-</para>
-<para>so we will ask 'which datasets does this contain?' because at
-present, a composite dataset will ultimately be produced by its component
-datasets (though I think perhaps we'll end up with apps producing datasets
-that are composites, eg when a file is output that then maps into some
-structure - eg file contains (1,2) and this maps to struct { int x; int y;}).
-TODO move this para into section on issues-for-future-discussion.
-</para>
-<para>so xpath here doesn't really seem too different in expressive ability
-from the SQL approach - it still needs external implementation of
-transitivity for some of the transitive relations (though not for
-dataset containment). and that's a big complicating factor for ad-hoc queries...
-</para>
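-<para>
-The external transitivity iteration described above is mechanical enough to
-script in a host language. A minimal Python sketch, using an in-memory
-stand-in for the tie relation rather than xpathtool output (the tie data
-mirrors the worked example above but is otherwise illustrative):
-</para>

```python
# Each tie is (thread, direction, dataset); a stand-in for /provenance/tie.
ties = [
    ("0-4-3", "output", "14976260"),
    ("0-4-3", "input",  "4845856"),
    ("0-3-3", "output", "4845856"),
    ("0-3-3", "input",  "3354850"),
    ("0-2",   "output", "3354850"),
    ("0-2",   "input",  "4436324"),
]

def predecessors(dataset):
    """Transitively follow output-thread -> input-dataset links."""
    seen, frontier = set(), {dataset}
    while frontier:
        # threads that produced any dataset in the frontier
        threads = {t for (t, d, ds) in ties if d == "output" and ds in frontier}
        # datasets those threads consumed, minus anything already visited
        frontier = {ds for (t, d, ds) in ties
                    if d == "input" and t in threads} - seen
        seen |= frontier
    return seen
```

-<para>
-The same loop could shell out to xpathtool.sh at each step instead of
-consulting the in-memory list; the point is that the fixpoint iteration
-lives outside the query language.
-</para>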
-<section><title>notes on using eXist</title>
-<para><ulink url="http://exist.sourceforge.net/client.html">command line
-client doc</ulink></para>
-<para>
-run in command line shell with local embedded DB (not running inside a
-server, so analogous to using sqlite rather than postgres):
-<screen>
-~/work/eXist/bin/client.sh -s -ouri=xmldb:exist://
-</screen>
-</para>
-<para>
-import a file:
-<screen>
-~/work/eXist/bin/client.sh -m /db/prov -p `pwd`/tmp.xml  -ouri=xmldb:exist://
-</screen>
-note that the -p document path is relative to the eXist root directory, not to
-the pwd, hence the explicit pwd.
-</para>
-<para>
-xpath query from commandline:
-<screen>
- echo '//tie' |  ~/work/eXist/bin/client.sh -ouri=xmldb:exist:// -x
-</screen>
-</para>
-</section>
-</section>
-
-<section><title>XSLT</title>
-<para>
-very much like when we reviewed xpath, xslt and xquery for MDS data - these
-are the three I'll consider for the XML data model. does XSLT add anything?
-not sure. for now I think not, so either ignore it or at least note that it
-does not add anything.
-</para>
-<para>
-<screen>
-./prov-to-xml.sh > /tmp/prov.xml
-xsltproc ./prov-xml-stylesheet.xslt /tmp/prov.xml
-</screen>
-with no rules will generate plain text output that is not much use.
-</para>
-<para>
-Two potential uses: i) as a formatting language for stuff coming out of
-some other (perhaps also XSLT, or perhaps other language) query process.
-and ii) as that other language doing semantic rather than presentation
-level querying (better names for those levels?)
-</para>
-</section>
-
-<section><title>XQuery query language</title>
-<para>
-Build query results for this using probably the same database as the
-above XPath section, but indicating where things could be better expressed
-using XPath.
-</para>
-<para>
-Using XQuery with eXist:
-<screen>
-$ cat xq.xq
-//tie
-$ ~/work/eXist/bin/client.sh -ouri=xmldb:exist:// -F `pwd`/xq.xq
-</screen>
-</para>
-<para>
-A more advanced query:
-<screen>
-for $t in //tie
-  let $dataset := //dataset[@identifier=$t/dataset]
-  let $exec := //execute[thread=$t/thread]
-  where $t/direction="input"
-  return <r>An invocation of {$exec/trname} took input {$dataset/filename}</r>
-</screen>
-</para>
-</section>
-
-</section>
-
-<section><title>RDF and SPARQL</title>
-<para>
-This can probably also be extended to SPARQL-with-transitive-closures
-using the same methods as 1; or see OWL note below.
-</para>
-<para>
-Pegasus/WINGS queries could be interesting to look at here - they
-are from the same tradition as Swift. However, they don't deal very
-well with transitivity.
-</para>
-<para>OWL mentions transitivity as something that can be expressed in
-an OWL ontology but are there any query languages around that can
-make use of that kind of information?
-</para>
-<para>See prolog section on RDF querying with prolog.
-</para>
-<para>
-There's an RDF-in-XML format for exposing information in serialised form.
-Same discussion applies to this as to the discussion in XML above.
-</para>
-</section>
-
-<section><title>GraphGrep</title>
-<para><ulink url="http://www.cs.nyu.edu/shasha/papers/graphgrep/">graphgrep</ulink></para>
-<screen>
- - download link: see email
-graphgrep install notes: port install db3
-some hack patches to get it to build with db3
-</screen>
-<para>
-Got a version of GraphGrep, which apparently has an interesting graph query
-language in it. Haven't tried it yet though.
-</para>
-</section>
-
-<section><title>prolog</title>
-<para>Perhaps interesting querying ability here. Probably slow? but not
-really sure - SWI Prolog talks about indexing its database (and allowing
-the indexing to be customised) and about supporting very large databases.
-So this sounds hopeful.
-</para>
-<para>
-convert database into SWI prolog. make queries based on that.
-</para>
-<para>Can make library to handle things like transitive relations - should be
-easy to express the transitivity in various different ways (dataset
-containment, procedure-ordering, whatever) - far more clear there than
-in any other query language.</para>
-<para>
-SWI Prolog has some RDF interfacing, so this is clearly a realm that is
-being investigated by some other people. For example:
-<blockquote>It is assumed that Prolog is a suitable vehicle to reason with the data expressed in RDF models -- http://www.swi-prolog.org/packages/rdf2pl.html</blockquote>
-</para>
-<para><ulink
-url="http://www.xml.com/pub/a/2001/04/25/prologrdf/index.html">
-http://www.xml.com/pub/a/2001/04/25/prologrdf/index.html
-</ulink>
-</para>
-<para>
-prolog can be used over RDF or over any other tuples. stuff in SQL
-tables should map neatly too. Stuff in XML hieararchy perhaps not so
-easily but should still be doable.</para>
-<para>indeed, SPARQL queries have a very prolog-like feel to them
-superficially.
-</para>
-<para>prolog db is a program at the moment - want something that looks more
-like a persistent modifiable database. not sure what the prolog approach
-to doing that is.</para>
-<para>
-so maybe prolog makes an interesting place to do future research on
-query language? not used by this immediate work but a direction to
-do query expressibility research (building on top of whatever DB is used
-for this round?)
-</para>
-
-<para>q1 incremental:
-
-<screen>
-?- dataset_filenames(Dataset,'file://localhost/0001.in').
-
-Dataset = '10682109' ;
-</screen>
-
-Now with lib.pl:
-
-<screen>
-dataset_trans_preceeds(Product, Source) :-
-   dataset_usage(Thread, 'O', Product, _, _),
-   dataset_usage(Thread, 'I', Source, _, _).
-
-
-dataset_trans_preceeds(Product, Source) :-
-   dataset_usage(Thread, 'O', Product, _, _),
-   dataset_usage(Thread, 'I', Inter, _, _),
-   dataset_trans_preceeds(Inter, Source).
-</screen>
-
-then we can ask:
-
-<screen>
-?- dataset_trans_preceeds('14976260',S).
-
-S = '4845856' ;
-
-S = '3354850' ;
-
-S = '6033476' ;
-
-S = '4436324' ;
-
-No
-</screen>
-
-which is all the dataset IDs up until the point that we get into
-array construction. This is the same iterative problem we have
-in the SQL section too - however, it should be solvable in the prolog case
-within prolog in the same way that the recursion is. so now:
-
-<screen>
-base_dataset_trans_preceeds(Product, Source, Derivation) :-
-   dataset_usage(Thread, 'O', Product, _, _),
-   dataset_usage(Thread, 'I', Source, _, _),
-   Derivation = f(one).
-
-base_dataset_trans_preceeds(Product, Source, Derivation) :-
-   dataset_containment(Product, Source),
-   Derivation = f(two).
-
-dataset_trans_preceeds(Product, Source, Derivation) :-
-    base_dataset_trans_preceeds(Product, Source, DBase),
-    Derivation = [DBase].
-
-dataset_trans_preceeds(Product, Source, Derivation) :-
-   base_dataset_trans_preceeds(Product, Inter, DA),
-   dataset_trans_preceeds(Inter, Source, DB),
-   Derivation = [DA|DB].
-</screen>
-
-</para>
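-<para>
-For comparison, the same recursion with derivation tracking can be sketched
-outside prolog. This Python version is illustrative only - the facts are
-invented stand-ins for the dataset_usage and dataset_containment relations:
-</para>

```python
# Stand-ins for the Prolog facts: (thread, direction, dataset) usage tuples
# and (parent, child) containment pairs. IDs are invented for illustration.
dataset_usage = [
    ("t1", "O", "d_out"), ("t1", "I", "d_mid"),
    ("t2", "O", "d_mid"), ("t2", "I", "d_in"),
]
dataset_containment = [("d_in", "d_leaf")]

def base_preceeds(product):
    """One-step sources of `product`, tagged like f(one)/f(two) above."""
    for (t, d, ds) in dataset_usage:
        if d == "O" and ds == product:
            for (t2, d2, ds2) in dataset_usage:
                if t2 == t and d2 == "I":
                    yield ds2, "one"          # via a procedure invocation
    for (parent, child) in dataset_containment:
        if parent == product:
            yield child, "two"                # via dataset containment

def trans_preceeds(product):
    """All (source, derivation-path) pairs, like dataset_trans_preceeds/3."""
    for src, rule in base_preceeds(product):
        yield src, [rule]
        for deeper, path in trans_preceeds(src):
            yield deeper, [rule] + path
```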
-
-
-<para>q4:
-
-<screen>
-invocation_procedure_names(Thread, 'align_warp'),
-dataset_usage(Thread, Direction, Dataset, 'model', '12'),
-execute(Thread, Time, Duration, Disposition, Executable),
-format_time(atom(DayOfWeek), '%u', Time),
-DayOfWeek = '5'.
-TODO perhaps remove unused bindings
-</screen>
-</para>
-
-
-</section>
-
-<section><title>amazon simpledb</title>
-<para>restricted beta access... dunno if I will get any access - I have
-none so far, though I have applied.</para>
-<para>
-From reading a bit about it, my impressions are that this will prove to be
-a key->value lookup mechanism with poor support for going the other way
-(eg. value or value pattern or predicate-on-value  -> key) or for doing
-joins (so rather like a hash table - which then makes me ask 'why not also
-investigate last year's buzzword of DHTs?'). I think that these additional
-lookup mechanisms are probably necessary for a lot of the
-query patterns.
-</para>
-
-<para>
-For some set of queries, though, key -> value lookup is sufficient; and
-likely the set of queries that is appropriate to this model varies depending
-on how the key -> value model is laid out (i.e. what gets to be a key
-and what is its value? do we form a hierarchy from workflow downwards?)
-</para>
-
-</section>
-
-
-<section><title>graphviz</title>
-<para>
-This is a very different approach that is on the boundaries of relevance.
-</para>
-<para>
-goal: produce an annotated graph showing the procedures and the
-datasets, with appropriate annotation of identifiers and
-descriptive text (eg filenames, procedure names, executable names) so that
-for small (eg. fmri-sized) workflows it's easy to get a visual view of
-what's going on.
-</para>
-<para>
-don't target anything much bigger than the fmri example for this.
-(though there is maybe some desire to produce larger visualisations for
-this - perhaps as a separate piece of work. eg could combine foreach
-into single node, datasets into single node)
-</para>
-<para>
-perhaps make subgraphs by the various containment relationships:
-datasets in same subgraph as their top level parent;
-app procedure invocations in the same subgraph as their compound
-procedure invocation.
-</para>
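-<para>
-A sketch of what the DOT generation could look like, in Python. The helper
-name, relation layout and example labels are invented for illustration:
-</para>

```python
def to_dot(executes, ties):
    """Render procedures as boxes and datasets as plain nodes in DOT syntax.
    `executes` maps thread id -> procedure name; `ties` are
    (thread, direction, dataset) triples as elsewhere in these notes."""
    lines = ["digraph provenance {"]
    for thread, name in executes.items():
        lines.append('  "%s" [shape=box, label="%s"];' % (thread, name))
    for thread, direction, dataset in ties:
        if direction == "input":
            lines.append('  "%s" -> "%s";' % (dataset, thread))
        else:
            lines.append('  "%s" -> "%s";' % (thread, dataset))
    lines.append("}")
    return "\n".join(lines)

dot = to_dot({"0-2": "align_warp"},
             [("0-2", "input", "0001.img"), ("0-2", "output", "0001.warp")])
```

-<para>
-The resulting text can be fed to the graphviz dot tool (eg. dot -Tpng) to
-render; subgraph clustering by containment would be a straightforward
-extension.
-</para>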
-</section>
-</section>
-
-<section><title>Comparison with related work that our group has done
-before</title>
-<section>
-<title>vs VDS1 VDC</title>
-<para>gendax - VDS1 has a tool <command>gendax</command> which provides
-various ways of accessing data from the command line. Eg. prov challenge
-question 1 very easily answered by this.
-</para>
-<para>
-two points I don't like that should discuss here: i) the metadata schema
-(I claim there doesn't need to be a generic metadata schema at all - 
-when applications decide they want to store certain metadata, they declare
-it in the database); and ii) the mixed-model - this is discussed a bit in
-the 'no general query language' section. consolidate/crosslink.
-</para>
-</section>
-
-
-<section><title>vs VDL provenance paper figure 1 schema</title>
-<para>
-The significant differences are:
-(TODO perhaps produce a diagram for comparison. could use same diagram
-differently annotated to indicate trees in the XML section and also
-in the transitivity discussion section)
-</para>
-<para>
-the 'annotation' model - screw that, go
-native</para>
-<para>the dataset containment model, which doesn't exist in the
-virtual dataset model.
-</para>
-<para>
-workflow object has a fromDV and toDV field. what are
-those meant to mean? In present model, there isn't any base data for a workflow
-at the moment - everything can be found in the descriptions of its
-components (such as files used, start time, etc). (see notes on
-compound procedure containment with model of a workflow as a compound
-procedure)
-</para>
-<para>invocation to call to procedure chain. this chain looks different.
-there are executes (which look like invocations/calls) and procedure names
-(which do not exist as primary objects because I am not storing
-program code). kickstart records and execute2 records would be more like
-the annotations you'd get from the annotation part, with the call being
-more directly associated with the execute object.
-</para>
-</section>
-</section>
-<section><title>Questions/Discussion points</title>
-
-<section><title>metadata</title>
-<para>
-discourse analysis: Perhaps the word 'metadata' should be banned in
-this document - it implies that there is
-some special property that distinguishes it sufficiently from normal
-data such that it must be treated differently from other data.
-I don't believe this to be the case.
-</para>
-<para>
-script <command>prov-mfd-meta-to-xml</command> that generates (fake)
-metadata record in XML like this:
-<screen>$ ./prov-mfd-meta-to-xml 123
-<headermeta>
-  <dataset>123</dataset>
-  <bitspixel>16</bitspixel>
-  <xdim>256</xdim>
-  <ydim>256</ydim>
-  <zdim>128</zdim>
-  <xsize>1.000000e+00</xsize>
-  <ysize>1.000000e+00</ysize>
-  <zsize>1.250000e+00</zsize>
-  <globalmaximum>4095</globalmaximum>
-  <globalminimum>0</globalminimum>
-</headermeta>
-</screen>
-</para>
-
-<section><title>metadata random notes</title>
-<para>
-metadata: there's a model of arbitrary metadata pairs being
-associated with arbitrary objects.</para>
-<para>there's another model (that I tend to
-favour) in which the metadata schema is more defined than this - eg in i2u2
-for any particular elab, the schema for metadata is fairly well defined.
-</para>
-<para>
-eg in cosmic, there are strongly typed fields such as "blessed" or
-"detector number" that
-are hard-coded throughout the elab. whilst the VDS1 VDC can deal with
-arbitrary typing, that's not the model that i2u2/cosmic is using. need to be
-careful to avoid the inner-platform effect here especially - "we need a
-system that can do arbitrarily typed metadata pairs" is not actually a
-requirement in this case as the schema is known at application build time.
-(note that this matters for SQL a lot, not so much for plain XML data model,
-though if we want to specify things like 'is-transitive' properties then
-in any model things like that need to be better defined)
-</para>
-<para>
-fMRI provenance challenge metadata (extracted using scanheader) looks like
-this:
-<screen>
-$ /Users/benc/work/fmri-tutorial/AIR5.2.5/bin/scanheader ./anatomy0001.hdr
-bits/pixel=16
-x_dim=256
-y_dim=256
-z_dim=128
-x_size=1.000000e+00
-y_size=1.000000e+00
-z_size=1.250000e+00
-
-global maximum=4095
-global minimum=0
-</screen>
-</para>
-</section>
-
-</section>
-
-
-<section><title>The 'preceeds' relation</title>
-
-<section><title>Provenance of hierarchical datasets</title>
-
-
-
-<para>
-One of the main provenance queries is whether some entity (a
-data file or a procedure) was influenced by some other entity.
-</para>
-
-<para>
-In VDS1 a workflow is represented by a bipartite DAG where one
-vertex partition is files and the other is procedures.
-</para>
-
-<para>
-The more complex data structures in Swift make the provenance graph
-not so straightforward. Procedures input and output datasets that may
-be composed of smaller datasets and may in turn be composed into larger
-datasets.
-</para>
-
-<para>
-For example, a dataset D coming out of a procedure P may form a part
-of a larger dataset E. Dataset E may then be an input to procedure Q.
-The ordering is then:
-<screen>
- P --output--> D --contained-by-> E --input--> Q
-</screen>
-</para>
-
-<para>
-Conversely, a dataset D coming out of a procedure P may contain a
-smaller dataset E. Dataset E may then be used as an input to procedure
-Q.
-<screen>
- P --output--> D --contains--> E --input--> Q
-</screen>
-</para>
-
-<para>
-So the contains relation and its reverse, the contained-by relation, do not
-in the general case seem to give an appropriate preceeds relation.
-
-<screen>
-so: i) should Q1<->Q be a bidirectional dependency (in which case we
-  no longer have a DAG, which causes trouble)
-
-or
-
-    ii) the dependency direction between Q1 and Q depends on how Q and Q1
-were constructed. I think this is the better approach, because I think
-there really is some natural dependency order.
-
-If A writes to Q1 and Q1 is part of Q then A->Q1->Q
-If A writes to Q and Q1 is part of Q then A->Q->Q1
-
-So when we write to a dataset, we need to propagate out the dependency
-from there (both upwards and downwards, I think).
-
-eg. if Q1X is part of Q1 is part of Q
-and A writes to Q1, then Q1X depends on Q1 and Q depends on Q1.
-
-
-</screen>
-
-</para>
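-<para>
-The propagation rule sketched above - a write making both ancestors and
-descendants depend on the written dataset - can be expressed compactly.
-A Python sketch using the Q/Q1/Q1X naming from the example; the helper
-itself is illustrative:
-</para>

```python
# parent-of relation for a nested dataset: Q contains Q1 contains Q1X.
parent = {"Q1": "Q", "Q1X": "Q1"}

def dependents_of_write(written):
    """Everything that comes to depend on `written` when it is written to:
    all ancestors (upward walk) and all descendants (downward walk)."""
    deps = set()
    node = written
    while node in parent:          # upward: each enclosing dataset depends
        node = parent[node]
        deps.add(node)
    frontier = {written}
    while frontier:                # downward: each contained dataset depends
        children = {c for c, p in parent.items() if p in frontier}
        deps |= children
        frontier = children
    return deps
```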
-
-
-
-<para>
-Various ways of doing closure - we have various relations in the graph
-such as dataset containment and procedure input/output. Need to figure out
-how this relates to predecessor/successors in the provenance sense.
-
-<screen>
-A(
-Also there are multiple levels of tracking (see the section on that):
-
-If an app procedure produces eg
-volume v, consisting of two files v.img and v.hdr (the fmri example)
-then what is the dependency here? I guess v.img and v.hdr is the
-output... (so in the present model there will never be
-downward propagation as every produced dataset will be produced out of
-base files. however its worth noting that this perhaps not always the
-case...)
-
-Alternatively we can model at the level of the app procedure, which in
-the above case returns a volume v.
-
-I guess this is similar to the case of the compound procedures vs
-contained app procedures...
-
-If we model at the level of files, then we don't really need to know
-about higher datasets much?
-
-Perhaps for now should model at level of procedure calls
-)A
-
-List A()A above as an issue and pick one choice - for now, lowest=file
-production, so that all intermediate and output datasets will end up
-with a strictly upward dependency
-
-This rule does not deal with input-only datasets (that is, datasets
-which we do not know where they came from). It would be fairly natural
-with the above choice to again make dependencies from files upward.
-
-So for now, dataset dependency rule is:
-
-  * parent datasets depend on their children.
-
-Perhaps?
-</screen>
-</para>
-
-
-</section>
-
-<section><title>Transitivity of relations in query language</title>
-<para>
-One of my biggest concerns in query languages such as SQL and XPath
-is lack of decent transitive query ability.
-</para>
-<para>
-I think we need a main relation, the <firstterm>preceeds</firstterm>
-relation. None of the relations defined in the source provenance data
-provides this relation.
-</para>
-<para>The relation needs to be such that if any dataset or program X
-contributed to the production of any other dataset or program Y, then
-X preceeds Y.
-</para>
-<para>
-We can construct pieces of this relation from the existing relations:
-<itemizedlist>
-<listitem><para>There are fairly simple rules for procedure inputs and
-outputs:
-A dataset passed as an input to a procedure preceeds that procedure.
-Similarly, a procedure that outputs a dataset preceeds that dataset.
-</para></listitem>
-<listitem>
-<para>
-Hierarchical datasets are straightforward to describe in the present
-implementation. Composite data structures are always described in terms
-of their members, so the members of a data structure always preceed
-the structures that contain them. [not true, i think - we can pass a
-struct into a procedure and have that procedure populate multiple
-contained files... bleugh]
-</para>
-</listitem>
-<listitem><para>
-The relation is transitive, so the presence of some relations by the
-above rules will imply the presence of other relations to ensure
-transitivity.
-</para></listitem>
-</itemizedlist>
-</para>
-</section>
-</section>
-
-
-<section><title>Unique identification of provenance objects</title>
-
-<para>
-A few issues - what are the objects that should be identified? (semantics);
-and how should the objects be identified? (syntax).
-</para>
-<section><title>provenence object identifier syntax</title>
-<para>
-For syntax, I favour a URI-based approach and this is what I have
-implemented in the prototypes. URIs provide a ready-made system for
-identifying different kinds of objects in different ways within the
-same syntax, which should be useful for the queries that want to do that.
-file and gsiftp URIs work for filenames. probably should be normalising file
-URIs to refer to a specific hostname? otherwise they're fairly
-meaningless outside of one host...
-also, these name files, but files are mutable.
-</para>
-<para>
-it's also fairly straightforward to subsume other identifier schemes into
-URIs (for example, that is already done for UUIDs, in RFC4122).
-</para>
-<para>
-for other IDs, such as workflow IDs, a tag or uuid URI would be nice.
-</para>
-<para>
-cite: <ulink url="http://www.rfc-editor.org/rfc/rfc4151.txt">RFC4151</ulink>
-<blockquote>
-The tag algorithm lets people mint -- create -- identifiers that no one else using the same algorithm could ever mint. It is simple enough to do in your head, and the resulting identifiers can be easy to read, write, and remember. The identifiers conform to the URI (URL) Syntax.
-</blockquote>
-</para>
-<para>
-cite:  <ulink url="http://www.rfc-editor.org/rfc/rfc4122.txt">RFC4122</ulink>
-<blockquote>
-This specification defines a Uniform Resource Name namespace for
-UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
-Unique IDentifier).  A UUID is 128 bits long, and requires no central
-registration process.
-</blockquote>
-</para>
-
-<section><title>tag URIs</title>
-
-<para>
-tag URIs for identifiers of provenance objects:
-</para>
-
-<para>
-all URIs allocated according to this section are labelled beginning with one
-of:
-<screen>
-tag:benc@ci.uchicago.edu,2007:swift:
-tag:benc@ci.uchicago.edu,2008:
-</screen>
-</para>
-
-<para>
-for datasets identified only within a run (that is, for example, anything
-that doesn't have a filename):
-tag:benc@ci.uchicago.edu,2007:swift:dataset:TIMESTAMP:SEQ
-with TIMESTAMP being a timestamp of sometime near the start of the run,
-intending to be a unique workflow id (probably better to use the
-run-id)
-and SEQ being a sequence number. However, shouldn't really be pulling any
-information out of these time and seq fields.
-</para>
-<para>
-for executes - this is based on the karajan thread ID and the log base
-filename (which is assumed to be a globally unique identifying string):
-tag:benc@ci.uchicago.edu,2007:swiftlogs:execute:WFID:THREAD with,
-as for datasets, WFID is a workflow-id-like entity.
-</para>
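-<para>
-Minting these identifiers is then just string assembly. A hypothetical
-Python sketch of the two layouts above (written with the @ form of the
-email authority; the WFID/TIMESTAMP/SEQ handling is illustrative):
-</para>

```python
# Hypothetical minting helpers for the tag URI layouts described above;
# the prefix strings are taken from the notes, the rest is illustrative.
def dataset_uri(timestamp, seq):
    """Run-scoped dataset identifier: ...:swift:dataset:TIMESTAMP:SEQ."""
    return "tag:benc@ci.uchicago.edu,2007:swift:dataset:%s:%d" % (timestamp, seq)

def execute_uri(wfid, thread):
    """Execute identifier from a workflow id and karajan thread id."""
    return "tag:benc@ci.uchicago.edu,2007:swiftlogs:execute:%s:%s" % (wfid, thread)
```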
-
-</section>
-
-
-</section>
-
-<section id="crossrun-id"><title>Dataset identifier semantics</title>
-
-<para>At present, dataset identifiers are formed uniquely for every
-dataset object created in the swift runtime (unique across JVMs as well
-as within a JVM).</para>
-
-<para>This provides an overly sensitive(?) identity - datasets
-which are the same will be given different dataset identifiers at
-different times/places; although two different datasets will never be
-given the same identifier.
-</para>
-
-<para>A different approach would be to say 'datasets are made of files,
-so we want to identify files, and files already have identifiers called
-filenames'.</para>
-
-<para>I think this approach is also insufficient.</para>
-
-<para>The assertion 'datasets are made of files' is not correct. Datasets
-come in several forms: typed files, typed simple values, and typed
-collections of other datasets. Each of these needs a way to identify it.
-</para>
-
-<para>
-Simple values are probably the easiest to identify. They can be identified
-by their own value and embedded within a suitable URI scheme. For example,
-a dataset representing the integer 7 could be identified as:
-<screen>
-tag:benc@ci.uchicago.edu,2008:swift:types:int:7
-</screen>
-This would have the property that all datasets representing the integer
-7 would be identical (that is, have the same identifier).
-</para>
-
-<para>
-Collections of datasets are more complicated. One interesting example of
-something that feels to me quite similar is the treatment of directories
-in hash-based file systems, such as git. In this model, a collection of
-datasets would be represented by a hash of a canonical representation of
-its contents; for example, a dataset consisting of a three-element array
-of three files in this order: "x-foo:red", "x-foo:green" and "x-foo:blue"
-might be represented as:
-<screen>
-tag:benc@ci.uchicago.edu,2008:collection:QHASH
-</screen>
-where:
-<screen>
-QHASH := sha1sum(QLONG)
-
-QLONG := "[0] x-foo:red [1] x-foo:green [2] x-foo:blue"
-</screen>
-This allows a repeatable computation of dataset identifiers given 
-only knowledge of the contents of the dataset. Specifically it does
-not rely on a shared database to map content to identifier. However,
-it can only be computed when the content of the dataset is fully known
-(roughly equivalent to when the dataset is closed in the Swift
-runtime).
-</para>
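-<para>
-A Python sketch of the QHASH scheme just described; the member listing and
-hashing follow the QLONG/QHASH definitions above, though the exact
-canonical form is of course open to choice:
-</para>

```python
import hashlib

def collection_identifier(members):
    """Content-derived collection identifier as sketched above:
    QLONG is the canonical "[i] member" listing, QHASH its sha1."""
    qlong = " ".join("[%d] %s" % (i, m) for i, m in enumerate(members))
    qhash = hashlib.sha1(qlong.encode()).hexdigest()
    return "tag:benc@ci.uchicago.edu,2008:collection:" + qhash
```

-<para>
-Two parties holding the same (ordered) contents compute the same identifier
-without any shared database, which is the property claimed above.
-</para>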
-
-<para>
-For identifying a dataset that is a file, there are various properties.
-Filename is one property. File content is another property. It seems
-desirable to distinguish between datasets that have the same name yet
-have different content, whilst identifying datasets that have the same
-content. To this end, an identifier might be constructed from both the
-filename and a hash of the content.
-</para>
-
-
-<para>
-for prototype could deal only with files staged to local system,
-so that we can easily compute a hash over the content.
-</para>
-
-<para>related to taking md5sums, kickstart provides the first few bytes
-of certain files (the executable and specified input and output files);
-whilst useful for basic sanity checks, there are very strong correlations
-with magic numbers and common headers that make this a poor content
-identifying function. perhaps it should be absorbed as dataset metadata if
-it's available?
-</para>
-
-
-<para>TODO the following para needs to be rephrased as justification for
-having identities for dataset collections  ::: at run-time when can we
-pick up the
-identities from other runs? pretty much we want identity to be expressed
-in some way so that we can get cross-run linkup.
-how do we label a dataset such that we can annotate it - eg in fmri
-example, how do we identify the input datasets (as file pairs) rather than
-the individual files?</para>
-<para>
-It's desirable to give the same dataset the same identifier in multiple
-runs; and be able to figure out that dataset identifier outside of a run,
-for example for the purposes of dealing with metadata that is annotating
-a dataset.
-</para>
-</section>
-
-<section><title>File content tracking</title>
-<para>
-identify file contents with md5sum (or other hash) - this is somewhat
-expensive, but without it we have (significantly) lessened belief in what the
-contents of a file are - we would otherwise, I think, be using only names
-and relying on the fact that those names are primary keys to file content
-(which is not true in general).
-so this should perhaps be optional. plus where to do it? various places...
-in wrapper.sh?
-</para>
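-<para>
-A minimal Python sketch of such content hashing; the identifier layout
-(filename plus an #md5= fragment) is invented for illustration:
-</para>

```python
import hashlib
import os
import tempfile

def file_dataset_id(path):
    """Identify a file dataset by both name and content hash, as suggested
    above; the 'file://...#md5=' layout is invented for illustration."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return "file://%s#md5=%s" % (os.path.abspath(path), digest)

# quick check against a throwaway file containing b"hello"
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)
ident = file_dataset_id(path)
os.remove(path)
```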
-<para>
-References here for using content-hashes:
-git, many of the DHTs (freenet, for example - amusing to cite the classic
-freenet gpl.txt example)
-</para>
-</section>
-
-</section>
-
-<section><title>Type representation</title>
-<para>
-how to represent types in this? for now use names, but that doesn't
-go cross-program because we can define a different type with the same
-name in every different program. hashtree of type definitions?
-</para>
-</section>
-<section><title>representation of workflows</title>
-<para>
-perhaps need a workflow object that acts something like a namespace
-but implicitly defined rather than being user-labelled (hence capturing
-the actual runtime space rather than what the user claims). that's the
-runID, I guess.
-</para>
-<para>
-Also tracking of workflow source file. Above-mentioned reference to
-tracking file contents applies to this file too.
-</para>
-</section>
-<section><title>metadata extraction</title>
-<para>
-provenance challenge I question 5 reports about pulling fields out of the
-headers of one of the input files. There's a program, scanheader, that
-extracts this info. Related but not actually useful, I think, for this
-question is that header fields could be mapped into SwiftScript if we
-allowed value+file simultaneous data structures.
-</para>
-</section>
-
-<section><title>source code recreation</title>
-<para>
-should the output of the queries be sufficient to regenerate the
-data? the most difficult thing here seems to be handling data
-sets - we have the mapping tree for a dataset, but what is the right
-way to specify that in swift syntax? maybe need mapper that takes a
-literal datastructure and maps the filenames from it. though that
-doesn't account for file contents (so this bit of this point is
-the file contents issue, which should perhaps be its own chapter
-in this file)
-</para>
-</section>
-<section><title>Input parameters</title>
-<para>
-Should also work on workflows which take an input parameter, so that we
-end up with the same output file generated several times with different
-output values - eg pass a string as a parameter and write that to
-'output.txt' - every time we run it, the file will be different, and we'll
-have multiple provenance reports indicating how it was made, with different
-parameters. that's a simple demonstration of the content-tracking which
-could be useful.
-</para>
-<para>
-If we're tracking datasets for simple values, I think we get this
-automatically. The input parameters are input datasets in the same way
-that input files are input datasets; and so fit into the model in the
-same way.
-</para>
-
-</section>
-
-
-</section>
-
-<section id="opm"><title>Open Provenance Model (OPM)</title>
-<section><title>OPM-defined terms and their relation to Swift</title>
-<para>
-OPM defines a number of terms. This section describes how those terms
-relate to Swift.
-</para>
-<para>
-artifact: This OPM term maps well onto the internal Swift representation
-of <literal>DSHandle</literal>s. Each DSHandle in a Swift run is an
-OPM artifact, and each OPM artifact in a graph is a DSHandle.
-</para>
-<para>collection: OPM collections are a specific kind of artifact, containing
-other artifacts. This corresponds with DSHandles for composite data types
-(structs and arrays). OPM has collection accessors and collection
-constructors which correspond to the <literal>[]</literal> and
-<literal>.</literal> operators (for accessors) and various assignment
-forms for constructors.
-</para>
-<para>
-process: An OPM process corresponds to a number of Swift concepts (although
-they are slowly converging in Swift to a single concept). Those concepts
-are: procedure invocations, function calls, and operators.
-</para>
-<para>
-agent: There are several entities which can act as an agent. At the
-highest level, where only Swift is involved, a run of the
-<literal>swift</literal> command-line client is an agent which drives
-everything. Some other components of Swift may be regarded as agents,
-such as the client-side wrapper script. For present OPM work, the
-only agent will be the Swift command-line client invocation.
-</para>
-<para>
-account: For present OPM work, there will be one account per workflow run.
-In future, different levels of granularity that could be expressed through
-different accounts might include representing compound procedure calls as
-processes vs. representing atomic procedure calls explicitly.
-</para>
-<para>
-OPM graph: there are two kinds of OPM graph that appear interesting and
-straightforward to export: i) a graph of the entire provenance database
-(thus containing multiple workflow runs); and ii) a graph of a single run.
-</para>
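The mapping above can be sketched as a toy data model. This is a hypothetical illustration in Python; the class and field names are mine, not taken from Swift's source or any OPM library:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the OPM/Swift mapping described above.

@dataclass
class Artifact:
    # one OPM artifact per Swift DSHandle
    dshandle_id: str

@dataclass
class Collection(Artifact):
    # a DSHandle of composite type (struct or array)
    members: List[Artifact] = field(default_factory=list)

@dataclass
class Process:
    # a procedure invocation, function call, or operator application
    name: str
    inputs: List[Artifact] = field(default_factory=list)
    outputs: List[Artifact] = field(default_factory=list)

@dataclass
class Agent:
    # for present OPM work: one swift command-line client invocation
    run_id: str

# an array built from two member artifacts by the explicit
# array-construction process [memberlist]
i = Artifact("ds:i")
j = Artifact("ds:j")
a = Collection("ds:a", members=[i, j])
construct = Process("[memberlist]", inputs=[i, j], outputs=[a])
```

This is only meant to make the correspondences concrete (artifact/DSHandle, collection/composite DSHandle, process/invocation, agent/client run), not to prescribe a serialisation.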
-</section>
-<section><title>OPM links</title>
-<para><ulink url="http://twiki.ipaw.info/bin/view/Challenge/OPM">Open Provenance Model at ipaw.info</ulink></para>
-</section>
-
-<section><title>Swift specific OPM considerations</title>
-
-<para>
-non-strictness: Swift sometimes constructs collections lazily (leading to
-the notion in Swift of an array being closed, meaning that we know no
-more contents will be created, somewhat like knowing we've reached the end
-of a list). It may be that an array is never closed during a run, but that
-we still have sufficient provenance information to answer useful queries.
-For example, if we specify a list [1:100000] and only refer to the 5th
-element of that array, we probably never generate most of the DSHandles,
-so an explicit representation of that array in terms of datasets cannot be
-expressed - though a higher-level representation of it in terms of its
-constructor parameters can be made.
-</para>
-
-<para>
-aliasing: (this is related to some similar ambiguity in other parts of
-Swift, to do with dataset roots - not provenance related). It is possible to
-construct arrays by explicitly listing their members:
-<programlisting>
-int i = 8;
-int j = 100;
-int a[] = [i,j];
-int k = a[1];
-// here, k = 100 (arrays are zero-indexed, so a[1] refers to j)
-</programlisting>
-The dataset contained in <literal>i</literal> is an artifact (a literal, so
-some input artifact that has no creating process). The array
-<literal>a</literal> is an artifact created by the explicit array construction
-syntax <literal>[memberlist]</literal> (which is an OPM process). If we
-then model the array accessor syntax <literal>a[1]</literal> as an OPM
-process, what artifact does it return? The same one or a different one?
-In OPM, we want it to return a different artifact; but in Swift we want this
-to be the same dataset... (perhaps explaining this with <literal>int</literal>
-type variables is not the best way - using file-mapped data might be better)
-TODO: what are the reasons we want files to have a single dataset
-representation in Swift? dependency ordering - definitely. cache management?
-Does this lead to a stronger notion of aliasing in Swift?
-</para>
-
-<para>
-Provenance of array indices: It seems fairly natural to represent arrays as OPM
-collections, with array element extraction being a process. However, in OPM,
-the index of an array is indicated with a role (with suggestions that it might
-be a simple number or an XPath expression). In Swift arrays, the index is
-a number, but it has its own provenance, so by recording only an integer there,
-we lose provenance information about where that integer came from - that
-integer is a Swift dataset in its own right, which has its own provenance.
-It would be nice to be able to represent that (even if it's not standardised
-in OPM). I think that needs reification of roles so that they can be
-described; or it needs treatment of [] as being like any other binary
-operator (which is what happens inside Swift) - where the LHS and RHS are
-artifacts, and the role is not used for identifying the member (which would
-also be an argument for making array element extraction be treated more
-like a plain binary operator inside the Swift compiler and runtime).
-</para>
-
-<para>
-provenance of references vs provenance of the data in them: the array and
-structure access operators can be used to acquire <literal>DSHandle</literal>s
-which have no value yet, and which are then subsequently assigned. In this
-usage, the provenance of the containing structure should perhaps be that it
-is constructed from the assignments made to its members, rather than the
-other way round. There is some subtlety here that I have not fully figured
-out.
-</para>
-
-<para>
-Piecewise construction of collections: arrays and structs can be
-constructed piecewise using <literal>. =</literal> and <literal>[] =</literal>.
-How is this to be represented in OPM? Perhaps the closing operation maps
-to the OPM process that creates the array, so that it ends up looking
-like an explicit array construction happening at the time of the close?
-</para>
-
-<para>
-Provenance of mapper parameters: mapper parameters are artifacts. We can
-perhaps represent references to those in a Swift-specific part of an
-artifact's value. Probably not something OPM-generalisable.
-</para>
-
-</section>
-
-</section>
-
-
-
-<section><title>Processing i2u2 cosmic metadata</title>
-<para>i2u2 cosmic metadata is extracted from a VDS1 VDC.</para>
-<para>
-TODO: some notes here about how I dislike the inner-platform effect in the
-metadata part of the VDS1 VDC.
-</para>
-<para>
-to launch postgres on soju.hawaga.org.uk:
-<screen>
-sudo -u postgres /opt/local/lib/postgresql82/bin/postgres -D  /opt/local/var/db/postgresql82/defaultdb
-</screen>
-
-and then to import i2u2 vdc data as VDC1 vdc:
-
-<screen>
-$ /opt/local/lib/postgresql82/bin/createdb -U postgres i2u2vdc1
-CREATE DATABASE
-$ psql82 -U postgres -d i2u2vdc1 < work/i2u2.vdc
-ERROR:  role "portal2006_1022" does not exist
-</screen>
-This gives many errors like the one above, because those roles indeed do
-not exist - but I think that doesn't matter for these purposes: everything
-will end up being owned by the postgres user, which suffices for what I
-want to do.
-
-</para>
-
-<para>
-annotation tables are:
-<screen>
- public | anno_bool       | table | postgres   (29214 rows)
-    boolean values
-
- public | anno_call       | table | postgres   (0 rows)
-    a subject table; also has a did column
-
- public | anno_date       | table | postgres   (52644 rows)
-    date values
-
- public | anno_definition | table | postgres   (1849 rows)
-    XML-embedded derivations (values / objects)
-
- public | anno_dv         | table | postgres   (0 rows)
-    a subject table; also has a did column
-
- public | anno_float      | table | postgres   (27966 rows)
-    float values
-
- public | anno_int        | table | postgres   (58879 rows)
-    int values
-
- public | anno_lfn        | table | postgres   (411490 rows)
-    the subject record for LFN subjects - subjects have an
-    mkey (predicate) column
-
- public | anno_lfn_b      | table | postgres
-    appears to be keyed by the did field - ties dids to what look like
-    LFNs
-
- public | anno_lfn_i      | table | postgres
- public | anno_lfn_o      | table | postgres
-    likewise these two
-
- public | anno_targ       | table | postgres
-    is this a subject table? it has an mkey value that always appears to
-    be 'description', and a name column which lists invocation parameter
-    names and ties them to dids
-
- public | anno_text       | table | postgres   (242824 rows)
-    text values (objects)
-
- public | anno_tr         | table | postgres
-</para>
-
-<para>
-most of the interesting data starts in anno_lfn because data is mostly
-annotating LFNs:
-</para>
-
-<screen>
-i2u2vdc1=# select * from anno_lfn limit 1;
- id |        name         |   mkey   
-----+---------------------+----------
-  2 | 180.2004.0819.0.raw | origname
-</screen>
-
-<para>
-There are 63 different mkeys (predicates in RDF-speak):
-</para>
-
-<screen>
-i2u2vdc1=# select distinct mkey from anno_lfn;
-             mkey
-------------------------------
- alpha
- alpha_error
- author
- avgaltitude
- avglatitude
- avglongitude
- background_constant
- background_constant_error
- bins
- blessed
- caption
- chan1
- chan2
- chan3
- chan4
- channel
- city
- coincidence
- comments
- cpldfrequency
- creationdate
- date
- description
- detectorcoincidence
- detectorid
- dvname
- enddate
- energycheck
- eventcoincidence
- eventnum
- expire
- filename
- gate
- gatewidth
- group
- height
- julianstartdate
- lifetime(microseconds)
- lifetime_error(microseconds)
- name
- nondatalines
- numBins
- origname
- plotURL
- project
- provenance
- radius
- rawanalyze
- rawdate
- school
- source
- stacked
- startdate
- state
- study
- teacher
- thumbnail
- time
- title
- totalevents
- transformation
- type
- year
-(63 rows)
-</screen>
-
-<para>
-So: work on a metadata importer for i2u2 cosmic that will initially deal
-only with the LFN records.
-</para>
-
-<para>
-There are 19040 annotated LFNs, with 411490 annotations in total, so about
-21 annotations per LFN.
-</para>
-
-<para>The typing of the i2u2 data doesn't support metadata objects
-that aren't swift workflow entities - for example high schools as
-objects in their own right - the same text string is stored as a value
-over and over in many anno_text rows. A more generalised 
-Subject-Predicate-Object model in RDF would have perhaps a URI for
-the high school, with metadata on files tying files to a high school and
-metadata on the high school object. In SQL, the same could be modelled
-in a relational schema.
-</para>
-
-<para>
-Conversion of the i2u2 VDS1 VDC LFN/text annotations into an XML document
-using a quick hack script took 32 minutes on soju, my laptop. The resulting
-XML is 8MB and needed some manual massaging to remove malformed embedded
-XML and the like.
-<screen>
-./i2u2-to-xml.sh >lfn-text-anno.xml
-</screen>
-
-So we end up with a lot of records that look like this:
-
-<screen>
-<lfn name="43.2007.0619.0.raw">
-<origname>rgnew.txt</origname>
-<group>riogrande</group>
-<teacher>Luis Torres Rosa</teacher>
-<school>Escuelo Superior Pedro Falu</school>
-<city>Rio Grande</city>
-<state>PR</state>
-<year>AY2007</year>
-<project>cosmic</project>
-<comments></comments>
-<detectorid>43</detectorid>
-<type>raw</type>
-<avglatitude>18.22.8264</avglatitude>
-<avglongitude>-65.50.1975</avglongitude>
-<avgaltitude>-30</avgaltitude>
-</lfn>
-</screen>
-
-The translation here is not cosmic-aware - the XML tag is the mkey name from
-the VDC and the content is the value. So we get all the different (informal)
-metadata schemas that appear to have been used, translated.
-
-</para>
-
-<para>
-Output the entire provenance database:
-<screen>
-$ time cat lfn-text-anno.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/cosmic'  | wc -c
- 10178037
-
-real    0m2.624s
-user    0m2.612s
-sys     0m0.348s
-</screen>
-</para>
-
-<para>
-Select all LFN objects (which on this dataset means everything one layer
-down):
-<screen>
-$ time cat lfn-text-anno.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/cosmic/lfn'  | wc -c
- 9618818
-
-real    0m2.692s
-user    0m2.703s
-sys     0m0.337s
-</screen>
-</para>
-
-<para>
-Try to select an LFN that doesn't exist, by specifying a filename that is not
-there:
-<screen>
-$ time cat lfn-text-anno.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/cosmic/lfn[@name="NOSUCHNAME"]'
-<?xml version="1.0"?>
-<toplevel/>
-
-real    0m0.867s
-user    0m0.740s
-sys     0m0.143s
-</screen>
-</para>
-
-<para>
-Similar query for a filename that does exist:
-<screen>
-$ time cat lfn-text-anno.xml | ~/work/xpathtool-20071102/xpathtool/xpathtool.sh --oxml '/cosmic/lfn[@name="1.2005.0801.0"]'
-<?xml version="1.0"?>
-<toplevel>
-  <lfn name="1.2005.0801.0">
-    <origname>C:\Documents and Settings\zsaleh\My Documents\Tera stuff\Qnet\Qnet Data\All_data_Aug_01_2005_TERA_9_Vth_1000.TXT</origname>
-    <group>TERA</group>
-    <teacher>Marcus Hohlmann</teacher>
-    <school>Florida Institute of Technology</school>
-    <city>Melbourne</city>
-    <state>FL</state>
-    <year>AY2004</year>
-    <project>cosmic</project>
-    <comments/>
-    <source>1.2005.0801.0</source>
-    <detectorid>1</detectorid>
-    <type>split</type>
-  </lfn>
-</toplevel>
-
-real    0m0.875s
-user    0m0.745s
-sys     0m0.154s
-</screen>
-</para>
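The same kind of lookup can be done without xpathtool: Python's stdlib xml.etree.ElementTree supports the attribute-predicate subset of XPath used above. A hypothetical sketch against a two-record toy stand-in for lfn-text-anno.xml:

```python
import xml.etree.ElementTree as ET

# Toy stand-in for lfn-text-anno.xml (the real file is ~8MB).
doc = ET.fromstring("""
<cosmic>
  <lfn name="1.2005.0801.0"><school>Florida Institute of Technology</school></lfn>
  <lfn name="43.2007.0619.0.raw"><school>Escuelo Superior Pedro Falu</school></lfn>
</cosmic>
""")

# select one LFN by name, as in the xpathtool query above
hits = doc.findall('.//lfn[@name="1.2005.0801.0"]')

# a name that does not exist yields an empty list, matching the
# empty <toplevel/> result above
misses = doc.findall('.//lfn[@name="NOSUCHNAME"]')
```

For an 8MB document this in-memory approach is comfortable; much larger exports would want iterparse or a proper database rather than whole-tree XPath.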
-
-</section>
-
-<section><title>processing fMRI metadata</title>
-<para>
-For fMRI, we can extract embedded image metadata using the scanheader
-utility.
-</para>
-<para>
-Associate that with the 'volume' dataset, not with the actual image data
-files. For now that means the datasets need to have been labelled with
-their IDs already, which at the moment happens after execution has
-completed. That is fine for now given the retrospective-provenance
-restriction of this immediate work. See the
-<link linkend="crossrun-id">'cross-run dataset ID' section</link>, to which this
-also applies - we are generating dataset IDs outside of a particular run.
-</para>
-</section>
-
-<section><title>random unsorted notes</title>
-<para>
-To put provdb in postgres instead of sqlite3:
-
-start as per the i2u2 instructions, then run <command>/opt/local/lib/postgresql82/bin/createdb -U postgres provdb</command>
-
-followed by:
-<command>
-psql82 -U postgres -d provdb < prov-init.sql
-</command> to initialise the db.
-</para>
-<para>
-on terminable, made a new database that is not the default system db install,
-by using the existing postgres binaries but running under my user id:
-<screen>
-$ mkdir pgplay
-$ chmod 0700 pgplay/
-$ initdb -D ~/pgplay/
-$ postmaster -D ~/pgplay/ -p 5435
-$ createdb -p 5435 provdb
-CREATE DATABASE
-</screen>
-now can access like this:
-<screen>
-$ psql -p 5435 -d provdb
-provdb=# \dt
-No relations found.
-</screen>
-</para>
-<para>osg/gratia - how does this data tie in?
-</para>
-<para>cedps logging - potential for info there, but there doesn't seem
-to be anything particularly substantial at the moment
-</para>
-</section>
-
-<section><title>Provenance Challenge 1 examples</title>
-<section><title>Basic SQL</title>
-<section><title>provch q1</title>
-
-<screen>
-get the dataset id for the relevant final dataset:
-sqlite> select * from dataset_filenames where filename like '%0001.jpeg';
-14976260|file://localhost/0001.jpeg
-
-get containment info for that file:
-sqlite> select * from dataset_containment where inner_dataset_id = 14976260;
-7316236|14976260
-sqlite> select * from dataset_containment where inner_dataset_id = 7316236;
-[no answer]
-
-now need to find what contributed to those...
-
-> select * from dataset_usage where dataset_id=14976260;
-0-4-3|O|14976260
-
-> select * from dataset_usage where execute_id='0-4-3' and direction='I';
-0-4-3|I|4845856
-sqlite> select * from dataset_usage where dataset_id=4845856 and direction='O';
-0-3-3|O|4845856
-
-sqlite> select * from dataset_usage where execute_id='0-3-3' and direction='I';
-0-3-3|I|3354850
-0-3-3|I|6033476
-
-sqlite> select * from dataset_usage where (dataset_id=3354850 or dataset_id=6033476) and direction='O';
-0-2|O|3354850
-
-sqlite> select * from dataset_usage where execute_id='0-2' and direction='I';
-0-2|I|4436324
-
-sqlite> select * from dataset_usage where dataset_id=4436324 and direction='O';
-[no answer]
-
-so here we have run out of places to keep going. However, I think this 4436324
-is not an input - it's related to another dataset. So we need another rule for
-inference here...
-
-
-
-</screen>
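The manual walk above can be automated. The following is a hypothetical sketch over a toy copy of the dataset_usage table, not code from Swift's log-processing tools: it chases each dataset back to the process that produced it, then to that process's inputs, until nothing new is found.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
# toy slice of the dataset_usage table from the session above
conn.executescript("""
CREATE TABLE dataset_usage (execute_id, direction, dataset_id);
INSERT INTO dataset_usage VALUES
  ('0-4-3','O',14976260), ('0-4-3','I',4845856),
  ('0-3-3','O',4845856),  ('0-3-3','I',3354850), ('0-3-3','I',6033476),
  ('0-2','O',3354850),    ('0-2','I',4436324);
""")

def provenance(dataset_id):
    """Return every execute and dataset upstream of dataset_id."""
    seen = set()
    queue = deque([dataset_id])
    while queue:
        d = queue.popleft()
        # which process produced this dataset?
        for (ex,) in conn.execute(
                "SELECT execute_id FROM dataset_usage"
                " WHERE dataset_id=? AND direction='O'", (d,)):
            seen.add(ex)
            # which datasets did that process consume?
            for (inp,) in conn.execute(
                    "SELECT dataset_id FROM dataset_usage"
                    " WHERE execute_id=? AND direction='I'", (ex,)):
                if inp not in seen:
                    seen.add(inp)
                    queue.append(inp)
    return seen
```

On this toy table, provenance(14976260) reaches 0-2 and its input 4436324, where the walk bottoms out just as in the manual session; the extra inference rule discussed above would be needed to continue past datasets that have no producing execute.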
-</section>
-
-<section><title>prov ch q4</title>
-<para>prov ch q4 incremental solutions:</para>
-
-<para>First cut:
-this will select align_warp procedures and their start times. It does not
-select based on parameters, and does not select based on day of week
-(the former we don't have the information for; the latter we maybe cannot
-do in sqlite3 - or it may need SQL date operations and SQL dates
-rather than Unix timestamps).
-<screen>
-sqlite> select id, starttime from invocation_procedure_names, executes where executes.id = invocation_procedure_names.execute_id and procedure_name='align_warp';
-</screen>
-Next, this will display the day of week for an invocation:
-<screen>
-select id, strftime('%w',starttime, 'unixepoch') from executes,invocation_procedure_names where procedure_name='align_warp' and executes.id=invocation_procedure_names.execute_id;
-0-0-3|5
-0-0-4|5
-0-0-1|5
-0-0-2|5
-</screen>
-</para>
-<para>And this will match day of week (sample data is on day 5, which is a
-Friday, not the day requested in the question):
-<screen>
-sqlite> select id from executes,invocation_procedure_names where procedure_name='align_warp' and executes.id=invocation_procedure_names.execute_id and strftime('%w',starttime, 'unixepoch') = '5';
-0-0-3
-0-0-4
-0-0-1
-0-0-2
-</screen>
-</para>
-<para>
-Now we bring in input data binding: we query which datasets were passed in
-as the model parameter for each of the above found invocations:
-<screen>
-sqlite> select executes.id, dataset_usage.dataset_id from executes,invocation_procedure_names, dataset_usage where procedure_name='align_warp' and executes.id=invocation_procedure_names.execute_id and strftime('%w',starttime, 'unixepoch') = '5' and dataset_usage.execute_id = executes.id and direction='I' and param_name='model';
-0-0-3|11032210
-0-0-4|13014156
-0-0-1|14537849
-0-0-2|16166946
-</screen>
-though at the moment this doesn't give us the value of the parameter.
-</para>
-
-<para>so now pull in the parameter value:
-
-<screen>
-sqlite> select executes.id, dataset_usage.dataset_id, dataset_usage.value from executes,invocation_procedure_names, dataset_usage where procedure_name='align_warp' and executes.id=invocation_procedure_names.execute_id and strftime('%w',starttime, 'unixepoch') = '5' and dataset_usage.execute_id = executes.id and direction='I' and param_name='model';
-0-0-3|11032210|12
-0-0-4|13014156|12
-0-0-1|14537849|12
-0-0-2|16166946|12
-</screen>
-</para>
-
-<para>
-Now we can select on the parameter value and get our final answer:
-<screen>
-sqlite> select executes.id from executes,invocation_procedure_names, dataset_usage where procedure_name='align_warp' and executes.id=invocation_procedure_names.execute_id and strftime('%w',starttime, 'unixepoch') = '5' and dataset_usage.execute_id = executes.id and direction='I' and param_name='model' and dataset_usage.value=12;
-0-0-3
-0-0-4
-0-0-1
-0-0-2
-</screen>
-Note that in SQL in general,
-we *don't* get typing of the parameter value here, so we can't do anything
-more than string comparison. For example, we couldn't check for the
-parameter being greater than 12 or similar. In SQLite, it happens that
-typing is dynamic enough to allow the use of relational operators like
-> on fields no matter what their declared type, because the declared type
-is largely ignored. This would stop working if run on e.g. PostgreSQL or
-MySQL, I think.
-</para>
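The point about SQLite's typing can be demonstrated directly. A hypothetical sketch with an untyped toy table (column names borrowed from the schema above, values made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# no declared column types: SQLite stores whatever value arrives
conn.execute("CREATE TABLE dataset_usage (param_name, value)")
conn.execute("INSERT INTO dataset_usage VALUES ('model', 15)")
conn.execute("INSERT INTO dataset_usage VALUES ('model', 7)")

# relational comparison works because these values were stored as integers
rows = conn.execute(
    "SELECT value FROM dataset_usage WHERE value > 12").fetchall()
```

Here rows comes back as [(15,)]. A strictly typed engine such as PostgreSQL would typically need a numeric column declaration (or an explicit cast) before the same `value > 12` comparison would behave numerically.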
-</section>
-<section><title>prov ch metadata</title>
-<para>metadata: in the prov challenge, we annotate (some) files with
-their header info. in the provenance paper, we want annotations on more
-than just files.
-</para>
-<para>for prov ch metadata, define a scanheader table with the result of
-scanheader on each input dataset, but do it *after* we've done the
-run (because we're then aware of dataset IDs)</para>
-<para>There's a representation question here: the metadata is about a volume
-dataset, which is a pair of files, not about a header or image file separately.
-How do we represent this? We need to know the dataset ID for the volume. At
-the moment, we can know that only after a run, but this ties into the
-identification of datasets outside of an individual run - move this
-paragraph into that questions/discussions section.
-</para>
-<para>
-We should probably, for each storage method, also show the inner-platform
-style of doing metadata, with associated queries to allow comparison between
-the different styles, and speeds of metadata query for large metadata
-collections (e.g. dump the i2u2 cosmic metadata for the real cosmic VDC).
-</para>
-</section>
-</section>
-
-<section><title>SQL with transitive closures</title>
-<section>
-<title>prov ch question 1:</title>
-<screen>
-$ sqlite3 provdb
-SQLite version 3.3.17
-Enter ".help" for instructions
-sqlite> select * from dataset_filenames where filename like '%0001.jpeg';
-14976260|file://localhost/0001.jpeg
--- can query keeping relations
-sqlite> select * from trans where after=14976260;
-0-4-3|14976260
-4845856|14976260
-0-3-3|14976260
-3354850|14976260
-6033476|14976260
-4825541|14976260
-7061626|14976260
-0-2|14976260
-4436324|14976260
-11153746|14976260
-655223|14976260
-5169861|14976260
-6487148|14976260
-5772360|14976260
-4910675|14976260
-7202698|14976260
-12705705|14976260
-2088036|14976260
-13671126|14976260
-14285084|14976260
-12896050|14976260
-0-1-3|14976260
-0-1-4|14976260
-0-1-2|14976260
-0-1-1|14976260
-2673619|14976260
-9339756|14976260
-10682109|14976260
-8426950|14976260
-16032673|14976260
-2274050|14976260
-1461238|14976260
-13975694|14976260
-9282438|14976260
-12766963|14976260
-8344105|14976260
-9190543|14976260
-14055055|14976260
-2942918|14976260
-12735302|14976260
-7080341|14976260
-0-0-3|14976260
-0-0-4|14976260
-0-0-2|14976260
-0-0-1|14976260
-2307300|14976260
-11032210|14976260
-16166946|14976260
-14537849|14976260
-13014156|14976260
-6435309|14976260
-6646123|14976260
--- or can query without relations:
-sqlite> select before from trans where after=14976260;
-0-4-3
-4845856
-0-3-3
-3354850
-6033476
-4825541
-7061626
-0-2
-4436324
-11153746
-655223
-5169861
-6487148
-5772360
-4910675
-7202698
-12705705
-2088036
-13671126
-14285084
-12896050
-0-1-3
-0-1-4
-0-1-2
-0-1-1
-2673619
-9339756
-10682109
-8426950
-16032673
-2274050
-1461238
-13975694
-9282438
-12766963
-8344105
-9190543
-14055055
-2942918
-12735302
-7080341
-0-0-3
-0-0-4
-0-0-2
-0-0-1
-2307300
-11032210
-16166946
-14537849
-13014156
-6435309
-6646123
-
-
-</screen>
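The trans table queried above was presumably precomputed. In more recent SQLite versions (3.8.3 and later), the same transitive closure can instead be computed on the fly with a recursive common table expression; the following is a hypothetical sketch over a toy edge table, not the real provenance database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# a small slice of the provenance graph from the session above,
# stored as direct before/after edges rather than as a closure
conn.executescript("""
CREATE TABLE dep (before, after);
INSERT INTO dep VALUES
  ('0-2','3354850'), ('3354850','0-3-3'),
  ('0-3-3','4845856'), ('4845856','0-4-3'),
  ('0-4-3','14976260');
""")

# compute everything upstream of one dataset with a recursive CTE
ancestors = {row[0] for row in conn.execute("""
    WITH RECURSIVE trans(before) AS (
        SELECT before FROM dep WHERE after = '14976260'
        UNION
        SELECT dep.before FROM dep JOIN trans ON dep.after = trans.before
    )
    SELECT before FROM trans
""")}
```

On this slice, ancestors contains the chain 0-4-3, 4845856, 0-3-3, 3354850, 0-2 - the same kind of answer the precomputed trans table gives, without having to materialise the closure. Note the SQLite 3.3.17 shown in the transcript predates recursive CTE support.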
-</section>
-
-
-</section>
-
-</section>
-
-<section><title>Representation of dataset containment and procedure execution in r2681 and how it could change.</title>
-
-<para>
-Representation of processes that transform one dataset into another dataset
-at present only occurs for <literal>app</literal> procedures, in logging of
-<literal>vdl:execute</literal> invocations, in lines like this:
-<screen>
-2009-03-12 12:20:29,772+0100 INFO  vdl:parameterlog PARAM thread=0-10-1 direction=input variable=s provenanceid=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090312-1220-md2mfc24:720000000033
-</screen>
-and dataset containment is represented at closing of the containing DSHandle by this:
-<screen>
-2009-03-12 12:20:30,205+0100 INFO  AbstractDataNode CONTAINMENT parent=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090312-1220-md2mfc24:720000000020 child=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090312-1220-md2mfc24:720000000086
-2009-03-12 12:20:30,205+0100 INFO  AbstractDataNode ROOTPATH dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090312-1220-md2mfc24:720000000086 path=[2]
-</screen>
-</para>
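A hypothetical sketch of pulling these facts out of a log file: the regular expressions below match only the two line shapes quoted above and are mine, not part of Swift's log-processing tools.

```python
import re

# patterns for the PARAM and CONTAINMENT log lines quoted above
PARAM_RE = re.compile(
    r"vdl:parameterlog PARAM thread=(\S+) direction=(\S+)"
    r" variable=(\S+) provenanceid=(\S+)")
CONTAIN_RE = re.compile(
    r"AbstractDataNode CONTAINMENT parent=(\S+) child=(\S+)")

def parse_line(line):
    """Classify one log line as a parameter or containment record."""
    m = PARAM_RE.search(line)
    if m:
        return ("param",) + m.groups()
    m = CONTAIN_RE.search(line)
    if m:
        return ("containment",) + m.groups()
    return None

rec = parse_line(
    "2009-03-12 12:20:29,772+0100 INFO  vdl:parameterlog PARAM"
    " thread=0-10-1 direction=input variable=s"
    " provenanceid=tag:benc@ci.uchicago.edu,2008:swift:dataset:"
    "20090312-1220-md2mfc24:720000000033")
```

Tuples of this shape are exactly what the dataset_usage and dataset_containment tables used elsewhere in these notes would be loaded from.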
-
-<para>
-This representation does not represent the relationship between datasets when
-they are related by @functions or operators. Nor does it represent causal
-relationships between collections and their members - instead it represents
-containment.
-</para>
-
-<para>
-Adding representation of operators (including array construction) and of
-@function invocations would give substantially more information about
-the provenance of many more datasets.
-</para>
-
-</section>
-
-</article>
-

Deleted: branches/release-0.92/docs/quickstartguide.xml
===================================================================
--- branches/release-0.92/docs/quickstartguide.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/quickstartguide.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,272 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-	<articleinfo revision="0.2">
-		<title>Swift Quick Start Guide</title>
-		<subtitle>Source Control $LastChangedRevision$</subtitle>
-	
-		<abstract>
-			<formalpara>
-				<title></title>
-				<para>
-				
-					The impatient may find the <ulink
-					url="reallyquickstartguide.php">Swift Really Quick Start
-					Guide</ulink> to be more convenient.
-
-					This guide describes the steps needed to download, install,
-					configure, and run the basic examples for Swift. If you are
-					using a pre-installed version of Swift, you can skip 
-					directly to the <link linkend="configure">configuration
-					section</link>.
-
-				</para>
-			</formalpara>
-		</abstract>
-	</articleinfo>
-
-	<sect1 id="download">
-		<title>Downloading a Swift Distribution</title>
-		
-		<para>
-		
-			There are three main ways of getting the Swift implementation: <link
-			linkend="dl-stable">stable releases</link>, <link
-			linkend="dl-nightly">nightly builds</link>, and the <link
-			linkend="dl-repository">source code repository</link>. 
-
-		</para>
-		
-		<sect2 id="dl-stable">
-			<title>Stable Releases</title>
-		
-			<para>
-			
-				Stable releases can be obtained from the Swift download page:
-				<ulink
-				url="http://www.ci.uchicago.edu/swift/downloads/index.php#stable">Swift
-				Downloads Page</ulink>. Once you have downloaded the package, please
-				proceed to the <link linkend="install">install section</link>.
-
-			</para>
-		</sect2>
-		
-		<sect2 id="dl-nightly">
-			<title>Nightly Builds</title>
-		
-			<para>
-			
-				Swift builds and tests run every day. The <ulink
-				url="http://www.ci.uchicago.edu/swift/downloads/index.php#nightly">Swift
-				downloads page</ulink> contains links to the latest build and
-				test page. The nightly builds reflect a development version of
-				the Swift code and should not be used in production. After
-				downloading a nightly build package, please continue to the
-				<link linkend="install">install section</link>.
-			
-			</para>
-		</sect2>
-		
-		<sect2 id="dl-repository">
-			<title>Source Repository</title>
-		
-			<para>
-			
-				Details about accessing the Swift source repository together with
-				build instructions are available on the <ulink
-				url="http://www.ci.uchicago.edu/swift/downloads/index.php#nightly">Swift
-				downloads page</ulink>. Once built, the <filename
-				class="directory">dist/swift-svn</filename> directory
-				will contain a self-contained build which can be used in place or moved to a different location.
-				You should then proceed to the <link
-				linkend="configure">configuration section</link>.
-			
-			</para>
-		</sect2>
-	</sect1>
-	
-	<sect1 id="install">
-		<title>Installing a Swift Binary Package</title>
-		
-		<para>
-		
-			Simply unpack the downloaded package (<filename
-			class="file">swift-<version>.tar.gz</filename>) into a
-			directory of your choice:
-			
-<screen>
-<prompt>></prompt> <command>tar</command> <option>-xzvf</option> <filename
-class="file">swift-<version>.tar.gz</filename>
-</screen>
-			
-			This will create a <filename
-			class="directory">swift-<version></filename> directory
-			containing the build.
-		
-		</para>
-	</sect1>
-	
-	<sect1 id="configure">
-		<title>Configuring Swift</title>
-		
-		<para>
-		
-			This section describes configuration steps that need to be taken in
-			order to get Swift running. Since all command line tools provided
-			with Swift can be found in the <filename
-			class="directory">bin/</filename> directory of the Swift distribution, it may
-			be a good idea to add this directory to your <envar>PATH</envar>
-			environment variable:
-			
-<screen>
-<prompt>></prompt> <command>export</command> <envar>PATH</envar>=<filename
-class="directory">/path/to/swift/bin</filename>:<envar>$PATH</envar>
-</screen>
-			
-		</para>
-		<sect2 id="security"><title>Grid Security</title>
-			<para>For local execution of jobs, no grid security configuration
-				is necessary.
-			</para>
-			<para>However, when submitting jobs to a remote machine using Globus
-				Toolkit services, Swift makes use of the
-				<ulink
-				url="http://www.globus.org/toolkit/docs/4.0/security/key-index.html">
-				Grid Security Infrastructure (GSI)</ulink> for authentication
-				and authorization. The requirements for this are detailed in
-				the following sections. Note that GSI is not required to be
-				configured for local execution (which will usually be the
-				case when first starting with Swift).
-			</para>
-
-		<sect3 id="certs">
-	
-			<title>User Certificate</title>
-			<para>
-			
-				GSI requires a certificate/private key
-				pair for authentication to 
-				<ulink url="http://www.globus.org/toolkit">Globus Toolkit</ulink>
-				services. The certificate and private key should
-				be placed into the <filename
-				class="file">~/.globus/usercert.pem</filename> and <filename
-				class="file">~/.globus/userkey.pem</filename> files,
-				respectively.
-			
-			</para>
-		
-		</sect3>
-		
-		<sect3 id="cas">
-		
-			<title>Certificate Authorities Root Certificates</title>
-			
-			<para>
-			
-				The Swift client libraries are generally required to authenticate
-				the services to which they connect. This process requires the
-				presence on the Swift submit site of the root certificates used
-				to sign the host certificates of services used. These root
-				certificates need to be installed in either (or both) the
-				<filename class="directory">~/.globus/certificates</filename>
-				and <filename
-				class="directory">/etc/grid-security/certificates</filename>
-				directories. A package with the root certificates of the
-				certificate authorities used in the <ulink
-				url="http://www.teragrid.org">TeraGrid</ulink> can be found
-				<ulink
-				url="http://security.teragrid.org/TG-CAs.html">here</ulink>.
-			
-			</para>
-		
-		</sect3>
-		</sect2>
-				
-		<sect2>
-		
-			<title>Swift Properties</title>
-			
-			<para>
-			
-				A Swift properties file (named <filename
-				class="file">swift.properties</filename>) can be used to
-				customize certain configuration aspects of Swift. A shared
-				version of this file, <filename
-				class="file">etc/swift.properties</filename>
-				in the installation directory
-				can be used to provide installation-wide defaults. A per-user
-				properties file, <filename
-				class="file">~/.swift/swift.properties</filename> can be used for
-				user specific settings. Swift first loads the shared
-				configuration file and, if present, the user configuration file.
-				Any properties not explicitly set in the user configuration file
-				will be inherited from the shared configuration file. Properties
-				are specified in the following format:
-
-<screen>
-<property>name</property>=<parameter>value</parameter>
-</screen>
-
-				For details about the various properties Swift accepts, please
-				take a look at the <ulink
-				url="http://www.ci.uchicago.edu/swift/guides/userguide.php#properties">Swift
-				Properties Section</ulink> in the <ulink
-				url="http://www.ci.uchicago.edu/swift/guides/userguide.php">Swift
-				User Guide</ulink>.
-
-			</para>
-		
-		</sect2>
-	</sect1>
-	
-	<sect1 id="examples">
-		
-		<title>Running Swift Examples</title>
-		
-		<para>
-		
-			The Swift examples can be found in the <filename
-			class="directory">examples</filename> directory in the Swift distribution.
-			The examples are written in the <ulink
-			url="http://www.ci.uchicago.edu/swift/guides/userguide/language.php">SwiftScript
-			language</ulink>, and have <filename class="file">.swift</filename> as
-			a file extension. 
-
-		</para>
-		
-		<para>
-		
-			The Grid Security Infrastructure, which Swift uses, works with
-			limited time certificates called proxies. These proxies can be
-			generated from your user certificate and private key using one of
-			<command>grid-proxy-init</command> or
-			<command>cog-proxy-init</command> (the latter being a Java Swing
-			interface to the former).
-		
-		</para>
-				
-		<para>
-		
-			Execution of a Swift workflow is done using the
-			<command>swift</command> command, which takes the Swift
-			workflow file name as an argument:
-			
-<screen>
-<prompt>></prompt> <command>cd examples/swift</command>
-<prompt>></prompt> <command>swift</command> <option><filename
-class="file">first.swift</filename></option>
-</screen>
-
-			The <ulink 		
-			url="http://www.ci.uchicago.edu/swift/guides/userguide.php#swiftcommand">Swift
-			Command Options Section</ulink> in the <ulink 			
-			url="http://www.ci.uchicago.edu/swift/guides/userguide.php">Swift 			
-			User Guide</ulink> contains details about the various options of the
-			<command>swift</command> command.
-		
-		</para>
-		
-	</sect1>
-</article>

Deleted: branches/release-0.92/docs/reallyquickstartguide.xml
===================================================================
--- branches/release-0.92/docs/reallyquickstartguide.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/reallyquickstartguide.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,109 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-	<articleinfo revision="0.2">
-		<title>Swift Really Quick Start Guide</title>
-		<subtitle>Source control $LastChangedRevision$</subtitle>
-	
-		<abstract>
-			<formalpara>
-				<title></title>
-				<para>
-				
-					This guide is a compressed version of the <ulink
-					url="quickstartguide.php">Swift Quick Start
-					Guide</ulink>.
-				
-				</para>
-			</formalpara>
-		</abstract>
-	</articleinfo>
-
-	<sect1 id="reallyquickstart">
-		<title>Swift Really Quick Start Guide</title>
-		
-		<itemizedlist>
-		
-			<listitem>
-				<para>
-				
-					<ulink
-					url="http://www.ci.uchicago.edu/swift/downloads/index.php">Download</ulink>
-					Swift.
-				
-				</para>
-			</listitem>	
-			
-			<listitem>
-				<para>
-				
-					Unpack it and add the <filename
-					class="directory">swift-xyz/bin</filename> directory to your
-					<envar>PATH</envar>.
-				
-				</para>
-			</listitem>	
-			
-			<listitem>
-				<para>
-				
-					Make sure you have your user certificate, a valid GSI proxy
-					certificate, and the proper CA root certificates in either
-					<filename
-					class="directory">~/.globus/certificates</filename> or
-					<filename
-					class="directory">/etc/grid-security/certificates</filename>.
-				
-				</para>
-			</listitem>	
-			
-			<listitem>
-				<para>
-				
-					Edit <filename
-					class="file">swift-xyz/etc/swift.properties</filename>. You
-					should add your numeric IP address there
-					(<property>ip.address</property>=<literal>x.y.z.w</literal>).
-				
-				</para>
-			</listitem>
-			
-			<listitem>
-				<para>
-				
-					Use the example site catalog and transformation catalog (they 
-					are configured for local submission):
-					
-<screen>
-<command>cd</command> swift-xyz/etc
-<command>cp</command> sites.xml.example sites.xml
-<command>cp</command> tc.data.example tc.data
-</screen>
-				
-				</para>
-			</listitem>
-						
-			<listitem>
-				<para>
-				
-					Use <command>swift file.dtm</command> to compile and execute
-					<filename class="file">file.dtm</filename>.
-				
-				</para>
-			</listitem>	
-			
-			<listitem>
-				<para>
-				
-					Use <command>swift -resume file-<runid>.?.rlog
-					file.dtm</command> to resume a failed run.
-				
-				</para>
-			</listitem>	
-		
-		</itemizedlist>
-		
-	</sect1>
-</article>

Deleted: branches/release-0.92/docs/swift-site-model.fig
===================================================================
--- branches/release-0.92/docs/swift-site-model.fig	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/swift-site-model.fig	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,53 +0,0 @@
-#FIG 3.2  Produced by xfig version 3.2.5
-Landscape
-Center
-Inches
-Letter  
-100.00
-Single
--2
-1200 2
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 2100 3450 4425 2025
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 2100 3675 4425 3825
-2 4 0 1 0 11 999 -1 20 0.000 0 0 7 0 0 5
-	 8850 4800 4200 4800 4200 675 8850 675 8850 4800
-2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5
-	 7425 2025 8400 2025 8400 2775 7425 2775 7425 2025
-2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5
-	 7425 2850 8400 2850 8400 3600 7425 3600 7425 2850
-2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5
-	 7425 3675 8400 3675 8400 4425 7425 4425 7425 3675
-2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5
-	 7425 1200 8400 1200 8400 1950 7425 1950 7425 1200
-2 2 0 1 0 6 500 -1 20 0.000 0 0 -1 0 0 5
-	 525 3000 2025 3000 2025 4050 525 4050 525 3000
-2 2 0 1 0 6 100 -1 20 0.000 0 0 -1 0 0 5
-	 4500 1350 6300 1350 6300 2700 4500 2700 4500 1350
-2 2 0 1 0 6 100 -1 20 0.000 0 0 -1 0 0 5
-	 4500 3225 6375 3225 6375 4425 4500 4425 4500 3225
-2 4 0 1 0 11 55 -1 20 0.000 0 0 7 0 0 5
-	 8850 5850 4200 5850 4200 5100 8850 5100 8850 5850
-2 4 0 1 0 11 55 -1 20 0.000 0 0 7 0 0 5
-	 8850 6675 4200 6675 4200 6075 8850 6075 8850 6675
-3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 2
-	 7350 1575 6375 1950
-	 0.000 0.000
-3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 2
-	 7350 1725 6450 3975
-	 0.000 0.000
-4 0 0 50 -1 0 12 0.0000 4 150 450 750 3300 Swift\001
-4 0 0 50 -1 0 12 0.0000 4 150 1155 750 3555 commandline\001
-4 0 0 50 -1 0 12 0.0000 4 150 465 750 3810 client\001
-4 0 0 50 -1 0 12 0.0000 4 195 1500 4725 2100 shared filesystem\001
-4 0 0 50 -1 0 12 0.0000 4 150 465 4800 3900 LRM\001
-4 0 0 50 -1 0 12 0.0000 4 150 1215 7350 900 Worker nodes\001
-4 0 0 50 -1 0 12 0.0000 4 150 465 6525 1425 Posix\001
-4 0 0 50 -1 0 12 0.0000 4 150 1065 5325 5475 Another site\001
-4 0 0 50 -1 0 12 0.0000 4 150 1065 5250 6450 Another site\001
-4 0 0 50 -1 0 12 0.0000 4 150 810 2475 2475 remote fs\001
-4 0 0 50 -1 0 12 0.0000 4 105 600 2475 2730 access\001
-4 0 0 50 -1 0 12 0.0000 4 195 1260 2400 4125 job submission\001
-4 0 0 50 -1 0 12 0.0000 4 195 915 2400 4380 eg GRAM\001
-4 0 0 50 -1 0 12 0.0000 4 195 1080 3075 3000 eg. GridFTP\001

Deleted: branches/release-0.92/docs/swift-site-model.png
===================================================================
(Binary files differ)

Deleted: branches/release-0.92/docs/tutorial-live.xml
===================================================================
--- branches/release-0.92/docs/tutorial-live.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/tutorial-live.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,982 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-    <articleinfo>
-        <title>A Swift Tutorial for ISSGC07</title>
-    </articleinfo>
-
-<sect1> <title>Introduction</title>
-    <para>
-This tutorial is intended to introduce new users to the basics of Swift.
-It is structured as a series of small exercise/examples which you can
-try for yourself as you read along.
-    </para>
-    <para>This is version: $LastChangedRevision$</para>
-</sect1>
-
-<sect1><title>Environment setup</title>
-
-<para>First set up the swift environment:
-</para>
-
-<programlisting>
-$ cp ~benc/workflow/vdsk-0.1-r877/etc/tc.data ~
-$ cp ~benc/workflow/vdsk-0.1-r877/etc/sites.xml ~
-$ export PATH=$PATH:~benc/workflow/vdsk-0.1-r877/bin
-</programlisting>
-
-</sect1>
-
-<sect1> <title>A first workflow</title>
-    <para>
-The first example program uses an image processing utility to perform
-a visual special effect on a supplied file.
-    </para>
-
-<para>Here is the program we will use:</para>
-
-<programlisting>
-
-type imagefile;
-
-(imagefile output) flip(imagefile input) {
-  app {
-    convert "-rotate" "180" @input @output;
-  }
-}
-
-imagefile puppy <"input-1.jpeg">;
-imagefile flipped <"output.jpeg">;
-
-flipped = flip(puppy);
-
-</programlisting>
-
-
-<para>
-This simple workflow has the effect of running this command:
-convert -rotate 180 input-1.jpeg output.jpeg
-</para>
-
-<para>ACTION: First prepare your working environment:</para>
-
-<programlisting>
-
-$ cp ~benc/workflow/input-1.jpeg .
-
-$ ls *.jpeg
-input-1.jpeg
-
-</programlisting>
-
-<para>ACTION: Open input-1.jpeg</para>
-
-<para>
-You should see a picture. This is the
-picture that we will modify in our first workflow.</para>
-
-<para>
-ACTION: use your favourite text editor to put the above SwiftScript
-program into a file called
-flipper.swift
-</para>
-
-<para>Once you have put the program into flipper.swift, you can execute
-the workflow like this:
-</para>
-
-<programlisting>
-
-$ swift flipper.swift
-
-Swift v0.1-dev
-
-RunID: e1bupgygrzn12
-convert started
-convert completed
-
-$ ls *.jpeg
-input-1.jpeg
-output.jpeg
-</programlisting>
-
-<para>A new jpeg has appeared - output.jpeg.</para>
-
-<para>ACTION: Open it.
-You should see that the image is different from the input image - it
-has been rotated 180 degrees.</para>
-
-<para>The basic structure of this program is a type definition,
-a procedure definition, a variable definition and
-then a call to the procedure:</para>
-
-<para>
-All data in SwiftScript must have a type. This line defines a new type
-called imagefile, which will be the type that all of our images will be.
-</para>
-
-<programlisting>
-type imagefile;
-</programlisting>
-
-<para>
-Next we define a procedure called flip. This procedure will use the
-ImageMagick convert application to rotate a picture around by 180 degrees.
-</para>
-
-<programlisting>
-(imagefile output) flip(imagefile input) {
-  app {
-    convert "-rotate" "180" @input @output;
-  }
-}
-</programlisting>
-
-<para>
-To achieve this, it executes the ImageMagick utility 'convert', passing
-in the appropriate commandline option and the name of the input and output
-files.
-</para>
-
-<para>
-In swift, the output of a program looks like a return value.
-It has a type, and also has a variable name
-(unlike in most other programming languages).
-</para>
-
-<programlisting>
-imagefile puppy <"input-1.jpeg">;
-imagefile flipped <"output.jpeg">;
-</programlisting>
-
-<para>
-We define two variables, called puppy and flipped. These variables will
-contain our input and output images, respectively.
-</para>
-
-<para>We tell swift that the contents of the variables will be stored on
-disk (rather than in memory) in the files "input-1.jpeg" (which already
-exists), and in "output.jpeg". This is called <firstterm>mapping</firstterm>
-and will be discussed in more depth later.</para>
-
-<programlisting>
-flipped = flip(puppy);
-</programlisting>
-
-<para>
-Now we call the flip procedure, with 'puppy' as its input and its
-output going into 'flipped'.
-</para>
-
-<para>Over the following exercises, we will use this relatively
-simple SwiftScript program as a base for future exercises.</para>
-
-</sect1>
-
-<sect1><title>A second program</title>
-
-<para>
-Our next example program uses some more swift syntax to produce images that are
-rotated by different angles, instead of flipped over all the way.
-</para>
-
-<para>Here is the program in full. We'll go over it section by section.</para>
-<programlisting>
-type imagefile;
-
-(imagefile output) rotate(imagefile input, int angle) {
-  app {
-    convert "-rotate" angle @input @output;
-  }
-}
-
-imagefile puppy <"input-1.jpeg">;
-
-int angles[] = [45, 90, 120];
-
-foreach a in angles {
-    imagefile output <single_file_mapper;file=@strcat("rotated-",a,".jpeg")>;
-    output = rotate(puppy, a);
-}
-</programlisting>
-
-<programlisting>
-type imagefile;
-</programlisting>
-
-<para>
-We keep the type definition the same as in the previous program.
-</para>
-
-<programlisting>
-(imagefile output) rotate(imagefile input, int angle) {
-  app {
-    convert "-rotate" angle @input @output;
-  }
-}
-</programlisting>
-
-<para>
-This rotate procedure looks very much like the flip procedure 
-from the previous program,
-but we have added another parameter, called angle. Angle is of type 'int',
-which is a built-in SwiftScript type for integers. We use that on the
-commandline instead of a hard coded 180 degrees.
-</para>
-
-<programlisting>
-imagefile puppy <"input-1.jpeg">;
-</programlisting>
-
-<para>
-Our input image is the same as before.
-</para>
-
-<programlisting>
-int angles[] = [45, 90, 120];
-</programlisting>
-
-<para>
-Now we define an array of integers, and initialise it with three angles.
-</para>
-
-<programlisting>
-foreach a in angles {
-</programlisting>
-
-<para>
-Now we have a foreach loop. This loop will iterate over each of the elements
-in angles. In each iteration, the element will be put in the variable 'a'.
-</para>
-
-<programlisting>
-    imagefile output <single_file_mapper;file=@strcat("rotated-",a,".jpeg")>;
-</programlisting>
-
-<para>
-Inside the loop body, we have an output variable that is mapped differently
-for each iteration. We use the single_file_mapper and the @strcat function
-to construct a filename and then map that filename to our output variable.
-</para>
-
-<programlisting>
-    output = rotate(puppy, a);
-}
-</programlisting>
-
-<para>Now we invoke rotate, passing in our input image and the angle to
-use, and putting the output in the mapped output file. This will happen
-three times, with a different output filename and a different angle
-each time.
-</para>
-
-<para>
-ACTION: Put the program source into a file called rotate.swift and
-execute it with the swift command, like we did for flipper.swift above.
-</para>
-
-<programlisting>
-$ ls rotated*
-rotated-120.jpeg rotated-45.jpeg  rotated-90.jpeg
-</programlisting>
-
-</sect1>
-
-
-<sect1><title>Third example</title>
-
-<para>
-Our third example will introduce some more concepts: complex data
-types, the comma-separated values mapper, and the transformation
-catalog.
-</para>
-
-<para>
-Here's the complete listing:
-</para>
-
-<programlisting>
-
-type imagefile;
-type pgmfile;
-
-type voxelfile;
-type headerfile;
-
-type volume {
-    voxelfile img;
-    headerfile hdr;
-};
-
-
-volume references[] <csv_mapper;file="reference.csv">;
-volume reference=references[0];
-
-(pgmfile outslice) slicer(volume input, string axis, string position)
-{
-    app {
-        slicer @input.img axis position @outslice;
-    }
-}
-
-(imagefile output) convert(pgmfile inpgm)
-{
-    app {
-        convert @inpgm @output;
-    }
-}
-
-pgmfile slice;
-
-imagefile slicejpeg <"slice.jpeg">;
-
-slice = slicer(reference, "-x", ".5");
-
-slicejpeg = convert(slice);
-
-</programlisting>
-
-<para>IMPORTANT! We need to make some changes to other files in addition
-to putting the above source into a file. Read the following notes
-carefully to find out what to change.</para>
-
-<programlisting>
-type imagefile;
-type pgmfile;
-type voxelfile;
-type headerfile;
-</programlisting>
-
-<para>
-We define some simple types - imagefile as before, as well as three new ones.
-</para>
-
-<programlisting>
-type volume {
-    voxelfile img;
-    headerfile hdr;
-};
-</programlisting>
-
-<para>
-Now we define a <firstterm>complex type</firstterm> to represent a brain scan.
-Our programs store brain data in two files - a .img file and a .hdr file.
-This complex type defines a volume type, consisting of a voxelfile and a
-headerfile.
-</para>
-
-<programlisting>
-volume references[] <csv_mapper;file="reference.csv">;
-</programlisting>
-
-<para>
-Now that we have defined a more complex type that consists of several
-elements (and hence several files on disk), we can no longer use the
-same ways of mapping. Instead, we will use a new mapper, the CSV mapper.
-This maps rows of a comma-separated value file into an array of complex
-types.</para>
-
-<para>ACTION: Make a file called reference.csv using your
-favourite text editor. This is what it should contain (2 lines):
-</para>
-<programlisting>
-img,hdr
-Raw/reference.img,Raw/reference.hdr
-</programlisting>
-<para>Our mapped structure will be a 1 element array (because there was one
-data line in the CSV file), and that element will be mapped to two
-files: the img component will map to the file Raw/reference.img and the
-hdr component will map to Raw/reference.hdr.
-</para>
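The csv_mapper behaviour described above can be sketched in Python (a hypothetical model of the mapping only, not Swift's implementation): the header row names the structure members, and each data row becomes one array element.

```python
import csv, io

# Model of how a CSV mapper turns data rows into an array of
# {img, hdr} records, keyed by the header-row column names.
def csv_map(text):
    return list(csv.DictReader(io.StringIO(text)))

reference_csv = "img,hdr\nRaw/reference.img,Raw/reference.hdr\n"
```

With the reference.csv content above, `csv_map` yields a one-element array whose `img` and `hdr` members name the two files on disk, matching the mapping described in the text.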
-<para>
-We also need to put the Raw/reference files into the current directory
-so that swift can find them.
-</para>
-<para>ACTION REQUIRED: Type the following:
-</para>
-<programlisting>
-$ mkdir Raw
-$ cp ~benc/workflow/data/reference.* Raw/
-</programlisting>
-<para>
-Now you will have the reference files in your working directory.
-</para>
-
-<programlisting>
-volume reference=references[0];
-</programlisting>
-
-<para>
-We only want the single first element of the references array, so this line
-makes a new volume variable and extracts the first element of references.
-</para>
-
-<programlisting>
-(imagefile output) convert(pgmfile inpgm)
-{
-    app {
-        convert @inpgm @output;
-    }
-}
-</programlisting>
-
-<para>
-This procedure is like the previous flip and rotate procedures. It uses
-convert to change a file from one file format (.pgm format) to another
-format (.jpeg format)
-</para>
-
-<programlisting>
-(pgmfile outslice) slicer(volume input, string axis, string position)
-{
-    app {
-        slicer @input.img axis position @outslice;
-    }
-}
-</programlisting>
-
-<para>
-Now we define another procedure that uses a new application called 'slicer'.
-Slicer will take a slice through a supplied brain scan volume and produce
-a 2d image in PGM format.
-</para>
-
-<para>
-We must tell Swift how to run 'slicer' by modifying the
-<firstterm>transformation catalog</firstterm>. The transformation catalog
-maps logical transformation names into unix executable names.
-</para>
-
-<para>The transformation catalog is in your home directory, in a file called
-tc.data.
-There is already one entry there, for convert.</para>
-
-<programlisting>
-localhost    convert    /usr/bin/convert    INSTALLED INTEL32::LINUX null
-</programlisting>
-
-<para>ACTION REQUIRED: Open tc.data in your favourite unix text
-editor, and add a new line to configure the location of slicer. Note that
-you must use TABS and not spaces to separate the fields:</para>
-
-<programlisting>
-localhost    slicer    /afs/pdc.kth.se/home/b/benc/workflow/slicer-swift    INSTALLED INTEL32::LINUX null
-</programlisting>
-
-<para>For now, ignore all of the fields except the second and the third.
-The second field 'slicer' specifies a logical transformation name and the
-third specifies the location of an executable to perform that
-transformation.</para>
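Because tc.data fields must be separated by tabs rather than spaces, a quick sanity check like the following can catch accidental space-separated entries (an illustrative Python sketch, not part of Swift; the five-field minimum is an assumption based on the example entries):

```python
# Check that a non-comment tc.data line has at least five
# tab-separated fields (site, transformation, path, type, system)
# and that the first three are non-empty.
def check_tc_line(line):
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 5 and all(f.strip() for f in fields[:3])
```

A line typed with spaces instead of tabs splits into a single field and fails the check.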
-
-<programlisting>
-pgmfile slice;
-</programlisting>
-
-<para>
-Now we define a variable which will store the sliced image. It will be
-a file on disk, but note that there is no filename mapping defined. This
-means that swift will choose a filename automatically. This is useful for
-intermediate files in a workflow.
-</para>
-
-<programlisting>
-imagefile slicejpeg <"slice.jpeg">;
-</programlisting>
-
-<para>Now we declare a variable for our output and map it to a filename.
-</para>
-
-<programlisting>
-slice = slicer(reference, "-x", ".5");
-
-slicejpeg = convert(slice);
-</programlisting>
-
-<para>
-Finally we invoke the two procedures to slice the brain volume and
-then convert that slice into a jpeg.
-</para>
-
-<para>ACTION: Place the source above into a file (for example, third.swift) and
-make the other modifications discussed above. Then run the workflow
-with the swift command, as before.</para>
-
-</sect1>
-
-<sect1><title>Running on another site</title>
-
-<para>
-So far everything has been run on the local site.
-Swift can run jobs over the grid to remote resources. It will handle the
-transfer of files to and from the remote resource, and execution of jobs
-on the remote resource.
-</para>
-
-<para>
-We will run the first flip program, but this time on a grid resource
-located in Chicago.
-</para>
-
-<para>
-First clear away the output from the first program:
-</para>
-
-<programlisting>
-$ rm output.jpeg
-$ ls output.jpeg
-ls: output.jpeg: No such file or directory
-</programlisting>
-
-<para>
-Now initialise your grid proxy, to log in to the grid.
-</para>
-
-<programlisting>
-$ grid-proxy-init
-</programlisting>
-
-<para>Now we must tell Swift about the other site. This is done through
-another catalog file, the <firstterm>site catalog</firstterm>.</para>
-
-<para>The site catalog is found in sites.xml</para>
-
-<para>Open sites.xml. There is one entry in there in XML defining the
-local site. Because this is the only site defined, all execution will
-happen locally.</para>
-
-<para>Another sites.xml is available for use, in 
-~benc/workflow/sites-iceage.xml
-</para>
-
-<para>ACTION: Copy ~benc/workflow/sites-iceage.xml to your home directory
- and look inside.  See how it differs from sites.xml.</para>
-
-<para>Now we will run the first flipper exercise again, but this time via
-Globus GRAM.</para>
-
-<para>We will use this other sites file to run the first workflow. In
-addition to telling swift about the other site in the sites file,
-we need to tell swift where to find transformations on the new site.
-</para>
-<para>ACTION: Edit the transformation catalog and add a line to tell swift where
-it can find convert. Conveniently, it is in the same path when running
-locally and through GRAM.
-Here is the line to add:
-</para>
-
-<programlisting>
-iceage  convert  /usr/bin/convert   INSTALLED   INTEL32::LINUX  null
-</programlisting>
-
-<para>Note the difference between this line and the existing convert
-definition in the file. All fields are the same except for the first
-column, which is the site column. We say 'iceage' here instead of
-localhost. This matches up with the site name 'iceage' defined in
-the new site catalog, and identifies the name of the remote site.
-</para>
-
-<para>Now use the same swift command as before, but with an
-extra parameter to tell swift to use a different sites file:
-</para>
-
-<programlisting>
-$ swift -sites.file ~benc/workflow/sites-iceage.xml flipper.swift
-</programlisting>
-
-<para>
-If this runs successfully, you should now have an output.jpeg file with
-a flipped picture in it. It should look exactly the same as when run locally.
-You have used the same program to produce the same output, but used a remote
-resource to do it.
-</para>
-
-</sect1>
-
-
-
-
-<sect1><title>A bigger workflow example</title>
-
-<para>Now we'll make a bigger workflow that will execute a total of
-15 jobs.
-</para>
-
-<para>
-As before, here is the entire program listing. Afterwards, we will go through
-the listing step by step.
-</para>
-
-<programlisting>
-type voxelfile;
-type headerfile;
-
-type pgmfile;
-type imagefile;
-
-type warpfile;
-
-type volume {
-    voxelfile img;
-    headerfile hdr;
-};
-
-(warpfile warp) align_warp(volume reference, volume subject, string model, string quick) {
-    app {
-        align_warp @reference.img @subject.img @warp "-m " model quick;
-    }
-}
-
-(volume sliced) reslice(warpfile warp, volume subject)
-{
-    app {
-        reslice @warp @sliced.img;
-    }
-}
-
-(volume sliced) align_and_reslice(volume reference, volume subject, string model, string quick) {
-    warpfile warp;
-    warp = align_warp(reference, subject, model, quick);
-    sliced = reslice(warp, subject);
-}
-
-
-(volume atlas) softmean(volume sliced[])
-{
-    app {
-        softmean @atlas.img "y" "null" @filenames(sliced[*].img);
-    }
-}
-
-
-(pgmfile outslice) slicer(volume input, string axis, string position)
-{
-    app {
-        slicer @input.img axis position @outslice;
-    }
-}
-
-(imagefile outimg) convert(pgmfile inpgm)
-{
-    app {
-        convert @inpgm @outimg;
-    }
-}
-
-(imagefile outimg) slice_to_jpeg(volume inp, string axis, string position)
-{
-    pgmfile outslice;
-    outslice = slicer(inp, axis, position);
-    outimg = convert(outslice);
-}
-
-(volume s[]) all_align_reslices(volume reference, volume subjects[]) {
-
-    foreach subject, i in subjects {
-        s[i] = align_and_reslice(reference, subjects[i], "12", "-q");
-    }
-
-}
-
-
-volume references[] <csv_mapper;file="reference.csv">;
-volume reference=references[0];
-
-volume subjects[] <csv_mapper;file="subjects.csv">;
-
-volume slices[] <csv_mapper;file="slices.csv">;
-slices = all_align_reslices(reference, subjects);
-
-volume atlas <simple_mapper;prefix="atlas">;
-atlas = softmean(slices);
-
-string directions[] = [ "x", "y", "z"];
-
-foreach direction in directions {
-    imagefile o <single_file_mapper;file=@strcat("atlas-",direction,".jpeg")>;
-    string option = @strcat("-",direction);
-    o = slice_to_jpeg(atlas, option, ".5");
-}
-
-</programlisting>
-
-<para>
-As before, there are some other changes to make to the environment
-in addition to running the program. These are discussed inline below.
-</para>
-
-<programlisting>
-type voxelfile;
-type headerfile;
-
-type pgmfile;
-type imagefile;
-
-type warpfile;
-</programlisting>
-
-<para>
-We define some simple types, like in the previous programs. We add another
-one for a new kind of intermediate file - a warpfile, which will be used by
-some new applications that we will use.
-</para>
-
-<programlisting>
-
-type volume {
-    voxelfile img;
-    headerfile hdr;
-};
-</programlisting>
-
-<para>
-The same complex type as before, a volume consisting of a pair of files -
-the voxel data and the header data.
-</para>
-
-<programlisting>
-
-(warpfile warp) align_warp(volume reference, volume subject, string model, string quick) {
-    app {
-        align_warp @reference.img @subject.img @warp "-m " model quick;
-    }
-}
-
-</programlisting>
-
-<para>
-Now we define a new transformation called align_warp. We haven't used
-align_warp before, so we need to add in a transformation catalog entry
-for it. We will be adding some other transformations too, so add those
-entries now too.
-</para>
-
-<para>
-ACTION: Edit the transformation catalog (like in the third
-exercise). Add entries for the following transformations. The table
-below lists the path. You must write the appropriate syntax for
-transformation catalog entries yourself, using the existing two
-entries as examples.
-</para>
-
-<para>Here is the list of transformations to add:</para>
-<programlisting>
-align_warp (the path is /afs/pdc.kth.se/home/b/benc/workflow/app/AIR/bin/align_warp)
-reslice   (the path is /afs/pdc.kth.se/home/b/benc/workflow/app/AIR/bin/reslice)
-softmean  (the path is /afs/pdc.kth.se/home/b/benc/workflow/app/softmean-swift)
-</programlisting>
-
-<para>
-These programs come from several software packages:
-the AIR (Automated Image Registration) suite
-http://bishopw.loni.ucla.edu/AIR5/index.html and
-FSL http://www.fmrib.ox.ac.uk/fsl/fsl/intro.html
-</para>
-
-<para>Make sure you have added three entries to the transformation
-catalog, listing the above three transformations and the appropriate
-path</para>
-
-<programlisting>
-
-(volume sliced) reslice(warpfile warp, volume subject)
-{
-    app {
-        reslice @warp @sliced.img;
-    }
-}
-
-</programlisting>
-
-<para>
-This adds another transformation, called reslice. We already added the
-transformation catalog entry for this, in the previous step.
-</para>
-
-<programlisting>
-
-
-(volume sliced) align_and_reslice(volume reference, volume subject, string model, string quick) {
-    warpfile warp;
-    warp = align_warp(reference, subject, model, quick);
-    sliced = reslice(warp, subject);
-}
-
-</programlisting>
-
-<para>
-This is a new kind of procedure, called a <firstterm>compound
-procedure</firstterm>. A compound procedure does not call applications
-directly. Instead it calls other procedures, connecting them together
-with variables. This procedure above calls align_warp and then reslice.
-</para>
-
-<programlisting>
-
-(volume atlas) softmean(volume sliced[])
-{
-    app {
-        softmean @atlas.img "y" "null" @filenames(sliced[*].img);
-    }
-}
-
-</programlisting>
-
-<para>
-Yet another application procedure. Again, we added the transformation catalog
-entry for this above. Note the special @filenames ... [*] syntax.
-</para>
-
-<programlisting>
-
-(pgmfile outslice) slicer(volume input, string axis, string position)
-{
-    app {
-        slicer @input.img axis position @outslice;
-    }
-}
-
-(imagefile outimg) convert(pgmfile inpgm)
-{
-    app {
-        convert @inpgm @outimg;
-    }
-}
-
-</programlisting>
-
-<para>These are two more straightforward application procedures.</para>
-
-<programlisting>
-
-(imagefile outimg) slice_to_jpeg(volume inp, string axis, string position)
-{
-    pgmfile outslice;
-    outslice = slicer(inp, axis, position);
-    outimg = convert(outslice);
-}
-
-(volume s[]) all_align_reslices(volume reference, volume subjects[]) {
-
-    foreach subject, i in subjects {
-        s[i] = align_and_reslice(reference, subjects[i], "12", "-q");
-    }
-
-}
-
-</programlisting>
-
-<para>
-slice_to_jpeg and all_align_reslices are compound procedures. They call
-other procedures, like align_and_reslice did above. Note how 
-all_align_reslices uses foreach to run the same procedure on each element
-in an array.
-</para>
-
-<programlisting>
-volume references[] <csv_mapper;file="reference.csv">;
-volume reference=references[0];
-</programlisting>
-
-<para>The same mapping we used in the previous exercise to map a pair
-of reference files into the reference variable using a complex type.
-</para>
-
-<programlisting>
-volume subjects[] <csv_mapper;file="subjects.csv">;
-</programlisting>
-
-<para>
-Now we map a number of subject images into the subjects array.
-</para>
-
-<para>ACTION REQUIRED: Copy the subjects data files into your
-working directory, like this:
-</para>
-
-<programlisting>
-$ cp ~benc/workflow/data/anatomy* Raw/
-$ ls Raw/
-anatomy1.hdr  anatomy2.hdr  anatomy3.hdr  anatomy4.hdr  reference.hdr
-anatomy1.img  anatomy2.img  anatomy3.img  anatomy4.img  reference.img
-</programlisting>
-
-<para>ACTION REQUIRED: Create a text file called subjects.csv using your
-favourite text editor. List all four image pairs. Here is an example
-of how to start:
-</para>
-
-<programlisting>
-img,hdr
-Raw/anatomy1.img,Raw/anatomy1.hdr
-Raw/anatomy2.img,Raw/anatomy2.hdr
-</programlisting>
-
-<para>
-Put the above text in subjects.csv and also add two new lines to list
-anatomy data sets 3 and 4.
-</para>
-
-<programlisting>
-volume slices[] <csv_mapper;file="slices.csv">;
-</programlisting>
-
-<para>
-Slices will hold intermediate volumes that have been processed by some
-of our tools. We need to map to tell swift where to put these intermediate
-files. Because we need the filenames to correspond, we cannot use
-anonymous mapping for these intermediate values like in the second
-exercise. We need to populate slices.csv, but we do not need to find
-the corresponding files. Swift will create these files as part of
-executing the workflow.
-</para>
-
-<para>ACTION REQUIRED: Create a text file called slices.csv with your
-text editor, and put the following content into it:
-</para>
-
-<programlisting>
-img,hdr
-slice1.img,slice1.hdr
-slice2.img,slice2.hdr
-slice3.img,slice3.hdr
-slice4.img,slice4.hdr
-</programlisting>
-
-<programlisting>
-slices = all_align_reslices(reference, subjects);
-
-volume atlas <simple_mapper;prefix="atlas">;
-atlas = softmean(slices);
-
-string directions[] = [ "x", "y", "z"];
-
-foreach direction in directions {
-    imagefile o <single_file_mapper;file=@strcat("atlas-",direction,".jpeg")>;
-    string option = @strcat("-",direction);
-    o = slice_to_jpeg(atlas, option, ".5");
-}
-</programlisting>
-
-<para>
-Finally we make a number of actual procedure invocations (and declare a few
-more variables). The ultimate output of our workflow comes from the o
-variable inside the foreach loop. This is mapped to a different filename
-in each iteration, similar to exercise two.
-</para>
-
-<para>
-ACTION REQUIRED:
-Put the workflow into a file called final.swift, and 
-then run the workflow with the swift command. Then open
-the resulting files - atlas-x.jpeg, atlas-y.jpeg and atlas-z.jpeg.
-</para>
-<para>
-You should see three brain images, along three different axes.
-</para>
-
-</sect1>
-
-<para>The End</para>
-</article>
-

Deleted: branches/release-0.92/docs/tutorial.xml
===================================================================
--- branches/release-0.92/docs/tutorial.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/tutorial.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,1317 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-    <articleinfo>
-        <title>A Swift Tutorial</title>
-        <abstract>
-            <formalpara>
-                <para>
-This is an introductory tutorial on the use of Swift and its
-programming language SwiftScript.
-                </para>
-                <para>
-$LastChangedRevision$
-                </para>
-            </formalpara>
-        </abstract>
-    </articleinfo>
-
-<section> <title>Introduction</title>
-    <para>
-This tutorial is intended to introduce new users to the basics of Swift.
-It is structured as a series of small exercise/examples which you can
-try for yourself as you read along. After the first 'hello world'
-example, there are two tracks - the language track (which introduces
-features of the SwiftScript language) and the runtime track (which
-introduces features of the Swift runtime environment, such as
-running jobs on different sites)
-    </para>
-    <para>
-For information on getting an installation of Swift running, consult the
-<ulink url="http://www.ci.uchicago.edu/swift/guides/quickstartguide.php">Swift Quickstart Guide</ulink>,
-and return to this document when you have
-successfully run the test SwiftScript program mentioned there.
-    </para>
-    <para>
-There is also a
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php">Swift User's Guide</ulink>
-which contains more detailed reference
-material on topics covered in this manual. 
-
-All of the programs included in this tutorial can be found in your Swift distribution in the examples/swift directory.
-    </para>
-</section>
-
-<section> <title>Hello World</title>
-    <para>
-The first example program,
-<filename>first.swift</filename>,
-outputs a hello world message into
-a file called <filename>hello.txt</filename>.
-    </para>
-
-<programlisting>
-type messagefile;
-
-app (messagefile t) greeting () {
-        echo "Hello, world!" stdout=@filename(t);
-}
-
-messagefile outfile <"hello.txt">;
-
-outfile = greeting();
-</programlisting>
-
-<para>We can run this program as follows:</para>
-
-<screen>
-$ <userinput>cd examples/swift/</userinput>
-$ <userinput>swift first.swift</userinput>
-Swift svn swift-r3334 (swift modified locally) cog-r2752
-
-RunID: 20100526-1925-8zjupq1b
-Progress:
-Final status:  Finished successfully:1
-$ <userinput>cat hello.txt</userinput>
-Hello, world!
-</screen>
-
-<para>The basic structure of this program is a
-<firstterm>type definition</firstterm>,
-an <firstterm>application procedure definition</firstterm>,
-a <firstterm>variable definition</firstterm> and
-then a <firstterm>call</firstterm> to the procedure:</para>
-
-
-<programlisting>
-type messagefile;
-</programlisting>
-
-<para>
-First we define a new type, called messagefile.
-In this example, we will use this messagefile
-type as the type for our output message.
-</para>
-
-<sidebar><para>All data in SwiftScript must be typed,
-whether it is stored in memory or on disk. This example defines a
-very simple type. Later on we will see more complex type examples.
-</para>
-</sidebar>
-
-
-<programlisting>
-app (messagefile t) greeting() { 
-    echo "Hello, world!" stdout=@filename(t);
-}
-</programlisting>
-
-<para>
-Next we define a procedure called greeting. This procedure will write out
-the "hello world" message to a file.
-</para>
-
-<para>
-To achieve this, it executes the unix utility 'echo' with a parameter
-"Hello, world!" and directs the standard output into the output file.
-</para>
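-<para>
-At the shell level, the body of the greeting procedure amounts to an
-ordinary output redirection (a plain unix sketch, not Swift itself):
-</para>

```shell
# What "echo ... stdout=@filename(t)" boils down to: redirect stdout to the mapped file
echo "Hello, world!" > hello.txt
cat hello.txt
```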
-
-<para>
-The actual file to use is specified by the
-<firstterm>return parameter</firstterm>, t.
-</para>
-
-<programlisting>
-messagefile outfile <"hello.txt">;
-</programlisting>
-
-<para>
-Here we define a variable called outfile. The type of this variable is
-messagefile, and we specify that the contents of this variable will
-be stored on disk in a file called hello.txt
-</para>
-
-<programlisting>
-outfile = greeting();
-</programlisting>
-
-<para>
-Now we call the greeting procedure, with its output going to the
-outfile variable and therefore to hello.txt on disk.
-</para>
-
-<para>Over the following exercises, we'll extend this simple
-hello world program to demonstrate various features of Swift.</para>
-
-</section>
-
-<section><title>Language features</title>
-
-<section> <title>Parameters</title>
-
-<para>
-Procedures can have parameters. Input parameters specify inputs to the
-procedure and output parameters specify outputs. Our helloworld greeting
-procedure already uses an output parameter, t, which indicates where the
-greeting output will go. In this section, we will add an input parameter
-to the greeting function.</para>
-<para>The code changes from <filename>first.swift</filename>
-are highlighted below.</para>
-
-<programlisting>
-type messagefile;
-
-(messagefile t) greeting (string s) {
-    app {
-        echo s stdout=@filename(t);
-    }
-}
-
-messagefile outfile <"hello2.txt">;
-
-outfile = greeting("hello world");
-</programlisting>
-
-<para>We have modified the signature of the greeting procedure to indicate
-that it takes a single parameter, s, of type 'string'.</para>
-<para>We have modified the invocation of the 'echo' utility so that it
-takes the value of s as a parameter, instead of the string literal
-"Hello, world!".</para>
-<para>We have modified the output file definition to point to a different
-file on disk.</para>
-<para>We have modified the invocation of greeting so that a greeting
-string is supplied.</para>
-
-<para>The code for this section can be found in 
-<filename>parameter.swift</filename>. It can be
-invoked using the swift command, with output appearing in 
-<filename>hello2.txt</filename>:</para>
-
-<screen>
-$ <userinput>swift parameter.swift</userinput>
-</screen>
-
-<para>Now that we can choose our greeting text, we can call the same
-procedure with different parameters to generate several output files with
-different greetings. The code is in manyparam.swift and can be run as before
-using the swift command.
-</para>
-
-<programlisting>
-type messagefile;
-
-(messagefile t) greeting (string s) {
-    app {
-        echo s stdout=@filename(t);
-    }
-}
-
-messagefile english <"english.txt">;
-messagefile french <"francais.txt">;
-english = greeting("hello");
-french = greeting("bonjour");
-
-messagefile japanese <"nihongo.txt">;
-japanese = greeting("konnichiwa");
-</programlisting>
-
-<para>Note that we can intermingle definitions of variables with invocations
-of procedures.</para>
-<para>When this program has been run, there should be three new files in the
-working directory (english.txt, francais.txt and nihongo.txt) each containing
-a greeting in a different language.</para>
-
-<para>In addition to specifying parameters positionally, parameters can
-be named, and if desired a default value can be specified - see 
-<link linkend="tutorial.named-parameters">Named and optional
-parameters</link>.</para>
-</section>
-<section><title>Adding another application</title>
-<para>
-Now we'll define a new application procedure. The procedure we define
-will capitalise all the words in the input file.
-</para>
-
-<para>To do this, we'll use the unix 'tr' (translate) utility.
-
-Here is an example of using <command>tr</command> on the unix
-command line, not using Swift:</para>
-
-<screen>
-$ <userinput>echo hello | tr '[a-z]' '[A-Z]'</userinput>
-HELLO
-</screen>
-
-<para>
-There are several steps:
-<itemizedlist>
-<listitem><para>transformation catalog</para></listitem>
-<listitem><para>application block</para></listitem>
-</itemizedlist>
-</para>
-
-<para>First we need to modify the
-<firstterm>transformation catalog</firstterm> to define
-a logical transformation for the tr utility.  The transformation
-catalog can be found in <filename>etc/tc.data</filename>.
-There are already several entries specifying where programs can
-be found. Add a new line to the file, specifying where 
-<command>tr</command> can be found
-(usually in <filename>/usr/bin/tr</filename>
-but it may differ on your system), like this:
-</para>
-
-<programlisting>
-localhost       tr      /usr/bin/tr     INSTALLED       INTEL32::LINUX  null
-</programlisting>
-
-<para>For now, ignore all of the fields except the second and the third.
-The second field 'tr' specifies a logical application name and the
-third specifies the location of the application executable.
-</para>
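-<para>
-To find the path to use in the third field on your own system, ask the
-shell where tr lives:
-</para>

```shell
# Locate the tr executable; this path goes in the third column of tc.data
TR_PATH=$(command -v tr)
echo "$TR_PATH"
```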
-
-<para>Now that we have defined where to find <command>tr</command>, we can
-use it in SwiftScript.
-</para>
-
-<para>
-We can define a new procedure, <command>capitalise</command> which calls
-tr.
-<programlisting>
-(messagefile o) capitalise(messagefile i) {   
-    app {
-        tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o);
-    }
-}
-</programlisting>
-</para>
-
-
-<para>
-We can call capitalise like this:
-<programlisting>
-messagefile final <"capitals.txt">;
-final = capitalise(hellofile);
-</programlisting>
-</para>
-
-<para>
-So a full program based on the first exercise might look like this:
-
-<programlisting>
-type messagefile {} 
-
-(messagefile t) greeting (string s) {   
-    app {
-        echo s stdout=@filename(t);
-    }
-}
-
-(messagefile o) capitalise(messagefile i) {   
-    app {
-        tr "[a-z]" "[A-Z]" stdin=@filename(i) stdout=@filename(o);
-    }
-}
-
-messagefile hellofile <"hello.txt">;
-messagefile final <"capitals.txt">;
-
-hellofile = greeting("hello from Swift");
-final = capitalise(hellofile);
-</programlisting>
-</para>
-
-<para>We can use the swift command to run this:
-
-<screen>
-$ <userinput>swift second_procedure.swift</userinput>
-[...]
-$ <userinput>cat capitals.txt</userinput>
-HELLO FROM SWIFT
-</screen>
-</para>
-
-</section>
-
-<section><title>Anonymous files</title>
-
-<para>In the previous section, the file 
-<filename>greeting.txt</filename> is used only to
-store an intermediate result. We don't really care about which name is used
-for the file, and we can let Swift choose the name.</para>
-
-<para>To do that, omit the mapping entirely when declaring outfile:
-<programlisting>
-messagefile outfile;
-</programlisting>
-</para>
-
-<para>
-Swift will choose a filename, which in the present version will be
-in a subdirectory called <filename>_concurrent</filename>.
-</para>
-
-</section>
-
-<section><title>Datatypes</title>
-
-<para>
-All data in variables and files has a data type.  So
-far, we've seen two types:
-
-<itemizedlist>
-<listitem><para>string - this is a built-in type for storing strings of text in
-memory, much like in other programming languages</para></listitem>
-<listitem><para>messagefile - this is a user-defined type used to mark
-files as containing messages</para></listitem>
-</itemizedlist>
-</para>
-
-<para>
-SwiftScript has the additional built-in types:
-<firstterm>boolean</firstterm>, <firstterm>integer</firstterm> and
-<firstterm>float</firstterm> that function much like their counterparts
-in other programming languages.
-</para>
-
-<para>It is also possible to create user defined types with more
-structure, for example:
-<programlisting>
-type details {
-    string name;
-    int pies;
-}
-</programlisting>
-Each element of the structured type can be accessed using a . like this:
-<programlisting>
-person.name = "john";
-</programlisting>
-</para>
-
-<para>
-The following complete program, types.swift, outputs a greeting using a user-defined
-structure type to hold parameters for the message:
-
-<programlisting>
-type messagefile {} 
-
-type details {
-    string name;
-    int pies;
-}
-
-(messagefile t) greeting (details d) {   
-    app {
-        echo "Hello. Your name is" d.name "and you have eaten" d.pies "pies." stdout=@filename(t);
-    }
-}
-
-details person;
-
-person.name = "John";
-person.pies = 3;
-
-messagefile outfile <"q15.txt">;
-
-outfile = greeting(person);
-</programlisting>
-</para>
-
-<para>
-Structured types can be composed of marker types for files. See the later
-section on mappers for more information about this.
-</para>
-
-</section>
-
-<section><title>Arrays</title>
-<para>We can define arrays using the [] suffix in a variable declaration:
-<programlisting>
-messagefile m[];
-</programlisting>
-This program, q5.swift, will declare an array of message files.
-</para>
-
-<programlisting>
-type messagefile;
-
-(messagefile t) greeting (string s[]) {
-    app {
-        echo s[0] s[1] s[2] stdout=@filename(t);
-    }
-}
-
-messagefile outfile <"q5out.txt">;
-
-string words[] = ["how","are","you"];
-
-outfile = greeting(words);
-
-</programlisting>
-
-<para>Observe that the type of the parameter to greeting is now an
-array of strings, 'string s[]', instead of a single string, 'string s',
-that elements of the array can be referenced numerically, for example
-s[0], and that the array is initialised using an array literal,
-["how","are","you"].</para>
-
-</section>
-
-<section><title>Mappers</title>
-
-<para>A significant difference between SwiftScript and other languages is
-that data can be referred to on disk through variables in a very
-similar fashion to data in memory.  For example, in the above
-examples we have seen a variable definition like this:</para>
-
-<programlisting>
-messagefile outfile <"q13greeting.txt">;
-</programlisting>
-
-<para>This means that 'outfile' is a dataset variable, which is
-mapped to a file on disk called 'q13greeting.txt'. This variable
-can be assigned to using = in a similar fashion to an in-memory
-variable.  We can say that 'outfile' is mapped onto the disk file
-'q13greeting.txt' by a <firstterm>mapper</firstterm>.
-</para>
-
-<para>There are various ways of mapping in SwiftScript. Two forms have already
-been seen in this tutorial. Later exercises will introduce more forms.
-</para>
-
-<para>The two forms of mapping seen so far are:</para>
-
-<itemizedlist>
-<para>
-simple named mapping - the name of the file that a variable is
-mapped to is explicitly listed. Like this:
-<programlisting>
-messagefile outfile <"greeting.txt">;
-</programlisting>
-
-This is useful when you want to explicitly name input and output
-files for your program. For example, 'outfile' in exercise HELLOWORLD.
-
-</para>
-
-<para>
-anonymous mapping - no name is specified in the source code.
-A name is automatically generated for the file. This is useful
-for intermediate files that are only referenced through SwiftScript,
-such as 'outfile' in exercise ANONYMOUSFILE. A variable declaration
-is mapped anonymously by omitting any mapper definition, like this:
-
-<programlisting>
-messagefile outfile;
-</programlisting>
-
-</para>
-
-</itemizedlist>
-
-<para>Later exercises will introduce other ways of mapping from
-disk files to SwiftScript variables.</para>
-
-<para>TODO: introduce @v syntax.</para>
-
-<section><title>The regexp mapper</title>
-<para>In this exercise, we introduce the <firstterm>regexp mapper</firstterm>.
-This mapper transforms a string expression using a regular expression,
-and uses the result of that transformation as the filename to map.</para>
-<para>
-<filename>regexp.swift</filename> demonstrates the use of this by placing output into a file that
-is based on the name of the input file: our input file is mapped
-to the inputfile variable using the simple named mapper, and then
-we use the regular expression mapper to map the output file. Then we
-use the countwords() procedure to count the words in the input file and
-store the result in the output file. In order for the countwords() procedure 
-to work correctly, add the wc utility (usually found in /usr/bin/wc) to tc.data.
-</para>
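-<para>
-Before wiring wc into Swift, it can help to check what countwords will
-compute: wc -w simply counts whitespace-separated words (a command-line
-sketch, not Swift itself):
-</para>

```shell
# wc -w reports the number of whitespace-separated words in its input
printf 'how are you\n' > q16.txt
wc -w < q16.txt
```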
-
-<para>
-The important bit of <filename>regexp.swift</filename> is:
-<programlisting>
-messagefile inputfile <"q16.txt">;
-
-countfile c <regexp_mapper;
-             source=@inputfile,
-             match="(.*)txt",
-             transform="\\1count"
-            >;
-</programlisting>
-</para>
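-<para>
-To see what the regexp_mapper computes, the same match/transform pair can
-be applied by hand with sed (a sketch of the filename mapping only, not of
-Swift itself):
-</para>

```shell
# Apply the mapper's regular expression by hand: (.*)txt -> \1count
in="q16.txt"
out=$(printf '%s' "$in" | sed 's/\(.*\)txt/\1count/')
echo "$out"   # q16.count
```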
-</section>
-
-<section><title>fixed_array_mapper</title>
-<para>
-The <firstterm>fixed array mapper</firstterm> maps a list of files into
-an array - each element of the array is mapped to one file in the
-specified list of names. See <filename>fixedarray.swift</filename>.
-</para>
-<programlisting>
-string inputNames = "one.txt two.txt three.txt";
-string outputNames = "one.count two.count three.count";
-
-messagefile inputfiles[] <fixed_array_mapper; files=inputNames>;
-countfile outputfiles[] <fixed_array_mapper; files=outputNames>;
-
-outputfiles[0] = countwords(inputfiles[0]);
-outputfiles[1] = countwords(inputfiles[1]);
-outputfiles[2] = countwords(inputfiles[2]);
-</programlisting>
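-<para>
-The mapper splits the space-separated list of names into one array element
-per file; ordinary shell word-splitting illustrates the idea (a sketch,
-not part of Swift):
-</para>

```shell
# Splitting "one.txt two.txt three.txt" yields three array elements
inputNames="one.txt two.txt three.txt"
set -- $inputNames   # unquoted expansion splits on whitespace
echo "$#"            # 3
echo "$1"            # one.txt
```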
-</section>
-
-</section>
-
-<section><title>foreach</title>
-<para>SwiftScript provides a control structure, foreach, to operate
-on each element of an array.</para>
-<para>In this example, we will run the previous word counting example
-over each file in an array without having to explicitly list the
-array elements. The source code for this example is in 
-<filename>foreach.swift</filename>. The three input files
-(<filename>one.txt</filename>, <filename>two.txt</filename> and
-<filename>three.txt</filename>) are supplied. After
-you have run the workflow, you should see that there are three output
-files (<filename>one.count</filename>, <filename>two.count</filename>
-and <filename>three.count</filename>) each containing the word
-count for the corresponding input file. We combine the use of the
-fixed_array_mapper and the regexp_mapper.</para>
-<programlisting>
-string inputNames = "one.txt two.txt three.txt";
-
-messagefile inputfiles[] <fixed_array_mapper; files=inputNames>;
-
-
-foreach f in inputfiles {
-  countfile c <regexp_mapper;
-               source=@f,
-               match="(.*)txt",
-               transform="\\1count">;
-  c = countwords(f);
-}
-</programlisting>
-
-</section>
-
-<section><title>If</title>
-<para>
-Decisions can be made using 'if', like this:
-</para>
-
-<programlisting>
-if(morning) {
-  outfile = greeting("good morning");
-} else {
-  outfile = greeting("good afternoon");
-}
-</programlisting>
-
-<para>
- <filename>if.swift</filename> contains a simple example of 
-this. Compile and run <filename>if.swift</filename> and see that it
-outputs 'good morning'. Changing the 'morning'
-variable from true to false will cause the program to output 'good
-afternoon'.
-</para>
-</section>
-
-<section id="tutorial.iterate"><title>Sequential iteration</title>
-<para>A development version of Swift after 0.2 (revision 1230) introduces
-a sequential iteration construct.</para>
-<para>
-The following example demonstrates a simple application: each step of the
-iteration is a string representation of the byte count of the previous
-step's output, with iteration terminating when the byte count reaches zero.
-</para>
-
-<para>
-Here's the program:
-</para>
-<programlisting>
-
-type counterfile;
-
-(counterfile t) echo(string m) { 
-  app {
-    echo m stdout=@filename(t);
-  }
-}
-
-(counterfile t) countstep(counterfile i) {
-  app {
-    wcl @filename(i) @filename(t);
-  }
-}
-
-counterfile a[]  <simple_mapper;prefix="foldout">;
-
-a[0] = echo("793578934574893");
-
-iterate v {
-  a[v+1] = countstep(a[v]);
-  trace("extract int value ", @extractint(a[v+1]));
-} until (@extractint(a[v+1]) <= 1);
-
-</programlisting>
-
-<para>
-<command>echo</command> is the standard unix echo.</para>
-
-<para> <command>wcl</command>
-is our application code - it counts the number of bytes in the one file
-and writes that count out to another, like this:
-</para>
-<screen>
-$ <userinput>cat ../wcl</userinput>
-#!/bin/bash
-echo -n $(wc -c < $1) > $2
-
-$ <userinput>echo -n hello > a</userinput>
-$ <userinput>wcl a b</userinput>
-$ <userinput>cat b</userinput>
-5
-</screen>
-
-<para>Install the above wcl script somewhere and add a transformation catalog
-entry for it. Then run the example program like this:
-</para>
-
-<screen>
-$ <userinput>swift iterate.swift</userinput>
-Swift svn swift-r3334 cog-r2752
-
-RunID: 20100526-2259-gtlz8zf4
-Progress:
-SwiftScript trace: extract int value , 16.0
-SwiftScript trace: extract int value , 2.0
-SwiftScript trace: extract int value , 1.0
-Final status:  Finished successfully:4
-
-$ <userinput>ls foldout*</userinput>
-foldout0000  foldout0001  foldout0002  foldout0003
-</screen>
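-<para>
-The trace values (16, 2, 1) can be reproduced by hand with plain shell
-commands, mimicking what each iteration of the loop does:
-</para>

```shell
# Step 0: echo writes the seed plus a newline: 15 digits + 1 = 16 bytes
echo "793578934574893" > foldout0000
# Each later step stores the byte count of the previous file, with no newline
echo -n $(wc -c < foldout0000) > foldout0001   # "16" -> 2 bytes
echo -n $(wc -c < foldout0001) > foldout0002   # "2"  -> 1 byte
wc -c < foldout0002
```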
-
-</section>
-
-</section>
-<section><title>Runtime features</title>
-
-<section><title>Visualising the workflow as a graph</title>
-
-<para>
-When running a workflow, it's possible to generate a provenance graph at the
-same time:
-<screen>
-$ <userinput>swift -pgraph graph.dot first.swift</userinput>
-$ <userinput>dot -ograph.png -Tpng graph.dot</userinput>
-</screen>
-graph.png can then be viewed using your favourite image viewer.
-Dot is a utility for drawing directed graphs. It is part of the Graphviz
-package and must be installed separately. More information about Graphviz
-can be found at http://www.graphviz.org.
-</para>
-</section>
-
-<section><title>Running on a remote site</title>
-
-<para>As configured by default, all jobs are run locally. In the previous
-examples, we've invoked 'echo' and 'tr' executables from our
-SwiftScript program. These have been run on the local system
-(the same computer on which you ran 'swift'). We can also make our
-computations run on a remote resource.</para>
-
-<para>WARNING: This example is necessarily more vague than previous examples,
-because it requires access to remote resources. You should ensure that you
-can submit a job using the globus-job-run (or globusrun-ws?) command(s).
-</para>
-
-<para>We do not need to modify any SwiftScript code to run on another resource.
-Instead, we must modify another catalog, the 'site catalog'. This catalog
-provides details of the location that applications will be run, with the
-default settings referring to the local machine. We will modify it to
-refer to a remote resource - the UC Teraport cluster. If you are not a
-UC Teraport user, you should use details of a different resource that
-you do have access to.
-</para>
-
-<para>The site catalog is located in etc/sites.xml and is a relatively
-straightforward XML format file. We must modify each of the following
-three settings: gridftp (which indicates how and where data can be
-transferred to the remote resource), jobmanager (which indicates how
-applications can be run on the remote resource) and workdirectory
-(which indicates where working storage can be found on the
-remote resource).</para>
-</section>
-
-<section><title>Writing a mapper</title>
-
-<para>
-This section will introduce writing a custom mapper so that Swift is able
-to access data files laid out in application-specific ways.
-</para>
-<para>
-An application-specific mapper must take the form of a Java class
-that implements the 
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/Mapper.html">Mapper</ulink> interface.
-</para>
-<para>
-Usually you don't need to implement this interface directly, because
-Swift provides a number of more concrete classes with some functionality
-already implemented.
-</para>
-<para>The hierarchy of helper classes is:</para>
-
-<para>
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/Mapper.html">Mapper</ulink>
-- This is the abstract interface for mappers in Swift. You
-must implement methods to provide access to mapper properties, to map
-from a SwiftScript dataset path (such as foo[1].bar) to a file name,
-to check whether a file exists. None of the default Swift mappers
-implement this interface directly - instead they use one of the
-following helper classes.</para>
-
-<para>
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/AbstractMapper.html">AbstractMapper</ulink>
-- This provides helper methods to manage mapper 
-properties and to handle existence checking. Examples of mappers which
-use this class are:
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.array_mapper">array_mapper</ulink>,
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.csv_mapper">csv_mapper</ulink>,
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.fixed_array_mapper">fixed_array_mapper</ulink>,
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.regexp_mapper">regexp_mapper</ulink> and
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.single_file_mapper">single file mapper</ulink>
-.</para>
-
-<para>
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/file/AbstractFileMapper.html">AbstractFileMapper</ulink>
- - This provides a helper class for mappers
-which select files from a directory listing.
-It is necessary to write some helper methods that are different from
-the above mapper methods. Examples of mappers which use this class
-are:
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.simple_mapper">simple_mapper</ulink>,
-<ulink url="http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.filesys_mapper">filesys_mapper</ulink> and the (undocumented)
-StructuredRegularExpressionMapper.
-</para>
-
-<para>
-In general, to write a mapper, choose either the AbstractMapper or the
-AbstractFileMapper and extend those. If your mapper will generally
-select the files it returns based on a directory listing and will
-convert paths to filenames using some regular conversion (for example, in
-the way that simple_mapper maps files in a directory that match a
-particular pattern), then you should probably use the AbstractFileMapper.
-If your mapper will produce a list of files in some other way (for example,
-in the way that csv_mapper maps based on filenames given in a CSV
-file rather than looking at which files are in a directory), then you
-should probably use the AbstractMapper.
-</para>
-
-<section><title>Writing a very basic mapper</title>
-<para>In this section, we will write a very basic (almost useless)
-mapper that will map a SwiftScript dataset into a hardcoded file
-called <filename>myfile.txt</filename>, like this:
-
-</para>
-<screen>
-
-    Swift variable                            Filename
-
-      var   <----------------------------->    myfile.txt
-
-</screen>
-
-<para>
-We should be able to use the mapper we write in a SwiftScript program
-like this:
-</para>
-
-<programlisting>
-type file;
-
-file f <my_first_mapper>;
-</programlisting>
-
-<para>
-First we must choose a base class - AbstractMapper or AbstractFileMapper.
-We aren't going to use a directory listing to decide on our mapping
-- we are getting the mapping from some other source (in fact, it
-will be hard coded). So we will use AbstractMapper.
-</para>
-
-<para>
-So now onto the source code. We must define a subclass of
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/AbstractMapper.html">AbstractMapper</ulink> and implement several
-mapper methods: isStatic, existing, and map. These methods are documented
-in the javadoc for the
-<ulink url="http://www.ci.uchicago.edu/swift/javadoc/vdsk/org/griphyn/vdl/mapping/Mapper.html">Mapper</ulink>
-interface.
-</para>
-
-<para>
-Here is the code implementing this mapper. Put this in your source 
-<filename>vdsk</filename> directory, make a directory
-<filename>src/tutorial/</filename> and put this
-file in <filename>src/tutorial/MyFirstMapper.java</filename>
-</para>
-<programlisting>
-package tutorial;
-
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.Collections;
-
-import org.griphyn.vdl.mapping.AbsFile;
-import org.griphyn.vdl.mapping.AbstractMapper;
-import org.griphyn.vdl.mapping.Path;
-import org.griphyn.vdl.mapping.PhysicalFormat;
-
-public class MyFirstMapper extends AbstractMapper {
-
-  AbsFile myfile = new AbsFile("myfile.txt");
-
-  public boolean isStatic() {
-    return false;
-  }
-
-  public Collection existing() {
-    if (myfile.exists())
-      return Arrays.asList(new Path[] {Path.EMPTY_PATH});
-    else
-      return Collections.EMPTY_LIST;
-  }
-
-  public PhysicalFormat map(Path p) {
-    if(p.equals(Path.EMPTY_PATH))
-      return myfile;
-    else
-      return null;
-  }
-}
-
-</programlisting>
-
-<para>Now we need to inform the Swift engine about the existence of this
-mapper. We do that by editing the MapperFactory class definition, in
-<filename>src/org/griphyn/vdl/mapping/MapperFactory.java</filename> and
-adding a registerMapper call alongside the existing registerMapper calls,
-like this:
-</para>
-<programlisting>
-registerMapper("my_first_mapper", tutorial.MyFirstMapper.class);
-</programlisting>
-
-<para>The first parameter is the name of the mapper that will be used
-in a SwiftScript program. The second parameter is the new Mapper class
-that we just wrote.
-</para>
-
-<para>
-Now rebuild Swift using the 'ant redist' target.
-</para>
-
-<para>
-This new Swift build will be aware of your new mapper. We can test it out
-with a hello world program:
-</para>
-<programlisting>
-type messagefile;
-
-(messagefile t) greeting() {
-    app {
-        echo "hello" stdout=@filename(t);
-    }
-}
-
-messagefile outfile <my_first_mapper>;
-
-outfile = greeting();
-</programlisting>
-<para>Run this program, and hopefully you will find the "hello" string has
-been output into the hard coded output file <filename>myfile.txt</filename>:
-</para>
-<screen>
-$ <userinput>cat myfile.txt</userinput>
-hello
-</screen>
-
-<para>So that's a first very simple mapper implemented. Compare the
-source code to the single_file_mapper in
-<ulink url="http://www.ci.uchicago.edu/trac/swift/browser/trunk/src/org/griphyn/vdl/mapping/file/SingleFileMapper.java">SingleFileMapper.java</ulink>. There is
-not much more code to the single_file_mapper - mostly code to deal
-with the file parameter.
-</para>
-
-</section>
-
-<!--
-<section><title>a mapper implementing the AbstractMapper class</title>
-<para>
-In this section, we'll develop a mapper that maps SwiftScript structures
-to a set of files, with a basename specified as a mapper parameter and
-an extension corresponding to the SwiftScript structure name.
-</para>
-<para>
-To illustrate, consider the following code fragment.
-</para>
-
-<programlisting>
-struct volume {
-    file img;
-    file hdr;
-}
-
-volume v <tutorial_extension_mapper; base="subject1">;
-</programlisting>
-
-<para>
-Our mapper will map the volume variable v into two files, called
-"subject1.img" and "subject1.hdr" corresponding to the two elements in
-the structure.
-</para>
-
-<para>
-We can implement the mapper in several steps. First we must write a Java
-class implementing the mapper. Then we must tell Swift about this new mapper.
-</para>
-
-<para>
-Here is the Java source code for our mapper. Read it and then we will go
-through the various pieces.
-</para>
-
-<programlisting>
-
-package org.griphyn.vdl.mapping.file;
-
-import java.io.File;
-
-import java.util.ArrayList;
-import java.util.Collection;
-
-import org.griphyn.vdl.mapping.AbstractMapper;
-import org.griphyn.vdl.mapping.MappingParam;
-import org.griphyn.vdl.mapping.Path;
-
-public class TutorialExtensionMapper extends AbstractMapper
-{
-
-    /** defines the single parameter to this mapper */
-    public static final MappingParam PARAM_BASE = new MappingParam("base");
-
-    public String map(Path p) {
-        String basename = PARAM_BASE.getStringValue(this);
-        String filename = basename + "." + p;
-        System.err.println("this is the tutorial mapper, mapping path "+p+" to filename "+filename);
-        return filename;
-    }
-
-    public boolean exists(Path p) {
-        String fn = map(p);
-        File f = new File(fn);
-        return f.exists();
-    }
-
-    public Collection existing() {
-        return new ArrayList();
-    }
-
-
-    public boolean isStatic() {
-        return true;
-    }
-}
-
-
-</programlisting>
-
-<para>
-Our mapper class, TutorialExtensionMapper, must implement the Mapper interface.
-We do this by extending the AbstractMapper class, which provides
-implementations for some of the Mapper methods
-whilst leaving others to be implemented by the specific
-mapper.
-</para>
-
-<para>
-There is helper code to handle most tasks related to mapper parameters.
-We need to create a MappingParam object for our single parameter, 'base'.
-If we wanted more than one parameter, we would create a MappingParam
-object for each parameter. We can then use the mapper parameter object
-later on in the code to retrieve the value of the parameter as specified
-in the SwiftScript mapper description.
-</para>
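-<para>
-If our mapper took a second parameter - say, a filename suffix - we would
-declare one more MappingParam and read it the same way. The following
-hypothetical fragment only sketches the pattern; it is not part of the
-tutorial mapper:
-</para>
-<programlisting>
-public static final MappingParam PARAM_BASE = new MappingParam("base");
-public static final MappingParam PARAM_SUFFIX = new MappingParam("suffix");
-
-public String map(Path p) {
-    String basename = PARAM_BASE.getStringValue(this);
-    String suffix = PARAM_SUFFIX.getStringValue(this);
-    return basename + "." + p + suffix;
-}
-</programlisting>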
-
-<para>
-The core mapping functionality appears in the map method. This method will
-be called by the Swift engine whenever it needs to translate a SwiftScript
-variable into a filename. It is passed a Path object, which represents
-the path from the top level to the part that we want the filename for.
-So when Swift wants the filename for v.img in the above
-usage example, it will ask the mapper for v to map the path "img" to a
-filename. In our example, the mapping consists of taking the 'base'
-parameter and combining it with the path to give a filename.
-</para>
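-<para>
-Concretely, for the declaration above, Swift calls map once for each
-structure member that it needs a filename for:
-</para>
-<programlisting>
-v.img : map("img") returns "subject1.img"
-v.hdr : map("hdr") returns "subject1.hdr"
-</programlisting>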
-
-<para>
-A few other methods need to be implemented.
-</para>
-
-<para> exists() takes a Path
-and returns true if the file backing that path exists. It is implemented
-in a straightforward fashion.
-</para>
-
-<para>existing() returns a Collection of Paths, listing the paths that
-already exist. In the tutorial, we cheat and return an empty collection.
-</para>
-
-<para>isStatic() identifies whether mappings from this mapper are
-fixed at creation time, or can change over time. For most simple
-mappers, it should return true. There is more discussion in the
-Mapper class javadoc.
-</para>
-
-<para>
-We place the above java source file into the Swift source tree
-in a file called src/org/griphyn/vdl/mapping/file/TutorialExtensionMapper.java
-</para>
-
-<para>Now we need to inform the Swift engine about the existence of this
-mapper. We do that by editing the MapperFactory class definition, in
-src/org/griphyn/vdl/mapping/MapperFactory.java and adding a
-registerMapper call alongside the existing registerMapper calls, like this:
-</para>
-<programlisting>
-registerMapper("tutorial_extension_mapper",
-org.griphyn.vdl.mapping.file.TutorialExtensionMapper.class);
-</programlisting>
-
-<para>The first parameter is the name of the mapper that will be used
-in SwiftScript mapper declarations. The second parameter is the class
-that we defined above that implements the mapping behaviour.
-</para>
-
-<para>
-With the mapper source file in place, and MapperFactory updated, we can
-recompile Swift (see instructions elsewhere on the Swift web site) and
-then write SwiftScript programs which use our new mapper.
-</para>
-
-</section>
--->
-</section>
-
-<section> <title>Starting and restarting</title>
-
-<para>
-Now we're going to try out the restart capabilities of Swift. We will make
-a workflow that will deliberately fail, and then we will fix the problem
-so that Swift can continue with the workflow.
-</para>
-
-<para>
-First we have the program in working form, restart.swift.
-</para>
-
-<programlisting>
-type file;
-
-(file f) touch() {
-  app {
-    touch @f;
-  }
-}
-
-(file f) processL(file inp) {
-  app {
-    echo "processL" stdout=@f;
-  }
-}
-
-(file f) processR(file inp) {
-  app {
-    broken "process" stdout=@f;
-  }
-}
-
-(file f) join(file left, file right) {
-  app { 
-    echo "join" @left @right stdout=@f;
-  } 
-}
-
-file f = touch();
-
-file g = processL(f);
-file h = processR(f);
-
-file i = join(g,h);
-</programlisting>
-
-<para>
-We must define some transformation catalog entries:
-</para>
-
-<programlisting>
-localhost	touch	/usr/bin/touch	INSTALLED	INTEL32::LINUX	null
-localhost	broken	/bin/true	INSTALLED	INTEL32::LINUX	null
-</programlisting>
-
-<para>
-Now we can run the program:
-</para>
-
-<programlisting>
-$ swift restart.swift  
-Swift 0.9 swift-r2860 cog-r2388
-
-RunID: 20100526-1119-3kgzzi15
-Progress:
-Final status:  Finished successfully:4
-</programlisting>
-
-<para>
-Four jobs run: touch, echo, broken, and a final echo. (Note that broken
-isn't actually broken yet.)
-</para>
-
-<para>
-Now we will break the 'broken' job and see what happens. Replace the
-definition in tc.data for 'broken' with this:
-</para>
-<programlisting>
-localhost    broken     /bin/false   INSTALLED       INTEL32::LINUX  null
-</programlisting>
-
-<para>Now when we run the workflow, the broken task fails:</para>
-
-<programlisting>
-$ swift restart.swift 
-
-Swift 0.9 swift-r2860 cog-r2388
-
-RunID: 20100526-1121-tssdcljg
-Progress:
-Progress:  Stage in:1  Finished successfully:2
-Execution failed:
-	Exception in broken:
-Arguments: [process]
-Host: localhost
-Directory: restart-20100526-1121-tssdcljg/jobs/1/broken-1i6ufisj
-stderr.txt: 
-stdout.txt: 
-
-</programlisting>
-
-<para>From the output we can see that touch and the first echo completed,
-but then broken failed and so swift did not attempt to execute the
-final echo.</para>
-
-<para>There will be a restart log with the same name as the RunID:
-</para>
-
-<programlisting>
-$ ls *20100526-1121-tssdcljg*rlog
-restart-20100526-1121-tssdcljg.0.rlog
-</programlisting>
-
-<para>This restart log contains enough information for swift to know
-which parts of the workflow were executed successfully.</para>
-
-<para>We can try to rerun it immediately, like this:</para>
-
-<programlisting>
-$ swift -resume restart-20100526-1121-tssdcljg.0.rlog restart.swift 
-
-Swift 0.9 swift-r2860 cog-r2388
-
-RunID: 20100526-1125-7yx0zi6d
-Progress:
-Execution failed:
-	Exception in broken:
-Arguments: [process]
-Host: localhost
-Directory: restart-20100526-1125-7yx0zi6d/jobs/m/broken-msn1gisj
-stderr.txt: 
-stdout.txt: 
-
-----
-
-Caused by:
-	Exit code 1
-
-</programlisting>
-
-<para>
-Swift tried to resume the workflow by executing 'broken' again. It did not
-try to run the touch or first echo jobs, because the restart log says that
-they do not need to be executed again.
-</para>
-
-<para>Broken failed again, leaving the original restart log in place.</para>
-
-<para>Now we will fix the problem with 'broken' by restoring the original
-tc.data line that works.</para>
-
-<para>Remove the existing 'broken' line and replace it with the successful
-tc.data entry above:
-</para>
-
-<programlisting>
-localhost       broken          /bin/true   INSTALLED       INTEL32::LINUX  null
-</programlisting>
-
-<para>
-Now run again:
-</para>
-
-<programlisting>
-$ swift -resume restart-20100526-1121-tssdcljg.0.rlog restart.swift
-
-Swift 0.9 swift-r2860 cog-r2388
-
-RunID: 20100526-1128-a2gfuxhg
-Progress:
-Final status:  Initializing:2  Finished successfully:2
-</programlisting>
-
-<para>Swift tries to run 'broken' again. This time it works, and so
-Swift continues on to execute the final piece of the workflow as if
-nothing had ever gone wrong.
-</para>
-
-</section>
-</section>
-
-<section><title>bits</title>
-<section id="tutorial.named-parameters"><title>Named and optional parameters</title>
-<para>In addition to specifying parameters positionally, parameters can
-be named, and if desired a default value can be specified:
-
-<programlisting>
-(messagefile t) greeting (string s="hello") {
-    app {
-        echo s stdout=@filename(t);
-    }
-}
-</programlisting>
-
-When we invoke the procedure, we can specify values for the parameters
-by name. The following code can be found in q21.swift.
-
-<programlisting>
-french = greeting(s="bonjour");
-</programlisting>
-
-or we can let the default value apply:
-
-<programlisting>
-english = greeting();
-</programlisting>
-
-</para>
-</section>
-</section>
-</article>
-

Deleted: branches/release-0.92/docs/type-hierarchy.fig
===================================================================
--- branches/release-0.92/docs/type-hierarchy.fig	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/type-hierarchy.fig	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,29 +0,0 @@
-#FIG 3.2  Produced by xfig version 3.2.5
-Landscape
-Center
-Inches
-Letter  
-100.00
-Single
--2
-1200 2
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 4950 1350 4050 1800
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 5325 1350 6225 1800
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 3525 2100 3150 2850
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 3975 2100 4275 2775
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 6525 2175 6225 2775
-2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
-	 6900 2175 7125 2775
-4 0 0 50 -1 0 12 0.0000 4 195 960 4575 1200 Swift types\001
-4 0 0 50 -1 0 12 0.0000 4 195 1275 2475 3075 Primitive types\001
-4 0 0 50 -1 0 12 0.0000 4 195 615 2475 3330 (eg int)\001
-4 0 0 50 -1 0 12 0.0000 4 195 1170 3975 3075 Marker types\001
-4 0 0 50 -1 0 12 0.0000 4 195 615 5925 3000 Arrays\001
-4 0 0 50 -1 0 12 0.0000 4 150 900 6900 3000 Structures\001
-4 0 0 50 -1 0 12 0.0000 4 195 1440 6225 2100 Composite types\001
-4 0 0 50 -1 0 12 0.0000 4 195 1155 3300 2025 Atomic types\001

Deleted: branches/release-0.92/docs/type-hierarchy.png
===================================================================
(Binary files differ)

Deleted: branches/release-0.92/docs/userguide-rotated.jpeg
===================================================================
(Binary files differ)

Deleted: branches/release-0.92/docs/userguide-shane.jpeg
===================================================================
(Binary files differ)

Deleted: branches/release-0.92/docs/userguide.xml
===================================================================
--- branches/release-0.92/docs/userguide.xml	2011-05-16 15:43:22 UTC (rev 4477)
+++ branches/release-0.92/docs/userguide.xml	2011-05-16 19:29:01 UTC (rev 4478)
@@ -1,4337 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [] >
-
-<article>
-	<articleinfo revision="0.1">
-		<title>Swift User Guide</title>
-		<subtitle>Source control $LastChangedRevision$</subtitle>
-	</articleinfo>
-
-	<section id="overview">
-		<title>Overview</title>
-		<para>
-This manual provides reference material for Swift: the SwiftScript language
-and the Swift runtime system. For introductory material, consult
-the <ulink url="./tutorial.php">Swift <!-- http://www.ci.uchicago.edu/swift/guides/ -->
-tutorial</ulink>.
-	</para>
-	<para>
-Swift is a data-oriented, coarse-grained scripting language that
-supports dataset typing and mapping, dataset iteration,
-conditional branching, and procedural composition.
-	</para>
-	<para>
-Swift programs (or <firstterm>workflows</firstterm>) are written in
-a language called <firstterm>SwiftScript</firstterm>.
-	</para>
-	<para>
-SwiftScript programs are dataflow oriented - they are primarily
-concerned with processing (possibly large) collections of data files,
-by invoking programs to do that processing. Swift handles execution of
-such programs on remote sites by choosing sites, handling the staging
-of input and output files to and from the chosen sites and remote execution
-of program code.
-	</para>
-	</section>
-	<section id="language">
-		<title>The SwiftScript Language</title>
-<section><title>Language basics</title>
-<para>
-A Swift script describes data, application components, invocations
-of applications components, and the inter-relations (data flow)
-between those invocations.
-</para>
-<para>
-Data is represented in a script by strongly-typed single-assignment
-variables. The syntax superficially resembles C and Java. For example,
-<literal>{</literal> and <literal>}</literal> characters are used to
-enclose blocks of statements.
-</para>
-<para>
-Types in Swift can be <firstterm>atomic</firstterm> or
-<firstterm>composite</firstterm>. An atomic type can be either a
-<firstterm>primitive type</firstterm> or a <firstterm>mapped type</firstterm>.
-Swift provides a fixed set of primitive types, such as
-<firstterm>integer</firstterm> and <firstterm>string</firstterm>. A mapped
-type indicates that the actual data does not reside in CPU addressable
-memory (as it would in conventional programming languages), but in
-POSIX-like files. Composite types are further subdivided into
-<firstterm>structures</firstterm> and <firstterm>arrays</firstterm>.
-Structures are similar in most respects to structure types in other languages.
-Arrays use numeric indices, but are sparse. They can contain elements of
-any type, including other array types, but all elements in an array must be
-of the same type.  We often refer to instances of composites of mapped types
-as <firstterm>datasets</firstterm>.
-</para>
-<imagedata fileref="type-hierarchy.png" />
-<para>
-Mapped type and composite type variable declarations can be annotated with a
-<firstterm>mapping descriptor</firstterm> indicating the file(s) that make up
-that dataset.  For example, the following line declares a variable named
-<literal>photo</literal> with type <literal>image</literal>. It additionally
-declares that the data for this variable is stored in a single file named
-<filename>shane.jpeg</filename>.
-</para>
-
-<programlisting>
-  image photo <"shane.jpeg">;
-</programlisting>
-
-<para>
-Component programs of scripts are declared in an <firstterm>app
-declaration</firstterm>, with the description of the command line syntax
-for that program and a list of input and output data. An <literal>app</literal>
-block describes a functional/dataflow style interface to imperative
-components.
-</para>
-
-<para>
-For example, the following declares a procedure which makes use of
-the <ulink url="http://www.imagemagick.org/"> ImageMagick</ulink>
-<command>convert</command> command to rotate a supplied
-image by a specified angle:
-</para>
-
-<programlisting>
-  app (image output) rotate(image input, int angle) {
-    convert "-rotate" angle @input @output;
-  }
-</programlisting>
-
-<para>
-A procedure is invoked using the familiar syntax:
-</para>
-
-<programlisting>
-  rotated = rotate(photo, 180);
-</programlisting>
-
-<para>
-While this looks like an assignment, the actual unix level execution
-consists of invoking the command line specified in the <literal>app</literal>
-declaration, with variables on the left of the assignment bound to the
-output parameters, and variables to the right of the procedure
-invocation passed as inputs.
-</para>
-
-<para>
-The examples above have used the type <literal>image</literal> without any
-definition of that type. We can declare it as a <firstterm>marker type</firstterm>
-which has no structure exposed to SwiftScript:
-</para>
-
-<programlisting>
-  type image;
-</programlisting>
-
-<para>
-This does not indicate that the data is unstructured; rather, it indicates
-that the structure of the data is not exposed to SwiftScript. Instead,
-SwiftScript will treat variables of this type as individual opaque
-files.
-</para>
-
-<para>
-With mechanisms to declare types, map variables to data files, and
-declare and invoke procedures, we can build a complete (albeit simple)
-script:
-</para>
-
-<programlisting>
- type image;
- image photo <"shane.jpeg">;
- image rotated <"rotated.jpeg">;
-
- app (image output) rotate(image input, int angle) {
-    convert "-rotate" angle @input @output;
- }
-
- rotated = rotate(photo, 180);
-</programlisting>
-
-<para>
-This script can be invoked from the command line:
-</para>
-
-<screen>
-  $ <userinput>ls *.jpeg</userinput>
-  shane.jpeg
-  $ <userinput>swift example.swift</userinput>
-  ...
-  $ <userinput>ls *.jpeg</userinput>
-  shane.jpeg rotated.jpeg
-</screen>
-
-<para>
-This executes a single <literal>convert</literal> command, hiding from the
-user features such as remote multisite execution and fault tolerance that
-will be discussed in a later section.
-</para>
-<figure> <title>shane.jpeg</title>
-<imagedata fileref="userguide-shane.jpeg" />
-</figure>
-<figure> <title>rotated.jpeg</title>
-<imagedata fileref="userguide-rotated.jpeg" />
-</figure>
-</section>
-
-<section><title>Arrays and Parallel Execution</title>
-<para>
-Arrays of values can be declared using the <literal>[]</literal> suffix. An
-array can be mapped to a collection of files, one element per file, by using
-a different form of mapping expression.  For example, the
-<link linkend="mapper.filesys_mapper"><literal>filesys_mapper</literal></link>
-maps all files matching a particular unix glob pattern into an array:
-</para>
-
-<programlisting>
-  file frames[] <filesys_mapper; pattern="*.jpeg">;
-</programlisting>
-
-<para>
-The <firstterm><literal>foreach</literal></firstterm> construct can be used
-to apply the same block of code to each element of an array:
-</para>
-
-<programlisting>
-   foreach f,ix in frames {
-     output[ix] = rotate(f, 180);
-   }
-</programlisting>
-
-<para>
-Sequential iteration can be expressed using the <literal>iterate</literal>
-construct:
-</para>
-
-<programlisting>
-   step[0] = initialCondition();
-   iterate ix {
-     step[ix] = simulate(step[ix-1]);
-   }
-</programlisting>
-
-<para>
-This fragment will initialise the 0-th element of the <literal>step</literal>
-array to some initial condition, and then repeatedly run the
-<literal>simulate</literal> procedure, using each execution's outputs as
-input to the next step.
-</para>
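-<para>
-As written, this fragment has no termination condition. In practice an
-<literal>iterate</literal> loop is bounded with an
-<literal>until</literal> clause; sketched here, assuming a fixed number
-of steps:
-</para>
-<programlisting>
-   step[0] = initialCondition();
-   iterate ix {
-     step[ix] = simulate(step[ix-1]);
-   } until (ix > 10);
-</programlisting>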
-
-</section>
-
-<section><title>Ordering of execution</title>
-
-<para>
-Non-array variables are <firstterm>single-assignment</firstterm>, which
-means that they must be assigned to exactly one value during execution.
-A procedure or expression will be executed when all of its input parameters
-have been assigned values. As a result of such execution, more variables may
-become assigned, possibly allowing further parts of the script to
-execute.
-</para>
-
-<para>
-In this way, scripts are implicitly parallel. Aside from serialisation
-implied by these dataflow dependencies, execution of component programs
-can proceed in parallel.
-</para>
-
-<para>
-In this fragment, execution of procedures <literal>p</literal> and
-<literal>q</literal> can happen in parallel:
-</para>
-
-<programlisting>
-  y=p(x);
-  z=q(x);
-</programlisting>
-
-<para>while in this fragment, execution is serialised by the variable
-<literal>y</literal>, with procedure <literal>p</literal> executing
-before <literal>q</literal>.</para>
-
-<programlisting>
- y=p(x);
- z=q(y);
-</programlisting>
-
-<para>
-Arrays in SwiftScript are <firstterm>monotonic</firstterm> - a
-generalisation of single assignment. Knowledge about the
-content of an array increases during execution, but cannot otherwise
-change. Each element of the array is itself single-assignment or monotonic
-(depending on its type).
-During a run, all values for an array eventually become known, at which
-point that array is regarded as <firstterm>closed</firstterm>.
-</para>
-
-<para>
-Statements which deal with the array as a whole will often wait for the array
-to be closed before executing (thus, a closed array is the equivalent
-of a non-array type being assigned). However, a <literal>foreach</literal>
-statement will apply its body to elements of an array as they become
-known. It will not wait until the array is closed.
-</para>
-
-<para>
-Consider this script:
-</para>
-
-<programlisting>
- file a[];
- file b[];
- foreach v,i in a {
-   b[i] = p(v);
- }
- a[0] = r();
- a[1] = s();
-</programlisting>
-
-<para>
-Initially, the <literal>foreach</literal> statement will have nothing to
-execute, as the array <literal>a</literal> has not been assigned any values.
-The procedures <literal>r</literal> and <literal>s</literal> will execute.
-As soon as either of them is finished, the corresponding invocation of
-procedure <literal>p</literal> will occur. After both <literal>r</literal>
-and <literal>s</literal> have completed, the array <literal>a</literal> will
-be closed since no other statements in the script make an assignment to
-<literal>a</literal>.
-</para>
-
-</section>
-
-<section><title>Compound procedures</title>
-<para>
-As with many other programming languages, procedures consisting of SwiftScript
-code can be defined. These differ from the previously mentioned procedures
-declared with the <literal>app</literal> keyword, as they invoke other
-SwiftScript procedures rather than a component program.
-</para>
-
-<programlisting>
- (file output) process (file input) {
-   file intermediate;
-   intermediate = first(input);
-   output = second(intermediate);
- }
-
- file x <"x.txt">;
- file y <"y.txt">;
- y = process(x);
-</programlisting>
-
-<para>
-This will invoke two procedures, with an intermediate data file named
-anonymously connecting the <literal>first</literal> and
-<literal>second</literal> procedures.
-</para>
-
-<para>
-Ordering of execution is generally determined by execution of
-<literal>app</literal> procedures, not by any containing compound procedures.
-In this code block:
-</para>
-
-<programlisting>
- (file a, file b) A() {
-   a = A1();
-   b = A2();
- }
- file x, y, s, t;
- (x,y) = A();
- s = S(x);
- t = S(y);
-</programlisting>
-
-<para>
-then a valid execution order is: <literal>A1 S(x) A2 S(y)</literal>. The
-compound procedure <literal>A</literal> does not have to have fully completed
-for its return values to be used by subsequent statements.
-</para>
-
-</section>
-
-<section><title>More about types</title>
-<para>
-Each variable and procedure parameter in SwiftScript is strongly typed.
-Types are used to structure data, to aid in debugging and checking program
-correctness and to influence how Swift interacts with data.
-</para>
-
-<para>
-The <literal>image</literal> type declared in previous examples is a
-<firstterm>marker type</firstterm>. Marker types indicate that data for a
-variable is stored in a single file with no further structure exposed at
-the SwiftScript level.
-</para>
-
-<para>
-Arrays have been mentioned above, in the arrays section. A code block
-may be applied to each element of an array using <literal>foreach</literal>,
-or individual elements may be referenced using <literal>[]</literal> notation.
-</para>
-
-<para>There are a number of primitive types:</para>
-
-<table frame="all">
- <tgroup cols="2" align="left" colsep="1" rowsep="1">
-  <thead><row><entry>type</entry><entry>contains</entry></row></thead>
-  <tbody>
-   <row><entry>int</entry><entry>integers</entry></row>
-   <row><entry>string</entry><entry>strings of text</entry></row>
-   <row><entry>float</entry><entry>floating point numbers, that behave the same as Java <literal>double</literal>s</entry></row>
-   <row><entry>boolean</entry><entry>true/false</entry></row>
-  </tbody>
- </tgroup>
-</table>
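-<para>
-For example, primitive values can be declared and given literal values
-directly:
-</para>
-<programlisting>
-  int count = 7;
-  float threshold = 0.5;
-  string message = "hello";
-  boolean verbose = true;
-</programlisting>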
-
-<para>
-Complex types may be defined using the <literal>type</literal> keyword:
-</para>
-<programlisting>
-  type headerfile;
-  type voxelfile;
-  type volume {
-    headerfile h;
-    voxelfile v;
-  }
-</programlisting>
-
-<para>
-Members of a complex type can be accessed using the <literal>.</literal>
-operator:
-</para>
-
-<programlisting>
-  volume brain;
-  o = p(brain.h);
-</programlisting>
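-<para>
-Members may also appear on the left-hand side of an assignment. As a
-sketch (the procedures makeVoxels and makeHeader are hypothetical):
-</para>
-<programlisting>
-  volume brain;
-  brain.v = makeVoxels();
-  brain.h = makeHeader();
-  o = p(brain.h);
-</programlisting>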
-
-<para>
-Sometimes data may be stored in a form that does not fit with Swift's
-file-and-site model; for example, data might be stored in an RDBMS on some
-database server. In that case, a variable can be declared to have
-<firstterm><literal>external</literal></firstterm> type. This indicates that
-Swift should use the variable to determine execution dependency, but should
-not attempt other data management; for example, it will not perform any form
-of data stage-in or stage-out; it will not manage local data caches on sites;
-and it will not enforce component program atomicity on data output. This can
-add substantial responsibility to component programs, in exchange for allowing
-arbitrary data storage and access methods to be plugged in to scripts.
-</para>
-
-<programlisting>
-  type file;
-
-  app (external o) populateDatabase() {
-    populationProgram;
-  }
-
-  app (file o) analyseDatabase(external i) {
-    analysisProgram @o;
-  }
-
-  external database;
-  file result <"results.txt">;
-
-  database = populateDatabase();
-  result = analyseDatabase(database);
-</programlisting>
-
-<para>
-Some external database is represented by the <literal>database</literal>
-variable. The <literal>populateDatabase</literal> procedure populates the
-database with some data, and the <literal>analyseDatabase</literal> procedure
-performs some subsequent analysis on that database. The declaration of
-<literal>database</literal> contains no mapping, and the procedures which
-use <literal>database</literal> do not reference it in any way; the
-description of <literal>database</literal> is entirely outside of the script.
-The single-assignment and execution ordering rules still apply, though:
-<literal>populateDatabase</literal> will always be run before
-<literal>analyseDatabase</literal>.
-</para>
-
-</section>
-
-<section><title>Data model</title>
-<para>Data processed by Swift is strongly typed. It may take the form
-of values in memory or of out-of-core files on disk. Language constructs
-called mappers specify how each piece of data is stored.</para>
-
-<section><title>Mappers</title>
-		<para>
-When a DSHandle represents a data file (or container of datafiles), it is
-associated with a mapper. The mapper is used to
-identify which files belong to that DSHandle.
-		</para>
-		<para>
-A dataset's physical representation is declared by a mapping descriptor,
-which defines how each element in the dataset's logical schema is
-stored in, and fetched from, physical structures such as directories,
-files, and remote servers.
-</para>
-
-<para>
-Mappers are parameterized to take into account properties such as
-varying dataset location.
-In order
-to access a dataset, we need to know three things: its type,
-its mapping, and the value(s) of any parameter(s) associated
-with the mapping descriptor. For example, to describe a dataset
-of type imagefile whose physical
-representation is a file called "file1.bin" located in "/home/yongzh/data/",
-the dataset might be declared as follows:
-</para>
-
-<programlisting>
-imagefile f1<single_file_mapper;file="/home/yongzh/data/file1.bin">;
-</programlisting>
-
-<para>
-The above example declares a dataset called f1, which uses a single
-file mapper to map a file from a specific location.
-</para>
-<para>
-SwiftScript has a simplified syntax for this case, since single_file_mapper
-is frequently used:
-
-<programlisting>
-imagefile f1<"/home/yongzh/data/file1.bin">;
-</programlisting>
-</para>
-
-<para>
-Swift comes with a number of mappers that handle common mapping patterns.
-These are documented in the <link linkend="mappers">mappers section</link>
-of this guide.
-</para>
-
-</section>
-
-		</section>
-		<section>
-			<title>More technical details about SwiftScript</title>
-<para>The syntax of SwiftScript has a superficial resemblance to C and
-Java. For example, { and } characters are used to enclose blocks of
-statements.
-</para>
-			<para>
-A SwiftScript program consists of a number of statements.
-Statements may declare types, procedures and variables, assign values to
-variables, and express operations over arrays.
-			</para>
-
-
-		<section><title>Variables</title>
-<para>Variables in SwiftScript are declared to be of a specific type.
-Assignments to those variables must be data of that type.
-SwiftScript variables are single-assignment - a value may be assigned
-to a variable at most once. This assignment can happen at declaration time
-or later on in execution. When an attempt is made to read from a variable
-that has not yet been assigned, the code performing the read
-is suspended until that variable has been written to. This forms the
-basis for Swift's ability to parallelise execution - all code
-executes in parallel unless shared variables
-force sequencing.</para>
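-<para>
-For example, in the following fragment the calls to <literal>p</literal>
-and <literal>q</literal> share no variables and can run in parallel,
-while <literal>r</literal> is suspended until both <literal>a</literal>
-and <literal>b</literal> have been assigned:
-</para>
-<programlisting>
-  a = p(x);
-  b = q(x);
-  c = r(a, b);
-</programlisting>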
-
-		<section>
-			<title>Variable Declarations</title>
-			<para>
-Variable declaration statements declare new variables. They can
-optionally assign a value to them or map those variables to on-disk files.
-			</para>
-<para>
-Declaration statements have the general form:
-<programlisting>
-  typename variablename (<mapping> | = initialValue ) ;
-</programlisting>
-The format of the mapping expression is defined in the Mappers section.
-initialValue may be either an expression or a procedure call that
-returns a single value.
-</para>
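-<para>
-For example, the two forms of declaration look like this:
-</para>
-<programlisting>
-  file m <"message.txt">;   // declared and mapped to a file
-  int count = 3;            // declared with an initial value
-</programlisting>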
-<para>Variables can also be declared in a multivalued-procedure statement,
-described in another section.</para>
-		</section>
-
-		<section>
-			<title>Assignment Statements</title>
-			<para>
-Assignment statements assign values to previously declared variables.
-Assignments may only be made to variables that have not already been
-assigned. Assignment statements have the general form:
-
-<programlisting>
-  variable = value;
-</programlisting>
-where value can be either an expression or a procedure call that returns
-a single value.
-			</para>
-
-			<para>
-Variables can also be assigned in a multivalued-procedure statement,
-described in another section.
-			</para>
-		</section>
-</section>
-
-<section><title>Procedures</title>
-
-<para>There are two kinds of procedure: atomic procedures, which
-describe how an external program can be executed; and compound
-procedures, which consist of a sequence of SwiftScript statements.
-</para>
-
-			<para>
-A procedure declaration defines the name of a procedure and its
-input and output parameters. SwiftScript procedures can take multiple
-inputs and produce multiple outputs.  Inputs are specified to the right
-of the function name, and outputs are specified to the left. For example:
-
-<programlisting>
-(type3 out1, type4 out2) myproc (type1 in1, type2 in2)
-</programlisting>
-
-The above example declares a procedure called <literal>myproc</literal>, which
-has two inputs <literal>in1</literal> (of type <literal>type1</literal>)
-and <literal>in2</literal> (of type <literal>type2</literal>)
-and two outputs <literal>out1</literal> (of type <literal>type3</literal>)
-and <literal>out2</literal> (of type <literal>type4</literal>).
-			</para>
-
-			<para>
-A procedure input parameter can be an <firstterm>optional
-parameter</firstterm>, in which case it must be declared with a default
-value. When calling a procedure, both positional and named
-parameter passing can be used, provided that all optional
-parameters are declared after the required parameters and every
-optional parameter is bound using keyword parameter passing.
-For example, if <literal>myproc1</literal> is defined as:
-
-<programlisting>
-(binaryfile bf) myproc1 (int i, string s="foo")
-</programlisting>
-
-Then that procedure can be called like this, omitting the optional
-parameter <literal>s</literal>:
-
-<programlisting>
-binaryfile mybf = myproc1(1);
-</programlisting>
-
-or like this, supplying a value for the optional parameter
-<literal>s</literal>:
-
-<programlisting>
-binaryfile mybf = myproc1 (1, s="bar");
-</programlisting>
-
-			</para>
-
-<section id="procedures.atomic"><title>Atomic procedures</title>
-			<para>
-An atomic procedure specifies how to invoke an
-external executable program, and how logical data
-types are mapped to command line arguments.
-			</para>
-
-			<para>
-Atomic procedures are defined with the <literal>app</literal> keyword:
-<programlisting>
-app (binaryfile bf) myproc (int i, string s="foo") {
-	myapp i s @filename(bf);
-}
-</programlisting>
-
-which specifies that <literal>myproc</literal> invokes an executable
-called <literal>myapp</literal>,
-passing the values of <literal>i</literal>, <literal>s</literal>
-and the filename of <literal>bf</literal> as command line arguments.
-			</para>
-</section>
-
-<section id="procedures.compound"><title>Compound procedures</title>
-			<para>
-A compound procedure contains a set of SwiftScript statements:
-
-<programlisting>
-(type2 b) foo_bar (type1 a) {
-	type3 c;
-	c = foo(a);    // c holds the result of foo
-	b = bar(c);    // c is an input to bar
-}
-</programlisting>
-		</para>
-
-		</section>
-</section>
-		<section>
-			<title>Control Constructs</title>
-			<para>
-SwiftScript provides <literal>if</literal>, <literal>switch</literal>,
-<literal>foreach</literal>, and <literal>iterate</literal> constructs,
-with syntax and semantics similar to comparable constructs in
-other high-level languages.
-			</para>
-			<section><title>foreach</title>
-			<para>
-The <literal>foreach</literal> construct is used to apply a block of statements to
-each element in an array. For example:
-
-<programlisting>
-check_order (file a[]) {
-	foreach f in a {
-		compute(f);
-	}
-}
-</programlisting>
-</para>
-<para>
-<literal>foreach</literal> statements have the general form:
-
-<programlisting>
-foreach controlvariable (,index) in expression {
-    statements
-}
-</programlisting>
-
-The block of statements is evaluated once for each element in
-<literal>expression</literal>, which must be an array,
-with <literal>controlvariable</literal> set to the corresponding element
-and <literal>index</literal> (if specified) set to the
-integer position in the array being iterated over.
-
-			</para>
-			</section>
-
-			<section><title>if</title>
-			<para>
-The <literal>if</literal> statement allows one of two blocks of statements to be
-executed, based on a boolean predicate. <literal>if</literal> statements generally
-have the form:
-<programlisting>
-if(predicate) {
-    statements
-} else {
-    statements
-}
-</programlisting>
-
-where <literal>predicate</literal> is a boolean expression.
-			</para>
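-			<para>
-For example, the following sketch (the variable <literal>a</literal> and
-the procedures named here are hypothetical) selects between two procedure
-calls based on a predicate:
-<programlisting>
-if(a == 10) {
-    result = handle_ten(a);
-} else {
-    result = handle_other(a);
-}
-</programlisting>
-			</para>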
-			</section>
-
-			<section><title>switch</title>
-			<para>
-<literal>switch</literal> statements allow one of a selection of blocks to be chosen based on
-the value of a numerical control expression. <literal>switch</literal> statements take the
-general form:
-<programlisting>
-switch(controlExpression) {
-    case n1:
-        statements1
-    case n2:
-        statements2
-    [...]
-    default:
-        statements
-}
-</programlisting>
-The control expression is evaluated, the resulting numerical value is used to
-select a corresponding <literal>case</literal>, and the statements belonging to that
-<literal>case</literal> block
-are evaluated. If no case matches, then the statements belonging to
-the <literal>default</literal> block are evaluated.
-			</para>
-<para>Unlike C or Java switch statements, execution does not fall through to
-subsequent <literal>case</literal> blocks, and no <literal>break</literal>
-statement is necessary at the end of each block.
-</para>
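-<para>
-For example, the following sketch (the variable and procedures named here
-are hypothetical) chooses one block based on the value of
-<literal>grade</literal>:
-<programlisting>
-int grade = 2;
-switch(grade) {
-    case 1:
-        r = low(grade);
-    case 2:
-        r = medium(grade);
-    default:
-        r = other(grade);
-}
-</programlisting>
-</para>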
-			</section>
-
-			<section id="construct.iterate"><title>iterate</title>
-				<para>
-<literal>iterate</literal> expressions allow a block of code to be evaluated repeatedly, with an
-integer parameter sweeping upwards from 0 until a termination condition
-holds.
-				</para>
-				<para>
-The general form is:
-<programlisting>
-iterate var {
-	statements;
-} until (terminationExpression);
-</programlisting>
-with the variable <literal>var</literal> starting at 0 and increasing
-by one in each iteration. That
-variable is in scope in the statements block and when evaluating the
-termination expression.
-				</para>
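-				<para>
-For example, the following sketch (using hypothetical names) fills an
-array with doubling values, stopping once a value exceeds 100:
-<programlisting>
-int a[];
-a[0] = 1;
-iterate i {
-    a[i + 1] = a[i] * 2;
-} until (a[i] > 100);
-</programlisting>
-				</para>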
-			</section>
-		</section>
-	</section>
-
-	<section><title>Operators</title>
-
-<para>The following infix operators are available for use in
-SwiftScript expressions.
-</para>
-<table frame="all">
-<tgroup cols="2" align="left" colsep="1" rowsep="1">
- <thead><row><entry>operator</entry><entry>purpose</entry></row></thead>
- <tbody>
-  <row><entry>+</entry><entry>numeric addition; string concatenation</entry></row>
-  <row><entry>-</entry><entry>numeric subtraction</entry></row>
-  <row><entry>*</entry><entry>numeric multiplication</entry></row>
-  <row><entry>/</entry><entry>floating point division</entry></row>
-  <row><entry>%/</entry><entry>integer division</entry></row>
-  <row><entry>%%</entry><entry>integer remainder of division</entry></row>
-  <row><entry>== !=</entry><entry>equal-to and not-equal-to comparison</entry></row>
-  <row><entry>&lt; &gt; &lt;= &gt;=</entry><entry>numerical ordering</entry></row>
-  <row><entry>&amp;&amp; ||</entry><entry>boolean and, or</entry></row>
-  <row><entry>!</entry><entry>boolean not</entry></row>
- </tbody>
-</tgroup>
-</table>
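-<para>
-For example, the following sketch illustrates several of these operators
-(the expected values are noted in comments):
-<programlisting>
-int q = 10 %/ 3;        // integer division: q is 3
-int r = 10 %% 3;        // integer remainder: r is 1
-string s = "run" + "01";    // string concatenation: s is "run01"
-boolean b = (q == 3) &amp;&amp; (r != 0);   // boolean and: b is true
-</programlisting>
-</para>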
-	</section>
-
-	<section id="globals"><title>Global constants</title>
-		<para>
-At the top level of a SwiftScript program, the <literal>global</literal>
-modifier may be added to a declaration so that the declared variable is
-visible throughout the program, rather than only at the top level. This
-allows global constants (of any type) to be defined. (since Swift 0.10)
-		</para>
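-		<para>
-For example, the following sketch (the names are hypothetical) declares
-two global constants:
-			<programlisting>
-global string run_tag = "experiment-A";
-global int max_count = 5;
-			</programlisting>
-		</para>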
-	</section>
-
-	<section id="imports"><title>Imports</title>
-		<para>
-The <literal>import</literal> directive can be used to import definitions from
-another SwiftScript file. (since Swift 0.10)
-		</para>
-		<para>
-For example, a SwiftScript program might contain this:
-			<programlisting>
-import defs;
-file f;
-			</programlisting>
-which would import the content of <filename>defs.swift</filename> in the
-current directory:
-			<programlisting>
-type file;
-			</programlisting>
-		</para>
-		<para>
-Imported files are read from the current working directory.
-		</para>
-		<para>
-There is no requirement that a module is imported only once. If a module
-is imported multiple times, for example in different files, then Swift will
-only process the imports once.
-		</para>
-		<para>
-Imports may contain anything that is valid in a SwiftScript program,
-including code that causes remote execution.
-		</para>
-	</section>
-
-</section>
-		<section id="mappers">
-		<title>Mappers</title>
-		<para>
-Mappers provide a mechanism to specify the layout of mapped datasets on
-disk. This is needed when Swift must access files to transfer them to
-remote sites for execution or to pass to applications.</para>
-		<para>
-Swift provides a number of mappers that are useful in common cases. This
-section details those standard mappers. For more complex cases, it is
-possible to write application-specific mappers in Java and
-use them within a SwiftScript program.
-		</para>
-
-		<section id="mapper.single_file_mapper"><title>The single file mapper</title>
-
-			<para>
-The <literal>single_file_mapper</literal> maps a single physical file to a dataset.
-			</para>
-<para>
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       f                                 myfile
-
-       f[0]                              INVALID
-
-       f.bar                             INVALID
-
-		</screen>
-</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>file</entry><entry>The location of the physical file including path and file name.</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-<para>Example:
-			<programlisting>
-	file f <single_file_mapper;file="plot_outfile_param">;</programlisting>
-
-There is a simplified syntax for this mapper:
-
-
-			<programlisting>
-	file f <"plot_outfile_param">;</programlisting>
-</para>
-	</section>
-
-	<section id="mapper.simple_mapper"><title>The simple mapper</title>
-<para>The <literal>simple_mapper</literal> maps a file or a list of files
-into an array by prefix, suffix, and pattern.  If more than one file is
-matched, each of the file names will be mapped as a subelement of the dataset.
-</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>location</entry><entry>The directory where the files are located.</entry></row>
-  <row><entry>prefix</entry><entry>The prefix of the files</entry></row>
-  <row><entry>suffix</entry><entry>The suffix of the files, for instance: <literal>".txt"</literal></entry></row>
-  <row><entry>pattern</entry><entry>A UNIX glob style pattern, for instance:
-<literal>"*foo*"</literal> would match all file names that
-contain <literal>foo</literal>. When this mapper is used to specify output
-filenames, <literal>pattern</literal> is ignored.</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-
-
-<para>Examples:</para>
-
-<para>
-		<programlisting>
-	type file;
-	file f <simple_mapper;prefix="foo", suffix=".txt">;
-			</programlisting>
-The above maps all filenames that start with <filename>foo</filename> and
-have an extension <filename>.txt</filename> into file f.
-
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       f                                 foo.txt
-
-		</screen>
-</para>
-
-<para>
-	<programlisting>
-type messagefile;
-
-(messagefile t) greeting(string m) {
-    app {
-        echo m stdout=@filename(t);
-    }
-}
-
-messagefile outfile <simple_mapper;prefix="foo",suffix=".txt">;
-
-outfile = greeting("hi");
-	</programlisting>
-
-This will output the string 'hi' to the file <filename>foo.txt</filename>.
-	</para>
-
-	<para>
-The <literal>simple_mapper</literal> can be used to map arrays. It will map the array index
-into the filename between the prefix and suffix.
-
-<programlisting>
-type messagefile;
-
-(messagefile t) greeting(string m) {
-    app {
-        echo m stdout=@filename(t);
-    }
-}
-
-messagefile outfile[] <simple_mapper;prefix="baz",suffix=".txt">;
-
-outfile[0] = greeting("hello");
-outfile[1] = greeting("middle");
-outfile[2] = greeting("goodbye");
-</programlisting>
-
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       outfile[0]                        baz0000.txt
-       outfile[1]                        baz0001.txt
-       outfile[2]                        baz0002.txt
-
-		</screen>
-
-	</para>
-
-	<para>
-<literal>simple_mapper</literal> can be used to map structures. It will map the name of the
-structure member into the filename, between the prefix and the
-suffix.
-
-	<programlisting>
-type messagefile;
-
-type mystruct {
-  messagefile left;
-  messagefile right;
-};
-
-(messagefile t) greeting(string m) {
-    app {
-        echo m stdout=@filename(t);
-    }
-}
-
-mystruct out <simple_mapper;prefix="qux",suffix=".txt">;
-
-out.left = greeting("hello");
-out.right = greeting("goodbye");
-	</programlisting>
-
-This will output the string "hello" into the file
-<filename>qux.left.txt</filename> and the string "goodbye"
-into the file <filename>qux.right.txt</filename>.
-
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       out.left                          qux.left.txt
-       out.right                         qux.right.txt
-
-		</screen>
-	</para>
-
-	</section>
-
-	<section id="mapper.concurrent_mapper"><title>concurrent mapper</title>
-<para>
-<literal>concurrent_mapper</literal> is almost the same as the simple mapper,
-except that it is used to map output files, and the generated filename
-contains an additional sequence that makes it unique.
-This mapper is the default mapper for variables when no mapper is
-specified.
-</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>location</entry><entry>The directory where the files are located.</entry></row>
-  <row><entry>prefix</entry><entry>The prefix of the files</entry></row>
-  <row><entry>suffix</entry><entry>The suffix of the files, for instance: <literal>".txt"</literal></entry></row>
-  <row><entry>pattern</entry><entry>A UNIX glob style pattern, for instance:
-<literal>"*foo*"</literal> would match all file names that
-contain <literal>foo</literal>. When this mapper is used to specify output
-filenames, <literal>pattern</literal> is ignored.</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-
-	<para>Example:
-		<programlisting>
-	file f1;
-	file f2 <concurrent_mapper;prefix="foo", suffix=".txt">;
-			</programlisting>
-The above example would use the concurrent mapper for both <literal>f1</literal> and
-<literal>f2</literal>, and would
-generate the filename of <literal>f2</literal> with prefix <filename>"foo"</filename> and extension <filename>".txt"</filename>.
-	</para>
-	</section>
-
-	<section id="mapper.filesys_mapper"><title>file system mapper</title>
-
-<para><literal>filesys_mapper</literal> is similar to the simple mapper,
-but maps a file or
-a list of files to an array. Each of the filenames is
-mapped as an element in the array. The order of files in the resulting
-array is not defined.
-	</para>
-
-<para>Note that whether <literal>location</literal> is given as a relative
-path (for example, <literal>location="."</literal>) or as an absolute path
-(for example, <literal>location="/sandbox/..."</literal>) affects how the
-mapped files are staged to remote execution sites.</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>location</entry><entry>The directory where the files are located.</entry></row>
-  <row><entry>prefix</entry><entry>The prefix of the files</entry></row>
-  <row><entry>suffix</entry><entry>The suffix of the files, for instance: <literal>".txt"</literal></entry></row>
-  <row><entry>pattern</entry><entry>A UNIX glob style pattern, for instance:
-<literal>"*foo*"</literal> would match all file names that
-contain <literal>foo</literal>.
-</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-	<para>Example:
-			<programlisting>
-	file texts[] <filesys_mapper;prefix="foo", suffix=".txt">;
-			</programlisting>
-The above example would map all filenames that start with <filename>"foo"</filename>
-and have an extension <filename>".txt"</filename> into the array <literal>texts</literal>.
-For example, if the specified directory contains files: <filename>foo1.txt</filename>, <filename>footest.txt</filename>,
-<filename>foo__1.txt</filename>, then the mapping might be:
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       texts[0]                          footest.txt
-       texts[1]                          foo1.txt
-       texts[2]                          foo__1.txt
-
-		</screen>
-</para>
-	</section>
-
-	<section id="mapper.fixed_array_mapper"><title>fixed array mapper</title>
-<para>The <literal>fixed_array_mapper</literal> maps from a string that
-contains a list of filenames into a file array.</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>files</entry><entry>A string that contains a list of filenames, separated by space, comma or colon</entry></row>
-  </tbody>
- </tgroup>
-</table>
-
-	<para>Example:
-			<programlisting>
-	file texts[] <fixed_array_mapper;files="file1.txt, fileB.txt, file3.txt">;
-			</programlisting>
-would cause a mapping like this:
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       texts[0]                          file1.txt
-       texts[1]                          fileB.txt
-       texts[2]                          file3.txt
-
-		</screen>
-</para>
-	</section>
-
-	<section id="mapper.array_mapper"><title>array mapper</title>
-	<para>The <literal>array_mapper</literal> maps from an array of strings
-into an array of files.</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>files</entry><entry>An array of strings containing one filename per element</entry></row>
-  </tbody>
-</tgroup>
-</table>
-
-	<para> Example:
-		<programlisting>
-string s[] = [ "a.txt", "b.txt", "c.txt" ];
-
-file f[] <array_mapper;files=s>;
-		</programlisting>
-This will establish the mapping:
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       f[0]                              a.txt
-       f[1]                              b.txt
-       f[2]                              c.txt
-
-		</screen>
-
-	</para>
-	</section>
-
-	<section id="mapper.regexp_mapper"><title>regular expression mapper</title>
-<para>The <literal>regexp_mapper</literal> transforms one file name to
-another using regular expression matching.</para>
-
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-    <row><entry>source</entry><entry>The source file name</entry></row>
-    <row><entry>match</entry><entry>Regular expression pattern to match, use
-<literal>()</literal> to match whatever regular expression is inside the
-parentheses, and indicate the start and end of a group; the contents of a
-group can be retrieved with the <literal>\\number</literal> special sequence
-(two backslashes are needed because the backslash is an escape sequence introducer)
-</entry></row>
-    <row><entry>transform</entry><entry>The pattern of the file name to
-transform to, use <literal>\number</literal> to reference the
-group matched.</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-<para>Example:
-	<programlisting>
-  string s = "picture.gif";
-  file f <regexp_mapper;
-    source=s,
-    match="(.*)gif",
-    transform="\\1jpg">; </programlisting>
-
-This example transforms a string ending <literal>gif</literal> into one
-ending <literal>jpg</literal> and maps that to a file.
-
-		<screen>
-    Swift variable ------------------->  Filename
-
-       f                                    picture.jpg
-		</screen>
-
-</para>
-
-</section>
-
-<section><title>csv mapper</title>
-
-<para>
-The <literal>csv_mapper</literal> maps the content of a CSV (comma-separated
-value) file into an array of structures. The dataset type needs to be
-correctly defined to conform to the column names in the
-file. For instance, if the file contains columns:
-<literal>name age GPA</literal> then the type needs to have member elements
-like this:
-<programlisting>
-  type student {
-    file name;
-    file age;
-    file GPA;
-  }
-</programlisting>
-
-If the file does not contain a header with column info, then the column
-names are assumed as <literal>column1</literal>, <literal>column2</literal>,
-etc.
-</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-    <row><entry>file</entry><entry>The name of the CSV file to read mappings from.</entry></row>
-    <row><entry>header</entry><entry>Whether the file has a line describing header info; default is <literal>true</literal></entry></row>
-    <row><entry>skip</entry><entry>The number of lines to skip at the beginning (after header line); default is <literal>0</literal>.</entry></row>
-    <row><entry>hdelim</entry><entry>Header field delimiter; default is the value of the <literal>delim</literal> parameter</entry></row>
-    <row><entry>delim</entry><entry>Content field delimiters; defaults are space, tab and comma</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-	<para>Example:
-			<programlisting>
-	student stus[] <csv_mapper;file="stu_list.txt">;
-			</programlisting>
-The above example would read a list of student info from file
-<filename>"stu_list.txt"</filename> and map them into a student array. By default, the file should contain a header line specifying the names of the columns.
-If <filename>stu_list.txt</filename> contains the following:
-<screen>
-name,age,gpa
-101-name.txt, 101-age.txt, 101-gpa.txt
-name55.txt, age55.txt, gpa55.txt
-q, r, s
-</screen>
-then some of the mappings produced by this example would be:
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       stus[0].name                         101-name.txt
-       stus[0].age                          101-age.txt
-       stus[0].gpa                          101-gpa.txt
-       stus[1].name                         name55.txt
-       stus[1].age                          age55.txt
-       stus[1].gpa                          gpa55.txt
-       stus[2].name                         q
-       stus[2].age                          r
-       stus[2].gpa                          s
-
-		</screen>
-</para>
-	</section>
-
-	<section id="mapper.ext_mapper"><title>external mapper</title>
-		<para>
-The external mapper, <literal>ext</literal>, maps based on the output of a
-supplied Unix executable.
-		</para>
-
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>parameter</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-    <row><entry>exec</entry><entry>The name of the executable
-(relative to the current directory, if an absolute path is not
-specified)</entry></row>
-    <row><entry>*</entry><entry>Other parameters are passed to the
-executable prefixed with a <literal>-</literal> symbol</entry></row>
- </tbody>
-</tgroup>
-</table>
-
-	<para>
-The output of the executable should consist of two columns of data, separated
-by a space. The first column should be the path of the mapped variable,
-in SwiftScript syntax (for example <literal>[2]</literal> means the 2nd element of an
-array) or the symbol <literal>$</literal> to represent the root of the mapped
-variable. The second column should be the filename that the path is mapped to.
-	</para>
-
-	<para> Example:
-With the following in <filename>mapper.sh</filename>,
-			<screen>
-#!/bin/bash
-echo "[2] qux"
-echo "[0] foo"
-echo "[1] bar"
-			</screen>
-
-then a mapping statement:
-
-			<programlisting>
-	student stus[] <ext;exec="mapper.sh">;
-			</programlisting>
-
-would map
-
-		<screen>
-
-    Swift variable ------------------->  Filename
-
-       stus[0]                              foo
-       stus[1]                              bar
-       stus[2]                              qux
-
-		</screen>
-
-		</para>
-
-	</section>
-
-	<section id="mapper.mapping_uris"><title>mapping URIs</title>
-		<para>
-The above mappers may be used to map files based on a URI which can be specified in the filename. This is useful for mapping files on remote machines.
-		</para>
-	<para> Example:
-
-			<programlisting>
-	student st <single_file_mapper;file="gsiftp://communicado.ci.uchicago.edu//tmp/student.txt">;
-			</programlisting>
-
-		</para>
-
-	</section>
-
-	</section>
-	<section id="commands"><title>Commands</title>
-		<para>
-The commands detailed in this section are available in the
-<filename>bin/</filename> directory of a Swift installation and can
-be run from the command line if that directory is placed on the
-PATH.
-		</para>
-	<section id="swiftcommand">
-	<title>swift</title>
-	<para>
-The <command>swift</command> command is the main command line tool
-for executing SwiftScript programs.
-	</para>
-	<section><title>Command-line Syntax</title>
-<para>The <command>swift</command> command is invoked as follows:
-<command>swift [options] SwiftScript-program [SwiftScript-arguments]</command>
-with options taken from the following list, and SwiftScript-arguments
-made available to the SwiftScript program through the
-<link linkend="function.arg">@arg</link> function.
-</para>
-  <variablelist><title>Swift command-line options</title>
-  <varlistentry><term>-help or -h</term>
-    <listitem><para>
-      Display usage information </para></listitem>
-  </varlistentry>
-  <varlistentry><term>-typecheck</term>
-    <listitem><para>
-      Does a typecheck of a SwiftScript program, instead of executing it.</para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-dryrun</term>
-    <listitem><para>
-      Runs the SwiftScript program without submitting any jobs (can be used to get
-      a graph)
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-monitor</term>
-    <listitem><para>
-      Shows a graphical resource monitor
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-resume <literal>file</literal></term>
-    <listitem><para>
-      Resumes the execution using a log file
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-config <literal>file</literal></term>
-    <listitem><para>
-      Indicates the Swift configuration file to be used for this run.
-      Properties in this configuration file will override the default
-      properties. If individual command line arguments are used for
-      properties, they will override the contents of this file.
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-verbose | -v</term>
-    <listitem><para>
-      Increases the level of output that Swift produces on the console
-      to include more detail about the execution
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-debug | -d</term>
-    <listitem><para>
-      Increases the level of output that Swift produces on the console
-      to include lots of detail about the execution
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-logfile <literal>file</literal></term>
-    <listitem><para>
-      Specifies a file where log messages should go. By default
-      Swift uses the name of the program being run and a numeric index
-      (e.g. myworkflow.1.log)
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-runid <literal>identifier</literal></term>
-    <listitem><para>
-      Specifies the run identifier. This must be unique for every invocation
-      and is used in several places to keep files from different executions
-      cleanly separated. By default, a datestamp and random number are used
-      to generate a run identifier. When using this parameter, care should be
-      taken to ensure that the run ID remains unique with respect to all
-      other run IDs that might be used, irrespective of (at least) expected
-      execution sites, program or user.
-    </para></listitem>
-  </varlistentry>
-
-  <varlistentry><term>-tui</term>
-    <listitem><para>
-      Displays an interactive text mode monitor during a run. (since Swift 0.9)
-    </para></listitem>
-  </varlistentry>
-
-</variablelist>
-
-<para>In addition, the following Swift properties can be set on the
-command line:
-
-<itemizedlist>
-<listitem>caching.algorithm</listitem>
-<listitem>clustering.enabled</listitem>
-<listitem>clustering.min.time</listitem>
-<listitem>clustering.queue.delay</listitem>
-<listitem>ip.address</listitem>
-<listitem>kickstart.always.transfer</listitem>
-<listitem>kickstart.enabled</listitem>
-<listitem>lazy.errors</listitem>
-<listitem>pgraph</listitem>
-<listitem>pgraph.graph.options</listitem>
-<listitem>pgraph.node.options</listitem>
-<listitem>sitedir.keep</listitem>
-<listitem>sites.file</listitem>
-<listitem>tc.file</listitem>
-<listitem>tcp.port.range</listitem>
-</itemizedlist>
-</para>
-
-	</section>
-	<section><title>Return codes</title>
-	<para>
-The <command>swift</command> command may exit with the following return codes:
-<table frame="all">
-<tgroup cols='2' align='left' colsep='1' rowsep='1'>
- <thead>
-  <row>
-   <entry>value</entry>
-   <entry>meaning</entry>
-  </row>
- </thead>
- <tbody>
-  <row><entry>0</entry><entry>success</entry></row>
-  <row><entry>1</entry><entry>command line syntax error or missing project name</entry></row>
-  <row><entry>2</entry><entry>error during execution</entry></row>
-  <row><entry>3</entry><entry>error during compilation</entry></row>
-  <row><entry>4</entry><entry>input file does not exist</entry></row>
- </tbody>
- </tgroup>
-</table>
-	</para>
-	</section>
-	<section><title>Environment variables</title>
-		<para>The <command>swift</command> command is influenced by the
-following environment variables:
-		</para>
-		<para>
-<literal>GLOBUS_HOSTNAME</literal>, <literal>GLOBUS_TCP_PORT_RANGE</literal> - set in the environment before running
-Swift. These can be set to inform Swift of the
-configuration of your local firewall. More information can be found in
-<ulink url="http://dev.globus.org/wiki/FirewallHowTo">the Globus firewall
-How-to</ulink>.
-		</para>
-		<para>
-<literal>COG_OPTS</literal> - set in the environment before running Swift. Options set in this
-variable will be passed as parameters to the Java Virtual Machine which
-will run Swift. The parameters vary between virtual machine implementations,
-but can usually be used to alter settings such as maximum heap size.
-Typing 'java -help' will sometimes give a list of options. The Sun Java
-1.4.2 command line options are <ulink url="http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/java.html">documented here</ulink>.
-		</para>
-
-
-	</section>
-
-	</section>
-
-	<section><title>swift-osg-ress-site-catalog</title>
-			<para>
-The <command>swift-osg-ress-site-catalog</command> command generates a site
-catalog based on <ulink url="http://www.opensciencegrid.org/">OSG</ulink>'s
-ReSS information system. (since Swift 0.9)
-			</para>
-			<para>
-Usage: <command>swift-osg-ress-site-catalog [options]</command>
-			</para>
-			<variablelist>
-			<varlistentry><term>--help</term>
-				<listitem>
-<para>Show help message</para>
-				</listitem>
-			</varlistentry>
-
-			<varlistentry><term>--vo=[name]</term>
-				<listitem>
-<para>Set what VO to query ReSS for</para>
-				</listitem>
-			</varlistentry>
-
-			<varlistentry><term>--engage-verified</term>
-				<listitem>
-<para>Only retrieve sites verified by the Engagement VO site
-verification tests. This cannot be used together with <literal>--vo</literal>,
-as the query will only work for sites advertising support for the
-Engagement VO.</para>
-
-<para>This option means information will be retrieved from the
-Engagement collector instead of the top-level ReSS collector.</para>
-
-				</listitem>
-			</varlistentry>
-
-			<varlistentry><term>--out=[filename]</term>
-				<listitem>
-<para>Write to [filename] instead of stdout</para>
-				</listitem>
-			</varlistentry>
-
-			<varlistentry><term>--condor-g</term>
-				<listitem>
-<para>Generates sites files which will submit jobs using a local Condor-G
-installation rather than through direct GRAM2 submission. (since Swift 0.10)</para>
-				</listitem>
-			</varlistentry>
-
-
-
-			</variablelist>
-
-
-		</section>
-
-
-	<section><title>swift-plot-log</title>
-		<para>
-<command>swift-plot-log</command> generates summaries of Swift run log
-files.
-		</para>
-		<para>
-Usage: <command>swift-plot-log [logfile] [targets]</command>
-		</para>
-		<para>
-When no targets are specified, <command>swift-plot-log</command> will
-generate an HTML report for the run. When targets are specified, only
-those named targets will be generated.
-		</para>
-	</section>
-
-	</section>
-	<section id="appmodel"> <title>Executing <literal>app</literal> procedures</title>
-	<para>
-This section describes how Swift executes <literal>app</literal> procedures,
-and requirements on the behaviour of application programs used in
-<literal>app</literal> procedures.
-These requirements exist primarily to ensure
-that Swift can run your application in different places and with the
-various fault tolerance mechanisms in place.
-	</para>
-
-<section><title>Mapping of <literal>app</literal> semantics into unix
-process execution semantics</title>
-
-<para>This section describes how an <literal>app</literal> procedure
-invocation is translated into a (remote) unix process execution. It does not
-describe the mechanisms by which Swift performs that translation; that
-is described in the next section.</para>
-
-<para>In this section, this example SwiftScript program is used
-for reference:</para>
-
-<programlisting>
- type file;
-
- app (file o) count(file i) {
-   wc @i stdout=@o;
- }
-
- file q <"input.txt">;
- file r <"output.txt">;
-
- r = count(q);
-</programlisting>
-
-<para>
-The executable for wc will be looked up in tc.data.
-</para>
-
-<para>
-This unix executable will then be executed in some <firstterm>application
-procedure workspace</firstterm>. This means:
-</para>
-
-<para>
-Each application procedure workspace will have an application workspace
-directory. (TODO: can we collapse the terms //application procedure workspace//
-and //application workspace directory//?)
-</para>
-
-<para>
-This application workspace directory will not be shared with any other
-<firstterm>application procedure execution attempt</firstterm>; all
-application procedure
-execution attempts will run with distinct application procedure
-workspaces. (for the avoidance of doubt:
- If a <firstterm>SwiftScript procedure invocation</firstterm> is subject
-to multiple application procedure execution attempts (due to Swift-level
-restarts, retries or replication) then each of those application procedure
-execution attempts will be made in a different application procedure workspace.
-)</para>
-
-<para>
-The application workspace directory will be a directory on a POSIX
-filesystem accessible throughout the application execution by the
-application executable.
-</para>
-
-<para>
-Before the <firstterm>application executable</firstterm> is executed:
-</para>
-
-<itemizedlist>
-
-<listitem><para>
-The application workspace directory will exist.
-</para></listitem>
-
-<listitem><para>
-The <firstterm>input files</firstterm> will exist inside the application workspace
-directory (but not necessarily as direct children; there may be
-subdirectories within the application workspace directory).
-</para></listitem>
-
-<listitem><para>
-The input files will be those files <firstterm>mapped</firstterm>
-to <firstterm>input parameters</firstterm> of the application procedure
-invocation. (In the example, this means that the file
-<filename>input.txt</filename> will exist in the application workspace
-directory)
-</para></listitem>
-
-<listitem><para>
-For each input file dataset, it will be the case that
-<literal>@filename</literal> or
-<literal>@filenames</literal> invoked with that dataset as a parameter
-will return the path
-relative to the application workspace directory for the file(s) that are
-associated with that dataset. (In the example, that means that <literal>@i</literal> will
-evaluate to the path <filename>input.txt</filename>)
-</para></listitem>
-
-<listitem><para>
-For each <firstterm>file-bound</firstterm> parameter of the Swift procedure invocation, the
-associated files (determined by data type?) will always exist.
-</para></listitem>
-
-<listitem><para>
-The input files must be treated as read only files. This may or may not
-be enforced by unix file system permissions. They may or may not be copies
-of the source file (conversely, they may be links to the actual source file).
-</para></listitem>
-
-</itemizedlist>
-
-<para>
-During/after the <firstterm>application executable execution</firstterm>,
-the following must be true:
-</para>
-
-<itemizedlist>
-<listitem><para>
-If the application executable execution was successful (in the opinion
-of the application executable), then the application executable should
-exit with <firstterm>unix return code</firstterm> <literal>0</literal>;
-if the application executable execution
-was unsuccessful (in the opinion of the application executable), then the
-application executable should exit with unix return code not equal to
-<literal>0</literal>.
-</para></listitem>
-
-<listitem><para>
-Each file mapped from an output parameter of the SwiftScript procedure
-call must exist. Files will be mapped in the same way as for input files.
-</para>
-<para>
-(? Is it defined that output subdirectories will be precreated before
-execution or should app executables expect to make them? That's probably
-determined by the present behaviour of wrapper.sh)
-</para></listitem>
-
-<listitem><para>
-Output produced by running the application executable on some inputs should
-be the same no matter how many times, when or where that application
-executable is run. 'The same' can vary depending on the application (for example,
-it might be acceptable for a PNG->JPEG conversion to
-produce different but similar-looking output JPEGs depending on the
-environment)
-</para></listitem>
-
-</itemizedlist>
-
-<para>
-Things to not assume:
-</para>
-
-<itemizedlist>
-
-<listitem><para>
-anything about the path of the application workspace directory
-</para></listitem>
-
-<listitem><para>
-that either the application workspace directory will be deleted or will
-continue to exist or will remain unmodified after execution has finished
-</para></listitem>
-
-<listitem><para>
-that files can be passed between application procedure invocations
-through any mechanism except through files known to Swift through the
-mapping mechanism (there is an exception here for <literal>external</literal>
-datasets - a separate set of assertions holds for
-<literal>external</literal> datasets)
-</para></listitem>
-
-<listitem><para>
-that application executables will run on any particular site of those
-available, or that any combination of applications will run on the same or
-different sites.
-</para></listitem>
-
-</itemizedlist>
-
-</section>
-
-<section><title>
-How Swift implements the site execution model
-</title>
-
-<para>
-This section describes the implementation of the semantics described
-in the previous section.
-</para>
-
-<para>
-Swift executes application procedures on one or more <firstterm>sites</firstterm>.
-</para>
-
-<para>
-Each site consists of:
-</para>
-
-<itemizedlist>
-<listitem><para>
-worker nodes. There is some <firstterm>execution mechanism</firstterm>
-through which the Swift client side executable can execute its
-<firstterm>wrapper script</firstterm> on those
-worker nodes. This is commonly GRAM or Falkon or coasters.
-</para></listitem>
-
-<listitem><para>
-a site-shared file system. This site shared filesystem is accessible
-through some <firstterm>file transfer mechanism</firstterm> from the
-Swift client side
-executable. This is commonly GridFTP or coasters. This site shared
-filesystem is also accessible through the POSIX file system on all worker
-nodes, mounted at the same location as seen through the file transfer
-mechanism. Swift is configured with the location of some <firstterm>site working
-directory</firstterm> on that site-shared file system.
-</para></listitem>
-</itemizedlist>
-
-<para>
-There is no assumption that the site shared file system for one site is
-accessible from another site.
-</para>
-
-<para>
-For each workflow run, on each site that is used by that run, a <firstterm>run
-directory</firstterm> is created in the site working directory, by the Swift client
-side.
-</para>
-
-<para>
-In that run directory are placed several subdirectories:
-</para>
-
-<itemizedlist>
-<listitem><para>
-<filename>shared/</filename> - site shared files cache
-</para></listitem>
-
-<listitem><para>
-<filename>kickstart/</filename> - when kickstart is used, contains kickstart record files
-for each job that has generated a kickstart record.
-</para></listitem>
-
-
-<listitem><para>
-<filename>info/</filename> - wrapper script log files
-</para></listitem>
-
-<listitem><para>
-<filename>status/</filename> - job status files
-</para></listitem>
-
-<listitem><para>
-<filename>jobs/</filename> - application workspace directories (optionally placed here -
-see below)
-</para></listitem>
-</itemizedlist>
-
-<para>
-Application execution looks like this:
-</para>
-
-<para>
-For each application procedure call:
-</para>
-
-<para>
-The Swift client side selects a site; copies the input files for that
-procedure call to the site shared file cache if they are not already in
-the cache, using the file transfer mechanism; and then invokes the wrapper
-script on that site using the execution mechanism.
-</para>
-
-<para>
-The wrapper script creates the application workspace directory; places the
-input files for that job into the application workspace directory using
-either <literal>cp</literal> or <literal>ln -s</literal> (depending on a configuration option); executes the
-application unix executable; copies output files from the application
-workspace directory to the site shared directory using <literal>cp</literal>; creates a
-status file under the <filename>status/</filename> directory; and exits, returning control to
-the Swift client side. Logs created during the execution of the wrapper
-script are stored under the <filename>info/</filename> directory.
-</para>
-
-<para>
-The Swift client side then checks for the presence of a status
-file indicating success, deletes it, and copies files from the site shared
-directory to the appropriate client side location.
-</para>
-
-<para>
-The job directory is created (in the default mode) under the <filename>jobs/</filename>
-directory. However, it can be created under an arbitrary other path, which
-allows it to be created on a different file system (such as a worker node
-local file system in the case that the worker node has a local file
-system).
-</para>
-
-</section>
-<imagedata fileref="swift-site-model.png" />
-	</section>
-
-	<section id="techoverview">
-	<title>Technical overview of the Swift architecture</title>
-	<para>
-This section attempts to provide a technical overview of the Swift
-architecture.
-	</para>
-
-	<section><title>karajan - the core execution engine</title>
-	</section>
-
-	<section><title>Execution layer</title>
-	<para>
-The execution layer causes an application program (in the form of a unix
-executable) to be executed either locally or remotely.
-	</para>
-	<para>
-The two main choices are local unix execution and execution through GRAM.
-Other options are available, and user provided code can also be plugged in.
-	</para>
-	<para>
-The <link linkend="kickstart">kickstart</link> utility can
-be used to capture environmental information at execution time
-to aid in debugging and provenance capture.
-	</para>
-	</section>
-
-	<section><title>SwiftScript language compilation layer</title>
-	<para>
-Step i: parsing of SwiftScript text into an XML intermediate form. The parser
-is written in ANTLR - see resources/VDL.g. The XML Schema Definition (XSD) for
-the intermediate language is in resources/XDTM.xsd.
-	</para>
-	<para>
-Step ii: compilation of the XML intermediate form into a Karajan workflow.
-Karajan.java reads the XML intermediate form and compiles it to the Karajan
-workflow language - for example, expressions are converted from SwiftScript
-syntax into Karajan syntax, and function invocations become Karajan function
-invocations with various modifications to parameters to accommodate return
-parameters and dataset handling.
-	</para>
-	</section>
-
-	<section><title>Swift/karajan library layer</title>
-	<para>
-Some Swift functionality is provided in the form of Karajan libraries
-that are used at runtime by the Karajan workflows that the Swift
-compiler generates.
-	</para>
-	</section>
-
-	</section>
-
-	<section id="extending"><title>Ways in which Swift can be extended</title>
-<para>Swift is extensible in a number of ways. It is possible to add
-mappers to accommodate different filesystem arrangements, site selectors
-to change how Swift decides where to run each job, and job submission
-interfaces to submit jobs through different mechanisms.
-</para>
-<para>A number of mappers are provided as part of the Swift release and
-documented in the <link linkend="mappers">mappers</link> section.
-New mappers can be implemented
-in Java by implementing the org.griphyn.vdl.mapping.Mapper interface. The
-<ulink url="http://www.ci.uchicago.edu/swift/guides/tutorial.php">Swift
-tutorial</ulink> contains a simple example of this.
-</para>
-<para>Swift provides a default site selector, the Adaptive Scheduler.
-New site selectors can be plugged in by implementing the
-org.globus.cog.karajan.scheduler.Scheduler interface and modifying
-libexec/scheduler.xml and etc/karajan.properties to refer to the new
-scheduler.
-</para>
-<para>Execution providers and filesystem providers, which allow Swift
-to execute jobs and to stage files in and out through mechanisms such
-as GRAM and GridFTP, can be implemented as Java CoG kit providers.
-</para>
-	</section>
-
-	<section id="functions"><title>Function reference</title>
-		<para>
-This section details functions that are available for use in the SwiftScript
-language.
-		</para>
-		<section id="function.arg"><title>@arg</title>
-			<para>
-Takes a command line parameter name as a string parameter and an optional
-default value and returns the value of that string parameter from the
-command line. If no default value is specified and the command line parameter
-is missing, an error is generated. If a default value is specified and the
-command line parameter is missing, <literal>@arg</literal> will return the default value.
-			</para>
-			<para>
-Command line parameters recognized by <literal>@arg</literal> begin with exactly one hyphen
-and need to be positioned after the script name.
-			</para>
-
-			<para>For example:</para>
-			<programlisting>
-trace(@arg("myparam"));
-trace(@arg("optionalparam", "defaultvalue"));
-			</programlisting>
-			<screen>
-$ <userinput>swift arg.swift -myparam=hello</userinput>
-Swift v0.3-dev r1674 (modified locally)
-
-RunID: 20080220-1548-ylc4pmda
-SwiftScript trace: defaultvalue
-SwiftScript trace: hello
-			</screen>
-
-		</section>
-
-		<section id="function.extractint"><title>@extractint</title>
-			<para>
-<literal>@extractint(file)</literal> will read the specified file, parse an integer from the
-file contents and return that integer.
-			</para>
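-			<para>
-For example, assuming a file <filename>count.txt</filename> exists and
-contains a single integer, a sketch of usage might look like:
-			</para>
-			<programlisting>
-type file;
-file f <"count.txt">;
-int n = @extractint(f);
-trace(n);
-			</programlisting>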
-		</section>
-
-		<section id="function.filename"><title>@filename</title>
-			<para>
-<literal>@filename(v)</literal> will return a string containing the filename(s) for the file(s)
-mapped to the variable <literal>v</literal>. When more than one filename is returned, the
-filenames will be space separated inside a single string return value.
-			</para>
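-			<para>
-For example, this sketch (assuming a file <filename>data.txt</filename>
-exists) passes the name of a mapped file to <literal>trace</literal>:
-			</para>
-			<programlisting>
-type file;
-file f <"data.txt">;
-trace(@filename(f));
-			</programlisting>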
-		</section>
-		<section id="function.filenames"><title>@filenames</title>
-			<para>
-<literal>@filenames(v)</literal> will return multiple values (!) containing the filename(s) for
-the file(s) mapped to the variable <literal>v</literal>. (compare to
-<link linkend="function.filename">@filename</link>)
-			</para>
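-			<para>
-For example, <literal>@filenames</literal> is typically used inside an
-<literal>app</literal> block to pass each mapped filename as a separate
-command line argument (a sketch; the <literal>process_all</literal>
-executable here is hypothetical and would need a tc.data entry):
-			</para>
-			<programlisting>
-app (file o) process(file frames[]) {
-  process_all @filenames(frames) stdout=@o;
-}
-			</programlisting>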
-		</section>
-		<section id="function.regexp"><title>@regexp</title>
-			<para>
-<literal>@regexp(input,pattern,replacement)</literal> will apply regular expression
-substitution using the <ulink url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html">Java java.util.regexp API</ulink>. For example:
-<programlisting>
-string v =  @regexp("abcdefghi", "c(def)g","monkey");
-</programlisting>
-will assign the value <literal>"abmonkeyhi"</literal> to the variable <literal>v</literal>.
-			</para>
-		</section>
-		<section id="function.strcat"><title>@strcat</title>
-			<para>
-<literal>@strcat(a,b,c,d,...)</literal> will return a string containing all of the strings
-passed as parameters joined into a single string. There may be any number
-of parameters.
-			</para>
-			<para>
-The <literal>+</literal> operator concatenates two strings: <literal>@strcat(a,b)</literal> is the same as <literal>a + b</literal>
-			</para>
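-			<para>
-For example:
-			</para>
-			<programlisting>
-string greeting = @strcat("Hello", " ", "world");
-trace(greeting);
-			</programlisting>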
-		</section>
-		<section id="function.strcut"><title>@strcut</title>
-			<para>
-<literal>@strcut(input,pattern)</literal> will match the regular expression in the pattern
-parameter against the supplied input string and return the section that
-matches the first matching parenthesised group.
-			</para>
-			<para>
-For example:
-			</para>
-
-			<programlisting>
-string t = "my name is John and i like puppies.";
-string name = @strcut(t, "my name is ([^ ]*) ");
-string out = @strcat("Your name is ",name);
-trace(out);
-			</programlisting>
-
-			<para>
-will output the message: <literal>Your name is John</literal>.
-			</para>
-		</section>
-
-		<section id="function.strsplit"><title>@strsplit</title>
-			<para>
-<literal>@strsplit(input,pattern)</literal> will split the input string based on separators
-that match the given pattern and return a string array. (since Swift 0.9)
-			</para>
-			<para>
-Example:
-			</para>
-
-			<programlisting>
-string t = "my name is John and i like puppies.";
-string words[] = @strsplit(t, "\\s");
-foreach word in words {
-	trace(word);
-}
-			</programlisting>
-
-			<para>
-will output one word of the sentence on each line (though
-not necessarily in order, due to the fact that foreach
-iterations execute in parallel).
-			</para>
-		</section>
-
-
-		<section id="function.toint"><title>@toint</title>
-			<para>
-<literal>@toint(input)</literal> will parse its input string into an integer. This can be
-used with <literal>@arg</literal> to pass input parameters to a SwiftScript program as
-integers.
-			</para>
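-			<para>
-For example, this sketch reads a command line parameter
-<literal>-n</literal> (with a default) as an integer:
-			</para>
-			<programlisting>
-int n = @toint(@arg("n", "10"));
-trace(n);
-			</programlisting>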
-		</section>
-	</section>
-
-	<section id="procedures"><title>Built-in procedure reference</title>
-		<para>
-This section details built-in procedures that are available for use in
-the SwiftScript language.
-		</para>
-
-		<section id="procedure.readdata"><title>readData</title>
-			<para>
-<literal>readData</literal> will read data from a specified file.
-			</para>
-			<para>
-The format of the input file is controlled by the type of the return
-value.
-			</para>
-
-			<para>
-For scalar return types, such as int, the specified file should contain
-a single value of that type.
-			</para>
-			<para>
-For arrays of scalars, the specified file should contain one value
-per line.
-			</para>
-			<para>
-For structs of scalars, the file should contain two rows.
-The first row should be structure member names separated by whitespace.
-The second row should be the corresponding values for each structure
-member, separated by whitespace, in the same order as the header row.
-			</para>
-			<para>
-For arrays of structs, the file should contain a heading row listing
-structure member names separated by whitespace. There should be one row
-for each element of the array, with structure member elements listed in
-the same order as the header row and separated by whitespace. (since Swift 0.4)
-			</para>
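-			<para>
-For example, assuming a file <filename>numbers.txt</filename> exists and
-contains one integer per line, an array of scalars can be read like this:
-			</para>
-			<programlisting>
-int numbers[] = readData("numbers.txt");
-			</programlisting>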
-
-		</section>
-		<section id="procedure.readdata2"><title>readData2</title>
-			<para>
-<literal>readData2</literal> will read data from a specified file, like <literal>readData</literal>, but using
-a different file format more closely related to that used by the
-ext mapper.
-			</para>
-			<para>
-Input files should list, one per line, a path into a Swift structure, and
-the value for that position in the structure:
-				<screen>
-rows[0].columns[0] = 0
-rows[0].columns[1] = 2
-rows[0].columns[2] = 4
-rows[1].columns[0] = 1
-rows[1].columns[1] = 3
-rows[1].columns[2] = 5
-				</screen>
-which can be read into a structure defined like this:
-				<programlisting>
-type vector {
-        int columns[];
-}
-
-type matrix {
-        vector rows[];
-}
-
-matrix m;
-
-m = readData2("readData2.in");
-				</programlisting>
-			</para>
-
-			<para>
-(since Swift 0.7)
-			</para>
-		</section>
-		<section id="procedure.trace"><title>trace</title>
-			<para>
-<literal>trace</literal> will log its parameters. By default these will appear on both stdout
-and in the run log file. Some formatting occurs to produce the log message.
-The particular output format should not be relied upon. (since Swift 0.4)
-			</para>
-		</section>
-
-		<section id="procedure.writedata"><title>writeData</title>
-			<para>
-<literal>writeData</literal> will write out data structures in the format
-described for <literal>readData</literal>.
-			</para>
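-			<para>
-For example, a sketch of one possible usage, writing an array of
-integers to a mapped file, one value per line:
-			</para>
-			<programlisting>
-type file;
-int counts[] = [1, 2, 3];
-file out <"counts.txt">;
-out = writeData(counts);
-			</programlisting>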
-		</section>
-	</section>
-
-	<section id="engineconfiguration">
-
-		<title>Swift configuration properties</title>
-
-		<para>
-
-			Various aspects of the behavior of the Swift Engine can be
-			configured through properties. The Swift Engine recognizes a global,
-			per installation properties file which can be found in <filename
-			class="file">etc/swift.properties</filename> in the Swift installation directory and a user
-			properties file which can be created by each user in <filename
-			class="file">~/.swift/swift.properties</filename>. The Swift Engine
-			will first load the global properties file. It will then try to load
-			the user properties file. If a user properties file is found,
-			individual properties explicitly set in that file will override the
-			respective properties in the global properties file. Furthermore,
-			some of the properties can be overridden directly using command line
-			arguments to the <link
-			linkend="swiftcommand"><command>swift</command> command</link>.
-
-		</para>
-
-			<para>
-
-				Swift properties are specified in the following format:
-
-<screen>
-&lt;name&gt;=&lt;value&gt;
-</screen>
-
-				The value can contain variables which will be expanded when the
-				properties file is read. Expansion is performed when the name of
-				the variable is used inside the standard shell dereference
-				construct: <literal>${<varname>name</varname>}</literal>. The following variables
-				can be used in the Swift configuration file:
-
-				<variablelist>
-					<title>Swift Configuration Variables</title>
-
-					<varlistentry>
-						<term>
-							<varname>swift.home</varname>
-						</term>
-						<listitem>
-							<para>
-
-								Points to the Swift installation directory
-								(<filename
-								class="directory"><envar>$SWIFT_HOME</envar></filename>). In general, this should not be set
-as Swift can find its own installation directory, and incorrectly setting it
-may impair the correct functionality of Swift.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<varname>user.name</varname>
-						</term>
-						<listitem>
-							<para>
-
-								The name of the currently logged in user.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<varname>user.home</varname>
-						</term>
-						<listitem>
-							<para>
-
-								The user's home directory.
-
-							</para>
-						</listitem>
-					</varlistentry>
-				</variablelist>
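-
-				For example, a user properties file might set a property
-				using variable expansion (the path here is illustrative):
-
-<screen>
-pgraph=${user.home}/graphs/workflow.dot
-</screen>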
-
-				The following is a list of valid Swift properties:
-
-				<variablelist>
-					<title>Swift Properties</title>
-
-					<varlistentry id="property.caching.algorithm">
-						<term>
-							<property>caching.algorithm</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>LRU</literal>
-							</para>
-
-							<para>
-								Default value: <literal>LRU</literal>
-							</para>
-
-							<para>
-
-								Swift caches files that are staged in on remote
-								resources, and files that are produced remotely
-								by applications, such that they can be re-used
-								if needed without being transferred again.
-								However, the amount of remote file system space
-								to be used for caching can be limited using the
-								<link linkend="profile.swift.storagesize"><property>swift:storagesize</property></link> profile
-								entry in the sites.xml file. Example:
-
-<screen>
-
-&lt;pool handle="example" sysinfo="INTEL32::LINUX"&gt;
-	&lt;gridftp url="gsiftp://example.org" storage="/scratch/swift" major="2" minor="4" patch="3"/&gt;
-	&lt;jobmanager universe="vanilla" url="example.org/jobmanager-pbs" major="2" minor="4" patch="3"/&gt;
-	&lt;workdirectory&gt;/scratch/swift&lt;/workdirectory&gt;
-	&lt;profile namespace="SWIFT" key="storagesize"&gt;20000000&lt;/profile&gt;
-&lt;/pool&gt;
-
-</screen>
-
-
-								The decision of which files to keep in the cache
-								and which files to remove is made considering
-								the value of the
-								<property>caching.algorithm</property> property.
-								Currently, the only available value for this
-								property is <literal>LRU</literal>, which would
-								cause the least recently used files to be
-								deleted first.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry id="property.clustering.enabled">
-						<term>
-							<property>clustering.enabled</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-								Enables <link linkend="clustering">clustering</link>.
-							</para>
-
-						</listitem>
-					</varlistentry>
-
-					<varlistentry id="property.clustering.min.time">
-						<term>
-							<property>clustering.min.time</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter>&lt;int&gt;</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>60</literal>
-							</para>
-
-							<para>
-
-								Indicates the threshold wall time for
-								clustering, in seconds. Jobs that have a
-								wall time smaller than the value of this
-								property will be considered for clustering.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry id="property.clustering.queue.delay">
-						<term>
-							<property>clustering.queue.delay</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter>&lt;int&gt;</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>4</literal>
-							</para>
-
-							<para>
-
-								This property indicates the interval, in
-								seconds, at which the clustering queue is
-								processed.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry id="property.execution.retries">
-						<term>execution.retries</term>
-						<listitem>
-							<para>
-								Valid values: positive integers
-							</para>
-							<para>
-								Default value: 2
-							</para>
-							<para>
-								The number of times a job will be retried if it
-								fails (giving a maximum of 1 +
-								execution.retries attempts at execution)
-							</para>
-						</listitem>
-					</varlistentry>
-
-
-					<varlistentry id="property.foreach.max.threads">
-						<term>foreach.max.threads</term>
-						<listitem>
-							<para>
-								Valid values: positive integers
-							</para>
-							<para>
-								Default value: 1024
-							</para>
-							<para>
-Limits the number of concurrent iterations that each foreach statement
-can have at one time. This conserves memory for swift programs that
-have large numbers of iterations (which would otherwise all be executed
-in parallel). (since Swift 0.9)
-							</para>
-						</listitem>
-					</varlistentry>
-
-
-					<varlistentry>
-						<term>
-							<property>ip.address</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter>&lt;ipaddress&gt;</parameter>
-							</para>
-
-							<para>
-								Default value: N/A
-							</para>
-
-							<para>
-								The Globus GRAM service uses a callback
-								mechanism to send notifications about the status
-								of submitted jobs. The callback mechanism
-								requires that the Swift client be reachable from
-								the hosts the GRAM services are running on.
-								Normally, Swift can detect the correct IP address
-								of the client machine. However, in certain cases
-								(such as the client machine having more than one
-								network interface) the automatic detection
-								mechanism is not reliable. In such cases, the IP
-								address of the Swift client machine can be
-								specified using this property. The value of this
-								property must be a numeric address without quotes.
-							</para>
-							<para>
-								This option is deprecated and the hostname
-								property should be used instead.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>kickstart.always.transfer</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-
-								This property controls when output from
-								Kickstart is transferred back to the submit site,
-								if Kickstart is enabled. When set to
-								<literal>false</literal>, Kickstart output is
-								only transferred for jobs that fail. If set to
-								<literal>true</literal>, Kickstart output is
-								transferred after every job completes or
-								fails.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>kickstart.enabled</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>, <literal>maybe</literal>
-							</para>
-
-							<para>
-								Default value: <literal>maybe</literal>
-							</para>
-
-							<para>
-
-								This option controls
-								when Swift uses <link linkend="kickstart">Kickstart</link>. A value of
-								<literal>false</literal> disables the use of
-								Kickstart, while a value of
-								<literal>true</literal> enables the use of
-								Kickstart, in which case sites specified in the
-								<filename type="file">sites.xml</filename> file
-								must have valid
-								<parameter>gridlaunch</parameter> attributes.
-								The <literal>maybe</literal> value will
-								enable the use of Kickstart only
-								on sites that have the
-								<parameter>gridlaunch</parameter> attribute
-								specified.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>lazy.errors</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-
-								Swift can report application errors in two
-								modes, depending on the value of this property.
-								If set to <constant>false</constant>, Swift will
-								report the first error encountered and
-								immediately stop execution. If set to
-								<constant>true</constant>, Swift will attempt to
-								run as much as possible from a SwiftScript program before
-								stopping execution and reporting all errors
-								encountered.
-							</para>
-							<para>When developing SwiftScript programs, using the
-								default value of <constant>false</constant> can
-								make the program easier to debug. However,
-								in production runs, using <constant>true</constant>
-								will allow more of a SwiftScript program to be run before
-								Swift aborts execution.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>pgraph</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>, <parameter>&lt;file&gt;</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-
-								Swift can generate a
-<ulink url="http://www.graphviz.org/">Graphviz</ulink> file representing
-								the structure of the SwiftScript program it has run. If this
-								property is set to <literal>true</literal>,
-								Swift will save the provenance graph in a file
-								named by concatenating the program name and the
-								instance ID (e.g. <filename
-								class="file">helloworld-ht0adgi315l61.dot</filename>).
-							</para>
-							<para>
-								If set to <literal>false</literal>, no
-								provenance  graph will be generated. If a file
-								name is used, then  the provenance graph will be
-								saved in the specified file.
-							</para>
-							<para>
-								The generated dot file can be rendered
-								into a graphical form using
-								<ulink
-								url="http://www.graphviz.org/">Graphviz</ulink>,
-								for example with a command-line such as:
-							</para>
-							<screen>
-$ <userinput>swift -pgraph graph1.dot q1.swift</userinput>
-$ <userinput>dot -ograph.png -Tpng graph1.dot</userinput>
-							</screen>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>pgraph.graph.options</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><string></parameter>
-							</para>
-
-							<para>
-								Default value: <literal>splines="compound", rankdir="TB"</literal>
-							</para>
-
-							<para>
-
-								This property specifies a <ulink
-								url="http://www.graphviz.org">Graphviz</ulink>
-								specific set of parameters for the graph.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>pgraph.node.options</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><string></parameter>
-							</para>
-
-							<para>
-								Default value: <literal>color="seagreen", style="filled"</literal>
-							</para>
-
-							<para>
-
-								Used to specify a set of <ulink
-								url="http://www.graphviz.org">Graphviz</ulink>
-								specific properties for the nodes in the graph.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>provenance.log</property>
-						</term>
-						<listitem>
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-							<para>
-								This property controls whether the log file will contain provenance information. Enabling this will increase the size of log files, sometimes significantly.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>replication.enabled</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-Enables/disables replication. Replication is used to deal with jobs sitting
-in batch queues for abnormally large amounts of time. If replication is enabled
-and certain conditions are met, Swift creates and submits replicas of jobs, and
-allows multiple instances of a job to compete.
-							</para>
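-							<para>
-								For example, replication could be enabled in
-								a Swift configuration file as follows (the
-								limit shown is purely illustrative):
-							</para>
-<screen>
-replication.enabled=true
-replication.limit=2
-</screen>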
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>replication.limit</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: positive integers
-							</para>
-
-							<para>
-								Default value: 3
-							</para>
-
-							<para>
-The maximum number of replicas that Swift should attempt.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>sitedir.keep</property>
-						</term>
-						<listitem>
-							<para>
-								Valid values: <parameter>true</parameter>, <parameter>false</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-Indicates whether the working directory on the remote site should be
-left intact even when a run completes successfully. This can be
-used to inspect the site working directory for debugging purposes.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>sites.file</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><file></parameter>
-							</para>
-
-							<para>
-								Default value: ${<varname>swift.home</varname>}<literal>/etc/sites.xml</literal>
-							</para>
-
-							<para>
-
-								Points to the location of the site
-								catalog, which contains a list of all sites that
-								Swift should use.
-
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>status.mode</property>
-						</term>
-						<listitem>
-							<para>
-								Valid values: <parameter>files</parameter>, <parameter>provider</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>files</literal>
-							</para>
-
-							<para>
-Controls how Swift will communicate the result code of running user programs
-from workers to the submit side. In <literal>files</literal> mode, a file
-indicating success or failure will be created on the site shared filesystem.
-In <literal>provider</literal> mode, the execution provider job status will
-be used.
-							</para>
-							<para>
-<literal>provider</literal> mode requires the underlying job execution system
-to correctly return exit codes. In at least the cases of GRAM2, and clusters
-used with any provider, exit codes are not returned, and so
-<literal>files</literal> mode must be used in those cases.  Otherwise,
-<literal>provider</literal> mode can be used to reduce the amount of
-filesystem access. (since Swift 0.8)
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>tc.file</property>
-						</term>
-						<listitem>
-							<para>
-								Valid values: <parameter><file></parameter>
-							</para>
-
-							<para>
-								Default value: ${<varname>swift.home</varname>}<literal>/etc/tc.data</literal>
-							</para>
-
-							<para>
-
-								Points to the location of the transformation
-								catalog file which contains information about
-								installed applications. Details about the format
-								of the transformation catalog can be found
-								<ulink
-								url="http://vds.uchicago.edu/vds/doc/userguide/html/H_TransformationCatalog.html">here</ulink>.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-  					<varlistentry>
-						<term>
-							<property>tcp.port.range</property>
-						</term>
-						<listitem>
-							<para>Valid values: <parameter><start></parameter>,<parameter><end></parameter> where start and end are integers</para>
-							<para>Default value: none</para>
-							<para>
-A TCP port range can be specified to restrict the ports on which
-GRAM callback services are started. This is likely needed if your
- submit host is behind a firewall, in which case the firewall
-should be configured to allow incoming connections on ports in
-the range.
-							</para>
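-							<para>
-								For example, to restrict callbacks to an
-								illustrative range of ports opened in the
-								firewall:
-							</para>
-<screen>
-tcp.port.range=50000,50100
-</screen>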
-						</listitem>
-  </varlistentry>
-
-
-					<varlistentry>
-						<term>
-							<property>throttle.file.operations</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><int></parameter>, <parameter>off</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>8</literal>
-							</para>
-
-							<para>
-
-								Limits the total number of concurrent file
-								operations that can happen at any given time.
-								File operations (like transfers) require an
-								exclusive connection to a site. These
-								connections can be expensive to establish. A
-								large number of concurrent file operations may
-								cause Swift to attempt to establish many  such
-								expensive connections to various sites. Limiting
-								the number of concurrent file operations causes
-								Swift to use a small number of cached
-								connections and achieve better overall
-								performance.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>throttle.host.submit</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><int></parameter>, <parameter>off</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>2</literal>
-							</para>
-
-							<para>
-
-								Limits the number of concurrent submissions
-								for any single site that Swift sends jobs to.
-								In other words, it guarantees that no more
-								jobs than the value of this throttle will be
-								in the process of being submitted to any
-								given site at the same time.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry id="property.throttle.score.job.factor">
-						<term>
-							<property>throttle.score.job.factor</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><int></parameter>, <parameter>off</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>4</literal>
-							</para>
-
-							<para>
-								The Swift scheduler has the ability to limit
-								the number of concurrent jobs allowed on a
-								site based on the performance history of that
-								site. Each site is assigned a score (initially
-								1), which can increase or decrease based on
-								whether the site yields successful or faulty
-								job runs. The score for a site can take values
-								in the (0.1, 100) interval. The number of
-								allowed jobs is calculated using the
-								following formula:
-							</para>
-							<para>
-								2 + score*throttle.score.job.factor
-							</para>
-							<para>
-								This means a site will always be allowed
-								at least two concurrent jobs and at most
-								2 + 100*throttle.score.job.factor. With a
-								default of 4 this means at least 2 jobs and
-								at most 402.
-							</para>
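-							<para>
-								As a worked example of the formula above:
-								with the default factor of 4, a site at
-								the initial score of 1 may run
-								2 + 1*4 = 6 concurrent jobs, while a
-								site whose score has risen to 10 may run
-								2 + 10*4 = 42.
-							</para>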
-							<para>
-								This parameter can also be set per site
-								using the jobThrottle profile key in a site
-								catalog entry.
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>throttle.submit</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><int></parameter>, <parameter>off</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>4</literal>
-							</para>
-
-							<para>
-
-								Limits the number of concurrent submissions for
-								a run. This throttle only limits
-								the number of concurrent tasks (jobs) that are
-								being sent to sites, not the total number of
-								concurrent jobs that can be run. The submission
-								stage in GRAM is one of the most CPU expensive
-								stages (due mostly to the mutual authentication
-								and delegation). Having too many  concurrent
-								submissions can overload either or both the
-								submit host CPU and the remote host/head node
-								causing degraded performance.
-
-							</para>
-
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>throttle.transfers</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <parameter><int></parameter>, <parameter>off</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>4</literal>
-							</para>
-
-							<para>
-
-								Limits the total number of concurrent file
-								transfers that can happen at any given time.
-								File transfers consume bandwidth. Too many
-								concurrent transfers can cause the network to be
-								overloaded preventing various other signaling
-								traffic from flowing properly.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>ticker.disable</property>
-						</term>
-						<listitem>
-							<para>
-								Valid values: <parameter>true</parameter>, <parameter>false</parameter>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-When set to true, suppresses the progress ticker that Swift prints
-to the console every few seconds during a run. (since Swift 0.9)
-							</para>
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>wrapper.invocation.mode</property>
-						</term>
-						<listitem>
-							<para>
-Valid values: <parameter>absolute</parameter>, <parameter>relative</parameter>
-							</para>
-							<para>
-Default value: <literal>absolute</literal>
-							</para>
-							<para>
-Determines if Swift remote wrappers will be executed by specifying an
-absolute path, or a path relative to the job initial working directory.
-In most cases, execution will be successful with either option. However,
-some execution sites ignore the specified initial working directory, and
-so <literal>absolute</literal> must be used. Conversely on some sites,
-job directories appear in a different place on the worker node file system
-than on the filesystem access node, with the execution system handling
-translation of the job initial working directory. In such cases,
-<literal>relative</literal> mode must be used. (since Swift 0.9)
-							</para>
-
-						</listitem>
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>wrapper.parameter.mode</property>
-						</term>
-						<listitem>
-							<para>
-Controls how Swift will supply parameters to the remote wrapper script.
-<literal>args</literal> mode will pass parameters on the command line. Some
-execution systems do not pass commandline parameters sufficiently cleanly
-for Swift to operate correctly.
-<literal>files</literal> mode will pass parameters through an additional
-input file (since Swift 0.95). This provides a cleaner communication channel
-for parameters, at the expense of transferring an additional file for each
-job invocation.
-							</para>
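-							<para>
-								For example, to switch to passing
-								parameters through a file, this line could
-								be added to a Swift configuration file:
-							</para>
-<screen>
-wrapper.parameter.mode=files
-</screen>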
-						</listitem>
-
-					</varlistentry>
-
-					<varlistentry>
-						<term>
-							<property>wrapperlog.always.transfer</property>
-						</term>
-						<listitem>
-
-							<para>
-								Valid values: <literal>true</literal>, <literal>false</literal>
-							</para>
-
-							<para>
-								Default value: <literal>false</literal>
-							</para>
-
-							<para>
-
-								This property controls when output from
-								the Swift remote wrapper is transferred
-								back to the submit site. When set to
-								<literal>false</literal>, wrapper logs are
-								only transferred for jobs that fail. If set to
-								<literal>true</literal>, wrapper logs are
-								transferred after every job completes or
-								fails.
-
-							</para>
-						</listitem>
-					</varlistentry>
-
-				</variablelist>
-
-				Example:
-
-<screen>
-sites.file=${vds.home}/etc/sites.xml
-tc.file=${vds.home}/etc/tc.data
-ip.address=192.168.0.1
-</screen>
-
-			</para>
-
-	</section>
-
-
-	<section id="profiles"><title>Profiles</title>
-		<para>
-Profiles are configuration parameters that can be specified either for
-sites or for transformation catalog entries. They influence the behaviour
-of Swift towards that site (for example, by changing the load Swift will
-place on that site) or when running a particular procedure.
-		</para>
-		<para>
-Profile entries for a site are specified in the site catalog. Profile
-entries for specific procedures are specified in the transformation
-catalog.
-		</para>
-		<section id="profile.karajan"><title>Karajan namespace</title>
-			<para id="profile.karajan.maxsubmitrate"><literal>maxSubmitRate</literal> limits the maximum rate of job submission, in jobs per second.
-For example:
-<screen>
-<profile namespace="karajan" key="maxSubmitRate">0.2</profile>
-</screen>
-will limit job submission to 0.2 jobs per second (or equivalently,
-one job every five seconds).
-			</para>
-			<para id="profile.karajan.jobThrottle"><literal>jobThrottle</literal>
-allows the job throttle factor (see Swift property <link linkend="property.throttle.score.job.factor">throttle.score.job.factor</link>) to be set per site.
-			</para>
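-			<para>
-For example, the following illustrative entry in a site catalog definition
-sets the throttle factor for that site to 8:
-<screen>
-&lt;profile namespace="karajan" key="jobThrottle"&gt;8&lt;/profile&gt;
-</screen>
-			</para>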
-			<para id="profile.karajan.initialScore"><literal>initialScore</literal>
-allows the initial score for rate limiting and site selection to be set to
-a value other than 0.
-			</para>
-			<para id="profile.karajan.delayBase"><literal>delayBase</literal> controls how much a site will be delayed when it performs poorly. With each reduction
-in a site's score by 1, the delay between execution attempts will increase by
-a factor of delayBase.</para>
-			<para id="profile.karajan.status.mode"><literal>status.mode</literal> allows the status.mode property to be set per-site instead of for an entire run.
-See the Swift configuration properties section for more information.
-(since Swift 0.8)</para>
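-			<para>
-For example, a site known to return exit codes correctly could be switched
-to provider mode with a site catalog entry such as:
-<screen>
-&lt;profile namespace="karajan" key="status.mode"&gt;provider&lt;/profile&gt;
-</screen>
-			</para>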
-		</section>
-		<section id="profile.swift"><title>swift namespace</title>
-			<para id="profile.swift.storagesize"><literal>storagesize</literal> limits the
-amount of space that will be used on the remote site for temporary files.
-When more than that amount of space is used, the remote temporary file
-cache will be cleared using the algorithm specified in the
-<link linkend="property.caching.algorithm"><literal>caching.algorithm</literal></link> property.
-			</para>
-			<para id="swift.wrapperInterpreter"><literal>wrapperInterpreter</literal>
-The wrapper interpreter indicates the command (executable) to be used to run the Swift wrapper
-script. The default is "/bin/bash" on Unix sites and "cscript.exe" on Windows sites.
-			</para>
-			<para id="swift.wrapperInterpreterOptions"><literal>wrapperInterpreterOptions</literal>
-Allows specifying additional options to the executable used to run the Swift wrapper. The defaults
-are no options on Unix sites and "//Nologo" on Windows sites.
-			</para>
-			<para id="swift.wrapperScript"><literal>wrapperScript</literal>
-Specifies the name of the wrapper script to be used on a site. The defaults are "_swiftwrap" on
-Unix sites and "_swiftwrap.vbs" on Windows sites. If you specify a custom wrapper script, it
-must be present in the "libexec" directory of the Swift installation.
-			</para>
-			<para id="swift.cleanupCommand"><literal>cleanupCommand</literal>
-Indicates the command to be run at the end of a Swift run to clean up the run directories on a
-remote site. Defaults are "/bin/rm" on Unix sites and "cmd.exe" on Windows sites.
-			</para>
-			<para id="swift.cleanupCommandOptions"><literal>cleanupCommandOptions</literal>
-Specifies the options to be passed to the cleanup command above. The options are passed in the
-argument list to the cleanup command. After the options, the last argument is the directory
-to be deleted. The default on Unix sites is "-rf". The default on Windows sites is ["/C", "del", "/Q"].
-			</para>
-
-		</section>
-		<section id="profile.globus"><title>Globus namespace</title>
-			<para id="profile.globus.maxwalltime"><literal>maxwalltime</literal> specifies a walltime limit for each job, in minutes.
-			</para>
-			<para>
-The following formats are recognized:
-				<itemizedlist>
-					<listitem>Minutes</listitem>
-					<listitem>Hours:Minutes</listitem>
-					<listitem>Hours:Minutes:Seconds</listitem>
-				</itemizedlist>
-			</para>
-			<para>Example:</para>
-<screen>
-localhost	echo	/bin/echo	INSTALLED	INTEL32::LINUX	GLOBUS::maxwalltime="00:20:00"
-</screen>
-			<para>When replication is enabled (see <link linkend="replication">replication</link>), then walltime will also be enforced at the Swift client side: when
-a job has been active for more than twice the maxwalltime, Swift will kill the
-job and regard it as failed.
-			</para>
-			<para>
-When clustering is used, <literal>maxwalltime</literal> will be used to
-select which jobs will be clustered together. More information on this is
-available in the <link linkend="clustering">clustering section</link>.
-			</para>
-			<para>
-When coasters are used, <literal>maxwalltime</literal> influences the default
-coaster worker maxwalltime, and which jobs will be sent to which workers.
-More information on this is available in the <link linkend="coasters">coasters
-section</link>.
-			</para>
-			<para id="profile.globus.queue"><literal>queue</literal>
-is used by the PBS, GRAM2 and GRAM4 providers. This profile
-entry specifies which queue jobs will be submitted to. The valid queue names
-are site-specific.
-			</para>
-			<para id="profile.globus.host_types"><literal>host_types</literal>
-specifies the types of host that are permissible for a job to run on.
-The valid values are site-specific. This profile entry is used by the
-GRAM2 and GRAM4 providers.
-			</para>
-			<para id="profile.globus.condor_requirements"><literal>condor_requirements</literal> allows a requirements string to be specified
-when Condor is used as an LRM behind GRAM2. Example: <literal><profile namespace="globus" key="condor_requirements">Arch == "X86_64" || Arch="INTEL"</profile></literal>
-			</para>
-			<para id="profile.slots"><literal>slots</literal>
-When using <link linkend="coasters">coasters</link>, this parameter
-specifies the maximum number of jobs/blocks that the coaster scheduler will have running at any given time.
-The default is 20.
-			</para>
-			<para id="profile.workersPerNode"><literal>workersPerNode</literal>
-This parameter determines how many coaster workers are
-started on each compute node. The default value is 1.
-			</para>
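-			<para>
-For example, on a site with 8-core compute nodes (an illustrative figure),
-one worker per core could be requested with:
-<screen>
-&lt;profile namespace="globus" key="workersPerNode"&gt;8&lt;/profile&gt;
-</screen>
-			</para>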
-			<para id="profile.nodeGranularity"><literal>nodeGranularity</literal>
-When allocating a coaster worker block, this parameter
-restricts the number of nodes in a block to a multiple of this value. The total number of workers will
-then be a multiple of workersPerNode * nodeGranularity. The default value is 1.
-			</para>
-			<para id="profile.allocationStepSize"><literal>allocationStepSize</literal>
-Each time the coaster block scheduler computes a schedule, it will attempt to allocate a
-number of slots from the number of available slots (limited using the above slots profile). This
-parameter specifies the maximum fraction of slots that are allocated in one schedule. Default is
-0.1.
-			</para>
-			<para id="profile.lowOverallocation"><literal>lowOverallocation</literal>
-Overallocation is a function of the walltime of a job which determines how long (time-wise) a
-worker job will be. For example, if a number of 10s jobs are submitted to the coaster service,
-and the overallocation for 10s jobs is 10, the coaster scheduler will attempt to start worker
-jobs that have a walltime of 100s. The overallocation is controlled by manipulating the end-points
-of an overallocation function. The low endpoint, specified by this parameter, is the overallocation
-for a 1s job. The high endpoint is the overallocation for a (theoretical) job of infinite length.
-The overallocation for job sizes in the [1s, +inf) interval is determined using an exponential decay function:
-
-overallocation(walltime) = walltime * (lowOverallocation - highOverallocation) * exp(-walltime * overallocationDecayFactor) + highOverallocation
-
-The default value of lowOverallocation is 10.
-			</para>
-			<para id="profile.highOverallocation"><literal>highOverallocation</literal>
-The high overallocation endpoint (as described above). Default: 1
-			</para>
-			<para id="profile.overallocationDecayFactor"><literal>overallocationDecayFactor</literal>
-The decay factor for the overallocation curve. Default 0.001 (1e-3).
-			</para>
-			<para id="profile.spread"><literal>spread</literal>
-When a large number of jobs is submitted to a coaster service, the work is divided into blocks. This
-parameter allows a rough control of the relative sizes of those blocks. A value of 0 indicates that all work
-should be divided equally between the blocks (and blocks will therefore have equal sizes). A value of 1
-indicates the largest possible spread. The spread parameter exists because
-smaller jobs will generally spend less time in a batch queue than larger jobs. By submitting
-blocks of different sizes, some jobs can start, and finish, sooner in the smaller blocks. Default: 0.9.
-			</para>
-			<para id="profile.reserve"><literal>reserve</literal>
-Reserve time is a period at the end of a worker's allocated time that is
-usable only for critical operations. For example, a job will not be submitted to a worker if
-it would overlap the reserve time, but a job that (due to an inaccurate walltime specification) runs into
-the reserve time will not be killed (note that once the worker exceeds its walltime, the queuing
-system will kill the job anyway). Default 10 (s).
-			</para>
-			<para id="profile.maxnodes"><literal>maxnodes</literal>
-Determines the maximum number of nodes that can be allocated in one coaster block. Default: unlimited.
-			</para>
-			<para id="profile.maxtime"><literal>maxtime</literal>
-Indicates the maximum walltime that a coaster block can have. Default: unlimited.
-			</para>
-			<para id="profile.remoteMonitorEnabled"><literal>remoteMonitorEnabled</literal>
-If set to "true", the client side will get a Swing window showing, graphically, the state of the
-coaster scheduler (blocks, jobs, etc.). Default: false
-			</para>
-<!--
-			Reminds me this functionality should be added to the new coaster stuff
-			<para id="profile.globus.coasterInternalIP"><literal>coasterInternalIP</literal>
-specifies the internal address of the coaster head node, to be used by
-coaster workers to communicate with the coaster head node. This can be used
-when the address determined automatically by the coaster provider
-is inaccessible from coaster workers (for example, when the workers
-reside on an unrouted internal network). (since Swift 0.9)
-			</para>
-
--->
-		</section>
-
-		<section id="profile.env"><title>env namespace</title>
-			<para>
-Profile keys set in the env namespace will be set in the unix environment of the
-executed job. Some environment variables influence the worker-side
-behaviour of Swift:
-			</para>
-			<para>
-<literal>PATHPREFIX</literal> - set in env namespace profiles. This path is prefixed onto the start
-of the <literal>PATH</literal> when jobs are
-executed. It can be more useful than setting the <literal>PATH</literal> environment variable directly,
-because setting <literal>PATH</literal> will cause the execution site's default path to be lost.
-			</para>
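-			<para>
-For example, a hypothetical application directory could be prepended to the
-remote <literal>PATH</literal> with:
-<screen>
-&lt;profile namespace="env" key="PATHPREFIX"&gt;/opt/myapp/bin&lt;/profile&gt;
-</screen>
-			</para>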
-			<para>
-<literal>SWIFT_JOBDIR_PATH</literal> - set in env namespace profiles. If set, then Swift will
-use the path specified here as a worker-node local temporary directory to
-copy input files to before running a job. If unset, Swift will keep input
-files on the site-shared filesystem. In some cases, copying to a worker-node
-local directory can be much faster than having applications access the
-site-shared filesystem directly.
-			</para>
-			<para>
-<literal>SWIFT_EXTRA_INFO</literal> - set in env namespace profiles. If set,
-then Swift will execute the command specified in
-<literal>SWIFT_EXTRA_INFO</literal> on execution sites immediately before
-each application execution, and will record the stdout of that command in the
-wrapper info log file for that job. This is intended to allow software
-version and other arbitrary information about the remote site to be gathered
-and returned to the submit side. (since Swift 0.9)
-			</para>
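-			<para>
-For example, to record the remote kernel and host details alongside each
-job (the command shown is illustrative):
-<screen>
-&lt;profile namespace="env" key="SWIFT_EXTRA_INFO"&gt;/bin/uname -a&lt;/profile&gt;
-</screen>
-			</para>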
-		</section>
-	</section>
-
-	<section id="sitecatalog"><title>The Site Catalog - sites.xml</title>
-		<para>
-The site catalog lists details of each site that Swift can use. The default
-file contains one entry for local execution, and a large number of
-commented-out example entries for other sites.
-		</para>
-
-		<para>
-By default, the site catalog is stored in <filename>etc/sites.xml</filename>.
-This path can be overridden with the <literal>sites.file</literal> configuration property,
-either in the Swift configuration file or on the command line.
-		</para>
-
-		<para>
-The sites file is formatted as XML. It consists of <literal><pool></literal> elements,
-one for each site that Swift will use.
-		</para>
-
-		<section><title>Pool element</title>
-		<para>
-Each <literal>pool</literal> element must have a <literal>handle</literal> attribute, giving a symbolic name
-for the site. This can be any name, but must correspond to entries for
-that site in the transformation catalog.
-		</para>
-
-		<para>
-Optionally, the <literal>gridlaunch</literal> attribute can be used to specify the path to
-<link linkend="kickstart">kickstart</link> on the site.
-		</para>
-
-		<para>
-Each <literal>pool</literal> must specify a file transfer method, an execution method
-and a remote working directory. Optionally, <link linkend="profiles">profile settings</link> can be specified.
-		</para>
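-		<para>
-Putting these requirements together, a minimal local-execution site entry
-might look like the following sketch (the handle and working directory
-shown are illustrative):
-<screen>
-&lt;pool handle="localhost"&gt;
-  &lt;gridftp url="local://localhost" /&gt;
-  &lt;execution provider="local" url="none" /&gt;
-  &lt;workdirectory&gt;/tmp/swiftwork&lt;/workdirectory&gt;
-&lt;/pool&gt;
-</screen>
-		</para>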
-
-</section>
-<section><title>File transfer method</title>
-
-		<para>
-Transfer methods are specified with either
-the <literal><gridftp></literal> element or the
-<literal><filesystem></literal> element.
-		</para>
-		<para>
-To use gridftp or local filesystem copy, use the <literal><gridftp></literal>
-element:
-<screen>
-<gridftp  url="gsiftp://evitable.ci.uchicago.edu" />
-</screen>
-The <literal>url</literal> attribute may specify a GridFTP server, using the gsiftp URI scheme;
-or it may specify that filesystem copying will be used (which assumes that
-the site has access to the same filesystem as the submitting machine) using
-the URI <literal>local://localhost</literal>.
-		</para>
-		<para>
-Filesystem access using scp (the SSH copy protocol) can be specified using the
-<literal><filesystem></literal> element:
-<screen>
-<filesystem url="www11.i2u2.org" provider="ssh"/>
-</screen>
-For additional ssh configuration information, see the ssh execution
-provider documentation below.
-		</para>
-		<para>
-Filesystem access using <link linkend="coasters">CoG coasters</link> can be
-also be specified using the <literal><filesystem></literal> element. More detail about
-configuring that can be found in the <link linkend="coasters">CoG
-coasters</link> section.
-		</para>
-</section>
-
-<section><title>Execution method</title>
-
-		<para>
-Execution methods may be specified either with the <literal><jobmanager></literal>
-or <literal><execution></literal> element.
-		</para>
-
-		<para>
-The <literal><jobmanager></literal> element can be used to specify
-execution through GRAM2. For example,
-<screen>
-    <jobmanager universe="vanilla" url="evitable.ci.uchicago.edu/jobmanager-fork" major="2" />
-</screen>
-The <literal>universe</literal> attribute should always be set to vanilla. The
-<literal>url</literal> attribute
-should specify the name of the GRAM2 gatekeeper host, and the name of the
-jobmanager to use. The major attribute should always be set to 2.
-		</para>
-
-		<para>
-The <literal><execution></literal> element can be used to specify
-execution through other execution providers:
-		</para>
-		<para>
-To use GRAM4, specify the <literal>gt4</literal> provider. For example:
-<screen>
-<execution provider="gt4" jobmanager="PBS" url="tg-grid.uc.teragrid.org" />
-</screen>
-The <literal>url</literal> attribute should specify the GRAM4 submission site.
-The <literal>jobmanager</literal>
-attribute should specify which GRAM4 jobmanager will be used.
-		</para>
-
-		<para>
-For local execution, the <literal>local</literal> provider should be used,
-like this:
-<screen>
-<execution provider="local" url="none" />
-</screen>
-		</para>
-
-		<para>
-For PBS execution, the <literal>pbs</literal> provider should be used:
-<screen>
-<execution provider="pbs" url="none" />
-</screen>
-The <literal><link linkend="profile.globus.queue">GLOBUS::queue</link></literal> profile key
-can be used to specify which PBS queue jobs will be submitted to.
-		</para>
-
-		<para>
-For execution through a local Condor installation, the <literal>condor</literal>
-provider should be used. This provider can run jobs either in the default
-vanilla universe, or can use Condor-G to run jobs on remote sites.
-		</para>
-		<para>
-When running locally, only the <literal><execution></literal> element
-needs to be specified:
-<screen>
-<execution provider="condor" url="none" />
-</screen>
-		</para>
-		<para>
-When running with Condor-G, it is necessary to specify the Condor grid
-universe and the contact string for the remote site. For example:
-<screen>
- <execution provider="condor" />
- <profile namespace="globus" key="jobType">grid</profile>
- <profile namespace="globus" key="gridResource">gt2 belhaven-1.renci.org/jobmanager-fork</profile>
-</screen>
-		</para>
-
-		<para>
-For execution through SSH, the <literal>ssh</literal> provider should be used:
-<screen>
-<execution url="www11.i2u2.org" provider="ssh"/>
-</screen>
-with configuration made in <filename>~/.ssh/auth.defaults</filename> with
-the string 'www11.i2u2.org' changed to the appropriate host name:
-<screen>
-www11.i2u2.org.type=key
-www11.i2u2.org.username=hategan
-www11.i2u2.org.key=/home/mike/.ssh/i2u2portal
-www11.i2u2.org.passphrase=XXXX
-</screen>
-		</para>
-		<para>
-For execution using the
-<link linkend="coasters">CoG Coaster mechanism</link>, the <literal>coaster</literal> provider
-should be used:
-<screen>
-<execution provider="coaster" url="tg-grid.uc.teragrid.org"
-    jobmanager="gt2:gt2:pbs" />
-</screen>
-More details about configuration of coasters can be found in the
-<link linkend="coasters">section on coasters</link>.
-		</para>
-</section>
-<section><title>Work directory</title>
-
-		<para>
-The <literal>workdirectory</literal> element specifies where on the site files can be
-stored.
-<screen>
-<workdirectory>/home/benc</workdirectory>
-</screen>
-This file must be accessible through the transfer mechanism specified
-in the <literal><gridftp></literal> element and also mounted on all worker nodes that
-will be used for execution. A shared cluster scratch filesystem is
-appropriate for this.
-		</para>
-
-</section>
-<section><title>Profiles</title>
-
-		<para>
-<link linkend="profiles">Profile keys</link> can be specified using
-the <profile> element. For example:
-<screen>
-<profile namespace="globus" key="queue">fast</profile>
-</screen>
-		</para>
-		</section>
-
-		<para>
-The site catalog format is an evolution of the VDS site catalog format which
-is documented
-<ulink url="http://vds.uchicago.edu/vds/doc/userguide/html/H_SiteCatalog.html">here</ulink>.
-		</para>
-	</section>
-
-	<section id="transformationcatalog"><title>The Transformation Catalog - tc.data</title>
-		<para>
-The transformation catalog lists where application executables are located
-on remote sites.
-		</para>
-		<para>
-By default, the transformation catalog is stored in <filename>etc/tc.data</filename>.
-This path can be overridden with the <literal>tc.file</literal> configuration property,
-either in the Swift configuration file or on the command line.
-		</para>
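-		<para>
-For example (the path shown is illustrative), the transformation
-catalog location can be overridden on the command line like this:
-<screen>
-$ <userinput>swift -tc.file /home/user/my-tc.data example.swift</userinput>
-</screen>
-		</para>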
-		<para>
-The format is one line per executable per site, with fields separated by
-tabs. Spaces cannot be used to separate fields.
-		</para>
-		<para>Some example entries:
-<screen>
-localhost  echo    /bin/echo       INSTALLED       INTEL32::LINUX  null
-localhost  touch   /bin/touch      INSTALLED       INTEL32::LINUX  null
-
-TGUC       touch   /usr/bin/touch  INSTALLED       INTEL32::LINUX  GLOBUS::maxwalltime="0:1"
-TGUC       R       /usr/bin/R      INSTALLED       INTEL32::LINUX  env::R_LIBS=/home/skenny/R_libs
-</screen>
-		</para>
-		<para>
-The fields are: site, transformation name, executable path, installation
-status, platform, and profile entries.
-		</para>
-		<para>
-The site field should correspond to a site name listed in the sites
-catalog.</para>
-		<para>
-The transformation name should correspond to the transformation name
-used in a SwiftScript <literal>app</literal> procedure.
-		</para>
-		<para>
-The executable path should specify where the particular executable is
-located on that site.
-		</para>
-		<para>
-The installation status and platform fields are not used. Set them to
-<literal>INSTALLED</literal> and <literal>INTEL32::LINUX</literal> respectively.
-		</para>
-		<para>
-The profiles field should be set to <literal>null</literal> if no profile entries are to be
-specified, or should contain the profile entries separated by semicolons.
-		</para>
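-		<para>
-For example (the values shown are illustrative), an entry with two
-profile entries separated by a semicolon could look like this:
-<screen>
-TGUC       R       /usr/bin/R      INSTALLED       INTEL32::LINUX  env::R_LIBS=/home/skenny/R_libs;GLOBUS::maxwalltime="10:00"
-</screen>
-		</para>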
-
-	<section id="transformationcatalog.shell_invocation"><title>shell invocation</title>
-		<para>
-Because the above implementation requires an entry for each executable on a given site, it is often preferable to have a single entry per site in the transformation catalog, representing a wrapper that sets up the environment and then invokes the shell to call a given application. This wrapper is installed on the site and can be used to set <literal>PATH</literal> and other environment variables prior to invoking the shell, so that each executable need not be entered in the <literal>tc.data</literal> file.
-		</para>
-		<para>For example, the entries for <literal>TGUC</literal> and <literal>localhost</literal> can each be collapsed into a single line:
-<screen>
-localhost  shell    /usr/local/bin/swiftshell       INSTALLED       INTEL32::LINUX  null
-TGUC       shell    /usr/local/bin/swiftshell       INSTALLED       INTEL32::LINUX  null
-</screen>
-where <literal>swiftshell</literal> sets up the user's environment so that all the installed applications are added to the <literal>PATH</literal> before the application is invoked.
-		</para>
-<para>
-<literal>touch</literal> would now be called in the SwiftScript like this:
-<screen>
-app (file tout) shelltest(){
-    shell "touch" @filename(tout);
-}
-</screen>
-</para>
-		</section>
-	</section>
-
-	<section id="buildoptions"><title>Build options</title>
-		<para>
-See <ulink url="http://www.ci.uchicago.edu/swift/downloads/">the
-Swift download page</ulink> for instructions on downloading and
-building Swift from source. When building, various build options can
-be supplied on the ant commandline. These are summarised here:
-		</para>
-		<para>
-<literal>with-provider-condor</literal> - build with CoG condor provider
-		</para>
-		<para>
-<literal>with-provider-coaster</literal> - build with CoG coaster provider (see
-<link linkend="coasters">the section on coasters</link>). Since 0.8,
-coasters are always built, and this option has no effect.
-		</para>
-		<para>
-<literal>with-provider-deef</literal> - build with Falkon provider deef. In order for this
-option to work, it is necessary to check out the provider-deef code in
-the cog/modules directory alongside swift:
-
-			<screen>
-$ <userinput>cd cog/modules</userinput>
-$ <userinput>svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-deef</userinput>
-$ <userinput>cd ../swift</userinput>
-$ <userinput>ant -Dwith-provider-deef=true redist</userinput>
-			</screen>
-
-		</para>
-		<para>
-<literal>with-provider-wonky</literal> - build with provider-wonky, an execution provider
-that provides delays and unreliability for the purposes of testing Swift's
-fault tolerance mechanisms. In order for this option to work, it is
-necessary to check out the provider-wonky code in the <filename>cog/modules</filename>
-directory alongside swift:
-
-			<screen>
-$ <userinput>cd cog/modules</userinput>
-$ <userinput>svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-wonky</userinput>
-$ <userinput>cd ../swift</userinput>
-$ <userinput>ant -Dwith-provider-wonky=true redist</userinput>
-			</screen>
-		</para>
-		<para>
-<literal>no-supporting</literal> - produces a distribution without supporting commands such
-as <command>grid-proxy-init</command>. This is intended for when the Swift distribution will be
-used in an environment where those commands are already provided by other
-packages, where the Swift package should be providing only Swift
-commands, and where the presence of commands such as grid-proxy-init from
-the Swift distribution in the path will mask the presence of those
-commands from their true distribution package such as a Globus Toolkit
-package.
-<screen>
-$ <userinput>ant -Dno-supporting=true redist</userinput>
-</screen>
-		</para>
-	</section>
-
-	<section id="kickstart"> <title>Kickstart</title>
-		<para>
-
-Kickstart is a tool that can be used to gather various information
-about the remote execution environment for each job that Swift tries
-to run.
-		</para>
-
-		<para>
-For each job, Kickstart generates an XML <firstterm>invocation
-record</firstterm>. By default this record is staged back to the submit
-host if the job fails.
-		</para>
-
-		<para>
-Before Kickstart can be used, it must be installed on the remote site and
-the site catalog must be configured to point to it.
-		</para>
-
-		<para>
-Kickstart can be downloaded as part of the Pegasus 'worker package' available
-from the worker packages section of <ulink url="http://pegasus.isi.edu/code.php">the Pegasus download page</ulink>.
-		</para>
-		<para>
-Untar the relevant worker package somewhere where it is visible to all of the
-worker nodes on the remote execution machine (such as in a shared application
-filesystem).
-		</para>
-
-<para>Now configure the gridlaunch attribute of the sites catalog
-to point to that path, by adding a <parameter>gridlaunch</parameter>
-attribute to the <function>pool</function> element in the site
-catalog:
-
-<screen>
-
-<pool handle="example" gridlaunch="/usr/local/bin/kickstart" sysinfo="INTEL32::LINUX">
-[...]
-</pool>
-
-</screen>
-
-		</para>
-
-		<para>
-There are various kickstart.* properties, which have sensible default
-values. These are documented in <link linkend="engineconfiguration">the
-properties section</link>.
-		</para>
-
-
-
-	</section>
-
-	<section id="reliability"><title>Reliability mechanisms</title>
-	<para>
-This section details reliability mechanisms in Swift: retries, restarts
-and replication.
-	</para>
-
-	<section id="retries"> <title>Retries</title>
-		<para>
-If an application procedure execution fails, Swift will retry that
-execution until it succeeds or until the limit
-defined in the <literal>execution.retries</literal> configuration
-property is reached.
-		</para>
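-		<para>
-For example (the value shown is illustrative), the retry limit can be
-set in the Swift configuration file like this:
-<screen>
-execution.retries=3
-</screen>
-		</para>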
-		<para>
-Site selection will occur for retried jobs in the same way that it happens
-for new jobs. Retried jobs may run on the same site or may run on a
-different site.
-		</para>
-		<para>
-If the retry limit <literal>execution.retries</literal> is reached for an
-application procedure, then that application procedure will fail. This will
-cause the entire run to fail - either immediately (if the
-<literal>lazy.errors</literal> property is <literal>false</literal>) or
-after all other possible work has been attempted (if the
-<literal>lazy.errors</literal> property is <literal>true</literal>).
-		</para>
-	</section>
-
-	<section id="restart"> <title>Restarts</title>
-		<para>
-If a run fails, Swift can resume the program from the point of
-failure. When a run fails, a restart log file will be left behind in
-a file named using the unique job ID and a <filename>.rlog</filename> extension. This restart log
-can then be passed to a subsequent Swift invocation using the <literal>-resume</literal>
-parameter. Swift will resume execution, avoiding execution of invocations
-that have previously completed successfully. The SwiftScript source file
-and input data files should not be modified between runs.
-		</para>
-		<para>
-Every run creates a restart
-log file with a name composed of the file name of the workflow
-being executed, an invocation ID, a numeric ID, and the <filename
-class="file">.rlog</filename> extension. For example, <filename
-class="file">example.swift</filename>, when executed, could produce
-the following restart log file: <filename
-class="file">example-ht0adgi315l61.0.rlog</filename>. Normally, if
-the run completes successfully, the restart log file is
-deleted. If however the workflow fails, <command>swift</command>
-can use the restart log file to continue
-execution from a point before the
-failure occurred. In order to restart from a restart log
-file, the <option>-resume <parameter><filename
-class="file">logfile</filename></parameter></option> argument can be
-used after the SwiftScript program file name. Example:
-
-<screen>
-<prompt>$</prompt> <command>swift</command> <option>-resume <filename
-class="file">example-ht0adgi315l61.0.rlog</filename></option> <option><filename
-class="file">example.swift</filename></option>.
-</screen>
-
-		</para>
-	</section>
-
-	<section id="replication"><title>Replication</title>
-		<para>
-When an execution job has been waiting in a site queue for a certain
-period of time, Swift can resubmit replicas of that job (up to the limit
-defined in the <literal>replication.limit</literal> configuration property).
-When any of those jobs moves from queued to active state, all of the
-other replicas will be cancelled.
-		</para>
-		<para>
-This is intended to deal with situations where some sites have a substantially
-longer (sometimes effectively infinite) queue time than other sites.
-Selecting those slower sites can cause a very large delay in overall run time.
-		</para>
-		<para>
-Replication can be enabled by setting the
-<literal>replication.enabled</literal> configuration property to
-<literal>true</literal>. The maximum number of replicas that will be
-submitted for a job is controlled by the <literal>replication.limit</literal>
-configuration property.
-		</para>
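-		<para>
-For example (the limit shown is illustrative), replication can be
-configured in the Swift configuration file like this:
-<screen>
-replication.enabled=true
-replication.limit=3
-</screen>
-		</para>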
-		<para>
-When replication is enabled, Swift will also enforce the
-<literal>maxwalltime</literal> profile setting for jobs as documented in
-the <link linkend="profiles">profiles section</link>.
-		</para>
-	</section>
-
-	</section>
-
-	<section id="clustering"><title>Clustering</title>
-		<para>
-Swift can group a number of short job submissions into a single larger
-job submission to minimize overhead involved in launching jobs (for example,
-caused by security negotiation and queuing delay). In general,
-<link linkend="coasters">CoG coasters</link> should be used in preference
-to the clustering mechanism documented in this section.
-		</para>
-
-		<para>
-By default, clustering is disabled. It can be activated by setting the
-<link linkend="property.clustering.enabled">clustering.enabled</link>
-property to true.
-		</para>
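-		<para>
-For example, clustering can be enabled in the Swift configuration file
-like this:
-<screen>
-clustering.enabled=true
-</screen>
-		</para>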
-
-		<para>
-A job is eligible for clustering if
-the <link linkend="profile.globus.maxwalltime"><property>GLOBUS::maxwalltime</property></link> profile is specified in the <filename
-type="file">tc.data</filename> entry for that job, and its value is
-less than the value of the
-<link linkend="property.clustering.min.time"><property>clustering.min.time</property></link>
-property.
-		</para>
-
-		<para>
-Two or more jobs are considered compatible if they share the same site
-and do not have conflicting profiles (e.g. different values for the same
-environment variable).
-		</para>
-
-		<para>
-When a submitted job is eligible for clustering,
-it will be put in a clustering queue rather than being submitted to
-a remote site. The clustering queue is processed at intervals
-specified by the
-<link linkend="property.clustering.queue.delay"><property>clustering.queue.delay</property></link>
-property. The processing of the clustering queue consists of selecting
-compatible jobs and grouping them into clusters whose maximum wall time does
-not exceed twice the value of the <property>clustering.min.time</property>
-property.
-		</para>
-
-
-	</section>
-
-
-
-	<section id="coasters"><title>Coasters</title>
-<para>Coasters were introduced in Swift v0.6 as an experimental feature.
-</para>
-<para>
-In many applications, Swift performance can be greatly enhanced by the
-use of CoG coasters. CoG coasters provide a low-overhead job submission
-and file transfer mechanism suited for the execution of short jobs
-(on the order of a few seconds) and the transfer of small files (on the
-order of a few kilobytes) for which other grid protocols such as GRAM
-and GridFTP are poorly suited.
-</para>
-<para>
-The coaster mechanism submits a head job using some other execution
-mechanism such as GRAM, and for each worker node that will be used in
-a remote cluster, it submits a worker job, again using some other
-execution mechanism such as GRAM. Details on the design of the coaster
-mechanism can be found
-<ulink url="http://wiki.cogkit.org/wiki/Coasters">
-here.</ulink>
-</para>
-<para>
-The head job manages file transfers and the dispatch of execution jobs
-to workers. Much of the overhead associated with other grid protocols
-(such as authentication and authorization, and allocation of worker nodes
-by the site's local resource manager) is reduced, because that overhead
-is associated with the allocation of a coaster head or coaster worker,
-rather than with every Swift-level procedure invocation; potentially hundreds
-or thousands of Swift-level procedure invocations can be run through a single
-worker.
-</para>
-<para>
-Coasters can be configured for use in two situations: job execution and
-file transfer.
-</para>
-<para>
-To use for job execution, specify a sites.xml execution element like this:
-<screen>
-<execution provider="coaster" jobmanager="gt2:gt2:pbs" url="grid.myhost.org">
-</screen>
-The jobmanager string contains more detail than with other providers. It
-contains either two or three colon-separated fields:
-1: the provider to be used to execute the coaster head job. This provider
-will submit from the Swift client side environment; commonly this will be
-one of the GRAM providers. 2: the provider
-to be used to execute coaster worker jobs. This provider will be used
-to submit from the coaster head job environment, so a local scheduler
-provider can sometimes be used instead of GRAM. 3: optionally, the
-jobmanager to be used when submitting worker jobs using the provider
-specified in field 2.
-</para>
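-<para>
-For example (the host name is illustrative), the two-field string
-<literal>gt2:pbs</literal> would submit the coaster head job through GRAM2
-and the worker jobs through the local PBS provider, while the three-field
-string <literal>gt2:gt2:pbs</literal> would submit both the head job and
-the worker jobs through GRAM2, using the PBS jobmanager for the workers:
-<screen>
-<execution provider="coaster" jobmanager="gt2:pbs" url="grid.myhost.org" />
-</screen>
-</para>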
-<para>
-To use for file transfer, specify a sites.xml filesystem element like this:
-<screen>
-<filesystem provider="coaster" url="gt2://grid.myhost.org" />
-</screen>
-The url parameter should be a pseudo-URI formed with the URI scheme being
-the name of the provider to use to submit the coaster head job, and the
-hostname portion being the hostname to be used to execute the coaster
-head job. Note that this provider and hostname will be used for execution
-of a coaster head job, not for file transfer; so for example, a GRAM
-endpoint should be specified here rather than a GridFTP endpoint.
-</para>
-<para>
-Coasters are affected by the following profile settings, which are
-documented in the <link linkend="profile.globus">Globus namespace profile
-section</link>:
-</para>
-
-<table frame="all">
- <tgroup cols="2" align="left" colsep="1" rowsep="1">
-  <thead><row><entry>profile key</entry><entry>brief description</entry></row></thead>
-  <tbody>
-   <row><entry>slots</entry><entry>How many maximum LRM jobs/worker blocks are allowed</entry></row>
-   <row><entry>workersPerNode</entry><entry>How many coaster workers to run per execution node</entry></row>
-   <row><entry>nodeGranularity</entry><entry>Each worker block uses a number of nodes that is a multiple of this number</entry></row>
-   <row><entry>lowOverallocation</entry><entry>How many times larger than the job walltime should a block's walltime be if all jobs are 1s long</entry></row>
-   <row><entry>highOverallocation</entry><entry>How many times larger than the job walltime should a block's walltime be if all jobs are infinitely long</entry></row>
-   <row><entry>overallocationDecayFactor</entry><entry>How quickly should the overallocation curve tend towards the highOverallocation as job walltimes get larger</entry></row>
-   <row><entry>spread</entry><entry>By how much should worker blocks vary in worker size</entry></row>
-   <row><entry>reserve</entry><entry>How many seconds to reserve in a block's walltime for starting/shutdown operations</entry></row>
-   <row><entry>maxnodes</entry><entry>The maximum number of nodes allowed in a block</entry></row>
-   <row><entry>maxtime</entry><entry>The maximum walltime allowed for a block</entry></row>
-   <row><entry>remoteMonitorEnabled</entry><entry>If true, show a graphical display of the status of the coaster service</entry></row>
-  </tbody>
- </tgroup>
-</table>
-	</section>
-	<section id="localhowtos"><title>How-To Tips for Specific User Communities</title>
-		<section id="savinglogs"><title>Saving Logs - for UChicago CI Users</title>
-			<para>
-If you have a UChicago Computation Institute account, run this command in your
-submit directory after each run. It will copy all your logs and kickstart
-records into a directory at the CI for reporting, usage tracking, support and debugging.
-			</para>
-			<para>
-<screen>
-rsync --ignore-existing *.log *.d login.ci.uchicago.edu:/disks/ci-gpfs/swift/swift-logs/ --verbose
-</screen>
-			</para>
-		</section>
-		<section><title>Specifying TeraGrid allocations</title>
-<para>TeraGrid users with no default project or with several project
-allocations can specify a project allocation using a profile key in
-the site catalog entry for a TeraGrid site:
-<screen>
-<profile namespace="globus" key="project">TG-CCR080002N</profile>
-</screen>
-</para>
-
-<para>
-More information on the TeraGrid allocations process can
-be found <ulink url="http://www.teragrid.org/userinfo/access/allocations.php">here</ulink>.
-</para>
-
-		</section>
-		<section id="tips.mpi"><title>Launching MPI jobs from Swift</title>
-<para>
-Here is an example of running a simple MPI program.
-</para>
-<para>
-In SwiftScript, we make an invocation that does not look any different
-from any other invocation. In the below code, we do not have any input
-files, and have two output files on stdout and stderr:
-<programlisting>
-type file;
-
-(file o, file e) p() {
-    app {
-        mpi stdout=@filename(o) stderr=@filename(e);
-    }
-}
-
-file mpiout <"mpi.out">;
-file mpierr <"mpi.err">;
-
-(mpiout, mpierr) = p();
-</programlisting>
-</para>
-<para>
-Now we define how 'mpi' will run in tc.data:
-<screen>
-tguc    mpi             /home/benc/mpi/mpi.sh   INSTALLED       INTEL32::LINUX GLOBUS::host_xcount=3
-</screen>
-</para>
-<para>
-mpi.sh is a wrapper script that launches the MPI program. It must be installed
-on the remote site:
-<screen>
-#!/bin/bash
-mpirun -np 3 -machinefile $PBS_NODEFILE /home/benc/mpi/a.out
-</screen>
-</para>
-<para>
-Because of the way that Swift runs its server side code, provider-specific
-MPI modes (such as GRAM jobType=mpi) should not be used. Instead, the
-mpirun command should be explicitly invoked.
-</para>
-		</section>
-		<section id="tips.windows"><title>Running on Windows</title>
-			<para>
-
-				Since 10/11/09, the development version of Swift has been
-able to run on a Windows machine, as well as to submit
-jobs to a Windows site (provided that an appropriate provider is used).
-
-			</para>
-			<para>
-
-In order to launch Swift on Windows, use the provided batch file
-(swift.bat). In certain cases, when a large number of jar libraries are
-present in the Swift lib directory and depending on the exact location
-of the Swift installation, the classpath environment variable that the
-Swift batch launcher tries to create may be larger than what Windows can
-handle. In such a case, either install Swift in a directory closer to
-the root of the disk (say, c:\swift) or remove non-essential jar files
-from the Swift lib directory.
-
-			</para>
-
-			<para>
-
-Due to the large differences between Windows and Unix environments,
-Swift must use environment specific tools to achieve some of its goals.
-In particular, each Swift executable is launched using a wrapper script.
-This script is a Bourne Shell script. On Windows machines, which have no
-Bourne Shell interpreter installed by default, the Windows Scripting
-Host is used instead, and the wrapper script is written in VBScript.
-Similarly, when cleaning up after a run, the "/bin/rm" command available
-in typical Unix environments must be replaced by the "del" shell command.
-
-			</para>
-
-			<para>
-
-It is important to note that in order to select the proper set of tools
-to use, Swift must know when a site runs under Windows. To inform Swift
-of this, specify the "sysinfo" attribute for the "pool" element in the
-site catalog. For example:
-
-<programlisting>
-	<pool handle="localhost" sysinfo="INTEL32::WINDOWS">
-	...
-	</pool>
-</programlisting>
-
-			</para>
-		</section>
-	</section>
-
-  <section id="cdm">
-    <title>Collective Data Management</title>
-
-    <programlisting>
-
-CDM Features
-
-Overview:
-
-   1. The user specifies a CDM policy in a file, customarily fs.data.
-   2. fs.data is given to Swift on the command line.
-   3. The Swift data module (org.globus.swift.data) is informed of the CDM policy.
-   4. At job launch time, the VDL Karajan code queries the CDM policy,
-         1. altering the file staging phase, and
-         2. sending fs.data to the compute site.
-   5. At job run time, the Swift wrapper script
-         1. consults a Perl script to obtain policy, and
-         2. uses wrapper extensions to modify data movement.
-   6. Similarly, stage out can be changed.
-
-
-Command line:
-
-    * swift -sites.file sites.xml -tc.file tc.data -cdm.file fs.data stream.swift
-
-
-CDM policy file format:
-
-Example:
-
-# Describe CDM for my job
-property GATHER_LIMIT 1
-rule .*input.txt DIRECT /gpfs/homes/wozniak/data
-rule .*xfile*.data BROADCAST /dev/shm
-rule .* DEFAULT
-
-The lines contain:
-
-   1. A directive, either rule or property
-   2. A rule has:
-         1. A regular expression
-         2. A policy token
-         3. Additional policy-specific arguments
-   3. A property has:
-         1. A policy property token
-         2. The token value
-   4. Comments begin with #.
-   5. Blank lines are ignored.
-
-
-Notes:
-
-   1. The policy file is used as a lookup database by Swift and Perl methods.
-   2. For example, a lookup with the database above given the argument input.txt would result in the Direct policy.
-   3. If the lookup does not succeed, the result is DEFAULT.
-   4. Policies are listed as subclasses of org.globus.swift.data.policy.Policy .
-
-
-Policy Descriptions:
-
-Default:
-
-    * Just use file staging as provided by Swift/CoG.  Identical to behavior if no CDM file is given.
-
-
-Broadcast:
-
-        rule .*xfile*.data BROADCAST /dev/shm
-
-    * The input files matching the pattern will be stored in the given directory, an LFS location, with links in the job directory.
-    * On the BG/P, this will make use of the f2cn tool.
-    * On machines not implementing an efficient broadcast, we will just ensure correctness.  For example, on a workstation, the location could be in a /tmp RAM FS.
-
-
-Direct:
-
-        rule .*input.txt DIRECT /gpfs/scratch/wozniak/
-
-    * Allows for direct I/O to the parallel FS without staging.
-    * The input files matching the pattern must already exist in the given directory, a GFS location.  Links will be placed in the job directory.
-    * The output files matching the pattern will be stored in the given directory, with links in the job directory.
-    * Example: In the rule above, the Swift-generated file name ./data/input.txt would be accessed by the user job in /gpfs/homes/wozniak/data/input.txt .
-
-
-Local:
-
-        rule .*input.txt LOCAL dd /gpfs/homes/wozniak/data obs=64K
-
-    * Allows for client-directed input copy to the compute node.
-    * The user may specify cp or dd as the input transfer program.
-
-    * The input files matching the pattern must already exist in the given directory, a GFS location.  Copies will be placed in the job directory.
-    * Argument list: [tool] [GFS directory] [tool arguments]*
-
-
-Gather:
-
-    property GATHER_LIMIT 500000000 # 500 MB
-    property GATHER_DIR /dev/shm/gather
-    property GATHER_TARGET /gpfs/wozniak/data/gather_target
-    rule .*.output.txt GATHER
-
-    * The output files matching the pattern will be presented to tasks in the job directory as usual but noted in a _swiftwrap shell array GATHER_OUTPUT.
-    * The GATHER_OUTPUT files will be cached in the GATHER_DIR, an LFS location.
-    * The cache will be flushed when a job ends if a du on GATHER_DIR exceeds GATHER_LIMIT.
-    * As the cache fills or on stage out, the files will be bundled into randomly named tarballs in GATHER_TARGET, a GFS location.
-    * If the compute node is an SMP, GATHER_DIR is a shared resource.  It is protected by the link file GATHER_DIR/.cdm.lock .
-    * Unpacking the tarballs in GATHER_TARGET will produce the user-specified filenames.
-
-    Summary:
-
-   1. Files created by application
-   2. Acquire lock
-   3. Move files to cache
-   4. Check cache size
-   5. If limit exceeded, move all cache files to outbox
-   6. Release lock
-   7. If limit was exceeded, stream outbox as tarball to target
-
-
-    Notes:
-
-    * Gather required quite a bit of shell functionality to manage the lock, etc. This is placed in cdm_lib.sh.
-    * vdl_int.k needed an additional task submission (cdm_cleanup.sh) to perform the final flush at workflow completion time. This task also uses cdm_lib.sh.
-
-
-VDL/Karajan processing:
-
-   1. CDM functions are available in Karajan via the cdm namespace.
-   2. These functions are defined in org.globus.swift.data.Query .
-   3. If CDM is enabled, VDL skips file staging for files unless the policy is DEFAULT.
-
-
-Swift wrapper CDM routines:
-
-   1. The cdm.pl script is shipped to the compute node if CDM is enabled.
-   2. When linking in inputs, CDM is consulted by _swiftwrap:cdm_lookup().
-   3. The cdm_action() shell function handles CDM methods, typically just producing a link.
-
-
-Test case:
-
-(See About.txt for more information.)
-
-   1. Simple test cases are in:
-      https://svn.mcs.anl.gov/repos/wozniak/collab/cdm/scripts/cdm-direct and
-      https://svn.mcs.anl.gov/repos/wozniak/collab/cdm/scripts/all-pairs
-   2. Do a:
-      mkdir cdm
-      cd cdm
-      svn co https://svn.mcs.anl.gov/repos/wozniak/collab/cdm/scripts
-   3. In cdm-direct, run:
-      source ./setup.sh local local local
-   4. Run workflow:
-      swift -sites.file sites.xml -tc.file tc.data -cdm.file fs.data stream.swift
-   5. Note that staging is skipped for input.txt
-      policy: file://localhost/input.txt : DIRECT
-      FILE_STAGE_IN_START file=input.txt ...
-      FILE_STAGE_IN_SKIP file=input.txt policy=DIRECT
-      FILE_STAGE_IN_END file=input.txt ...
-   6. In the wrapper output, the input file is handled by CDM functionality:
-      Progress  2010-01-21 13:50:32.466572727-0600  LINK_INPUTS
-      CDM_POLICY: DIRECT /homes/wozniak/cdm/scripts/cdm-direct
-      CDM: jobs/t/cp_sh-tkul4nmj input.txt DIRECT /homes/wozniak/cdm/scripts/cdm-direct
-      CDM[DIRECT]: Linking jobs/t/cp_sh-tkul4nmj/input.txt to /homes/wozniak/cdm/scripts/cdm-direct/input.txt
-      Progress  2010-01-21 13:50:32.486016708-0600  EXECUTE
-   7. all-pairs is quite similar but uses more policies.
-
-
-PTMap case:
-
-   1. Start with vanilla PTMap:
-      cd cdm
-      mkdir apps
-      cd apps
-      svn co https://svn.mcs.anl.gov/repos/wozniak/collab/cdm/apps/ptmap
-   2. Source setup.sh
-   3. Use start.sh, which
-         1. applies CDM policy from fs.local.data
-
-
-CDM site-aware policy file format:
-
-Example:
-
-# Describe CDM for my job
-# Use DIRECT and BROADCAST if on cluster1, else use DEFAULT behavior
-property GATHER_LIMIT 1
-rule cluster1 .*input.txt DIRECT /gpfs/homes/wozniak/data
-rule cluster1 .*xfile*.data BROADCAST /dev/shm
-rule ANYWHERE .* DEFAULT
-
-The lines contain:
-
-   1. A directive, either rule or property
-   2. A rule has:
-         1. A regular expression for site matching
-         2. A regular expression for filename matching
-         3. A policy token
-         4. Additional policy-specific arguments
-   3. A property has:
-         1. A policy property token
-         2. The token value
-   4. Comments begin with #.
-   5. Blank lines are ignored.
-
-    </programlisting>
-
-  </section>
-</article>
-




More information about the Swift-commit mailing list