[Swift-commit] r3363 - text/parco10submission

Tue Jun 15 15:30:32 CDT 2010

Author: wozniak
Date: 2010-06-15 15:30:32 -0500 (Tue, 15 Jun 2010)
New Revision: 3363

Modified:
   text/parco10submission/paper.tex
Log:
Initial reorg of Section 2.


Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2010-06-15 20:17:01 UTC (rev 3362)
+++ text/parco10submission/paper.tex	2010-06-15 20:30:32 UTC (rev 3363)
@@ -162,10 +162,8 @@
 \section{The SwiftScript language}
 \label{Language}
 
-\subsection{Swift language concepts}
-
 The Swift programming model is data-oriented: it encapsulates the
-invocation of ``ordinary programs'' - technically, POSIX \emph{exec()}
+invocation of ``ordinary programs'' - technically, POSIX {\tt exec()}
 operations - in a manner that explicitly specifies the files and other
 arguments that are the inputs and outputs of each program
 invocation. This formal but simple model (elaborated in section
@@ -181,24 +179,23 @@
 workstation. The same script can then be executed on a cluster, one or
 more grids of clusters, and on large scale parallel supercomputers
 such as the Sun Constellation (ref) or the IBM Blue Gene/P (section
-\ref{ExecutingSites})
+\ref{ExecutingSites}). Notable features include:
 
 \item Automatic parallelization of program invocations, invoking
-programs that have no data dependencies in parallel (section
-\ref{Language})
+  programs that have no data dependencies in parallel;
 
 \item Automatic balancing of work over available resources based
 on adaptive algorithms that account for both resource performance
 and reliability, and which throttle program invocations at a rate
-appropriate for each execution location and mechanism (section
-\ref{ExecutingSites}).
+appropriate for each execution location and mechanism;
 
-\item Reliability through retry and relocation of failed executions
-and restart of interrupted scripts from the point of
-failure. (section \ref{ExecutingReliably})
+\item Reliability through replication and automatic resubmission of
+  failed executions and restart of interrupted scripts from the point
+  of failure;
 
-\item Recording the provenance of data objects produced by a Swift
-script (section \ref{Provenance}).
+\item Formalizing the creation and management of data objects in the
+  language and recording the provenance of data objects produced by a
+  Swift script.
 
 \end{itemize}
 
@@ -209,42 +206,37 @@
 language, which makes the benefits above possible, can be summarized
 as follows:
 
+\subsection{Language basics}
+
+
+A Swift script describes data, application components, invocations
+of application components, and the inter-relations (data flow)
+between those invocations, using a C-like syntax.
 Swift scripts are written as a set of procedures, composed upwards,
 starting with \emph{atomic procedures} which specify the execution of
-component programs, and then higher level procedures are composed as
+external programs, and then higher level procedures are composed as
 pipelines (or more generally, graphs) of sub-procedures.  Atomic
 procedures specify the inputs and outputs of application programs in
 terms of files and other parameters. Compound procedures are composed
-of a graph of calls to atomic and other compound procedures
+into a conceptual graph of calls to atomic and other compound
+procedures.
 
 Swift variables hold either primitive values, files, or collections of
-files.  Atomic variables are \emph{single assignment}, which provides
-the basis for Swift's model of procedure chaining.  Procedures are
+files. All variables are \emph{single assignment}, which provides the
+basis for Swift's model of procedure chaining.  Procedures are
 executed when their input parameters have all been set from existing
 data or prior procedure executions.  Procedures are chained by
 specifying that an output variable of one procedure is passed as the
-input variable to the second procedure.
+input variable to the second procedure. This dataflow model means that
+Swift procedures are not necessarily executed in source-code order but
+rather when their input data becomes available.
 
-% This dataflow model means that
-% Swift procedures are not necessarily executed in source-code order but
-% rather when their input data becomes available.
-
 Variables are declared with a type, and when they contain files
 are associated with a \emph{mapper} which indicates how physical
 data files are associated with the logical representation of Swift's
 data model of variables and collections.
 
-
-\subsection{Language basics}
-
-A Swift script describes data, application components, invocations
-of applications components, and the inter-relations (data flow)
-between those invocations.
-
-  Data is represented in a script by strongly-typed single-assignment
-variables, using a C-like syntax.
-
-  Types in Swift can be \emph{atomic} or \emph{composite}. An atomic
+Types in Swift can be \emph{atomic} or \emph{composite}. An atomic
 type can be either a \emph{primitive type} or a \emph{mapped type}.
 Swift provides a fixed set of primitive types, such as \emph{integer} or
 \emph{string}. A mapped type indicates that the actual data does not
@@ -278,21 +270,21 @@
 describes a functional/dataflow style interface to imperative
 components.
 
-For example, the following example lists a procedure which makes use  of
-the ImageMagick\cite{ImageMagick} convert command to rotate a supplied
-image by a specified angle:
+For example, the following listing shows a procedure which makes use
+of the ImageMagick\cite{ImageMagick} convert command to rotate a
+supplied image by a specified angle:
 
- \begin{verbatim}
+\begin{verbatim}
   app (image output) rotate(image input, int angle) {
     convert "-rotate" angle @input @output;
   }
- \end{verbatim}
+\end{verbatim}
 
-A procedure is invoked using the familiar syntax:
+A procedure is invoked using a syntax similar to that of the C family:
 
- \begin{verbatim}
+\begin{verbatim}
   rotated = rotate(photo, 180);
- \end{verbatim}
+\end{verbatim}
 
 While this looks like an assignment, the actual unix level execution
 consists of invoking the command line specified in the \verb|app|
@@ -304,9 +296,9 @@
 definition of that type. We can declare it as a \emph{marker type}
 which has no structure exposed to SwiftScript:
 
- \begin{verbatim}
+\begin{verbatim}
  type image;
- \end{verbatim}
+\end{verbatim}
 
 This does not indicate that the data is unstructured; but it indicates
 that the structure of the data is not exposed to SwiftScript. Instead,
@@ -329,7 +321,7 @@
  rotated = rotate(photo, 180);
 \end{verbatim}
 
-This script can be invoked from the command line:
+This script can be invoked from the command line as:
 
 \begin{verbatim}
   $ ls *.jpeg
@@ -353,27 +345,27 @@
 \verb|filesys_mapper| maps
 all files matching a particular unix glob pattern into an array:
 
- \begin{verbatim}
+\begin{verbatim}
   file frames[] <filesys_mapper; pattern="*.jpeg">;
- \end{verbatim}
+\end{verbatim}
 
 The \verb|foreach| construct can be used to apply the same procedure
 call(s) to each element of an array:
 
- \begin{verbatim}
+\begin{verbatim}
    foreach f,ix in frames {
      output[ix] = rotate(f, 180);
    }
- \end{verbatim}
+\end{verbatim}
 
 Sequential iteration can be expressed using the \verb|iterate| construct:
 
- \begin{verbatim}
+\begin{verbatim}
    step[0] = initialCondition();
    iterate ix {
      step[ix] = simulate(step[ix-1]);
    }
- \end{verbatim}
+\end{verbatim}
 
 This fragment will initialise the 0-th element of the \verb|step| array
 to some initial condition, and then repeatedly run the \verb|simulate|
@@ -381,17 +373,16 @@
 
 \subsection{Ordering of execution}
 
-Non-array variables are \emph{single-assignment}, which means that they
-must be assigned to exactly one value during execution. A procedure or
-expression will be executed when all of its input parameters have been
-assigned values. As a result of such execution, more variables may
-become assigned, possibly allowing further parts of the script to
-execute.
+Non-array variables are \emph{single-assignment}, which means that
+they must be assigned exactly one value during execution. A
+procedure or expression will be executed when all of its input
+parameters have been assigned values. As a result of such execution,
+more variables may become assigned, possibly allowing further parts of
+the script to execute. In this way, scripts are implicitly
+concurrent. Aside from serialization implied by these dataflow
+dependencies, execution of component programs can proceed without
+synchronization in time.
 
-In this way, scripts are implicitly parallel. Aside from serialisation
-implied by these dataflow dependencies, execution of component programs
-can proceed in parallel.
-
 In this fragment, execution of procedures \verb|p| and \verb|q| can
 happen in parallel:
 
@@ -413,14 +404,13 @@
 content of an array increases during execution, but cannot otherwise
 change. Once a value for a particular element is known, then it cannot
 change. Eventually, all values for an array are known, and that array
-is regarded as \emph{closed}.
+is regarded as \emph{closed}.  Statements which deal with the array as
+a whole will wait for the array to be closed before executing (thus, a
+closed array is the equivalent of a non-array type being
+assigned). However, a \verb|foreach| statement will apply its body to
+elements of an array as they become known. It will not wait until the
+array is closed.
 
-Statements which deal with the array as a whole will wait for the array
-to be closed before executing (thus, a closed array is the equivalent
-of a non-array type being assigned). However, a \verb|foreach|
-statement will apply its body to elements of an array as they become
-known. It will not wait until the array is closed.
-
 Consider this script:
  \begin{verbatim}
  file a[];
@@ -442,16 +432,14 @@
 
 \subsection{Compound procedures}
 
-As with many other programming languages, procedures consisting of SwiftScript
-code can be defined. These differ from the previously mentioned procedures
-declared with the \verb|app| keyword, as they invoke other SwiftScript
-procedures rather than a component program.
+As with many other programming languages, procedures consisting of
+SwiftScript code can be defined. These differ from the previously
+mentioned procedures declared with the \verb|app| keyword, as they
+invoke other SwiftScript procedures rather than a component
+program. The basic structure of a composite procedure may be thought
+of as a graph of calls to other procedures.
 
-The basic structure of a composite procedure is a graph of calls to
-other procedures. (TODO: does talking about call graphs make sense in
-the context of programming language-style descriptions?)
-
- \begin{verbatim}
+\begin{verbatim}
  (file output) process (file input) {
    file intermediate;
    intermediate = first(input);
@@ -461,7 +449,7 @@
  file x <"x.txt">;
  file y <"y.txt">;
  y = process(x);
- \end{verbatim}
+\end{verbatim}
 
 This will invoke two procedures, with an intermediate data file named
 anonymously connecting the \verb|first| and \verb|second| procedures.
@@ -469,7 +457,7 @@
 Ordering of execution is generally determined by execution of \verb|app|
 procedures, not by any containing procedures. In this code block:
 
- \begin{verbatim}
+\begin{verbatim}
  (file a, file b) A() {
    a = A1();
    b = A2();
@@ -478,7 +466,7 @@
  (x,y) = A();
  s = S(x);
  t = S(y);
- \end{verbatim}
+\end{verbatim}
 
 then a valid execution order is: \verb|A1 S(x) A2 S(y)|. The
 compound procedure \verb|A| does not have to have fully completed
@@ -544,8 +532,6 @@
 data storage and access methods to be plugged in to scripts.
 
 \begin{verbatim}
-  # TODO I just made this up; need to check
-  # that it actually works
   type file;
 
   app (extern o) populateDatabase() {
@@ -572,9 +558,6 @@
 The single assignment and execution ordering rules will still apply though;
 populateDatabase will always be run before analyseDatabase.
 
-TODO mappings may be to URLs, not only to local filesystem files; and more
-explicit description of what mapping is.
-
 \subsection{Swift mappers}
 Swift contains a number of built-in mappers. A representative sample
 of these is listed in table \ref{mappertable}.
@@ -597,17 +580,16 @@
 \subsection{The execution environment for component programs}
 \label{LanguageEnvironment}
 
-  A SwiftScript \verb|app| declaration describes how a component
-program is invoked. In order to ensure the correctness of the
-Swift model, the environment in which programs are executed needs
-to be constrained.
+A SwiftScript \verb|app| declaration describes how a component program
+is invoked. In order to ensure the correctness of the Swift model, the
+environment in which programs are executed needs to be constrained.
 
-  A program is invoked in its own working directory; in that working
+A program is invoked in its own working directory; in that working
 directory or one of its subdirectories, the program can expect to find
-all of the files that are passed as inputs to the application block; and
-on exit, it should leave all files named by that application block in
-the same working directory. Applications should also not assume that
-they will be executed on a particular host (to facilitate site
+all of the files that are passed as inputs to the application block;
+and on exit, it should leave all files named by that application block
+in the same working directory. Applications should also not assume
+that they will be executed on a particular host (to facilitate site
 portability), run in any particular order with respect to other
 application invocations in a script (except those implied by data
 dependency), or that their working directories will or will not be
@@ -616,39 +598,38 @@
 Consider the \verb|app| declaration for the \verb|rotate| procedure in
 section N.
 
- \begin{verbatim}
+\begin{verbatim}
  app (file output) rotate(file input, int angle)
- \end{verbatim}
+\end{verbatim}
 
-  The procedure signature declares the inputs and outputs for this
+The procedure signature declares the inputs and outputs for this
 procedure. As in many other programming languages, this defines the
 type signatures and names of parameters; this also defines which files
-will be placed into the application working directory before execution,
-and which files will be expected there after execution. For the above
-declaration, the file mapped to the \verb|input| parameter will be
-placed in the working directory beforehand, and the file mapped to
-\verb|output| will be expected there after execution; the input
-parameter \verb|angle| is of primitive type\footnote{need to define
-primitive type earlier on here...} and so no files are staged in for
-this parameter.
+will be placed into the application working directory before
+execution, and which files will be expected there after execution. For
+the above declaration, the file mapped to the \verb|input| parameter
+will be placed in the working directory beforehand, and the file
+mapped to \verb|output| will be expected there after execution; the
+input parameter \verb|angle| is of primitive type and so no files are
+staged in for this parameter.
 
- \begin{verbatim}
+\begin{verbatim}
  convert "-rotate" angle @input @output;
- \end{verbatim}
+\end{verbatim}
 
-  The body of the \verb|app| block defines the unix command-line that
+The body of the \verb|app| block defines the unix command-line that
 will be executed when this procedure is invoked. The first token (in
-this case \verb|convert|) defines a \emph{transformation name} which is
-used to determine the unix executable name. Subsequent expressions,
+this case \verb|convert|) defines a \emph{transformation name} which
+is used to determine the unix executable name. Subsequent expressions,
 separated by spaces, define the command-line arguments for that
 executable: \verb|"-rotate"| is a string literal; \verb|angle| specifies the
 value of the angle parameter; the syntax \verb|@variable| evaluates to
 the filename of the supplied variable, thus \verb|@input| and
 \verb|@output| evaluate to the filenames of the corresponding
-parameters. It should be noted that it is possible to take the filename
-of \verb|output| even though it is a return parameter; although the
-value of that variable has not yet been computed, the filename where
-that value will go is already known.
+parameters. It should be noted that it is possible to take the
+filename of \verb|output| even though it is a return parameter;
+although the value of that variable has not yet been computed, the
+filename where that value will go is already known.
 
 TODO comment (here?) about how this model appears somewhat constrained
 but provides a well defined atomicity that can be used for various
@@ -662,21 +643,23 @@
 
 \section{Execution}
 \label{Execution}
+
 Swift is implemented by compiling to a Karajan program, which provides
 several benefits. A notable benefit visible to users is that of
-providers. This enbles the Swift execution model to be extended by
+providers. This enables the Swift execution model to be extended by
 adding new data providers and job execution providers. This is
-explained in more detail in section \ref{ExecutingSites}: Executing on a remote site.
+explained in more detail in section \ref{ExecutingSites}: Executing on
+a remote site.
 
 \subsection{Executing on a remote site}
 \label{ExecutingSites}
 
-  With the above restrictions, execution of a unix program on a remote
+With the above restrictions, execution of a unix program on a remote
 site is straightforward. The Swift runtime must prepare a remote
 working directory for each job with appropriate input files staged in;
 then it must execute the program; and then it must stage the output
 files back out to the submitting system. The site model used by Swift is
-shown in figure \ref{FigureSwiftModel}
+shown in Figure~\ref{FigureSwiftModel}.
 
 \begin{figure*}[htbp]
   \begin{center}