[Swift-commit] r3363 - text/parco10submission
noreply at svn.ci.uchicago.edu
Tue Jun 15 15:30:32 CDT 2010
Author: wozniak
Date: 2010-06-15 15:30:32 -0500 (Tue, 15 Jun 2010)
New Revision: 3363
Modified:
text/parco10submission/paper.tex
Log:
Initial reorg of Section 2.
Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex 2010-06-15 20:17:01 UTC (rev 3362)
+++ text/parco10submission/paper.tex 2010-06-15 20:30:32 UTC (rev 3363)
@@ -162,10 +162,8 @@
\section{The SwiftScript language}
\label{Language}
-\subsection{Swift language concepts}
-
The Swift programming model is data-oriented: it encapsulates the
-invocation of ``ordinary programs'' - technically, POSIX \emph{exec()}
+invocation of ``ordinary programs'' - technically, POSIX {\tt exec()}
operations - in a manner that explicitly specifies the files and other
arguments that are the inputs and outputs of each program
invocation. This formal but simple model (elaborated in section
@@ -181,24 +179,23 @@
workstation. The same script can then be executed on a cluster, one or
more grids of clusters, and on large scale parallel supercomputers
such as the Sun Constellation (ref) or the IBM Blue Gene/P. (section
-\ref{ExecutingSites})
+\ref{ExecutingSites}). Notable features include:
\item Automatic parallelization of program invocations, invoking
-programs that have no data dependencies in parallel (section
-\ref{Language})
+ programs that have no data dependencies in parallel;
\item Automatic balancing of work over available resources based
on adaptive algorithms that account for both resource performance
and reliability, and which throttle program invocations at a rate
-appropriate for each execution location and mechanism (section
-\ref{ExecutingSites}).
+appropriate for each execution location and mechanism;
-\item Reliability through retry and relocation of failed executions
-and restart of interrupted scripts from the point of
-failure. (section \ref{ExecutingReliably})
+\item Reliability through replication and automatic resubmission of
+ failed executions and restart of interrupted scripts from the point
+ of failure;
-\item Recording the provenance of data objects produced by a Swift
-script (section \ref{Provenance}).
+\item Formalizing the creation and management of data objects in the
+ language and recording the provenance of data objects produced by a
+ Swift script.
\end{itemize}
@@ -209,42 +206,37 @@
language, which makes the benefits above possible, can be summarized
as follows:
+\subsection{Language basics}
+
+
+A Swift script describes data, application components, invocations
+of application components, and the inter-relations (data flow)
+between those invocations, using a C-like syntax.
Swift scripts are written as a set of procedures, composed upwards,
starting with \emph{atomic procedures} which specify the execution of
-component programs, and then higher level procedures are composed as
+external programs, and then higher level procedures are composed as
pipelines (or more generally, graphs) of sub-procedures. Atomic
procedures specify the inputs and outputs of application programs in
terms of files and other parameters. Compound procedures are composed
-of a graph of calls to atomic and other compound procedures
+into a conceptual graph of calls to atomic and other compound
+procedures.
Swift variables hold either primitive values, files, or collections of
-files. Atomic variables are \emph{single assignment}, which provides
-the basis for Swift's model of procedure chaining. Procedures are
+files. All variables are \emph{single assignment}, which provides the
+basis for Swift's model of procedure chaining. Procedures are
executed when their input parameters have all been set from existing
data or prior procedure executions. Procedures are chained by
specifying that an output variable of one procedure is passed as the
-input variable to the second procedure.
+input variable to the second procedure. This dataflow model means that
+Swift procedures are not necessarily executed in source-code order but
+rather when their input data becomes available.
-% This dataflow model means that
-% Swift procedures are not necessarily executed in source-code order but
-% rather when their input data becomes available.
-
Variables are declared with a type, and when they contain files
are associated with a \emph{mapper} which indicates how physical
data files are associated with the logical representation of Swift's
data model of variables and collections.
-
-\subsection{Language basics}
-
-A Swift script describes data, application components, invocations
-of applications components, and the inter-relations (data flow)
-between those invocations.
-
- Data is represented in a script by strongly-typed single-assignment
-variables, using a C-like syntax.
-
- Types in Swift can be \emph{atomic} or \emph{composite}. An atomic
+Types in Swift can be \emph{atomic} or \emph{composite}. An atomic
type can be either a \emph{primitive type} or a \emph{mapped type}.
Swift provides a fixed set of primitive types, such as \emph{integer} or
\emph{string}. A mapped type indicates that the actual data does not
@@ -278,21 +270,21 @@
describes a functional/dataflow style interface to imperative
components.
-For example, the following example lists a procedure which makes use of
-the ImageMagick\cite{ImageMagick} convert command to rotate a supplied
-image by a specified angle:
+For example, the following procedure makes use of the
+ImageMagick\cite{ImageMagick} convert command to rotate a supplied
+image by a specified angle:
- \begin{verbatim}
+\begin{verbatim}
app (image output) rotate(image input, int angle) {
convert "-rotate" angle @input @output;
}
- \end{verbatim}
+\end{verbatim}
-A procedure is invoked using the familiar syntax:
+A procedure is invoked using a syntax similar to that of the C family:
- \begin{verbatim}
+\begin{verbatim}
rotated = rotate(photo, 180);
- \end{verbatim}
+\end{verbatim}
While this looks like an assignment, the actual unix level execution
consists of invoking the command line specified in the \verb|app|
@@ -304,9 +296,9 @@
definition of that type. We can declare it as a \emph{marker type}
which has no structure exposed to SwiftScript:
- \begin{verbatim}
+\begin{verbatim}
type image;
- \end{verbatim}
+\end{verbatim}
This does not indicate that the data is unstructured; but it indicates
that the structure of the data is not exposed to SwiftScript. Instead,
@@ -329,7 +321,7 @@
rotated = rotate(photo, 180);
\end{verbatim}
-This script can be invoked from the command line:
+This script can be invoked from the command line as:
\begin{verbatim}
$ ls *.jpeg
@@ -353,27 +345,27 @@
\verb|filesys_mapper| maps
all files matching a particular unix glob pattern into an array:
- \begin{verbatim}
+\begin{verbatim}
file frames[] <filesys_mapper; pattern="*.jpeg">;
- \end{verbatim}
+\end{verbatim}
The \verb|foreach| construct can be used to apply the same procedure
call(s) to each element of an array:
- \begin{verbatim}
+\begin{verbatim}
foreach f,ix in frames {
output[ix] = rotate(f, 180);
}
- \end{verbatim}
+\end{verbatim}
Sequential iteration can be expressed using the \verb|iterate| construct:
- \begin{verbatim}
+\begin{verbatim}
step[0] = initialCondition();
iterate ix {
step[ix] = simulate(step[ix-1]);
}
- \end{verbatim}
+\end{verbatim}
This fragment will initialise the 0-th element of the \verb|step| array
to some initial condition, and then repeatedly run the \verb|simulate|
@@ -381,17 +373,16 @@
\subsection{Ordering of execution}
-Non-array variables are \emph{single-assignment}, which means that they
-must be assigned to exactly one value during execution. A procedure or
-expression will be executed when all of its input parameters have been
-assigned values. As a result of such execution, more variables may
-become assigned, possibly allowing further parts of the script to
-execute.
+Non-array variables are \emph{single-assignment}, which means that
+they must be assigned to exactly one value during execution. A
+procedure or expression will be executed when all of its input
+parameters have been assigned values. As a result of such execution,
+more variables may become assigned, possibly allowing further parts of
+the script to execute. In this way, scripts are implicitly
+concurrent. Aside from serialisation implied by these dataflow
+dependencies, execution of component programs can proceed without
+synchronization in time.
-In this way, scripts are implicitly parallel. Aside from serialisation
-implied by these dataflow dependencies, execution of component programs
-can proceed in parallel.
-
In this fragment, execution of procedures \verb|p| and \verb|q| can
happen in parallel:
@@ -413,14 +404,13 @@
content of an array increases during execution, but cannot otherwise
change. Once a value for a particular element is known, then it cannot
change. Eventually, all values for an array are known, and that array
-is regarded as \emph{closed}.
+is regarded as \emph{closed}. Statements which deal with the array as
+a whole will wait for the array to be closed before executing (thus, a
+closed array is the equivalent of a non-array type being
+assigned). However, a \verb|foreach| statement will apply its body to
+elements of an array as they become known. It will not wait until the
+array is closed.
-Statements which deal with the array as a whole will wait for the array
-to be closed before executing (thus, a closed array is the equivalent
-of a non-array type being assigned). However, a \verb|foreach|
-statement will apply its body to elements of an array as they become
-known. It will not wait until the array is closed.
-
Consider this script:
\begin{verbatim}
file a[];
@@ -442,16 +432,14 @@
\subsection{Compound procedures}
-As with many other programming languages, procedures consisting of SwiftScript
-code can be defined. These differ from the previously mentioned procedures
-declared with the \verb|app| keyword, as they invoke other SwiftScript
-procedures rather than a component program.
+As with many other programming languages, procedures consisting of
+SwiftScript code can be defined. These differ from the previously
+mentioned procedures declared with the \verb|app| keyword, as they
+invoke other SwiftScript procedures rather than a component
+program. The basic structure of a compound procedure may be thought
+of as a graph of calls to other procedures.
-The basic structure of a composite procedure is a graph of calls to
-other procedures. (TODO: does talking about call graphs make sense in
-the context of programming language-style descriptions?)
-
- \begin{verbatim}
+\begin{verbatim}
(file output) process (file input) {
file intermediate;
intermediate = first(input);
@@ -461,7 +449,7 @@
file x <"x.txt">;
file y <"y.txt">;
y = process(x);
- \end{verbatim}
+\end{verbatim}
This will invoke two procedures, with an intermediate data file named
anonymously connecting the \verb|first| and \verb|second| procedures.
@@ -469,7 +457,7 @@
Ordering of execution is generally determined by execution of \verb|app|
procedures, not by any containing procedures. In this code block:
- \begin{verbatim}
+\begin{verbatim}
(file a, file b) A() {
a = A1();
b = A2();
@@ -478,7 +466,7 @@
(x,y) = A();
s = S(x);
t = S(y);
- \end{verbatim}
+\end{verbatim}
then a valid execution order is: \verb|A1 S(x) A2 S(y)|. The
compound procedure \verb|A| does not have to have fully completed
@@ -544,8 +532,6 @@
data storage and access methods to be plugged in to scripts.
\begin{verbatim}
- # TODO I just made this up; need to check
- # that it actually works
type file;
app (extern o) populateDatabase() {
@@ -572,9 +558,6 @@
The single assignment and execution ordering rules will still apply though;
populateDatabase will always be run before analyseDatabase.
-TODO mappings may be to URLs, not only to local filesystem files; and more
-explicit description of what mapping is.
-
\subsection{Swift mappers}
Swift contains a number of built-in mappers. A representative sample
of these is listed in table \ref{mappertable}.
@@ -597,17 +580,16 @@
\subsection{The execution environment for component programs}
\label{LanguageEnvironment}
- A SwiftScript \verb|app| declaration describes how a component
-program is invoked. In order to ensure the correctness of the
-Swift model, the environment in which programs are executed needs
-to be constrained.
+A SwiftScript \verb|app| declaration describes how a component program
+is invoked. In order to ensure the correctness of the Swift model, the
+environment in which programs are executed needs to be constrained.
- A program is invoked in its own working directory; in that working
+A program is invoked in its own working directory; in that working
directory or one of its subdirectories, the program can expect to find
-all of the files that are passed as inputs to the application block; and
-on exit, it should leave all files named by that application block in
-the same working directory. Applications should also not assume that
-they will be executed on a particular host (to facilitate site
+all of the files that are passed as inputs to the application block;
+and on exit, it should leave all files named by that application block
+in the same working directory. Applications should also not assume
+that they will be executed on a particular host (to facilitate site
portability), run in any particular order with respect to other
application invocations in a script (except those implied by data
dependency), or that their working directories will or will not be
@@ -616,39 +598,38 @@
Consider the \verb|app| declaration for the \verb|rotate| procedure in
section N.
- \begin{verbatim}
+\begin{verbatim}
app (file output) rotate(file input, int angle)
- \end{verbatim}
+\end{verbatim}
- The procedure signature declares the inputs and outputs for this
+The procedure signature declares the inputs and outputs for this
procedure. As in many other programming languages, this defines the
type signatures and names of parameters; this also defines which files
-will be placed into the application working directory before execution,
-and which files will be expected there after execution. For the above
-declaration, the file mapped to the \verb|input| parameter will be
-placed in the working directory beforehand, and the file mapped to
-\verb|output| will be expected there after execution; the input
-parameter \verb|angle| is of primitive type\footnote{need to define
-primitive type earlier on here...} and so no files are staged in for
-this parameter.
+will be placed into the application working directory before
+execution, and which files will be expected there after execution. For
+the above declaration, the file mapped to the \verb|input| parameter
+will be placed in the working directory beforehand, and the file
+mapped to \verb|output| will be expected there after execution; the
+input parameter \verb|angle| is of primitive type and so no files are
+staged in for this parameter.
- \begin{verbatim}
+\begin{verbatim}
convert "-rotate" angle @input @output;
- \end{verbatim}
+\end{verbatim}
- The body of the \verb|app| block defines the unix command-line that
+The body of the \verb|app| block defines the unix command-line that
will be executed when this procedure is invoked. The first token (in
-this case \verb|convert|) defines a \emph{transformation name} which is
-used to determine the unix executable name. Subsequent expressions,
+this case \verb|convert|) defines a \emph{transformation name} which
+is used to determine the unix executable name. Subsequent expressions,
separated by spaces, define the command-line arguments for that
executable: \verb|"-rotate"| is a string literal; angle specifies the
value of the angle parameter; the syntax \verb|@variable| evaluates to
the filename of the supplied variable, thus \verb|@input| and
\verb|@output| evaluate to the filenames of the corresponding
-parameters. It should be noted that it is possible to take the filename
-of \verb|output| even though it is a return parameter; although the
-value of that variable has not yet been computed, the filename where
-that value will go is already known.
+parameters. It should be noted that it is possible to take the
+filename of \verb|output| even though it is a return parameter;
+although the value of that variable has not yet been computed, the
+filename where that value will go is already known.
TODO comment (here?) about how this model appears somewhat constrained
but provides a well defined atomicity that can be used for various
@@ -662,21 +643,23 @@
\section{Execution}
\label{Execution}
+
Swift is implemented by compiling to a Karajan program, which provides
several benefits. A notable benefit visible to users is that of
-providers. This enbles the Swift execution model to be extended by
+providers. This enables the Swift execution model to be extended by
adding new data providers and job execution providers. This is
-explained in more detail in section \ref{ExecutingSites}: Executing on a remote site.
+explained in more detail in section \ref{ExecutingSites}: Executing on
+a remote site.
\subsection{Executing on a remote site}
\label{ExecutingSites}
- With the above restrictions, execution of a unix program on a remote
+With the above restrictions, execution of a unix program on a remote
site is straightforward. The Swift runtime must prepare a remote
working directory for each job with appropriate input files staged in;
then it must execute the program; and then it must stage the output
files back out to the submitting system. The site model used by Swift is
-shown in figure \ref{FigureSwiftModel}
+shown in Figure~\ref{FigureSwiftModel}.
\begin{figure*}[htbp]
\begin{center}