[Swift-commit] r3871 - text/parco10submission

Thu Jan 6 13:42:31 CST 2011

Author: dsk
Date: 2011-01-06 13:42:30 -0600 (Thu, 06 Jan 2011)
New Revision: 3871

Modified:
   text/parco10submission/paper.tex
Log:
updates in section 2


Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-06 15:54:40 UTC (rev 3870)
+++ text/parco10submission/paper.tex	2011-01-06 19:42:30 UTC (rev 3871)
@@ -334,18 +334,18 @@
 \section{The Swift language}
 \label{Language}
 
-Swift is by design a sparse, minimal scripting
-language which executes external programs remotely and in parallel.
+Swift is, by design, a sparse, minimal scripting
+language that executes external programs remotely and in parallel.
 As such, Swift has only a very limited set of data
 types, operators, and built-in functions.
-Its simple, uniform data model is composed of a few atomic types (which can be simple scalar values or references to external files) and two collection types (arrays and structures).
+Its simple, uniform data model is composed of a few atomic types (which can be scalar values or references to external files) and two collection types (arrays and structures).
 
 A Swift script describes data, application components, invocations
-of applications components, and the inter-relations (data flow)
+of application components, and the interrelations (data flow)
 between those invocations, using a C-like syntax.
 Swift scripts are written as a set of functions, composed upwards,
 starting with \emph{atomic functions} that specify the execution of
-external programs, and then higher level functions are composed as
+external programs. Higher-level functions are then composed as
 pipelines (or more generally, graphs) of sub-functions.
 
 Unlike most other scripting languages, Swift expresses
@@ -355,59 +355,60 @@
 invocation.  Swift scripts similarly declare all output files that result from program invocations.
 This enables Swift to provide distributed, location-independent execution of external application programs.
 
-The Swift parallel execution model is based on two concepts that are applied uniformly throughout the language. First, every Swift data element behaves like a \emph{future}. By ``data element'', we mean both the named variables within a function's environment, such as its local variables, parameters, and returns, and the individual elements of array and structure collections. Second, every expression in a Swift program is conceptually executed in parallel. Expressions (including function evaluations) wait for input values when they are required, and then set their result values as their computation proceeds. These fundamental concepts are discussed in more detail below.
+The Swift parallel execution model is based on two concepts that are applied uniformly throughout the language. First, every Swift data element behaves like a \emph{future}. By ``data element'', we mean both the named variables exposed to Swift within a function's environment, such as its local variables, parameters, and returns, and the individual elements of array and structure collections. Second, all expressions in a Swift program are conceptually executed in parallel. Expressions (including function evaluations) wait for input values when they are required, and then set their result values as their computation proceeds. These fundamental concepts are discussed in more detail below.
 
 % can be thought of as a massively-parallel lazy (ie, on-demand, or just in time) evaluation - say later on?
 
 \subsection{Data model}
 
-Every data object in Swift is built up from atomic data elements which contain three fields: a value, a state, and a queue of function invocations that are waiting for the value to be set.
+Every data object in Swift is built up from atomic data elements that contain three fields: a value, a state, and a queue of function invocations that are waiting for the value to be set.
 
 Variables are used in Swift to name the local variables, arguments, and returns of a function. Every Swift variable is assigned a concrete data type, based on a very simple type model (with no concepts of inheritance, abstraction, etc.). The outermost function in a Swift script (akin to ``main'' in C) is unique only in that the variables in its environment can be declared ``global'' to make them accessible to every other function in the script.
 
-Swift data elements (atomic variables and array elements) are \emph{single-assignment}:
-they behave as futures and can be assigned at most one value during execution.
+Swift data elements (atomic variables and array elements) are \emph{single-assignment}---
+they can be assigned at most one value during execution---and behave as futures. 
 This semantic provides the
 basis for Swift's model of parallel function evaluation and chaining.
-While Swift arrays and structures are not
-single-assignment, each of their elements are.
+While Swift collection types (arrays and structures) are not
+single-assignment, each of their elements is single-assignment.
 
-Each variables in a Swift script is declared to be of a specific (single) type.
+Each variable in a Swift script is declared to be of a specific (single) type.
 Swift provides three basic classes of data types:
 
 \emph{Primitive types} are provided for integer, float, string, and boolean values by the Swift runtime. Common operators are defined for
 primitive types, such as arithmetic, concatenation, explicit conversion, etc.
 An additional primitive type ``external'' is provided for manual synchronization.
 
-\emph{Mapped types} are data elements that refer, through a process called``mapping'' to files external to the Swift script. These are the files that will be read and written by the external application programs called by Swift.
+\emph{Mapped types} are data elements that refer (through a process called ``mapping'') to files external to the Swift script. These are the files that will be read and written by the external application programs called by Swift.
 The mapping process can map single variables to single files, and structures and arrays to collections of files.
 Primitive and mapped types are called \emph{atomic types}.
 
 \emph{Collection types} are provided in Swift by \emph{arrays} and \emph{structures}.
-Structure fields can be of any type, while arrays contain only uniform values of a single type. One
+Structure fields can be of any type, while arrays contain values of only a single type. One
 array type is provided for every atomic type (integer, string, boolean, and file reference).
 Arrays use numeric
 indices, but are sparse.
 Both types of collections can contain members of atomic or collection types. Structures contain a finite number of elements. Arrays contain a varying number of elements. Structures and arrays can both recursively reference other structures and arrays in addition to atomic values. Arrays can be nested to provide multi-dimensional indexing.
 
-Due to the dynamic, highly parallel nature of Swift, its arrays have no notion of size. Array elements can be set as a script's execution progresses. The number of elements set increases monotonically. An array is considered ``closed'' when no further statements that set an element of the array can be executed. This state is recognized at run time by information obtained from compile-time analysis of the script's call graph. Also, since all data elements have single-assignment semantics, no garbage collection issues arise.
+Due to the dynamic, highly parallel nature of Swift, its arrays have no notion of size. Array elements can be set as a script's execution progresses. The number of elements set increases monotonically. An array is considered ``closed'' when no further statements that set an element of the array can be executed. This state is recognized at run time by information obtained from compile-time analysis of the script's call graph. Also, since all data elements have single-assignment semantics, no garbage collection issues arise. \katznote{does this follow? garbage collection removed variables that are no longer needed - I don't see how single assignment helps here.}
 
 Variables that are declared to be file references
-are associated with a \emph{mapper} which defines (often through a dynamic lookup process) the
+are associated with a \emph{mapper}, which defines (often through a dynamic lookup process) the
 data files that are to be mapped to the variable. Array and structure elements that are declared to be file references are similarly mapped.
 
-Mapped type and composite type variable declarations can be annotated with a
+Mapped type and composite \katznote{I don't know what composite means here}
+type variable declarations can be annotated with a
 \emph{mapping} descriptor that specifies the file(s) that are to be mapped to the Swift data element(s).
 
 For example, the following line declares a variable named \verb|photo| of
-type \verb|image|. Since image is a fileRef type, it additionally declares that the
+type \verb|image|. Since image is a fileRef type \katznote{how do I know this? And,  should ``fileRef'' have been defined 2 paragraphs ago?}, it additionally declares that the
 variable refers to a single file named \verb|shane.jpeg|:
 
 \begin{verbatim}
    image photo <"shane.jpeg">;
 \end{verbatim}
 
-We can declare {\tt image} to be an \emph{external file type}:
+We can declare {\tt image} to be an \emph{external file type}: \katznote{is this different from a fileRef type?}
 
 \begin{verbatim}
    type image {};
@@ -441,18 +442,20 @@
 Members of a structure can be accessed using the \verb|.| operator:
 
 \begin{verbatim}
-   snapshot s;
-   image i;
-   i = s.i;
+   snapshot sn;
+   image im;
+   im = sn.i;
 \end{verbatim}
 
+\katznote{please check the above - I changed a couple of variables so ``i'' wasn't used twice for different things in the same example.}
+
 \subsection{Execution model}
 
 Swift has three types of functions:
 
 \emph{Built-in functions} are defined in the Java code of the Swift runtime system, and perform various utility functions (numeric conversion, string manipulation, etc.). Operators (+, *, etc.) defined by the language behave similarly.
 
-\emph{Application interface functions} (declared using the app keyword)
+\emph{Application interface functions} (declared using the \verb|app| keyword)
 specify the interface (input files and parameters, and output files) of application programs in
 terms of files and other parameters. They serve as an adapter between the Swift programming model and the mechanisms used to invoke application programs at run time.
 
@@ -460,15 +463,14 @@
 that call atomic and other compound
 functions.
 
-Through the use of futures, functions are
+Through the use of futures, functions can be
 executed when their input parameters have all been set from existing
 data or prior function executions.  Function calls are chained by
 specifying that an output variable of one function is passed as the
 input variable to the second function.
-%\katznote{mention futures here?}
 This dataflow model means that
 Swift functions are not necessarily executed in source-code order but
-rather when their input data becomes available.
+rather when their input data become available.
 % mention that every expression in the body of a function or sub-expression is conceptually executed in parallel, and physically executed when all of their arguments have been assigned a value.
 
 %vvvv
@@ -476,7 +478,7 @@
 declaration} that describes the command line syntax for that
 program and its input and output files.
 For example, the following listing shows a function that makes use
-of the common utility {\tt convert}~\cite{ImageMagick_WWW} to rotate an
+of the common utility, {\tt convert}~\cite{ImageMagick_WWW}, to rotate an
 image by a specified angle:
 
 \begin{verbatim}
@@ -486,7 +488,7 @@
 \end{verbatim}
 
 %\katznote{do you need to say anything about where/how convert is defined/located?}
-(The {\tt convert} executable is located at run time through a catalog of applications or through a PATH environment variable.)
+(The {\tt convert} executable is found at run time in a catalog of applications or through a PATH environment variable.)
 
 The rotate function is then invoked as follows:
 
@@ -526,8 +528,8 @@
 \end{verbatim}
 
 This executes a single \verb|convert| command, while automatically providing features
-such as remote multisite execution and fault tolerance, which will be
-discussed in a later section.
+such as remote multisite execution and fault tolerance, which are
+discussed later.
 
 In addition to function invocation, the Swift language provides conditional
 execution through the \emph{if} and \emph{switch} statements as well as
@@ -611,8 +613,8 @@
 \label{ordering}
 
 Since all variables and collection elements are single-assignment,
-they can be assigned a value at most once during the execution of a script.
-A function or expression will be executed when all of its input
+%they can be assigned a value at most once during the execution of a script.
+a function or expression can be executed when all of its input
 parameters have been assigned values. As a result of such execution,
 more variables may become assigned, possibly allowing further parts of
 the script to execute. In this way, scripts are implicitly
@@ -631,17 +633,18 @@
    z=q(y);
 \end{verbatim}
 
-Arrays in Swift are treated as collections of simple variables, in the sense that all array elements are single-assignment futures.
+Arrays in Swift are treated as collections of simple variables, in the sense that all array elements are single-assignment.
 Once the value of an array element is
 set, then it cannot change. When all the values for the array that can be set (as determined by limited flow analysis) are
-set, then the array is regarded as \emph{closed}. Statements which
+set, then the array is regarded as \emph{closed}. 
+\katznote{the few lines before this in this paragraph have been repeated from earlier in the section.}  Statements that
 deal with the array as a whole will wait for the array to be closed
 before executing. An example of such an action is the expansion of the array values into an app command line.
-Thus, the closing of an array is the equivalent to setting an atomic variable, with respect to any statement that was waiting for the array itself to get a value. However, a \verb|foreach| statement
+Thus, the closing of an array is equivalent to setting a future variable, with respect to any statement that was waiting for the array itself to be assigned a value. However, a \verb|foreach| statement
 will apply its body of statements to elements of an array, as they are set to a value. It
 will not wait until the array is closed. In practice this type of ``pipelining'' gives Swift scripts a high degree of parallelism at run time.
 
-Because of simplicity and regularity of the Swift data model, a high degree of implicit parallelism is achieved. For example, a foreach() statement that processes an array returned by a function may begin processing members of the returned array that have been already set, even before the function completes and returns. This yields programs that are very heavily pipelined with significant overlapping parallel activities.
+Because of the simplicity and regularity of the Swift data model, a high degree of implicit parallelism is achieved. For example, a foreach() statement that processes an array returned by a function may begin processing members of the returned array that have already been set, even before the entire function completes and returns. This yields programs that are very heavily pipelined with significant overlapping parallel activities.
 
 Consider the script below:
 \begin{verbatim}
@@ -835,7 +838,7 @@
 set.
 
 Collections of files can be mapped to complex types (arrays and structures)
-using a variety of built-in mappers. For example, the \verb|simple mapper| used in this expression will
+using a variety of built-in mappers. For example, the \verb|simple mapper| used in the next expression will
 map the files \verb|data.p| and \verb|data.m| to the variable members
 \verb|m.h| and \verb|m.v| respectively:
 
@@ -891,7 +894,7 @@
 portability), run in any particular order with respect to other
 application invocations in a script (except those implied by data
 dependency), or that their working directories will or will not be
-cleaned up after execution.
+cleaned up after execution. \katznote{say something about apps should not cause side-effects?}
 
 Consider the following \verb|app| declaration for the \verb|rotate|
 function:
@@ -917,8 +920,8 @@
 executed when the function is invoked. The first token (in this case
 \verb|convert|) defines a \emph{transformation name} which is used to
 determine the executable name. Subsequent expressions define the command-line arguments for that executable:
-``\verb|-rotate|'' is a string literal; angle specifies the value of the
-angle parameter; the syntax \verb|@variable| evaluates to the filename
+``\verb|-rotate|'' is a string literal;  \verb|angle| specifies the value of the
+angle parameter; and the syntax \verb|@variable| evaluates to the filename
 of the supplied variable, thus \verb|@input| and \verb|@output|
 evaluate to the filenames of the corresponding parameters. It should
 be noted that it is possible to take the filename of \verb|output|
@@ -936,7 +939,7 @@
 and
 remote file transfer and data management.
 Both remote execution and data transfer and management functions are provided through generalized
-abstracted interfaces called ``providers''.
+abstracted interfaces called \emph{providers}.
 Data providers enable data transfer and management to be performed through a wide variety of protocols including direct local copying, GridFTP, HTTP, WebDAV, SCP, and FTP.
 Execution providers enable job execution to take place using direct POSIX process fork, Globus GRAM, Condor (and Condor-G), PBS, SGE, and SSH.
 The Swift execution model can thus be extended by