[Swift-commit] r3866 - text/parco10submission

Wed Jan 5 18:12:04 CST 2011

Author: wilde
Date: 2011-01-05 18:12:04 -0600 (Wed, 05 Jan 2011)
New Revision: 3866

Modified:
   text/parco10submission/paper.tex
Log:
Revised Language section intro and Data Model subsection.

Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-05 23:59:34 UTC (rev 3865)
+++ text/parco10submission/paper.tex	2011-01-06 00:12:04 UTC (rev 3866)
@@ -334,18 +334,69 @@
 \section{The Swift language}
 \label{Language}
 
-%%% \begin{msection}
+Swift is by design a sparse, minimal scripting
+language that executes external programs remotely and in parallel.
+As such, Swift has only a very limited set of data
+types, operators, and built-in functions.
+Its simple, uniform data model is composed of a few atomic types (which can be simple scalar values or references to external files) and two collection types (arrays and structures).
 
-\subsection{Language facilities}
+Swift expresses the
+invocation of ``ordinary programs''---technically, POSIX {\tt exec()}
+operations---in a manner that explicitly specifies the files and command-line
+arguments that are the inputs of each program
+invocation.  It similarly expresses all output files that result from those invocations.
+This enables Swift to provide distributed, location-independent execution of external application programs.
 
-At the core of the Swift language are function definitions, of which
+The Swift parallel execution model is based on two concepts that are applied uniformly throughout the language. First, every Swift data element behaves like a \emph{future}. By ``data element'', we mean both the named variables within a function's environment, such as its local variables, parameters, and returns, and the individual elements of array and structure collections. Second, every expression in a Swift program is conceptually executed in parallel. Expressions (including function evaluations) wait for input values when they are required, and then set their result values as their computation proceeds. These fundamental concepts are discussed in more detail below.
+
+% can be thought of as a massively-parallel lazy (ie, on-demand, or just in time) evaluation - say later on?
+
+\subsection{Data model}
+
+Every data object in Swift is built up from atomic data elements, each of which contains three fields: a value, a state, and a queue of function invocations that are waiting for the value to be set.
+
+Variables are used in Swift to name the local variables, arguments, and returns of a function. Every Swift variable is assigned a concrete data type, based on a very simple type model (with no concepts of inheritance, abstraction, etc.). The outermost function in a Swift script (akin to ``main'' in C) is unique only in that the variables in its environment can be declared ``global'' to make them accessible to every other function in the script.
+
+Swift provides three basic classes of data types:
+
+\emph{Primitive types} are provided for integer, float, string, and boolean values by the Swift runtime. Common operators, such as arithmetic,
+concatenation, and explicit conversion, are defined for primitive types.
+An additional primitive type ``external'' is provided for manual synchronization.
+
+\emph{Mapped types} are data elements that refer, through a process called ``mapping'', to files external to the Swift script. These are the files that will be read and written by the external application programs called by Swift.
+The mapping process can map single variables to single files, and structures and arrays to collections of files.
+
+Primitive and mapped types are called \emph{atomic types}.
+
+\hide{Swift mapped types can be
+seen as generalizations of reference types in traditional languages in
+that reference types are language representations of data stored in
+internal memory, in contrast with primitive (value) types for which no
+explicit storage is generally specified.}
+
+\hide{There is no syntactic distinction between primitive types and fileRef
+types, and the semantic differences between the two classes of
+types are minimal.}
+% clarify minimal
+
+\hide{Atomic mapped types do not specify any information about the structure of
+the data. It is up to the user to assign a ``proper'' type to external
+data. Consequently Swift must and does implement nominal type equivalence.}
+
+\emph{Collection types} are provided in Swift by \emph{arrays} and \emph{structures}.
+Structure fields can be of any type, while the elements of an array must all be of a single type. Members of either kind of collection can be atomic or can themselves be collections, so structures and arrays can be nested recursively. Structures contain a fixed number of elements, whereas arrays contain a varying number of elements.
+
+Due to the dynamic nature of the execution model, Swift arrays have no notion of size. An array is considered ``closed'' when no further statements that set an element of the array can be executed. This state is recognized at run time using information obtained from compile-time analysis of the script's call graph.
+
+\subsection{Functions - Mihael}
+
+At the core of the Swift language are function definitions, of which 
 two types exist:
 \begin{description}
 \item[External functions] (also called ``atomic'') are functions whose
 implementations are not written in Swift. Currently external functions
-are implemented as command-line applications\footnote{Note that some Swift scripts are specified as library calls.}.
-
-\item[Internal functions] (also called ``compound'') are functions
+are implemented as command-line applications.
+\item[Internal functions] (also called ``compound'') are functions 
 implemented in Swift.
 \end{description}
 
@@ -360,9 +411,9 @@
 }
 %%%
 
-The \emph{if} and \emph{switch} statements are rather standard, but
-\emph{foreach} merits more discussion. Similar to \emph{Go}~\cite{GOLANG}
-and \emph{Python}, its control ``variables'' can be both
+The \emph{if} and \emph{switch} statements are rather standard, but 
+\emph{foreach} merits more discussion. Similar to \emph{Go}
+(\cite{GOLANG}) and \emph{Python}, its control ``variables'' can be both
 an index and a value. The syntax is as follows:
 
 \begin{verbatim}
@@ -371,103 +422,13 @@
 }
 \end{verbatim}
 
-This is necessary because Swift does not allow the use of mutable state
-(i.e., variables are single-assignment).  Therefore, one is not able
+This is necessary because Swift does not allow the use of mutable state 
+(i.e., variables are single-assignment); therefore, one would not be able
 to write statements such as \verb|i = i + 1|.
 
-\subsection{Data model}
 
-Swift provides two basic classes of data types:
-\begin{description}
-\item[Primitive types] (\emph{integer}, \emph{string}) are types
-provided by the Swift runtime. Standard operators are defined for
-primitive types, such as addition, multiplication, concatenation, etc.
-
-\item[Mapped types] are types of data for which some external
-implementation exists. Swift provides a mechanism to describe
-isomorphisms between instances of Swift data structures and subsets in
-the  external implementation. This mechanism is called ``mapping''  and
-specific instances of isomorphisms are called ``mappers''. Currently the
-only external implementation is a POSIX-like filesystem.
-
-However  the
-``external'' data type can be used to accommodate any external data that
-Swift cannot and should not directly handle.
-
-Swift mapped types can be
-seen as generalizations of reference types in traditional languages in
-that reference types are language representations of data stored in
-internal memory, in contrast with primitive (value) types for which no
-explicit storage is generally specified.
-\end{description}
-
-There is no syntactic distinction between primitive types and fileRef
-types, and the semantic differences between the two classes of
-types are minimal.
-% clarify minimal
-
-Data can be aggregated using two ``composite types'': \emph{arrays} and \emph{structures}.
-This can be done recursively in that arrays of arrays, structures
-containing structures,  arrays of structures and structures
-containing arrays can be created. Types that have no internal
-structure  (i.e. scalar and hence non-composite types) are called ``atomic types''.
-
-Atomic mapped types do not specify any information about the structure of
-the data. It is up to the user to assign a ``proper'' type to external
-data. Consequently Swift must and does implement nominal type equivalence.
-
-%%% \end{msection}
-
-The Swift programming model is data-oriented: it encapsulates the
-invocation of ``ordinary programs''---technically, POSIX {\tt exec()}
-operations---in a manner that explicitly specifies the files and other
-arguments that are the inputs and outputs of each program
-invocation. This formal but simple model enables Swift to provide
-several critical characteristics not provided by, nor readily
-implemented in, existing scripting languages such as Perl, Python, or
-shells.
-% mention that python comes close with futures and decorators.
-Notable features include:
-
-\begin{itemize}
-
-\item Location transparent execution: automatically selection of a
-  location for each program invocation and management of diverse execution
-  environments. A Swift script can be tested on a single local
-  workstation, and then the same script can be executed on a cluster, one
-  or more grids of clusters, and/or on large scale parallel
-  supercomputers such as the Sun
-  Constellation~\cite{SunConstellation_2008} or the IBM Blue
-  Gene/P~\cite{BGP_2008}.
-
-\item Automatic parallelization of program invocations: parallel invocation
-  of programs that have no data dependencies.
-
-\item Automatic balancing of work over available resources, based
-on adaptive algorithms that account for both resource performance
-and reliability, and that throttle program invocations at a rate
-appropriate for each execution location and mechanism.
-
-\item Reliability, through replication and automatic resubmission of
-  failed executions and restarting of interrupted scripts from the point
-  of failure.
-
-\item Formalizing the creation and management of data objects in the
-  language and recording the provenance of data objects produced by a
-  Swift script.
-
-\end{itemize}
-
-Swift is intentionally designed to be a sparse, minimal scripting
-language. Its sole purpose is to sequence and schedule the execution
-of other programs. As such, Swift has only a very limited set of data
-types, operators, and built-in functions. The essence of the Swift
-language, which makes the benefits above possible, can be summarized
-as follows:
-
 \subsection{Language basics}
 
-
 A Swift script describes data, application components, invocations
 of applications components, and the inter-relations (data flow)
 between those invocations, using a C-like syntax.
@@ -846,7 +807,7 @@
 component program atomicity on data output.
 
 \katznote{this previous sentence
-has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.}
+has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.} 
 
 This can add substantial
 responsibility to component programs, in exchange for allowing arbitrary
@@ -944,9 +905,43 @@
 structured Swift variable, can represent a large, structured data
 set.
 
-\subsection{The execution environment for component programs}
+\subsection{Swift runtime environment}
+
 \label{LanguageEnvironment}
 
+Notable runtime features include:
+
+\begin{itemize}
+
+\item Location-transparent execution: automatic selection of a
+  location for each program invocation and management of diverse execution
+  environments. A Swift script can be tested on a single local
+  workstation, and then the same script can be executed on a cluster, one
+  or more grids of clusters, and/or on large-scale parallel
+  supercomputers such as the Sun
+  Constellation~\cite{SunConstellation_2008} or the IBM Blue
+  Gene/P~\cite{BGP_2008}.
+
+\item Automatic parallelization of program invocations: parallel invocation
+  of programs that have no data dependencies.
+
+\item Automatic balancing of work over available resources, based
+on adaptive algorithms that account for both resource performance
+and reliability, and that throttle program invocations at a rate
+appropriate for each execution location and mechanism.
+
+\item Reliability, through replication and automatic resubmission of
+  failed executions and restarting of interrupted scripts from the point
+  of failure.
+
+\item Formalizing the creation and management of data objects in the
+  language and recording the provenance of data objects produced by a
+  Swift script.
+
+\end{itemize}
+
+
+
 A Swift \verb|app| declaration describes how a component program
 is invoked. In order to ensure the correctness of the Swift model, the
 environment in which programs are executed needs to be constrained.
@@ -1000,7 +995,7 @@
 \section{Execution}
 \label{Execution}
 
-Swift is implemented by compiling to a Karajan program~\cite{Karajan}, which provides
+Swift is implemented by compiling to a Karajan program\cite{Karajan}, which provides
 several benefits: a lightweight threading model,
 futures,
 remote job execution,
@@ -1115,10 +1110,10 @@
 will fail, ultimately resulting in the entire script failing.
 
 In such a case, Swift provides a \emph{restart log} that encapsulates
-which function invocations have been successfully completed.
+which function invocations have been successfully completed. 
 %%%%%% What manual interv. and why???
 After
-appropriate manual intervention,
+appropriate manual intervention, 
 a subsequent Swift run may be started
 with this restart log; this will avoid re-execution of already
 executed invocations.
@@ -1190,7 +1185,7 @@
 Using Swift to submit to a large number of sites poses a number of
 practical challenges that are not encountered when running on a small
 number of sites. These challenges are seen when comparing execution on
-the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the
+the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the 
 more dynamic Open Science
 Grid (OSG)~\cite{OSG_2007}, where the set of sites that may be used is
 large and changing. It is impractical to maintain a site catalog by
@@ -1791,7 +1786,7 @@
 Dryad graphs are explicitly developed by the programmer; Swift graphs are implicit and the programmer doesn't worry about them. A tool called Nebula was originally developed
 above Dryad, but it doesn't seem to be supported currently.  It appears to have been
 used for clusters and well-connected groups of clusters in a single administrative domain,
-unlike Swift supports a wider variety of platforms.  Also related is DryadLINQ~\cite{DryadLINQ},
+while Swift supports a wider variety of platforms.  Also related is DryadLINQ~\cite{DryadLINQ},
 which generates Dryad computations from the LINQ extensions to C\#.
 
 GEL~\cite{GEL} is somewhat similar to Swift.  It defines programs to be run, then