[Swift-commit] r3856 - text/parco10submission

Wed Jan 5 14:27:12 CST 2011

Author: dsk
Date: 2011-01-05 14:27:12 -0600 (Wed, 05 Jan 2011)
New Revision: 3856

Modified:
   text/parco10submission/paper.pdf
   text/parco10submission/paper.tex
Log:
changes merging intro and rationale - in progress


Modified: text/parco10submission/paper.pdf
===================================================================
(Binary files differ)

Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-05 19:37:04 UTC (rev 3855)
+++ text/parco10submission/paper.tex	2011-01-05 20:27:12 UTC (rev 3856)
@@ -136,12 +136,27 @@
 abstracts notions of processes and external data for distributed
 parallel execution of application programs. Swift scripts are location-independent and automatically parallelized by exploiting the maximal concurrency permitted by their data dependencies and by resource availability.
 
+\katznote{integrate with previous paragraph}
+Swift is implicitly parallel and distributed, in that the user does not explicitly code either parallel behavior or synchronization (or mutual exclusion), and does not code explicit transfer of files to the execution sites of jobs and back. In fact, no knowledge of runtime execution locations is directly specified in a Swift script. The function model on which Swift is based ensures that execution of Swift scripts is deterministic, thus simplifying the scripting process.
+
+\katznote{fix this}
+Having the results of a Swift script be independent of the way that its function invocations
+are parallelized implies that the functions must, for the same input,
+produce the same output, irrespective of the time, order or location in
+which they are ``executed''. %This characteristic is reminiscent of
+%referential transparency, and one may readily extend the concept to
+%encompass arbitrary processes without difficulty.
+
 As a language, Swift is simpler than most scripting languages because it does not replicate the capabilities that existing scripting languages like Perl, Python, and shells do very well, but instead makes it easy to call such scripts as small applications.
 % say: it has fewer statements, limited data types and a compact library of useful support primitives. It can be extended using built-in functions coded in Java, and by mappers coded as Java built-ins or as external scripts. These functions execute in parallel as part of expression evaluation in the same mapper as externally called application programs or scripts do.
 Swift can execute scripts that perform tens of thousands of program
 invocations on highly parallel resources, and handle the unreliable
 and dynamic aspects of wide-area distributed resources. Such issues are handled by Swift's runtime system, and are not manifest in the user's scripts.
 
+%keep: discuss kthread/function duality; dont confuse with the parameter issue?
+Swift enables users to specify process composition by representing processes as functions, where input data files and process parameters become function parameters and output data files become function return values. \katznote{these are really Karajan threads - forward link to that?  - what do we call these in this paper?} A Swift script is a graph of function calls - each function call ... see section 3.
+%keep:
+
 The Swift language
 provides a high level representation of collections of data and a
 specification of how those collections are to be mapped to that
@@ -150,13 +165,50 @@
 external programs on clusters, grids and other parallel platforms, providing
 automated site selection, data management, and reliability.
 
+
+%keep
+The exact number of processing units available on such shared resources
+varies with time. In order to take advantage of as many processing units
+as possible during the execution of a Swift program, it is necessary to
+be flexible in the way the execution of individual processes is
+parallelized. 
+
+We choose to make the Swift language purely functional (i.e., all operations
+have a well-defined set of inputs and outputs, all variables are write-once,
+and side effects are disallowed in the language) in order to prevent the difficulties that
+arise from having to track side effects to ensure determinism in complex
+concurrency scenarios.
+Functional programming allows consistent
+implementations of evaluation strategies different from the widespread
+eager evaluation, as seen in lazily evaluated languages
+such as Haskell~\cite{Haskell}.
+ 
+In order to achieve automatic
+parallelization, Swift is based on the synchronization construct of \emph{futures}~\cite{Futures}, which
+can result in abundant parallelism. Every Swift variable (including every member of structures and arrays) is a future.
+Using a futures-based evaluation strategy has an enormous benefit:
+automatic parallelization is achieved without the need for
+dependency analysis, which would significantly complicate the Swift implementation.
+
+We believe that the missing feature in current scripting languages is
+sufficient specification and encapsulation of inputs to, and outputs
+from, a given application, such that an execution environment could
+automatically make remote execution transparent.
+
+Without this,
+achieving location transparency %and automated parallel execution
+is not feasible.  Swift adds to scripting what the remote procedure call (RPC) paradigm
+\cite{RPC} adds to programming: by formalizing the inputs and outputs of
+applications that have been declared as app() functions, it provides a way to make the remote
+execution of applications transparent.
+
 Swift has been described previously~\cite{Swift_2007};
 this paper goes into greater depth in describing the parallel aspects of the Swift language, how
 its implementation handles large-scale and distributed execution
 environments, and its contribution to distributed and parallel programming models.
 
 The remainder of this paper is organized as follows.
-Section~\ref{Rationale} explains the motivation for the Swift programming model.
+%Section~\ref{Rationale} explains the motivation for the Swift programming model.
 In Section~\ref{Language} we present the major concepts and language
 structure of Swift. Section~\ref{Execution} provides details of the
 implementation, including the distributed architecture that enables
@@ -167,96 +219,58 @@
 ongoing and future work in the Swift project, and we offer concluding
 remarks in Section~\ref{Conclusion}.
 
-\section{Rationale for the Swift programming model}
-\label{Rationale}
+%\section{Rationale for the Swift programming model}
+%\label{Rationale}
 
 %%% \begin{msection}
 
 % said already:
-The main goal of Swift is to allow the composition of coarse grained
-processes, and to parallelize and manage the execution of scripts
-on distributed collections of parallel resources.
+%The main goal of Swift is to allow the composition of coarse grained
+%processes, and to parallelize and manage the execution of scripts
+%on distributed collections of parallel resources.
 
-%keep:
-Swift is implicitly parallel and distributed, in that the user does not explicitly code either parallel behavior or synchronization (or mutual exclusion); does not code explicit data transfer of files to the execution sites of jobs and back. In fact no knowledge of runtime execution locations is directly specified in a Swift script. The function model on which Swift is based ensures that execution of Swift scripts is deterministic, thus simplifying the scripting process.
 
-%adjust: address degrees of determinism
-Having the results of a Swift script be independent of the way that its function invocations
-are parallelized implies that the functions must, for the same input,
-produce the same output, irrespective of the time, order or location in
-which they are ``executed''. This characteristic is reminiscent of
-referential transparency, and one may readily extend the concept to
-encompass arbitrary processes without difficulty.
+% consider: where to define kthreads and jthreads; how to describe the function/process duality; where to discuss the implementation
 
-%keep: discuss kthread/function duality; dont confuse with the parameter issue?
-Swift enables users to specify process composition by representing processes as functions, with input data files and process parameters become function parameters and output data files become function return values.
-%keep
-The exact number of processing units available on such shared resources
-varies with time. In order to take advantage of as many processing units
-as possible during the execution of a Swift program, it is necessary to
-be flexible in the way the execution of individual processes is
-parallelized. 
+%keep: how best to state process/function duality?  Each invocation of a function is a process; all functions run in parallel; foreach loops are unfolded and run in parallel; essentially the entire program is unfolded. (Note: iterate stops this behavior and is thus useful; address scalability issues of this and future graph partitioning; how throttling keeps this manageable.)
 
-% consider: where to define kthreads and jthreads; how to describe the function/process duality; where to discuss the implementation
+%This duality allows the formal specification of process behavior. In the following Swift statement, the semantics are defined in terms of the specification of the function
+%``rotate'' when supplied with specific parameter types:
+%\begin{verbatim}
+%  rotatedImage = rotate(image, angle);
+%\end{verbatim}
 
-%keep: how best to state process/function duality?  Each invocationu of a function is a process; all functions run in parallel; foreach loops are unfolded and run in parallel; essentinally the entire program is unfolded. (Note: itereate stops this behavior and is thus useful; address scalability issues of this and future graph partioning; how throttling keeps this manageable.
-This duality allows the formal specification of process behavior. In the following Swift statement, the semantics are defined in terms of the specification of the function
-``rotate'' when supplied with specific parameter types:
-\begin{verbatim}
-  rotatedImage = rotate(image, angle);
-\end{verbatim}
-
 %Q: should we have any code examples in the intro?  eg: 1 call, 1 foreach?
 
-\hide{
 % and whether the
 % implementation can be described as a ``library call'' or a ``program
 % invocation'' changes nothing with respect to what the piece of program
 % fundamentally does: produce a rotated version of the original. 
 
-Indeed, there is no strict requirement in the specification of the Swift
-language dictating that functions be implemented as command-line
-applications. They can equally consist of library calls or functions
-written in Swift itself, as long as they are side-effect free. 
+%Indeed, there is no strict requirement in the specification of the Swift
+%language dictating that functions be implemented as command-line
+%applications. They can equally consist of library calls or functions
+%written in Swift itself, as long as they are side-effect free. 
 
 %A soft restriction arises from the desire to distribute the execution of
 %functions across a collection of heterogeneous resources, which, with
 %the advent of projects such as TeraGrid, suggests an implementation in
 %which functions are applications readily executable on them through the
 %careful employment of grid middleware.
-}
 
-%keep:
-Note that some Swift scripts are specified as library calls.
-
 %decide: is referential transparency relevant?
-Having established the constraint that Swift functions must in general
-be referentially transparent, and in order to preserve referential
-transparency at different levels of abstractions within the language, it
-follows that the appropriate form for the Swift language is functional.
+%Having established the constraint that Swift functions must in general
+%be referentially transparent, and in order to preserve referential
+%transparency at different levels of abstractions within the language, it
+%follows that the appropriate form for the Swift language is functional.
 
 %keep: discuss determinism, side effects, referential transparency, and interleaving???
-%I think the KEY aspect of "functional" is (a) in-out tracking for distribtability and side effect management and (b) the write-once-future model for all data.
 
-We choose to make the Swift language purely functional (i.e., we disallow
-side effects in the language) in order to prevent the difficulties that
-arise from having to track side effects to ensure determinism in complex
-concurrency scenarios.
+%I think the KEY aspect of "functional" is (a) in-out tracking for distributability and side effect management and (b) the write-once-future model for all data.
 
 %discuss: is lazy vs eager relevant? What does it really mean to swift?
-Functional programming allows consistent
-implementations of evaluation strategies different from the widespread
-eager evaluation, as seen in lazily evaluated languages
-such as Haskell \cite{Haskell}.
- 
-%Keep: KEY:
-In order to achieve automatic
-parallelization in Swift is based on the synchronization construct of \emph{futures}\cite{Futures}, which
-results in eager parallelism. Every Swift variable (including every members of structures and arrays) is a write-once future.
+
 % consider: In this process, we trade the ability to efficiently deal with infinite structures for the ability to minimize computation time. I think this pertains to the "unroll everything" strategy.
-Using a futures-based evaluation strategy has an enormous benefit:
-automatic parallelization is achieved without the need for
-dependency analysis, which would significantly complicate the Swift implementation.
 
 \hide{A number of issues may be noted at this point. First, there exist a
 certain class of processes that may break referential transparency,
@@ -284,18 +298,7 @@
 
 %%% vvvvv This is rationale and dovetails with the functional model parts above:
 
-We believe that the missing feature in current scripting languages is
-sufficient specification and encapsulation of inputs to, and outputs
-from, a given application, such that an execution environment could
-automatically make remote execution transparent.
 
-Without this,
-achieving location transparency and automated parallel execution is
-not feasible.  Swift adds to scripting what the remote procedure call (RPC) paradigm
-\cite{RPC} adds to programming: by formalizing the inputs and outputs of
-applications that have been declared as app() functions, it provides a way to make the parallel and remote
-execution of applications transparent.
-
 %%% ^^^^^
 
 \section{The Swift language}
@@ -310,7 +313,8 @@
 \begin{description}
 \item[External functions] (also called ``atomic'') are functions whose
 implementations are not written in Swift. Currently external functions
-are implemented as command-line applications.
+are implemented as command-line applications.\footnote{Note that some Swift scripts are specified as library calls.}
+
 \item[Internal functions] (also called ``compound'') are functions 
 implemented in Swift.
 \end{description}