[Swift-commit] r3862 - text/parco10submission

Wed Jan 5 17:22:00 CST 2011

Author: wozniak
Date: 2011-01-05 17:22:00 -0600 (Wed, 05 Jan 2011)
New Revision: 3862

Modified:
   text/parco10submission/paper.tex
Log:
Stake out Performance section


Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-05 23:15:41 UTC (rev 3861)
+++ text/parco10submission/paper.tex	2011-01-05 23:22:00 UTC (rev 3862)
@@ -69,7 +69,7 @@
 
 \begin{abstract}
 
-The work of scientists, engineers and statisticians often requires executing domain-specific application programs a 
+The work of scientists, engineers and statisticians often requires executing domain-specific application programs a
 massive number of times on large collections of file-based data.  This  process requires complex data management to pass data to, from, and between application invocations. Distributed and
 parallel computing resources can greatly speed up
 such processing, but their use increases the complexity of the programming effort and presents new barriers. The Swift parallel scripting language reduces these complexities with a
@@ -102,14 +102,14 @@
 %It is intended to serve as a higher level framework for composing the interaction of
 %concurrently executing programs (even parallel ones) and scripts
 %written in other scripting languages.
-Swift %is a higher-level language that 
+Swift %is a higher-level language that
 focuses not on the details of executing sequences or
 pipelines of scripts and programs (even parallel ones), but rather on the issues that arise
 from the concurrent execution, composition, and coordination of many independent computational tasks at
 large scale.
 %
 Swift scripts express the execution of programs to produce datasets using a C-like syntax
-consisting of function definitions and expressions, but with 
+consisting of function definitions and expressions, but with
 dataflow-driven semantics and implicit parallelism.
 
 %The emergence of large-scale production computing infrastructure such
@@ -161,7 +161,7 @@
 varies with time. In order to take advantage of as many processing units
 as possible during the execution of a Swift program, it is necessary to
 be flexible in the way the execution of individual processes is
-parallelized. 
+parallelized.
 Swift exploits the maximal concurrency permitted by data dependencies and by resource availability.
 
 %keep: discuss kthread/function duality; dont confuse with the parameter issue?
@@ -169,7 +169,7 @@
 %
 % Mihael thinks that the graph of function calls is not quite appropriate since this may
 % project an overly simplified view. Swift allows
-% this graph to be built dynamically. For that matter, any parallel program will end up 
+% this graph to be built dynamically. For that matter, any parallel program will end up
 % having a trace that can be represented as a graph. And any purely functional program
 % will have a graph representing dependencies (and again, that graph may be a non-trivial
 % thing built at run-time). It so happens that, using futures, we make the dependency
@@ -192,7 +192,7 @@
 implementations of evaluation strategies different from the widespread
 eager evaluation, as seen in lazily evaluated languages
 such as Haskell~\cite{Haskell}.
- 
+
 In order to achieve automatic
 parallelization, Swift is based on the synchronization construct of \emph{futures}~\cite{Futures}, which
 can result in abundant parallelism. Every Swift variable (including every member of structures and arrays) is a future.
@@ -202,7 +202,7 @@
 
 % Mihael thinks it's more powerful to say that dependency analysis
 % is a complex issue for any non-functional language. It is not that
-% some swift constraints make dependency analysis complex, but 
+% some swift constraints make dependency analysis complex, but
 % the very idea of dependency analysis when allowing side-effects
 % is complex.
 
@@ -251,7 +251,7 @@
 %keep: how best to state process/function duality?  Each invocation of a function is a process; all functions run in parallel; foreach loops are unfolded and run in parallel; essentially the entire program is unfolded. (Note: iterate stops this behavior and is thus useful; address scalability issues of this and future graph partitioning; how throttling keeps this manageable.
 
 % re iterate. foreach can also stop this behavior if there are inter-iteration dependencies
-% 
+%
 
 %This duality allows the formal specification of process behavior. In the following Swift statement, the semantics are defined in terms of the specification of the function
 %``rotate'' when supplied with specific parameter types:
@@ -264,12 +264,12 @@
 % and whether the
 % implementation can be described as a ``library call'' or a ``program
 % invocation'' changes nothing with respect to what the piece of program
-% fundamentally does: produce a rotated version of the original. 
+% fundamentally does: produce a rotated version of the original.
 
 %Indeed, there is no strict requirement in the specification of the Swift
 %language dictating that functions be implemented as command-line
 %applications. They can equally consist of library calls or functions
-%written in Swift itself, as long as they are side-effect free. 
+%written in Swift itself, as long as they are side-effect free.
 
 %A soft restriction arises from the desire to distribute the execution of
 %functions across a collection of heterogeneous resources, which, with
@@ -308,7 +308,7 @@
 processes. Under the assumption that eager evaluation of compositions of
 Monte Carlo processes also produces valid results, an eager Swift
 implementation (which is the case with the current implementation)
-readily accommodates Monte Carlo processes. 
+readily accommodates Monte Carlo processes.
 
 However, further discussion
 is necessary if optimizations (such as memoization) are employed.
@@ -338,14 +338,14 @@
 
 \subsection{Language facilities}
 
-At the core of the Swift language are function definitions, of which 
+At the core of the Swift language are function definitions, of which
 two types exist:
 \begin{description}
 \item[External functions] (also called ``atomic'') are functions whose
 implementations are not written in Swift. Currently external functions
 are implemented as command-line applications\footnote{Note that some Swift scripts are specified as library calls.}.
 
-\item[Internal functions] (also called ``compound'') are functions 
+\item[Internal functions] (also called ``compound'') are functions
 implemented in Swift.
 \end{description}
 
@@ -360,7 +360,7 @@
 }
 %%%
 
-The \emph{if} and \emph{switch} statements are rather standard, but 
+The \emph{if} and \emph{switch} statements are rather standard, but
 \emph{foreach} merits more discussion. Similar to \emph{Go}~\cite{GOLANG}
 and \emph{Python}, its control ``variables'' can be both
 an index and a value. The syntax is as follows:
@@ -371,7 +371,7 @@
 }
 \end{verbatim}
 
-This is necessary because Swift does not allow the use of mutable state 
+This is necessary because Swift does not allow the use of mutable state
 (i.e., variables are single-assignment).  Therefore, one is not able
 to write statements such as \verb|i = i + 1|.
 
@@ -383,7 +383,7 @@
 provided by the Swift runtime. Standard operators are defined for
 primitive types, such as addition, multiplication, concatenation, etc.
 
-\item[Mapped types] are types of data for which some external 
+\item[Mapped types] are types of data for which some external
 implementation exists. Swift provides a mechanism to describe
 isomorphisms between instances of Swift data structures and subsets in
 the  external implementation. This mechanism is called ``mapping''  and
@@ -846,7 +846,7 @@
 component program atomicity on data output.
 
 \katznote{this previous sentence
-has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.} 
+has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.}
 
 This can add substantial
 responsibility to component programs, in exchange for allowing arbitrary
@@ -1115,10 +1115,10 @@
 will fail, ultimately resulting in the entire script failing.
 
 In such a case, Swift provides a \emph{restart log} that encapsulates
-which function invocations have been successfully completed. 
+which function invocations have been successfully completed.
 %%%%%% What manual interv. and why???
 After
-appropriate manual intervention, 
+appropriate manual intervention,
 a subsequent Swift run may be started
 with this restart log; this will avoid re-execution of already
 executed invocations.
@@ -1190,7 +1190,7 @@
 Using Swift to submit to a large number of sites poses a number of
 practical challenges that are not encountered when running on a small
 number of sites. These challenges are seen when comparing execution on
-the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the 
+the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the
 more dynamic Open Science
 Grid (OSG)~\cite{OSG_2007}, where the set of sites that may be used is
 large and changing. It is impractical to maintain a site catalog by
@@ -1662,6 +1662,11 @@
 
 \end{verbatim}
 
+\section{Performance Characteristics}
+\label{Performance}
+
+
+
 \section{Related Work}
 \label{Related}
 
@@ -1720,7 +1725,7 @@
 \begin{itemize}
 
 \item Programming model: MapReduce only supports key-value pairs as
-  input or output datasets and two types of computation functions, 
+  input or output datasets and two types of computation functions,
   map and reduce; Swift provides a type system and allows the
   definition of complex data structures and arbitrary computational
   procedures.
@@ -1780,7 +1785,7 @@
 above Dryad, but it doesn't seem to be supported currently.  It appears to have been
 used for clusters and well-connected groups of clusters in a single administrative domain,
 unlike Swift, which supports a wider variety of platforms.  Also related is DryadLINQ~\cite{DryadLINQ},
-which generates Dryad computations from the LINQ extensions to C\#. 
+which generates Dryad computations from the LINQ extensions to C\#.
 
 GEL~\cite{GEL} is somewhat similar to Swift.  It defines programs to be run, then
 uses a script to express the order in which they should be run, handling the needed
@@ -1795,7 +1800,7 @@
 A few groups have been working on parallel and distributed versions of make~\cite{GXPmake, makeflow}.  These tools use the concept of virtual data, where the user defines the processing by which data is created, then calls for the final data product.  The make-like tools determine what processing is needed to get from the existing files to the final product, which includes
 running processing tasks.  If this is run on a distributed system, data movement also must
 be handled by the tools. In comparison, Swift is a language, which may be slightly
-less compact for describing applications that can be represented as static DAGs, but 
+less compact for describing applications that can be represented as static DAGs, but
 also allows easy programming of applications that have cycles and runtime decisions,
 such as in optimization problems.
 
@@ -1839,7 +1844,7 @@
 scheduling Coasters workers using the standard job submission
 techniques and employing an internal IP network.
 
-\mikenote{In order to achieve automatic parallelization in Swift, instead of using thunks (i.e., suspended computations), which yield lazy 
+\mikenote{In order to achieve automatic parallelization in Swift, instead of using thunks (i.e., suspended computations), which yield lazy
 evaluation, we employ futures, which result in eager parallelism. In this process, we trade the ability to efficiently deal with infinite structures for the ability to minimize computation time. It must, however, be noted that a middle ground exists: lazy futures (futures whose computation is delayed until a value is first needed).}
 
 \subsection{Filesystem access optimizations}