[Swift-commit] r3862 - text/parco10submission
noreply at svn.ci.uchicago.edu
Wed Jan 5 17:22:00 CST 2011
Author: wozniak
Date: 2011-01-05 17:22:00 -0600 (Wed, 05 Jan 2011)
New Revision: 3862
Modified:
text/parco10submission/paper.tex
Log:
Stake out Performance section
Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex 2011-01-05 23:15:41 UTC (rev 3861)
+++ text/parco10submission/paper.tex 2011-01-05 23:22:00 UTC (rev 3862)
@@ -69,7 +69,7 @@
\begin{abstract}
-The work of scientists, engineers and statisticians often requires executing domain-specific application programs a
+The work of scientists, engineers and statisticians often requires executing domain-specific application programs a
massive number of times on large collections of file-based data. This process requires complex data management to pass data to, from, and between application invocations. Distributed and
parallel computing resources can greatly speed up
such processing, but their use increases the complexity of the programming effort and presents new barriers. The Swift parallel scripting language reduces these complexities with a
@@ -102,14 +102,14 @@
%It is intended to serve as a higher level framework for composing the interaction of
%concurrently executing programs (even parallel ones) and scripts
%written in other scripting languages.
-Swift %is a higher-level language that
+Swift %is a higher-level language that
focuses not on the details of executing sequences or
pipelines of scripts and programs (even parallel ones), but rather on the issues that arise
from the concurrent execution, composition, and coordination of many independent computational tasks at
large scale.
%
Swift scripts express the execution of programs to produce datasets using a C-like syntax
-consisting of function definitions and expressions, but with
+consisting of function definitions and expressions, but with
dataflow-driven semantics and implicit parallelism.
%The emergence of large-scale production computing infrastructure such
@@ -161,7 +161,7 @@
varies with time. In order to take advantage of as many processing units
as possible during the execution of a Swift program, it is necessary to
be flexible in the way the execution of individual processes is
-parallelized.
+parallelized.
Swift exploits the maximal concurrency permitted by data dependencies and by resource availability.
%keep: discuss kthread/function duality; dont confuse with the parameter issue?
@@ -169,7 +169,7 @@
%
% Mihael thinks that the graph of function calls is not quite appropriate since this may
% project an overly simplified view. Swift allows
-% this graph to be built dynamically. For that matter, any parallel program will end up
+% this graph to be built dynamically. For that matter, any parallel program will end up
% having a trace that can be represented as a graph. And any purely functional program
% will have a graph representing dependencies (and again, that graph may be a non-trivial
% thing built at run-time). It so happens that, using futures, we make the dependency
@@ -192,7 +192,7 @@
implementations of evaluation strategies different from the widespread
eager evaluation, as seen in lazily evaluated languages
such as Haskell~\cite{Haskell}.
-
+
In order to achieve automatic
parallelization, Swift is based on the synchronization construct of \emph{futures}~\cite{Futures}, which
can result in abundant parallelism. Every Swift variable (including every member of structures and arrays) is a future.
@@ -202,7 +202,7 @@
% Mihael thinks it's more powerful to say that dependency analysis
% is a complex issue for any non-functional language. It is not that
-% some swift constraints make dependency analysis complex, but
+% some swift constraints make dependency analysis complex, but
% the very idea of dependency analysis when allowing side-effects
% is complex.
@@ -251,7 +251,7 @@
%keep: how best to state process/function duality? Each invocation of a function is a process; all functions run in parallel; foreach loops are unfolded and run in parallel; essentially the entire program is unfolded. (Note: iterate stops this behavior and is thus useful; address scalability issues of this and future graph partitioning; how throttling keeps this manageable.
% re iterate. foreach can also stop this behavior if there are inter-iteration dependencies
-%
+%
%This duality allows the formal specification of process behavior. In the following Swift statement, the semantics are defined in terms of the specification of the function
%``rotate'' when supplied with specific parameter types:
@@ -264,12 +264,12 @@
% and whether the
% implementation can be described as a ``library call'' or a ``program
% invocation'' changes nothing with respect to what the piece of program
-% fundamentally does: produce a rotated version of the original.
+% fundamentally does: produce a rotated version of the original.
%Indeed, there is no strict requirement in the specification of the Swift
%language dictating that functions be implemented as command-line
%applications. They can equally consist of library calls or functions
-%written in Swift itself, as long as they are side-effect free.
+%written in Swift itself, as long as they are side-effect free.
%A soft restriction arises from the desire to distribute the execution of
%functions across a collection of heterogeneous resources, which, with
@@ -308,7 +308,7 @@
processes. Under the assumption that eager evaluation of compositions of
Monte Carlo processes also produces valid results, an eager Swift
implementation (which is the case with the current implementation)
-readily accommodates Monte Carlo processes.
+readily accommodates Monte Carlo processes.
However, further discussion
is necessary if optimizations (such as memoization) are employed.
@@ -338,14 +338,14 @@
\subsection{Language facilities}
-At the core of the Swift language are function definitions, of which
+At the core of the Swift language are function definitions, of which
two types exist:
\begin{description}
\item[External functions] (also called ``atomic'') are functions whose
implementations are not written in Swift. Currently external functions
are implemented as command-line applications\footnote{Note that some Swift scripts are specified as library calls.}.
-\item[Internal functions] (also called ``compound'') are functions
+\item[Internal functions] (also called ``compound'') are functions
implemented in Swift.
\end{description}
@@ -360,7 +360,7 @@
}
%%%
-The \emph{if} and \emph{switch} statements are rather standard, but
+The \emph{if} and \emph{switch} statements are rather standard, but
\emph{foreach} merits more discussion. Similar to \emph{Go}~\cite{GOLANG}
and \emph{Python}, its control ``variables'' can be both
an index and a value. The syntax is as follows:
@@ -371,7 +371,7 @@
}
\end{verbatim}
-This is necessary because Swift does not allow the use of mutable state
+This is necessary because Swift does not allow the use of mutable state
(i.e., variables are single-assignment). Therefore, one is not able
to write statements such as \verb|i = i + 1|.
@@ -383,7 +383,7 @@
provided by the Swift runtime. Standard operators are defined for
primitive types, such as addition, multiplication, concatenation, etc.
-\item[Mapped types] are types of data for which some external
+\item[Mapped types] are types of data for which some external
implementation exists. Swift provides a mechanism to describe
isomorphisms between instances of Swift data structures and subsets in
the external implementation. This mechanism is called ``mapping'' and
@@ -846,7 +846,7 @@
component program atomicity on data output.
\katznote{this previous sentence
-has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.}
+has a lot of stuff that hasn't been defined, and the next one is equally confusing at this point in the paper.}
This can add substantial
responsibility to component programs, in exchange for allowing arbitrary
@@ -1115,10 +1115,10 @@
will fail, ultimately resulting in the entire script failing.
In such a case, Swift provides a \emph{restart log} that encapsulates
-which function invocations have been successfully completed.
+which function invocations have been successfully completed.
%%%%%% What manual interv. and why???
After
-appropriate manual intervention,
+appropriate manual intervention,
a subsequent Swift run may be started
with this restart log; this will avoid re-execution of already
executed invocations.
@@ -1190,7 +1190,7 @@
Using Swift to submit to a large number of sites poses a number of
practical challenges that are not encountered when running on a small
number of sites. These challenges are seen when comparing execution on
-the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the
+the relatively static TeraGrid~\cite{TeraGrid_2005} with execution on the
more dynamic Open Science
Grid (OSG)~\cite{OSG_2007}, where the set of sites that may be used is
large and changing. It is impractical to maintain a site catalog by
@@ -1662,6 +1662,11 @@
\end{verbatim}
+\section{Performance Characteristics}
+\label{Performance}
+
+
+
\section{Related Work}
\label{Related}
@@ -1720,7 +1725,7 @@
\begin{itemize}
\item Programming model: MapReduce only supports key-value pairs as
- input or output datasets and two types of computation functions,
+ input or output datasets and two types of computation functions,
map and reduce; Swift provides a type system and allows the
definition of complex data structures and arbitrary computational
procedures.
@@ -1780,7 +1785,7 @@
above Dryad, but it doesn't seem to be supported currently. It appears to have been
used for clusters and well-connected groups of clusters in a single administrative domain,
unlike Swift, which supports a wider variety of platforms. Also related is DryadLINQ~\cite{DryadLINQ},
-which generates Dryad computations from the LINQ extensions to C\#.
+which generates Dryad computations from the LINQ extensions to C\#.
GEL~\cite{GEL} is somewhat similar to Swift. It defines programs to be run, then
uses a script to express the order in which they should be run, handling the needed
@@ -1795,7 +1800,7 @@
A few groups have been working on parallel and distributed versions of make~\cite{GXPmake, makeflow}. These tools use the concept of virtual data, where the user defines the processing by which data is created, then calls for the final data product. The make-like tools determine what processing is needed to get from the existing files to the final product, which includes
running processing tasks. If this is run on a distributed system, data movement also must
be handled by the tools. In comparison, Swift is a language, which may be slightly
-less compact for describing applications that can be represented as static DAGs, but
+less compact for describing applications that can be represented as static DAGs, but
also allows easy programming of applications that have cycles and runtime decisions,
such as in optimization problems.
@@ -1839,7 +1844,7 @@
scheduling Coasters workers using the standard job submission
techniques and employing an internal IP network.
-\mikenote{In order to achieve automatic parallelization in Swift, instead of using thunks (i.e., suspended computations), which yield lazy
+\mikenote{In order to achieve automatic parallelization in Swift, instead of using thunks (i.e., suspended computations), which yield lazy
evaluation, we employ futures, which result in eager parallelism. In this process, we trade the ability to efficiently deal with infinite structures for the ability to minimize computation time. It must, however, be noted that a middle ground exists: lazy futures (futures whose computation is delayed until a value is first needed).}
\subsection{Filesystem access optimizations}