[Swift-commit] r3857 - text/parco10submission
noreply at svn.ci.uchicago.edu
Wed Jan 5 14:52:25 CST 2011
Author: dsk
Date: 2011-01-05 14:52:24 -0600 (Wed, 05 Jan 2011)
New Revision: 3857
Modified:
text/parco10submission/paper.pdf
text/parco10submission/paper.tex
Log:
finished cut at merging intro and rationale sections together.
Modified: text/parco10submission/paper.pdf
===================================================================
(Binary files differ)
Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex 2011-01-05 20:27:12 UTC (rev 3856)
+++ text/parco10submission/paper.tex 2011-01-05 20:52:24 UTC (rev 3857)
@@ -97,23 +97,25 @@
Swift is a scripting language designed for composing
application programs into distributed,
-parallelized applications for execution on clusters, grids, clouds and supercomputers
-with tens to hundreds of thousands of processors. It is intended to
-serve as a higher level framework for composing the interaction of
-concurrently executing programs (even parallel ones) and scripts written in other scripting languages. Swift
-scripts express the execution of programs to produce datasets using a C-like syntax
+parallelized applications for execution on clusters, grids, clouds, and supercomputers
+with tens to hundreds of thousands of processors.
+%It is intended to serve as a higher level framework for composing the interaction of
+%concurrently executing programs (even parallel ones) and scripts
+%written in other scripting languages.
+Swift %is a higher-level language that
+focuses not on the details of executing sequences or
+pipelines of scripts and programs (even parallel ones), but rather on the issues that arise
+from the concurrent execution, composition, and coordination of many independent computational tasks at
+large scale.
+%
+Swift scripts express the execution of programs to produce datasets using a C-like syntax
consisting of function definitions and expressions, but with
dataflow-driven semantics and implicit parallelism.
-The emergence of large-scale production computing infrastructure such
-as clusters, grids and supercomputers, and the
-inherent complexity of programming on these systems, necessitates a
-new approach.
-Swift is a higher-level language
-that focuses not on the details of executing sequences or
-pipelines of programs, but rather on the issues that arise
-from the concurrent execution, composition, and coordination of many independent computational tasks at
-large scale.
+%The emergence of large-scale production computing infrastructure such
+%as clusters, grids and supercomputers, and the
+%inherent complexity of programming on these systems, necessitates a
+%new approach.
While many application needs involve the execution of a single large
message-passing parallel program, many others require the coupling or
@@ -134,16 +136,15 @@
glued together and executed in parallel at large scale.
It regularizes and
abstracts notions of processes and external data for distributed
-parallel execution of application programs. Swift scripts are location-independent and automatically parallelized by exploiting the maximal concurrency permitted by their data dependencies and by resource availability.
+parallel execution of application programs.
-\katznote{integrate with previous paragraph}
-Swift is implicitly parallel and distributed, in that the user does not explicitly code either parallel behavior or synchronization (or mutual exclusion); does not code explicit data transfer of files to the execution sites of jobs and back. In fact no knowledge of runtime execution locations is directly specified in a Swift script. The function model on which Swift is based ensures that execution of Swift scripts is deterministic, thus simplifying the scripting process.
-
-\katznote{fix this}
+Swift is implicitly parallel and distributed (or location-independent), in that the user does not explicitly code either parallel behavior or synchronization (or mutual exclusion) and does not code explicit transfer of files to and from execution sites. In fact, no knowledge of runtime execution locations is directly specified in a Swift script. The function model on which Swift is based ensures that execution of Swift scripts is deterministic (if the called functions are themselves deterministic), thus simplifying the scripting process.
Having the results of a Swift script be independent of the way that its function invocations
are parallelized implies that the functions must, for the same input,
produce the same output, irrespective of the time, order or location in
-which they are ``executed''. %This characteristic is reminiscent of
+which they are ``executed''.
+
+%This characteristic is reminiscent of
%referential transparency, and one may readily extend the concept to
%encompass arbitrary processes without difficulty.
@@ -152,27 +153,24 @@
Swift can execute scripts that perform tens of thousands of program
invocations on highly parallel resources, and handle the unreliable
and dynamic aspects of wide-area distributed resources. Such issues are handled by Swift's runtime system, and are not manifest in the user's scripts.
-
-%keep: discuss kthread/function duality; dont confuse with the parameter issue?
-Swift enables users to specify process composition by representing processes as functions, where input data files and process parameters become function parameters and output data files become function return values. \katznote{these are really Karajan threads - forward link to that? - what do we call these in this paper?} A Swift script is a graph of function calls - each function call ... see section 3.
-%keep:
-
-The Swift language
-provides a high level representation of collections of data and a
-specification of how those collections are to be mapped to that
-abstract representation and processed by external
-programs. Underlying this is an implementation that executes the
-external programs on clusters, grids and other parallel platforms, providing
-automated site selection, data management, and reliability.
-
-
-%keep
The exact number of processing units available on such shared resources
varies with time. In order to take advantage of as many processing units
as possible during the execution of a Swift program, it is necessary to
be flexible in the way the execution of individual processes is
parallelized.
+Swift exploits the maximal concurrency permitted by data dependencies and by resource availability.
+%keep: discuss kthread/function duality; dont confuse with the parameter issue?
+Swift enables users to specify process composition by representing processes as functions, where input data files and process parameters become function parameters and output data files become function return values. %\katznote{these are really Karajan threads - forward link to that? - what do we call these in this paper?} A Swift script is a graph of function calls - each function call ... see section~\ref{Execution}.
+%
+Swift also
+provides a high level representation of collections of data (used as
+function inputs and outputs) and a
+specification (``mappers'') that allows those collections to be processed by external
+programs. %Underlying this is an implementation that executes the
+%external programs on clusters, grids and other parallel platforms, providing
+%automated site selection, data management, and reliability.
+
We choose to make the Swift language purely functional (i.e., all operations
have a well-defined set of inputs and outputs, all variables are write-once,
and side effects are disallowed in the language) in order to prevent the difficulties that
@@ -194,11 +192,11 @@
sufficient specification and encapsulation of inputs to, and outputs
from, a given application, such that an execution environment could
automatically make remote execution transparent.
-
+%
Without this,
achieving location transparency %and automated parallel execution
-is not feasible. Swift adds to scripting what the remote procedure call (RPC) paradigm
-\cite{RPC} adds to programming: by formalizing the inputs and outputs of
+is not feasible. Swift adds to scripting what the remote procedure call (RPC)
+paradigm~\cite{RPC} adds to programming: by formalizing the inputs and outputs of
applications that have been declared as app() functions, it provides a way to make the remote
execution of applications transparent.
@@ -331,8 +329,8 @@
%%%
The \emph{if} and \emph{switch} statements are rather standard, but
-\emph{foreach} merits more discussion. Similar to \emph{Go}
-(\cite{GOLANG}) and \emph{Python}, its control ``variables'' can be both
+\emph{foreach} merits more discussion. Similar to \emph{Go}~\cite{GOLANG}
+and \emph{Python}, its control ``variables'' can be both
an index and a value. The syntax is as follows:
\begin{verbatim}
@@ -342,7 +340,7 @@
\end{verbatim}
This is necessary because Swift does not allow the use of mutable state
-(i.e., variables are single-assignment), therefore one would not be able
+(i.e., variables are single-assignment). Therefore, one is not able
to write statements such as \verb|i = i + 1|.
\subsection{Data model}
@@ -970,7 +968,7 @@
\section{Execution}
\label{Execution}
-Swift is implemented by compiling to a Karajan program\cite{Karajan}, which provides
+Swift is implemented by compiling to a Karajan program~\cite{Karajan}, which provides
several benefits: a lightweight threading model,
futures,
remote job execution,
@@ -1825,7 +1823,7 @@
%% In contrast to a text-oriented programming language like SwiftScript,
%% some scientists prefer to design simple programs using GUI design tools.
-%% An example of this is the LONI Pipeline tool\cite{LONIPIPELINE}. Preliminary
+%% An example of this is the LONI Pipeline tool~\cite{LONIPIPELINE}. Preliminary
%% investigations suggest that scientific workflows designed with that tool
%% can be straightforwardly compiled into SwiftScript and thus benefit from
%% Swift's execution system.
@@ -1840,7 +1838,7 @@
%% \subsection{Language development}
%% TODO: describe how it becomes more functional as time passes, as is
-%% becoming more popular. can ref mapreduce here\cite{MAPREDUCE} eg map
+%% becoming more popular. can ref mapreduce here~\cite{MAPREDUCE} eg map
%% operator extension - looks like foreach; and maybe some other
%% popular-ish functional language eg F\#
@@ -1859,7 +1857,7 @@
%% TODO: debugging of distributed system - can have a non-futures section
%% on what is available now - logprocessing module, as well as
-%% mentioning CEDPS\cite{CEDPS} as somewhat promising(?) for the future.
+%% mentioning CEDPS~\cite{CEDPS} as somewhat promising(?) for the future.
%% \subsection{Swift as a library}
%% Could existing programs execute Swift calls through a library