[Swift-commit] r2425 - text/hpdc09submission
noreply at svn.ci.uchicago.edu
Fri Jan 9 19:22:43 CST 2009
Author: wilde
Date: 2009-01-09 19:22:41 -0600 (Fri, 09 Jan 2009)
New Revision: 2425
Modified:
text/hpdc09submission/paper.latex
Log:
Revised abstract and intro and reduced them close to 1.5 page target.
Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-10 00:35:31 UTC (rev 2424)
+++ text/hpdc09submission/paper.latex 2009-01-10 01:22:41 UTC (rev 2425)
@@ -7,9 +7,11 @@
\bibliographystyle{abbrv} % for ACM SIGS style
-\title{SwiftScript - a language for loosely coupled distributed
-parallel scripting \\ draft - contact benc at ci.uchicago.edu}
+\title{Swift - a language for distributed
+parallel scripting}
+% draft - contact benc at ci.uchicago.edu
+
% ACM styleguide says max 3 authors here, rest in acknowledgements
\numberofauthors{4}
@@ -24,34 +26,33 @@
\alignauthor Mihael Hategan \\
\affaddr{University of Chicago Computation Institute}\\
\and
+\alignauthor Sarah Kenny \\
+ \affaddr{University of Chicago Computation Institute}\\
\alignauthor Michael Wilde \\
\affaddr{University of Chicago Computation Institute}\\
\affaddr{Argonne National Laboratory} \\
-\alignauthor Sarah Kenny \\
- \affaddr{University of Chicago Computation Institute}\\
}
\maketitle
-\verb|$Id$|
-
\begin{abstract}
-Scientists, engineers and business analysts often pursue their work by
-applying domain-specific programs to massive collections of file-based data.
-Distributed and parallel computing resources provide a powerful way to
-get more of this type of work done faster, but using such resources
-imposes additional complexities.
+Scientists, engineers and business analysts often work by performing a
+massive number of runs of domain-specific programs, typically coupled
+``loosely'' by large collections of file-based data. Distributed and
+parallel computing resources provide a powerful way to get more of
+this type of work done faster, but using such resources imposes
+additional complexities.
-Swift reduces these complexities with a scripting language for composing
-ordinary application programs (serial or parallel) into more powerful
-parallel applications that can be executed on distributed
-resources. Applications expressed
-in Swift are location-independent and automatically parallelized.
+Swift reduces these complexities with a scripting language for
+composing ordinary application programs (serial or parallel) into more
+powerful parallel applications that can be executed on distributed
+resources. Applications expressed in Swift are location-independent
+and automatically parallelized.
-Swift can execute scripts that perform hundreds of thousands of
-program invocations on highly parallel resources, and deal with the
-unreliable and dynamic aspects of wide-area distributed resources.
+Swift can execute scripts that perform tens of thousands of program
+invocations on highly parallel resources, and handle the unreliable
+and dynamic aspects of wide-area distributed resources.
The language provides a high level representation of collections of
data and a specification of how those collections are to be mapped to
@@ -61,7 +62,8 @@
automated site selection, data management, and reliability.
We present the language, details of the implementation, application
-examples, measurements, and ongoing research.
+examples, measurements, and ongoing research, focusing on its
+importance as a distributed computing paradigm.
% TODO: DECIDE: Drop SwiftScript, use Swift throughout to refer to the language?
@@ -71,186 +73,154 @@
Swift is a scripting language designed for composing ordinary
application programs (serial or parallel) into distributed,
-parallelized, applications. It can execute scripts that perform
-hundreds of thousands of program invocations on highly parallel
-resources, and its design is expected to scale to handle millions of
-invocations and to thus address the needs of ``many-task
-computing''\cite{MTC}\cite{FALKONSC08}.
+parallelized applications for execution on grids and supercomputers
+with tens to hundreds of thousands of processors. It is intended to
+serve as a higher level framework for composing parallel pipelines of
+other programs and scripts, sitting above (and utilizing) existing
+scripting languages and applications. Swift scripts express the
+execution of programs to produce datasets using a dataflow-driven
+specification. The application programs executed by a Swift script can
+be binary executables or can be scripts written in any other scripting
+language.
-Swift's purpose is to enable ``loosely coupled scripting'' in a
-convenient, powerful fashion. It is intended to serve as a higher
-level framework for composing parallel pipelines of other programs and
-scripts. Much like a Makefile encapsulates the compilation
-process, Swift puts a data-driven ``make-like'' wrapper around the
-execution of programs (which can themselves be scripts written in any
-other scripting language, or binary executables).
-
-Swift's contribution and primary value resides in the fact that it
-provides the minimal language constructs needed to coerce the process
-of specifiying how applications are glued together at large scale into
-a simple compact form of expression, while keeping the language simple
-and elegant, and carefully not replacing or overlapping with the tasks
-that existing scripting langauges do well. Swift regularizes and
+Swift's contribution and primary value is that it provides a minimal
+set of language constructs to specify how applications are glued
+together at large scale in a compact form, while keeping the language
+simple and elegant, and minimizing any overlap with the tasks that
+existing scripting languages do well. Swift regularizes and
abstracts both the notion of data and process for distributed parallel
execution of application programs.
-This paper goes deeper than prior papers\cite{SWIFTSWF08,SWIFTNNN} in
-describing the details of the swift language, detailing how it is
-implemented, and discussing its role in the toolkit of solutions for
-distributed parallel programming.
+This paper goes into greater depth than prior publications
+\cite{SWIFTSWF08,SWIFTNNN} in describing the Swift language, how its
+implementation handles large-scale and distributed execution
+environments, and its contribution to distributed parallel computing.
+TODO: Provide a compelling example here, perhaps with
+a code segment, of the power of Swift, in a single paragraph.
+
\subsection{Swift language concepts}
-The Swift programming model is data-oriented: it
-encapsulates the invocation of ``ordinary programs'' - technically, POSIX
-\emph{exec()} operations - in a manner that explicitly specifies the files
-and other arguments that are the inputs and outputs of each program
-invocation. This formal but simple model (elaborated on in section
-\ref{LanguageEnvironment})
-enables Swift to provide four critical features not provided
-by scripting languages like Perl, Python, or the various command-line shells:
+The Swift programming model is data-oriented: it encapsulates the
+invocation of ``ordinary programs'' - technically, POSIX \emph{exec()}
+operations - in a manner that explicitly specifies the files and other
+arguments that are the inputs and outputs of each program
+invocation. This formal but simple model (elaborated in section
+\ref{LanguageEnvironment}) enables Swift to provide several critical
+features not provided by - nor readily implemented in - existing
+scripting languages like Perl, Python, or shells:
\begin{itemize}
\item Location transparent execution: automatically selecting a
-location for each program invocation (section \ref{ExecutingSites})
+location for each program invocation and managing diverse execution
+environments. Swift scripts can be tested on a single local
+workstation. The same script can then be executed on a cluster, one or
+more grids of clusters, and on large scale parallel supercomputers
+such as the Sun Constellation (ref) or the IBM Blue Gene/P. (section
+\ref{ExecutingSites})
-\item Automatic parallelization of program invocations invoking
+\item Automatic parallelization of program invocations, invoking
programs that have no data dependencies in parallel (section
\ref{Language})
-\item Management of program invocations such as throttling
-to a rate appropriate for each execution location and mechanism
-(section \ref{ExecutingSites}).
+\item Automatic balancing of work over available resources, based
+on adaptive algorithms that account for both resource performance
+and reliability, and which throttle program invocations at a rate
+appropriate for each execution location and mechanism (section
+\ref{ExecutingSites}).
-\item Reliability through retry (and re-siting) of failed executions
+\item Reliability through retry and relocation of failed executions
and restart of interrupted scripts from the point of
failure. (section \ref{ExecutingReliably})
-\item Automatic balancing of the workload to available resources
-based on adaptive algorithms that can account for both resource
-performance and reliability.
+\item Recording the provenance of data objects produced by a Swift
+script (section \ref{Provenance}).
-\item Recording the provenance of derived data objects (section
-\ref{Provenance}).
-
\end{itemize}
-In the rest of this section, we provide an overview of Swift's main
-concepts. Each concept is elaborated, with examples, in subsequent
-sections. [TODO: we will need to adjust between how much to specify
-here, and how much to state just before each construct is introduced.
+Swift is intentionally designed to be a sparse, minimal scripting
+language. Its sole purpose is to sequence and schedule the execution
+of other programs. As such, Swift has only a very limited set of data
+types, operators, and built-in functions. The essence of the Swift
+language, which makes the benefits above possible, can be summarized
+as follows:
-\emph{Execution of atomic procedures}. Underlying this is an
-implementation to execute scripts on grid and other platforms,
-providing built-in site selection, data management and
-reliability. Swift scripts can be tested on a single local
-workstation. The same script can then be executed on a cluster, one or
-more grids of clusters, and on large scale parallel supercomputers
-such as the Sun Constellation (ref) or the IBM Blue Gene/P. More information
-about the implementation is found in section \ref{Execution}.
+Swift scripts are written as a set of procedures, composed upwards:
+\emph{atomic procedures} specify the execution of component programs,
+and higher level procedures are then composed as pipelines (or more
+generally, graphs) of sub-procedures. Atomic procedures specify the
+inputs and outputs of application programs in terms of files and
+other parameters. Compound procedures are composed of a graph of
+calls to atomic and other compound procedures.
-\emph{Minimalist nature of Swift}. As a scripting language, Swift is
-intentionally designed to be a sparse, minimal language. Its primary
-purpose is to coordinate and manage the execution of
-other programs. As such, it has only a very limited set of data types,
-operators, and built-in functions.
+Swift variables hold primitive values, files, or collections of
+files. Atomic variables are \emph{single assignment}, which provides
+the basis for Swift's model of procedure chaining. Procedures are
+executed when their input parameters have all been set from existing
+data or prior procedure executions. Procedures are chained by
+specifying that an output variable of one procedure is passed as the
+input variable to the second procedure.
-We believe strongly in, and our experience reinforces, the principle
-that Swift - or languages like it - play an important role in the
-family of programming languages. Ordinary scripting languages provide
-the constructs for manipulating files and typically contain rich
-operators, primitives and libraries for large classes of useful
-operations such as string processing, math operations, internet and
-file operations.
+% This dataflow model means that
+% Swift procedures are not necessarily executed in source-code order but
+% rather when their input data becomes available.
-Swift programs typically contain very little code to manipulate data
-directly.
+Variables are declared with a type and, when they contain files, are
+associated with a \emph{mapper}, which indicates how physical data
+files correspond to the logical representation in Swift's data model
+of variables and collections.
-\emph{Variables, data flow and procedures}. Swift variables hold
-primitive values, or collections of files.
-Variables are \emph{single assignment}, which is the
-basis for Swift's model of procedure chaining.
-Procedures are executed when their input parameters are all defined (i.e.
-have a value).
-Procedures are chained by specifying that an output variable of one
-procedure is passed as the input variable to the second procedure. This
-dataflow model means that within a script, procedures are not executed
-in source-code order; instead they are executed as input data becomes
-available.
-
-Variables are given a type, and when they contain collections of files,
-are associated with a \emph{mapper} which indicates how the layout of
-data files is associated with the logical representation in the Swift
-data model. See section \ref{LanguageTypes}.
-
-Swift programs are composed
-starting with \emph{atomic procedures} which execute component programs,
-and then higher level procedures are composed as pipelines of sub-procedures.
-
\subsection{Rationale for creating Swift}
-\emph{TODO: This section needs much polishing/condensing.}
+Why do we need Swift? Why create yet another scripting language for
+the execution of application programs when so many exist? Swift was
+developed to create a higher-level language that focuses not on the
+details of executing sequences or ``pipelines'' of programs, but
+rather on specific issues that arise from scale.
-Why do we need Swift? Why create yet another scripting
-language for the execution of application programs when so many exist?
-Swift was developed to create a higher-level language that focuses not
-on the details of executing sequences or ``pipelines'' of programs, but
-rather on specific issues that arise from scale. These issues,
-however, once identified, seem to equally well apply to, and benefit
-the execution of, application pipelines that are not large-scale and
-not necessarily distributed.
+% These issues,
+% however, once identified, seem to equally well apply to, and benefit
+% the execution of, application pipelines that are not large-scale and
+% not necessarily distributed. Our motivation for developing Swift is
+% based on the following premises:
-Our motivation for developing Swift is based on the following premises:
-
-Scaling up requires the distribution of execution among
-many computers (``resources''), and hence a ``grid'' approach. Even if a
-single large parallel resource suffices, users won't always have
-access to the same supercomputer cluster: resources are scarce, and users often need or
-want to utilize whatever resource happened to be available or economical at the moment
-when they need to perform intensive computation.
-
While many application needs involve the execution of a single large
and perhaps message-passing parallel app, many others require the
coupling or orchestration of large numbers of application invocations:
either many invocations of the same app, or many invocations of
sequences and patterns of several apps. In this model, existing apps
become like functions in programming, and users typically need to
-execute many of them.
+execute many of them. Scaling up requires the distribution of such
+workloads among many computers (``resources''), and hence a ``grid''
+approach. Even if a single large parallel resource suffices, users
+won't always have access to the same supercomputer cluster: hence the
+need to utilize whatever resource happens to be available or
+economical at the moment when they need to perform intensive
+computation - without continued reprogramming or adjustment of scripts.
-Ousterhout in (Ousterhout 1998) eloquently laid out the rational and
-motivation for scripting languages. As the creator of Tcl [ref], he
-described here the difference between programming and scripting, and
-the place of each in the scheme of applying computers to solving
-problems.
+% Ousterhout in (Ousterhout 1998) eloquently laid out the rationale and
+% motivation for scripting languages. As the creator of Tcl [ref], he
+% described here the difference between programming and scripting, and
+% the place of each in the scheme of applying computers to solving
+% problems.
What's missing in current scripting languages is sufficient
specification and encapsulation of inputs to, and outputs from, a
given application, such that an execution environment could
-automatically make remote execution transparent.
+automatically make remote execution transparent. Without this,
+achieving location transparency and automated parallel execution is
+not feasible. Swift adds to scripting what RPC adds to programming:
+by formalizing the inputs and outputs of
+``applications-as-procedures'', it provides a way to make the remote -
+and hence parallel - execution of applications fairly transparent.
-In a sense, Swift adds to scripting what RPC adds to programming: by
-formalizing the inputs and outputs of ``applications-as-procedures'', it
-provides a way to make the remote - and hence parallel - execution of
-applications fairly transparent.
+TODO: Refine and condense this rationale.
-It's useful to draw an analogy here between Swift and make. Just as a
-``makefile'' - which in a sense is can be considered a script or program
-(or certainly a ``recipe'') - is a specification of how to derive an
-application program, a Swift script is a recipe for how to produce a
-set of data. Unlike make, in which case the derived product is only
-produced once, in Swift, derived datasets are repetitively derived.
-
-Usage of Swift. Swift is achieving growing use on a variety of science
-problems.
-
-... TODO: provide details.
-
-TODO: In the remainder of this paper, ... we present the language,
+In the remainder of this paper, we present the language,
details of the implementation, application use-cases and ongoing
-research.
+research. TODO: refine this sentence.
-
\section{The SwiftScript language}
\label{Language}
@@ -1362,6 +1332,21 @@
workflow restart, reliable execution over multiple Grid sites, and
(via Falkon and CoG coasters) fast job execution.
+\section{Conclusion}
+
+Our experience reinforces the belief that Swift plays an important
+role in the family of programming languages. Ordinary scripting
+languages provide the constructs for manipulating files and typically
+contain rich operators, primitives, and libraries for large classes of
+useful operations such as string, math, internet, and file
+operations. In contrast, Swift scripts typically contain very little
+code that manipulates data directly. They contain instead the ``data
+flow recipes'' and input/output specifications of each program
+invocation such that the location and environment transparency goals
+can be implemented automatically by the Swift environment.
+
+TODO: Polish conclusion - was pasted here from intro and doesn't fit yet.
+
\section{Acknowledgements}
TODO: authors beyond number 3 go here according to ACM style guide, rather
@@ -1449,4 +1434,8 @@
%\end{thebibliography}
\bibliography{paper} % for ACM SIGS style
+
+\verb|$Id$|
+
\end{document}
+
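
Illustrative note (not part of the commit): the revised introduction
describes procedures that run once all of their single-assignment inputs
have been set, and that are chained by passing one procedure's output
variable as another's input. The Python sketch below mimics that model
with futures. It is only an analogy under stated assumptions, not Swift
syntax and not how the Swift runtime is implemented. All names
(call_when_ready, constant, split_words, count, longest, report,
run_pipeline) are hypothetical, and plain Python functions stand in for
application programs.

```python
from concurrent.futures import ThreadPoolExecutor


def call_when_ready(pool, program, *inputs):
    """Schedule `program`; it runs only after every input future is set.

    The returned future plays the role of a single-assignment output
    variable: it receives a value exactly once.
    """
    def run():
        # Block until each input has been assigned. Inputs are always
        # created before the tasks that consume them and the pool starts
        # tasks in submission order, so the oldest unfinished task is
        # never waiting on anything and progress is guaranteed.
        return program(*(f.result() for f in inputs))
    return pool.submit(run)


def constant(pool, value):
    """Wrap existing data as an already-available input variable."""
    return pool.submit(lambda: value)


# Hypothetical stand-ins for "atomic procedures" wrapping programs.
def split_words(text):
    return text.split()


def count(words):
    return len(words)


def longest(words):
    return max(words, key=len)


def report(n, word):
    return f"{n} words, longest is {word!r}"


def run_pipeline(text):
    """A "compound procedure": a small graph of atomic calls."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        source = constant(pool, text)
        words = call_when_ready(pool, split_words, source)
        # count and longest share an input but do not depend on each
        # other, so they are free to run in parallel.
        n = call_when_ready(pool, count, words)
        w = call_when_ready(pool, longest, words)
        summary = call_when_ready(pool, report, n, w)
        return summary.result()


if __name__ == "__main__":
    print(run_pipeline("swift composes ordinary programs"))
```

The order of the calls in run_pipeline only fixes the dependency graph.
Execution order follows data availability, which is the point the
commented-out sentence about source-code order in the diff makes.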