[Swift-commit] r2421 - text/hpdc09submission
noreply at svn.ci.uchicago.edu
Fri Jan 9 16:00:33 CST 2009
Author: wilde
Date: 2009-01-09 16:00:32 -0600 (Fri, 09 Jan 2009)
New Revision: 2421
Modified:
text/hpdc09submission/paper.latex
Log:
Merged conflicts with Mihael's latest edits on the abstract.
Edits to the intro and a few typos elsewhere.
Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-09 21:02:25 UTC (rev 2420)
+++ text/hpdc09submission/paper.latex 2009-01-09 22:00:32 UTC (rev 2421)
@@ -10,7 +10,6 @@
\title{SwiftScript - a language for loosely coupled distributed
parallel scripting \\ draft - contact benc at ci.uchicago.edu}
-
% ACM styleguide says max 3 authors here, rest in acknowledgements
\numberofauthors{4}
@@ -44,35 +43,39 @@
get more of this type of work done faster, but using such resources
imposes additional complexities.
-Swift addresses these complexities with a scripting language for composing
+Swift reduces these complexities with a scripting language for composing
ordinary application programs (serial or parallel) into more powerful
-distributed, parallelized applications. Applications expressed in Swift
-are location-independent and automatically parallelized.
+parallel applications that can be executed on distributed
+resources. Applications expressed
+in Swift are location-independent and automatically parallelized.
-Swift can execute scripts that perform tens to hundreds of thousands of
+Swift can execute scripts that perform hundreds of thousands of
program invocations on highly parallel resources, and deal with the
unreliable and dynamic aspects of wide-area distributed resources.
-The language provides a high level representation of collections of data
-and a specification of how those collections are to be mapped to that
-abstract representation and processed by component programs. Underlying
-this is an implementation that executes component applications on grids
-and other parallel platforms, providing automated site selection, data
-management, and reliability.
+The language provides a high level representation of collections of
+data and a specification of how those collections are to be mapped to
+that abstract representation and processed by component
+programs. Underlying this is an implementation that executes the
+component programs on grids and other parallel platforms, providing
+automated site selection, data management, and reliability.
We present the language, details of the implementation, application
examples, measurements, and ongoing research.
+% TODO: DECIDE: Drop SwiftScript, use Swift throughout to refer to the language?
+
\end{abstract}
\section{Introduction}
Swift is a scripting language designed for composing ordinary
application programs (serial or parallel) into distributed,
-parallelized, applications. It can execute scripts that perform tens
-to hundreds of thousands of program invocations on highly parallel
-resources, and its design is intended to scale to runs of many
-millions of invocations.
+parallelized, applications. It can execute scripts that perform
+hundreds of thousands of program invocations on highly parallel
+resources, and its design is expected to scale to handle millions of
+invocations and to thus address the needs of ``many-task
+computing''\cite{MTC}\cite{FALKONSC08}.
Swift's purpose is to enable ``loosely coupled scripting'' in a
convenient, powerful fashion. It is intended to serve as a higher
@@ -82,13 +85,20 @@
execution of programs (which can themselves be scripts written in any
other scripting language, or binary executables).
-As a ``parallel scripting
-language'', Swift is typically used to specify and execute scientific
-``workflows'' - which we define here as the execution of a
-series of steps to perform larger domain-specific tasks. We use the
-term workflow as defined by (Taylor et. al. 2006). So we often call a
-Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up.
+Swift's contribution and primary value resides in the fact that it
+provides the minimal language constructs needed to coerce the process
+of specifying how applications are glued together at large scale into
+a simple compact form of expression, while keeping the language simple
+and elegant, and carefully not replacing or overlapping with the tasks
+that existing scripting languages do well. Swift regularizes and
+abstracts both the notion of data and process for distributed parallel
+execution of application programs.
+This paper goes deeper than prior papers\cite{SWIFTSWF08,SWIFTNNN} in
+describing the details of the Swift language, detailing how it is
+implemented, and discussing its role in the toolkit of solutions for
+distributed parallel programming.
+
\subsection{Swift language concepts}
The Swift programming model is data-oriented: it
@@ -101,17 +111,22 @@
by scripting languages like Perl, Python, or the various command-line shells:
\begin{itemize}
-\item It can provide location transparent execution: automatically
-selecting a location for a given program invocation (section
-\ref{ExecutingSites})
-\item It can automatically parallelize the execution flow
-of program invocations, executing invocations that have no data
-dependencies in parallel, whilst throttling parallel invocations to a rate
-appropriate for each execution location
-\item It can record the provenance of derived data objects
-\item It can provide reliability through retrying of failed executions during
-a run and by logging completed work so that an interrupted script can be
-restarted from the point of interruption. (section \ref{ExecutingReliably})
+
+\item Location transparent execution: automatically selecting a
+location for each program invocation (section \ref{ExecutingSites})
+
+\item Automatic parallelization of program invocations, invoking
+programs that have no data dependencies in parallel (section
+\ref{Language}) and throttling invocations to a rate appropriate for
+each execution location (section \ref{ExecutingSites}).
+
+\item Reliability through retry (and re-siting) of failed executions
+and restart of interrupted scripts from the point of
+failure (section \ref{ExecutingReliably}).
+
+\item Recording the provenance of derived data objects (section
+\ref{Provenance}).
+
\end{itemize}
In the rest of this section, we provide an overview of Swift's main
@@ -145,9 +160,6 @@
Swift programs typically contain very little code to manipulate data
directly.
-
-
-
\emph{Variables, data flow and procedures}. Swift variables hold
primitive values, or collections of files.
Variables are \emph{single assignment}, which is the
@@ -233,6 +245,7 @@
\section{The SwiftScript language}
+\label{Language}
\subsection{Language basics}
@@ -336,7 +349,8 @@
such as remote multisite execution and fault tolerance that will be
discussed in a later section.
-\subsection{Working with arrays}
+\subsection{Arrays and Parallel Execution}
+\label{ArraysAndForeach}
Arrays of values can be declared using the \verb|[]| suffix. An array
can be mapped to a collection of files, one element per file, by using
@@ -713,8 +727,8 @@
This file may be constructed by hand or mechanically from some
pre-existing database (such as a grid's existing discovery system).
-The site catalog may contain definitions fo rmultiple sites in which
-case exectuion will be attemted on all sites. In the presence of
+The site catalog may contain definitions for multiple sites in which
+case execution will be attempted on all sites. In the presence of
multiple sites, it is necessary to choose between the available sites.
The Swift \emph{site selector} achieves this by maintaining a score for
each site which determines the load that Swift will place on that site.
@@ -1123,6 +1137,7 @@
substituted for GridFTP.
\subsection{Provenance}
+\label{Provenance}
Swift produces log information regarding the provenance of its output files.
In an existing development module, this information can be imported into
@@ -1229,6 +1244,13 @@
\section{Comparison to Other Systems}
+As a ``parallel scripting
+language'', Swift is typically used to specify and execute scientific
+``workflows'' - which we define here as the execution of a
+series of steps to perform larger domain-specific tasks. We use the
+term workflow as defined by (Taylor et al. 2006). So we often call a
+Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up. Perhaps break down the systems that we compare Swift to into a few classes...?
+
Coordination languages and systems such as Linda\cite{LINDA},
Strand\cite{STRAN} and PCN\cite{PCN} allow composition of
distributed or parallel components, but usually require the components