[Swift-commit] r2421 - text/hpdc09submission

noreply at svn.ci.uchicago.edu
Fri Jan 9 16:00:33 CST 2009


Author: wilde
Date: 2009-01-09 16:00:32 -0600 (Fri, 09 Jan 2009)
New Revision: 2421

Modified:
   text/hpdc09submission/paper.latex
Log:
Merged conflicts with Mihael's latest edits on the abstract.
Edits to the intro and a few typos elsewhere.


Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex	2009-01-09 21:02:25 UTC (rev 2420)
+++ text/hpdc09submission/paper.latex	2009-01-09 22:00:32 UTC (rev 2421)
@@ -10,7 +10,6 @@
 \title{SwiftScript - a language for loosely coupled distributed
 parallel scripting \\ draft - contact benc at ci.uchicago.edu}
 
-
 % ACM styleguide says max 3 authors here, rest in acknowledgements
 
 \numberofauthors{4}
@@ -44,35 +43,39 @@
 get more of this type of work done faster, but using such resources
 imposes additional complexities.
 
-Swift addresses these complexities with a scripting language for composing
+Swift reduces these complexities with a scripting language for composing
 ordinary application programs (serial or parallel) into more powerful
-distributed, parallelized applications. Applications expressed in Swift
-are location-independent and automatically parallelized. 
+parallel applications that can be executed on distributed
+resources. Applications expressed
+in Swift are location-independent and automatically parallelized.
 
-Swift can execute scripts that perform tens to hundreds of thousands of
+Swift can execute scripts that perform hundreds of thousands of
 program invocations on highly parallel resources, and deal with the
 unreliable and dynamic aspects of wide-area distributed resources.
 
-The language provides a high level representation of collections of data
-and a specification of how those collections are to be mapped to that
-abstract representation and processed by component programs. Underlying
-this is an implementation that executes component applications on grids
-and other parallel platforms, providing automated site selection, data
-management, and reliability. 
+The language provides a high level representation of collections of
+data and a specification of how those collections are to be mapped to
+that abstract representation and processed by component
+programs. Underlying this is an implementation that executes the
+component programs on grids and other parallel platforms, providing
+automated site selection, data management, and reliability.
 
 We present the language, details of the implementation, application
 examples, measurements, and ongoing research.
 
+% TODO: DECIDE: Drop SwiftScript, use Swift throughout to refer to the language?
+
 \end{abstract}
 
 \section{Introduction}
 
 Swift is a scripting language designed for composing ordinary
 application programs (serial or parallel) into distributed,
-parallelized, applications. It can execute scripts that perform tens
-to hundreds of thousands of program invocations on highly parallel
-resources, and its design is intended to scale to runs of many
-millions of invocations.
+parallelized applications. It can execute scripts that perform
+hundreds of thousands of program invocations on highly parallel
+resources, and its design is expected to scale to handle millions of
+invocations and to thus address the needs of ``many-task
+computing''\cite{MTC}\cite{FALKONSC08}.
 
 Swift's purpose is to enable ``loosely coupled scripting'' in a
 convenient, powerful fashion. It is intended to serve as a higher
@@ -82,13 +85,20 @@
 execution of programs (which can themselves be scripts written in any
 other scripting language, or binary executables).
 
-As a ``parallel scripting
-language'', Swift is typically used to specify and execute scientific
-``workflows'' - which we define here as the execution of a
-series of steps to perform larger domain-specific tasks. We use the
-term workflow as defined by (Taylor et. al. 2006). So we often call a
-Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up.
+Swift's contribution and primary value resides in the fact that it
+provides the minimal language constructs needed to coerce the process
+of specifying how applications are glued together at large scale into
+a simple compact form of expression, while keeping the language simple
+and elegant, and carefully not replacing or overlapping with the tasks
+that existing scripting languages do well. Swift regularizes and
+abstracts both the notion of data and process for distributed parallel
+execution of application programs.
 
+This paper goes deeper than prior papers\cite{SWIFTSWF08,SWIFTNNN} in
+describing the details of the Swift language, detailing how it is
+implemented, and discussing its role in the toolkit of solutions for
+distributed parallel programming.
+
 \subsection{Swift language concepts}
 
 The Swift programming model is data-oriented: it
@@ -101,17 +111,22 @@
 by scripting languages like Perl, Python, or the various command-line shells:
 
 \begin{itemize}
-\item It can provide location transparent execution: automatically
-selecting a location for a given program invocation (section
-\ref{ExecutingSites})
-\item It can automatically parallelize the execution flow
-of program invocations, executing invocations that have no data
-dependencies in parallel, whilst throttling parallel invocations to a rate
-appropriate for each execution location
-\item It can record the provenance of derived data objects
-\item It can provide reliability through retrying of failed executions during
-a run and by logging completed work so that an interrupted script can be
-restarted from the point of interruption. (section \ref{ExecutingReliably})
+
+\item Location transparent execution: automatically selecting a
+location for each program invocation (section \ref{ExecutingSites})
+
+\item Automatic parallelization of program invocations: running
+invocations that have no data dependencies in parallel (section
+\ref{Language}) and throttling invocations to a rate appropriate for
+each execution location (section \ref{ExecutingSites}).
+
+\item Reliability through retry (and re-siting) of failed executions
+and restart of interrupted scripts from the point of
+failure (section \ref{ExecutingReliably}).
+
+\item Recording the provenance of derived data objects (section
+\ref{Provenance}).
+
 \end{itemize}
 
 In the rest of this section, we provide an overview of Swift's main
@@ -145,9 +160,6 @@
 Swift programs typically contain very little code to manipulate data
 directly.
 
-
-
-
 \emph{Variables, data flow and procedures}. Swift variables hold
 primitive values, or collections of files.
 Variables are \emph{single assignment}, which is the
@@ -233,6 +245,7 @@
 
 
 \section{The SwiftScript language}
+\label{Language}
 
 \subsection{Language basics}
 
@@ -336,7 +349,8 @@
 such as remote multisite execution and fault tolerance that will be
 discussed in a later section.
 
-\subsection{Working with arrays}
+\subsection{Arrays and Parallel Execution}
+\label{ArraysAndForeach}
 
 Arrays of values can be declared using the \verb|[]| suffix. An array
 can be mapped to a collection of files, one element per file, by using
@@ -713,8 +727,8 @@
 This file may be constructed by hand or mechanically from some
 pre-existing database (such as a grid's existing discovery system).
 
-The site catalog may contain definitions fo rmultiple sites in which
-case exectuion will be attemted on all sites. In the presence of
+The site catalog may contain definitions for multiple sites in which
+case execution will be attempted on all sites. In the presence of
 multiple sites, it is necessary to choose between the avalable sites.
 The Swift \emph{site selector} achivees this by maintaining a score for
 each site which determines the load that Swift will place on that site.
@@ -1123,6 +1137,7 @@
 substituted for GridFTP.
 
 \subsection{Provenance}
+\label{Provenance}
 
 Swift produces log information regarding the provenance of its output files.
 In an existing development module, this information can be imported into
@@ -1229,6 +1244,13 @@
 
 \section{Comparison to Other Systems}
 
+As a ``parallel scripting
+language'', Swift is typically used to specify and execute scientific
+``workflows'' - which we define here as the execution of a
+series of steps to perform larger domain-specific tasks. We use the
+term workflow as defined by (Taylor et al. 2006). So we often call a
+Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up. Perhaps break down the systems that we compare Swift to into a few classes...?
+
 Coordination languages and systems such as Linda\cite{LINDA},
 Strand\cite{STRAN} and PCN\cite{PCN} allow composition of
 distributed or parallel components, but usually require the components
