[Swift-commit] r2396 - text/hpdc09submission

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Mon Jan 5 14:22:25 CST 2009


Author: wilde
Date: 2009-01-05 14:22:25 -0600 (Mon, 05 Jan 2009)
New Revision: 2396

Modified:
   text/hpdc09submission/paper.latex
Log:
Added initial draft of Introduction. Needs much trimming to avoid overlap with section 2. "Rationale" is a placeholder at the moment.



Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex	2009-01-05 18:59:00 UTC (rev 2395)
+++ text/hpdc09submission/paper.latex	2009-01-05 20:22:25 UTC (rev 2396)
@@ -59,8 +59,194 @@
 
 \section{Introduction}
 
-TODO
+Swift is a scripting language designed for composing ordinary
+application programs (serial or parallel) into distributed,
+parallelized applications. It can execute scripts that perform tens
+to hundreds of thousands of program invocations on highly parallel
+resources, and its design is intended to scale to runs of many
+millions of invocations.
 
+Swift's purpose is to enable ``loosely coupled scripting'' in a
+convenient, powerful fashion. It is intended to serve as a
+higher-level framework for composing parallel pipelines of other
+programs and scripts. Much as a Makefile encapsulates the compilation
+process, Swift puts a data-driven ``make-like'' wrapper around the
+execution of programs (which can themselves be scripts written in any
+other scripting language, or binary executables).
+
+Swift is data-oriented: the core of its programming model is to
+encapsulate the invocation of ``ordinary programs'' (technically, POSIX
+\emph{exec()} operations) in a manner that explicitly specifies the data
+objects (files) that are the inputs and outputs of each program
+invocation. This formal but simple specification of data inputs and
+outputs enables Swift to provide five critical features not provided
+by scripting languages such as Perl, Python, Tcl, or ``shells'':
+
+\begin{itemize}
+\item It can provide location-transparent execution: automatically
+selecting a location (i.e., a specific distributed system) for a given
+program invocation.
+\item It can automatically parallelize (and throttle) the execution flow
+of program invocations, executing invocations that have no data
+dependencies in parallel.
+\item It can record the provenance of derived data objects (and related
+caller-callee information).
+\item The Swift execution engine records the progress of a script's
+execution, so if interrupted or terminated, it can be restarted from
+the point of interruption, without re-executing any work that was
+logged as successfully completed.
+\item Swift variables can represent references to files, which are then
+passed to application programs to operate on.
+\end{itemize}
+
+In the rest of this section, we provide an overview of Swift's main
+concepts. These concepts are elaborated, with examples, in subsequent
+sections. [Note: we will need to balance how much to specify
+here and how much to state just before each construct is introduced.]
+
+\emph{The term ``workflow''}. Because Swift serves as a ``parallel scripting
+language'', it is typically used to specify and execute scientific
+``workflows'', which we define here as the pattern of execution of a
+series of steps to perform larger domain-specific tasks. We use the
+term workflow as defined by (Taylor et al. 2006), and so we often call a
+Swift script a workflow.
+
+\emph{Dataset typing and mapping model}. Swift provides a high-level
+representation of collections of data and how those collections are to
+be processed by component programs. It provides a structured data-type
+model for representing collections of files and directories that are
+passed to Swift procedures, and a mapping model to convert between the
+physical representation of data on storage systems and the logical
+representation of data structures in the abstract Swift programming
+model.
+
+Swift itself does not specify how the physical data environment is
+implemented. This specification is instead left up to a set of
+mappers.
+
+\emph{Execution of atomic functions}. Underlying the language is an
+implementation that executes scripts on grid and other platforms,
+providing built-in site selection, data management and
+reliability. Swift scripts can be tested on a single local
+workstation. The same script can then be executed on a cluster, one or
+more grids of clusters, and on large-scale parallel supercomputers
+such as the Sun Constellation (ref) or the IBM Blue Gene/P.
+
+Swift programs can also span environments: ...explain
+
+Swift is implemented by compiling to a Karajan program, which provides
+several benefits. A notable benefit visible to users is the notion of
+providers, which enables the Swift execution model to be extended by
+adding new data providers and job execution providers. This is
+explained in more detail in section X: Swift Implementation.
+
+\emph{Minimalist nature of Swift}. As a scripting language, Swift is
+intentionally designed to be a sparse, minimal language. Its primary
+purpose is to coordinate, throttle, and sequence the execution of
+other programs. As such, it has only a very limited set of data types,
+operators, and built-in functions.
+
+We believe strongly in, and our experience reinforces, the principle
+that Swift, and languages like it, play an important role in the
+family of programming languages. Ordinary scripting languages provide
+the constructs for manipulating files and typically contain rich
+operators, primitives and libraries for large classes of useful
+operations such as string processing, math operations, internet and
+file operations.
+
+Swift programs typically contain very little code to manipulate data
+directly.
+
+\emph{Function composition and Libraries}. Swift programs are composed
+starting with ``atomic'' functions, and then higher level functions are
+composed as pipelines of sub-functions. The basic structure of a
+composite function is a graph of calls to other functions.
+
+Recursive function calls [are / are not] supported  [Relevant? FIXME]
+
+\emph{Variables, single assignment and data flow}. Swift variables hold
+primitive values, or references to datasets, which are files or
+collections of files. Variables are ``single assignment'', which is the
+basis for Swift's model of function chaining. A function is executed
+once all of its input variables are set. Functions are chained by
+specifying that an output variable of one function is passed as an
+input variable to a second function.
+
+This dataflow model means that within a Swift program, functions are
+executed when their data is available, which is not necessarily in
+source-code order.
+
+Swift functions take lists of such variables as arguments and return
+lists of such variables. Swift does not yet have a notion of
+libraries. Swift programs execute as if all functions called in the
+script are present in a single logical source file and are thus passed
+to the Swift virtual machine all at once.
+
+\emph{Rationale}. Why do we need Swift? Why create yet another scripting
+language for the execution of application programs when so many exist?
+Swift was developed to create a higher-level language that focuses not
+on the details of executing sequences or ``pipelines'' of programs, but
+rather on specific issues that arise from scale. These issues,
+once identified, seem to apply equally well to, and to benefit
+the execution of, application pipelines that are not large-scale and
+not necessarily distributed.
+
+\emph{FIXME: This section needs much polishing/condensing.}
+
+Our motivation for developing Swift is based on the following premises:
+
+Scale (at least in the grid) requires distribution of execution among
+many computers (``resources''), and hence a ``grid'' approach. Even if a
+single large parallel resource suffices, users won't always have
+access to the same one: resources are scarce, and users often need or
+want to utilize whatever resource happens to be free at the moment
+when they need to perform intensive computation.
+
+While many application needs involve the execution of a single large
+and perhaps message-passing parallel app, many others require the
+coupling or orchestration of large numbers of application invocations:
+either many invocations of the same app, or many invocations of
+sequences and patterns of several apps. In this model, existing apps
+become like functions in programming, and users typically need to
+execute many of them.
+
+Ousterhout in (Ousterhout 1998) eloquently laid out the rationale and
+motivation for scripting languages. As the creator of Tcl [ref], he
+described there the difference between programming and scripting, and
+the place of each in the scheme of applying computers to solving
+problems.
+
+What's missing in current scripting languages is sufficient
+specification and encapsulation of inputs to, and outputs from, a
+given application, such that an execution environment could
+automatically make remote execution transparent.
+
+In a sense, Swift adds to scripting what RPC adds to programming: by
+formalizing the inputs and outputs of ``applications-as-functions'', it
+provides a way to make the remote - and hence parallel - execution of
+applications fairly transparent.
+
+It's useful to draw an analogy here between Swift and make. Just as a
+``makefile'' (which in a sense can be considered a script or program,
+or certainly a ``recipe'') is a specification of how to derive an
+application program, a Swift script is a recipe for how to produce a
+set of data. Unlike make, where the derived product is produced only
+once, in Swift derived datasets are derived repeatedly.
+
+Could existing programs execute Swift calls through a library
+approach?  The answer to this is certainly ``yes'', and we examine this
+in more detail in section X.
+
+\emph{Usage of Swift}. Swift is achieving growing use on a variety of science
+problems.
+
+... FIXME: provide details.
+
+In the remainder of this paper, FIXME ... we present the language,
+details of the implementation, application use-cases and ongoing
+research.
+
+
 \section{The SwiftScript language}
 
 \subsection{Overview of the language}



