[Swift-commit] r3885 - text/parco10submission

Thu Jan 6 23:48:18 CST 2011

Author: wilde
Date: 2011-01-06 23:48:18 -0600 (Thu, 06 Jan 2011)
New Revision: 3885

Modified:
   text/parco10submission/paper.tex
Log:
Resolved most katznotes and hatnotes. Added a mikenote to emphasize the uniqueness of the parallel model.

Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-07 05:05:50 UTC (rev 3884)
+++ text/parco10submission/paper.tex	2011-01-07 05:48:18 UTC (rev 3885)
@@ -364,8 +364,8 @@
 invocation.  Swift scripts similarly declare all output files that result from program invocations.
 This enables Swift to provide distributed, location-independent execution of external application programs.
 
-The Swift parallel execution model is based on two concepts that are applied uniformly throughout the language. First, every Swift data element behaves like a \emph{future}. By ``data element'', we mean both the named variables exposed to Swift within a function's environment, such as its local variables, parameters, and returns, and the individual elements of array and structure collections. Second, all expressions in a Swift program are conceptually executed in parallel. Expressions (including function evaluations) wait for input values when they are required, and then set their result values as their computation proceeds. These fundamental concepts are discussed in more detail below.
-
+The Swift parallel execution model is based on two concepts that are applied uniformly throughout the language. First, every Swift data element behaves like a \emph{future}. By ``data element'', we mean both the named variables within a function's environment, such as its local variables, parameters, and returns, and the individual elements of array and structure collections. Second, all expressions in a Swift program are conceptually executed in parallel. Expressions (including function evaluations) wait for input values when they are required, and then set their result values as their computation proceeds. These fundamental concepts are discussed in more detail below.
+\mikenote{This concept is a major highlight of the Swift programming model - I meant to highlight it under ``Execution model'' but did not. We should do so.}
 % can be thought of as a massively-parallel lazy (ie, on-demand, or just in time) evaluation - say later on?
 
 \subsection{Data model}
@@ -399,27 +399,31 @@
 indices, but are sparse.
 Both types of collections can contain members of atomic or collection types. Structures contain a finite number of elements. Arrays contain a varying number of elements. Structures and arrays can both recursively reference other structures and arrays in addition to atomic values. Arrays can be nested to provide multi-dimensional indexing.
 
-Due to the dynamic, highly parallel nature of Swift, its arrays have no notion of size. Array elements can be set as a script's execution progresses. The number of elements set increases monotonically. An array is considered ``closed'' when no further statements that set an element of the array can be executed. This state is recognized at run time by information obtained from compile-time analysis of the script's call graph. Also, since all data elements have single-assignment semantics, no garbage collection issues arise. \katznote{does this follow? garbage collection removed variables that are no longer needed - I don't see how single assignment helps here.}
-\mihaelnote{I think we should not mention the garbage collection issue. In fact, we don't and we should implement
-garbage collection at the "dual" level (i.e., clean temp files) as well as remove unused futures from memory}
+Due to the dynamic, highly parallel nature of Swift, its arrays have no notion of size. Array elements can be set as a script's execution progresses. The number of elements set increases monotonically. An array is considered ``closed'' when no further statements that set an element of the array can be executed. This state is recognized at run time by information obtained from compile-time analysis of the script's call graph. 
 
+%Also, since all data elements have single-assignment semantics, no garbage collection issues arise. \katznote{does this follow? garbage collection removed variables that are no longer needed - I don't see how single assignment helps here.}
+%\mihaelnote{I think we should not mention the garbage collection issue. In fact, we don't and we should implement
+%garbage collection at the "dual" level (i.e., clean temp files) as well as remove unused futures from memory}
+% Mike: I mentioned GC as it pertains to structures and arrays: since Swift is single-assignment, structures and arrays can never get de-referenced and thus don't need to be GC'ed - *I think*. But I can see that internal objects like futures should be, and given that they aren't, it's best to steer clear of this issue for now.
+
 Variables that are declared to be file references
 are associated with a \emph{mapper}, which defines (often through a dynamic lookup process) the
 data files that are to be mapped to the variable. Array and structure elements that are declared to be file references are similarly mapped.
 
-Mapped type and collection \katznote{I don't know what composite means here}\mihaelnote{changed "composite type" to "collection type" as introduced earlier}
+Mapped type and collection 
 type variable declarations can be annotated with a
 \emph{mapping} descriptor that specifies the file(s) that are to be mapped to the Swift data element(s).
 
 For example, the following line declares a variable named \verb|photo| of
-type \verb|image|. Since image is a fileRef type \katznote{how do I know this? And,  should ``fileRef'' have been defined 2 paragraphs ago?}, it additionally declares that the
+type \verb|image|. Since image is a mapped file type, it additionally declares that the
 variable refers to a single file named \verb|shane.jpeg|:
 
+%\katznote{how do I know this? And,  should ``fileRef'' have been defined 2 paragraphs ago?}
 \begin{verbatim}
    image photo <"shane.jpeg">;
 \end{verbatim}
 
-We can declare {\tt image} to be an \emph{external file type}: \katznote{is this different from a fileRef type?}
+We can declare {\tt image} to be a \emph{mapped file type}:
 
 \begin{verbatim}
    type image {};
@@ -458,7 +462,7 @@
    im = sn.i;
 \end{verbatim}
 
-\katznote{please check the above - I changed a couple of variables so ``i'' wasn't used twice for different things in the same example.}
+%\katznote{please check the above - I changed a couple of variables so ``i'' wasn't used twice for different things in the same example.}
 
 \subsection{Execution model}
 
@@ -620,10 +624,10 @@
 \end{description}
 }
 
-\subsection{Ordering of execution and implicit parallelism}
+\subsection{Implicit parallelism}
 \label{ordering}
 
-\mikenote{Rename this as Parallelism model?; stress and show how highly parallel the model is - the idea that the workflow is fully expanded but throttled.}
+%\mikenote{Rename this as Parallelism model?; stress and show how highly parallel the model is - the idea that the workflow is fully expanded but throttled.}
 
 Since all variables and collection elements are single-assignment,
 %they can be assigned a value at most once during the execution of a script.
@@ -907,7 +911,7 @@
 portability), run in any particular order with respect to other
 application invocations in a script (except those implied by data
 dependency), or that their working directories will or will not be
-cleaned up after execution. \katznote{say something about apps should not cause side-effects?}
+cleaned up after execution. In addition, applications should strive to avoid side effects, which could limit both their location independence and the determinism (actual or de facto) of the overall results of the Swift scripts that call them.
 
 Consider the following \verb|app| declaration for the \verb|rotate|
 function:
@@ -1061,14 +1065,8 @@
 
 In such a case, Swift provides a \emph{restart log} that encapsulates
 which function invocations have been successfully completed.
-\mikenote{What manual interv. and why???}
-\katznote{Maybe ignore this, and just say: A subsequent Swift run may be started
+A subsequent Swift run may be started
 with this restart log; this will avoid re-execution of already
-executed invocations.}
-After
-appropriate manual intervention,
-a subsequent Swift run may be started
-with this restart log; this will avoid re-execution of already
 executed invocations.
 
 A different class of failure is when jobs are submitted to a site but
@@ -1589,8 +1587,7 @@
 scheduling Coasters workers using the standard job submission
 techniques and employing an internal IP network.
 
-\mikenote{In order to achieve automatic parallelization in Swift, instead of using thunks (i.e., suspended computations), which yield lazy
-evaluation, we employ futures, which result in eager parallelism. In this process, we trade the ability to efficiently deal with infinite structures for the ability to minimize computation time. It must, however, be noted that a middle ground exists: lazy futures (futures whose computation is delayed until a value is first needed).} \katznote{this is very confusing to me - it's mixing too many concepts and overloading lazy and eager.}
+In order to achieve automatic parallelization in Swift, we ubiquitously employ futures and lightweight threads, which yield eager and massive parallelism but carry a large cost in space and internal object management. We are exploring several alternatives to optimize this tradeoff and to scale Swift to ever-larger task graphs. The solution space here includes ``lazy futures'' (futures whose computation is delayed until a value is first needed) and distributed task graphs with multiple evaluation engines running on separate compute nodes.
 
 \subsection{Filesystem access optimizations}