[Swift-commit] r3369 - text/parco10submission

noreply at svn.ci.uchicago.edu
Tue Jun 15 17:26:14 CDT 2010


Author: wozniak
Date: 2010-06-15 17:26:13 -0500 (Tue, 15 Jun 2010)
New Revision: 3369

Modified:
   text/parco10submission/Wozniak.bib
   text/parco10submission/paper.tex
Log:
Fixed references: Compile script output is now clean


Modified: text/parco10submission/Wozniak.bib
===================================================================
--- text/parco10submission/Wozniak.bib	2010-06-15 21:49:55 UTC (rev 3368)
+++ text/parco10submission/Wozniak.bib	2010-06-15 22:26:13 UTC (rev 3369)
@@ -1,5 +1,3 @@
-% This file was created with JabRef 2.5.
-% Encoding: Cp1252
 
 @STRING{CCGRID = {Proc. CCGrid}}
 
@@ -5916,7 +5914,7 @@
   timestamp = {2010/04/02}
 }
 
-@comment{jabref-meta: selector_publisher:1;}
+{jabref-meta: selector_publisher:1;}
 
 @INBOOK{P2P-Grids_2003,
   author = {Geoffrey Fox and Dennis Gannon and Sung-Hoon Ko and
@@ -6014,7 +6012,7 @@
   number = 17,
   year={2004}
 }
-@comment{publisher={Oxford Univ Press}}
+{publisher={Oxford Univ Press}}
 
 @INPROCEEDINGS{GridBatch_2008,
   title = {{GridBatch}: Cloud Computing for Large-Scale
@@ -6042,7 +6040,7 @@
   number = 5,
   year = 2009
 }
-@comment{Pages: 541--551}
+{Pages: 541--551}
 
 @ARTICLE{OSG_2007,
   author = {Ruth Pordes and Don Petravick and Bill Kramer and
@@ -6066,7 +6064,7 @@
   number = 1833,
   year = 2005,
 }
-@comment{pages 1715-1728}
+{pages 1715-1728}
 
 @MISC{RFC:4918_WebDAV_2007,
   author = {{IETF} {Network Working Group}},
@@ -6084,7 +6082,7 @@
   volume = 2,
   year = 2002,
 }
-@comment{pages={43}}
+{pages={43}}
 
 @ARTICLE{ParaView_2001,
   title = {Large-Scale Data Visualization Using
@@ -6096,7 +6094,7 @@
   number = 4,
   year = {2001},
 }
-@comment{pages={34--41},
+{pages={34--41},
 publisher={Published by the IEEE Computer Society}}
 
 @MISC{I2U2_WWW,
@@ -6155,11 +6153,55 @@
   year = 2010
 }
 
-@comment{jabref-meta: selector_publisher:}
+@MISC{ImageMagick_WWW,
+  title = {{ImageMagick} Project Web Site},
+  url = {http://www.imagemagick.org},
+  year = 2010
+}
 
-@comment{jabref-meta: selector_author:}
+@INPROCEEDINGS{Strand_1989,
+  title = {{Strand}: {A} practical parallel programming language},
+  author = {Foster, I. and Taylor, S.},
+  booktitle = {Proc. North American Conference on Logic Programming},
+  year = 1989
+}
 
-@comment{jabref-meta: selector_journal:}
+@ARTICLE{PCN_1993,
+  title = {Productive parallel programming: {T}he {PCN} approach},
+  author = {Foster, I. and Olson, R. and Tuecke, S.},
+  journal = {Scientific Programming},
+  volume = 1,
+  number = 1,
+  year = 1992,
+}
+{pages = {51--66},   publisher={IOS Press}}
 
-@comment{jabref-meta: selector_keywords:}
+@article{Sawzall_2005,
+  title = {Interpreting the data: {P}arallel analysis with {Sawzall}},
+  author = {Pike, R. and Dorward, S. and Griesemer, R. and Quinlan, S.},
+  journal = {Scientific Programming},
+  volume = {13},
+  number = {4},
+  year = 2005,
+}
+{pages = {277--298},   publisher={IOS Press}}
 
+@BOOK{BPEL_2006,
+ author = {Juric, Matjaz B.},
+ title = {Business Process Execution Language for Web Services},
+ year = 2006,
+}
+isbn = 1904811817, publisher = {Packt Publishing}
+
+@INBOOK{Sedna_2007,
+  chapter = {{S}edna: {A} {BPEL}-Based Environment for
+             Visual Scientific Workflow Modeling},
+  title = {Workflows for e-{S}cience},
+  publisher = {Springer},
+  year = 2007,
+  author = {Bruno Wassermann and Wolfgang Emmerich and Ben
+                  Butchart and Nick Cameron and Liang Chen and Jignesh
+                  Patel}
+}
+pages = {18},
+

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2010-06-15 21:49:55 UTC (rev 3368)
+++ text/parco10submission/paper.tex	2010-06-15 22:26:13 UTC (rev 3369)
@@ -103,10 +103,10 @@
 component programs on grids and other parallel platforms, providing
 automated site selection, data management, and reliability.
 
-This paper goes into greater depth than prior publications
-\cite{SWIFTSWF08,SWIFTNNN} in describing the Swift language, how its
-implementation handles large-scale and distributed execution
-environments, and its contribution to distributed parallel computing.
+This paper goes into greater depth than prior
+publications~\cite{Swift_2007} in describing the Swift language, how
+its implementation handles large-scale and distributed execution
+environments, and its contribution to distributed and parallel computing.
 
 \subsection{Rationale}
 
@@ -265,7 +265,7 @@
 components.
 
 For example, the following example lists a procedure which makes use
-of the ImageMagick\cite{ImageMagick} convert command to rotate a
+of the ImageMagick~\cite{ImageMagick_WWW} convert command to rotate a
 supplied image by a specified angle:
 
 \begin{verbatim}
@@ -874,6 +874,7 @@
 change.
 
 \section{Applications}
+\label{Applications}
 
 TODO: two or three applications in brief. discuss both the application
 behaviour in relation to Swift, but underlying grid behaviour in
@@ -1004,8 +1005,10 @@
 alignment is done by the program reslice. Note that variable yR, being
 the output of the first step and the input of the second step, defines
 the data dependencies between the two steps. The pipeline is
-illustrated in the center of Figure~\ref{FMRIFigure2}, while in figure
-\ref{FMRIgraph} we show the expanded graph for a 20-volume run. Each
+illustrated in the center of % Figure~\ref{FMRIFigure2},
+while in figure
+% \ref{FMRIgraph}
+we show the expanded graph for a 20-volume run. Each
 volume comprises an image file and a header file, so there are a total
 of 40 input files and 40 output files. We can also apply the same
 procedure to a run containing hundreds or thousands of volumes.
@@ -1121,6 +1124,7 @@
 \end{verbatim}
 
 \section{Future work}
+\label{Future}
 
 \subsection{Automatic characterisation of site and application behaviour}
 
@@ -1141,9 +1145,9 @@
 GRAM and LRM overhead.
 
 A resource provisioning system such as Falkon\cite{FALKON} or the
-CoG\cite{COG} coaster mechanism developed for Swift can be used to
-ameliorate this overhead, by incurring the allocation overhead once per
-worker node.
+CoG~\cite{CoG_2001} coaster mechanism developed for Swift can be used
+to ameliorate this overhead, by incurring the allocation overhead once
+per worker node.
 
 Both of these mechanisms can be plugged into Swift straightforwardly
 through the CoG provider API.
@@ -1280,37 +1284,36 @@
 The rational and motivation for scripting languages, the
 difference between programming and scripting, and the place of each in
 the scheme of applying computers to solving problems, has been
-laid out previously~\cite{Ousterhout}.
+laid out previously~\cite{Scripting_1998}.
 
-Coordination languages and systems such as Linda\cite{LINDA},
-Strand\cite{STRAN} and PCN\cite{PCN} allow composition of
+Coordination languages and systems such as Linda~\cite{LINDA},
+Strand~\cite{STRAND_1989} and PCN~\cite{PCN_1993} allow composition of
 distributed or parallel components, but usually require the components
 to be programmed in specific languages and linked with the systems;
 where we need to coordinate procedures that may already exist (e.g.,
 legacy applications), were coded in various programming languages and
 run in different platforms and architectures.  Linda defines a set of
 coordination primitives for concurrent agents to put and retrieve
-tuples from a shared data space called a tuple space, which serves as the
-medium for communication and coordination. Strand and PCN use
-single-assignment variables\cite{singleassigment} as coordination mechanism. Like Linda,
-Strand and PCN are data driven in the sense that the action of sending
-and receiving data are decoupled, and processes execute only when data
-are available. The Swift system uses similar mechanism called future
-[16] for workflow evaluation and scheduling.
+tuples from a shared data space called a tuple space, which serves as
+the medium for communication and coordination. Strand and PCN use
+single-assignment variables as coordination
+mechanism. Like Linda, Strand and PCN are data driven in the sense
+that the action of sending and receiving data are decoupled, and
+processes execute only when data are available.
 
-MapReduce\cite{MAPREDUCE} also provides a programming models and a runtime system
-to support the processing of large scale datasets. The two key
-functions \emph{map} and \emph{reduce} are borrowed from functional language: a
-map function iterates over a set of items, performs a specific
-operation on each of them and produces a new set of items, where a
-reduce function performs aggregation on a set of items. The runtime
-system automatically partitions input data and schedules the execution
-of programs in a large cluster of commodity machines. The system is
-made fault tolerant by checking worker nodes periodically and
-reassigning failed jobs to other worker nodes. Sawzall\cite{sawzall} is an
-interpreted language that builds on MapReduce and separates the
-filtering and aggregation phases for more concise program
-specification and better parallelization.
+MapReduce~\cite{MapReduce_2004} also provides a programming models and
+a runtime system to support the processing of large scale
+datasets. The two key functions \emph{map} and \emph{reduce} are
+borrowed from functional language: a map function iterates over a set
+of items, performs a specific operation on each of them and produces a
+new set of items, where a reduce function performs aggregation on a
+set of items. The runtime system automatically partitions input data
+and schedules the execution of programs in a large cluster of
+commodity machines. The system is made fault tolerant by checking
+worker nodes periodically and reassigning failed jobs to other worker
+nodes. Sawzall\cite{Sawzall_2005} is an interpreted language that
+builds on MapReduce and separates the filtering and aggregation phases
+for more concise program specification and better parallelization.
 
 Swift and MapReduce/Sawzall share the same goals to providing a
 programming tool for the specification and execution of large parallel
@@ -1342,35 +1345,35 @@
 
 \end{itemize}
 
-BPEL\cite{BPEL} is a Web Service-based standard that specifies how a set of
-Web services interact to form a larger, composite Web Service. BPEL is
-starting to be tested in scientific contexts\cite{BPELScience}. While BPEL can
-transfer data as XML messages, for very large scale datasets, data
-exchange must be handled via separate mechanisms. In BPEL 1.0
-specification, it does not have support for dataset
-iterations. According to Emmerich et al, an application with
-repetitive patterns on a collection of datasets could result in a BPEL
-document of 200MB in size, and BPEL is cumbersome if not impossible to
-write for computational scientists\cite{BPEL2}. Although BPEL can use XML
-Schema to describe data types, it does not provide support for mapping
-between a logical XML view and arbitrary physical representations.
+BPEL~\cite{BPEL_2006} is a Web Service-based standard that specifies
+how a set of Web services interact to form a larger, composite Web
+Service. BPEL is starting to be tested in scientific contexts. While
+BPEL can transfer data as XML messages, for very large scale datasets,
+data exchange must be handled via separate mechanisms. In BPEL 1.0
+specification, it does not have support for dataset iterations. An
+application with repetitive patterns on a collection of datasets could
+result in large, repetitive BPEL documents~\cite{Sedna_2007}, and BPEL
+is cumbersome if not impossible to write for computational
+scientists. Although BPEL can use XML Schema to describe data types,
+it does not provide support for mapping between a logical XML view and
+arbitrary physical representations.
 
-DAGMan\cite{DAGman} provides a workflow engine that manages Condor jobs
-organized as directed acyclic graphs (DAGs) in which each edge
-corresponds to an explicit task precedence. It has no knowledge of
-data flow, and in distributed environment works best with a
-higher-level, data-cognizant layer. It is based on static workflow
+DAGMan~\cite{Condor_Experience_2004} provides a workflow engine that
+manages Condor jobs organized as directed acyclic graphs (DAGs) in
+which each edge corresponds to an explicit task precedence. It has no
+knowledge of data flow, and in distributed environment works best with
+a higher-level, data-cognizant layer. It is based on static workflow
 graphs and lacks dynamic features such as iteration or conditional
 execution, although these features are being researched.
 
-Pegasus\cite{Pegasus} is primarily a set of DAG transformers. Pegasus planners
-translate a workflow graph into a location specific DAGMan input file,
-adding stages for data staging, inter-site transfer and data
-registration. They can prune tasks for files that already exist,
-select sites for jobs, and cluster jobs based on various
-criteria. Pegasus performs graph transformation with the knowledge of
-the whole workflow graph, while in Swift, the structure of a workflow
-is constructed and expanded dynamically.
+Pegasus~\cite{Pegasus_2005} is primarily a set of DAG
+transformers. Pegasus planners translate a workflow graph into a
+location specific DAGMan input file, adding stages for data staging,
+inter-site transfer and data registration. They can prune tasks for
+files that already exist, select sites for jobs, and cluster jobs
+based on various criteria. Pegasus performs graph transformation with
+the knowledge of the whole workflow graph, while in Swift, the
+structure of a workflow is constructed and expanded dynamically.
 
 Swift integrates the CoG Karajan workflow engine. Karajan provides the
 libraries and primitives for job scheduling, data transfer, and Grid
@@ -1380,6 +1383,7 @@
 (via Falkon and CoG coasters) fast job execution.
 
 \section{Conclusion}
+\label{Conclusion}
 
 Our experience reinforces the belief that Swift plays an important
 role in the family of programming languages. Ordinary scripting



