[Swift-commit] r3859 - text/parco10submission

Wed Jan 5 15:42:56 CST 2011

Author: dsk
Date: 2011-01-05 15:42:56 -0600 (Wed, 05 Jan 2011)
New Revision: 3859

Modified:
   text/parco10submission/paper.bib
   text/parco10submission/paper.tex
Log:
changes in related work section (7)


Modified: text/parco10submission/paper.bib
===================================================================

--- text/parco10submission/paper.bib	2011-01-05 21:39:39 UTC (rev 3858)
+++ text/parco10submission/paper.bib	2011-01-05 21:42:56 UTC (rev 3859)
@@ -218,7 +218,87 @@
 year={1986}
 }
 
+@inproceedings{Dryad,
+title={Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks},
+author={Michael Isard and Mihai Budiu and Yuan Yu and Andrew Birrell and Dennis Fetterly},
+booktitle={Proceedings of European Conference on Computer Systems (EuroSys)},
+month={Mar},
+year={2007}
+}
 
+@inproceedings{DryadLINQ,
+title={{DryadLINQ}: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language},
+author={Yuan Yu and Michael Isard and Dennis Fetterly and Mihai Budiu and Ulfar Erlingsson and Pradeep Kumar Gunda and Jon Currey},
+booktitle={Proceedings of Symposium on Operating System Design and Implementation (OSDI)},
+month={Dec},
+year={2008}
+}
+
+@article{GEL,
+ author = {Ching Lian, Chua and Tang, Francis and Issac, Praveen and Krishnan, Arun},
+ title = {GEL: Grid execution language},
+ journal = {J. Parallel Distrib. Comput.},
+ volume = {65},
+ issue = {7},
+ month = {July},
+ year = {2005},
+ issn = {0743-7315},
+ pages = {857--869},
+ numpages = {13},
+ url = {http://dx.doi.org/10.1016/j.jpdc.2005.03.002},
+ doi = {http://dx.doi.org/10.1016/j.jpdc.2005.03.002},
+ acmid = {1088525},
+ publisher = {Academic Press, Inc.},
+ address = {Orlando, FL, USA},
+ keywords = {Grid application development, Grid computing, Grid programming, Workflows},
+} 
+
+@inproceedings{DataFlowShell,
+ author = {Walker, Edward and Xu, Weijia and Chandar, Vinoth},
+ title = {Composing and executing parallel data-flow graphs with shell pipes},
+ booktitle = {Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science},
+ series = {WORKS '09},
+ year = {2009},
+ isbn = {978-1-60558-717-2},
+ location = {Portland, Oregon},
+ pages = {11:1--11:10},
+ articleno = {11},
+ numpages = {10},
+ url = {http://doi.acm.org/10.1145/1645164.1645175},
+ doi = {http://doi.acm.org/10.1145/1645164.1645175},
+ acmid = {1645175},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {coordination languages, data-flow processing, parallel processing},
+} 
+
+@article{GXPmake,
+author = {Kenjiro Taura and Takuya Matsuzaki and Makoto Miwa and Yoshikazu Kamoshida and Daisaku Yokoyama and Nan Dun and Takeshi Shibata and Choi Sung Jun and Jun'ichi Tsujii},
+title = {Design and Implementation of GXP Make -- A Workflow System Based on Make},
+journal ={IEEE International Conference on eScience},
+isbn = {978-0-7695-4290-4},
+year = {2010},
+pages = {214--221},
+doi = {http://doi.ieeecomputersociety.org/10.1109/eScience.2010.43},
+publisher = {IEEE Computer Society},
+address = {Los Alamitos, CA, USA},
+}
+
+@article {makeflow,
+   author = {Yu, Li and Moretti, Christopher and Thrasher, Andrew and Emrich, Scott and Judd, Kenneth and Thain, Douglas},
+   affiliation = {University of Notre Dame Department of Computer Science and Engineering South Bend USA},
+   title = {Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions},
+   journal = {Cluster Computing},
+   publisher = {Springer Netherlands},
+   issn = {1386-7857},
+   keyword = {Computer Science},
+   pages = {243-256},
+   volume = {13},
+   issue = {3},
+   url = {http://dx.doi.org/10.1007/s10586-010-0134-7},
+   note = {10.1007/s10586-010-0134-7},
+   year = {2010}
+}
 % Items below are from an older paper - retain for the moment in case any are useful here
 
 @article{condor-g,

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2011-01-05 21:39:39 UTC (rev 3858)
+++ text/parco10submission/paper.tex	2011-01-05 21:42:56 UTC (rev 3859)
@@ -1662,10 +1662,9 @@
 
 \end{verbatim}
 
-\section{Comparison to Other Systems}
+\section{Related Work}
 \label{Related}
 
-\katznote{I would change the section name to ``related work''}
 %% As a ``parallel scripting language'', Swift is typically used to
 %% specify and execute scientific ``workflows'' - which we define here as
 %% the execution of a series of steps to perform larger domain-specific
@@ -1774,7 +1773,30 @@
 the knowledge of the whole workflow graph, while in Swift, the
 structure of a workflow is constructed and expanded dynamically.
 
-Swift integrates with the CoG Karajan workflow engine. Karajan
+Dryad~\cite{Dryad} is an infrastructure for running data-parallel programs on a parallel or distributed system.  In addition to allowing files to be used for passing data between
+tasks (like Swift), it also allows TCP pipes and shared memory FIFOs to be used.
+Dryad tasks are written in C++, while Swift tasks can be written in any language.
+Dryad graphs are explicitly developed by the programmer; Swift graphs are implicit and the programmer doesn't worry about them. A tool called Nebula was originally developed
+above Dryad, but it doesn't seem to be supported currently.  It appears to have been
+used for clusters and well-connected groups of clusters in a single administrative domain,
+unlike Swift, which supports a wider variety of platforms.  Also related is DryadLINQ~\cite{DryadLINQ},
+which generates Dryad computations from the LINQ extensions to C\#. 
+
+GEL~\cite{GEL} is somewhat similar to Swift.  It defines programs to be run, then
+uses a script to express the order in which they should be run, handling the needed
+data movement and job execution for the user.  The user explicitly
+states what is parallel and what is not, unlike Swift, which determines this
+based on data dependencies.
+
+Walker et al.~\cite{DataFlowShell} have recently been developing extensions to
+BASH that allow a user to define a dataflow graph, including the concepts
+of fork, join, cycles, and key-value aggregation, but just on a single parallel system.
+
+A few groups have been working on parallel and distributed versions of make~\cite{GXPmake, makeflow}.  These tools use the concept of virtual data, where the user defines the processing by which data is created, then calls for the final data product.  The make-like tools determine what processing is needed to get from the existing files to the final product, which includes
+running processing tasks.  If this is run on a distributed system, data movement also must
+be handled by the tools. \katznote{Need to say something about Swift in comparison}
+
+Swift integrates with the CoG Karajan workflow engine~\cite{Karajan}. Karajan
 provides the libraries and primitives for job scheduling, data
 transfer, and grid job submission; Swift adds support for high-level
 abstract specification of large parallel computations, data
@@ -1782,14 +1804,6 @@
 grid sites, and (via Falkon~\cite{Falkon_2008} and CoG coasters)
 \katznote{need to talk about what CoG coasters is vs coasters as previously introduced, or clear up the fact that the previous ``coasters'' didn't talk about CoG.} fast job execution.
 
-\katznote{Re: Dryad
-Dryad allows TCP pipes and shared memory FIFOs for passing data between tasks, unlike Swift.
-(Dryad also allows files-just pointing out a difference).
-Dryad tasks are written in C++ (but not required?)  It also looks more like a component model in some ways.
-Dryad graphs are explicitly developed by the programmer; Swift grafts are implicit and the programmer doesn't worry about them
-Nebula on top of Dryad looks much more similar to Swift  (I don't really know anything about Nebula - is it still supported?  And how is it related to LINQ)
-Is there something about the systems that Dryad is meant for vs those that Swift is meant for?  (i.e. Dryad is optimized for clusters and well-connected groups of clusters in a single administrative domain)
-}
 
 
 \section{Future work}