[Swift-commit] r3745 - text/parco10submission

Tue Dec 7 19:21:35 CST 2010

Author: wilde
Date: 2010-12-07 19:21:35 -0600 (Tue, 07 Dec 2010)
New Revision: 3745

Added:
   text/parco10submission/ResponseToReviews.txt
Modified:
   text/parco10submission/paper.tex
Log:
minor initial edits after PARCO review

Added: text/parco10submission/ResponseToReviews.txt
===================================================================

--- text/parco10submission/ResponseToReviews.txt	                        (rev 0)
+++ text/parco10submission/ResponseToReviews.txt	2010-12-08 01:21:35 UTC (rev 3745)
@@ -0,0 +1,287 @@
+
+---------- Forwarded message ----------
+Date: Thu, 14 Oct 2010 13:29:09
+From: Parallel Computing <parco at elsevier.com>
+To: wozniak at mcs.anl.gov
+Cc: worleyph at ornl.gov, Rupak.Biswas at nasa.gov, Rajesh.Nishtala at gmail.com,
+     loliker at lbl.gov
+Subject: Your Submission PARCO-D-10-00054
+
+Ms. Ref. No.:  PARCO-D-10-00054
+Title: Swift: A language for distributed parallel scripting
+Parallel Computing
+
+Dear Justin,
+
+Reviewers have now commented on your paper. You will see that they are advising that you revise your manuscript. If you are prepared to undertake the work required, I would be pleased to reconsider my decision.
+
+For your guidance, reviewers' comments are appended below.
+
+If you decide to revise the work, please submit a list of changes or a rebuttal against each point which is being raised when you submit the revised manuscript.
+
+Your revision should be submitted before Nov 11, 2010.
+
+To submit a revision, please go to http://ees.elsevier.com/parco/ and login as an Author. On your Main Menu page is a folder entitled "Submissions Needing Revision". You will find your submission record there.
+
+When submitting your revised manuscript, please ensure that you upload the source files (e.g. (La)TeX or Word; (La)TeX files as 1 comprehensive file and each figure and table separately). Uploading a PDF file at this stage will create delays should your manuscript be finally accepted for publication. If your revised submission does not include the source files, we will contact you to request them.
+
+Yours sincerely,
+
+Patrick Haven Worley, Ph.D.
+Associate Editor
+Parallel Computing
+
+**********************************************************
+
+Reviewers' comments:
+
+Dear Authors,
+         Thank you for your submission to the Parallel Computing special issue on Emerging Programming Paradigms for Large-Scale Scientific Computing.  As there were several concerns about the manuscript, we ask that you update your submission to address the reviewer feedback, and include a point by point response to the reviewer comments.
+
+With kind regards,
+ParCo Guest Editors:
+Rupak Biswas, Rajesh Nishtala,  Lenny Oliker
+
+
+****************************************************************************************
+
+Reviewer #1: The paper describes Swift, a scripting language for distributed parallel applications.  I have mixed feelings about this paper. I can see the usefulness of a scripting language like Swift.  The problems Swift was designed to address are certainly interesting and important.
+
+On the other hand, I don't see much scientific merit in the paper.  The paper reads more like a Swift user manual than a scientific paper.  For the language design, the only thing that might be novel is the notion of mapped type, but I consider it to be quite minor.  I also don't see any new ideas in the data-flow dependency based execution model.
+
+>>>  Novelty and scientific merit:
+
+<<<
+
+For the distributed execution, one important missing piece is performance evaluation.  Data locality is very important for data-intensive applications.  As I understand it, data have to be moved in and out of the clusters.  So, understanding the cost of scheduling and data transfer is very important to validate the Swift design.  Perhaps, it was
+published somewhere else, but it would be nice to discuss it in this paper.   Here are some more detailed comments:
+
+>>> Performance evaluation:
+
+<<<
+
+1.        Swift uses a restart log to reuse the results of successfully completed components.  The paper mentioned "appropriate manual intervention".  This seems to be something you can almost completely automate.  Based on my experiences with large-scale and long running applications, this can be very useful.
+
+>>> Automation of restart
+
+<<<
+
+2.        Swift reduces job submission cost using clustering.  It is not clear to me if a subgraph can be batched together by your clustering technique.  This obviously requires a little bit of analysis of the data-flow graph to do it properly.  But it could be quite useful to achieve better data locality.
+
+>>> Clustering
+
+<<<
+
+3.        In terms of programming models, modern systems such as Microsoft's DryadLINQ and Google's FlumeJava successfully integrate data-flow constructs into state of the art programming languages (C# and Java).  This integration approach is quite nice and powerful. It would be nice if the authors can compare Swift with these two systems.
+
+>>> Comparison to Dryad and FlumeJava
+
+<<<
+
+Reviewer #2: The paper presents a powerful high-level scripting language, SwiftScript, for performing a massive number of tasks/jobs coupled with collections of file-based data. It describes details of the language syntax and semantics, which is based on data flow with support of arrays and procedures, and discusses the implementation and several use cases with the main focus on a grid.  Although a similar work was published before, this paper gives an elaborated summary of the technical details, which is useful for a general audience.
+
+The current work is an attempt to overcome issues in dealing with a massive amount of simulations and data either on a grid or on a massive parallel system, such as large data files, various dependencies, fault tolerance. The framework simplifies the process in the scripting design by an "app" interface to the actual command line for program execution and a "mapper" to map data structures with actual files without the need of going into details of how the program actually gets executed and without explicitly specifying the types of the files involved. The paper also describes the implementation that is built on top of the CoG Karajan workflow engine with built-in fault tolerance and demonstrates the usefulness of the system with a number of examples.
+
+The primary focus of the work is for the grid applications, although the paper has indicated the applicability to other systems, such as more tightly coupled massive computing systems. 
+
+For those who might not be familiar with the Karajan language, it would be useful to add a reference to the related work.
+
+>>> Reference to Karajan:  <<<
+
+It would be helpful to include some discussion on the "auto-parallelization" capability (achieved via data flow analysis?).
+
+>>> auto-parallelization
+
+<<<
+
+Out of the four application examples presented, two of the cases (4.3 and 4.4) do not contain enough details to support the discussion; deleting the two examples should not affect the clarity of the paper.
+
+>>> Examples: clarify or delete 
+
+<<<
+
+It would be helpful to elaborate more in example 4.2 on how each task/job gets scheduled onto Ranger nodes or how Swift interacts with the local batch job scheduler, which would in turn help the audience better understand how SwiftScript could be used for a certain class of applications on a more tightly coupled massive computing environment (such as
+parameter studies).
+
+>>> 4.2 - task scheduling on Ranger 
+
+<<<
+
+There are a few minor typos in the manuscript.
+For instance, on Page 7 section 2.2, in the code snippet:
+
+    output[ix] = rotate(frames, 180);
+
+Should "frames" be "f" in this case?
+
+>>> typo
+
+<<<
+
+Reviewer #3: This is an interesting paper aimed at the practical problem of
+managing a large ensemble of computations with possible dependencies
+among the tasks.  The general outline and structure of the paper is
+fine.  There are a number of small errors which should have been
+caught by proofreading the manuscript.
+
+>>> typos
+
+<<<
+
+The most substantive comment I have concerns examples 4.3 and 4.4.  I
+believe that, as they stand, these are too sparsely annotated to add
+significant value to the paper.  So I suggest either documenting these
+examples more carefully, so that a reader new to Swift can understand
+them, or else omit these examples entirely.
+
+>>> examples 4.3 and 4.4: annotate or delete
+
+<<<
+
+Further comments:
+
+1. Sometimes "Swift" is used and sometimes "SwiftScript" is used.  Is
+there any reason to have two different terms?  From the website it
+appears that "SwiftScript" referred to an earlier version of the
+language, so perhaps "SwiftScript" should be replaced by "Swift"
+everywhere.
+
+>>> <<<
+
+2. It's a bit awkward that "single assignment" is used in section 2.1
+but not defined until section 2.3.
+
+>>> <<<
+
+3. The example on p. 5 appears erroneous, it seems the rotate
+procedure should have an angle input in addition to the image input
+(this is corrected on p.6).
+
+>>> <<<
+
+4. In section 2.2, should rotate be invoked as "rotate(f, 180)" instead
+of "rotate(frames, 180)"?
+
+>>> <<<
+
+5. There are some acronyms in the paper that should, I believe, be
+defined.  In some cases the acronym can be omitted, for example I
+believe RDBMS is only used once so there is really no need for the
+acronym.  Other acronyms appear in Figure 1: CoG, OSG, AWS; these
+probably should be defined.  Other acronyms: GRAM, fMRI (and FMRI,
+cf. Fig. 2, which should probably be fMRI).
+
+>>> <<<
+
+6. All these appear: "stage in", "stage-in", "stagein".  Please be
+consistent (similarly for stage out).
+
+>>> <<<
+
+7. The mysterious numbers '12, 1000, 1000, "81 3 3"' in example 4.1
+might merit an explanation.
+
+>>> <<<
+
+8. The Figure numbering in the text of section 4.1 needs correction.
+
+>>> <<<
+
+9. In the text of section 4.1, I believe the variables or, iv,
+direction, overwrite, and ov should be in a different font for clarity.
+In the text of section 4.1 "n x n" should be in a different font and
+"n2" is clearly wrong.
+
+>>> <<<
+
+10. Section 4.2, what is an "emblem experiment"?
+
+>>> <<<
+
+11. The margins in sections 4.2 and 4.4 need fixing as some lines
+run completely off the page.
+
+>>> <<<
+
+12. "Karajan" is mentioned several times, there really should be a short
+definition of it and a reference to it in the bibliography.
+
+>>> <<<
+
+13. Many of the references look incomplete; journal references really
+should have page numbers, some references are missing a year.
+Reference 8 severely mangles "Bresnahan".
+
+>>> <<<
+
+14. It is somewhat disingenuous to refer to thousands and millions of
+cores (section 6.2).  All systems I know of are managed as nodes,
+where each node might have 8 or 12 or 16 cores.  This is no essential
+simplification, I know, but why not be honest and refer to nodes
+instead of cores?
+
+>>> <<<
+
+15. "Swift is implemented by compiling to a Karajan program".  Why
+would one want to _compile_ a scripting language?  It seems more natural
+(to this naive reader) to have an interpreter or a translator.
+
+>>> <<<
+
+16. The coaster idea looks quite interesting, could this be expanded,
+or could an example with coasters be constructed?
+
+>>> <<<
+
+17. Table 1, 1st row, 3rd column: should it be f->data.txt instead of
+f->file.txt?
+
+>>> <<<
+
+18. There are many (too many to list) typos, missing words, mistakes
+such as "en queued" instead of "enqueued", subject/verb mismatches of
+number and/or tense. A careful proofreading is sorely needed.
+
+
+>>> <<<
+
+====  Other improvement notes
+
+mention futures in parco paper, show them visually to show fine grain
+
+mention habanero  (c and java) and other fresh stack languages (x10)
+compare to GEL - from Singapore
+
+mention csp bsp sim and diff to mpi (IPO)
+
+Why a new model?
+
+examine determinism
+
+examine language vs library
+
+examine how it builds on karajan
+
+
+---
+
+Innov: fine grained parallelism; no need for flow analysis;
+sep of concerns: how throttling and site mgmt are isolated
+
+How we can manage data locality
+
+How restart is more transparent than it sounds here
+
+Fine: how work takes off before a proc returns
+
+Add table of critical benchmarks on multi sys types
+
+How complex flows are easily composed
+
+How types and mappers encapsulate complexity
+
+2.3 order of exec: show more complex patterns here or later
+
+Section 2 is the part that reads like an LRM; make it more interesting
+
+2.5 don't say marker types

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2010-12-06 22:04:55 UTC (rev 3744)
+++ text/parco10submission/paper.tex	2010-12-08 01:21:35 UTC (rev 3745)
@@ -734,13 +734,13 @@
 multiple sites, it is necessary to choose between the available sites.
 The Swift \emph{site selector} achieves this by maintaining a score for
 each site which determines the load that Swift will place on that site.
-As a site is successful in executing jobs, this score will be increased
-and as the site is uncsuccessful, this score will be decreased. In
-addition to selecting between sites, this mechanism provides some
+As a site succeeds in executing jobs, this score is increased
+and as job executions at a site fail, this score is decreased. In
+addition to selecting between sites, this mechanism provides a measure of
 dynamic rate limiting if sites fail due to overload~\cite{FTSH_2003}.
 
 This provides an empirically measured estimate of a site's ability to
-bear load, distinct from more static information elsewhere published.
+bear load, distinct from more static information published elsewhere.
 In part, this is due to unreliable and incomplete published information
 (for example, site policies restricting (for example) job counts are
 often not published in usable form); in addition, that site's ability
@@ -774,8 +774,11 @@
 will fail resulting ultimately in the entire script failing.
 
 In such a case, Swift provides a \emph{restart log} which encapsulates
-which procedure invocations have been successfully completed. After
-appropriate manual intervention, a subsequent Swift run may be started
+which procedure invocations have been successfully completed. 
+%%%%%% What manual interv. and why???
+After
+appropriate manual intervention, 
+a subsequent Swift run may be started
 with this restart log; this will suppress re-execution of already
 executed invocations but otherwise allow the script to continue.