[Swift-commit] r4149 - text/parco10submission

Mon Feb 28 13:32:27 CST 2011

Author: wozniak
Date: 2011-02-28 13:32:27 -0600 (Mon, 28 Feb 2011)
New Revision: 4149

Added:
   text/parco10submission/ResponseToReviews1.txt
Log:
Rename


Copied: text/parco10submission/ResponseToReviews1.txt (from rev 4137, text/parco10submission/ResponseToReviews.txt)
===================================================================

--- text/parco10submission/ResponseToReviews1.txt	                        (rev 0)
+++ text/parco10submission/ResponseToReviews1.txt	2011-02-28 19:32:27 UTC (rev 4149)
@@ -0,0 +1,374 @@
+
+---------- Forwarded message ----------
+Date: Thu, 14 Oct 2010 13:29:09
+From: Parallel Computing <parco at elsevier.com>
+To: wozniak at mcs.anl.gov
+Cc: worleyph at ornl.gov, Rupak.Biswas at nasa.gov, Rajesh.Nishtala at gmail.com,
+     loliker at lbl.gov
+Subject: Your Submission PARCO-D-10-00054
+
+Ms. Ref. No.:  PARCO-D-10-00054
+Title: Swift: A language for distributed parallel scripting
+Parallel Computing
+
+Dear Justin,
+
+Reviewers have now commented on your paper. You will see that they are advising that you revise your manuscript. If you are prepared to undertake the work required, I would be pleased to reconsider my decision.
+
+For your guidance, reviewers' comments are appended below.
+
+If you decide to revise the work, please submit a list of changes or a rebuttal against each point which is being raised when you submit the revised manuscript.
+
+Your revision should be submitted before Nov 11, 2010.
+
+To submit a revision, please go to http://ees.elsevier.com/parco/ and login as an Author. On your Main Menu page is a folder entitled "Submissions Needing Revision". You will find your submission record there.
+
+When submitting your revised manuscript, please ensure that you upload the source files (e.g. (La)TeX or Word; (La)TeX files as 1 comprehensive file and each figure and table separately). Uploading a PDF file at this stage will create delays should your manuscript be finally accepted for publication. If your revised submission does not include the source files, we will contact you to request them.
+
+Yours sincerely,
+
+Patrick Haven Worley, Ph.D.
+Associate Editor
+Parallel Computing
+
+**********************************************************
+
+Reviewers' comments:
+
+Dear Authors,
+         Thank you for your submission to the Parallel Computing special issue on Emerging Programming Paradigms for Large-Scale Scientific Computing.  As there were several concerns about the manuscript, we ask that you update your submission to address the reviewer feedback, and include a point by point response to the reviewer comments.
+
+With kind regards,
+ParCo Guest Editors:
+Rupak Biswas, Rajesh Nishtala,  Lenny Oliker
+
+
+****************************************************************************************
+
+Reviewer #1: The paper describes Swift, a scripting language for distributed parallel applications.  I have mixed feelings about this paper. I can see the usefulness of a scripting language like Swift.  The problems Swift was designed to address are certainly interesting and important.
+
+On the other hand, I don't see much scientific merit in the paper.  The paper reads more like a Swift user manual than a scientific paper.  For the language design, the only thing that might be novel is the notion of mapped type, but I consider it to be quite minor.  I also don't see any new ideas in the data-flow dependency based execution model.
+
+>>>  Response:
+
+We believe that our discussion of related work shows that no other language does what Swift does.  We also believe that the scientific merit of the paper lies in the decisions we made in creating Swift: a simple, minimal language with a function model for evaluating a large set of individual applications on both parallel and distributed systems of extreme scale, using single-assignment futures to exploit implicit parallelism effectively.
+
+<<<
+
+For the distributed execution, one important missing piece is performance evaluation.  Data locality is very important for data-intensive applications.  As I understand it, data have to be moved in and out of the clusters.  So, understanding the cost of scheduling and data transfer is very important to validate the Swift design.  Perhaps it was
+published somewhere else, but it would be nice to discuss it in this paper.
+
+>>> Response:
+
+We have added a new section, "5. Performance Characteristics" in
+response to this point. Additional tests are being developed and run,
+so these results may be further refined before publication. Some
+results from prior publications have been cited and included here,
+which show the overlap of data transfer and processing to address the
+issue above.
+
+<<<
+
+Here are some more detailed comments:
+
+1.  Swift uses restart log to reuse the results of successfully
+completed components.  The paper mentioned "appropriate manual
+intervention".  This seems to be something you can almost completely
+automate.  Based on my experiences with large-scale and long running
+applications, this can be very useful.
+
+>>> Automation of restart
+
+The "manual intervention" referred to the correction of whatever
+caused the script to fail, for example, a missing data file. Since the
+Swift restart mechanism *is* fully automated, this phrase was removed.
+
+<<<
+
+2.  Swift reduces job submission cost using clustering.  It is not
+clear to me if a subgraph can be batched together by your clustering
+technique.  This obviously requires a little bit of analysis of the
+data-flow graph to do it properly.  But it could be quite useful to
+achieve better data locality.
+
+>>> Clustering
+
+This section has been clarified. Swift will group tasks together based
+on their expected time duration and their readiness to run. So a
+cluster batch could include tasks from multiple sub-graphs. But it is
+based on 
+
+<<<
+
+3.  In terms of programming models, modern systems such as Microsoft's
+DryadLINQ and Google's FlumeJava successfully integrate data-flow
+constructs into state of the art programming languages (C# and Java).
+This integration approach is quite nice and powerful. It would be nice
+if the authors can compare Swift with these two systems.
+
+>>> Response:
+
+We have added comparisons to Dryad and FlumeJava in the related work section.
+
+<<<
+
+Reviewer #2: The paper presents a powerful high-level scripting language, SwiftScript, for performing a massive number of tasks/jobs coupled with collections of file-based data. It describes details of the language syntax and semantics, which is based on data flow with support of arrays and procedures, and discusses the implementation and several use cases with the main focus on a grid.  Although a similar work was published before, this paper gives elaborated summary of the technical details, which is useful for general audience.
+
+The current work is an attempt to overcome issues in dealing with a massive amount of simulations and data either on a grid or on a massive parallel system, such as large data files, various dependencies, fault tolerance. The framework simplifies the process in the scripting design by an "app" interface to the actual command line for program execution and a "mapper" to map data structures with actual files without the need of going into details of how the program actually gets executed and without explicitly specifying the types of the files involved. The paper also describes the implementation that is built on top of the CoG Karajan workflow engine with built-in fault tolerance and demonstrates the usefulness of the system with a number of examples.
+
+The primary focus of the work is for the grid applications, although the paper has indicated the applicability to other systems, such as more tightly coupled massive computing systems. 
+
+For those who might not be familiar with the Karajan language, it would be useful to add a reference to the related work.
+
+>>>  Response:
+
+We have added such a reference.
+
+<<<
+
+It would be helpful to include some discussion on the "auto-parallelization" capability (achieved via data flow analysis?).
+
+>>> auto-parallelization
+
+This is now discussed in much more detail, in section 2.
+
+<<<
+
+Out of the four application examples presented, two of the cases (4.3 and 4.4) do not contain enough details to support the discussion; deleting the two examples should not affect the clarity of the paper.
+
+>>> Examples: clarify or delete 
+
+We have completely revised the application example section (4).  It
+now shows only two app examples, but does so in much more precise
+detail, to provide a better understanding of what using Swift entails.
+
+<<<
+
+It would be helpful to elaborate more in example 4.2 on how each task/job gets scheduled onto Ranger nodes or how Swift interacts with the local batch job scheduler, which would in turn help audience understand better how SwiftScript could be used for a certain class of applications on a more tightly coupled massive computing environment (such as
+parameter studies).
+
+>>> 4.2 - task scheduling on Ranger 
+
+Because Sec. 4 has been revised considerably, this information is now
+included in Secs. 2 and 3.
+
+<<<
+
+There are a few minor typos in the manuscript.
+For instance, on Page 7 section 2.2, in the code snippet:
+
+    output[ix] = rotate(frames, 180);
+
+Should "frames" be "f" in this case?
+
+>>>  Response:
+
+The typo has been corrected.
+
+<<<
+
+Reviewer #3: This is an interesting paper aimed at the practical problem of
+managing a large ensemble of computations with possible dependencies
+among the tasks.  The general outline and structure of the paper is
+fine.  There are a number of small errors which should have been
+caught by proofreading the manuscript.
+
+>>>  Response:
+
+Typos and grammatical errors have been corrected in a fresh, complete proofreading.
+
+<<<
+
+The most substantive comment I have concerns examples 4.3 and 4.4.  I
+believe that, as they stand, these are too sparsely annotated to add
+significant value to the paper.  So I suggest either documenting these
+examples more carefully, so that a reader new to Swift can understand
+them, or else omit these examples entirely.
+
+>>> examples 4.3 and 4.4: annotate or delete
+
+Addressed and revised completely, as mentioned above.
+
+<<<
+
+Further comments:
+
+1. Sometimes "Swift" is used and sometimes "SwiftScript" is used.  Is
+there any reason to have two different terms?  From the website it
+appears that "SwiftScript" referred to an earlier version of the
+language, so perhaps "SwiftScript" should be replaced by "Swift"
+everywhere.
+
+>>>  Response:
+
+This has been done.  SwiftScript no longer appears.
+
+<<<
+
+2. It's a bit awkward that "single assignment" is used in section 2.1
+but not defined until section 2.3.
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+3. The example on p. 5 appears erroneous, it seems the rotate
+procedure should have an angle input in addition to the image input
+(this is corrected on p.6).
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+4. In section 2.2, should rotate be invoked as "rotate(f, 180)" instead
+of "rotate(frames, 180)"?
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+5. There are some acronyms in the paper that should, I believe, be
+defined.  In some cases the acronym can be omitted, for example I
+believe RDBMS is only used once so there is really no need for the
+acronym.  Other acronyms appear in Figure 1: CoG, OSG, AWS; these
+probably should be defined.  Other acronyms: GRAM, fMRI (and FMRI,
+cf. Fig. 2, which should probably be fMRI).
+
+>>>  Response:
+
+RDBMS has been removed.
+FMRI has been removed.
+GRAM doesn't seem to need explanation; it is cited where first used.
+The acronyms in Figure 1 are defined in the caption.
+
+<<<
+
+6. All these appear: "stage in", "stage-in", "stagein".  Please be
+consistent (similarly for stage out).
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+7. The mysterious numbers '12, 1000, 1000, "81 3 3"' in example 4.1
+might merit an explanation.
+
+>>> 
+
+It seems clear that these are parameters passed into the app program;
+I don't think an explanation is needed.
+
+<<<
+
+8. The Figure numbering in the text of section 4.1 needs correction.
+
+>>> 
+
+Fixed.
+
+<<<
+
+9. In the text of section 4.1, I believe the variables or, iv,
+direction, overwrite, and ov should be in a different font for clarity.
+In the text of section 4.1 "n x n" should be in a different font and
+"n2" is clearly wrong.
+
+>>> 
+
+Fixed.
+
+<<<
+
+10. Section 4.2, what is an "emblem experiment"?
+
+>>>
+
+This has been removed in the revision of Sec 4.
+
+<<<
+
+11. The margins in sections 4.2 and 4.4 need fixing as some lines
+run completely off the page.
+
+>>> 
+
+This has been addressed in the revision of Sec 4.
+
+<<<
+
+12. "Karajan" is mentioned several times; there really should be a short
+definition of it and a reference to it in the bibliography.
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+13. Many of the references look incomplete; journal references really
+should have page numbers, some references are missing a year.
+Reference 8 severely mangles "Bresnahan".
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+14. It is somewhat disingenuous to refer to thousands and millions of
+cores (section 6.2).  All systems I know of are managed as nodes,
+where each node might have 8 or 12 or 16 cores.  This is no essential
+simplification, I know, but why not be honest and refer to nodes
+instead of cores?
+
+>>>  Response:
+
+In Swift, individual tasks generally run on cores, so "cores" is the correct
+term.
+
+<<<
+
+
+15. "Swift is implemented by compiling to a Karajan program".  Why
+would one want to _compile_ a scripting language?  It seems more natural
+(to this naive reader) to have an interpreter or a translator.
+
+>>>  Response:
+
+This has been fixed.  We now more accurately say:
+Swift is implemented by generating and executing a Karajan program.
+
+<<<
+
+16. The coaster idea looks quite interesting, could this be expanded,
+or could an example with coasters be constructed?
+
+>>> <<<
+
+17. Table 1, 1st row, 3rd column: should it be f->data.txt instead of
+f->file.txt?
+
+>>>  Response:
+
+This has been fixed.
+
+<<<
+
+18. There are many (too many to list) typos, missing words, mistakes
+such as "en queued" instead of "enqueued", subject/verb mismatches of
+number and/or tense. A careful proofreading is sorely needed.
+
+
+>>>  Response:
+
+This has been fixed.
+
+<<<