[Swift-commit] r3925 - in text/parco10submission: . code

Sun Jan 9 18:08:11 CST 2011

Author: wilde
Date: 2011-01-09 18:08:11 -0600 (Sun, 09 Jan 2011)
New Revision: 3925

Modified:
   text/parco10submission/code/modis.swift
   text/parco10submission/paper.tex
Log:
Small updates based on comments from Tim.

Modified: text/parco10submission/code/modis.swift
===================================================================
--- text/parco10submission/code/modis.swift	2011-01-09 22:48:58 UTC (rev 3924)
+++ text/parco10submission/code/modis.swift	2011-01-10 00:08:11 UTC (rev 3925)
@@ -6,7 +6,7 @@
 
 app (landuse output) getLandUse (imagefile input, int sortfield)
 {
-  getlanduse @input sortfield stdout=@output ;
+  getlanduse @input sortfield stdout=@output;
 }
 
 app (file output, file tilelist) analyzeLandUse

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2011-01-09 22:48:58 UTC (rev 3924)
+++ text/parco10submission/paper.tex	2011-01-10 00:08:11 UTC (rev 3925)
@@ -1118,8 +1118,7 @@
 GRAM/LRM submission.
 
 Swift offers two approaches: \emph{clustering} and \emph{coasters}.
-Clustering constructs job submissions that contain a number of component program
-executions, rather than just submitting jobs one-at-a-time.
+Clustering aggregates multiple program executions into a single job, thereby reducing the total number of jobs to be submitted
 Coasters~\cite{coasters} is a form of multi-level scheduling similar to pilot jobs~\cite{Condor-G_2002}.
 It submits generic coaster jobs to a site, and binds
 component program executions to the coaster jobs (and thus to worker
@@ -1128,9 +1127,9 @@
 Clustering requires little additional support on the remote site, while
 the coasters framework requires an active component on the
 head node (in Java) and on the worker nodes (in Perl) as well as
-additional network connectivity within a site. In practice, the
-automatic deployment and execution of these components can be
-difficult on a number sites.
+additional network connectivity within a site. Occasionally, the
+automatic deployment and execution of the coaster components can be
+problematic or even impractical on a site, and require alternative manual configuration.
 
 However, clustering can be less efficient than using coasters.
 Coasters can react much more dynamically to changing numbers of
@@ -1141,7 +1140,7 @@
 with excessive serialization, or, in the other direction, it can result
 in an excessive number of job submissions. Coaster workers can be
 queued and executed before all of the work that they will eventually
-execute is known; this can enable more work to be done per job submission,
+execute is known; this can enable the Swift scheduler to perform more application invocations per coaster worker job,
 and faster overall execution of the entire application.
 
 Using coasters, the status for the actual application jobs is reported as
@@ -1258,9 +1257,9 @@
 \end{Verbatim}
 it would cause the first five elements of the array {\tt geos} to be mapped to the first five files of the modis dataset in the specified directory.
 
-At lines 52-53, the script declares the array {\tt land} which will contain the output of the {\tt getlanduse} application. This declaration uses the built-in ``structured regular expression mapper'', which will determine the names of the \emph{output} files that the array will refer to once they are computed. Swift knows from context that this is an output mapping. The mapper will use regular expressions to base that names of the output files on the filenames of the corresponding elements of the input array {\tt geos} given by the {\tt source=} argument to the mapper.
+At lines 52-53, the script declares the array {\tt land} which will contain the output of the {\tt getlanduse} application. This declaration uses the built-in ``structured regular expression mapper'', which will determine the names of the \emph{output} files that the array will refer to once they are computed. Swift knows from context that this is an output mapping. The mapper will use regular expressions to base the names of the output files on the filenames of the corresponding elements of the input array {\tt geos} given by the {\tt source=} argument to the mapper. The declaration for {\tt land[]} maps, for example, a file {\tt h07v08.landuse.byfreq} to an element of the {\tt land[]} array for a file {\tt h07v08.tif} in the {\tt geos[]} array.
 
- At lines 55-57 the script performs its first computation using a {\tt foreach} loop to invoke {\tt getLandUse} in parallel on each file mapped to the elements of {\tt geos[]}. As 317 files were mapped (in lines 47-48), the loop will invoke 317 instances of the application in parallel. \katznote{is this strictly true?  Do you want to say that it will enable 317 instances to be runnable in parallel, but the number that are actually run in parallel depends on the hardware available to Swift, or something like that?} The result of each computation is placed in a file mapped to the array {\tt land} and named by the regular expression translation to be based on the file names mapped to the array {\tt geos[]} (in lines \katznote{is this 52-53?}). Thus the landuse histogram for file {\tt /home/wilde/modis/2002/h00v08.tif} would be written into file {\tt h00v08.landuse.freq} and would be considered by Swift to be of type {\tt landuse}.
+At lines 55-57 the script performs its first computation using a {\tt foreach} loop to invoke {\tt getLandUse} in parallel on each file mapped to the elements of {\tt geos[]}. As 317 files were mapped (in lines 47-48), the loop will invoke 317 instances of the application in parallel. \katznote{is this strictly true?  Do you want to say that it will enable 317 instances to be runnable in parallel, but the number that are actually run in parallel depends on the hardware available to Swift, or something like that?} The result of each computation is placed in a file mapped to the array {\tt land} and named by the regular expression translation to be based on the file names mapped to the array {\tt geos[]} (in lines \katznote{is this 52-53?}). Thus the landuse histogram for file {\tt /home/wilde/modis/2002/h00v08.tif} would be written into file {\tt h00v08.landuse.freq} and would be considered by Swift to be of type {\tt landuse}.
 
 Once all the land usage histograms have have been computed, the script can then execute {\tt analyzeLandUse} at line 63 to find the requested number of highest tiles (files) with a specific land cover combination. The Swift runtime system uses futures to ensure that this analysis function is not invoked until all of its input files have computed and transported to the computation site chosen to run the analysis program. All of these steps take place automatically, using the relatively simple and location-independent Swift expressions shown. The output files to be use to hold the result are specified in the declarations at lines 61-62. \katznote{should these lines have a space inserted before the ``<'' to match the previous lines?  Same question for 67-68... }