[Swift-commit] r3904 - text/parco10submission

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Sat Jan 8 09:14:53 CST 2011


Author: dsk
Date: 2011-01-08 09:14:53 -0600 (Sat, 08 Jan 2011)
New Revision: 3904

Modified:
   text/parco10submission/paper.tex
Log:
some light editing in first script in 4, and some comments/questions


Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2011-01-08 14:44:33 UTC (rev 3903)
+++ text/parco10submission/paper.tex	2011-01-08 15:14:53 UTC (rev 3904)
@@ -1206,30 +1206,31 @@
 and performing image processing for research in
 image-guided planning for neurosurgery \cite{Fedorov_2009}.
 
-This section describes two representative Swift scripts (from two diverse disciplines) in more detail.
-The first script is a tutorial example (used in a class on data intensive computing at the University of Chicago) which performs a simple analysis of satellite land-use imagery. The second script is taken (with minor changes to fit better on the page) directly from work done using Swift for an investigation into the molecular structure of glassy materials in the field of theoretical chemistry. In both examples, the intent is to show a complete and realistic Swift script, annotated to better understand the nature of the Swift programming model and to provide a glimpse of real Swift usage.
+This section describes two representative Swift scripts (from two diverse disciplines) in detail.
+The first is a tutorial example (used in a class on data intensive computing at the University of Chicago) that performs a simple analysis of satellite land-use imagery. The second script is taken (with minor changes to fit better on the page) directly from work done using Swift for an investigation into the molecular structure of glassy materials in the field of theoretical chemistry. In both examples, the intent is to show complete and realistic Swift scripts, annotated to better understand the nature of the Swift programming model and to provide a glimpse of real Swift usage.
 
 \subsection{Satellite image data processing.}
 
 The first example -- Script 1 below -- processes
 data from a large dataset of files that categorize the Earth's surface,
-from the MODIS sensor instruments that orbit Earth on two NASA
+derived from data from the MODIS sensor instruments that orbit the Earth on two NASA
 satellites of the Earth Observing System.
 
-The dataset (we tested with one named {\tt mcd12q1}, for year 2002)
+The dataset we use (for 2002, named {\tt mcd12q1})
 consists of 317 ``tile'' files that categorize every
 250-meter square of non-ocean surface of the earth into one of 17
 ``land cover'' categories, for example, water, ice, forest, barren,
 urban, etc. Each pixel of these data files has a value of 0 to 16,
-describing one 250-meter square of the earth's surface at a specific
-point in time. Each tile file has ~ 5 million 1-byte pixels (5.7 megabytes), covering 2400
+describing one square of the earth's surface at a specific
+point in time. Each tile file has approximately
+5 million 1-byte pixels (5.7 MB), covering 2400
 x 2400 250-meter squares, based on a specific map projection.
 
 The Swift script analyzes the dataset to find the files with the N
 largest total area of any requested sets of land-cover types, and then produces a new dataset with viewable
 color images of those closest-matching data tiles.
-(The input datasets are not viewable images, as their pixel
-values are land-use codes. Thus a color rendering step is required). A typical invocation of this script would be ``\emph{find the top 12 urban tiles}'' or ``\emph{find the 16 tiles with the most forest and grassland}''. As this script is used for tutorial purposes, the application programs it calls are simple shell scripts that use fast, generic image processing applications to process the MODIS data. Thus the example executes quickly while serving as a realistic tutorial script for much more compute-intensive satellite data processing applications.
+(A color rendering step is required to do this, as the input datasets are not viewable images; their pixel
+values are land-use codes.) A typical invocation of this script would be ``\emph{find the top 12 urban tiles}'' or ``\emph{find the 16 tiles with the most forest and grassland}''. As this script is used for tutorial purposes, the application programs it calls are simple shell scripts that use fast, generic image processing applications to process the MODIS data. Thus the example executes quickly while serving as a realistic tutorial script for much more compute-intensive satellite data processing applications.
 \\
 \\
 The script is structured as follows:
@@ -1241,6 +1242,7 @@
 
 Lines 36-41 extract a set of science parameters from the {\tt swift} command line with which the user invokes the script.
 These indicate the number of files of the input set to select (to enable processing the first M of N files), the set of land cover types to select, the number of ``top'' tiles to select, and parameters used to locate input and output directories.
+\katznote{not sure if these syntaxes were explained in section 2 clearly - if not, they probably should be added to section 2}
 
 Lines 47-48 invoke an ``external'' mapper script {\tt modis.mapper} to map the first {\tt nFiles} MODIS data files in the directory contained in the script argument {\tt MODISdir} to the array {\tt geos}. An external mapper script is written by the Swift programmer (in any language desired, but quite often mappers are simple shell scripts). External mappers are usually co-located with the Swift script, and are invoked when Swift instantiates the associated variable. They return a two-field list of the form \emph{SwiftExpression, filename}, where \emph{SwiftExpression} is relative to the variable name being mapped.  For example, if this mapper invocation were called from the Swift script at lines 47-48:
 \begin{Verbatim}[fontsize=\scriptsize,framesep=2mm]
@@ -1255,13 +1257,13 @@
 
 At lines 52-53, the script declares the array {\tt land}, which will contain the output of the {\tt getLandUse} application. This declaration uses the built-in ``structured regular expression mapper'', which will determine the names of the \emph{output} files that the array will refer to once they are computed. Swift knows from context that this is an output mapping. The mapper will use regular expressions to base the names of the output files on the filenames of the corresponding elements of the input array {\tt geos} given by the {\tt source=} argument to the mapper.
  
- At lines 55-57 the script performs its first computation using a {\tt foreach} loop to invoke {\tt getLandUse} in parallel on each file mapped to the elements of {\tt geos[]}. As 317 files were mapped, the loop will invoke 317 instances of the application in parallel. The result of each computation is placed in a file mapped to the array {\tt land} and named by the regular expression translation to be based on the file names mapped to the array {\tt geos[]}. Thus the landuse histogram for file {\tt /home/wilde/modis/2002/h00v08.tif} would be written into file {\tt h00v08.landuse.freq} and would be considered by Swift to be of type {\tt landuse}.
+ At lines 55-57 the script performs its first computation using a {\tt foreach} loop to invoke {\tt getLandUse} in parallel on each file mapped to the elements of {\tt geos[]}. As 317 files were mapped (in lines 47-48), the loop will invoke 317 instances of the application in parallel. \katznote{is this strictly true?  Do you want to say that it will enable 317 instances to be runnable in parallel, but the number that are actually run in parallel depends on the hardware available to Swift, or something like that?} The result of each computation is placed in a file mapped to the array {\tt land} and named by the regular expression translation to be based on the file names mapped to the array {\tt geos[]} (in lines \katznote{is this 52-53?}). Thus the landuse histogram for file {\tt /home/wilde/modis/2002/h00v08.tif} would be written into file {\tt h00v08.landuse.freq} and would be considered by Swift to be of type {\tt landuse}.
 
-Once all the land usage histograms have have been computed, the script can then execute {\tt analyzeLandUse} at line 63 to find the requested number of highest tiles (files) with a specific land cover combination. The Swift runtime system uses futures to ensure that this analysis function is not invoked until all of its input files have computed and transported to the computation site chosen to run the analysis program. All of these steps take place automatically, using the relatively simple and location-independent Swift expressions shown. The output files to be use to hold the result are specified in the declarations at lines 61-62.
+Once all the land usage histograms have been computed, the script can then execute {\tt analyzeLandUse} at line 63 to find the requested number of highest tiles (files) with a specific land cover combination. The Swift runtime system uses futures to ensure that this analysis function is not invoked until all of its input files have been computed and transported to the computation site chosen to run the analysis program. All of these steps take place automatically, using the relatively simple and location-independent Swift expressions shown. The output files to be used to hold the result are specified in the declarations at lines 61-62. \katznote{should these lines have a space inserted before the ``<'' to match the previous lines?  Same question for 67-68... }
 
-To visualize the results, the application function {\tt markMap} invoked at line 68 will generate an image of a world map using the MODIS projection system and indicate the selected tiles matching the analysis criteria. Since this statememt depends on the output of the analysis, it will wait for statement at line 63 to complete before commencing.
+To visualize the results, the application function {\tt markMap} invoked at line 68 will generate an image of a world map using the MODIS projection system and indicate the selected tiles matching the analysis criteria. Since this statement depends on the output of the analysis ({\tt topSelected}), it will wait for the statement at line 63 to complete before commencing.
 
-For additional visualization, the script assembles a full map of all the input tiles, placed in their proper grid location on the MODIS world map projection, and again marking the selected tiles. Since this operation needs true-color images of every input tiles these are computed -- again in parallel -- with 317 jobs invoked by the foreach statement at line 76-78. The power of Swift's implicit parallelization is very vividly shown here: since the {\tt colorMODIS} call at line 77 depends only on the input array {\tt geos}, these 317 application invocations are executed in parallel with the initial 317 parallel executions of the {\tt getLandUse} application at line 56.  The script concludes at line 83 by assembling a montage of all the colored tiles and writing this image file to a web-accessible directory for viewing.
+For additional visualization, the script assembles a full map of all the input tiles, placed in their proper grid location on the MODIS world map projection, and again marking the selected tiles. Since this operation needs true-color images of every input tile, these are computed---again in \katznote{potentially? as before} parallel---with 317 jobs invoked by the foreach statement at lines 76-78. The power of Swift's implicit parallelization is very vividly shown here: since the {\tt colorMODIS} call at line 77 depends only on the input array {\tt geos}, these 317 application invocations are executed in parallel with the initial 317 parallel executions of the {\tt getLandUse} application at line 56.  The script concludes at line 83 by assembling a montage of all the colored tiles and writing this image file to a web-accessible directory for viewing.
 
 \pagebreak
 Swift example 1: MODIS satellite image processing script
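
Illustrative sketch (not part of the commit): the dataflow described in the hunks above can be approximated in Python to make two points concrete. The first is how a regular-expression mapping derives an output name such as h00v08.landuse.freq from an input path. The second is how futures hold back the analysis step until every per-tile result exists. This is only an analogy under stated assumptions, not Swift code and not the paper's script. The names landuse_name, get_land_use, analyze_land_use, run and the max_workers limit are hypothetical stand-ins, and the tile-name pattern is inferred from the single example file name in the text.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath


def landuse_name(tile_path: str) -> str:
    """Derive an output file name from an input tile path.

    Assumed pattern: tile files are named like h00v08.tif (inferred from
    the one example in the text, not from the real mapper definition).
    """
    base = PurePosixPath(tile_path).name
    return re.sub(r"^(h\d\dv\d\d)\.tif$", r"\1.landuse.freq", base)


def get_land_use(tile_path: str) -> str:
    """Hypothetical stand-in for the getLandUse application.

    A real run would compute a land-use histogram; here we only return
    the name of the file that would hold it.
    """
    return landuse_name(tile_path)


def analyze_land_use(histogram_files: list[str], n: int) -> list[str]:
    """Hypothetical stand-in for analyzeLandUse: pick n files by name order."""
    return sorted(histogram_files)[:n]


def run(tiles: list[str], n: int, max_workers: int = 4) -> list[str]:
    """Submit one job per tile, then analyze once all results are ready."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every tile becomes a runnable task; how many actually execute
        # at once is bounded by max_workers (an assumption of this sketch).
        land = [pool.submit(get_land_use, tile) for tile in tiles]
        # result() blocks until each future is resolved, so the analysis
        # cannot start before all of its inputs exist.
        return analyze_land_use([f.result() for f in land], n)


if __name__ == "__main__":
    example_tiles = [f"/data/modis/h{i:02d}v08.tif" for i in range(5)]
    print(run(example_tiles, n=2))
```

In this sketch all tasks are submitted up front, but only max_workers of them run at a time, which mirrors the reviewer's question about whether 317 instances are invoked in parallel or merely made runnable in parallel.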



