[Swift-commit] r3902 - text/parco10submission

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Fri Jan 7 18:12:42 CST 2011


Author: wilde
Date: 2011-01-07 18:12:42 -0600 (Fri, 07 Jan 2011)
New Revision: 3902

Modified:
   text/parco10submission/paper.tex
Log:
Added 75% of the text for MODIS application.

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2011-01-07 20:10:09 UTC (rev 3901)
+++ text/parco10submission/paper.tex	2011-01-08 00:12:42 UTC (rev 3902)
@@ -1201,28 +1201,59 @@
 data from a large dataset of files that categorize the Earth's surface,
 from the MODIS sensor instruments that orbit Earth on two NASA
 satellites of the Earth Observing System.
-The Swift script analyzes the dataset to find the files with the ten
-largest total urban area and then produces a new dataset with viewable
-color images of those top-ten urban data ``tiles''.
 
-The dataset consists of 317 ``tile'' files that categorize every
+The dataset (we tested with one named {\tt mcd12q1}, for year 2002)
+consists of 317 ``tile'' files that categorize every
 250-meter square of non-ocean surface of the earth into one of 17
-``land cover'' categories, for example, water, ice, forest, barren, or
-urban. Each pixel of these data files has a value of 0 to 16,
+``land cover'' categories, for example, water, ice, forest, barren,
+urban, etc. Each pixel of these data files has a value of 0 to 16,
 describing one 250-meter square of the earth's surface at a specific
-point in time. Each tile file has 5 million pixels, covering 2400
+point in time. Each tile file has 5.76 million 1-byte pixels (about 5.8 megabytes), covering 2400
 x 2400 250-meter squares, based on a specific map projection.
 
-The input datasets are not ``viewable'' images because of the pixel
-values, thus requiring the color rendering step above.
-\katznote{``above'' isn't above, it's in the script below, which isn't at all
-described.}
+The Swift script analyzes the dataset to find the N files with the
+largest total area of any requested set of land-cover types, and then produces a new dataset with viewable
+color images of those closest-matching data tiles.
+The input datasets are not viewable images, because their pixel
+values are land-use codes; a color rendering step is therefore required. A typical invocation of this script would be ``\emph{find the top 12 urban tiles}'' or ``\emph{find the 16 tiles with the most forest and grassland}''.
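The selection criterion can be sketched in a few lines. This is a hypothetical Python illustration, not the analyzelanduse program itself: it assumes each tile has already been reduced to a 17-bin histogram (pixel count per land-cover code 0 to 16), and it assumes the IGBP legend codes 13 for urban and 1 to 5 for forest, which should be checked against the product's documentation.

```python
from typing import Dict, List, Sequence


def top_tiles(histograms: Dict[str, Sequence[int]],
              categories: Sequence[int],
              n: int) -> List[str]:
    """Return the n tile names with the most pixels in the given categories."""
    def area(tile: str) -> int:
        # Every pixel covers the same 250-meter square, so a pixel count
        # is proportional to surface area.
        return sum(histograms[tile][c] for c in categories)

    # Largest area first; ties broken by tile name so the result is deterministic.
    return sorted(histograms, key=lambda t: (-area(t), t))[:n]


URBAN = 13                 # assumption: IGBP code for "urban and built-up"
FOREST = [1, 2, 3, 4, 5]   # assumption: IGBP forest classes

# Hypothetical histograms for three tiles.
hist = {tile: [0] * 17 for tile in ("h00v08", "h01v07", "h11v05")}
hist["h11v05"][URBAN] = 900
hist["h01v07"][URBAN] = 120
hist["h00v08"][2] = 4000

print(top_tiles(hist, [URBAN], 2))   # ['h11v05', 'h01v07']
print(top_tiles(hist, FOREST, 1))    # ['h00v08']
```

Requesting several land-cover types simply sums their bins, which is how a query such as "forest and grassland" reduces to a single ranking key.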
+
+The script is structured as follows:
+
+Lines 1-3 define four mapped file types: {\tt MODIS} for the input MODIS tiles, {\tt image} for the viewable images the script produces, {\tt landuse} for the output of the land-use histogram calculation, and {\tt file} for any other generic file that we do not care to assign a unique type to.
 
+Lines 7-32 define the Swift interface functions for the application programs {\tt getLandUse}, {\tt analyzeLandUse}, {\tt colorMODIS}, {\tt assemble}, and {\tt markMap}.
+
+Lines 36-41 extract a set of science parameters from the {\tt swift} command line with which the user invokes the script.
+These indicate the number of files of the input set to select (to enable processing the first M of N files), the set of land cover types to select, the number of ``top'' tiles to select, and parameters used to locate input and output directories.
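The default-value behavior of {\tt @arg} can be illustrated with a small Python stand-in. This is a sketch under assumptions: it takes script arguments to be passed as {\tt -name=value} tokens after the script name, and the argument name {\tt nfiles} is hypothetical (only {\tt modisdir} and {\tt webdir} appear in the listing).

```python
from typing import Dict, List, Optional


def parse_script_args(argv: List[str]) -> Dict[str, str]:
    """Collect '-name=value' tokens into a dict (assumed command-line syntax)."""
    args: Dict[str, str] = {}
    for tok in argv:
        if tok.startswith("-") and "=" in tok:
            name, value = tok[1:].split("=", 1)
            args[name] = value
    return args


def arg(args: Dict[str, str], name: str, default: Optional[str] = None) -> str:
    """Mimic @arg(name, default): a command-line value wins, else the default."""
    if name in args:
        return args[name]
    if default is None:
        raise KeyError(f"missing required script argument -{name}")
    return default


# Hypothetical invocation overriding two of the script parameters.
args = parse_script_args(["-nfiles=5", "-modisdir=/data/modis/2002"])
n_files = int(arg(args, "nfiles", "317"))
modis_dir = arg(args, "modisdir", "/home/wilde/bigdata/data/modis/2002")
web_dir = arg(args, "webdir", "/home/wilde/public_html/geo/")
print(n_files, modis_dir, web_dir)  # 5 /data/modis/2002 /home/wilde/public_html/geo/
```

Supplying defaults in the script lets a user run the full 317-file analysis with no arguments while still testing on the first few files by overriding a single parameter.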
+
+Lines 47-48 invoke an ``external'' mapper script, {\tt modis.mapper}, to map the first {\tt nFiles} MODIS data files in the directory given by the script argument {\tt MODISdir} to the array {\tt geos}. An external mapper script is written by the Swift programmer (in any language desired, though mappers are often simple shell scripts). External mappers are usually co-located with the Swift script, and are invoked when Swift instantiates the associated variable. A mapper outputs a list of two-field lines of the form \emph{SwiftExpression filename}, where \emph{SwiftExpression} is relative to the variable name being mapped. For example, if the mapper were invoked as follows on behalf of the declaration at lines 47-48:
+\begin{Verbatim}[fontsize=\scriptsize,framesep=2mm]
+$ ./modis.mapper -location /home/wilde/modis/2002/ -suffix .tif -n 5
+[0] /home/wilde/modis/2002/h00v08.tif
+[1] /home/wilde/modis/2002/h00v09.tif
+[2] /home/wilde/modis/2002/h00v10.tif
+[3] /home/wilde/modis/2002/h01v07.tif
+[4] /home/wilde/modis/2002/h01v08.tif
+\end{Verbatim}
+it would cause the first five elements of the array {\tt geos} to be mapped to the first five files of the MODIS dataset in the specified directory.
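A mapper only has to print one "[index] filename" line per array element. The following hypothetical Python equivalent of modis.mapper reproduces the output shown above; it is a sketch, not the original script. The option names are copied from the example invocation, while alphabetical ordering and the defaults (current directory, ".tif", no limit) are assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical Python equivalent of modis.mapper (sketch, not the original)."""
import argparse
import os
import sys
from typing import Iterable, List


def mapping_lines(location: str, names: Iterable[str], suffix: str, n: int) -> List[str]:
    """Build '[index] path' lines for the first n names ending in suffix."""
    selected = sorted(name for name in names if name.endswith(suffix))[:n]
    return [f"[{i}] {os.path.join(location, name)}"
            for i, name in enumerate(selected)]


def main(argv: List[str]) -> None:
    parser = argparse.ArgumentParser(description="external mapper sketch")
    # Single-dash long options, as in the example invocation.
    parser.add_argument("-location", default=".")
    parser.add_argument("-suffix", default=".tif")
    parser.add_argument("-n", type=int, default=sys.maxsize)
    args = parser.parse_args(argv)
    for line in mapping_lines(args.location, os.listdir(args.location),
                              args.suffix, args.n):
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the formatting logic in mapping_lines, separate from directory listing, makes the mapper easy to test without a real MODIS directory.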
+
+At lines 52-53, the script declares the array {\tt land}, which will contain the output of the {\tt getLandUse} application. This declaration uses the built-in ``structured regular expression mapper'', which determines the names of the \emph{output} files that the array will refer to once they are computed. Swift knows from context that this is an output mapping. The mapper uses regular expressions to derive the names of the output files from the filenames of the corresponding elements of the input array {\tt geos}, given by the {\tt source=} argument to the mapper.
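The name derivation performed by the structured regular expression mapper can be imitated with Python's re module. This sketch assumes the pattern is applied to each source file's base name and that the transform's \1 refers to the first capture group; the match and transform values are taken from the mapper at lines 72-74 and from the h00v08.landuse.freq example, but the exact Swift matching rules may differ.

```python
import os
import re
from typing import List, Sequence


def regexp_map(sources: Sequence[str], match: str, transform: str) -> List[str]:
    """Derive one output name per source file, preserving array order."""
    outputs = []
    for path in sources:
        m = re.search(match, os.path.basename(path))
        if m is None:
            raise ValueError(f"{path!r} does not match {match!r}")
        # expand() substitutes \1 with the text captured by (h..v..).
        outputs.append(m.expand(transform))
    return outputs


geos = ["/home/wilde/modis/2002/h00v08.tif",
        "/home/wilde/modis/2002/h00v09.tif"]
print(regexp_map(geos, r"(h..v..)", r"\1.landuse.freq"))
# ['h00v08.landuse.freq', 'h00v09.landuse.freq']
print(regexp_map(geos, r"(h..v..)", r"landuse/\1.color.png"))
# ['landuse/h00v08.color.png', 'landuse/h00v09.color.png']
```

Because output element i is derived from input element i, the output array lines up index by index with {\tt geos} without the script naming any file explicitly.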
+ 
+At lines 55-57 the script performs its first computation, using a {\tt foreach} loop to invoke {\tt getLandUse} in parallel on each file mapped to the elements of {\tt geos[]}. As 317 files were mapped, the loop invokes 317 instances of the application, all of which may run in parallel. The result of each computation is placed in the file mapped to the corresponding element of the array {\tt land}, whose name the regular expression mapper derives from the name of the file mapped to the corresponding element of {\tt geos[]}. Thus the land-use histogram for file {\tt /home/wilde/modis/2002/h00v08.tif} would be written into file {\tt h00v08.landuse.freq} and would be considered by Swift to be of type {\tt landuse}.
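The semantics of this {\tt foreach} loop, where every iteration is independent and element i of {\tt land} corresponds to element i of {\tt geos}, resemble a parallel map. The following is a local Python analogy using threads; in Swift each iteration is a separate application run that may execute on a remote site. fake\_get\_land\_use is a hypothetical stand-in that only computes an output name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_foreach(app: Callable[[T], R], inputs: Sequence[T],
                     max_workers: int = 8) -> List[R]:
    """Run app on every input concurrently; result i belongs to input i."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every call up front and yields results in
        # input order, regardless of the order in which calls finish.
        return list(pool.map(app, inputs))


def fake_get_land_use(path: str) -> str:
    """Hypothetical stand-in for getLandUse: returns the output file name."""
    tile = path.rsplit("/", 1)[-1].split(".")[0]
    return f"{tile}.landuse.freq"


geos = [f"/home/wilde/modis/2002/h0{i}v08.tif" for i in range(3)]
land = parallel_foreach(fake_get_land_use, geos)
print(land)  # ['h00v08.landuse.freq', 'h01v08.landuse.freq', 'h02v08.landuse.freq']
```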
+
+Once all the land-use histograms have been computed, the script can execute {\tt analyzeLandUse} at line 63 to find the requested number of tiles (files) with the largest area of the specified land-cover combination. The Swift runtime system uses futures to ensure that this analysis function is not invoked until all of its input files have been computed and transported to the site chosen to run the analysis program. All of these steps take place automatically, using the relatively simple and location-independent Swift expressions shown. The output files that hold the results are declared at lines 61-62.
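The blocking behavior of futures can be sketched with Python's concurrent.futures, as an analogy to Swift's runtime rather than its implementation. The per-tile counts and both functions are hypothetical stand-ins. The analysis task is submitted immediately, with no explicit wait in the caller; it proceeds only when every producer future has resolved.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

# Hypothetical urban pixel counts standing in for getLandUse output.
FAKE_COUNTS: Dict[str, int] = {"h00v08": 5, "h11v05": 900, "h01v07": 120}


def get_land_use(tile: str) -> int:
    """Stand-in for the getLandUse app: one summary value per tile."""
    return FAKE_COUNTS[tile]


def analyze_land_use(tiles: List[str], counts: List[Future], n: int) -> List[str]:
    """Stand-in for analyzeLandUse: result() blocks until each producer is done."""
    resolved = [f.result() for f in counts]
    ranked = sorted(zip(tiles, resolved), key=lambda p: -p[1])
    return [tile for tile, _ in ranked[:n]]


def run(tiles: List[str], n: int) -> List[str]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = [pool.submit(get_land_use, t) for t in tiles]
        # Submitted last: the pool dequeues tasks in FIFO order, so every
        # producer has already started before this consumer can block.
        top = pool.submit(analyze_land_use, tiles, counts, n)
        return top.result()


print(run(["h00v08", "h11v05", "h01v07"], 2))  # ['h11v05', 'h01v07']
```

In Swift, this ordering constraint is inferred from data dependencies alone; the explicit submit order here exists only to keep a bounded thread pool from deadlocking.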
+
+To visualize the results, the application function {\tt markMap} invoked at line 68 generates an image of a world map in the MODIS projection system and marks the selected tiles that match the analysis criteria. Since this statement depends on the output of the analysis, it waits for the statement at line 63 to complete before commencing.
+
+For additional visualization, the script assembles a full map of all the input tiles, placed in their proper grid locations on the MODIS world map projection, again marking the selected tiles. Since this operation needs a color image of every input tile, these images are computed, again in parallel, by 317 jobs invoked by the {\tt foreach} statement at lines 76-78. The power of Swift's implicit parallelization is vividly shown here: since the {\tt colorMODIS} call at line 77 depends only on the input array {\tt geos}, these 317 application invocations can start as soon as {\tt geos} is mapped, running concurrently with the {\tt getLandUse} calls at lines 55-57 and with the analysis, without any explicit parallel construct in the script.
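The overlap between the color-rendering jobs and the land-use pipeline follows from the dependency graph alone. This Python sketch computes the earliest "wave" in which each task of a hypothetical two-tile run could start under dataflow execution, where a task is ready once all of its inputs exist. It assumes that both outputs of {\tt analyzeLandUse} feed {\tt markMap} and {\tt assemble}, as the call sites at lines 68 and 83 suggest.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical dependency graph for a 2-tile run: task -> tasks it reads from.
DEPS: Dict[str, List[str]] = {
    "getLandUse[0]": [], "getLandUse[1]": [],
    "colorMODIS[0]": [], "colorMODIS[1]": [],
    "analyzeLandUse": ["getLandUse[0]", "getLandUse[1]"],
    "markMap": ["analyzeLandUse"],
    "assemble": ["analyzeLandUse", "colorMODIS[0]", "colorMODIS[1]"],
}


def stages(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest wave each task can start in: 0 if it has no inputs,
    otherwise one more than its latest-finishing input."""
    @lru_cache(maxsize=None)
    def stage(task: str) -> int:
        return 1 + max((stage(d) for d in deps[task]), default=-1)
    return {t: stage(t) for t in deps}


waves: Dict[int, List[str]] = {}
for task, s in sorted(stages(DEPS).items()):
    waves.setdefault(s, []).append(task)
for s in sorted(waves):
    print(s, waves[s])
```

Both colorMODIS tasks land in wave 0 alongside getLandUse, which is exactly the concurrency the runtime extracts without any annotation in the script.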
+
 \pagebreak
 Swift example 1: MODIS satellite image processing script
 \begin{Verbatim}[fontsize=\scriptsize,frame=single,framesep=2mm,gobble=7, numbers=left]
      1	type file;
-     2	type imagefile;
+     2	type MODIS; type image;
      3	type landuse;
      4
      5	# Define application program interfaces
@@ -1233,23 +1264,28 @@
     10	}
     11
     12	app (file output, file tilelist) analyzeLandUse
-    13	    (landuse input[], string usetype, int maxnum)
+    13	    (landuse input[], string usetype, int maxnum)
     14	{
     15	  analyzelanduse @output @tilelist usetype maxnum @filenames(input);
     16	}
     17
-    18	app (imagefile output) colorMODIS (imagefile input)
+    18	app (image output) colorMODIS (MODIS input)
     19	{
     20	  colormodis @input @output;
     21	}
     22
-    23	app (imagefile output) assemble
-    24	    (file selected, imagefile image[], string webdir)
+    23	app (image output) assemble
+    24	    (file selected, image img[], string webdir)
     25	{
-    26	  assemble @output @selected @filename(image[0]) webdir;
+    26	  assemble @output @selected @filename(img[0]) webdir;
     27	}
+<<<<<<< .mine
+    28	
+    29	app (image grid) markMap (file tilelist) 
+=======
     28
     29	app (imagefile grid) markMap (file tilelist)
+>>>>>>> .r3901
     30	{
     31	  markmap @tilelist @grid;
     32	}
@@ -1263,12 +1299,12 @@
     40	string MODISdir=  @arg("modisdir","/home/wilde/bigdata/data/modis/2002");
     41	string webDir =   @arg("webdir","/home/wilde/public_html/geo/");
     42
-    43	string suffix=".tif";
+    43	
     44
     45	# Input Dataset
     46
-    47	imagefile geos[] <ext; exec="modis.mapper",
-    48	  location=MODISdir, suffix=".tif", n=nFiles >; # site=site
+    47	MODIS geos[] <ext; exec="modis.mapper",
+    48	  location=MODISdir, suffix=".tif", n=nFiles >;
     49
     50	# Compute the land use summary of each MODIS tile
     51
@@ -1287,12 +1323,12 @@
     64
     65	# Mark the top N tiles on a sinusoidal gridded map
     66
-    67	imagefile gridMap<"markedGrid.gif">;
+    67	image gridMap<"markedGrid.gif">;
     68	gridMap = markMap(topSelected);
     69
     70	# Create multi-color images for all tiles
     71
-    72	imagefile colorImage[] <structured_regexp_mapper;
+    72	image colorImage[] <structured_regexp_mapper;
     73	          source=geos, match="(h..v..)",
     74	          transform="landuse/\\1.color.png">;
     75
@@ -1302,7 +1338,7 @@
     79
     80	# Assemble a montage of the top selected areas
     81
-    82	imagefile montage <single_file_mapper; file=@strcat(runID,"/","map.png") >; # @arg
+    82	image montage <single_file_mapper; file=@strcat(runID,"/","map.png") >; # @arg
     83	montage = assemble(selectedTiles,colorImage,webDir);
 
 \end{Verbatim}
