[Swift-commit] r3375 - text/parco10submission

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Tue Jun 15 23:47:39 CDT 2010


Author: wilde
Date: 2010-06-15 23:47:39 -0500 (Tue, 15 Jun 2010)
New Revision: 3375

Modified:
   text/parco10submission/paper.tex
Log:
adjusted authors. reduce verbatim font. edits to Examples section. Added MODIS example.

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2010-06-16 02:39:54 UTC (rev 3374)
+++ text/parco10submission/paper.tex	2010-06-16 04:47:39 UTC (rev 3375)
@@ -4,7 +4,14 @@
 \usepackage{graphicx}
 
 \journal{Parallel Computing}
+\makeatletter 
+\g@addto@macro\@verbatim\small
+\makeatother 
 
+\makeatletter 
+\g@addto@macro\@verbatim\small
+\makeatother 
+
 \begin{document}
 % \bibliographystyle{unsrt} % initial temp bib style for editing
 
@@ -35,14 +42,18 @@
 %%        \affaddr{Argonne National Laboratory} \\
 %% }
 
-\author{Ben Clifford}
 \author{Ian Foster}
 \author{Mihael Hategan}
 \author{Justin M. Wozniak}
 \author{Michael Wilde}
 
-\address{Computation Institute, University of Chicago}
+\address{Mathematics and Computer Science Division, Argonne National
+  Laboratory, Computation Institute, University of Chicago}
 
+\author{Ben Clifford}
+
+\address{Computation Institute, University of Chicago (at time of writing)}
+
 \begin{abstract}
 
 Scientists, engineers and business analysts often work by performing a
@@ -865,19 +876,22 @@
 \section{Applications}
 \label{Applications}
 
-TODO: two or three applications in brief. discuss both the application
-behaviour in relation to Swift, but underlying grid behaviour in
-relation to Swift
+% TODO: two or three applications in brief. discuss both the application
+% behaviour in relation to Swift, but underlying grid behaviour in
+% relation to Swift
 
-One app: CNARI + TeraGrid - small jobs (3s), many of them.
+% One app: CNARI + TeraGrid - small jobs (3s), many of them.
 
-Another app: Rosetta on OSG? OSG was designed with a focus on
-heterogeneity between sites. Large number of sites; automatic site file
-selection; and automatic app deployment there.
+% Another app: Rosetta on OSG? OSG was designed with a focus on
+% heterogeneity between sites. Large number of sites; automatic site file
+% selection; and automatic app deployment there.
 
+This section describes a few representative Swift applications
+from diverse disciplines.
+
 \subsection{BLAST Application Example}
 
-The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
+% The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
 
 \begin{verbatim}
 type database;
@@ -885,41 +899,44 @@
 type output;
 type error;
 
-(output out, error err) blastall(query i, database db) {
- app {
-   blastall "-p" "blastp" "-F" "F" "-d" @filename(db) "-i"
-@filename(i) "-v" "300" "-b" "300" "-m8" "-o" @filename(out)
-stderr=@filename(err);
- }
+app (output out, error err) blastall(query i, database db) {
+  blastall "-p" "blastp" "-F" "F"
+           "-d" @filename(db) "-i" @filename(i)
+           "-v" "300" "-b" "300" "-m8"
+           "-o" @filename(out) stderr=@filename(err);
 }
 
-database pir <simple_mapper;prefix="/disks/ci-gpfs/swift/blast/pir/UNIPROT_for_blast_14.0.seq">;
+database pir <simple_mapper;prefix="/ci/pir/UNIPROT.14.0.seq">;
+
+query  i   <"test.in">; 
 output out <"test.out">;
-query i <"test.in">;
-error err <"test.err">;
+error  err <"test.err">;
+
 (out,err) = blastall(i, pir);
 \end{verbatim}
 
-The trick here is that blastall reads takes the prefix name of the database files that it will read (.phr, .seq and .pin files).
-So i made a dummy file called ``{\tt UNIPROT\_for\_blast\_14.0.seq}'' to satisfy the data dependency . So here is the final list of my files:
+The application {\tt \small blastall} takes as its argument the common prefix of the database files that it will read (.phr, .pin and .psq files).
+This example employs an empty dummy file called {\tt \small
+  UNIPROT.14.0.seq} to satisfy the data dependency. When executed,
+the Swift script reads the following files from the input directory {\tt\small /ci/pir}:
 
 \begin{verbatim}
--rw-r--r--  1 aespinosa ci-users         0 Nov 15 13:49 UNIPROT_for_blast_14.0.seq
--rw-r--r--  1 aespinosa ci-users 204106872 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.phr
--rw-r--r--  1 aespinosa ci-users  23001752 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.pin
--rw-r--r--  1 aespinosa ci-users 999999669 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.00.psq
--rw-r--r--  1 aespinosa ci-users 233680738 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.phr
--rw-r--r--  1 aespinosa ci-users  26330312 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.pin
--rw-r--r--  1 aespinosa ci-users 999999864 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.01.psq
--rw-r--r--  1 aespinosa ci-users  21034886 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.phr
--rw-r--r--  1 aespinosa ci-users   2370216 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.pin
--rw-r--r--  1 aespinosa ci-users 103755125 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.psq
--rw-r--r--  1 aespinosa ci-users       208 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.pal
+-rw-r--r--  1 ben ci         0 Nov 15 13:49 UNIPROT.14.0.seq
+-rw-r--r--  1 ben ci 204106872 Oct 20 16:50 UNIPROT.14.0.seq.00.phr
+-rw-r--r--  1 ben ci  23001752 Oct 20 16:50 UNIPROT.14.0.seq.00.pin
+-rw-r--r--  1 ben ci 999999669 Oct 20 16:51 UNIPROT.14.0.seq.00.psq
+-rw-r--r--  1 ben ci 233680738 Oct 20 16:51 UNIPROT.14.0.seq.01.phr
+-rw-r--r--  1 ben ci  26330312 Oct 20 16:51 UNIPROT.14.0.seq.01.pin
+-rw-r--r--  1 ben ci 999999864 Oct 20 16:52 UNIPROT.14.0.seq.01.psq
+-rw-r--r--  1 ben ci  21034886 Oct 20 16:52 UNIPROT.14.0.seq.02.phr
+-rw-r--r--  1 ben ci   2370216 Oct 20 16:52 UNIPROT.14.0.seq.02.pin
+-rw-r--r--  1 ben ci 103755125 Oct 20 16:52 UNIPROT.14.0.seq.02.psq
+-rw-r--r--  1 ben ci       208 Oct 20 16:52 UNIPROT.14.0.seq.pal
 \end{verbatim}
 
-I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
+% I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
 
-Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
+% Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
 
 \subsection{fMRI Application Example}
 
@@ -1112,6 +1129,80 @@
 doall(p);
 \end{verbatim}
 
+\subsection{Satellite Image Data Processing}
+
+The last example (from a class project) processes a large dataset of files that categorize the Earth's surface, produced by the MODIS sensor instruments that orbit Earth on two NASA satellites of the Earth Observing System.
+The Swift script analyzes the dataset to find the ten files with the
+largest total urban area and then produces a new dataset with viewable
+color images of these top-ten urban data ``tiles''.
+
+The dataset consists of 317 ``tile'' files that categorize every
+250-meter square of non-ocean surface of the Earth into one of 17
+``land cover'' categories, such as water, ice, forest, barren and
+urban. Each pixel of these TIFF-format data files has a value 0--16
+describing one 250-meter square of the Earth's surface at a specific
+point in time. Each tile file has about 5.8M pixels, covering a region
+of 2400 x 2400 250-meter squares, based on a specific map projection.
+
+The input files are not ``viewable'' images because their pixel values
+are category codes rather than colors, thus requiring the color rendering step described above.
+
+\begin{verbatim}
+type file;
+type imagefile;
+type landuse;
+
+app (landuse output) getLandUse (imagefile input, int sortfield)
+{
+  getlanduse @input sortfield stdout=@output ;
+}
+
+app (file output, file tilelist) analyzeLandUse (landuse input[], int usetype, int maxnum)
+{
+  analyzelanduse @output @tilelist usetype maxnum @filenames(input);
+}
+
+app (imagefile output) colormodis (imagefile input)
+{
+  colormodis @input @output;
+}
+
+imagefile geos[]<filesys_mapper; location="/home/wilde/bigdata/data/modis", suffix=".tif">;
+landuse   land[]<structured_regexp_mapper; source=geos,match="(h..v..)", transform="\\1.landuse.byfreq">;
+
+# Find the land use of each modis tile
+
+foreach g,i in geos {
+  land[i] = getLandUse(g,1);
+}
+
+# Find the top 10 most urban tiles (by area)
+
+int UsageTypeURBAN=13;
+file bigurban<"topurban.txt">;
+file urbantiles<"urbantiles.txt">;
+(bigurban, urbantiles) = analyzeLandUse(land, UsageTypeURBAN, 10);
+
+# Map the files to an array
+
+string urbanfilenames[] = readData(urbantiles);
+
+imagefile urbanfiles[] <array_mapper;files=urbanfilenames>;
+
+# Create a set of recolored images for just the urban tiles
+
+foreach uf, i in urbanfiles {
+  imagefile recoloredImage <single_file_mapper;
+            file=@strcat(@strcut(urbanfilenames[i],"(h..v..)"),
+                         ".recolored.tif")>;
+  recoloredImage = colormodis(uf);
+}
+
+imagefile geos[]<filesys_mapper;
+                location="/ci/modis/2008", suffix=".tif">;
+
+\end{verbatim}
+
 \section{Comparison to Other Systems}
 \label{Related}
 
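
For readers following the MODIS example in this revision, the selection step performed by analyzeLandUse and the name rewriting done by the regexp mapper can be sketched in Python. This is only an illustrative sketch under stated assumptions: the in-memory histogram layout, the tile names, and the helper names landuse_name and top_tiles are hypothetical and are not taken from the paper or from the analyzelanduse program, whose file format is not shown in the diff.

```python
import heapq
import re

URBAN = 13  # same category code as UsageTypeURBAN in the Swift script


def landuse_name(tile_path):
    """Derive the land-use file name from a tile path, in the spirit of the
    mapper's match "(h..v..)" and its ".landuse.byfreq" transform."""
    m = re.search(r"(h..v..)", tile_path)
    if m is None:
        raise ValueError(f"no tile id found in {tile_path!r}")
    return m.group(1) + ".landuse.byfreq"


def top_tiles(histograms, usetype, maxnum):
    """Return up to maxnum (tile, pixel_count) pairs, largest count first.

    histograms maps a tile id to a dict of {category: pixel_count};
    this layout is an assumption made for the sketch."""
    counts = ((tile, hist.get(usetype, 0)) for tile, hist in histograms.items())
    return heapq.nlargest(maxnum, counts, key=lambda pair: pair[1])


# Hypothetical per-tile histograms (category code -> number of pixels).
histograms = {
    "h08v05": {0: 100, 13: 5000},
    "h11v04": {5: 300, 13: 12000},
    "h25v06": {13: 800},
    "h00v09": {0: 9000},
}

best = top_tiles(histograms, URBAN, 2)
for tile, pixels in best:
    print(tile, pixels)
```

In the Swift script itself the equivalent call is analyzeLandUse(land, UsageTypeURBAN, 10), and the resulting tile list is read back with readData to drive the recoloring loop.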



