[Swift-commit] r3375 - text/parco10submission
noreply at svn.ci.uchicago.edu
Tue Jun 15 23:47:39 CDT 2010
Author: wilde
Date: 2010-06-15 23:47:39 -0500 (Tue, 15 Jun 2010)
New Revision: 3375
Modified:
text/parco10submission/paper.tex
Log:
adjusted authors. reduce verbatim font. edits to Examples section. Added MODIS example.
Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex 2010-06-16 02:39:54 UTC (rev 3374)
+++ text/parco10submission/paper.tex 2010-06-16 04:47:39 UTC (rev 3375)
@@ -4,7 +4,14 @@
\usepackage{graphicx}
\journal{Parallel Computing}
+\makeatletter
+\g@addto@macro\@verbatim\small
+\makeatother
+\makeatletter
+\g@addto@macro\@verbatim\small
+\makeatother
+
\begin{document}
% \bibliographystyle{unsrt} % initial temp bib style for editing
@@ -35,14 +42,18 @@
%% \affaddr{Argonne National Laboratory} \\
%% }
-\author{Ben Clifford}
\author{Ian Foster}
\author{Mihael Hategan}
\author{Justin M. Wozniak}
\author{Michael Wilde}
-\address{Computation Institute, University of Chicago}
+\address{Mathematics and Computer Science Division, Argonne National
+ Laboratory, Computation Institute, University of Chicago}
+\author{Ben Clifford}
+
+\address{Computation Institute, University of Chicago (at time of writing) }
+
\begin{abstract}
Scientists, engineers and business analysts often work by performing a
@@ -865,19 +876,22 @@
\section{Applications}
\label{Applications}
-TODO: two or three applications in brief. discuss both the application
-behaviour in relation to Swift, but underlying grid behaviour in
-relation to Swift
+% TODO: two or three applications in brief. discuss both the application
+% behaviour in relation to Swift, but underlying grid behaviour in
+% relation to Swift
-One app: CNARI + TeraGrid - small jobs (3s), many of them.
+% One app: CNARI + TeraGrid - small jobs (3s), many of them.
-Another app: Rosetta on OSG? OSG was designed with a focus on
-heterogeneity between sites. Large number of sites; automatic site file
-selection; and automatic app deployment there.
+% Another app: Rosetta on OSG? OSG was designed with a focus on
+% heterogeneity between sites. Large number of sites; automatic site file
+% selection; and automatic app deployment there.
+This section describes a few representative Swift applications
+from diverse disciplines.
+
\subsection{BLAST Application Example}
-The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
+% The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
\begin{verbatim}
type database;
@@ -885,41 +899,44 @@
type output;
type error;
-(output out, error err) blastall(query i, database db) {
- app {
- blastall "-p" "blastp" "-F" "F" "-d" @filename(db) "-i"
- @filename(i) "-v" "300" "-b" "300" "-m8" "-o" @filename(out)
-stderr=@filename(err);
- }
+app (output out, error err) blastall(query i, database db) {
+ blastall "-p" "blastp" "-F" "F"
+ "-d" @filename(db) "-i" @filename(i)
+ "-v" "300" "-b" "300" "-m8"
+ "-o" @filename(out) stderr=@filename(err);
}
-database pir <simple_mapper;prefix="/disks/ci-gpfs/swift/blast/pir/UNIPROT_for_blast_14.0.seq">;
+database pir <simple_mapper;prefix="/ci/pir/UNIPROT.14.0.seq">;
+
+query i <"test.in">;
output out <"test.out">;
-query i <"test.in">;
-error err <"test.err">;
+error err <"test.err">;
+
(out,err) = blastall(i, pir);
\end{verbatim}
-The trick here is that blastall reads takes the prefix name of the database files that it will read (.phr, .seq and .pin files).
-So i made a dummy file called ``{\tt UNIPROT\_for\_blast\_14.0.seq}'' to satisfy the data dependency . So here is the final list of my files:
+The application {\tt \small blastall} expects the common prefix of the database files that it will read (the .phr, .pin and .psq files).
+This example employs a dummy file called {\tt \small
+ UNIPROT.14.0.seq} to satisfy the data dependency. When executed,
+the Swift script processes the following input directory {\tt\small /ci/pir}:
\begin{verbatim}
--rw-r--r-- 1 aespinosa ci-users 0 Nov 15 13:49 UNIPROT_for_blast_14.0.seq
--rw-r--r-- 1 aespinosa ci-users 204106872 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.phr
--rw-r--r-- 1 aespinosa ci-users 23001752 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.pin
--rw-r--r-- 1 aespinosa ci-users 999999669 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.00.psq
--rw-r--r-- 1 aespinosa ci-users 233680738 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.phr
--rw-r--r-- 1 aespinosa ci-users 26330312 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.pin
--rw-r--r-- 1 aespinosa ci-users 999999864 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.01.psq
--rw-r--r-- 1 aespinosa ci-users 21034886 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.phr
--rw-r--r-- 1 aespinosa ci-users 2370216 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.pin
--rw-r--r-- 1 aespinosa ci-users 103755125 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.psq
--rw-r--r-- 1 aespinosa ci-users 208 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.pal
+-rw-r--r-- 1 ben ci 0 Nov 15 13:49 UNIPROT.14.0.seq
+-rw-r--r-- 1 ben ci 204106872 Oct 20 16:50 UNIPROT.14.0.seq.00.phr
+-rw-r--r-- 1 ben ci 23001752 Oct 20 16:50 UNIPROT.14.0.seq.00.pin
+-rw-r--r-- 1 ben ci 999999669 Oct 20 16:51 UNIPROT.14.0.seq.00.psq
+-rw-r--r-- 1 ben ci 233680738 Oct 20 16:51 UNIPROT.14.0.seq.01.phr
+-rw-r--r-- 1 ben ci 26330312 Oct 20 16:51 UNIPROT.14.0.seq.01.pin
+-rw-r--r-- 1 ben ci 999999864 Oct 20 16:52 UNIPROT.14.0.seq.01.psq
+-rw-r--r-- 1 ben ci 21034886 Oct 20 16:52 UNIPROT.14.0.seq.02.phr
+-rw-r--r-- 1 ben ci 2370216 Oct 20 16:52 UNIPROT.14.0.seq.02.pin
+-rw-r--r-- 1 ben ci 103755125 Oct 20 16:52 UNIPROT.14.0.seq.02.psq
+-rw-r--r-- 1 ben ci 208 Oct 20 16:52 UNIPROT.14.0.seq.pal
\end{verbatim}
-I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
+% I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
-Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
+% Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
\subsection{fMRI Application Example}
@@ -1112,6 +1129,80 @@
doall(p);
\end{verbatim}
+\subsection{Satellite Image Data Processing}
+
+The last example (from a class project) processes a large dataset of files that categorize the Earth's surface, collected by the MODIS sensor instruments that orbit Earth on two NASA satellites of the Earth Observing System.
+The Swift script analyzes the dataset to find the ten files with the
+largest total urban area and then produces a new dataset with viewable
+color images of these top-ten urban data ``tiles''.
+
+The dataset consists of 317 ``tile'' files that categorize every
+250-meter square of non-ocean surface of the Earth into one of 17
+``land cover'' categories, such as water, ice, forest, barren and
+urban. Each pixel of these TIFF-format data files has a value 0-16
+describing one 250-meter square of the Earth's surface at a specific
+point in time. Each tile file has about 5.8M pixels, covering a region
+of 2400 x 2400 250-meter squares, based on a specific map projection.
+
+The input files are not directly ``viewable'' images because their pixel
+values are category codes rather than colors, hence the color rendering step described above.
+
+\begin{verbatim}
+type file;
+type imagefile;
+type landuse;
+
+app (landuse output) getLandUse (imagefile input, int sortfield)
+{
+ getlanduse @input sortfield stdout=@output ;
+}
+
+app (file output, file tilelist) analyzeLandUse (landuse input[], int usetype, int maxnum)
+{
+ analyzelanduse @output @tilelist usetype maxnum @filenames(input);
+}
+
+app (imagefile output) colormodis (imagefile input)
+{
+ colormodis @input @output;
+}
+
+imagefile geos[]<filesys_mapper; location="/home/wilde/bigdata/data/modis", suffix=".tif">;
+landuse land[]<structured_regexp_mapper; source=geos,match="(h..v..)", transform="\\1.landuse.byfreq">;
+
+# Find the land use of each modis tile
+
+foreach g,i in geos {
+ land[i] = getLandUse(g,1);
+}
+
+# Find the top 10 most urban tiles (by area)
+
+int UsageTypeURBAN=13;
+file bigurban<"topurban.txt">;
+file urbantiles<"urbantiles.txt">;
+(bigurban, urbantiles) = analyzeLandUse(land, UsageTypeURBAN, 10);
+
+# Map the files to an array
+
+string urbanfilenames[] = readData(urbantiles);
+
+imagefile urbanfiles[] <array_mapper;files=urbanfilenames>;
+
+# Create a set of recolored images for just the urban tiles
+
+foreach uf, i in urbanfiles {
+ imagefile recoloredImage <single_file_mapper;
+ file=@strcat(@strcut(urbanfilenames[i],"(h..v..)"),
+ ".recolored.tif")>;
+ recoloredImage = colormodis(uf);
+}
+
+imagefile geos[]<filesys_mapper;
+ location="/ci/modis/2008", suffix=".tif">;
+
+\end{verbatim}
+
\section{Comparison to Other Systems}
\label{Related}
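
Reviewer note on the MODIS example in this revision: the data flow of the script (count land-use categories per tile, then keep the ten tiles with the most urban pixels) can be sketched in plain Python as below. This is only an illustrative sketch, not the implementation behind the getlanduse or analyzelanduse executables, whose internals are not shown in the commit. The function names, the in-memory tile representation, and the sample tile names are hypothetical; the urban category code 13 is taken from UsageTypeURBAN in the script.

```python
from collections import Counter
import heapq

URBAN = 13  # land-cover code, taken from UsageTypeURBAN in the Swift script


def land_use(pixels):
    """Count how many pixels fall into each land-cover category (0-16)."""
    return Counter(pixels)


def top_urban(tiles, n=10, category=URBAN):
    """Return the names of the n tiles with the most pixels in `category`.

    `tiles` maps a (hypothetical) tile name to a flat list of pixel values.
    """
    counts = {name: land_use(px)[category] for name, px in tiles.items()}
    return heapq.nlargest(n, counts, key=counts.get)


# Tiny made-up tiles; a real tile has 2400 x 2400 = 5,760,000 pixels.
tiles = {
    "h00v08": [0, 13, 13, 4],
    "h01v09": [13, 13, 13, 16],
    "h02v10": [0, 0, 4, 16],
}
print(top_urban(tiles, n=2))
```

In the Swift script the per-tile counting step runs once per input file inside the foreach loop, so those invocations are independent of one another, while the ranking step consumes the whole land[] array and therefore waits for all of them.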