[Swift-commit] r3377 - text/parco10submission

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Wed Jun 16 00:23:16 CDT 2010


Author: wilde
Date: 2010-06-16 00:23:16 -0500 (Wed, 16 Jun 2010)
New Revision: 3377

Modified:
   text/parco10submission/paper.tex
Log:
added sem scripts; commented out blast for now (there was no parallelism); adjusted dock example.

Modified: text/parco10submission/paper.tex
===================================================================
--- text/parco10submission/paper.tex	2010-06-16 05:09:02 UTC (rev 3376)
+++ text/parco10submission/paper.tex	2010-06-16 05:23:16 UTC (rev 3377)
@@ -889,54 +889,54 @@
 We describe in this section a few representative Swift applications
 from various diverse disciplines.
 
-\subsection{BLAST Application Example}
+% \subsection{BLAST Application Example}
 
-% The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
+% % The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder.
 
-\begin{verbatim}
-type database;
-type query;
-type output;
-type error;
+% \begin{verbatim}
+% type database;
+% type query;
+% type output;
+% type error;
 
-app (output out, error err) blastall(query i, database db) {
-  blastall "-p" "blastp" "-F" "F"
-           "-d" @filename(db) "-i" @filename(i)
-           "-v" "300" "-b" "300" "-m8"
-           "-o" @filename(out) stderr=@filename(err);
-}
+% app (output out, error err) blastall(query i, database db) {
+%   blastall "-p" "blastp" "-F" "F"
+%            "-d" @filename(db) "-i" @filename(i)
+%            "-v" "300" "-b" "300" "-m8"
+%            "-o" @filename(out) stderr=@filename(err);
+% }
 
-database pir <simple_mapper;prefix="/ci/pir/UNIPROT.14.0.seq">;
+% database pir <simple_mapper;prefix="/ci/pir/UNIPROT.14.0.seq">;
 
-query  i   <"test.in">; 
-output out <"test.out">;
-error  err <"test.err">;
+% query  i   <"test.in">; 
+% output out <"test.out">;
+% error  err <"test.err">;
 
-(out,err) = blastall(i, pir);
-\end{verbatim}
+% (out,err) = blastall(i, pir);
+% \end{verbatim}
 
-The application {\tt \small blastall} expects the prefix of the database files that it will read (.phr, .seq and .pin files).
-This example employs a dummy file called {\tt \small
-  UNIPROT.14.0.seq} to satisfy the data dependency. When executed,
-the Swift script processes the following input directory {\tt\small /ci/pir}:
+% The application {\tt \small blastall} expects the prefix of the database files that it will read (.phr, .seq and .pin files).
+% This example employs a dummy file called {\tt \small
+%   UNIPROT.14.0.seq} to satisfy the data dependency. When executed,
+% the Swift script processes the following input directory {\tt\small /ci/pir}:
 
-\begin{verbatim}
--rw-r--r--  1 ben ci         0 Nov 15 13:49 UNIPROT.14.0.seq
--rw-r--r--  1 ben ci 204106872 Oct 20 16:50 UNIPROT.14.0.seq.00.phr
--rw-r--r--  1 ben ci  23001752 Oct 20 16:50 UNIPROT.14.0.seq.00.pin
--rw-r--r--  1 ben ci 999999669 Oct 20 16:51 UNIPROT.14.0.seq.00.psq
--rw-r--r--  1 ben ci 233680738 Oct 20 16:51 UNIPROT.14.0.seq.01.phr
--rw-r--r--  1 ben ci  26330312 Oct 20 16:51 UNIPROT.14.0.seq.01.pin
--rw-r--r--  1 ben ci 999999864 Oct 20 16:52 UNIPROT.14.0.seq.01.psq
--rw-r--r--  1 ben ci  21034886 Oct 20 16:52 UNIPROT.14.0.seq.02.phr
--rw-r--r--  1 ben ci   2370216 Oct 20 16:52 UNIPROT.14.0.seq.02.pin
--rw-r--r--  1 ben ci 103755125 Oct 20 16:52 UNIPROT.14.0.seq.02.psq
--rw-r--r--  1 ben ci       208 Oct 20 16:52 UNIPROT.14.0.seq.pal
-\end{verbatim}
+% \begin{verbatim}
+% -rw-r--r--  1 ben ci         0 Nov 15 13:49 UNIPROT.14.0.seq
+% -rw-r--r--  1 ben ci 204106872 Oct 20 16:50 UNIPROT.14.0.seq.00.phr
+% -rw-r--r--  1 ben ci  23001752 Oct 20 16:50 UNIPROT.14.0.seq.00.pin
+% -rw-r--r--  1 ben ci 999999669 Oct 20 16:51 UNIPROT.14.0.seq.00.psq
+% -rw-r--r--  1 ben ci 233680738 Oct 20 16:51 UNIPROT.14.0.seq.01.phr
+% -rw-r--r--  1 ben ci  26330312 Oct 20 16:51 UNIPROT.14.0.seq.01.pin
+% -rw-r--r--  1 ben ci 999999864 Oct 20 16:52 UNIPROT.14.0.seq.01.psq
+% -rw-r--r--  1 ben ci  21034886 Oct 20 16:52 UNIPROT.14.0.seq.02.phr
+% -rw-r--r--  1 ben ci   2370216 Oct 20 16:52 UNIPROT.14.0.seq.02.pin
+% -rw-r--r--  1 ben ci 103755125 Oct 20 16:52 UNIPROT.14.0.seq.02.psq
+% -rw-r--r--  1 ben ci       208 Oct 20 16:52 UNIPROT.14.0.seq.pal
+% \end{verbatim}
 
-% I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
+% % I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift.
 
-% Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
+% % Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes.
 
 \subsection{fMRI Application Example}
 
@@ -1040,6 +1040,8 @@
 
 \subsection{Structural Equation Modeling using OpenMx}
 
+% \cite{OpenMx}
+
 OpenMx is an R library designed for structural equation modeling (SEM),
 a technique currently used in the neuroimaging field to examine
 connectivity between brain areas.
@@ -1084,49 +1086,99 @@
 connection weights (or strength of the relationships between anatomical
 regions) can be explored based on the fit of each model.
 
-modgenproc.swift is used to submit each of the necessary computation
+%modgenproc.swift
+A Swift script is used to submit each of the necessary computation
 components to TeraGrid's Ranger cluster: a) the model object b) the
 covariance matrix derived from the database and c) the R script which
-makes the call to OpenMx. Once the job is assigned to a node, OpenMx?
+makes the call to OpenMx. Once the job is assigned to a node, OpenMx
 estimates weight parameters for each connection within the given model
 that results in a model covariance closest to the observed covariance of
 the data. Each of these compute jobs returns its solution model object
 as well as a file containing the minimum value achieved from that model.
 The processing of these models on Ranger was achieved in <45 minutes.
 
+A model generator was developed for the OpenMx package and is designed
+explicitly to enable parallel execution of exhaustive or partially
+pruned sets of model objects. Given an $n \times n$ covariance matrix it can
+generate the entire set of possible models with anywhere from 0 to $n^2$
+connections; however, it can also take as input a single index from
+that set and generate and run only that one model. What this means
+in the context of workflow design is that the generator can be
+controlled (and parallelized) easily by a Swift script. For example,
+using Swift as the interface to OpenMx we have these few lines of
+code:
+
+Script 1: 4-region exhaustive SEM for a single experimental condition:
+
+\begin{verbatim}
+
+1.	app (mxModel min) mxModelProcessor(file covMatrix, Rscript mxModProc, int modnum, float initweight, string cond)
+2.	{
+3.	    RInvoke @filename(mxModProc) @filename(covMatrix) modnum initweight cond;
+4.	}
+5.	file covMatrix<single_file_mapper;file="speech.cov">;
+6.	Rscript mxScript<single_file_mapper;file="singlemodels.R">;
+7.	int totalperms[] = [1:65536];
+8.	float initweight = .5; 
+9.	foreach perm in totalperms{ 
+10.	   mxModel modmin<single_file_mapper; file=@strcat(perm,".rdata")>; 
+11.	   modmin = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech");
+12.	} 
+\end{verbatim}
+
+First, a covariance matrix containing activation data for 4 brain regions, over 8 time points, averaged over a group of subjects in the speech condition, was drawn from the experiment database, and its location (in this example, on the local file system, though the file could be located anywhere) is mapped in line 5. Line 6 maps the R processing script, and lines 1 through 4 define the atomic procedure for invoking R. Each iteration of the foreach loop maps its optimized model output file and calls mxModelProcessor() with the necessary parameters to generate and run a model. Each of these invocations of mxModelProcessor() is independent and is submitted for processing in parallel. Swift passes 5 variables for each invocation: (1) the covariance matrix; (2) the R script containing the call to OpenMx; (3) the permutation number, i.e., the index of the model; (4) the initialization weight for the free parameters of the given model; and (5) the experimental condition. Because Swift passes only one weight variable, all free parameters of a given model have the same initialization weight in this workflow. When the job reaches a worker node on Ranger, an R process is initialized and the generator creates the desired model by calculating where in the array that permutation of the model matrix falls. OpenMx then estimates the model parameters using a non-linear optimization algorithm called NPSOL (Gill, 1986), and the optimized model is returned and written out by Swift to the location specified in its mapping on line 10.
+
+The above script completed in approximately 40 minutes. The script can
+then be altered to run over multiple experimental conditions by adding
+an outer loop:
+
+Script 2: 4-region exhaustive SEM for 2 experimental conditions
+
+\begin{verbatim}
+1.	string conditions[] = ["emblem", "speech"];
+2.	int totalperms[] = [1:65536];
+3.	float initweight = .5;
+4.	foreach cond in conditions{
+5.	  foreach perm in totalperms{
+6.	   file covMatrix<single_file_mapper;file=@strcat(cond,".cov")>;
+7.	   mxModel modmin<single_file_mapper;file=@strcat(cond,perm,".rdata")>;
+8.	   modmin= mxModelProcessor(covMatrix, mxScript, perm, initweight, cond);
+9.	  } }
+\end{verbatim}
+
+When the outer loop is added, the new workflow consists of 131,072 jobs since we are now running the entire set for two conditions. This workflow completed in approximately 2 hours.
+
 \subsection{Molecular Dynamics with DOCK}
 
 \begin{verbatim}
-(file t,DockOut tarout) dockcompute (DockIn infile, string targetlist) {
-  app {
-    rundock @infile targetlist stdout=@filename(t) @tarout;
-    }
+
+app (file t,DockOut tarout) dock (DockIn infile, string targetlist) {
+   dock6 @infile targetlist stdout=@filename(t) @tarout;
 }
 
 type params {
-      string  ligandsfile;
-      string  targetlist;
+      string  ligands;
+      string  targets;
 }
 
-#params pset[] <csv_mapper;file="paramslist.txt">;
-doall(params pset[])
+params pset[] <csv_mapper;file="params.txt">;
+
+runDocks(params pset[])
 {
   foreach params,i in pset {
-  DockIn infile < single_file_mapper; file=@strcat("/home/houzx/dock-
-run/databases/KEGG_and_Drugs/",pset[i].ligandsfile)>;
-  file sout <single_file_mapper; file=@strcat("/home/houzx/dock-
-run/databases/results/stdout/",pset[i].targetlist,"-",i,"-stdout.txt")>;
-  DockOut tout <single_file_mapper; file=@strcat(pset[i].ligandsfile,"-result.tar.gz")>;
-#  DockOut tout <"result.tar.gz">;
-#  sout =  dockcompute(infile,pset[i].targetlist);
-  (sout,tout) =  dockcompute(infile,pset[i].targetlist);
-
+    DockIn infile < single_file_mapper;
+           file=@strcat("/ci/dock/db/KEGGDrugs/",pset[i].ligands)>;
+    file sout <single_file_mapper;
+           file=@strcat("/ci/dock/results/",pset[i].targets,".",i)>;
+    DockOut docking <single_file_mapper;
+           file=@strcat(pset[i].ligands,".result.tar.gz")>;
+    (sout,docking) = dock(infile,pset[i].targets);
  }
 }
 
 params p[];
 p = readdata("paramslist.txt");
-doall(p);
+runDocks(p);
 \end{verbatim}
 
 \subsection{Satellite image data processing.}
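
The job counts quoted in the SEM scripts above (65536 models per condition, 131,072 jobs for two conditions) can be checked with a short Python sketch. It assumes that each of the n x n possible connections is simply present or absent in a model, which matches 2^16 = 65536 for 4 regions. The functions `model_count` and `decode_model` are hypothetical names, and the bit-based index encoding is only one plausible way to turn a permutation index into a connection matrix; it is not taken from the actual OpenMx model generator.

```python
from typing import List


def model_count(n_regions: int) -> int:
    """Number of candidate models, assuming each of the n*n
    connections is independently present (1) or absent (0)."""
    return 2 ** (n_regions * n_regions)


def decode_model(index: int, n_regions: int) -> List[List[int]]:
    """Map a 1-based model index to an n x n 0/1 connection matrix.

    Hypothetical encoding: bit (row * n + col) of (index - 1)
    says whether the connection at (row, col) is present.
    """
    total = model_count(n_regions)
    if not 1 <= index <= total:
        raise ValueError(f"index must be in 1..{total}, got {index}")
    bits = index - 1
    return [
        [(bits >> (row * n_regions + col)) & 1 for col in range(n_regions)]
        for row in range(n_regions)
    ]


if __name__ == "__main__":
    n = 4
    per_condition = model_count(n)
    print("models per condition:", per_condition)
    print("jobs for two conditions:", 2 * per_condition)
    print("model 1 (no connections):", decode_model(1, n))
```

Because every index decodes independently of the others, each index can be handed to a separate job, which is the property the foreach loops in the scripts rely on.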



