[Swift-commit] r3935 - text/parco10submission

Mon Jan 10 12:44:30 CST 2011

Author: dsk
Date: 2011-01-10 12:44:30 -0600 (Mon, 10 Jan 2011)
New Revision: 3935

Modified:
   text/parco10submission/paper.tex
Log:
adding some comments and making some changes in Perf.


Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-10 18:24:39 UTC (rev 3934)
+++ text/parco10submission/paper.tex	2011-01-10 18:44:30 UTC (rev 3935)
@@ -1343,32 +1343,35 @@
 \label{Performance}
 
 We present here a few additional measurements to supplement
-those previously published.
+those previously published. \katznote{need to cite something here, maybe \cite{Swift_2007}?}
 
-First, we measure the ability of Swift to support many user tasks on a
-single system image.  In Test A, we used Swift to submit up to 2,000
+First, we measured the ability of Swift to support many user tasks on a
+single system image.  We used Swift to submit up to 2,000
 tasks to Thwomp, a 16-core x86-based Linux compute server at Argonne
 National Laboratory.  Each job in the batch was an identical, simple
 single-processor job that executed for the given duration and
 performed application input and output at 1 byte each.  The total
 execution time was measured.  This was compared to the total amount of
-core-time consumed to report a utilization ratio, which is plotted.
+core-time consumed to report a utilization ratio, which is plotted in Figure~\ref{fig:swift-performance}, case A. 
+\katznote{what knowledge should I gain from the figure? is the data good or bad?  why?}
 
 Second, we measured the ability of Swift to support many tasks on a
 large, distributed memory system without considering the effect on the
-underlying file services.  In Test B, we used Swift/Coasters to submit
+underlying file services.  We used Swift/Coasters to submit
 up to 20,480 tasks to Intrepid, the 40,000-node IBM BlueGene/P system
-at Argonne National Laboratory.  Each job in the batch was an
+at Argonne.  Each job in the batch was an
 identical, simple single-processor job that executed for the given
 duration and performed no I/O.  Each node was limited to one
 concurrent job; thus, the user task had 4 cores at its disposal.  The
 total execution time was measured.  This was compared to the total
 amount of node-time consumed to report a utilization ratio, which is
-plotted.
+plotted in Figure~\ref{fig:swift-performance}, case B.
+\katznote{what knowledge should I gain from the figure? is the data good or bad?  why?}
 
+
 Third, we measured the ability of Swift to support many tasks on a
 large, distributed memory system including application use of the
-underlying GPFS filesystem.  In Test C, we used Swift/Coasters to
+underlying GPFS filesystem.  We used Swift/Coasters to
 submit up to 10,240 tasks to Intrepid.  Each job in the batch was an
 identical, simple single-processor job that executed for 30 seconds
 and performed the given amount of input and output.  Coasters provider
@@ -1378,7 +1381,8 @@
 the user task had 4 cores at its disposal.  The total execution time
 was measured.  This was compared to the total amount of time consumed
 by an equivalent shell script-based application to report an
-efficiency ratio, which is plotted.
+efficiency ratio, which is plotted in Figure~\ref{fig:swift-performance}, case C.
+\katznote{what knowledge should I gain from the figure? is the data good or bad?  why?}
 
 The Test C shell script was provided with all job specifications in
 advance and did not require communication between the worker
@@ -1411,7 +1415,7 @@
       & \\
     \end{tabular}
   }
-    \caption{Swift performance figures.}
+    \caption{Swift performance figures.\label{fig:swift-performance}}
   \end{center}
 \end{figure}
 
@@ -1419,9 +1423,10 @@
 \subsection{Prior performance measures}
 \mikenote{Remove above caption}
 
-Published measurements of Swift performance provide evidence that its parallel distributed programming model can be implemented with sufficient scalability and efficiency to make it a practical tool for large-scale parallel application scripting.
+Published measurements of Swift performance
+provide evidence that its parallel distributed programming model can be implemented with sufficient scalability and efficiency to make it a practical tool for large-scale parallel application scripting.
 
-The performance of Swift, submitting jobs over the wide area network from UChicago to the TeraGrid Ranger cluster at TACC, as published in \cite{CNARI_2009}are shown in figure \ref{SEMplots}, which shows an SEM workload of 131,072 jobs for 4 brain regions and two experimental conditions. This workflow completed in approximately 3 hours.  The logs from the {\tt swift\_plot\_log} utility show the high degree of concurrent overlap between job execution and input and output file staging to remote computing resources. 
+The performance of Swift submitting jobs over the wide area network from UChicago to the TeraGrid Ranger cluster at TACC is shown in Figure~\ref{SEMplots} (from \cite{CNARI_2009}), which plots an SEM workload of 131,072 jobs for 4 brain regions and two experimental conditions. This workflow completed in approximately 3 hours.  The logs from the {\tt swift\_plot\_log} utility show the high degree of concurrent overlap between job execution and input and output file staging to remote computing resources.
 The workflows were developed on and submitted (to Ranger) from a single-core Linux workstation at UChicago running an Intel® Xeon™ 3.20 GHz CPU. Data staging was performed using the Globus GridFTP protocol and job execution was performed over the Globus GRAM 2 protocol.
 During the third hour of the workflow, Swift achieved very high utilization of the 2,048 allocated processor cores and a steady rate of input and output transfers. The first two hours of the run were more bursty, due to fluctuating grid conditions and data server loads.
 
@@ -1438,11 +1443,11 @@
   \end{center}
 \end{figure}
 
-Prior work also showed Swift's ability to achieve ample task rates for local and remote submission to high performance clusters\cite{PetascaleScripting_2009}. These prior results are shown in figure \ref{TaskPlots}.
+Prior work also showed Swift's ability to achieve ample task rates for local and remote submission to high performance clusters. These prior results are shown in Figure~\ref{TaskPlots} (from~\cite{PetascaleScripting_2009}).
 
-The left figure shows the PTMap application running  the stage 1 processing of the E.coli K12 genome (4,127 sequences) on 2,048 Intrepid cores. The lower plot shows processor utilization as time progresses; Overall, the average per task execution time was 64 seconds, with a standard deviation of 14 seconds. These 4,127 tasks consumed a total of 73 CPU hours, in a span of 161 seconds on 2,048 processor cores, achieving 80 percent utilization.
+Figure~\ref{TaskPlots} (left) shows the PTMap application running the stage 1 processing of the E. coli K12 genome (4,127 sequences) on 2,048 Intrepid cores. The lower plot shows processor utilization as time progresses. Overall, the average per-task execution time was 64 seconds, with a standard deviation of 14 seconds. These 4,127 tasks consumed a total of 73 CPU hours, in a span of 161 seconds on 2,048 processor cores, achieving 80 percent utilization.
 
-The right figure below shows performance of Swift running structural equation modeling problem at large scale using on the Ranger Constellation to model neural pathway connectivity from experimental fMRI data\cite{CNARI_2009}. The left figure shows the active jobs for a larger version of the problem type shown in figure \ref{SEMplots}.  This shows an  SEM script executing ~ 418,000 jobs. The red line represents job execution on Ranger; 
+Figure~\ref{TaskPlots} (right) shows the performance of Swift running a structural equation modeling (SEM) problem at large scale on the Ranger Constellation to model neural pathway connectivity from experimental fMRI data~\cite{CNARI_2009}. The left \katznote{lower?} figure shows the active jobs for a larger version of the problem type shown in Figure~\ref{SEMplots}.  This shows an SEM script executing approximately 418,000 jobs. The red line represents job execution on Ranger; 
 
 \begin{figure}
   \begin{center}