[Swift-commit] r3364 - text/parco10submission

Tue Jun 15 15:38:21 CDT 2010

Author: wozniak
Date: 2010-06-15 15:38:21 -0500 (Tue, 15 Jun 2010)
New Revision: 3364

Modified:
   text/parco10submission/paper.tex
Log:
Drop "Usage Experience" section, move key points to "Execution" section


Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2010-06-15 20:30:32 UTC (rev 3363)
+++ text/parco10submission/paper.tex	2010-06-15 20:38:21 UTC (rev 3364)
@@ -849,8 +849,44 @@
 forwardref collective IO section if that gets written, or include that
 entire section here?
 
-\section{Example applications}
+\subsection{Features to support use on dynamic resources}
 
+Using Swift to submit to a large number of sites poses a number of
+practical challenges that are not encountered when running on a small
+number of sites. These challenges are seen when comparing execution on
+the TeraGrid\cite{TERAGRID} with execution on the Open Science
+Grid\cite{OSG}. The set of sites which may be used is large and
+changing. It is impractical to maintain a site catalog by hand in this
+situation. In collaboration with the OSG Engagement group, Swift was
+interfaced to ReSS\cite{ReSS} so that the site catalog is generated
+from that information system. This provides a very straightforward way
+to generate a large catalog of ``fairly likely to work'' sites.
+
+Having discovered those sites, two significant problems remain: the
+quality of those sites varies wildly; and user applications are not
+installed on those sites. Individual OSG sites exhibit extremely
+different behaviour, both with respect to other sites at the same
+time, and with respect to themselves at other times. This is hard to
+describe statically. Firstly, the load that a particular site will
+bear varies over time. Secondly, some sites fail in unusual fashion.
+Swift's site scoring mechanism deals well with this in the majority of
+cases. However, continued discovery of unusual failure modes drives
+the implementation of ever more fault tolerance mechanisms.
+
+When running jobs on dynamically discovered sites, it is likely that
+component programs are not installed on those sites. OSG Engagement
+has developed best practices to deal with this, which are implemented
+straightforwardly in Swift. Applications may be compiled statically
+and deployed as a small number of self-contained files as part of the
+input for a component program execution; in this case, the application
+files are described as mapped input files in the same way as input
+data files, and are passed as a parameter to the application
+executable. Swift's existing input file management then handles
+once-per-site-per-run staging in of the application files, without
+change.
+
+\section{Applications}
+
 TODO: two or three applications in brief. discuss both the application
 behaviour in relation to Swift, but underlying grid behaviour in
 relation to Swift
@@ -971,20 +1007,20 @@
 a series of procedure calls, using variables to establish
 data dependencies.
 
-In the example, reslice\_wf defines a four-step
-pipeline computation, using variables to establish
-data dependencies. It applies reorientRun to a run first in the x axis
-and then in the y axis, and then aligns each image in the resulting
-run with the first image. The program alignlinear determines how to
-spatially adjust an image to match a reference image, and produces an
-air parameter file. The actual alignment is done by the program
-reslice. Note that variable yR, being the output of the first step and
-the input of the second step, defines the data dependencies between
-the two steps. The pipeline is illustrated in the center of figure \ref{FMRIFigure2}, while in figure \ref{FMRIgraph} we show the expanded graph for a 20-volume
-run. Each volume comprises an image file and a header file, so there
-are a total of 40 input files and 40 output files. We can also apply
-the same procedure to a run containing hundreds or thousands of
-volumes.
+In the example, reslice\_wf defines a four-step pipeline computation,
+using variables to establish data dependencies. It applies reorientRun
+to a run first in the x axis and then in the y axis, and then aligns
+each image in the resulting run with the first image. The program
+alignlinear determines how to spatially adjust an image to match a
+reference image, and produces an air parameter file. The actual
+alignment is done by the program reslice. Note that variable yR, being
+the output of the first step and the input of the second step, defines
+the data dependencies between the two steps. The pipeline is
+illustrated in the center of Figure~\ref{FMRIFigure2}, while in
+Figure~\ref{FMRIgraph} we show the expanded graph for a 20-volume run. Each
+volume comprises an image file and a header file, so there are a total
+of 40 input files and 40 output files. We can also apply the same
+procedure to a run containing hundreds or thousands of volumes.
 
 In this example we show the details of the procedure reorientRun,
 which is also a compound procedure.
@@ -1096,73 +1132,6 @@
 doall(p);
 \end{verbatim}
 
-\section{Usage Experience}
-
-\subsection{Use on large numbers of sites in the Open Science Grid}
-
-TODO: get Mats to comment on this section...?
-
-Using Swift to submit to a large number of sites poses a number of
-practical challenges that are not encountered when running on a small
-number of sites. These challenges are seen when comparing execution on
-the TeraGrid\cite{TERAGRID} with execution on the Open Science
-Grid\cite{OSG}.
-
-The set of sites which may be used is large and changing. It is
-impractical to maintain a site catalog by hand in this situation.
-In collaboration with the OSG Engagement group, Swift was interfaced to
-ReSS\cite{ReSS} so that the site catalog is generated from that information
-system. This provides a very straightforward way to generate a large catalog
-of 'fairly likely to work' sites.
-
-Having discovered those sites, two significant problems remain: the
-quality of those sites varies wildly; and user applications are not
-installed on those sites.
-
-Individual OSG sites exhibit extremely different behaviour, both with
-respect to other sites at the same time, and with respect to themselves
-at other times. This is hard to describe statically. Firstly, the
-load that a particular site will bear varies over time. Secondly, some
-sites fail in unusual fashion.
-
-Swift's site scoring mechanism deals well with this in the majority of
-cases. However, continued discovery of unusual failure modes drives
-the implementation of ever more fault tolerance mechanisms.
-
-\subsection{Automating Application Deployment}
-
-When running jobs on dynamically discovered sites, it is likely that
-component programs are not installed on those sites.
-
-OSG Engagement has developed best practices to deal with this, which
-are implemented straightforwardly in Swift. Applications may be compiled
-statically and deployed as a small number of self contained files as part
-of the input for a component program execution; in this case, the
-application files are described as mapped input files in the same way
-as input data files, and are passed as a parameter to the application
-executable. Swift's existing input file management then handles
-once-per-site-per run staging in of the application files, without change.
-
-\begin{verbatim}
-  // NEED TO MAKE THIS A BETTER CODE SAMPLE
-  // ... I JUST MADE IT UP FROM MEMORY OF
-  // BEING AT RENCI
-
-  app (file o) myapp_inner (file i, file exe) {
-    sh "appexecutable" @i @o;
-  }
-
-  (file o) myapp (file i) {
-     file appexe <"appexecutable">;
-     o = myapp_inner(i);
-  }
-\end{verbatim}
-
-TODO: Zhengxiong Hou has also done stuff about application
-stagein - this could be mentioned (see Zhengxiong's email and paper)
-
-TODO: what's the conclusion (if any) of this section?
-
 \section{Future work}
 
 \subsection{Automatic characterisation of site and application behaviour}