[Swift-commit] r3905 - text/parco10submission

Sat Jan 8 14:48:46 CST 2011

Author: wilde
Date: 2011-01-08 14:48:45 -0600 (Sat, 08 Jan 2011)
New Revision: 3905

Modified:
   text/parco10submission/paper.tex
Log:
Completed sec 4.2 to describe Glass sim application.

Modified: text/parco10submission/paper.tex
===================================================================

--- text/parco10submission/paper.tex	2011-01-08 15:14:53 UTC (rev 3904)
+++ text/parco10submission/paper.tex	2011-01-08 20:48:45 UTC (rev 3905)
@@ -1234,8 +1234,6 @@
 \\
 \\
 The script is structured as follows:
-\\
-\\
 Lines 1-3 define 3 mapped file types -- {\tt  MODISfile} for the input images, {\tt landuse} for the output of the landuse histogram calculation, and {\tt file} for any other generic file that we don't care to assign a unique type to.
 
 Lines 7-32 define the Swift interface functions for the application programs {\tt getLandUse}, {\tt analyzeLandUse}, {\tt colorMODIS}, {\tt assemble}, and {\tt markMap}.
@@ -1266,7 +1264,7 @@
 For additional visualization, the script assembles a full map of all the input tiles, placed in their proper grid location on the MODIS world map projection, and again marking the selected tiles. Since this operation needs true-color images of every input tile, these are computed---again in \katznote{potentially? as before} parallel---with 317 jobs invoked by the foreach statement at lines 76-78. The power of Swift's implicit parallelization is very vividly shown here: since the {\tt colorMODIS} call at line 77 depends only on the input array {\tt geos}, these 317 application invocations are executed in parallel with the initial 317 parallel executions of the {\tt getLandUse} application at line 56.  The script concludes at line 83 by assembling a montage of all the colored tiles and writing this image file to a web-accessible directory for viewing.
 
 \pagebreak
-Swift example 1: MODIS satellite image processing script
+{\bf \small Swift example 1: MODIS satellite image processing script}
 \begin{Verbatim}[fontsize=\scriptsize,frame=single,framesep=2mm,gobble=7, numbers=left]
      1	type file;
      2	type MODIS; type image;
@@ -1355,7 +1353,7 @@
 \end{Verbatim}
 %\end{verbatim}
 
-\subsection{Simulation of glassy dynamics and thermodynamics.}
+\subsection{Simulation of glass cavity dynamics and thermodynamics.}
 
 A recent study of the glass transition in model systems has focused on calculating from theory or simulation what is known as the ``Mosaic length''.
 
@@ -1363,20 +1361,33 @@
 
 Hocky's application code performs 100,000 Monte-Carlo steps in about 1-2 hours. Ten jobs are used to generate the 1M simulation steps needed for each configuration. The input data to each simulation is a file of about 150 KB representing initial glass structures. Each simulation returns three new structures of 150 KB each, a 50 KB log file, and a 4 KB file describing which particles are in the cavity.
 
-Each simulation covers a space of 7 radii by 27 centers by 10 models, requiring 1690 jobs per run. Three methods are simulated (``kalj'', ``kawka'', and  ``pedersenipl'') for total of 90 runs. Swift mappers enable metadata describing these aspects to be encoded in the data files of the campaigns to assist in managing the large volume of file data.
+Each script run covers a simulation space of 7 radii by 27 centers by 10 models, requiring 1,890 jobs per run. Three methods are simulated (``kalj'', ``kawka'', and ``pedersenipl'') for a total of 90 runs. Swift mappers enable metadata describing these aspects to be encoded in the data files of the campaigns to assist in managing the large volume of file data.
 
 As the simulation campaigns are quite lengthy (the first ran from October through December 2010), Hocky chose to leverage Swift ``external'' mappers to determine which simulations need to be performed at any point in the campaign. His input mappers assume an application run is complete if all the returned ``.final'' files exist.  When a script is restarted, results that already exist are not recomputed.
 
-Roughly 152,000 jobs defined by all the run*.sh scripts. Some runs were done on other resources including UChicago PADS cluster and TeraGrid resources. The only change necessary to run on OSG was configuring the OSG sites to run the science application.
+Roughly 152,000 jobs are executed in a simulation campaign, defined by a set of parameter files specifying molecular radii and centroids, and a set of ``run'' scripts that execute the {\tt swift} command with appropriately varying science parameters. Most runs were performed using the ``User Engagement'' virtual organization of the Open Science Grid (OSG) \cite{OSG, OSGEngage}. Some runs were done on other resources, including the University of Chicago ``PADS'' cluster and TeraGrid resources. The only change necessary to run on OSG was configuring the OSG sites to run the science application.
 
-Approximate OSG usage over 100K cpus hours with about 100K tasks of 1-2 hours completed. App has been successfully run on about 18 OG (with the majority of runs have been completed on about 6 primary  sites).
+The approximate OSG usage was over 100K CPU hours, with about 100K tasks of 1-2 hours completed. The simulation campaign has been successfully run on about 18 OSG sites, with the majority of runs completed on about 6 primary sites that tend to provide the most compute-hour opportunities for members of the Engagement VO.
 
-Investigations of more advanced techniques are underway, and the fact that the entire campaign can be driven by location-independent Swift scripts will enable Hocky to reliably re-execute the entire campaign with relative ease.
-This project would be completely unwieldy and much harder to organize without using Swift.
+Example 2 shows a slightly reformatted version of the glass simulation script that was in use in December 2010. Its key aspects are as follows.
+Lines 1-3 define the mapped file types; these files are used to compose input and output structures at lines 5-15. (At the moment, the input structure is a degenerate single-file structure, but the user has experimented with various multi-file input structures in prior versions of this script.) The output structure reflects the fact that the simulation is restartable in 1-2 hour increments, and works together with the Swift script to create a simple but powerful mechanism for managing checkpoint/restart across a long-running large-scale simulation campaign.
 
-\pagebreak
-Swift example 2: Monte-Carlo simulation of quantum glass structures
+The single application called by this script is the {\tt glassRun} program wrapped in the app function at lines 17-27. Note that rather than defining main program logic in ``open'' (top-level) code, the script places all the program logic in the function {\tt CreateGlassSystem}, with a single statement in open code at line 82 to invoke it.  This enables the simulation script to be defined in a library which can be imported into other Swift scripts to perform entire campaigns or campaign subsets.
 
+The {\tt CreateGlassSystem} function starts by extracting a large set of science parameters from the Swift command line at lines 31-44 and 52 using the {\tt @arg()} function. It uses the built-in function {\tt readData} at lines 40-41 to read prepared lists of molecular radii and centroids from parameter files to define the primary physical dimensions of the simulation space.
+A selectable energy function to be used by the simulation application is specified as a parameter at line 52.
+
+At lines 54 and 58, the script leverages Swift's flexible dynamic arrays to create a 3D array for input and a 4D array of structures for outputs. These data structures, whose leaves consist entirely of mapped files, are set using the external mappers specified for the input array at lines 54-57 and for the output array of structures at lines 58-61.  Note that many of the science parameters are passed to the mappers: the input mapper uses them to locate files within the large multi-level directory structure of the campaign, and the output mapper uses them to create new directory and file naming conventions for the campaign outputs. The mappers follow the common and useful practice of using scientific metadata to determine directory and file names.
+
+The entire body of the {\tt CreateGlassSystem} function is a four-level nesting of foreach statements at lines 63-79. These perform a parameter sweep over all combinations of radius, centroid, model, and job number within the simulation space. A single run of the script immediately expands to an independent parallel invocation of the simulation application for each point in the space: 1,890 jobs for the minimum case of a 7 x 27 x 10 x 1 space. Note that the if statement at line 69 causes the simulation execution to be skipped if it has already been performed, as determined by a ``NULL'' file name returned by the mapper for the output of a given job in the simulation space.
+
+The advantages of managing a simulation campaign in this manner are well borne out by Hocky's experience: the expression of the campaign is a well-structured high-level script, devoid of details about file naming, synchronization of parallel tasks, location and state of remote computing resources, or explicit data transfer. Hocky was able to leverage local cluster resources on many occasions, but at any time could count on his script acquiring on the order of 1,000 compute cores from 6 to 18 sites of the Open Science Grid. When executing on the OSG, he leveraged the Swift capability to replicate jobs that are waiting in queues at more congested sites, and automatically send them to sites where jobs were moving through the system. Managing all of these concerns by hand would be a huge distraction from his primary scientific simulation campaign were he to use lower-level abstractions where parallelism and remote distribution were the visible responsibility of the programmer.
+
+Investigations of more advanced glass simulation techniques are underway, and the fact that the entire campaign can be driven by location-independent Swift scripts will enable Hocky to reliably re-execute the entire campaign with relative ease.
+He believes that Swift has made the project much easier to organize and execute. The project would be completely unwieldy without Swift, and the distraction and scripting/programming effort of leveraging multiple computing resources would be prohibitive.
+\\
+\\
+{\bf \small Swift example 2: Monte-Carlo simulation of glass cavity dynamics.}
 %\begin{verbatim}
 \begin{Verbatim}[fontsize=\scriptsize,frame=single,framesep=2mm,gobble=7, numbers=left]
      1	type Arc;