From noreply at svn.ci.uchicago.edu Thu Jan 1 05:36:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 1 Jan 2009 05:36:26 -0600 (CST) Subject: [Swift-commit] r2386 - text/hpdc09submission Message-ID: <20090101113626.512ED22810E@www.ci.uchicago.edu> Author: benc Date: 2009-01-01 05:36:24 -0600 (Thu, 01 Jan 2009) New Revision: 2386 Modified: text/hpdc09submission/paper.latex Log: a few more bits of rambling Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2008-12-30 15:43:21 UTC (rev 2385) +++ text/hpdc09submission/paper.latex 2009-01-01 11:36:24 UTC (rev 2386) @@ -454,6 +454,7 @@ \hline \end{tabular} \caption{SwiftScript built-in mappers} +\label{mappertable} \end{table} \subsection{The execution environment for component programs} @@ -513,8 +514,21 @@ value of that variable has not yet been computed, the filename where that value will go is already known. +TODO comment (here?) about how this model appears somewhat constrained +but provides a well defined atomicity that can be used for various +reliability mechanisms, site portability, on-site efficiency tuning. +multi-site and reliabilty are already discussed; but the on-site +efficiency tuning (eg using GPFS and laying out files in a way that is +sympathetic to that, potentially using Collective IO fs, and using a +workernode local filesystem) - that discussion could go into the +'executing efficiently' section, or a different 'executing efficiently' +section (change titles...) + \section{Execution} +TODO could briefly describe the execution layer here? how much depth is +interesting? + \subsection{Executing on a remote site} With the above restrictions, execution of a unix program on a remote @@ -547,10 +561,20 @@ the submitting system to the remote system; and with GRAM\cite{GRAM} and a local resource manager (LRM) providing an execution mechanism. 
-Parameterisation of sites occurs in the \emph{site catalog}, an example -of which is: +Sites are defined in the \emph{site catalog}, which contains descriptions +of each site: -TODO site catalog example here + \begin{verbatim} + + + + + /home/benc/swifttest + + + \end{verbatim} This file may be constructed by hand or mechanically from some pre-existing database (such as a grid's existing discovery system). @@ -628,7 +652,7 @@ When any of those jobs begins executing, all other replicas will be cancelled. -\subsection{Executing efficiently} +\subsection{Avoiding job submission penalties} In many applications, the overhead of job submission through commonly available mechanisms, such as through GRAM into an LRM, can dominate @@ -680,6 +704,20 @@ way to talk about the 'underlying submission system, that underlies coasters and clustering' ? +\subsection{Avoiding filesystem inefficiency} + +When running a large number of jobs on a site at once, access to the +shared filesystem on that site can be a bottleneck. + +On large systems, the shared file system is commonly provided by +GPFS\cite{GPFS}. This can scale well but when used na\"ively can +exhibit pathological behaviour. Early versions of Swift triggered this +behaviour by targetting too much file system activity at a single +working directory, so that GPFS lock contention came to dominate execution +time. + +TODO more... + \section{Example applications} TODO: two or three applications in brief. discuss both the application @@ -792,6 +830,9 @@ TODO reference the VDC from VDS\cite{VDS} +TODO write about the stuff on provenance db that I did before - that whole +document of notes... + \subsection{Workflow GUIs as generators of SwiftScript programs - LONI Pipeline} @@ -846,8 +887,8 @@ \subsection{Debugging} TODO: debugging of distributed system - can have a non-futures section -on what is available now, as well as mentioning CEDPS\cite{CEDPS} as -somewhat promising(?) for the future. 
+on what is available now - logprocessing module, as well as +mentioning CEDPS\cite{CEDPS} as somewhat promising(?) for the future. \section{Implementation status} @@ -866,7 +907,7 @@ Reference Swift as a follow-on project to VDL in VDS; how does XDTM fit into this? Is it of any interest other than as part of the - project history? + project history? And is history of this project interesting? maybe so... Acknowledgement of all developers names? @@ -926,7 +967,14 @@ \bibitem{MAPREDUCE} - mapreduce +\bibitem{TERAGRID} - teragrid +\bibitem{OSG} - open science grid + +\bibitem{ReSS} - ress + +\bibitem{GPFS} - GPFS + \end{thebibliography} \end{document} From noreply at svn.ci.uchicago.edu Mon Jan 5 02:49:53 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Jan 2009 02:49:53 -0600 (CST) Subject: [Swift-commit] r2387 - trunk/tests/language Message-ID: <20090105084954.11BF622810F@www.ci.uchicago.edu> Author: benc Date: 2009-01-05 02:49:51 -0600 (Mon, 05 Jan 2009) New Revision: 2387 Modified: trunk/tests/language/run Log: base language test can now be given a specific test case to run on the command line Modified: trunk/tests/language/run =================================================================== --- trunk/tests/language/run 2009-01-01 11:36:24 UTC (rev 2386) +++ trunk/tests/language/run 2009-01-05 08:49:51 UTC (rev 2387) @@ -9,11 +9,18 @@ if [ "X$1" != "X-resume" ]; then echo Cleaning previous test results rm *.?ml + if [ "X$1" != "X" ] ; then + export LIST=$1 + else + export LIST="*.swift" + fi else echo Resuming previous test run + export LIST="*.swift" fi -for a in *.swift; do + +for a in $LIST; do b=$(basename $a .swift).xml if [ ! -f $b ]; then echo -n "known-good $a: " From noreply at svn.ci.uchicago.edu Mon Jan 5 03:55:50 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Jan 2009 03:55:50 -0600 (CST) Subject: [Swift-commit] r2388 - in trunk/src/org/griphyn/vdl/karajan: . 
lib
Message-ID: <20090105095550.BC4192281D0@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-05 03:55:50 -0600 (Mon, 05 Jan 2009)
New Revision: 2388

Modified:
   trunk/src/org/griphyn/vdl/karajan/WrapperMap.java
   trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java
Log:
some asserts about lock behaviour

Modified: trunk/src/org/griphyn/vdl/karajan/WrapperMap.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/WrapperMap.java 2009-01-05 08:49:51 UTC (rev 2387)
+++ trunk/src/org/griphyn/vdl/karajan/WrapperMap.java 2009-01-05 09:55:50 UTC (rev 2388)
@@ -57,6 +57,7 @@
             map.put(handle, fw = new FutureWrappers());
         }
         if (fw.nodeWrapper == null) {
+            assert Thread.holdsLock(handle.getRoot()); // TODO should be on root or on handle?
             fw.nodeWrapper = new DSHandleFutureWrapper(handle);
         }
         return fw.nodeWrapper;
@@ -68,6 +69,7 @@
             map.put(handle, fw = new FutureWrappers());
         }
         if (fw.arrayWrapper == null) {
+            assert Thread.holdsLock(handle.getRoot()); // TODO should be on root or on handle?
             fw.arrayWrapper = new ArrayIndexFutureList(handle, value);
         }
         return fw.arrayWrapper;

Modified: trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-05 08:49:51 UTC (rev 2387)
+++ trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-05 09:55:50 UTC (rev 2388)
@@ -187,6 +187,7 @@
     /** The caller is expected to have synchronized on the root of var. */
     public String[] filename(DSHandle var) throws ExecutionException, HandleOpenException {
+        assert Thread.holdsLock(var.getRoot());
         try {
             if (var.getType().isArray()) {
                 return leavesFileNames(var);

From noreply at svn.ci.uchicago.edu Mon Jan 5 04:28:48 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 04:28:48 -0600 (CST)
Subject: [Swift-commit] r2389 - trunk/src/org/griphyn/vdl/karajan/lib
Message-ID: <20090105102848.6162822810E@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-05 04:28:47 -0600 (Mon, 05 Jan 2009)
New Revision: 2389

Modified:
   trunk/src/org/griphyn/vdl/karajan/lib/GetArrayIterator.java
Log:
lock in GetArrayIterator

Modified: trunk/src/org/griphyn/vdl/karajan/lib/GetArrayIterator.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/lib/GetArrayIterator.java 2009-01-05 09:55:50 UTC (rev 2388)
+++ trunk/src/org/griphyn/vdl/karajan/lib/GetArrayIterator.java 2009-01-05 10:28:47 UTC (rev 2389)
@@ -42,7 +42,9 @@
                     return new PairIterator(value);
                 }
                 else {
-                    return addFutureListListener(stack, var, value);
+                    synchronized(var.getRoot()) {
+                        return addFutureListListener(stack, var, value);
+                    }
                 }
             } else {
                 throw new RuntimeException("Cannot get array iterator for non-array");

From noreply at svn.ci.uchicago.edu Mon Jan 5 04:29:54 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 04:29:54 -0600 (CST)
Subject: [Swift-commit] r2390 - text/hpdc09submission
Message-ID: <20090105102954.0340922810E@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-05 04:29:53 -0600 (Mon, 05 Jan 2009)
New Revision: 2390

Modified:
   text/hpdc09submission/paper.latex
Log:
another TODO

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-05 10:28:47 UTC (rev 2389)
+++ text/hpdc09submission/paper.latex 2009-01-05 10:29:53 UTC (rev 2390)
@@ -716,7 +716,9 @@
 working directory, so that GPFS lock contention came to dominate execution
 time.

-TODO more...
+TODO more... - work done on arranging things in fs; presumably can
+forwardref collective IO section if that gets written, or include that
+entire section here?

 \section{Example applications}

From noreply at svn.ci.uchicago.edu Mon Jan 5 07:45:59 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 07:45:59 -0600 (CST)
Subject: [Swift-commit] r2391 - trunk/src/org/griphyn/vdl/karajan/lib
Message-ID: <20090105134559.28C7F22810E@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-05 07:45:58 -0600 (Mon, 05 Jan 2009)
New Revision: 2391

Modified:
   trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java
Log:
change locking on closes to be on the root of a datanode, rather than on the data node itself

Modified: trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-05 10:29:53 UTC (rev 2390)
+++ trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-05 13:45:58 UTC (rev 2391)
@@ -405,7 +405,7 @@
         WrapperMap hash = getFutureWrapperMap(stack);
         // Close the future
         boolean closed;
-        synchronized(handle) {
+        synchronized(handle.getRoot()) {
             closed = handle.isClosed();
             if (!closed) {
                 handle.closeShallow();
@@ -434,28 +434,33 @@
         // Also mark all arrays from root
         Path fullPath = handle.getPathFromRoot();
         DSHandle root = handle.getRoot();
-        for (int i = 0; i < fullPath.size(); i++) {
-            if (fullPath.isArrayIndex(i)) {
-                try {
-                    markAsAvailable(stack, root.getField(fullPath.subPath(0, i)),
-                        fullPath.getElement(i));
+        synchronized(root) {
+            for (int i = 0; i < fullPath.size(); i++) {
+                if (fullPath.isArrayIndex(i)) {
+                    try {
+                        markAsAvailable(stack, root.getField(fullPath.subPath(0, i)),
+                            fullPath.getElement(i));
+                    }
+                    catch (InvalidPathException e) {
+                        e.printStackTrace();
+                    }
                 }
-                catch (InvalidPathException e) {
-                    e.printStackTrace();
-                }
             }
         }
     }

     protected void waitFor(VariableStack stack, DSHandle handle) throws ExecutionException {
-        if (!handle.isClosed()) {
-            throw new FutureNotYetAvailable(addFutureListener(stack, handle));
+        synchronized(handle.getRoot()) {
+            if (!handle.isClosed()) {
+                throw new FutureNotYetAvailable(addFutureListener(stack, handle));
+            }
         }
     }

     private void closeDeep(VariableStack stack, DSHandle handle) throws ExecutionException, InvalidPathException {
         // Close the future
+        synchronized(handle.getRoot()) {
         handle.closeShallow();
         getFutureWrapperMap(stack).close(handle);
         try {
@@ -470,14 +475,18 @@
             throw new ExecutionException("HandleOpen during closeDeep",e);
         }
         markToRoot(stack, handle);
+        }
     }

     protected void closeShallow(VariableStack stack, DSHandle handle) throws ExecutionException {
-        handle.closeShallow();
-        getFutureWrapperMap(stack).close(handle);
+        synchronized(handle.getRoot()) {
+            handle.closeShallow();
+            getFutureWrapperMap(stack).close(handle);
+        }
     }

     private boolean isClosed(VariableStack stack, DSHandle handle) throws ExecutionException {
+        assert Thread.holdsLock(handle.getRoot());
         return getFutureWrapperMap(stack).isClosed(handle);
     }

From noreply at svn.ci.uchicago.edu Mon Jan 5 11:38:31 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 11:38:31 -0600 (CST)
Subject: [Swift-commit] r2392 - text/hpdc09submission
Message-ID: <20090105173831.863D62281A0@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-05 11:38:30 -0600 (Mon, 05 Jan 2009)
New Revision: 2392

Added:
   text/hpdc09submission/Makefile
Log:
added rudimentary makefile

Added: text/hpdc09submission/Makefile
===================================================================
--- text/hpdc09submission/Makefile (rev 0)
+++ text/hpdc09submission/Makefile 2009-01-05 17:38:30 UTC (rev 2392)
@@ -0,0 +1,4 @@
+all: paper.pdf
+
+paper.pdf:
+	pdflatex paper.latex
\ No newline at end of file

From noreply at svn.ci.uchicago.edu Mon Jan 5 11:45:56 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 11:45:56 -0600 (CST)
Subject: [Swift-commit] r2393 - text/hpdc09submission
Message-ID: <20090105174556.8156C2281D8@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-05 11:45:55 -0600 (Mon, 05 Jan 2009)
New Revision: 2393

Modified:
   text/hpdc09submission/paper.latex
Log:
conceptually there is no upper bound on the invocations, so why bother?

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-05 17:38:30 UTC (rev 2392)
+++ text/hpdc09submission/paper.latex 2009-01-05 17:45:55 UTC (rev 2393)
@@ -42,9 +42,8 @@
 are location-independent and automatically parallelized.

 Swift can execute scripts that perform tens to hundreds of thousands of
-program invocations on highly parallel resources, and its conceptual
-design is intended to scale to applications that perform many millions
-of invocations, and beyond.
+program invocations on highly parallel resources, and deal with the
+unreliable and dynamic aspects of wide-area distributed resources.

 The language provides a high level representation of collections of data
 and a specification of how those collections are to be mapped to an

From noreply at svn.ci.uchicago.edu Mon Jan 5 12:36:14 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 12:36:14 -0600 (CST)
Subject: [Swift-commit] r2394 - text/hpdc09submission
Message-ID: <20090105183614.5F2052281A0@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-05 12:36:13 -0600 (Mon, 05 Jan 2009)
New Revision: 2394

Modified:
   text/hpdc09submission/paper.latex
Log:
no need not to be specific

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-05 17:45:55 UTC (rev 2393)
+++ text/hpdc09submission/paper.latex 2009-01-05 18:36:13 UTC (rev 2394)
@@ -72,7 +72,7 @@
 descriptions of dataflow between those components.

 Data is represented in a script by strongly-typed single-assignment
-variables, using a syntax familiar to many programmers.
+variables, using a C-like syntax.

 A variable may store data of \emph{primitive type} such as an
 integer or a string. However, a variable may also be \emph{mapper}

From noreply at svn.ci.uchicago.edu Mon Jan 5 12:59:01 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 12:59:01 -0600 (CST)
Subject: [Swift-commit] r2395 - text/hpdc09submission
Message-ID: <20090105185901.5D8EE2281D8@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-05 12:59:00 -0600 (Mon, 05 Jan 2009)
New Revision: 2395

Modified:
   text/hpdc09submission/paper.latex
Log:
may help understand mapped types

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-05 18:36:13 UTC (rev 2394)
+++ text/hpdc09submission/paper.latex 2009-01-05 18:59:00 UTC (rev 2395)
@@ -75,7 +75,7 @@
 variables, using a C-like syntax.

 A variable may store data of \emph{primitive type} such as an
-integer or a string. However, a variable may also be \emph{mapper}
+integer or a string. However, a variable may also be \emph{mapped}
 to one or more POSIX-like files, allowing treatment of those files
 using the same syntax as other variables. In that case, the variable
 declaration is annotated with a
@@ -88,6 +88,12 @@
 image photo <"shane.jpeg">;
 \end{verbatim}

+Conceptually, a parallel can be drawn between Swift \emph{mapped} variables
+and Java \emph{reference types}. In both cases there is no syntactic distinction
+between \emph{primitive types} and \emph{mapped} types or
+\emph{reference types} respectively. Additionally, the semantic distinction
+is also kept to a minimum.
+
 Component programs of scripts are declared in an \emph{app declaration},
 with the description of the command line syntax for that program and a
 list of input and output data. An \verb|app| block
@@ -103,7 +109,7 @@
 }
 \end{verbatim}

-A procedure is invoked using with familiar syntax:
+A procedure is invoked using the familiar syntax:

 \begin{verbatim}
 rotated = rotate(photo, 180);

From noreply at svn.ci.uchicago.edu Mon Jan 5 14:22:25 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 5 Jan 2009 14:22:25 -0600 (CST)
Subject: [Swift-commit] r2396 - text/hpdc09submission
Message-ID: <20090105202225.DB49A2281DA@www.ci.uchicago.edu>

Author: wilde
Date: 2009-01-05 14:22:25 -0600 (Mon, 05 Jan 2009)
New Revision: 2396

Modified:
   text/hpdc09submission/paper.latex
Log:
Added initial draft of Introduction. Needs much trimming to avoid overlap with section 2. "Rationale" is a placeholder at the moment.

Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-05 18:59:00 UTC (rev 2395) +++ text/hpdc09submission/paper.latex 2009-01-05 20:22:25 UTC (rev 2396) @@ -59,8 +59,194 @@ \section{Introduction} -TODO +Swift is a scripting language designed for composing ordinary +application programs (serial or parallel) into distributed, +parallelized, applications. It can execute scripts that perform tens +to hundreds of thousands of program invocations on highly parallel +resources, and its design is intended to scale to runs of many +millions of invocations. +Swift's purpose is to enable ``loosely coupled scripting'' in a +convenient, powerful fashion. It is intended to serve as a higher +level framework for composing parallel pipelines of other programs and +scripts - much like using a Makefile to encapsulate the compilation +process, Swift puts a data-driven ``make-like'' wrapper around the +execution of programs (which can themselves be scripts written in any +other scripting language, or binary executables). + +Swift is data-oriented: the core of its programming model is to +encapsulate the invocation of ``ordinary programs'' - technically, POSIX +\emph{exec()} operations - in a manner that explicitly specifies the data +objects (files) that are the inputs and outputs of each program +invocation. This formal but simple specification of data inputs and +outputs enables Swift to provide four critical features not provided +by scripting languages like Perl, or Python, ``shells'', or Tcl: + +\begin{itemize} +\item It can provide location transparent execution: automatically +selecting a location (i.e. a specific distributed system) for a given +program invocation +\item It can automatically parallelize (and throttle) the execution flow +of program invocations, executing invocations that have no data +dependencies in parallel. 
+\item It can record the provenance of derived data objects (and related +caller-callee information) +\item The Swift execution engine records the progress of a script's +execution, so if interrupted or terminated, it can be restarted from +the point of interruption, without re-executing any work that was +logged as successfully completed. +\item Swift variables can represent references to files, which are ten +passed to application programs to operate on. +\end{itemize} + +In the rest of this section, we provide an overview of Swift's main +concepts. These concepts are elaborated, with examples, in subsequent +sections. [Note: we will need to adjust between how much to specify +here, and how mush to state just before each construct is introduced. + +\emph{The term ``workflow''.} Because Swift serves as a ``parallel scripting +language'', it is typically used to specify and execute scientific +``workflows'' - which we define here as the pattern of execution of a +series of steps to perform larger domain-specific tasks. We use the +term workflow as defined by (Taylor et. al. 2006). So we often call a +Swift script a workflow. + +\emph{Dataset typing and mapping model}. Swift provides a high level +representation of collections of data and how those collections are to +be processed by component programs. It provides a structured data-type +model for representing collections of files and directories that are +passed to Swift procedures, and a mapping model to convert between the +physical representation of data on storage systems and the logical +representation of data structures in the abstract swift programming +model. + +Swift itself does not specify how the physical data environment is +implemented. This specification is instead left up to a set of +mappers. + +\emph{Execution of atomic functions}. Underlying this is an +implementation to execute scripts on grid and other platforms, +providing built-in site selection, data management and +reliability. 
Swift scripts can be tested on a single local +workstation. The same script can then be executed on a cluster, one o +more grids of clusters, and on large scale parallel supercomputers +such as the Sun Constellation (ref) or the IBM Blue Gene / P. + +Swift programs can also span environments: ...explain + +Swift is implemented by compiling to a Karajan program, which provides +several benefits. A notable benefit visible to users is that of +providers. This enbles the swift execution model to be extended by +adding new data providers and job execution providers. This is +explained in more detail in section X: Swift Implementation. + +\emph{Minimalist nature of Swift}. As a scripting language, Swift is +intentionally designed to be a sparse, minimal language. Its primary +purpose is to coordinate, throttle, and sequence the execution of +other programs. As such, it has only a very limited set of data types, +operators, and built-in functions. + +We believe strongly in, and our experience reinforces, the principle +that Swift - or languages like it - play an important role in the +family of programming languages. Ordinary scripting languages provide +the constructs for manipulating files and typically contain rich +operators, primitives and libraries for large classes of useful +operations such as string processing, math operations, internet and +file operations. + +Swift programs typically contain very little code to manipulate data +directly. + +\emph{Function composition and Libraries}. Swift programs are composed +starting with ``atomic'' functions, and then higher level functions are +composed as pipelines of sub-functions. The basic structure of a +composite function is a graph of calls to other functions. + +Recursive function calls [are / are not] supported [Relevant? FIXME] + +\emph{Variables, single assignment and data flow}. Swift variables hold +primitive values, or references to datasets, which are files or +collections of files. 
Variables are ``single assignment'', which is the +basis for Swift's model of function chaining. Functions are executed +when their variables are all set. Functions are chained by specifying +that an output variable of one function is passed as the input +variable to the second function. + +This dataflow model means that within a swift program, functions are +executed when their data is available - which is not necessarily in +source-code order. + +Swift function arguments are lists of such variables, and return lists +of such variables. Swift does not yet have a notion of +libraries. Swift programs execute as if all functions called in the +script are present in a single logical source file and are thus passed +to the Swift virtual machine all at once. + +\emph{Rationale}. Why do we need Swift? Why create yet another scripting +language for the execution of application programs when so many exist? +Swift was developed to create a higher-level language that focuses not +on the details of executing sequences or ``pipelines'' of programs, but +rather on specific issues that arise from scale. These issues, +however, once identified, seem to equally well apply to, and benefit +the execution of, application pipelines that are not large-scale and +not necessarily distributed. + +\emph{FIXME: This section needs much polishing/condensing.} + +Our motivation for developing Swift is based on the following premises: + +Scale (at least in the grid) requires distribution of execution among +many computers (``resources''), and hence a ``grid'' approach. Even if a +single large parallel resource suffices, users won't always have +access to the same one: resources are scarce, and users often need or +want to utilize whatever resource happened to be free at the moment +when they need to perform intensive computation. 
+ +While many application needs involve the execution of a single large +and perhaps message-passing parallel app, many others require the +coupling or orchestration of large numbers of application invocations: +either many invocations of the same app, or many invocations of +sequences and patterns of several apps. In this model, existing apps +become like functions in programming, and users typically need to +execute many of them. + +Ousterhout in (Ousterhout 1998) eloquently laid out the rational and +motivation for scripting languages. As the creator of Tcl [ref], he +described here the difference between programming and scripting, and +the place of each in the scheme of applying computers to solving +problems. + +What's missing in current scripting languages is sufficient +specification and encapsulation of inputs to, and outputs from, a +given application, such that an execution environment could +automatically make remote execution transparent. + +In a sense, Swift adds to scripting what RPC adds to programming: by +formalizing the inputs and outputs of ``applications-as-functions'', it +provides a way to make the remote - and hence parallel - execution of +applications fairly transparent. + +It's useful to draw an analogy here between Swift and make. Just as a +``makefile'' - which in a sense is can be considered a script or program +(or certainly a ``recipe'') - is a specification of how to derive an +application program, a Swift script is a recipe for how to produce a +set of data. Unlike make, in which case the derived product is only +produced once, in Swift, derived datasets are repetitively derived. + +Could existing programs execute Swift calls through a library +approach? The answer to this is certainly ``yes'', and we examine this +in more detail in section X. + +Usage of Swift. Swift is achieving growing use on a variety of science +problems. + +... FIXME: provide details. + +In the remainder of this paper, FIXME ... 
we present the language, +details of the implementation, application use-cases and ongoing +research. + + \section{The SwiftScript language} \subsection{Overview of the language} From noreply at svn.ci.uchicago.edu Mon Jan 5 14:40:02 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Jan 2009 14:40:02 -0600 (CST) Subject: [Swift-commit] r2397 - text/hpdc09submission Message-ID: <20090105204003.00ABA2281D0@www.ci.uchicago.edu> Author: hategan Date: 2009-01-05 14:40:02 -0600 (Mon, 05 Jan 2009) New Revision: 2397 Modified: text/hpdc09submission/paper.latex Log: some misc changes Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-05 20:22:25 UTC (rev 2396) +++ text/hpdc09submission/paper.latex 2009-01-05 20:40:02 UTC (rev 2397) @@ -651,21 +651,19 @@ \subsection{The execution environment for component programs} A SwiftScript \verb|app| declaration describes how a component -program is invoked. In order to facilitate a number of interesting -features, the environment in which programs are executed is tightly -constrained, substantially more so than in a conventional scripting -language. +program is invoked. In order to ensure the correctness of the +Swift model, the environment in which programs are executed needs +to be constrained. A program is invoked in its own working directory; in that working -directory, the program can expect to find all of the files that are -passed as inputs to the application block; and on exit, it should leave -all files named by that application block in the same working -directory. Programs should not make excessive(?) 
assumptions about -their working environment; for example, they should not assume that +directory or one of its subdirectories, the program can expect to find +all of the files that are passed as inputs to the application block; and +on exit, it should leave all files named by that application block in +the same working directory. Applications should also not assume that they will be executed on a particular host (to facilitate site -portability); run in in any particular order with respoect to other -program invocations in a script (except those implied by datra -dependency); or that their working directories will or will not be +portability), run in in any particular order with respect to other +application invocations in a script (except those implied by data +dependency), or that their working directories will or will not be cleaned up after execution. Consider the \verb|app| declaration for the \verb|rotate| procedure in @@ -681,7 +679,7 @@ will be placed into the application working directory before execution, and which files will be expected there after execution. For the above declaration, the file mapped to the \verb|input| parameter will be -placed in the working directory before hand, and the file mapped to +placed in the working directory beforehand, and the file mapped to \verb|output| will be expected there after execution; the input parameter \verb|angle| is of primitive type\footnote{need to define primitive type earlier on here...} and so no files are staged in for From noreply at svn.ci.uchicago.edu Tue Jan 6 11:52:09 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Tue, 6 Jan 2009 11:52:09 -0600 (CST) Subject: [Swift-commit] r2398 - text/hpdc09submission Message-ID: <20090106175209.189CA228198@www.ci.uchicago.edu> Author: wilde Date: 2009-01-06 11:52:07 -0600 (Tue, 06 Jan 2009) New Revision: 2398 Modified: text/hpdc09submission/paper.latex Log: Updated introduction. Adjusted author list (Added Mihael, Ian, Mike). 
Added comments for ACM bibliography style. Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-05 20:40:02 UTC (rev 2397) +++ text/hpdc09submission/paper.latex 2009-01-06 17:52:07 UTC (rev 2398) @@ -3,25 +3,31 @@ \usepackage{graphicx} \begin{document} +\bibliographystyle{unsrt} +% \bibliographystyle{abbrv} % for ACM SIGS style + \title{SwiftScript - a language for loosely coupled distributed parallel scripting \\ draft - contact benc at ci.uchicago.edu} % ACM styleguide says max 3 authors here, rest in acknowledgements -\numberofauthors{3} +\numberofauthors{4} \author{ \alignauthor Ben Clifford \\ \affaddr{University of Chicago Computation Institute}\\ - \affaddr{5640 S. Ellis}\\ - \affaddr{Chicago, Illinois}\\ - \email{benc at ci.uchicago.edu} \alignauthor Other people \\ -\affaddr{elsewhere} -\alignauthor Third author \\ -\affaddr{someplace}\\ -\affaddr{sometown} + \email{benc at ci.uchicago.edu} +\alignauthor Ian Foster \\ + \affaddr{University of Chicago Computation Institute}\\ + \affaddr{Argonne National Laboratory} +\alignauthor Mihael Hategan \\ + \affaddr{University of Chicago Computation Institute}\\ +\and +\alignauthor Michael Wilde \\ + \affaddr{University of Chicago Computation Institute}\\ + \affaddr{Argonne National Laboratory} } \maketitle @@ -69,18 +75,27 @@ Swift's purpose is to enable ``loosely coupled scripting'' in a convenient, powerful fashion. It is intended to serve as a higher level framework for composing parallel pipelines of other programs and -scripts - much like using a Makefile to encapsulate the compilation +scripts. Much like a Makefile encapsulates the compilation process, Swift puts a data-driven ``make-like'' wrapper around the execution of programs (which can themselves be scripts written in any other scripting language, or binary executables). 
-Swift is data-oriented: the core of its programming model is to -encapsulate the invocation of ``ordinary programs'' - technically, POSIX -\emph{exec()} operations - in a manner that explicitly specifies the data -objects (files) that are the inputs and outputs of each program +As a ``parallel scripting +language'', Swift is typically used to specify and execute scientific +``workflows'' - which we define here as the execution of a +series of steps to perform larger domain-specific tasks. We use the +term workflow as defined by (Taylor et. al. 2006). So we often call a +Swift script a workflow. FIXME: Drop this paragraph/concept? Or crisp it up. + +\subsection{Swift language concepts} + +The Swift programming model is data-oriented: it +encapsulates the invocation of ``ordinary programs'' - technically, POSIX +\emph{exec()} operations - in a manner that explicitly specifies the files +and other arguments that are the inputs and outputs of each program invocation. This formal but simple specification of data inputs and outputs enables Swift to provide four critical features not provided -by scripting languages like Perl, or Python, ``shells'', or Tcl: +by scripting languages like Perl, Python, Tcl, or the various command-line ``shells'': \begin{itemize} \item It can provide location transparent execution: automatically @@ -95,24 +110,15 @@ execution, so if interrupted or terminated, it can be restarted from the point of interruption, without re-executing any work that was logged as successfully completed. -\item Swift variables can represent references to files, which are ten -passed to application programs to operate on. \end{itemize} In the rest of this section, we provide an overview of Swift's main -concepts. These concepts are elaborated, with examples, in subsequent -sections. [Note: we will need to adjust between how much to specify -here, and how mush to state just before each construct is introduced. +concepts. 
Each concept is elaborated, with examples, in subsequent +sections. [FIXME: we will need to adjust between how much to specify +here, and how much to state just before each construct is introduced. -\emph{The term ``workflow''.} Because Swift serves as a ``parallel scripting -language'', it is typically used to specify and execute scientific -``workflows'' - which we define here as the pattern of execution of a -series of steps to perform larger domain-specific tasks. We use the -term workflow as defined by (Taylor et. al. 2006). So we often call a -Swift script a workflow. - -\emph{Dataset typing and mapping model}. Swift provides a high level -representation of collections of data and how those collections are to +\emph{Dataset typing and mapping model}. Swift provides for the high level +specification of collections of data and of how those collections should be processed by component programs. It provides a structured data-type model for representing collections of files and directories that are passed to Swift procedures, and a mapping model to convert between the @@ -182,7 +188,11 @@ script are present in a single logical source file and are thus passed to the Swift virtual machine all at once. -\emph{Rationale}. Why do we need Swift? Why create yet another scripting +\subsection{Rationale for creating Swift} + +\emph{FIXME: This section needs much polishing/condensing.} + +Why do we need Swift? Why create yet another scripting language for the execution of application programs when so many exist? Swift was developed to create a higher-level language that focuses not on the details of executing sequences or ``pipelines'' of programs, but @@ -191,15 +201,13 @@ the execution of, application pipelines that are not large-scale and not necessarily distributed. 
-\emph{FIXME: This section needs much polishing/condensing.} - Our motivation for developing Swift is based on the following premises: -Scale (at least in the grid) requires distribution of execution among +Scaling up requires the distribution of execution among many computers (``resources''), and hence a ``grid'' approach. Even if a single large parallel resource suffices, users won't always have -access to the same one: resources are scarce, and users often need or -want to utilize whatever resource happened to be free at the moment +access to the same supercomputer cluster: resources are scarce, and users often need or +want to utilize whatever resource happened to be available or economical at the moment when they need to perform intensive computation. While many application needs involve the execution of a single large @@ -249,9 +257,9 @@ \section{The SwiftScript language} -\subsection{Overview of the language} +\subsection{Language basics} - Programs written in the SwiftScript language are called (Swift) +Programs written in the SwiftScript language are called (Swift) scripts. A Swift script describes the interface of each component program, and how those components communicate with each other. Component programs in a Swift script are coupled together by @@ -1109,9 +1117,11 @@ defining; specifically for CoG, need to declare that it builds on top of that; relation to old old VDL2 papers (eg that Yong was on...) - some dude did some stuff about BOINC - that could have a one-liner + some dude (it was Xu Du) did some stuff about BOINC - that could have a one-liner if it was actually written up somewhere; otherwise ignore. +Not likely that it was written up but I will ask. 
(mike) + performance: application tuning graphs; provisioning and coaster file access (give one-liner numbers for those); file system layout tuning to accomodate GPFS - can make before/after one-liners for that @@ -1124,10 +1134,13 @@ Swift core: me, wilde, hategan, milena, yong, ian CNARI: skenny OSG: mats -Site selection: xi li, dude-who-did-data-affinity +Site selection: xi li, ragib +App installed: Zhengxiong Howe Falkon: Ioan, zhao -Collective IO: allan +Collective IO: allan, zhao, ioan +Users: Uri, Kubal, Hocky, UMD Student, ... + more explicit mapper description should include table of all/common mappers ramble about separation of parallel execution concerns and dataflow spec @@ -1167,5 +1180,6 @@ \bibitem{GPFS} - GPFS \end{thebibliography} + +% \bibliography{paper} % for ACM SIGS style \end{document} - From noreply at svn.ci.uchicago.edu Tue Jan 6 12:47:46 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Tue, 6 Jan 2009 12:47:46 -0600 (CST) Subject: [Swift-commit] r2399 - text/hpdc09submission Message-ID: <20090106184747.0ADDD2281D8@www.ci.uchicago.edu> Author: wilde Date: 2009-01-06 12:47:45 -0600 (Tue, 06 Jan 2009) New Revision: 2399 Added: text/hpdc09submission/makepaper text/hpdc09submission/paper.bib Modified: text/hpdc09submission/paper.latex Log: Switched to ACM bibliography style. Added paper.bib. Added makepaper script to generate paper.pdf using pdflatex and bibtex. 
Added: text/hpdc09submission/makepaper =================================================================== --- text/hpdc09submission/makepaper (rev 0) +++ text/hpdc09submission/makepaper 2009-01-06 18:47:45 UTC (rev 2399) @@ -0,0 +1,4 @@ +pdflatex paper.latex +bibtex paper +pdflatex paper.latex +pdflatex paper.latex Property changes on: text/hpdc09submission/makepaper ___________________________________________________________________ Name: svn:executable + * Added: text/hpdc09submission/paper.bib =================================================================== --- text/hpdc09submission/paper.bib (rev 0) +++ text/hpdc09submission/paper.bib 2009-01-06 18:47:45 UTC (rev 2399) @@ -0,0 +1,213 @@ +% +% $Id$ +% + + at inproceedings{VDS, + author = {Ian Foster and Jens Voeckler and Michael Wilde and Yong Zhao}, + title = {{Chimera: A Virtual Data System for Representing, Querying, + and Automating Data Derivation}}, + year = 2002, + booktitle = {14th Conference on Scientific and Statistical Database + Management} +} + + at article{CEDPS, + title = {{CEDPS}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{MONOTONICPHD, + title = {{MONOTONICPHD}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{GLOBUS, + title = {{GLOBUS}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{GRAM, + title = {{GRAM}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{GridFTP, + title = {{GridFTP}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{TCP, + title = {{TCP}}, + author = {John Smith and Jane Doe}, + 
journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{CNARI, + title = {{CNARI}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{FALKON, + title = {{FALKON}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{COG, + title = {{COG}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{LONIPIPELINE, + title = {{LONIPIPELINE}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{MAPREDUCE, + title = {{MAPREDUCE}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{TERAGRID, + title = {{TERAGRID}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{OSG, + title = {{OSG}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{ReSS, + title = {{ReSS}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at article{GPFS, + title = {{GPFS}}, + author = {John Smith and Jane Doe}, + journal = {{Cluster Computing}}, + volume = {5(3)}, + year = 2002, + pages = {237--247} +} + +% Items below are from an older paper - retain for the moment in case any are useful here + + at article{condor-g, + title = {{Condor-G: A Computation Management Agent for + Multi-Institutional Grids}}, + author = {Frey J and Tannenbaum T and Foster I and Livny M and Tuecke S}, + journal = {{Cluster Computing}}, + 
volume = {5(3)}, + year = 2002, + pages = {237--247} +} + + at inproceedings{mds, + title = {{Grid Information Services for Distributed Resource Sharing}}, + author = {Czajkowski K and Fitzgerald S and Foster I and Kesselman C}, + booktitle = {Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press}, + month = {August}, + year = 2001 +} + + at inproceedings{rls, + title = {{Giggle: A Framework for Constructing Scalable Replica Location Services}}, + author = {Chervenak A and Deelman E and Foster I and Guy L and Hoschek W and Iamnitchi A and Kesselman C and + Kunst P and Ripeanu M and Schwartzkopf B and Stockinger H and Stockinger K and Tierney B}, + booktitle = {Proceedings of Supercomputing 2002 (SC2002)}, + month = {November}, + year = 2002 +} + + at inproceedings{pegasus, + title = {{Pegasus: Mapping Scientific Workflows onto the Grid}}, + author = {Deelman E et al.}, + booktitle = {2nd EU Across Grids Conference}, + year = 2004, + location = {Cyprus} +} + + at inproceedings{gridant, + title = {{GridAnt: A Client-Controllable Grid Workflow System}}, + author = {Amin K et al}, + booktitle = {37th Hawaii International Conference on System Sciences}, + year = 2004, +} + + at inproceedings{MGC04, + author = {Zhao Y and Wilde M and Foster I and Voeckler J and Dobson J and Jordan T and Quigg E}, + title = {{Grid Middleware Services for Virtual Data Discovery, Composition, and Integration}}, + year = 2004, + location = {Toronto, Ontario, Canada}, + booktitle = {Proceedings of the 2nd workshop on Middleware for grid computing} +} + + at inproceedings{xdtm, + author = {Moreau L and Zhao Y and Foster I and Voeckler J and Wilde M}, + title = {{XDTM: XML Dataset Typing and Mapping for Specifying Datasets}}, + year = 2005, + booktitle = {European Grid Conference} +} + + at inproceedings{qnetgrid, + author = {Bardeen M and Gilbert E and Jordan T and Nepywoda P and Quigg E and Wilde M and Zhao Y}, + title = {{The
QuarkNet/Grid Collaborative Learning e-Lab}}, + year = 2005, + booktitle = {Second International Workshop on Collaborative and Learning Applications of Grid Technology and Grid Education} +} Property changes on: text/hpdc09submission/paper.bib ___________________________________________________________________ Name: svn:executable + * Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-06 17:52:07 UTC (rev 2398) +++ text/hpdc09submission/paper.latex 2009-01-06 18:47:45 UTC (rev 2399) @@ -3,9 +3,9 @@ \usepackage{graphicx} \begin{document} -\bibliographystyle{unsrt} +% \bibliographystyle{unsrt} % initial temp bib style for editing -% \bibliographystyle{abbrv} % for ACM SIGS style +\bibliographystyle{abbrv} % for ACM SIGS style \title{SwiftScript - a language for loosely coupled distributed parallel scripting \\ draft - contact benc at ci.uchicago.edu} @@ -1146,40 +1146,40 @@ ramble about separation of parallel execution concerns and dataflow spec in the same way that gph has a separation of same concerns... 
compare contrast -\begin{thebibliography}{99} -\bibitem{CEDPS} - cedps +%\begin{thebibliography}{99} +%\bibitem{CEDPS} - cedps +% +%\bibitem{MONOTONICPHD} - the phd on distributed language that defines the term 'monotonic' - although maybe it comes from elsewhere +% +%\bibitem{GLOBUS} globus toolkit +% +%\bibitem{GRAM} gram +% +%\bibitem{GridFTP} - gridftp +% +%\bibitem{TCP} - tcp +% +%\bibitem{CNARI} - something about the cnari paper +% +%\bibitem{FALKON} - falkon +% +%\bibitem{COG} - cog +% +%\bibitem{VDS} - VDS +% +%\bibitem{LONIPIPELINE} - loni pipeline +% +%\bibitem{MAPREDUCE} - mapreduce +% +%\bibitem{TERAGRID} - teragrid +% +%\bibitem{OSG} - open science grid +% +%\bibitem{ReSS} - ress +% +%\bibitem{GPFS} - GPFS +% +%\end{thebibliography} -\bibitem{MONOTONICPHD} - the phd on distributed language that defines the term 'monotonic' - although maybe it comes from elsewhere - -\bibitem{GLOBUS} globus toolkit - -\bibitem{GRAM} gram - -\bibitem{GridFTP} - gridftp - -\bibitem{TCP} - tcp - -\bibitem{CNARI} - something about the cnari paper - -\bibitem{FALKON} - falkon - -\bibitem{COG} - cog - -\bibitem{VDS} - VDS - -\bibitem{LONIPIPELINE} - loni pipeline - -\bibitem{MAPREDUCE} - mapreduce - -\bibitem{TERAGRID} - teragrid - -\bibitem{OSG} - open science grid - -\bibitem{ReSS} - ress - -\bibitem{GPFS} - GPFS - -\end{thebibliography} - -% \bibliography{paper} % for ACM SIGS style +\bibliography{paper} % for ACM SIGS style \end{document} From noreply at svn.ci.uchicago.edu Tue Jan 6 13:47:00 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Tue, 6 Jan 2009 13:47:00 -0600 (CST) Subject: [Swift-commit] r2400 - trunk/tests/misc Message-ID: <20090106194700.C27972281EA@www.ci.uchicago.edu> Author: benc Date: 2009-01-06 13:46:58 -0600 (Tue, 06 Jan 2009) New Revision: 2400 Added: trunk/tests/misc/asserts.sh Modified: trunk/tests/misc/run Log: test with java assertions turned on Added: trunk/tests/misc/asserts.sh 
=================================================================== --- trunk/tests/misc/asserts.sh (rev 0) +++ trunk/tests/misc/asserts.sh 2009-01-06 19:46:58 UTC (rev 2400) @@ -0,0 +1,9 @@ +#!/bin/bash + +# run tests with java assertions enabled + +export COG_OPTS="-enableassertions" + +cd ../language-behaviour + +./run Property changes on: trunk/tests/misc/asserts.sh ___________________________________________________________________ Name: svn:executable + * Modified: trunk/tests/misc/run =================================================================== --- trunk/tests/misc/run 2009-01-06 18:47:45 UTC (rev 2399) +++ trunk/tests/misc/run 2009-01-06 19:46:58 UTC (rev 2400) @@ -1,7 +1,7 @@ #!/bin/sh for a in clusters no-retries dryrun typecheck path-prefix restart restart2 restart3 restart4 restart5 restart-iterate workernode-local \ ordering-extern-notlazy restart-extern ordering-extern \ -external-mapper-args extract-int-delayed \ +external-mapper-args extract-int-delayed asserts \ ; do ./${a}.sh R=$? 
From noreply at svn.ci.uchicago.edu Wed Jan 7 13:45:24 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 7 Jan 2009 13:45:24 -0600 (CST) Subject: [Swift-commit] r2401 - trunk/src/org/griphyn/vdl/karajan/lib Message-ID: <20090107194524.308732281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-07 13:45:22 -0600 (Wed, 07 Jan 2009) New Revision: 2401 Modified: trunk/src/org/griphyn/vdl/karajan/lib/WaitFieldValue.java Log: lock in waitfieldvalue - I forgot to commit this before a previous commit enabling assertions for tests, so tests were broken for a day Modified: trunk/src/org/griphyn/vdl/karajan/lib/WaitFieldValue.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/lib/WaitFieldValue.java 2009-01-06 19:46:58 UTC (rev 2400) +++ trunk/src/org/griphyn/vdl/karajan/lib/WaitFieldValue.java 2009-01-07 19:45:22 UTC (rev 2401) @@ -33,7 +33,7 @@ try { Path path = parsePath(OA_PATH.getValue(stack), stack); var = var.getField(path); - synchronized (var) { + synchronized (var.getRoot()) { if (!var.isClosed()) { logger.debug("Waiting for " + var); throw new FutureNotYetAvailable(addFutureListener(stack, var)); From noreply at svn.ci.uchicago.edu Wed Jan 7 14:05:12 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 7 Jan 2009 14:05:12 -0600 (CST) Subject: [Swift-commit] r2402 - trunk/src/org/griphyn/vdl/karajan/lib Message-ID: <20090107200512.CACEA2281E4@www.ci.uchicago.edu> Author: benc Date: 2009-01-07 14:05:11 -0600 (Wed, 07 Jan 2009) New Revision: 2402 Modified: trunk/src/org/griphyn/vdl/karajan/lib/GetFieldValue.java Log: lock in GetFieldValue Modified: trunk/src/org/griphyn/vdl/karajan/lib/GetFieldValue.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/lib/GetFieldValue.java 2009-01-07 19:45:22 UTC (rev 2401) +++ trunk/src/org/griphyn/vdl/karajan/lib/GetFieldValue.java 2009-01-07 
20:05:11 UTC (rev 2402) @@ -32,25 +32,26 @@ return var1; } DSHandle var = (DSHandle) var1; - try { - Path path = parsePath(OA_PATH.getValue(stack), stack); - if (path.hasWildcards()) { - try { - return var.getFields(path).toArray(); + DSHandle root = var.getRoot(); + synchronized(root) { + try { + Path path = parsePath(OA_PATH.getValue(stack), stack); + if (path.hasWildcards()) { + try { + return var.getFields(path).toArray(); + } + catch (HandleOpenException e) { + if (logger.isDebugEnabled()) { + logger.debug("Waiting for var=" + var + " path=" + path); + } + throw new FutureNotYetAvailable(addFutureListener(stack, e.getSource())); + } } - catch (HandleOpenException e) { - if (logger.isDebugEnabled()) { - logger.debug("Waiting for var=" + var + " path=" + path); + else { + var = var.getField(path); + if (var.getType().isArray()) { + throw new RuntimeException("Getting value for array "+var+" which is not permitted."); } - throw new FutureNotYetAvailable(addFutureListener(stack, e.getSource())); - } - } - else { - var = var.getField(path); - if (var.getType().isArray()) { - throw new RuntimeException("Getting value for array "+var+" which is not permitted."); - } - synchronized (var) { if (!var.isClosed()) { if (logger.isDebugEnabled()) { logger.debug("Waiting for " + var); @@ -62,10 +63,9 @@ } } } + catch (InvalidPathException e) { + throw new ExecutionException(e); + } } - catch (InvalidPathException e) { - throw new ExecutionException(e); - } } - } From noreply at svn.ci.uchicago.edu Thu Jan 8 02:02:10 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 8 Jan 2009 02:02:10 -0600 (CST) Subject: [Swift-commit] r2403 - trunk/src/org/griphyn/vdl/karajan Message-ID: <20090108080210.C5E62228198@www.ci.uchicago.edu> Author: benc Date: 2009-01-08 02:02:09 -0600 (Thu, 08 Jan 2009) New Revision: 2403 Modified: trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java Log: as far as I can tell, this never gets used Modified: 
trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java 2009-01-07 20:05:11 UTC (rev 2402) +++ trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java 2009-01-08 08:02:09 UTC (rev 2403) @@ -16,7 +16,7 @@ import org.griphyn.vdl.mapping.DSHandle; import org.griphyn.vdl.mapping.DSHandleListener; -public class DSHandleFutureWrapper implements Future, Mergeable, DSHandleListener { +public class DSHandleFutureWrapper implements Future, DSHandleListener { private DSHandle handle; private LinkedList listeners; @@ -74,16 +74,6 @@ listeners = null; } - public void mergeListeners(Future f) { - Iterator i = listeners.iterator(); - while (i.hasNext()) { - EventTargetPair etp = (EventTargetPair) i.next(); - f.addModificationAction(etp.getTarget(), etp.getEvent()); - i.remove(); - } - listeners = null; - } - public int listenerCount() { if (listeners == null) { return 0; From noreply at svn.ci.uchicago.edu Thu Jan 8 07:23:37 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 8 Jan 2009 07:23:37 -0600 (CST) Subject: [Swift-commit] r2404 - text/hpdc09submission Message-ID: <20090108132337.B16E72281A0@www.ci.uchicago.edu> Author: wilde Date: 2009-01-08 07:23:36 -0600 (Thu, 08 Jan 2009) New Revision: 2404 Modified: text/hpdc09submission/paper.bib text/hpdc09submission/paper.latex Log: Added "Related Work" section - taken verbatim from rejected FCGS paper. Added a few refs. 
Modified: text/hpdc09submission/paper.bib =================================================================== --- text/hpdc09submission/paper.bib 2009-01-08 08:02:09 UTC (rev 2403) +++ text/hpdc09submission/paper.bib 2009-01-08 13:23:36 UTC (rev 2404) @@ -11,6 +11,34 @@ Management} } + at incollection{GCRPNOVA, + author = {Yong Zhao and Ioan Raicu and Ian Foster and Mihael Hategan and Veronika Nefedova and Mike Wilde}, + title = {{Scalable and Reliable Scientific Computations in Grid Environments}}, + booktitle = {Grid Computing Research Progress}, + isbn = {978-1-60456-404-4}, + pages = {TODO}, + publisher = {Nova Publisher}, + year = 2008, + editor = {TODO}, + url = {http://people.cs.uchicago.edu/~iraicu/publications/2008_NOVA08_book-chapter_Swift.pdf}, +} + + at inproceedings{SWIFTIWSW2007, + author = {Yong Zhao and Mihael Hategan and B Clifford and I Foster and G von Laszewski and I Raicu and T Stef-Praun and M Wilde}, + title = {{Swift: Fast, Reliable, Loosely Coupled Parallel Computation}}, + year = 2007, + booktitle = {IEEE International Workshop on Scientific Workflows} +} + + at article{LINDA, + title = {{Linda and Friends}}, + author = {S Ahuja and N Carriero and D Gelernter}, + journal = {{IEEE Computer}}, + volume = {19(8)}, + year = 1986, + pages = {26--34} +} + @article{CEDPS, title = {{CEDPS}}, author = {John Smith and Jane Doe}, Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-08 08:02:09 UTC (rev 2403) +++ text/hpdc09submission/paper.latex 2009-01-08 13:23:36 UTC (rev 2404) @@ -85,7 +85,7 @@ ``workflows'' - which we define here as the execution of a series of steps to perform larger domain-specific tasks. We use the term workflow as defined by (Taylor et. al. 2006). So we often call a -Swift script a workflow. FIXME: Drop this paragraph/concept? Or crisp it up. +Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up.
\subsection{Swift language concepts} @@ -114,7 +114,7 @@ In the rest of this section, we provide an overview of Swift's main concepts. Each concept is elaborated, with examples, in subsequent -sections. [FIXME: we will need to adjust between how much to specify +sections. [TODO: we will need to adjust between how much to specify here, and how much to state just before each construct is introduced. \emph{Dataset typing and mapping model}. Swift provides for the high level @@ -168,7 +168,7 @@ composed as pipelines of sub-functions. The basic structure of a composite function is a graph of calls to other functions. -Recursive function calls [are / are not] supported [Relevant? FIXME] +Recursive function calls [are / are not] supported [TODO: Relevant?] \emph{Variables, single assignment and data flow}. Swift variables hold primitive values, or references to datasets, which are files or @@ -190,7 +190,7 @@ \subsection{Rationale for creating Swift} -\emph{FIXME: This section needs much polishing/condensing.} +\emph{TODO: This section needs much polishing/condensing.} Why do we need Swift? Why create yet another scripting language for the execution of application programs when so many exist? @@ -248,9 +248,9 @@ Usage of Swift. Swift is achieving growing use on a variety of science problems. -... FIXME: provide details. +... TODO: provide details. -In the remainder of this paper, FIXME ... we present the language, +TODO: In the remainder of this paper, ... we present the language, details of the implementation, application use-cases and ongoing research. @@ -928,9 +928,9 @@ Another app: Rosetta on OSG? OSG was designed with a focus on heterogeneity between sites. Large number of sites; automatic site file selection; and automatic app deployment there. - -\section{Swift as a framework for ongoing and experimental work} +\section{Usage Experience} + \subsection{Use on large numbers of sites in the Open Science Grid} TODO: get Mats to comment on this section...? 
@@ -962,6 +962,8 @@ cases. However, continued discovery of unusual failure modes drives the implementation of ever more fault tolerance mechanisms. +\subsection{Automating Application Deployment} + When running jobs on dynamically discovered sites, it is likely that component programs are not installed on those sites. @@ -989,11 +991,13 @@ } \end{verbatim} -TODO: dude in CI SWFT group has also done stuff about application -stagein - this could be mentioned, if I knew more about it... +TODO: Zhengxiong Hou has also done stuff about application +stagein - this could be mentioned (see Zhengxiong's email and paper) TODO: what's the conclusion (if any) of this section? +\section{Swift as a framework for ongoing and experimental work} + \subsection{Automatic characterisation of site and application behaviour} TODO The replication mechanism is the beginning of this - but there is scope @@ -1095,6 +1099,107 @@ active development group; releases roughly every 2 months. +\section{Comparison to Other Systems} + +Coordination languages and systems such as Linda\cite{LINDA}, +Strand\cite{STRAN} and PCN\cite{PCN} [11] allow composition of +distributed or parallel components, but usually require the components +to be programmed in specific languages and linked with the systems; +where we need to coordinate procedures that may already exist (e.g., +legacy applications), were coded in various programming languages and +run in different platforms and architectures. Linda defines a set of +coordination primitives for concurrent agents to put and retrieve +tuples from a shared data space called tuplespace, which serves as the +medium for communication and coordination. Strand and PCN use +single-assignment variables [7] as coordination mechanism. Like Linda, +Strand and PCN are data driven in the sense that the action of sending +and receiving data are decoupled, and processes execute only when data +are available. 
The Swift system uses a similar mechanism called futures +[16] for workflow evaluation and scheduling. + +MapReduce [8] also provides a programming model and a runtime system +to support the processing of large scale datasets. The two key +functions ``map'' and ``reduce'' are borrowed from functional languages: a +map function iterates over a set of items, performs a specific +operation on each of them and produces a new set of items, whereas a +reduce function performs aggregation on a set of items. The runtime +system automatically partitions input data and schedules the execution +of programs in a large cluster of commodity machines. The system is +made fault tolerant by checking worker nodes periodically and +reassigning failed jobs to other worker nodes. Sawzall [22] is an +interpreted language that builds on MapReduce and separates the +filtering and aggregation phases for more concise program +specification and better parallelization. + +Swift and MapReduce/Sawzall share the goal of providing a +programming tool for the specification and execution of large parallel +computations on large quantities of data, and facilitating the +utilization of large distributed resources. However, the two also +differ in many aspects: + +\begin{itemize} + +\item Programming model: MapReduce only supports key-value pairs as input +or output datasets and two types of computation functions, map and +reduce, whereas Swift provides a type system and allows the definition +of complex data structures and arbitrary computation procedures. + +\item Data format: in MapReduce, input and output data can be of several +different formats, and it is also possible to define new data +sources. Swift provides a more flexible mapping mechanism to map +between logical data structures and various physical representations. + +\item Dataset partition: Swift does not automatically partition input +datasets.
Instead, datasets can be organized in structures, and +individual items in a dataset can be transferred accordingly along +with computations. + +\item Execution environment: MapReduce schedules computations within a +cluster with a shared Google File System, whereas Swift schedules across +distributed Grid sites that may span multiple administrative domains, +and deals with security and resource usage policy issues. + +\end{itemize} + +BPEL [5] is a Web Service-based standard that specifies how a set of +Web services interact to form a larger, composite Web Service. BPEL is +starting to be tested in scientific contexts [10]. While BPEL can +transfer data as XML messages, for very large scale datasets, data +exchange must be handled via separate mechanisms. The BPEL 1.0 +specification does not support dataset +iterations. According to Emmerich et al., an application with +repetitive patterns on a collection of datasets could result in a BPEL +document of 200MB in size, and BPEL is cumbersome if not impossible to +write for computational scientists [10]. Although BPEL can use XML +Schema to describe data types, it does not provide support for mapping +between a logical XML view and arbitrary physical representations. + +DAGMan [6] provides a workflow engine that manages Condor jobs +organized as directed acyclic graphs (DAGs) in which each edge +corresponds to an explicit task precedence. It has no knowledge of +data flow, and in a distributed environment works best with a +higher-level, data-cognizant layer. It is based on static workflow +graphs and lacks dynamic features such as iteration or conditional +execution, although these features are being researched. + +Pegasus [9] is primarily a set of DAG transformers. Pegasus planners +translate a workflow graph into a location specific DAGMan input file, +adding stages for data staging, inter-site transfer and data +registration.
They can prune tasks for files that already exist, +select sites for jobs, and cluster jobs based on various +criteria. Pegasus performs graph transformation with the knowledge of +the whole workflow graph, while in Swift, the structure of a workflow +is constructed and expanded dynamically. + +Swift integrates the CoG Karajan workflow engine. Karajan provides the +libraries and primitives for job scheduling, data transfer, and Grid +job submission; Swift adds support for high-level abstract +specification of large parallel computations, data abstraction, and +workflow restart, and also (via Falkon) fast, reliable execution over +multiple Grid sites. + +\section{Future Work} + \section{Acknowledgements} TODO: authors beyond number 3 go here according to ACM style guide, rather From noreply at svn.ci.uchicago.edu Thu Jan 8 15:12:12 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 8 Jan 2009 15:12:12 -0600 (CST) Subject: [Swift-commit] r2405 - text/hpdc09submission Message-ID: <20090108211212.743DF2281DC@www.ci.uchicago.edu> Author: wilde Date: 2009-01-08 15:12:11 -0600 (Thu, 08 Jan 2009) New Revision: 2405 Modified: text/hpdc09submission/paper.latex Log: Added a fmri example from the FPGS paper. Not likely that this is the one we want to go with - just need something to work with here at the momment. Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-08 13:23:36 UTC (rev 2404) +++ text/hpdc09submission/paper.latex 2009-01-08 21:12:11 UTC (rev 2405) @@ -929,6 +929,104 @@ heterogeneity between sites. Large number of sites; automatic site file selection; and automatic app deployment there. 
+\subsection{fMRI Application Example} + +\includegraphics{IMG_fmridataset} + +\begin{verbatim} +type Study { Group g[]; } +type Run { Volume v[]; } +type Volume { + Image img; + Header hdr; +} +type Group { Subject s[]; } +type AirVector { Air a[]; } +type Subject { + Volume anat; + Run run[]; +} + +(Run resliced) reslice_wf ( Run r) { + Run yR = reorientRun( r , "y", "n" ); + Run roR = reorientRun( yR , "x", "n" ); + Volume std = roR.v[1]; + AirVector roAirVec = + alignlinearRun(std, roR, 12, 1000, 1000, "81 3 3"); + resliced = resliceRun( roR, roAirVec, "-o", "-k"); +} + +(Volume ov) reorient (Volume iv, string direction, string overwrite) { + app { + reorient @filename(iv.hdr) + @filename(ov.hdr) + direction + overwrite; + } +} + +(Run or) reorientRun (Run ir, string direction, string overwrite) { + foreach Volume iv, i in ir.v { + or.v[i] = reorient (iv, direction, overwrite); + } +} + +\end{verbatim} + +In this example, the logical structure of the fMRI dataset shown in +Figure 1 can be represented by the SwiftScript type declarations in +lines 1-6 in Figure 2. Here, Study is declared as containing an array +of Group, which in turn contains an array of Subject, etc. Similarly, +an fMRI Run is a series of brain scans called volumes, with a Volume +containing a 3D image of a volumetric slice of a brain image, +represented by an Image (voxels) and a Header (scanner metadata). An +Air is a parameter file for spatial adjustment, and an AirVector is a +set of such parameter files. Datasets are operated on by procedures, +which take typed data described by XDTM as input, perform computations +on those data, and produce data described by XDTM as output. The +procedure reslice\_wf defines a compound procedure, which may comprise +a series of procedure calls, using variables or datasets to establish +data dependencies. Such procedures can themselves be called by other +procedures, thus defining a potentially large and complex execution +graph. 
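The nesting expressed by the type declarations above can be made explicit in a general-purpose language. The following is a minimal Python sketch, not SwiftScript and not part of Swift: `Image` and `Header` are modelled as plain file-name strings by assumption, and the helpers `files_in_run` and `make_run`, together with the file names they produce, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Volume:
    img: str  # file name of the image (voxels); a plain string by assumption
    hdr: str  # file name of the header (scanner metadata)


@dataclass
class Run:
    v: List[Volume] = field(default_factory=list)


@dataclass
class Subject:
    anat: Volume
    run: List[Run] = field(default_factory=list)


@dataclass
class Group:
    s: List[Subject] = field(default_factory=list)


@dataclass
class Study:
    g: List[Group] = field(default_factory=list)


def files_in_run(run: Run) -> List[str]:
    """Hypothetical helper: list every physical file behind a logical Run."""
    return [name for vol in run.v for name in (vol.img, vol.hdr)]


def make_run(n_volumes: int) -> Run:
    """Build a Run with made-up file names, for illustration only."""
    return Run([Volume(f"vol{i:04d}.img", f"vol{i:04d}.hdr")
                for i in range(n_volumes)])
```

Because each volume is backed by two files, `files_in_run(make_run(n))` has `2 * n` entries; the logical structure hides that detail from the caller, which is the role the mapping layer plays in the text.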
+ +In the example, reslice\_wf essentially defines a simple four-step +pipeline computation, using variables and/or datasets to establish +data dependencies. It applies reorientRun to a run first in the y axis +and then in the x axis, and then aligns each image in the resulting +run with the first image. The program alignlinear determines how to +spatially adjust an image to match a reference image, and produces an +air parameter file. The actual alignment is done by the program +reslice. Note that variable yR, being the output of the first step and +the input of the second step, defines the data dependency between +the two steps. The pipeline is illustrated in the center of Figure 2, +while on the right we show the expanded graph for a 20-volume +run. Each volume comprises an image file and a header file, so there +are a total of 40 input files and 40 output files. We can also apply +the same procedure to a run containing hundreds or thousands of +volumes. + +In this example we show the details of the procedure reorientRun, +which is also a compound procedure. Note the typed input arguments (to +the right of the procedure name) and output argument (to the +left). The foreach statement defines an iteration over the input run +ir and applies the procedure reorient (which rotates a brain image +along a certain axis) to each volume in the run to produce a +reoriented run or. Because the multiple calls to reorient operate on +independent data elements, they can proceed in parallel. The +procedure reorient in this example is an atomic procedure, which +specifies the interface for calling an executable program. This +procedure has typed input parameters iv, direction and overwrite and +one output ov. The body of this particular procedure specifies that it +invokes a program (conveniently, also called reorient) that will be +dynamically mapped to a binary executable. (This executable will +execute at an execution site chosen by the Swift runtime system.)
The +body also specifies how input parameters map to command line +arguments. The notation @filename is a built-in mapping function that +maps a logical data structure to a physical file name. In this case, +it extracts the file name of input header and output header, which are +then put in the command line to invoke the reorient program. + \section{Usage Experience} \subsection{Use on large numbers of sites in the Open Science Grid} From noreply at svn.ci.uchicago.edu Fri Jan 9 02:06:05 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 02:06:05 -0600 (CST) Subject: [Swift-commit] r2406 - text/hpdc09submission Message-ID: <20090109080605.413712281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 02:06:03 -0600 (Fri, 09 Jan 2009) New Revision: 2406 Added: text/hpdc09submission/omxFigure.jpg Modified: text/hpdc09submission/paper.latex Log: import skenny's summary of SEM app Added: text/hpdc09submission/omxFigure.jpg =================================================================== (Binary files differ) Property changes on: text/hpdc09submission/omxFigure.jpg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-08 21:12:11 UTC (rev 2405) +++ text/hpdc09submission/paper.latex 2009-01-09 08:06:03 UTC (rev 2406) @@ -1027,6 +1027,63 @@ it extracts the file name of input header and output header, which are then put in the command line to invoke the reorient program. +\subsection{Structural Equation Modeling using OpenMx} + +OpenMx is an R library designed for structural equation modeling (SEM), +a technique currently used in the neuroimaging field to examine +connectivity between brain areas. 
+ +Structural Equation Modeling is greatly enhanced when coupled with +Grid resources and a workflow management system. Traditionally, +structural equation models have been derived from anatomical models +based on primate brains out of necessity. It was infeasible to test +models outside of the hypothetical anatomical model space due to +restrictions in resources. In the current infrastructure we implement a +scriptable, high-level means for not only testing but also generating +exploratory models in parallel on large clusters, making the testing of +models outside the (anatomical) hypothesis space a more reasonable goal. +In light of this, we have developed a ``model generator'' to allow a +researcher to test all models within a space of potential connections +without a predefined anatomical model. In the absence of a large-scale +infrastructure this would not be feasible. For example, within the CNARI +submit cluster at Chicago, we have a relatively simple Swift script for +calling OpenMx to generate and process models in parallel. + +\begin{figure}[htbp] +\includegraphics{omxFigure} + +\caption{Schematic of a single OpenMx model containing 4 regions of +interest (I through K) with 5 regression starting values (asymmetric +connections) of weight 0.75 and 4 residual variances (symmetric connections) +of weight 1.0} +\end{figure} + +Using OpenMx's model generator -- a set of functions that create +self-contained structural equation models -- we generated 65,535 R +objects representing all models with anywhere from 1 to 16 connections +of varying weights between 4 pre-selected regions of interest. That is, +a 4x4 matrix represents connections between the four regions, +giving 16 possible connections (connections between the same two +regions but in different directions are tested separately).
We queried +our experiment database for activation values based on the selected +regions of interest, and calculated the covariance of those regions +over 8 time points during the emblem experiment. The covariance of +each generated model was then compared to the covariance matrix of the +observed data to determine the best-fitting model. In other words, the +connection weights (or strength of the relationships between anatomical +regions) can be explored based on the fit of each model. + +modgenproc.swift is used to submit each of the necessary computation +components to TeraGrid's Ranger cluster: (a) the model object, (b) the +covariance matrix derived from the database, and (c) the R script which +makes the call to OpenMx. Once the job is assigned to a node, OpenMx +estimates weight parameters for each connection within the given model +that result in a model covariance closest to the observed covariance of +the data. Each of these compute jobs returns its solution model object +as well as a file containing the minimum value achieved from that model. +Processing these models on Ranger took under 45 minutes.
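The model count quoted in this subsection is consistent with counting every non-empty subset of the 16 possible connections, that is $2^{16} - 1 = 65{,}535$. That reading is an assumption: the text also mentions varying weights, which this count ignores. The Python sketch below checks the arithmetic; it is not the OpenMx model generator, and the names `CONNECTIONS`, `count_models` and `enumerate_models` are hypothetical.

```python
from itertools import combinations

N_REGIONS = 4
# Every ordered pair of regions, including a region paired with itself:
# the 4 x 4 = 16 cells of the connection matrix.
CONNECTIONS = [(src, dst) for src in range(N_REGIONS) for dst in range(N_REGIONS)]


def count_models(n_connections: int) -> int:
    """Number of non-empty subsets of n_connections possible connections."""
    return 2 ** n_connections - 1


def enumerate_models(connections):
    """Yield each non-empty subset of connections as a tuple."""
    for size in range(1, len(connections) + 1):
        yield from combinations(connections, size)
```

Each tuple yielded by `enumerate_models(CONNECTIONS)` would correspond to one candidate model, and so to one independent job that can be submitted in parallel.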
+ + \section{Usage Experience} \subsection{Use on large numbers of sites in the Open Science Grid} From noreply at svn.ci.uchicago.edu Fri Jan 9 02:22:02 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 02:22:02 -0600 (CST) Subject: [Swift-commit] r2407 - text/hpdc09submission Message-ID: <20090109082202.8D94A2281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 02:22:01 -0600 (Fri, 09 Jan 2009) New Revision: 2407 Modified: text/hpdc09submission/paper.latex Log: put figures in figure float enviornments Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 08:06:03 UTC (rev 2406) +++ text/hpdc09submission/paper.latex 2009-01-09 08:22:01 UTC (rev 2407) @@ -732,11 +732,14 @@ site is straightforward. The swift runtime must prepare a remote working directory for each job with appropriate input files staged in; then it must execute the program; and then it must stage the output -files back out the submitting system. +files back out the submitting system. 
The site model used by Swift is +shown in figure \ref{FigureSwiftModel}. -The model implemented by Swift is shown in figure E: - +\begin{figure}[htbp] \includegraphics{IMG_9463} +\caption{Swift site model} +\label{FigureSwiftModel} +\end{figure} A site in Swift consists of one or more worker nodes which will execute programs through some \emph{execution provider}, an @@ -931,7 +934,10 @@ \subsection{fMRI Application Example} +\begin{figure}[htbp] \includegraphics{IMG_fmridataset} +\caption{fMRI application} +\end{figure} \begin{verbatim} type Study { Group g[]; } From noreply at svn.ci.uchicago.edu Fri Jan 9 03:19:19 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 03:19:19 -0600 (CST) Subject: [Swift-commit] r2408 - text/hpdc09submission Message-ID: <20090109091919.526DD2281A1@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 03:19:18 -0600 (Fri, 09 Jan 2009) New Revision: 2408 Modified: text/hpdc09submission/paper.latex Log: add skenny coauthor; tidy up import artifacts (citations and non-ascii symbols) in related work section Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 08:22:01 UTC (rev 2407) +++ text/hpdc09submission/paper.latex 2009-01-09 09:19:18 UTC (rev 2408) @@ -27,7 +27,9 @@ \and \alignauthor Michael Wilde \\ \affaddr{University of Chicago Computation Institute}\\ - \affaddr{Argonne National Laboratory} + \affaddr{Argonne National Laboratory} \\ +\alignauthor Sarah Kenny \\ + \affaddr{University of Chicago Computation Institute}\\ } \maketitle @@ -1263,7 +1265,7 @@ \section{Comparison to Other Systems} Coordination languages and systems such as Linda\cite{LINDA}, -Strand\cite{STRAN} and PCN\cite{PCN} [11] allow composition of +Strand\cite{STRAN} and PCN\cite{PCN} allow composition of distributed or parallel components, but usually require the components to be programmed in specific languages
and linked with the systems; where we need to coordinate procedures that may already exist (e.g., @@ -1272,22 +1274,22 @@ coordination primitives for concurrent agents to put and retrieve tuples from a shared data space called tuplespace, which serves as the medium for communication and coordination. Strand and PCN use -single-assignment variables [7] as coordination mechanism. Like Linda, +single-assignment variables\cite{singleassigment} as coordination mechanism. Like Linda, Strand and PCN are data driven in the sense that the action of sending and receiving data are decoupled, and processes execute only when data are available. The Swift system uses similar mechanism called future [16] for workflow evaluation and scheduling. -MapReduce [8] also provides a programming models and a runtime system +MapReduce\cite{MapReduce} also provides a programming models and a runtime system to support the processing of large scale datasets. The two key -functions ?map? and ?reduce? are borrowed from functional language: a +functions \emph{map} and \emph{reduce} are borrowed from functional language: a map function iterates over a set of items, performs a specific operation on each of them and produces a new set of items, where a reduce function performs aggregation on a set of items. The runtime system automatically partitions input data and schedules the execution of programs in a large cluster of commodity machines. The system is made fault tolerant by checking worker nodes periodically and -reassigning failed jobs to other worker nodes. Sawzall [22] is an +reassigning failed jobs to other worker nodes. Sawzall\cite{sawzall} is an interpreted language that builds on MapReduce and separates the filtering and aggregation phases for more concise program specification and better parallelization. @@ -1301,7 +1303,7 @@ \begin{itemize} \item Programming model: MapReduce only supports key-value pairs as input -or output datasets and two types of computation functions ? 
map and +or output datasets and two types of computation functions - map and reduce; where Swift provides a type system and allows the definition of complex data structures and arbitrary computation procedures. @@ -1322,28 +1324,28 @@ \end{itemize} -BPEL [5] is a Web Service-based standard that specifies how a set of +BPEL\cite{BPEL} is a Web Service-based standard that specifies how a set of Web services interact to form a larger, composite Web Service. BPEL is -starting to be tested in scientific contexts [10]. While BPEL can +starting to be tested in scientific contexts\cite{BPELScience}. While BPEL can transfer data as XML messages, for very large scale datasets, data exchange must be handled via separate mechanisms. In BPEL 1.0 specification, it does not have support for dataset iterations. According to Emmerich et al, an application with repetitive patterns on a collection of datasets could result in a BPEL document of 200MB in size, and BPEL is cumbersome if not impossible to -write for computational scientists [10]. Although BPEL can use XML +write for computational scientists\cite{BPEL2}. Although BPEL can use XML Schema to describe data types, it does not provide support for mapping between a logical XML view and arbitrary physical representations. -DAGMan [6] provides a workflow engine that manages Condor jobs +DAGMan\cite{DAGman} provides a workflow engine that manages Condor jobs organized as directed acyclic graphs (DAGs) in which each edge -corresponds to an explicit task precedence. . It has no knowledge of +corresponds to an explicit task precedence. It has no knowledge of data flow, and in distributed environment works best with a higher-level, data-cognizant layer. It is based on static workflow graphs and lacks dynamic features such as iteration or conditional execution, although these features are being researched. -Pegasus [9] is primarily a set of DAG transformers. 
Pegasus planners +Pegasus\cite{Pegasus} is primarily a set of DAG transformers. Pegasus planners translate a workflow graph into a location specific DAGMan input file, adding stages for data staging, inter-site transfer and data registration. They can prune tasks for files that already exist, From noreply at svn.ci.uchicago.edu Fri Jan 9 03:23:33 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 03:23:33 -0600 (CST) Subject: [Swift-commit] r2409 - text/hpdc09submission Message-ID: <20090109092333.2080C2281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 03:23:32 -0600 (Fri, 09 Jan 2009) New Revision: 2409 Modified: text/hpdc09submission/paper.latex Log: capitalise Swift Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 09:19:18 UTC (rev 2408) +++ text/hpdc09submission/paper.latex 2009-01-09 09:23:32 UTC (rev 2409) @@ -125,7 +125,7 @@ model for representing collections of files and directories that are passed to Swift procedures, and a mapping model to convert between the physical representation of data on storage systems and the logical -representation of data structures in the abstract swift programming +representation of data structures in the abstract Swift programming model. Swift itself does not specify how the physical data environment is @@ -144,7 +144,7 @@ Swift is implemented by compiling to a Karajan program, which provides several benefits. A notable benefit visible to users is that of -providers. This enbles the swift execution model to be extended by +providers. This enbles the Swift execution model to be extended by adding new data providers and job execution providers. This is explained in more detail in section X: Swift Implementation. @@ -180,7 +180,7 @@ that an output variable of one function is passed as the input variable to the second function. 
-This dataflow model means that within a swift program, functions are +This dataflow model means that within a Swift program, functions are executed when their data is available - which is not necessarily in source-code order. @@ -731,7 +731,7 @@ \subsection{Executing on a remote site} With the above restrictions, execution of a unix program on a remote -site is straightforward. The swift runtime must prepare a remote +site is straightforward. The Swift runtime must prepare a remote working directory for each job with appropriate input files staged in; then it must execute the program; and then it must stage the output files back out the submitting system. The site model used by Swift is @@ -785,7 +785,7 @@ case execution will be attempted on all sites. In the presence of multiple sites, it is necessary to choose between the available sites. The Swift \emph{site selector} achieves this by maintaining a score for -each site which determines the load that swift will place on that site. +each site which determines the load that Swift will place on that site. As a site is successful in executing jobs, this score will be increased and as the site is unsuccessful, this score will be decreased. In addition to selecting between sites, this mechanism provides some @@ -925,8 +925,8 @@ \section{Example applications} TODO: two or three applications in brief. discuss both the application -behaviour in relation to swift, but underlying grid behaviour in -relation to swift +behaviour in relation to Swift, but underlying grid behaviour in +relation to Swift One app: CNARI + TeraGrid - small jobs (3s), many of them. @@ -1205,15 +1205,15 @@ \cite{LONIPIPELINE} \subsection{The IBM BG/P} - TODO: interesting from swift perspective: + TODO: interesting from Swift perspective: 1.
getting things running at all: use of BG/P for loosely coupled tasks, which is a somewhat untraditional use of such a machine; lack of native LRM that is anywhere near appropriate for that (pset granularity only, and only running one executable) - falkon as solution to this; -decomposition of large machine into multiple swift sites, with 1 pset = -1 swift site - how some of the problems related to running on multisite +decomposition of large machine into multiple Swift sites, with 1 pset = +1 Swift site - how some of the problems related to running on multisite grids are sort-of similar to problems within the BG/P - hierarchical scheduling of jobs and hierarchical management of data. @@ -1258,7 +1258,7 @@ \section{Implementation status} - TODO: list how swift can be downloaded here. describe development group? + TODO: list how Swift can be downloaded here. describe development group? active development group; releases roughly every 2 months. @@ -1398,7 +1398,7 @@ people who have thus far contributed directly to this written paper: me, wilde -people who have thus far contributed to the swift work described here: +people who have thus far contributed to the Swift work described here: Swift core: me, wilde, hategan, milena, yong, ian CNARI: skenny OSG: mats From noreply at svn.ci.uchicago.edu Fri Jan 9 05:06:15 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 05:06:15 -0600 (CST) Subject: [Swift-commit] r2410 - text/hpdc09submission Message-ID: <20090109110615.D29DA2281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 05:06:15 -0600 (Fri, 09 Jan 2009) New Revision: 2410 Modified: text/hpdc09submission/paper.latex Log: procedures are called procedures, not functions.
some rearrangements of text Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 09:23:32 UTC (rev 2409) +++ text/hpdc09submission/paper.latex 2009-01-09 11:06:15 UTC (rev 2410) @@ -97,21 +97,19 @@ and other arguments that are the inputs and outputs of each program invocation. This formal but simple specification of data inputs and outputs enables Swift to provide four critical features not provided -by scripting languages like Perl, Python, Tcl, or the various command-line ``shells'': +by scripting languages like Perl, Python, or the various command-line shells: \begin{itemize} \item It can provide location transparent execution: automatically -selecting a location (i.e. a specific distributed system) for a given -program invocation -\item It can automatically parallelize (and throttle) the execution flow +selecting a location for a given program invocation +\item It can automatically parallelize the execution flow of program invocations, executing invocations that have no data -dependencies in parallel. -\item It can record the provenance of derived data objects (and related -caller-callee information) -\item The Swift execution engine records the progress of a script's -execution, so if interrupted or terminated, it can be restarted from -the point of interruption, without re-executing any work that was -logged as successfully completed. +dependencies in parallel, whilst throttling parallel invocations to a rate +appropriate for each execution location +\item It can record the provenance of derived data objects +\item It can provide reliability through retrying of failed executions during +a run and by logging completed work so that an interrupted script can be +restarted from the point of interruption. \end{itemize} In the rest of this section, we provide an overview of Swift's main @@ -132,7 +130,7 @@ implemented. 
This specification is instead left up to a set of mappers. -\emph{Execution of atomic functions}. Underlying this is an +\emph{Execution of atomic procedures}. Underlying this is an implementation to execute scripts on grid and other platforms, providing built-in site selection, data management and reliability. Swift scripts can be tested on a single local @@ -146,7 +144,7 @@ several benefits. A notable benefit visible to users is that of providers. This enbles the Swift execution model to be extended by adding new data providers and job execution providers. This is -explained in more detail in section X: Swift Implementation. +explained in more detail in section \ref{ExecutingSites}: Executing on a remote site. \emph{Minimalist nature of Swift}. As a scripting language, Swift is intentionally designed to be a sparse, minimal language. Its primary @@ -165,28 +163,26 @@ Swift programs typically contain very little code to manipulate data directly. -\emph{Function composition and Libraries}. Swift programs are composed -starting with ``atomic'' functions, and then higher level functions are -composed as pipelines of sub-functions. The basic structure of a -composite function is a graph of calls to other functions. +\emph{Procedure composition and Libraries}. Swift programs are composed +starting with ``atomic'' procedures, and then higher level procedures are +composed as pipelines of sub-procedures. The basic structure of a +composite procedure is a graph of calls to other procedures. -Recursive function calls [are / are not] supported [TODO: Relevant?] - \emph{Variables, single assignment and data flow}. Swift variables hold primitive values, or references to datasets, which are files or collections of files. Variables are ``single assignment'', which is the -basis for Swift's model of function chaining. Functions are executed -when their variables are all set. 
Functions are chained by specifying -that an output variable of one function is passed as the input -variable to the second function. +basis for Swift's model of procedure chaining (TODO execution ordering?). Procedures are executed +when their input variables are all set. Procedures are chained by specifying +that an output variable of one procedure is passed as the input +variable to the second procedure. -This dataflow model means that within a Swift program, functions are +This dataflow model means that within a Swift program, procedures are executed when their data is available - which is not necessarily in source-code order. -Swift function arguments are lists of such variables, and return lists +Swift procedure arguments are lists of such variables, and return lists of such variables. Swift does not yet have a notion of -libraries. Swift programs execute as if all functions called in the +libraries. Swift programs execute as if all procedures called in the script are present in a single logical source file and are thus passed to the Swift virtual machine all at once. @@ -232,7 +228,7 @@ automatically make remote execution transparent. In a sense, Swift adds to scripting what RPC adds to programming: by -formalizing the inputs and outputs of ``applications-as-functions'', it +formalizing the inputs and outputs of ``applications-as-procedures'', it provides a way to make the remote - and hence parallel - execution of applications fairly transparent. @@ -243,10 +239,6 @@ set of data. Unlike make, in which case the derived product is only produced once, in Swift, derived datasets are repetitively derived. -Could existing programs execute Swift calls through a library -approach? The answer to this is certainly ``yes'', and we examine this -in more detail in section X. - Usage of Swift. Swift is achieving growing use on a variety of science problems. @@ -729,6 +721,7 @@ interesting? 
\subsection{Executing on a remote site} +\label{ExecutingSites} With the above restrictions, execution of a unix program on a remote site is straightforward. The Swift runtime must prepare a remote @@ -1256,6 +1249,11 @@ on what is available now - logprocessing module, as well as mentioning CEDPS\cite{CEDPS} as somewhat promising(?) for the future. +\subsection{Swift as a library} +Could existing programs execute Swift calls through a library +approach? The answer to this is certainly ``yes''. (?) + + \section{Implementation status} TODO: list how Swift can be downloaded here. describe development group? @@ -1272,7 +1270,7 @@ legacy applications), were coded in various programming languages and run in different platforms and architectures. Linda defines a set of coordination primitives for concurrent agents to put and retrieve -tuples from a shared data space called tuplespace, which serves as the +tuples from a shared data space called a tuple space, which serves as the medium for communication and coordination. Strand and PCN use single-assignment variables\cite{singleassigment} as coordination mechanism. Like Linda, Strand and PCN are data driven in the sense that the action of sending From noreply at svn.ci.uchicago.edu Fri Jan 9 06:09:35 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 06:09:35 -0600 (CST) Subject: [Swift-commit] r2411 - text/hpdc09submission Message-ID: <20090109120935.7C4032281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 06:09:34 -0600 (Fri, 09 Jan 2009) New Revision: 2411 Modified: text/hpdc09submission/paper.latex Log: cross references from the introduction into meatier sections; rearrangement of variables / dataset typing intro. Remove table of operator and built in procedures/functions as unnecessarily verbose. 
Move library paragraph to future section Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 11:06:15 UTC (rev 2410) +++ text/hpdc09submission/paper.latex 2009-01-09 12:09:34 UTC (rev 2411) @@ -95,8 +95,9 @@ encapsulates the invocation of ``ordinary programs'' - technically, POSIX \emph{exec()} operations - in a manner that explicitly specifies the files and other arguments that are the inputs and outputs of each program -invocation. This formal but simple specification of data inputs and -outputs enables Swift to provide four critical features not provided +invocation. This formal but simple model (elaborated on in section +\ref{LanguageEnvironment}) +enables Swift to provide four critical features not provided by scripting languages like Perl, Python, or the various command-line shells: \begin{itemize} @@ -117,35 +118,15 @@ sections. [TODO: we will need to adjust between how much to specify here, and how much to state just before each construct is introduced. -\emph{Dataset typing and mapping model}. Swift provides for the high level -specification of collections of data and of how those collections should -be processed by component programs. It provides a structured data-type -model for representing collections of files and directories that are -passed to Swift procedures, and a mapping model to convert between the -physical representation of data on storage systems and the logical -representation of data structures in the abstract Swift programming -model. - -Swift itself does not specify how the physical data environment is -implemented. This specification is instead left up to a set of -mappers. - \emph{Execution of atomic procedures}. Underlying this is an implementation to execute scripts on grid and other platforms, providing built-in site selection, data management and reliability. Swift scripts can be tested on a single local -workstation. 
The same script can then be executed on a cluster, one o +workstation. The same script can then be executed on a cluster, one or more grids of clusters, and on large scale parallel supercomputers -such as the Sun Constellation (ref) or the IBM Blue Gene / P. +such as the Sun Constellation (ref) or the IBM Blue Gene/P. More information +about the implementation is found in section \ref{Execution}. -Swift programs can also span environments: ...explain - -Swift is implemented by compiling to a Karajan program, which provides -several benefits. A notable benefit visible to users is that of -providers. This enbles the Swift execution model to be extended by -adding new data providers and job execution providers. This is -explained in more detail in section \ref{ExecutingSites}: Executing on a remote site. - \emph{Minimalist nature of Swift}. As a scripting language, Swift is intentionally designed to be a sparse, minimal language. Its primary purpose is to coordinate, throttle, and sequence the execution of @@ -163,29 +144,30 @@ Swift programs typically contain very little code to manipulate data directly. -\emph{Procedure composition and Libraries}. Swift programs are composed -starting with ``atomic'' procedures, and then higher level procedures are -composed as pipelines of sub-procedures. The basic structure of a -composite procedure is a graph of calls to other procedures. -\emph{Variables, single assignment and data flow}. Swift variables hold -primitive values, or references to datasets, which are files or -collections of files. Variables are ``single assignment'', which is the -basis for Swift's model of procedure chaining (TODO execution ordering?). Procedures are executed -when their input variables are all set. Procedures are chained by specifying -that an output variable of one procedure is passed as the input -variable to the second procedure. 
-This dataflow model means that within a Swift program, procedures are -executed when their data is available - which is not necessarily in -source-code order. -Swift procedure arguments are lists of such variables, and return lists -of such variables. Swift does not yet have a notion of -libraries. Swift programs execute as if all procedures called in the -script are present in a single logical source file and are thus passed -to the Swift virtual machine all at once. +\emph{Variables, data flow and procedures}. Swift variables hold +primitive values, or collections of files. +Variables are \emph{single assignment}, which is the +basis for Swift's model of procedure chaining. +Procedures are executed when their input parameters are all defined (i.e. +have a value). +Procedures are chained by specifying that an output variable of one +procedure is passed as the input variable to the second procedure. This +dataflow model means that within a script, procedures are not executed +in source-code order; instead they are executed as input data becomes +available. +Variables are given a type, and when they contain collections of files, +are associated with a \emph{mapper} which indicates how the layout of +data files is associated with the logical representation in the Swift +data model. See section \ref{LanguageTypes}. + +Swift programs are composed +starting with \emph{atomic procedures} which execute component programs, +and then higher level procedures are composed as pipelines of sub-procedures. + \subsection{Rationale for creating Swift} \emph{TODO: This section needs much polishing/condensing.} @@ -455,6 +437,10 @@ declared with the \verb|app| keyword, as they invoke other SwiftScript procedures rather than a component program. +The basic structure of a composite procedure is a graph of calls to +other procedures. (TODO: does talking about call graphs make sense in +the context of programming language-style descriptions?) 
+ \begin{verbatim} (file output) process (file input) { file intermediate; @@ -491,6 +477,7 @@ TODO: talk about anonymous mapping somewhere - a mappers section... \subsection{More about types} +\label{LanguageTypes} Each variable and procedure parameter in SwiftScript is strongly typed. Types are used to structure data, to aid in debugging and program @@ -578,66 +565,13 @@ TODO mappings may be to URLs, not only to local filesystem files; and more explicit description of what mapping is. -\subsection{Operators, built-in procedures, functions and mappers} +\subsection{Swift mappers} +Swift contains a number of built-in mappers. A representative sample +of these is listed in table \ref{mappertable}. -SwiftScript has a number of built in operators, procedures, functions -are very briefly described in tables \ref{optable}, \ref{proctable} -and \ref{mappertable}. - \begin{table}[htb] -\begin{tabular}{|c|p{2in}|} -\hline -+ - * & usual mathematical operations \\ -\hline -== != > < & \\ ->= <= & usual comparison operations \\ -\hline -! \&\& || & usual boolean operations \\ -\hline -/ & floating point division \\ -\hline -\%/ \%\% & integer division and mod \\ -\hline -( ) & parentheses \\ -\hline -\end{tabular} -\caption{SwiftScript operators} -\label{optable} -\end{table} - -% ops: @ [ ] . 
- -\begin{table}[htb] \begin{tabular}{|r|p{2in}|} \hline -readData & \\ -readData2 & read data into a structure from a file \\ -\hline -trace & output trace debugging information \\ -\hline -@arg & returns a named commandline argument \\ -\hline -@extractint & reads an integer from a file \\ -\hline -@filename & \\ -@filenames & returns the filename(s) that are mapped to an expression \\ -\hline -@regexp \\ -\hline -@strcat \\ -\hline -@strcut \\ -\hline -@toint \\ -\hline -\end{tabular} -\caption{SwiftScript procedures and functions} -\label{proctable} -\end{table} - -\begin{table}[htb] -\begin{tabular}{|r|p{2in}|} -\hline \verb|single_file_mapper| & maps a single explicitly named file \\ \hline \verb|filesys_mapper| & maps files matching a pattern into an array \\ @@ -651,6 +585,7 @@ \end{table} \subsection{The execution environment for component programs} +\label{LanguageEnvironment} A SwiftScript \verb|app| declaration describes how a component program is invoked. In order to ensure the correctness of the @@ -716,10 +651,13 @@ section (change titles...) \section{Execution} +\label{Execution} +Swift is implemented by compiling to a Karajan program, which provides +several benefits. A notable benefit visible to users is that of +providers. This enables the Swift execution model to be extended by +adding new data providers and job execution providers. This is +explained in more detail in section \ref{ExecutingSites}: Executing on a remote site. -TODO could briefly describe the execution layer here? how much depth is -interesting? - \subsection{Executing on a remote site} \label{ExecutingSites} @@ -1253,7 +1191,19 @@ Could existing programs execute Swift calls through a library approach? The answer to this is certainly ``yes''. (?) +\subsection{Swift library / source code management} + +(TODO benc: unclear what is meant by this paragraph. it was originally in the +introduction, but as it appears to talk about something which does not (yet?)
+exist, then it is probably better being absorbed into the future section) + + Swift does not yet have a notion of libraries. Swift programs execute as +if all procedures called in the script are present in a single logical +source file and are thus passed to the Swift virtual machine all at once. + + + \section{Implementation status} TODO: list how Swift can be downloaded here. describe development group? From noreply at svn.ci.uchicago.edu Fri Jan 9 06:23:13 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 06:23:13 -0600 (CST) Subject: [Swift-commit] r2412 - text/hpdc09submission Message-ID: <20090109122313.487592281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 06:23:12 -0600 (Fri, 09 Jan 2009) New Revision: 2412 Modified: text/hpdc09submission/paper.latex Log: reformat fmri example code to fit in column Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 12:09:34 UTC (rev 2411) +++ text/hpdc09submission/paper.latex 2009-01-09 12:23:12 UTC (rev 2412) @@ -882,32 +882,33 @@ type Group { Subject s[]; } type AirVector { Air a[]; } type Subject { - Volume anat; - Run run[]; + Volume anat; + Run run[]; } (Run resliced) reslice_wf ( Run r) { - Run yR = reorientRun( r , "y", "n" ); - Run roR = reorientRun( yR , "x", "n" ); - Volume std = roR.v[1]; - AirVector roAirVec = - alignlinearRun(std, roR, 12, 1000, 1000, "81 3 3"); - resliced = resliceRun( roR, roAirVec, "-o", "-k"); + Run yR = reorientRun( r , "y", "n" ); + Run roR = reorientRun( yR , "x", "n" ); + Volume std = roR.v[1]; + AirVector roAirVec = alignlinearRun(std, roR, + 12, 1000, 1000, "81 3 3"); + resliced = resliceRun( roR, roAirVec, "-o", + "-k"); } -(Volume ov) reorient (Volume iv, string direction, string overwrite) { - app { - reorient @filename(iv.hdr) - @filename(ov.hdr) - direction - overwrite; - } +app (Volume ov) reorient (Volume iv, + 
string direction, string overwrite) { + + reorient @filename(iv.hdr) @filename(ov.hdr) + direction overwrite; + } -(Run or) reorientRun (Run ir, string direction, string overwrite) { - foreach Volume iv, i in ir.v { - or.v[i] = reorient (iv, direction, overwrite); - } +(Run or) reorientRun (Run ir, string direction, + string overwrite) { + foreach Volume iv, i in ir.v { + or.v[i] = reorient (iv, direction, overwrite); + } } \end{verbatim} From noreply at svn.ci.uchicago.edu Fri Jan 9 06:37:48 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 06:37:48 -0600 (CST) Subject: [Swift-commit] r2413 - text/hpdc09submission Message-ID: <20090109123748.4CEB52281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 06:37:47 -0600 (Fri, 09 Jan 2009) New Revision: 2413 Modified: text/hpdc09submission/paper.latex Log: more crossref; clarify position of falkon and cog coasters Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 12:23:12 UTC (rev 2412) +++ text/hpdc09submission/paper.latex 2009-01-09 12:37:47 UTC (rev 2413) @@ -102,7 +102,8 @@ \begin{itemize} \item It can provide location transparent execution: automatically -selecting a location for a given program invocation +selecting a location for a given program invocation (section +\ref{ExecutingSites}) \item It can automatically parallelize the execution flow of program invocations, executing invocations that have no data dependencies in parallel, whilst throttling parallel invocations to a rate @@ -110,7 +111,7 @@ \item It can record the provenance of derived data objects \item It can provide reliability through retrying of failed executions during a run and by logging completed work so that an interrupted script can be -restarted from the point of interruption. +restarted from the point of interruption. 
(section \ref{ExecutingReliably}) \end{itemize} In the rest of this section, we provide an overview of Swift's main @@ -732,6 +733,7 @@ to load caused by other users). \subsection{Executing reliably} +\label{ExecutingReliably} The functional/dataflow(?) nature of SwiftScript with a clearly defined interface to imperative components, in addition to allowing Swift great @@ -1307,8 +1309,8 @@ libraries and primitives for job scheduling, data transfer, and Grid job submission; Swift adds support for high-level abstract specification of large parallel computations, data abstraction, and -workflow restart, and also (via Falkon) fast, reliable execution over -multiple Grid sites. +workflow restart, reliable execution over multiple Grid sites, and +(via Falkon and CoG coasters) fast job execution. \section{Future Work} From noreply at svn.ci.uchicago.edu Fri Jan 9 07:31:48 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 07:31:48 -0600 (CST) Subject: [Swift-commit] r2414 - trunk/src/org/griphyn/vdl/karajan/lib Message-ID: <20090109133148.85B752281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 07:31:47 -0600 (Fri, 09 Jan 2009) New Revision: 2414 Modified: trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java Log: add assertion that addFutureListener will only be called with a lock on the root of the handle to listen to Modified: trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-09 12:37:47 UTC (rev 2413) +++ trunk/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2009-01-09 13:31:47 UTC (rev 2414) @@ -492,11 +492,13 @@ protected static Future addFutureListener(VariableStack stack, DSHandle handle) throws ExecutionException { + assert Thread.holdsLock(handle.getRoot()); return getFutureWrapperMap(stack).addNodeListener(handle); } protected static FutureIterator 
addFutureListListener(VariableStack stack, DSHandle handle, Map value) throws ExecutionException { + assert Thread.holdsLock(handle.getRoot()); return getFutureWrapperMap(stack).addFutureListListener(handle, value).futureIterator(stack); } From noreply at svn.ci.uchicago.edu Fri Jan 9 08:14:48 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 08:14:48 -0600 (CST) Subject: [Swift-commit] r2415 - text/hpdc09submission Message-ID: <20090109141448.CAD0A2281A0@www.ci.uchicago.edu> Author: wilde Date: 2009-01-09 08:14:46 -0600 (Fri, 09 Jan 2009) New Revision: 2415 Added: text/hpdc09submission/IMG_fmridataset.png Log: Was missing from prior commit. This image may be omited later if we decide not to use this example. Added: text/hpdc09submission/IMG_fmridataset.png =================================================================== (Binary files differ) Property changes on: text/hpdc09submission/IMG_fmridataset.png ___________________________________________________________________ Name: svn:mime-type + application/octet-stream From noreply at svn.ci.uchicago.edu Fri Jan 9 08:52:46 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 08:52:46 -0600 (CST) Subject: [Swift-commit] r2416 - text/hpdc09submission Message-ID: <20090109145246.86B042281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 08:52:45 -0600 (Fri, 09 Jan 2009) New Revision: 2416 Modified: text/hpdc09submission/paper.bib Log: LONI pipeline cite now goes to their website rather than Lorem Ipsum. 
some actual LONI Pipeline paper might be better here, though Modified: text/hpdc09submission/paper.bib =================================================================== --- text/hpdc09submission/paper.bib 2009-01-09 14:14:46 UTC (rev 2415) +++ text/hpdc09submission/paper.bib 2009-01-09 14:52:45 UTC (rev 2416) @@ -120,13 +120,8 @@ pages = {237--247} } -@article{LONIPIPELINE, - title = {{LONIPIPELINE}}, - author = {John Smith and Jane Doe}, - journal = {{Cluster Computing}}, - volume = {5(3)}, - year = 2002, - pages = {237--247} +@misc{LONIPIPELINE, + title="LONI Pipeline http://pipeline.loni.ucla.edu/" } @article{MAPREDUCE, From noreply at svn.ci.uchicago.edu Fri Jan 9 09:15:37 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 09:15:37 -0600 (CST) Subject: [Swift-commit] r2417 - text/hpdc09submission Message-ID: <20090109151537.C908D2281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 09:15:37 -0600 (Fri, 09 Jan 2009) New Revision: 2417 Modified: text/hpdc09submission/IMG_fmridataset.png Log: shrink IMG_fmridataset.png as it was too wide for the column and so messing up float placement Modified: text/hpdc09submission/IMG_fmridataset.png =================================================================== (Binary files differ) From noreply at svn.ci.uchicago.edu Fri Jan 9 09:25:24 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 09:25:24 -0600 (CST) Subject: [Swift-commit] r2418 - text/hpdc09submission Message-ID: <20090109152524.C9BB32281A1@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 09:25:24 -0600 (Fri, 09 Jan 2009) New Revision: 2418 Modified: text/hpdc09submission/paper.latex Log: remove XDTM present-tense references; LONI GUI note; correct an identifier Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 15:15:37 UTC (rev 2417) +++
text/hpdc09submission/paper.latex 2009-01-09 15:25:24 UTC (rev 2418) @@ -924,16 +924,15 @@ represented by an Image (voxels) and a Header (scanner metadata). An Air is a parameter file for spatial adjustment, and an AirVector is a set of such parameter files. Datasets are operated on by procedures, -which take typed data described by XDTM as input, perform computations -on those data, and produce data described by XDTM as output. The -procedure reslice\_wf defines a compound procedure, which may comprise -a series of procedure calls, using variables or datasets to establish -data dependencies. Such procedures can themselves be called by other -procedures, thus defining a potentially large and complex execution -graph. +which take typed data described by a mapper, perform computations +on those data, and produce data to be stored in locations specified +by a mapper. The +procedure reslice\_wf defines a compound procedure, which comprises +a series of procedure calls, using variables to establish +data dependencies. -In the example, reslice\_wf essentially defines a simple four-step -pipeline computation, using variables and/or datasets to establish +In the example, reslice\_wf defines a four-step +pipeline computation, using variables to establish data dependencies. It applies reorientRun to a run first in the y axis and then in the x axis, and then aligns each image in the resulting run with the first image. The program alignlinear determines how to @@ -941,25 +940,22 @@ air parameter file. The actual alignment is done by the program reslice. Note that variable yR, being the output of the first step and the input of the second step, defines the data dependencies between -the two steps. The pipeline is illustrated in the center of Figure 2, -while on the right we show the expanded graph for a 20-volume +the two steps. The pipeline is illustrated in the center of figure \ref{FMRIFigure2}, while in figure \ref{FMRIgraph} we show the expanded graph for a 20-volume run.
Each volume comprises an image file and a header file, so there are a total of 40 input files and 40 output files. We can also apply the same procedure to a run containing hundreds or thousands of volumes. In this example we show the details of the procedure reorientRun, -which is also a compound procedure. Note the typed input arguments (to -the right of the procedure name) and output argument (to the -left). The foreach statement defines an iteration over the input run +which is also a compound procedure. +The foreach statement defines an iteration over the input run ir and applies the procedure reorient (which rotates a brain image along a certain axis) to each volume in the run to produce a reoriented run or. Because the multiple calls to reorient operate on independent data elements, they can proceed in parallel. The -procedure reorient in this example is an atomic procedure, which -specifies the interface to calling an executable program. This -procedure has typed input parameters iv, direction and overwrite and -one output ov. The body of this particular procedure specifies that it +procedure reorient in this example is an atomic procedure. +This procedure has typed input parameters iv, direction and overwrite and +one output ov. The body specifies that it invokes a program (conveniently, also called reorient) that will be dynamically mapped to a binary executable. (This executable will execute at an execution site chosen by the Swift runtime system.) The @@ -995,7 +991,7 @@ \includegraphics{omxFigure} \caption{Schematic of a single OpenMx model containing 4 regions of -interest (I through K) with 5 regression starting values (asymmetric +interest (I through L) with 5 regression starting values (asymmetric connections) of weight 0.75 and 4 residual variances (symmetric connections) of weight 1.0} \end{figure} @@ -1093,7 +1089,7 @@ TODO: what's the conclusion (if any) of this section?
-\section{Swift as a framework for ongoing and experimental work} +\section{Future work} \subsection{Automatic characterisation of site and application behaviour} @@ -1133,12 +1129,20 @@ TODO write about the stuff on provenance db that I did before - that whole document of notes... -\subsection{Workflow GUIs as generators of SwiftScript programs - LONI -Pipeline} +\subsection{GUI workflow design tools} -\cite{LONIPIPELINE} +In contrast to a text-oriented programming language like SwiftScript, +some scientists prefer to design simple programs using GUI design tools. +An example of this is the LONI Pipeline tool\cite{LONIPIPELINE}. Preliminary +investigations suggest that scientific workflows designed with that tool +can be straightforwardly compiled into SwiftScript and thus benefit from +Swift's execution system. + \subsection{The IBM BG/P} +TODO: hopefully Ioan will write some section that is interesting in this +area. + TODO: interesting from Swift perspective: 1. getting things running at all: use of BG/P for loosely coupled @@ -1312,8 +1316,6 @@ workflow restart, reliable execution over multiple Grid sites, and (via Falkon and CoG coasters) fast job execution. 
-\section{Future Work} - \section{Acknowledgements} TODO: authors beyond number 3 go here according to ACM style guide, rather From noreply at svn.ci.uchicago.edu Fri Jan 9 09:55:30 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 09:55:30 -0600 (CST) Subject: [Swift-commit] r2419 - text/hpdc09submission Message-ID: <20090109155530.365B32281DC@www.ci.uchicago.edu> Author: benc Date: 2009-01-09 09:55:29 -0600 (Fri, 09 Jan 2009) New Revision: 2419 Modified: text/hpdc09submission/paper.latex Log: some elaboration on provenance Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 15:25:24 UTC (rev 2418) +++ text/hpdc09submission/paper.latex 2009-01-09 15:55:29 UTC (rev 2419) @@ -1124,11 +1124,21 @@ \subsection{Provenance} +Swift produces log information regarding the provenance of its output files. +In an existing development module, this information can be imported into +relational and XML databases for later querying. + +Providing an efficient query mechanism for such provenance data is an area +of ongoing research; whilst many queries can be easily answered efficiently +by a suitably indexed relational or XML database, the lack of support for +efficient transitive queries can make some common queries involving +either transitivity over time (such as 'find all data derived from input +file X') or over dataset containment (such as 'find all procedures which +took an input containing the file F') expensive to evaluate and awkward +to express. + TODO reference the VDC from VDS\cite{VDS} -TODO write about the stuff on provenance db that I did before - that whole -document of notes... 
- \subsection{GUI workflow design tools} In contrast to a text-oriented programming language like SwiftScript, From noreply at svn.ci.uchicago.edu Fri Jan 9 15:02:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 15:02:26 -0600 (CST) Subject: [Swift-commit] r2420 - text/hpdc09submission Message-ID: <20090109210226.64FC6228187@www.ci.uchicago.edu> Author: hategan Date: 2009-01-09 15:02:25 -0600 (Fri, 09 Jan 2009) New Revision: 2420 Modified: text/hpdc09submission/paper.latex Log: made parts of the abstract more clear Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 15:55:29 UTC (rev 2419) +++ text/hpdc09submission/paper.latex 2009-01-09 21:02:25 UTC (rev 2420) @@ -39,12 +39,12 @@ \begin{abstract} Scientists, engineers and business analysts often pursue their work by -applying application programs to massive collections of file-based data. +applying domain-specific programs to massive collections of file-based data. Distributed and parallel computing resources provide a powerful way to get more of this type of work done faster, but using such resources -greatly increases the complexity of the computing task. +imposes additional complexities. -Swift addresses this problem with a scripting language for composing +Swift addresses these complexities with a scripting language for composing ordinary application programs (serial or parallel) into more powerful distributed, parallelized applications. Applications expressed in Swift are location-independent and automatically parallelized. @@ -54,14 +54,14 @@ unreliable and dynamic aspects of wide-area distributed resources. 
The language provides a high level representation of collections of data -and a specification of how those collections are to be mapped to an +and a specification of how those collections are to be mapped to that abstract representation and processed by component programs. Underlying this is an implementation that executes component applications on grids and other parallel platforms, providing automated site selection, data -management and reliability. +management, and reliability. We present the language, details of the implementation, application -examples, measurements and ongoing research. +examples, measurements, and ongoing research. \end{abstract} From noreply at svn.ci.uchicago.edu Fri Jan 9 16:00:33 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 16:00:33 -0600 (CST) Subject: [Swift-commit] r2421 - text/hpdc09submission Message-ID: <20090109220033.384C82281D8@www.ci.uchicago.edu> Author: wilde Date: 2009-01-09 16:00:32 -0600 (Fri, 09 Jan 2009) New Revision: 2421 Modified: text/hpdc09submission/paper.latex Log: Merged conflicts with Mihael's latest edits on the abstract. Edits to the intro and a few typos elsewhere. Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 21:02:25 UTC (rev 2420) +++ text/hpdc09submission/paper.latex 2009-01-09 22:00:32 UTC (rev 2421) @@ -10,7 +10,6 @@ \title{SwiftScript - a language for loosely coupled distributed parallel scripting \\ draft - contact benc at ci.uchicago.edu} - % ACM styleguide says max 3 authors here, rest in acknowledgements \numberofauthors{4} @@ -44,35 +43,39 @@ get more of this type of work done faster, but using such resources imposes additional complexities. 
-Swift addresses these complexities with a scripting language for composing +Swift reduces these complexities with a scripting language for composing ordinary application programs (serial or parallel) into more powerful -distributed, parallelized applications. Applications expressed in Swift -are location-independent and automatically parallelized. +parallel applications that can be executed on distributed +resources. Applications expressed +in Swift are location-independent and automatically parallelized. -Swift can execute scripts that perform tens to hundreds of thousands of +Swift can execute scripts that perform hundreds of thousands of program invocations on highly parallel resources, and deal with the unreliable and dynamic aspects of wide-area distributed resources. -The language provides a high level representation of collections of data -and a specification of how those collections are to be mapped to that -abstract representation and processed by component programs. Underlying -this is an implementation that executes component applications on grids -and other parallel platforms, providing automated site selection, data -management, and reliability. +The language provides a high level representation of collections of +data and a specification of how those collections are to be mapped to +that abstract representation and processed by component +programs. Underlying this is an implementation that executes the +component programs on grids and other parallel platforms, providing +automated site selection, data management, and reliability. We present the language, details of the implementation, application examples, measurements, and ongoing research. +% TODO: DECIDE: Drop SwiftScript, use Swift throughout to refer to the language? + \end{abstract} \section{Introduction} Swift is a scripting language designed for composing ordinary application programs (serial or parallel) into distributed, -parallelized, applications. 
It can execute scripts that perform tens -to hundreds of thousands of program invocations on highly parallel -resources, and its design is intended to scale to runs of many -millions of invocations. +parallelized, applications. It can execute scripts that perform +hundreds of thousands of program invocations on highly parallel +resources, and its design is expected to scale to handle millions of +invocations and to thus address the needs of ``many-task +computing''\cite{MTC}\cite{FALKONSC08}. Swift's purpose is to enable ``loosely coupled scripting'' in a convenient, powerful fashion. It is intended to serve as a higher @@ -82,13 +85,20 @@ execution of programs (which can themselves be scripts written in any other scripting language, or binary executables). -As a ``parallel scripting -language'', Swift is typically used to specify and execute scientific -``workflows'' - which we define here as the execution of a -series of steps to perform larger domain-specific tasks. We use the -term workflow as defined by (Taylor et. al. 2006). So we often call a -Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up. +Swift's contribution and primary value resides in the fact that it +provides the minimal language constructs needed to coerce the process +of specifiying how applications are glued together at large scale into +a simple compact form of expression, while keeping the language simple +and elegant, and carefully not replacing or overlapping with the tasks +that existing scripting langauges do well. Swift regularizes and +abstracts both the notion of data and process for distributed parallel +execution of application programs. +This paper goes deeper than prior papers\cite{SWIFTSWF08,SWIFTNNN} in +describing the details of the swift language, detailing how it is +implemented, and discussing its role in the toolkit of solutions for +distributed parallel programming. 
+ \subsection{Swift language concepts} The Swift programming model is data-oriented: it @@ -101,17 +111,22 @@ by scripting languages like Perl, Python, or the various command-line shells: \begin{itemize} -\item It can provide location transparent execution: automatically -selecting a location for a given program invocation (section -\ref{ExecutingSites}) -\item It can automatically parallelize the execution flow -of program invocations, executing invocations that have no data -dependencies in parallel, whilst throttling parallel invocations to a rate -appropriate for each execution location -\item It can record the provenance of derived data objects -\item It can provide reliability through retrying of failed executions during -a run and by logging completed work so that an interrupted script can be -restarted from the point of interruption. (section \ref{ExecutingReliably}) + +\item Location transparent execution: automatically selecting a +location for each program invocation (section \ref{ExecutingSites}) + +\item Automatic parallelization of program invocations invoking +programs that have no data dependencies in parallel (section +\ref{Language}) and throttling invocations to a rate appropriate for +each execution location (section \ref{ExecutingSites}). + +\item Reliability through retry (and re-siting) of failed executions +and restart of interrupted scripts from the point of +failure. (section \ref{ExecutingReliably}) + +\item Recording the provenance of derived data objects (section +\ref{Provenance}). + \end{itemize} In the rest of this section, we provide an overview of Swift's main @@ -145,9 +160,6 @@ Swift programs typically contain very little code to manipulate data directly. - - - \emph{Variables, data flow and procedures}. Swift variables hold primitive values, or collections of files. 
Variables are \emph{single assignment}, which is the @@ -233,6 +245,7 @@ \section{The SwiftScript language} +\label{Language} \subsection{Language basics} @@ -336,7 +349,8 @@ such as remote multisite execution and fault tolerance that will be discussed in a later section. -\subsection{Working with arrays} +\subsection{Arrays and Parallel Execution} +\label{ArraysAndForeach} Arrays of values can be declared using the \verb|[]| suffix. An array can be mapped to a collection of files, one element per file, by using @@ -713,8 +727,8 @@ This file may be constructed by hand or mechanically from some pre-existing database (such as a grid's existing discovery system). -The site catalog may contain definitions fo rmultiple sites in which -case exectuion will be attemted on all sites. In the presence of +The site catalog may contain definitions for multiple sites in which +case execution will be attemted on all sites. In the presence of multiple sites, it is necessary to choose between the avalable sites. The Swift \emph{site selector} achivees this by maintaining a score for each site which determines the load that Swift will place on that site. @@ -1123,6 +1137,7 @@ substituted for GridFTP. \subsection{Provenance} +\label{Provenance} Swift produces log information regarding the provenance of its output files. In an existing development module, this information can be imported into @@ -1229,6 +1244,13 @@ \section{Comparison to Other Systems} +As a ``parallel scripting +language'', Swift is typically used to specify and execute scientific +``workflows'' - which we define here as the execution of a +series of steps to perform larger domain-specific tasks. We use the +term workflow as defined by (Taylor et. al. 2006). So we often call a +Swift script a workflow. TODO: Drop this paragraph/concept? Or crisp it up. Perhaps break down the systems that we compare Swift to into a few classes...? 
+
 Coordination languages and systems such as Linda\cite{LINDA},
 Strand\cite{STRAN} and PCN\cite{PCN} allow composition of distributed
 or parallel components, but usually require the components

From noreply at svn.ci.uchicago.edu Fri Jan 9 16:55:45 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Fri, 9 Jan 2009 16:55:45 -0600 (CST)
Subject: [Swift-commit] r2422 - text/hpdc09submission
Message-ID: <20090109225545.A6676228187@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-09 16:55:44 -0600 (Fri, 09 Jan 2009)
New Revision: 2422

Modified:
   text/hpdc09submission/paper.latex
Log:
split item into parallelization and throttling; added load balancing

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-09 22:00:32 UTC (rev 2421)
+++ text/hpdc09submission/paper.latex 2009-01-09 22:55:44 UTC (rev 2422)
@@ -117,13 +117,20 @@

 \item Automatic parallelization of program invocations invoking
 programs that have no data dependencies in parallel (section
-\ref{Language}) and throttling invocations to a rate appropriate for
-each execution location (section \ref{ExecutingSites}).
+\ref{Language})
+\item Management of program invocations such as throttling
+to a rate appropriate for each execution location and mechanism
+(section \ref{ExecutingSites}).
+

 \item Reliability through retry (and re-siting) of failed executions
 and restart of interrupted scripts from the point of
 failure. (section \ref{ExecutingReliably})

+\item Automatic balancing of the workload to available resources
+based on adaptive algorithms that can account for both resource
+performance and reliability.
+
 \item Recording the provenance of derived data objects (section
 \ref{Provenance}).

@@ -1267,7 +1274,7 @@
 are available. The Swift system uses similar mechanism called future [16]
 for workflow evaluation and scheduling.

-MapReduce\cite{MapReduce} also provides a programming models and a runtime system
+MapReduce\cite{MAPREDUCE} also provides a programming models and a runtime system
 to support the processing of large scale datasets. The two key functions
 \emph{map} and \emph{reduce} are borrowed from functional language: a map
 function iterates over a set of items, performs a specific

From noreply at svn.ci.uchicago.edu Fri Jan 9 17:04:30 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Fri, 9 Jan 2009 17:04:30 -0600 (CST)
Subject: [Swift-commit] r2423 - text/hpdc09submission
Message-ID: <20090109230430.50FD62281A0@www.ci.uchicago.edu>

Author: hategan
Date: 2009-01-09 17:04:29 -0600 (Fri, 09 Jan 2009)
New Revision: 2423

Modified:
   text/hpdc09submission/paper.latex
Log:
throttling may be too low level in this sentence

Modified: text/hpdc09submission/paper.latex
===================================================================
--- text/hpdc09submission/paper.latex 2009-01-09 22:55:44 UTC (rev 2422)
+++ text/hpdc09submission/paper.latex 2009-01-09 23:04:29 UTC (rev 2423)
@@ -152,7 +152,7 @@

 \emph{Minimalist nature of Swift}. As a scripting language, Swift is
 intentionally designed to be a sparse, minimal language. Its primary
-purpose is to coordinate, throttle, and sequence the execution of
+purpose is to coordinate and manage the execution of
 other programs. As such, it has only a very limited set of data types,
 operators, and built-in functions.
From noreply at svn.ci.uchicago.edu Fri Jan 9 18:35:32 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 18:35:32 -0600 (CST) Subject: [Swift-commit] r2424 - text/hpdc09submission Message-ID: <20090110003532.86ED1228187@www.ci.uchicago.edu> Author: hategan Date: 2009-01-09 18:35:31 -0600 (Fri, 09 Jan 2009) New Revision: 2424 Modified: text/hpdc09submission/paper.latex Log: rewrote first part of section 2 Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-09 23:04:29 UTC (rev 2423) +++ text/hpdc09submission/paper.latex 2009-01-10 00:35:31 UTC (rev 2424) @@ -256,25 +256,31 @@ \subsection{Language basics} -Programs written in the SwiftScript language are called (Swift) -scripts. A Swift script describes the interface of each component -program, and how those components communicate with each other. -Component programs in a Swift script are coupled together by -descriptions of dataflow between those components. +A Swift script describes data, application components, invocations +of applications components, and the inter-relations (data flow) +between those invocations. Data is represented in a script by strongly-typed single-assignment variables, using a C-like syntax. - A variable may store data of \emph{primitive type} such as an -integer or a string. However, a variable may also be \emph{mapped} -to one or more POSIX-like files, allowing treatment of those files using -the same syntax as other variables. -In that case, the variable declaration is annotated with a -\emph{mapping} describing the file(s) that make up that \emph{dataset}. -For example, this line declares a variable named \verb|photo| with -datatype \verb|image|. It additionally declares that the data for this -dataset is stored in a single file named \verb|shane.jpeg| + Types in Swift can be \emph{atomic} or \emph{composite}. 
An atomic +type can be either a \emph{primitive type} or a \emph{mapped type}. +Swift provides a fixed set of primitive types, such as \emph{integer} or +\emph{string}. A mapped type indicates that the actual data does not +reside in CPU addressable memory (as it would in conventional +programming languages), but in POSIX-like files. Composite types are +further subdivided into \emph{structures} and \emph{arrays}. Structures +are similar in most respects to structure types in other languages. One +array type is associated with every non-array type. Arrays use numeric +indices, but are sparse. We often refer to instances of composites of +mapped types as \emph{datasets}. +Mapped type and composite type variable declarations can be annotated with a +\emph{mapping} descriptor indicating the file(s) that make up that \emph{dataset}. +For example, the following line declares a variable named \verb|photo| with +type \verb|image|. It additionally declares that the data for this +variable is stored in a single file named \verb|shane.jpeg| + \begin{verbatim} image photo <"shane.jpeg">; \end{verbatim} @@ -291,8 +297,9 @@ describes a functional/dataflow style interface to imperative components. 
-For example, a procedure which makes use of the ImageMagick\cite{ImageMagick} -convert command to rotate a supplied image by a specified angle: +For example, the following example lists a procedure which makes use of +the ImageMagick\cite{ImageMagick} convert command to rotate a supplied +image by a specified angle: \begin{verbatim} app (image output) rotate(image input) { From noreply at svn.ci.uchicago.edu Fri Jan 9 19:22:43 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 9 Jan 2009 19:22:43 -0600 (CST) Subject: [Swift-commit] r2425 - text/hpdc09submission Message-ID: <20090110012243.F13BD228187@www.ci.uchicago.edu> Author: wilde Date: 2009-01-09 19:22:41 -0600 (Fri, 09 Jan 2009) New Revision: 2425 Modified: text/hpdc09submission/paper.latex Log: Revised abstract and intro and reduced them close to 1.5 page target. Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-10 00:35:31 UTC (rev 2424) +++ text/hpdc09submission/paper.latex 2009-01-10 01:22:41 UTC (rev 2425) @@ -7,9 +7,11 @@ \bibliographystyle{abbrv} % for ACM SIGS style -\title{SwiftScript - a language for loosely coupled distributed -parallel scripting \\ draft - contact benc at ci.uchicago.edu} +\title{Swift - a language for distributed +parallel scripting} +% draft - contact benc at ci.uchicago.edu + % ACM styleguide says max 3 authors here, rest in acknowledgements \numberofauthors{4} @@ -24,34 +26,33 @@ \alignauthor Mihael Hategan \\ \affaddr{University of Chicago Computation Institute}\\ \and +\alignauthor Sarah Kenny \\ + \affaddr{University of Chicago Computation Institute}\\ \alignauthor Michael Wilde \\ \affaddr{University of Chicago Computation Institute}\\ \affaddr{Argonne National Laboratory} \\ -\alignauthor Sarah Kenny \\ - \affaddr{University of Chicago Computation Institute}\\ } \maketitle -\verb|$Id$| - \begin{abstract} -Scientists, engineers and 
business analysts often pursue their work by -applying domain-specific programs to massive collections of file-based data. -Distributed and parallel computing resources provide a powerful way to -get more of this type of work done faster, but using such resources -imposes additional complexities. +Scientists, engineers and business analysts often work by performing a +massive number of runs of domain-specific programs, typically coupled +``loosely'' by large collections of file-based data. Distributed and +parallel computing resources provide a powerful way to get more of +this type of work done faster, but using such resources imposes +additional complexities. -Swift reduces these complexities with a scripting language for composing -ordinary application programs (serial or parallel) into more powerful -parallel applications that can be executed on distributed -resources. Applications expressed -in Swift are location-independent and automatically parallelized. +Swift reduces these complexities with a scripting language for +composing ordinary application programs (serial or parallel) into more +powerful parallel applications that can be executed on distributed +resources. Applications expressed in Swift are location-independent +and automatically parallelized. -Swift can execute scripts that perform hundreds of thousands of -program invocations on highly parallel resources, and deal with the -unreliable and dynamic aspects of wide-area distributed resources. +Swift can execute scripts that perform tens of thousands of program +invocations on highly parallel resources, and handle the unreliable +and dynamic aspects of wide-area distributed resources. The language provides a high level representation of collections of data and a specification of how those collections are to be mapped to @@ -61,7 +62,8 @@ automated site selection, data management, and reliability. 
We present the language, details of the implementation, application -examples, measurements, and ongoing research. +examples, measurements, and ongoing research, focusing on its +importance as a distributed computing paradigm. % TODO: DECIDE: Drop SwiftScript, use Swift throughout to refer to the language? @@ -71,186 +73,154 @@ Swift is a scripting language designed for composing ordinary application programs (serial or parallel) into distributed, -parallelized, applications. It can execute scripts that perform -hundreds of thousands of program invocations on highly parallel -resources, and its design is expected to scale to handle millions of -invocations and to thus address the needs of ``many-task -computing''\cite{MTC}\cite{FALKONSC08}. +parallelized applications for execution on grids and supercomputers +with tens to hundreds of thousands of processors. It is intended to +serve as a higher level framework for composing parallel pipelines of +other programs and scripts, sitting above (and utilizing) existing +scripting languages and applications. Swift scripts express the +execution of programs to produce datasets using a dataflow-driven +specification. The application programs executed by a Swift script can +be binary executables or can be scripts written in any other scripting +language. -Swift's purpose is to enable ``loosely coupled scripting'' in a -convenient, powerful fashion. It is intended to serve as a higher -level framework for composing parallel pipelines of other programs and -scripts. Much like a Makefile encapsulates the compilation -process, Swift puts a data-driven ``make-like'' wrapper around the -execution of programs (which can themselves be scripts written in any -other scripting language, or binary executables). 
- -Swift's contribution and primary value resides in the fact that it -provides the minimal language constructs needed to coerce the process -of specifiying how applications are glued together at large scale into -a simple compact form of expression, while keeping the language simple -and elegant, and carefully not replacing or overlapping with the tasks -that existing scripting langauges do well. Swift regularizes and +Swift's contribution and primary value is that it provides a simple, +minimal set of language constructs to specifiy how applications are +glued together at large scale in a simple compact form, while keeping +the language simple and elegant, and minimizing any overlap with the +tasks that existing scripting langauges do well. Swift regularizes and abstracts both the notion of data and process for distributed parallel execution of application programs. -This paper goes deeper than prior papers\cite{SWIFTSWF08,SWIFTNNN} in -describing the details of the swift language, detailing how it is -implemented, and discussing its role in the toolkit of solutions for -distributed parallel programming. +This paper goes into greater depth than prior publications +\cite{SWIFTSWF08,SWIFTNNN} in describing the Swift language, how its +implementation handles large-scale and distributed execution +environments, and its contribution to distributed parallel computing. +TODO: Provide a compelling example here, perhaps with +a code segment, of the power of Swift, in a single paragraph. + \subsection{Swift language concepts} -The Swift programming model is data-oriented: it -encapsulates the invocation of ``ordinary programs'' - technically, POSIX -\emph{exec()} operations - in a manner that explicitly specifies the files -and other arguments that are the inputs and outputs of each program -invocation. 
This formal but simple model (elaborated on in section -\ref{LanguageEnvironment}) -enables Swift to provide four critical features not provided -by scripting languages like Perl, Python, or the various command-line shells: +The Swift programming model is data-oriented: it encapsulates the +invocation of ``ordinary programs'' - technically, POSIX \emph{exec()} +operations - in a manner that explicitly specifies the files and other +arguments that are the inputs and outputs of each program +invocation. This formal but simple model (elaborated in section +\ref{LanguageEnvironment}) enables Swift to provide several critical +features not provided by - nor readily implemented in - existing +scripting languages like Perl, Python, or shells: \begin{itemize} \item Location transparent execution: automatically selecting a -location for each program invocation (section \ref{ExecutingSites}) +location for each program invocation and managing diverse execution +environments. Swift scripts can be tested on a single local +workstation. The same script can then be executed on a cluster, one or +more grids of clusters, and on large scale parallel supercomputers +such as the Sun Constellation (ref) or the IBM Blue Gene/P. (section +\ref{ExecutingSites}) -\item Automatic parallelization of program invocations invoking +\item Automatic parallelization of program invocations, invoking programs that have no data dependencies in parallel (section \ref{Language}) -\item Management of program invocations such as throttling -to a rate appropriate for each execution location and mechanism -(section \ref{ExecutingSites}). +\item Automatic balancing work over available resources based +on adaptive algorithms that account for both resource performance +and reliability, and which throttle program invocations at a rate +appropriate for each execution location and mechanism (section +\ref{ExecutingSites}). 
-\item Reliability through retry (and re-siting) of failed executions +\item Reliability through retry and relocation of failed executions and restart of interrupted scripts from the point of failure. (section \ref{ExecutingReliably}) -\item Automatic balancing of the workload to available resources -based on adaptive algorithms that can account for both resource -performance and reliability. +\item Recording the provenance of data objects produced by a Swift +script (section \ref{Provenance}). -\item Recording the provenance of derived data objects (section -\ref{Provenance}). - \end{itemize} -In the rest of this section, we provide an overview of Swift's main -concepts. Each concept is elaborated, with examples, in subsequent -sections. [TODO: we will need to adjust between how much to specify -here, and how much to state just before each construct is introduced. +Swift is intentionally designed to be a sparse, minimal scripting +language. Its sole purpose is to sequence and schedule the execution +of other programs. As such, Swift has only a very limited set of data +types, operators, and built-in functions. The essence of the Swift +language, which makes the benefits above possible, can be summarized +as follows: -\emph{Execution of atomic procedures}. Underlying this is an -implementation to execute scripts on grid and other platforms, -providing built-in site selection, data management and -reliability. Swift scripts can be tested on a single local -workstation. The same script can then be executed on a cluster, one or -more grids of clusters, and on large scale parallel supercomputers -such as the Sun Constellation (ref) or the IBM Blue Gene/P. More information -about the implementation is found in section \ref{Execution}. 
+Swift scripts are written as a set of procedures, composed upwards, +starting with \emph{atomic procedures} which specify the execution of +component programs, and then higher level procedures are composed as +pipelines (or more generally, graphs) of sub-procedures. Atomic +procedures specify the inputs and outputs of application programs in +terms of files and other parameters. Compound procedures are composed +of a graph of calls to atomic and other compound procedures -\emph{Minimalist nature of Swift}. As a scripting language, Swift is -intentionally designed to be a sparse, minimal language. Its primary -purpose is to coordinate and manage the execution of -other programs. As such, it has only a very limited set of data types, -operators, and built-in functions. +Swift variables hold either primitive values, files, or collections of +files. Atomic variables are \emph{single assignment}, which provides +the basis for Swift's model of procedure chaining. Procedures are +executed when their input parameters have all been set from existing +data or prior procedure executions. Procedures are chained by +specifying that an output variable of one procedure is passed as the +input variable to the second procedure. -We believe strongly in, and our experience reinforces, the principle -that Swift - or languages like it - play an important role in the -family of programming languages. Ordinary scripting languages provide -the constructs for manipulating files and typically contain rich -operators, primitives and libraries for large classes of useful -operations such as string processing, math operations, internet and -file operations. +% This dataflow model means that +% Swift procedures are not necessarily executed in source-code order but +% rather when their input data becomes available. -Swift programs typically contain very little code to manipulate data -directly. 
+Variables are declared with a type, and when they contain files +are associated with a \emph{mapper} which indicates how physical +data files are associated with the logical representation of Swift's +data model of variables and collections. -\emph{Variables, data flow and procedures}. Swift variables hold -primitive values, or collections of files. -Variables are \emph{single assignment}, which is the -basis for Swift's model of procedure chaining. -Procedures are executed when their input parameters are all defined (i.e. -have a value). -Procedures are chained by specifying that an output variable of one -procedure is passed as the input variable to the second procedure. This -dataflow model means that within a script, procedures are not executed -in source-code order; instead they are executed as input data becomes -available. - -Variables are given a type, and when they contain collections of files, -are associated with a \emph{mapper} which indicates how the layout of -data files is associated with the logical representation in the Swift -data model. See section \ref{LanguageTypes}. - -Swift programs are composed -starting with \emph{atomic procedures} which execute component programs, -and then higher level procedures are composed as pipelines of sub-procedures. - \subsection{Rationale for creating Swift} -\emph{TODO: This section needs much polishing/condensing.} +Why do we need Swift? Why create yet another scripting language for +the execution of application programs when so many exist? Swift was +developed to create a higher-level language that focuses not on the +details of executing sequences or ``pipelines'' of programs, but +rather on specific issues that arise from scale. -Why do we need Swift? Why create yet another scripting -language for the execution of application programs when so many exist? 
-Swift was developed to create a higher-level language that focuses not -on the details of executing sequences or ``pipelines'' of programs, but -rather on specific issues that arise from scale. These issues, -however, once identified, seem to equally well apply to, and benefit -the execution of, application pipelines that are not large-scale and -not necessarily distributed. +% These issues, +% however, once identified, seem to equally well apply to, and benefit +% the execution of, application pipelines that are not large-scale and +% not necessarily distributed. Our motivation for developing Swift is +% based on the following premises: -Our motivation for developing Swift is based on the following premises: - -Scaling up requires the distribution of execution among -many computers (``resources''), and hence a ``grid'' approach. Even if a -single large parallel resource suffices, users won't always have -access to the same supercomputer cluster: resources are scarce, and users often need or -want to utilize whatever resource happened to be available or economical at the moment -when they need to perform intensive computation. - While many application needs involve the execution of a single large and perhaps message-passing parallel app, many others require the coupling or orchestration of large numbers of application invocations: either many invocations of the same app, or many invocations of sequences and patterns of several apps. In this model, existing apps become like functions in programming, and users typically need to -execute many of them. +execute many of them. Scaling up requires the distribution of such +workloads among many computers (``resources''), and hence a ``grid'' +approach. 
Even if a single large parallel resource suffices, users +won't always have access to the same supercomputer cluster: hence the +need to utilize whatever resource happened to be available or +economical at the moment when they need to perform intensive +computation - without continued reprogramming or adjustment of scripts. -Ousterhout in (Ousterhout 1998) eloquently laid out the rational and -motivation for scripting languages. As the creator of Tcl [ref], he -described here the difference between programming and scripting, and -the place of each in the scheme of applying computers to solving -problems. +% Ousterhout in (Ousterhout 1998) eloquently laid out the rational and +% motivation for scripting languages. As the creator of Tcl [ref], he +% described here the difference between programming and scripting, and +% the place of each in the scheme of applying computers to solving +% problems. What's missing in current scripting languages is sufficient specification and encapsulation of inputs to, and outputs from, a given application, such that an execution environment could -automatically make remote execution transparent. +automatically make remote execution transparent. Without this, +achieving location transparancy and automated parallel execution is +not feasible. Swift adds to scripting what RPC adds to programming: +by formalizing the inputs and outputs of +``applications-as-procedures'', it provides a way to make the remote - +and hence parallel - execution of applications fairly transparent. -In a sense, Swift adds to scripting what RPC adds to programming: by -formalizing the inputs and outputs of ``applications-as-procedures'', it -provides a way to make the remote - and hence parallel - execution of -applications fairly transparent. +TODO: Refine and condense this rationale. -It's useful to draw an analogy here between Swift and make. 
Just as a -``makefile'' - which in a sense is can be considered a script or program -(or certainly a ``recipe'') - is a specification of how to derive an -application program, a Swift script is a recipe for how to produce a -set of data. Unlike make, in which case the derived product is only -produced once, in Swift, derived datasets are repetitively derived. - -Usage of Swift. Swift is achieving growing use on a variety of science -problems. - -... TODO: provide details. - -TODO: In the remainder of this paper, ... we present the language, +In the remainder of this paper, we present the language, details of the implementation, application use-cases and ongoing -research. +research. TODO: refine this sentence. - \section{The SwiftScript language} \label{Language} @@ -1362,6 +1332,21 @@ workflow restart, reliable execution over multiple Grid sites, and (via Falkon and CoG coasters) fast job execution. +\section{Conclusion} + +Our experience reinforces the belief that Swift plays an important +role in the family of programming languages. Ordinary scripting +languages provide the constructs for manipulating files and typically +contain rich operators, primitives, and libraries for large classes of +useful operations such as string, math, internet, and file +operations. In contrast, Swift scripts typically contain very little +code that manipulates data directly. They contain instead the "data +flow recipes" and input/output specifications of each program +invocation such that the location and environment transparency goals +can be implemented automatically by the Swift environment. + +TODO: Polish conclusion - was pasted here from intro and doesnt fit yet. 
+
 \section{Acknowledgements}

 TODO: authors beyond number 3 go here according to ACM style guide, rather
@@ -1449,4 +1434,8 @@
 %\end{thebibliography}

 \bibliography{paper} % for ACM SIGS style
+
+\verb|$Id$|
+
 \end{document}
+

From noreply at svn.ci.uchicago.edu Sat Jan 10 03:53:10 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Sat, 10 Jan 2009 03:53:10 -0600 (CST)
Subject: [Swift-commit] r2426 - in trunk/src/org/griphyn/vdl/karajan: . lib
Message-ID: <20090110095311.9709A2281A1@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-10 03:52:47 -0600 (Sat, 10 Jan 2009)
New Revision: 2426

Modified:
   trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java
   trunk/src/org/griphyn/vdl/karajan/lib/SwiftArg.java
Log:
more locks to make previously committed asserts happy

Modified: trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java 2009-01-10 01:22:41 UTC (rev 2425)
+++ trunk/src/org/griphyn/vdl/karajan/DSHandleFutureWrapper.java 2009-01-10 09:52:47 UTC (rev 2426)
@@ -74,7 +74,7 @@
         listeners = null;
     }

-    public int listenerCount() {
+    public synchronized int listenerCount() {
         if (listeners == null) {
             return 0;
         }
@@ -83,7 +83,7 @@
         }
     }

-    public EventTargetPair[] getListenerEvents() {
+    public synchronized EventTargetPair[] getListenerEvents() {
         if (listeners != null) {
             return (EventTargetPair[]) listeners.toArray(new EventTargetPair[0]);
         }

Modified: trunk/src/org/griphyn/vdl/karajan/lib/SwiftArg.java
===================================================================
--- trunk/src/org/griphyn/vdl/karajan/lib/SwiftArg.java 2009-01-10 01:22:41 UTC (rev 2425)
+++ trunk/src/org/griphyn/vdl/karajan/lib/SwiftArg.java 2009-01-10 09:52:47 UTC (rev 2426)
@@ -36,17 +36,19 @@
             DSHandle handle = (DSHandle) val;
             if (handle.getType().isArray()) {
                 Map value = handle.getArrayValue();
-                if (handle.isClosed()) {
-                    return new PairIterator(value);
+                synchronized(handle.getRoot()) {
+                    if (handle.isClosed()) {
+                        return new PairIterator(value);
+                    }
+                    else {
+                        return VDLFunction.addFutureListListener(stack, handle, value);
+                    }
                 }
-                else {
-                    return VDLFunction.addFutureListListener(stack, handle, value);
-                }
             }
             if (logger.isDebugEnabled()) {
                 logger.debug("SwiftArg.getValue(" + handle + ")");
             }
-            synchronized (handle) {
+            synchronized (handle.getRoot()) {
                 if (!handle.isClosed()) {
                     if (logger.isDebugEnabled()) {
                         logger.debug("Waiting for " + handle);

From noreply at svn.ci.uchicago.edu Sat Jan 10 09:33:00 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Sat, 10 Jan 2009 09:33:00 -0600 (CST)
Subject: [Swift-commit] r2427 - trunk/src/org/griphyn/vdl/mapping/file
Message-ID: <20090110153300.A9A93228198@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-10 09:32:59 -0600 (Sat, 10 Jan 2009)
New Revision: 2427

Modified:
   trunk/src/org/griphyn/vdl/mapping/file/ArrayFileMapper.java
Log:
some debugging statements in ArrayFileMapper

Modified: trunk/src/org/griphyn/vdl/mapping/file/ArrayFileMapper.java
===================================================================
--- trunk/src/org/griphyn/vdl/mapping/file/ArrayFileMapper.java 2009-01-10 09:52:47 UTC (rev 2426)
+++ trunk/src/org/griphyn/vdl/mapping/file/ArrayFileMapper.java 2009-01-10 15:32:59 UTC (rev 2427)
@@ -42,11 +42,13 @@
         // we could typecheck more elegantly here to make sure that
         // we really do have an array of strings as parameter.
DSHandle dn = (DSHandle) PARAM_FILES.getRawValue(this); + assert(dn.isClosed()); DSHandle srcNode = null; try { srcNode = dn.getField(path); } catch(InvalidPathException e) { + logger.error("Invalid path exception "+e+" for path "+path,e); return null; } String returnValue = srcNode.getValue().toString(); From noreply at svn.ci.uchicago.edu Sat Jan 10 11:01:12 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sat, 10 Jan 2009 11:01:12 -0600 (CST) Subject: [Swift-commit] r2428 - trunk/src/org/griphyn/vdl/mapping Message-ID: <20090110170112.70AB7228187@www.ci.uchicago.edu> Author: benc Date: 2009-01-10 11:01:10 -0600 (Sat, 10 Jan 2009) New Revision: 2428 Modified: trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java trunk/src/org/griphyn/vdl/mapping/DSHandle.java trunk/src/org/griphyn/vdl/mapping/ExternalDataNode.java Log: separate getIdentifyingString method for presenting in log Modified: trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java =================================================================== --- trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2009-01-10 15:32:59 UTC (rev 2427) +++ trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2009-01-10 17:01:10 UTC (rev 2428) @@ -97,13 +97,26 @@ return this.value.toString(); } } + return getIdentifyingString(); + } + + public String getIdentifyingString() { + String prefix = this.getClass().getName(); prefix = prefix + " identifier "+this.getIdentifier(); - prefix = prefix + " with no value at dataset="; + prefix = prefix + " type "+getType(); + if(value == null) { + prefix = prefix + " with no value at dataset="; + } else if (value instanceof Throwable) { + prefix = prefix + " containing throwable "+value.getClass(); + } else { + prefix = prefix + " value="+this.value.toString()+" dataset="; + } + prefix = prefix + getDisplayableName(); if (!Path.EMPTY_PATH.equals(getPathFromRoot())) { @@ -118,6 +131,7 @@ } return prefix; + } protected String 
getDisplayableName() { @@ -348,7 +362,7 @@ public synchronized void closeShallow() { this.closed = true; notifyListeners(); - logger.info("closed "+this.getIdentifier()); + logger.info("closed "+this.getIdentifyingString()); // so because its closed, we can dump the contents try { logContent(); @@ -450,7 +464,7 @@ public synchronized void addListener(DSHandleListener listener) { if (logger.isInfoEnabled()) { Exception e = new Exception("To get stack trace"); - logger.info("Adding handle listener \"" + listener + "\" to \"" + this + "\"", e); + logger.info("Adding handle listener \"" + listener + "\" to \"" + getIdentifyingString() + "\"", e); } if (listeners == null) { listeners = new LinkedList(); @@ -468,7 +482,7 @@ DSHandleListener listener = (DSHandleListener) i.next(); i.remove(); if (logger.isInfoEnabled()) { - logger.info("Notifying listener \"" + listener + "\" about \"" + this + "\""); + logger.info("Notifying listener \"" + listener + "\" about \"" + getIdentifyingString() + "\""); } listener.handleClosed(this); } Modified: trunk/src/org/griphyn/vdl/mapping/DSHandle.java =================================================================== --- trunk/src/org/griphyn/vdl/mapping/DSHandle.java 2009-01-10 15:32:59 UTC (rev 2427) +++ trunk/src/org/griphyn/vdl/mapping/DSHandle.java 2009-01-10 17:01:10 UTC (rev 2428) @@ -83,5 +83,7 @@ public String getIdentifier(); + public String getIdentifyingString(); + public boolean isRestartable(); } Modified: trunk/src/org/griphyn/vdl/mapping/ExternalDataNode.java =================================================================== --- trunk/src/org/griphyn/vdl/mapping/ExternalDataNode.java 2009-01-10 15:32:59 UTC (rev 2427) +++ trunk/src/org/griphyn/vdl/mapping/ExternalDataNode.java 2009-01-10 17:01:10 UTC (rev 2428) @@ -115,6 +115,10 @@ return prefix; } + public String getIdentifyingString() { + return toString(); + } + public DSHandle getRoot() { return this; } From noreply at svn.ci.uchicago.edu Sat Jan 10 15:06:19 2009 
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Sat, 10 Jan 2009 15:06:19 -0600 (CST)
Subject: [Swift-commit] r2430 - in trunk: . libexec src/org/griphyn/vdl/karajan/lib
Message-ID: <20090110210619.31AAC228187@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-10 15:06:18 -0600 (Sat, 10 Jan 2009)
New Revision: 2430

Modified:
   trunk/CHANGES.txt
   trunk/libexec/execute-default.k
   trunk/src/org/griphyn/vdl/karajan/lib/RuntimeStats.java
Log:
Console output for individual application invocation start and finish is no longer shown. The progress ticker now appears more often. This should give a better overview of run progress.

Modified: trunk/CHANGES.txt
===================================================================
--- trunk/CHANGES.txt 2009-01-10 20:08:05 UTC (rev 2429)
+++ trunk/CHANGES.txt 2009-01-10 21:06:18 UTC (rev 2430)
@@ -1,3 +1,8 @@
+(01/10/09)
+*** Console output for individual application invocation start and finish
+    is no longer shown. The progress ticker now appears more often.
+    This should give a better overview of run progress.
+ (11/11/08) *** Swift 0.7 built from Swift SVN r2318 and cog SVN r2255 Modified: trunk/libexec/execute-default.k =================================================================== --- trunk/libexec/execute-default.k 2009-01-10 20:08:05 UTC (rev 2429) +++ trunk/libexec/execute-default.k 2009-01-10 21:06:18 UTC (rev 2430) @@ -12,7 +12,6 @@ if( sys:not(done) try( sequential( - echo("{tr} started") log(LOG:INFO, "START thread={#thread} tr={tr}") vdl:setprogress("Selecting site") restartOnError(".*", vdl:configProperty("execution.retries"), @@ -32,12 +31,10 @@ ) mark(restartout, err=false, mapping=false) graphStuff(tr, stagein, stageout, err=false, maybe(args=arguments)) - echo("{tr} completed") log(LOG:INFO, "END_SUCCESS thread={#thread} tr={tr}") vdl:setprogress("Finished successfully") ) catch(".*" - echo("{tr} failed") log(LOG:INFO, "END_FAILURE thread={#thread} tr={tr}") vdl:setprogress("Failed") if( Modified: trunk/src/org/griphyn/vdl/karajan/lib/RuntimeStats.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/lib/RuntimeStats.java 2009-01-10 20:08:05 UTC (rev 2429) +++ trunk/src/org/griphyn/vdl/karajan/lib/RuntimeStats.java 2009-01-10 21:06:18 UTC (rev 2430) @@ -18,8 +18,8 @@ public class RuntimeStats extends FunctionsCollection { public static final Arg PA_STATE = new Arg.Positional("state"); - public static final int MIN_PERIOD_MS=5000; - public static final int MAX_PERIOD_MS=60000; + public static final int MIN_PERIOD_MS=1000; + public static final int MAX_PERIOD_MS=30000; public static final String[] preferredOutputOrder = { "uninitialized", From noreply at svn.ci.uchicago.edu Sat Jan 10 15:16:16 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sat, 10 Jan 2009 15:16:16 -0600 (CST) Subject: [Swift-commit] r2431 - trunk Message-ID: <20090110211617.08FC52281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-10 15:16:16 -0600 (Sat, 10 Jan 2009) New Revision: 2431 
Modified: trunk/CHANGES.txt Log: CHANGE message for recent dataflow work Modified: trunk/CHANGES.txt =================================================================== --- trunk/CHANGES.txt 2009-01-10 21:06:18 UTC (rev 2430) +++ trunk/CHANGES.txt 2009-01-10 21:16:16 UTC (rev 2431) @@ -3,6 +3,13 @@ is no longer shown. The progress ticker now appears more often. This should give a better overview of run progress. +*** More parallelisation of execution, so that some constructs which should + have always work now actually work. For example, mappers can take + values as parameters which are not known "early" in a run, and will + have their initialisation deferred until those parameters are ready. + SwiftScript programs which previously worked should still work; and + some programs which did not previously work should now work. + (11/11/08) *** Swift 0.7 built from Swift SVN r2318 and cog SVN r2255 From noreply at svn.ci.uchicago.edu Sat Jan 10 15:51:18 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sat, 10 Jan 2009 15:51:18 -0600 (CST) Subject: [Swift-commit] r2432 - in trunk: resources src/org/griphyn/vdl/toolkit Message-ID: <20090110215118.8771B228187@www.ci.uchicago.edu> Author: benc Date: 2009-01-10 15:51:17 -0600 (Sat, 10 Jan 2009) New Revision: 2432 Added: trunk/resources/swiftscript.stg Removed: trunk/resources/XDTM.stg Modified: trunk/src/org/griphyn/vdl/toolkit/VDLt2VDLx.java Log: rename XDTM template file to swiftscript Deleted: trunk/resources/XDTM.stg =================================================================== --- trunk/resources/XDTM.stg 2009-01-10 21:16:16 UTC (rev 2431) +++ trunk/resources/XDTM.stg 2009-01-10 21:51:17 UTC (rev 2432) @@ -1,337 +0,0 @@ -group XDTM; - -program(namespaces,targetNS,functions,types,statements,sourcelocation) ::= << - - $else$ - - $namespaces;separator="\n"$>$endif$ - $if(types)$ - - $types;separator="\n"$ - - $endif$ - $functions;separator="\n"$ - $statements;separator="\n"$ - ->> - 
-defaultNS(ns,sourcelocation) ::= << -$if(ns)$"$ns$"$else$ -"http://ci.uchicago.edu/swift/2007/07/swiftscript" -$endif$ ->> - -nsDef(prefix,uri,sourcelocation) ::= << -$if(prefix)$xmlns:$prefix$="$uri$"$else$targetNamespace="$uri$"$endif$ ->> - -typeDef(name,type,members,sourcelocation) ::= << -$if(type)$ - - $name$ - $type$ - - -$else$ -$if(!members)$ - - $name$ - string - - -$else$ - - $name$ - - - $members;separator="\n"$ - - -$endif$ -$endif$ ->> - -memberdefinition(type,name,sourcelocation) ::= << - - $name$ - $type$ - ->> - -variable(type,name, value,sourcelocation) ::= << -$else$>$value$ -$endif$ ->> - -dataset(name,type,mapping, lfn,sourcelocation) ::= << - -$if(lfn)$ - - -$else$ - $mapping$ - -$endif$ - ->> - -mapping(descriptor,params,sourcelocation) ::= << - - $params;separator="\n"$ - -$else$/>$endif$ ->> - -mapParam(name,value,sourcelocation) ::= << -$value$ ->> - -arrayInit(elements,range,sourcelocation) ::= << -$if(range)$ - - $range$ - -$else$ - - - $elements$ - - -$endif$ ->> - -range(from, to, step, sourcelocation) ::= << - - $from$ - $to$ - $if(step)$$step$$endif$ - ->> - -function(name,outputs,inputs,statements,config,sourcelocation) ::= << - - $outputs;separator="\n"$ - $inputs;separator="\n"$ - $statements;separator="\n"$ - $config$ - ->> - -call(func,outputs,inputs,sourcelocation) ::= << - - $outputs;separator="\n"$ - $inputs;separator="\n"$ - ->> - -vardecl(sourcelocation) ::= << -$if (it.type)$ -$\n$ -$endif$ ->> - -returnParam(type,name,bind,sourcelocation) ::= << -$name$ ->> - -actualParam(value,bind,sourcelocation) ::= << -$value$ ->> - -parameter(type,name,outlink,defaultv,sourcelocation) ::= << - -$if(outlink)$ - -$defaultv$ - -$if(outlink)$ - -$else$ - -$endif$ - -$else$ - xsi:nil="true" /> -$endif$ ->> - -app(exec,arguments,stdin,stdout,stderr,sourcelocation) ::= << - - - $exec$ -$if(stdin)$ - $stdin$ -$endif$ -$if(stdout)$ - $stdout$ -$endif$ -$if(stderr)$ - $stderr$ -$endif$ -$if(arguments)$ - $arguments$ -$endif$ - - ->> - 
-functionInvocation(name,args,sourcelocation) ::= << - -$if(args)$ - $args$ -$endif$ - ->> - -mappingExpr(expr,sourcelocation) ::= "$expr$" - -stdin(content,sourcelocation) ::= << -$content$ ->> - -stdout(content,sourcelocation) ::= << -$content$ ->> - -stderr(content,sourcelocation) ::= << -$content$ ->> - -statementList(statements,sourcelocation) ::= << - $statements;separator="\n"$ ->> - -if(cond,body,els,sourcelocation) ::= << - - $cond$ - - $body$ - -$if(els)$ - - $els$ - -$endif$ - ->> - -foreach(var,in,index,body,sourcelocation) ::= << - -$in$ -$body$ - ->> - -switch(cond,cases,sourcelocation) ::= << - - $cond$ - $cases;separator="\n"$ - ->> - -case(value, statements,sourcelocation) ::= << -$if(value)$ - - $value$ - - $statements;separator="\n"$ - - -$else$ - - $statements;separator="\n"$ - -$endif$ ->> - -iterate(cond,body,var,sourcelocation) ::= << - - - $body$ - - $cond$ - ->> - -assign(lhs,rhs,sourcelocation) ::= << - - $lhs$ - $rhs$ - ->> - -arraySubscript(array, subscript, sourcelocation) ::= << - - $array$ - $subscript$ - ->> - -memberAccess(structure,name,sourcelocation) ::= << - - $structure$ - $name$ - ->> - -unaryNegation(exp,sourcelocation) ::= << - - $exp$ - ->> - -cond(op,left,right,sourcelocation) ::= << - - $left$ - $right$ - ->> - -and(left,right,sourcelocation) ::= << - - $left$ - $right$ - ->> - -or(left, right,sourcelocation) ::= << - - $left$ - $right$ - ->> - -not(exp,sourcelocation) ::= "$exp$" - -arith(op, left,right, sourcelocation) ::= << - - $left$ - $right$ - ->> - -paren(exp,sourcelocation) ::= "$exp$" - -type(name,sourcelocation) ::= "$name$" - -variableReference(name,sourcelocation) ::= "$name$" - -iConst(value,sourcelocation) ::= "$value$" - -fConst(value,sourcelocation) ::= "$value$" - -bConst(value,sourcelocation) ::= "$value$" - -sConst(value,sourcelocation) ::= "$value$" - -blank(sourcelocation) ::= "" - Copied: trunk/resources/swiftscript.stg (from rev 2429, trunk/resources/XDTM.stg) 
=================================================================== --- trunk/resources/swiftscript.stg (rev 0) +++ trunk/resources/swiftscript.stg 2009-01-10 21:51:17 UTC (rev 2432) @@ -0,0 +1,337 @@ +group XDTM; + +program(namespaces,targetNS,functions,types,statements,sourcelocation) ::= << + + $else$ + + $namespaces;separator="\n"$>$endif$ + $if(types)$ + + $types;separator="\n"$ + + $endif$ + $functions;separator="\n"$ + $statements;separator="\n"$ + +>> + +defaultNS(ns,sourcelocation) ::= << +$if(ns)$"$ns$"$else$ +"http://ci.uchicago.edu/swift/2007/07/swiftscript" +$endif$ +>> + +nsDef(prefix,uri,sourcelocation) ::= << +$if(prefix)$xmlns:$prefix$="$uri$"$else$targetNamespace="$uri$"$endif$ +>> + +typeDef(name,type,members,sourcelocation) ::= << +$if(type)$ + + $name$ + $type$ + + +$else$ +$if(!members)$ + + $name$ + string + + +$else$ + + $name$ + + + $members;separator="\n"$ + + +$endif$ +$endif$ +>> + +memberdefinition(type,name,sourcelocation) ::= << + + $name$ + $type$ + +>> + +variable(type,name, value,sourcelocation) ::= << +$else$>$value$ +$endif$ +>> + +dataset(name,type,mapping, lfn,sourcelocation) ::= << + +$if(lfn)$ + + +$else$ + $mapping$ + +$endif$ + +>> + +mapping(descriptor,params,sourcelocation) ::= << + + $params;separator="\n"$ + +$else$/>$endif$ +>> + +mapParam(name,value,sourcelocation) ::= << +$value$ +>> + +arrayInit(elements,range,sourcelocation) ::= << +$if(range)$ + + $range$ + +$else$ + + + $elements$ + + +$endif$ +>> + +range(from, to, step, sourcelocation) ::= << + + $from$ + $to$ + $if(step)$$step$$endif$ + +>> + +function(name,outputs,inputs,statements,config,sourcelocation) ::= << + + $outputs;separator="\n"$ + $inputs;separator="\n"$ + $statements;separator="\n"$ + $config$ + +>> + +call(func,outputs,inputs,sourcelocation) ::= << + + $outputs;separator="\n"$ + $inputs;separator="\n"$ + +>> + +vardecl(sourcelocation) ::= << +$if (it.type)$ +$\n$ +$endif$ +>> + +returnParam(type,name,bind,sourcelocation) ::= << +$name$ +>> + 
+actualParam(value,bind,sourcelocation) ::= << +$value$ +>> + +parameter(type,name,outlink,defaultv,sourcelocation) ::= << + +$if(outlink)$ + +$defaultv$ + +$if(outlink)$ + +$else$ + +$endif$ + +$else$ + xsi:nil="true" /> +$endif$ +>> + +app(exec,arguments,stdin,stdout,stderr,sourcelocation) ::= << + + + $exec$ +$if(stdin)$ + $stdin$ +$endif$ +$if(stdout)$ + $stdout$ +$endif$ +$if(stderr)$ + $stderr$ +$endif$ +$if(arguments)$ + $arguments$ +$endif$ + + +>> + +functionInvocation(name,args,sourcelocation) ::= << + +$if(args)$ + $args$ +$endif$ + +>> + +mappingExpr(expr,sourcelocation) ::= "$expr$" + +stdin(content,sourcelocation) ::= << +$content$ +>> + +stdout(content,sourcelocation) ::= << +$content$ +>> + +stderr(content,sourcelocation) ::= << +$content$ +>> + +statementList(statements,sourcelocation) ::= << + $statements;separator="\n"$ +>> + +if(cond,body,els,sourcelocation) ::= << + + $cond$ + + $body$ + +$if(els)$ + + $els$ + +$endif$ + +>> + +foreach(var,in,index,body,sourcelocation) ::= << + +$in$ +$body$ + +>> + +switch(cond,cases,sourcelocation) ::= << + + $cond$ + $cases;separator="\n"$ + +>> + +case(value, statements,sourcelocation) ::= << +$if(value)$ + + $value$ + + $statements;separator="\n"$ + + +$else$ + + $statements;separator="\n"$ + +$endif$ +>> + +iterate(cond,body,var,sourcelocation) ::= << + + + $body$ + + $cond$ + +>> + +assign(lhs,rhs,sourcelocation) ::= << + + $lhs$ + $rhs$ + +>> + +arraySubscript(array, subscript, sourcelocation) ::= << + + $array$ + $subscript$ + +>> + +memberAccess(structure,name,sourcelocation) ::= << + + $structure$ + $name$ + +>> + +unaryNegation(exp,sourcelocation) ::= << + + $exp$ + +>> + +cond(op,left,right,sourcelocation) ::= << + + $left$ + $right$ + +>> + +and(left,right,sourcelocation) ::= << + + $left$ + $right$ + +>> + +or(left, right,sourcelocation) ::= << + + $left$ + $right$ + +>> + +not(exp,sourcelocation) ::= "$exp$" + +arith(op, left,right, sourcelocation) ::= << + + $left$ + $right$ + +>> + 
+paren(exp,sourcelocation) ::= "$exp$" + +type(name,sourcelocation) ::= "$name$" + +variableReference(name,sourcelocation) ::= "$name$" + +iConst(value,sourcelocation) ::= "$value$" + +fConst(value,sourcelocation) ::= "$value$" + +bConst(value,sourcelocation) ::= "$value$" + +sConst(value,sourcelocation) ::= "$value$" + +blank(sourcelocation) ::= "" + Property changes on: trunk/resources/swiftscript.stg ___________________________________________________________________ Name: svn:mergeinfo + Modified: trunk/src/org/griphyn/vdl/toolkit/VDLt2VDLx.java =================================================================== --- trunk/src/org/griphyn/vdl/toolkit/VDLt2VDLx.java 2009-01-10 21:16:16 UTC (rev 2431) +++ trunk/src/org/griphyn/vdl/toolkit/VDLt2VDLx.java 2009-01-10 21:51:17 UTC (rev 2432) @@ -22,7 +22,7 @@ public class VDLt2VDLx { private static final Logger logger = Logger.getLogger(VDLt2VDLx.class); - public static final String DEFAULT_TEMPLATE_FILE_NAME = "XDTM.stg"; + public static final String DEFAULT_TEMPLATE_FILE_NAME = "swiftscript.stg"; public static void main(String[] args) throws Exception { try { From noreply at svn.ci.uchicago.edu Sun Jan 11 05:01:42 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Jan 2009 05:01:42 -0600 (CST) Subject: [Swift-commit] r2433 - trunk Message-ID: <20090111110142.B9D7B2281AD@www.ci.uchicago.edu> Author: benc Date: 2009-01-11 05:01:41 -0600 (Sun, 11 Jan 2009) New Revision: 2433 Modified: trunk/project.properties Log: change module name to swift - build from source now needs to be done with the Swift SVN checked out in a directory called "swift", not "vdsk" as previously Modified: trunk/project.properties =================================================================== --- trunk/project.properties 2009-01-10 21:51:17 UTC (rev 2432) +++ trunk/project.properties 2009-01-11 11:01:41 UTC (rev 2433) @@ -1,4 +1,4 @@ -module.name = vdsk +module.name = swift debug = true long.name = Swift # 
Non-dev version numbers should not be committed to the trunk of From noreply at svn.ci.uchicago.edu Sun Jan 11 05:03:51 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Jan 2009 05:03:51 -0600 (CST) Subject: [Swift-commit] r2434 - nmi-build-test Message-ID: <20090111110351.C8D622281AD@www.ci.uchicago.edu> Author: benc Date: 2009-01-11 05:03:51 -0600 (Sun, 11 Jan 2009) New Revision: 2434 Modified: nmi-build-test/coaster nmi-build-test/debug-x86_64-hang nmi-build-test/main nmi-build-test/with-deef Log: change NMI build/test to use swift module name introduced in r2433 Modified: nmi-build-test/coaster =================================================================== --- nmi-build-test/coaster 2009-01-11 11:01:41 UTC (rev 2433) +++ nmi-build-test/coaster 2009-01-11 11:03:51 UTC (rev 2434) @@ -2,14 +2,14 @@ which ant which javac -mv trunk cog/modules/vdsk -cd cog/modules/vdsk/ +mv trunk cog/modules/swift +cd cog/modules/swift/ echo building... ant redist -Dwith-provider-coaster=true || exit 1 # echo testing... -# export PATH=`pwd`/dist/vdsk-svn/bin:$PATH +# export PATH=`pwd`/dist/swift-svn/bin:$PATH # cd tests/misc # ./coaster.sh || exit 4 Modified: nmi-build-test/debug-x86_64-hang =================================================================== --- nmi-build-test/debug-x86_64-hang 2009-01-11 11:01:41 UTC (rev 2433) +++ nmi-build-test/debug-x86_64-hang 2009-01-11 11:03:51 UTC (rev 2434) @@ -3,19 +3,19 @@ which javac #locate ant #locate javac -mv trunk cog/modules/vdsk +mv trunk cog/modules/swift wget http://www.ci.uchicago.edu/~benc/tmp/more-debug-hanging-wh patch -p0 < more-debug-hanging-wh && echo patch was successful -cd cog/modules/vdsk/ +cd cog/modules/swift/ echo building... ant redist || exit 1 echo testing... 
-export PATH=`pwd`/dist/vdsk-svn/bin:$PATH +export PATH=`pwd`/dist/swift-svn/bin:$PATH cd tests/language-behaviour Modified: nmi-build-test/main =================================================================== --- nmi-build-test/main 2009-01-11 11:01:41 UTC (rev 2433) +++ nmi-build-test/main 2009-01-11 11:03:51 UTC (rev 2434) @@ -3,14 +3,14 @@ which javac #locate ant #locate javac -mv trunk cog/modules/vdsk -cd cog/modules/vdsk/ +mv trunk cog/modules/swift +cd cog/modules/swift/ echo building... ant redist || exit 1 echo testing... -export PATH=`pwd`/dist/vdsk-svn/bin:$PATH +export PATH=`pwd`/dist/swift-svn/bin:$PATH cd tests/language ./run || exit 2 Modified: nmi-build-test/with-deef =================================================================== --- nmi-build-test/with-deef 2009-01-11 11:01:41 UTC (rev 2433) +++ nmi-build-test/with-deef 2009-01-11 11:03:51 UTC (rev 2434) @@ -3,10 +3,10 @@ which javac #locate ant #locate javac -mv trunk cog/modules/vdsk +mv trunk cog/modules/swift mv provider-deef cog/modules -cd cog/modules/vdsk/ +cd cog/modules/swift/ echo building... 
ant -Dwith-provider-deef redist || exit 1 From noreply at svn.ci.uchicago.edu Sun Jan 11 05:15:00 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Jan 2009 05:15:00 -0600 (CST) Subject: [Swift-commit] r2435 - trunk/docs Message-ID: <20090111111500.206072281A2@www.ci.uchicago.edu> Author: benc Date: 2009-01-11 05:14:59 -0600 (Sun, 11 Jan 2009) New Revision: 2435 Modified: trunk/docs/quickstartguide.xml trunk/docs/reallyquickstartguide.xml trunk/docs/userguide.xml Log: update docs to reflect change of cog module name from vdsk to swift Modified: trunk/docs/quickstartguide.xml =================================================================== --- trunk/docs/quickstartguide.xml 2009-01-11 11:03:51 UTC (rev 2434) +++ trunk/docs/quickstartguide.xml 2009-01-11 11:14:59 UTC (rev 2435) @@ -78,7 +78,7 @@ build instructions are available on the Swift downloads page. Once built, the dist/vdsk-svn directory + class="directory">dist/swift-svn directory will contain a self-contained build which can be used in place or moved to a different location. You should then proceed to the configuration section. @@ -93,16 +93,16 @@ Simply unpack the downloaded package (vdsk-<version>.tar.gz) into a + class="file">swift-<version>.tar.gz) into a directory of your choice: > tar vdsk-<version>.tar.gz +class="file">swift-<version>.tar.gz This will create a vdsk-<version> directory + class="directory">swift-<version> directory containing the build. Modified: trunk/docs/reallyquickstartguide.xml =================================================================== --- trunk/docs/reallyquickstartguide.xml 2009-01-11 11:03:51 UTC (rev 2434) +++ trunk/docs/reallyquickstartguide.xml 2009-01-11 11:14:59 UTC (rev 2435) @@ -40,7 +40,7 @@ Unpack it and add the vdsk-xyz/bin directory to your + class="directory">swift-xyz/bin directory to your PATH. @@ -63,7 +63,7 @@ Edit vdsk-xyz/etc/swift.properties. You + class="file">swift-xyz/etc/swift.properties. 
You should add your numeric IP address there (ip.address=x.y.z.w). @@ -77,7 +77,7 @@ are configured for local submission): -cd vdsk-xyz/etc +cd swift-xyz/etc cp sites.xml.example sites.xml cp tc.data.example tc.data Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-11 11:03:51 UTC (rev 2434) +++ trunk/docs/userguide.xml 2009-01-11 11:14:59 UTC (rev 2435) @@ -2730,12 +2730,12 @@ with-provider-deef - build with Falkon provider deef. In order for this option to work, it is necessary to check out the provider-deef code in -the cog/modules directory alongside vdsk: +the cog/modules directory alongside swift: $ cd cog/modules $ svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-deef -$ cd ../vdsk +$ cd ../swift $ ant -Dwith-provider-deef=true redist @@ -2745,12 +2745,12 @@ that provides delays and unreliability for the purposes of testing Swift's fault tolerance mechanisms. In order for this option to work, it is necessary to check out the provider-wonky code in the cog/modules -directory alongside vdsk: +directory alongside swift: $ cd cog/modules $ svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-wonky -$ cd ../vdsk +$ cd ../swift $ ant -Dwith-provider-wonky=true redist From noreply at svn.ci.uchicago.edu Sun Jan 11 05:39:05 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Jan 2009 05:39:05 -0600 (CST) Subject: [Swift-commit] r2436 - provider-wonky/src/org/globus/cog/abstraction/impl/execution/wonky Message-ID: <20090111113905.845E72281A2@www.ci.uchicago.edu> Author: benc Date: 2009-01-11 05:39:04 -0600 (Sun, 11 Jan 2009) New Revision: 2436 Modified: provider-wonky/src/org/globus/cog/abstraction/impl/execution/wonky/JobSubmissionTaskHandler.java Log: option to not pass through unix exit code as Status.FAILED, instead indicating Status.COMPLETED - this more closely models GRAM2 Modified: 
provider-wonky/src/org/globus/cog/abstraction/impl/execution/wonky/JobSubmissionTaskHandler.java =================================================================== --- provider-wonky/src/org/globus/cog/abstraction/impl/execution/wonky/JobSubmissionTaskHandler.java 2009-01-11 11:14:59 UTC (rev 2435) +++ provider-wonky/src/org/globus/cog/abstraction/impl/execution/wonky/JobSubmissionTaskHandler.java 2009-01-11 11:39:04 UTC (rev 2436) @@ -259,14 +259,20 @@ if (killed) { return; } - if (exitCode == 0) { - if(failDelay("completed")) { - this.task.setStatus(Status.COMPLETED); + + if(siteOptions.contains("nofailonexit")) { + // suppress failures caused by exit code + this.task.setStatus(Status.COMPLETED); + } else { // normal fail behaviour + if (exitCode == 0) { + if(failDelay("completed")) { + this.task.setStatus(Status.COMPLETED); + } else { + this.task.setStatus(Status.FAILED); + } } else { - this.task.setStatus(Status.FAILED); + throw new JobException(exitCode); } - } else { - throw new JobException(exitCode); } } catch (Exception e) { if (killed) { From noreply at svn.ci.uchicago.edu Sun Jan 11 06:32:59 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Jan 2009 06:32:59 -0600 (CST) Subject: [Swift-commit] r2437 - trunk/src/org/griphyn/vdl/karajan/functions Message-ID: <20090111123259.707062281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-11 06:32:58 -0600 (Sun, 11 Jan 2009) New Revision: 2437 Modified: trunk/src/org/griphyn/vdl/karajan/functions/ConfigProperty.java Log: ConfigProperty function can now be given a BoundContact, and will take properties from that host in preference to other config mechanisms. In future commits, this will allow more settings to be made per-site. 
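The lookup order described in the log message above can be sketched as follows. This is a minimal illustration only, not the committed code: plain maps stand in for the per-host properties and for the global configuration, and the class name PropertyLookupSketch, the method lookup, and the values "2" and "5" are all hypothetical. Only the property name execution.retries is taken from the existing Swift scripts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-host properties take precedence over global ones.
// The maps are stand-ins; this does not use the real BoundContact or VDL2Config APIs.
class PropertyLookupSketch {

    // Returns the host-specific value if the host defines the property,
    // otherwise falls back to the global configuration (which may return null).
    static String lookup(String name,
                         Map<String, String> hostProps,
                         Map<String, String> globalProps) {
        if (hostProps != null) {
            String value = hostProps.get(name);
            if (value != null) {
                return value;
            }
        }
        return globalProps.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>();
        global.put("execution.retries", "2"); // illustrative value

        Map<String, String> site = new HashMap<>();
        site.put("execution.retries", "5");   // illustrative per-site override

        // With a host supplied, the per-site value is used.
        System.out.println(lookup("execution.retries", site, global));
        // With no host supplied, the global value is used.
        System.out.println(lookup("execution.retries", null, global));
    }
}
```

Checking the host first keeps existing behaviour unchanged for callers that pass no host, which matches the optional host argument described in the log message.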
Modified: trunk/src/org/griphyn/vdl/karajan/functions/ConfigProperty.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/functions/ConfigProperty.java 2009-01-11 11:39:04 UTC (rev 2436) +++ trunk/src/org/griphyn/vdl/karajan/functions/ConfigProperty.java 2009-01-11 12:32:58 UTC (rev 2437) @@ -11,21 +11,41 @@ import org.globus.cog.karajan.workflow.ExecutionException; import org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction; import org.griphyn.vdl.util.VDL2Config; +import org.globus.cog.karajan.util.BoundContact; +import org.apache.log4j.Logger; public class ConfigProperty extends AbstractFunction { public static final Arg NAME = new Arg.Positional("name"); public static final Arg INSTANCE = new Arg.Optional("instance", Boolean.TRUE); + public static final Arg HOST = new Arg.Optional("host",null); static { - setArguments(ConfigProperty.class, new Arg[] { NAME, INSTANCE }); + setArguments(ConfigProperty.class, new Arg[] { NAME, INSTANCE, HOST }); } public static final String INSTANCE_CONFIG_FILE = "vdl:instanceconfigfile"; public static final String INSTANCE_CONFIG = "vdl:instanceconfig"; + public static final Logger logger = Logger.getLogger(ConfigProperty.class); + public Object function(VariableStack stack) throws ExecutionException { String name = TypeUtil.toString(NAME.getValue(stack)); boolean instance = TypeUtil.toBoolean(INSTANCE.getValue(stack)); + Object host = HOST.getValue(stack); + if(logger.isDebugEnabled()) { + logger.debug("Getting property "+name+" with host "+host); + } + if(host!= null) { + // see if the host has this property defined, and if so + // get its value + BoundContact h = (BoundContact)host; + String prop = (String) h.getProperty(name); + if(prop != null) { + logger.debug("Found property "+name+" in BoundContact"); + return prop; + } + logger.debug("Could not find property "+name+" in BoundContact"); + } return getProperty(name, instance, stack); } From noreply at 
svn.ci.uchicago.edu Mon Jan 12 01:37:34 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 12 Jan 2009 01:37:34 -0600 (CST)
Subject: [Swift-commit] r2438 - log-processing/libexec
Message-ID: <20090112073734.2C925228114@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-12 01:37:32 -0600 (Mon, 12 Jan 2009)
New Revision: 2438

Modified:
   log-processing/libexec/compute-t-inf
Log:
when the last line of a log file does not contain a time stamp (generally because it is a continuation line of a multi-line log entry) then compute-t-inf was not finding the end time of the run. this commit will allow it to find the end time as long as it occurs within the last 500 lines. potentially the entire log file could be searched, but that would increase processing time.

Modified: log-processing/libexec/compute-t-inf
===================================================================
--- log-processing/libexec/compute-t-inf 2009-01-11 12:32:58 UTC (rev 2437)
+++ log-processing/libexec/compute-t-inf 2009-01-12 07:37:32 UTC (rev 2438)
@@ -4,5 +4,5 @@
 # for now, approximates by the last line of the swift log file
-tail -n 1 | iso-to-secs | cut -f 1 -d " "
+tail -n 500 | grep -E '^[0-9]' | tail -n 1 | iso-to-secs | cut -f 1 -d " "

From noreply at svn.ci.uchicago.edu Mon Jan 12 12:04:39 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Mon, 12 Jan 2009 12:04:39 -0600 (CST)
Subject: [Swift-commit] r2439 - trunk/src/org/griphyn/vdl/engine
Message-ID: <20090112180439.41FA22281A2@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-12 12:04:36 -0600 (Mon, 12 Jan 2009)
New Revision: 2439

Modified:
   trunk/src/org/griphyn/vdl/engine/Karajan.java
Log:
correct typo in exception message

Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java
===================================================================
--- trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-12 07:37:32 UTC (rev 2438)
+++ trunk/src/org/griphyn/vdl/engine/Karajan.java
2009-01-12 18:04:36 UTC (rev 2439) @@ -649,7 +649,7 @@ String inType = datatype(inST); if (!inType.substring(inType.length() - 2).equals("[]")) - throw new CompilationException("You can iterate through array atructure only"); + throw new CompilationException("You can iterate through an array structure only"); String varType = inType.substring(0, inType.length() - 2); innerScope.addVariable(foreach.getVar(), varType); foreachST.setAttribute("indexVar", foreach.getIndexVar()); From noreply at svn.ci.uchicago.edu Mon Jan 12 12:15:52 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 12 Jan 2009 12:15:52 -0600 (CST) Subject: [Swift-commit] r2440 - in trunk: src/org/griphyn/vdl/engine tests/language-behaviour Message-ID: <20090112181552.818E72281DA@www.ci.uchicago.edu> Author: benc Date: 2009-01-12 12:15:51 -0600 (Mon, 12 Jan 2009) New Revision: 2440 Added: trunk/tests/language-behaviour/027-single-character-typename.swift trunk/tests/language-behaviour/028-double-character-typename.swift Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java Log: fix bug 164 - some parts of the compiler were assuming that type names were at least 2 characters long Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java =================================================================== --- trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-12 18:04:36 UTC (rev 2439) +++ trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-12 18:15:51 UTC (rev 2440) @@ -367,7 +367,7 @@ } void checkIsTypeDefined(String type) throws CompilationException { - while (type.substring(type.length() - 2).equals("[]")) + while (type.length() > 2 && type.substring(type.length() - 2).equals("[]")) type = type.substring(0, type.length() - 2); if (!type.equals("int") && !type.equals("float") && !type.equals("string") && !type.equals("boolean") && !type.equals("external")) { @@ -648,7 +648,7 @@ foreachST.setAttribute("in", inST); String inType = datatype(inST); - if 
(!inType.substring(inType.length() - 2).equals("[]")) + if (inType.length() < 2 || !inType.substring(inType.length() - 2).equals("[]")) throw new CompilationException("You can iterate through an array structure only"); String varType = inType.substring(0, inType.length() - 2); innerScope.addVariable(foreach.getVar(), varType); Added: trunk/tests/language-behaviour/027-single-character-typename.swift =================================================================== --- trunk/tests/language-behaviour/027-single-character-typename.swift (rev 0) +++ trunk/tests/language-behaviour/027-single-character-typename.swift 2009-01-12 18:15:51 UTC (rev 2440) @@ -0,0 +1,3 @@ +type q; +q i; + Added: trunk/tests/language-behaviour/028-double-character-typename.swift =================================================================== --- trunk/tests/language-behaviour/028-double-character-typename.swift (rev 0) +++ trunk/tests/language-behaviour/028-double-character-typename.swift 2009-01-12 18:15:51 UTC (rev 2440) @@ -0,0 +1,4 @@ +type qq; + +qq i; + From noreply at svn.ci.uchicago.edu Wed Jan 14 07:36:17 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 14 Jan 2009 07:36:17 -0600 (CST) Subject: [Swift-commit] r2441 - trunk/src/org/griphyn/vdl/mapping Message-ID: <20090114133617.40B4C228198@www.ci.uchicago.edu> Author: benc Date: 2009-01-14 07:36:15 -0600 (Wed, 14 Jan 2009) New Revision: 2441 Modified: trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java Log: removes some debugging code that leaked into a commit Modified: trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java =================================================================== --- trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2009-01-12 18:15:51 UTC (rev 2440) +++ trunk/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2009-01-14 13:36:15 UTC (rev 2441) @@ -463,8 +463,7 @@ public synchronized void addListener(DSHandleListener listener) { if 
(logger.isInfoEnabled()) { -Exception e = new Exception("To get stack trace"); - logger.info("Adding handle listener \"" + listener + "\" to \"" + getIdentifyingString() + "\"", e); + logger.info("Adding handle listener \"" + listener + "\" to \"" + getIdentifyingString() + "\""); } if (listeners == null) { listeners = new LinkedList(); From noreply at svn.ci.uchicago.edu Wed Jan 14 08:38:43 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 14 Jan 2009 08:38:43 -0600 (CST) Subject: [Swift-commit] r2442 - in trunk: . libexec src/org/griphyn/vdl/util Message-ID: <20090114143843.C721F228198@www.ci.uchicago.edu> Author: benc Date: 2009-01-14 08:38:43 -0600 (Wed, 14 Jan 2009) New Revision: 2442 Modified: trunk/CHANGES.txt trunk/libexec/vdl-int.k trunk/libexec/wrapper.sh trunk/src/org/griphyn/vdl/util/VDL2Config.java Log: Application success/failure status reporting can now be done using CoG provider status, rather than the previous only choice of using status files on the shared file system. A status.mode parameter has been added to set this. It can be configured either in the swift.properties file, to have effect for all sites, or can be set per-site. Modified: trunk/CHANGES.txt =================================================================== --- trunk/CHANGES.txt 2009-01-14 13:36:15 UTC (rev 2441) +++ trunk/CHANGES.txt 2009-01-14 14:38:43 UTC (rev 2442) @@ -1,3 +1,11 @@ +(01/14/09) +*** Application success/failure status reporting can now be done using + CoG provider status, rather than the previous only choice of + using status files on the shared file system. A status.mode parameter + has been added to set this. It can be configured either in the + swift.properties file, to have effect for all sites, or can be set + per-site. + (01/10/09) *** Console output for individual application invocation start and finish is no longer shown. The progress ticker now appears more often. 
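The CHANGES entry above describes two ways of reporting application status: status files on the shared filesystem, or the job's exit code as seen by the execution provider. A minimal shell sketch of that choice follows. The function names `report_failure` and `report_success` and their argument order are hypothetical, chosen for illustration; they are not the names used in wrapper.sh, and the functions return a code rather than exiting so they can be tried interactively.

```shell
#!/bin/sh
# Illustrative sketch only: report_failure and report_success are
# hypothetical helpers, not part of Swift's wrapper.sh.

# report_failure MODE STATUSDIR ID EXITCODE MESSAGE...
# In "files" mode, write an error marker file and return 0, so that the
# submit side reads the failure from the shared filesystem.
# In any other mode, return the real exit code so that the execution
# provider's job status carries the failure.
report_failure() {
    mode=$1; statusdir=$2; id=$3; ec=$4
    shift 4
    if [ "$mode" = "files" ]; then
        mkdir -p "$statusdir"
        printf '%s\n' "$*" > "$statusdir/${id}-error"
        return 0
    else
        return "$ec"
    fi
}

# report_success MODE STATUSDIR ID
# In "files" mode, touch a success marker; in other modes a zero
# return code is the only signal.
report_success() {
    mode=$1; statusdir=$2; id=$3
    if [ "$mode" = "files" ]; then
        mkdir -p "$statusdir"
        touch "$statusdir/${id}-success"
    fi
    return 0
}
```

In this sketch, "files" mode always costs at least one shared-filesystem write per job, while the other mode touches the status directory not at all, which is the saving the log message alludes to.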
Modified: trunk/libexec/vdl-int.k =================================================================== --- trunk/libexec/vdl-int.k 2009-01-14 13:36:15 UTC (rev 2441) +++ trunk/libexec/vdl-int.k 2009-01-14 14:38:43 UTC (rev 2442) @@ -107,7 +107,12 @@ transfer(srcdir="{vds.home}/libexec/", srcfile="wrapper.sh", destdir=sharedDir, desthost=rhost) transfer(srcdir="{vds.home}/libexec/", srcfile="seq.sh", destdir=sharedDir, desthost=rhost) dir:make(dircat(wfdir, "kickstart"), host=rhost) - dir:make(dircat(wfdir, "status"), host=rhost) + + statusMode := configProperty("status.mode",host=rhost) + if(statusMode == "files" + dir:make(dircat(wfdir, "status"), host=rhost) + ) + dir:make(dircat(wfdir, "info"), host=rhost) wfdir, sharedDir //we send the cleanup data to vdl:main() @@ -374,6 +379,9 @@ jobid := concat(tr, "-", uid) log(LOG:DEBUG, "THREAD_ASSOCIATION jobid={jobid} thread={#thread} host={rhost} replicationGroup={replicationGroup}") + + statusMode := configProperty("status.mode",host=rhost) + vdl:setprogress("Stage in") tmpdir := dircat(concat(wfdir, "/jobs/", jobdir), jobid) @@ -403,6 +411,7 @@ "-if", flatten(infiles(stagein)), "-of", flatten(outfiles(stageout)), "-k", kickstart, + "-status", statusMode "-a", maybe(each(arguments))) directory=wfdir redirect=false @@ -412,8 +421,10 @@ replicationChannel=replicationChannel jobid=jobid ) - - checkJobStatus(rhost, wfdir, jobid, tr, jobdir) + + if(statusMode == "files" + checkJobStatus(rhost, wfdir, jobid, tr, jobdir) + ) log(LOG:DEBUG, "STAGING_OUT jobid={jobid}") Modified: trunk/libexec/wrapper.sh =================================================================== --- trunk/libexec/wrapper.sh 2009-01-14 13:36:15 UTC (rev 2441) +++ trunk/libexec/wrapper.sh 2009-01-14 14:38:43 UTC (rev 2442) @@ -37,12 +37,16 @@ fail() { EC=$1 shift - echo $@ >"$WFDIR/status/$JOBDIR/${ID}-error" + if [ "$STATUSMODE" = "files" ]; then + echo $@ >"$WFDIR/status/$JOBDIR/${ID}-error" + fi log $@ info - #exit $EC - #let vdl-int.k handle the 
issues - exit 0 + if [ "$STATUSMODE" = "files" ]; then + exit 0 + else + exit $EC + fi } checkError() { @@ -115,7 +119,6 @@ logstate "LOG_START" infosection "Wrapper" -mkdir -p $WFDIR/status/$JOBDIR getarg "-e" "$@" EXEC=$VALUE @@ -149,12 +152,20 @@ KICKSTART=$VALUE shift $SHIFTCOUNT +getarg "-status" "$@" +STATUSMODE=$VALUE +shift $SHIFTCOUNT + if [ "$1" == "-a" ]; then shift else fail 254 "Missing arguments (-a option)" fi +if [ "$STATUSMODE" = "files" ]; then + mkdir -p $WFDIR/status/$JOBDIR +fi + if [ "X$SWIFT_JOBDIR_PATH" != "X" ]; then log "Job directory mode is: local copy" DIR=${SWIFT_JOBDIR_PATH}/$JOBDIR/$ID @@ -289,8 +300,15 @@ rm -rf "$DIR" 2>&1 >& "$INFO" checkError 254 "Failed to remove job directory $DIR" -logstate "TOUCH_SUCCESS" -touch status/${JOBDIR}/${ID}-success +if [ "$STATUSMODE" = "files" ]; then + logstate "TOUCH_SUCCESS" + touch status/${JOBDIR}/${ID}-success +fi + logstate "END" closeinfo + +# ensure we exit with a 0 after a successful exection +exit 0 + Modified: trunk/src/org/griphyn/vdl/util/VDL2Config.java =================================================================== --- trunk/src/org/griphyn/vdl/util/VDL2Config.java 2009-01-14 13:36:15 UTC (rev 2441) +++ trunk/src/org/griphyn/vdl/util/VDL2Config.java 2009-01-14 14:38:43 UTC (rev 2442) @@ -81,6 +81,7 @@ put("replication.enabled", "false"); put("replication.min.queue.time", "60"); put("replication.limit", "3"); + put("status.mode", "files"); } private VDL2Config(VDL2Config other) { From noreply at svn.ci.uchicago.edu Wed Jan 14 12:23:47 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 14 Jan 2009 12:23:47 -0600 (CST) Subject: [Swift-commit] r2443 - text/hpdc09submission Message-ID: <20090114182347.E2B002281A2@www.ci.uchicago.edu> Author: wilde Date: 2009-01-14 12:23:47 -0600 (Wed, 14 Jan 2009) New Revision: 2443 Modified: text/hpdc09submission/paper.latex Log: Added a stub of an example for BLAST - just copied Allan's notes from SWFT wiki into 
here. Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-14 14:38:43 UTC (rev 2442) +++ text/hpdc09submission/paper.latex 2009-01-14 18:23:47 UTC (rev 2443) @@ -865,10 +865,55 @@ heterogeneity between sites. Large number of sites; automatic site file selection; and automatic app deployment there. +\subsection{BLAST Application Example} + +The following is notes from the Wiki by Allan: needs much refinement, adding here as a placeholder. + +\begin{verbatim} +type database; +type query; +type output; +type error; + +(output out, error err) blastall(query i, database db) { + app { + blastall "-p" "blastp" "-F" "F" "-d" @filename(db) "-i" +@filename(i) "-v" "300" "-b" "300" "-m8" "-o" @filename(out) +stderr=@filename(err); + } +} + +database pir ; +output out <"test.out">; +query i <"test.in">; +error err <"test.err">; +(out,err) = blastall(i, pir); +\end{verbatim} + +The trick here is that blastall reads takes the prefix name of the database files that it will read (.phr, .seq and .pin files). So i made a dummy file called "UNIPROT_for_blast_14.0.seq" to satisfy the data dependency .
So here is the final list of my files: + +\begin{verbatim} +-rw-r--r-- 1 aespinosa ci-users 0 Nov 15 13:49 UNIPROT_for_blast_14.0.seq +-rw-r--r-- 1 aespinosa ci-users 204106872 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.phr +-rw-r--r-- 1 aespinosa ci-users 23001752 Oct 20 16:50 UNIPROT_for_blast_14.0.seq.00.pin +-rw-r--r-- 1 aespinosa ci-users 999999669 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.00.psq +-rw-r--r-- 1 aespinosa ci-users 233680738 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.phr +-rw-r--r-- 1 aespinosa ci-users 26330312 Oct 20 16:51 UNIPROT_for_blast_14.0.seq.01.pin +-rw-r--r-- 1 aespinosa ci-users 999999864 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.01.psq +-rw-r--r-- 1 aespinosa ci-users 21034886 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.phr +-rw-r--r-- 1 aespinosa ci-users 2370216 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.pin +-rw-r--r-- 1 aespinosa ci-users 103755125 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.02.psq +-rw-r--r-- 1 aespinosa ci-users 208 Oct 20 16:52 UNIPROT_for_blast_14.0.seq.pal +\end{verbatim} + +I looked at the dock6 documentation for OSG. It looks that it recommends to transfer the datafiles to OSG sites manually via globus-url-copy. By my understanding of how swift works, it should be able to transfer my local files to the selected sites. I have yet to try this and will look more on examples in the data management side of Swift. + +Do you know other users who went in this approach? The documentation has only a few examples in managing data. I'll check the swift Wiki later and see what material we have and also post this email/ notes. 
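The dummy-file trick in the notes above (an empty file named after the database prefix, sitting next to the real database parts) could be scripted roughly as follows. This is a sketch under assumptions: `make_db_placeholder` is a hypothetical helper, and the check that at least one `PREFIX.*` file exists is an added safeguard, not something the notes describe.

```shell
#!/bin/sh
# Illustrative sketch only: make_db_placeholder is a hypothetical helper.

# make_db_placeholder PREFIX
# Create an empty file named PREFIX so that a workflow system has a
# concrete file to treat as the "database" input, while the program
# itself reads the real parts named PREFIX.* (for example PREFIX.00.phr).
# Returns 1 without creating anything if no PREFIX.* file exists.
make_db_placeholder() {
    prefix=$1
    found=no
    for part in "$prefix".*; do
        if [ -e "$part" ]; then
            found=yes
            break
        fi
    done
    if [ "$found" != "yes" ]; then
        return 1
    fi
    touch "$prefix"
}
```

Called as `make_db_placeholder UNIPROT_for_blast_14.0.seq` in the directory holding the listed files, this would produce the zero-byte entry shown at the top of the listing.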
+ \subsection{fMRI Application Example} \begin{figure}[htbp] -\includegraphics{IMG_fmridataset} +\includegraphics[scale=0.5]{IMG_fmridataset} \caption{FMRI application} \end{figure} From noreply at svn.ci.uchicago.edu Wed Jan 14 12:29:44 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 14 Jan 2009 12:29:44 -0600 (CST) Subject: [Swift-commit] r2444 - text/hpdc09submission Message-ID: <20090114182944.699402281A2@www.ci.uchicago.edu> Author: wilde Date: 2009-01-14 12:29:43 -0600 (Wed, 14 Jan 2009) New Revision: 2444 Modified: text/hpdc09submission/paper.latex Log: Added stub of example for DOCK (from Zhengiong). Modified: text/hpdc09submission/paper.latex =================================================================== --- text/hpdc09submission/paper.latex 2009-01-14 18:23:47 UTC (rev 2443) +++ text/hpdc09submission/paper.latex 2009-01-14 18:29:43 UTC (rev 2444) @@ -1064,7 +1064,41 @@ as well as a file containing the minimum value achieved from that model. The processing of these models on Ranger was achieved in <45 minutes. 
+\subsection{Molecular Dynamics with DOCK} +\begin{verbatim} +(file t,DockOut tarout) dockcompute (DockIn infile, string targetlist) { + app { + rundock @infile targetlist stdout=@filename(t) @tarout; + } +} + +type params { + string ligandsfile; + string targetlist; +} + +#params pset[] ; +doall(params pset[]) +{ + foreach params,i in pset { + DockIn infile < single_file_mapper; file=@strcat("/home/houzx/dock- +run/databases/KEGG_and_Drugs/",pset[i].ligandsfile)>; + file sout ; + DockOut tout ; +# DockOut tout <"result.tar.gz">; +# sout = dockcompute(infile,pset[i].targetlist); + (sout,tout) = dockcompute(infile,pset[i].targetlist); + + } +} + +params p[]; +p = readdata("paramslist.txt"); +doall(p); +\end{verbatim} + \section{Usage Experience} \subsection{Use on large numbers of sites in the Open Science Grid} From noreply at svn.ci.uchicago.edu Fri Jan 16 13:04:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 16 Jan 2009 13:04:26 -0600 (CST) Subject: [Swift-commit] r2445 - SwiftApps/SIDGrid/config Message-ID: <20090116190426.7256D228154@www.ci.uchicago.edu> Author: skenny Date: 2009-01-16 13:04:25 -0600 (Fri, 16 Jan 2009) New Revision: 2445 Modified: SwiftApps/SIDGrid/config/sites_ranger.xml Log: setting default ranger sites file to use coasters Modified: SwiftApps/SIDGrid/config/sites_ranger.xml =================================================================== --- SwiftApps/SIDGrid/config/sites_ranger.xml 2009-01-14 18:29:43 UTC (rev 2444) +++ SwiftApps/SIDGrid/config/sites_ranger.xml 2009-01-16 19:04:25 UTC (rev 2445) @@ -1,16 +1,18 @@ - + - + 1 - 3 - - + 8 + TG-DBS080005N + + 16 + /work/00926/tg459516/sidgrid_out/{username} From noreply at svn.ci.uchicago.edu Mon Jan 19 10:15:29 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 19 Jan 2009 10:15:29 -0600 (CST) Subject: [Swift-commit] r2446 - in trunk: docs etc Message-ID: <20090119161529.88F7D2281A1@www.ci.uchicago.edu> Author: benc 
Date: 2009-01-19 10:15:28 -0600 (Mon, 19 Jan 2009) New Revision: 2446 Modified: trunk/docs/userguide.xml trunk/etc/swift.properties Log: doc on status.mode property Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-16 19:04:25 UTC (rev 2445) +++ trunk/docs/userguide.xml 2009-01-19 16:15:28 UTC (rev 2446) @@ -1717,7 +1717,32 @@ + + + status.mode + + + + Valid values: files, provider + + + Default value: files + + + +Controls how Swift will communicate the result code of running user programs +from workers to the submit side. In files mode, a file +indicating success or failure will be created on the site shared filesystem. +In provider mode, the execution provider job status will +be used. Notably, GRAM2 does not return job statuses correctly, and so +provider mode will not work with GRAM2. With other +providers, it can be used to reduce the amount of filesystem access compared +to files mode. + + + + Modified: trunk/etc/swift.properties =================================================================== --- trunk/etc/swift.properties 2009-01-16 19:04:25 UTC (rev 2445) +++ trunk/etc/swift.properties 2009-01-19 16:15:28 UTC (rev 2446) @@ -282,3 +282,16 @@ # #ip.address=127.0.0.1 + + +# Controls how Swift will communicate the result code of running user programs +# from workers to the submit side. In files mode, a file +# indicating success or failure will be created on the site shared filesystem. +# In provider mode, the execution provider job status will +# be used. Notably, GRAM2 does not return job statuses correctly, and so +# provider mode will not work with GRAM2. With other +# providers, it can be used to reduce the amount of filesystem access compared +# to files mode. 
+# +# status.mode=files + From noreply at svn.ci.uchicago.edu Mon Jan 19 10:23:58 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 19 Jan 2009 10:23:58 -0600 (CST) Subject: [Swift-commit] r2447 - trunk/docs Message-ID: <20090119162358.E38C12281FB@www.ci.uchicago.edu> Author: benc Date: 2009-01-19 10:23:58 -0600 (Mon, 19 Jan 2009) New Revision: 2447 Modified: trunk/docs/userguide.xml Log: further status.mode documentation Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-19 16:15:28 UTC (rev 2446) +++ trunk/docs/userguide.xml 2009-01-19 16:23:58 UTC (rev 2447) @@ -1738,7 +1738,7 @@ be used. Notably, GRAM2 does not return job statuses correctly, and so provider mode will not work with GRAM2. With other providers, it can be used to reduce the amount of filesystem access compared -to files mode. +to files mode. (since Swift 0.8) @@ -2320,6 +2320,9 @@ delayBase - controls how much a site will be delayed when it performs poorly. With each reduction in a sites score by 1, the delay between execution attempts will increase by a factor of delayBase. + status.mode - allows the status.mode property to be set per-site instead of for an entire run. +See the Swift configuration properties section for more information. +(since Swift 0.8)
swift namespace storagesize limits the From noreply at svn.ci.uchicago.edu Fri Jan 23 03:52:08 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 23 Jan 2009 03:52:08 -0600 (CST) Subject: [Swift-commit] r2448 - www/downloads Message-ID: <20090123095208.C235522819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-23 03:52:06 -0600 (Fri, 23 Jan 2009) New Revision: 2448 Modified: www/downloads/index.php Log: another change of vdsk module name to swift Modified: www/downloads/index.php =================================================================== --- www/downloads/index.php 2009-01-19 16:23:58 UTC (rev 2447) +++ www/downloads/index.php 2009-01-23 09:52:06 UTC (rev 2448) @@ -94,13 +94,13 @@

Checkout Swift:
cd cog/modules
-svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk vdsk
+svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk swift
-Change directory to the vdsk module:
-cd vdsk
+Change directory to the swift module:
+cd swift

From noreply at svn.ci.uchicago.edu Mon Jan 26 06:59:03 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 26 Jan 2009 06:59:03 -0600 (CST) Subject: [Swift-commit] r2449 - trunk/tests/sites Message-ID: <20090126125903.CE6762281E9@www.ci.uchicago.edu> Author: benc Date: 2009-01-26 06:59:01 -0600 (Mon, 26 Jan 2009) New Revision: 2449 Added: trunk/tests/sites/renci-engage-condor.xml Modified: trunk/tests/sites/tc.data Log: add renci engagement site to tests; it uses condor, so the tests involving spaces and fancy things like that fail. Added: trunk/tests/sites/renci-engage-condor.xml =================================================================== --- trunk/tests/sites/renci-engage-condor.xml (rev 0) +++ trunk/tests/sites/renci-engage-condor.xml 2009-01-26 12:59:01 UTC (rev 2449) @@ -0,0 +1,9 @@ + + + + + + /nfs/osg-data/osgedu/benc/swift + + + Modified: trunk/tests/sites/tc.data =================================================================== --- trunk/tests/sites/tc.data 2009-01-23 09:52:06 UTC (rev 2448) +++ trunk/tests/sites/tc.data 2009-01-26 12:59:01 UTC (rev 2449) @@ -71,3 +71,11 @@ localhost grep /bin/grep INSTALLED INTEL32::LINUX null localhost sort /bin/sort INSTALLED INTEL32::LINUX null localhost paste /bin/paste INSTALLED INTEL32::LINUX null +renci-engage echo /bin/echo INSTALLED INTEL32::LINUX null +renci-engage cat /bin/cat INSTALLED INTEL32::LINUX null +renci-engage ls /bin/ls INSTALLED INTEL32::LINUX null +renci-engage wc /bin/wc INSTALLED INTEL32::LINUX null +renci-engage grep /bin/grep INSTALLED INTEL32::LINUX null +renci-engage sort /bin/sort INSTALLED INTEL32::LINUX null +renci-engage paste /bin/paste INSTALLED INTEL32::LINUX null +renci-engage touch /bin/touch INSTALLED INTEL32::LINUX null From noreply at svn.ci.uchicago.edu Mon Jan 26 11:15:32 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 26 Jan 2009 11:15:32 -0600 (CST) Subject: [Swift-commit] r2450 - trunk/docs 
Message-ID: <20090126171532.8689B2281E9@www.ci.uchicago.edu> Author: benc Date: 2009-01-26 11:15:30 -0600 (Mon, 26 Jan 2009) New Revision: 2450 Modified: trunk/docs/userguide.xml Log: missing provider in coaster doc Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-26 12:59:01 UTC (rev 2449) +++ trunk/docs/userguide.xml 2009-01-26 17:15:30 UTC (rev 2450) @@ -2846,7 +2846,7 @@ To use for file transfer, specify a sites.xml filesystem element like this: -<filesystem url="gt2://grid.myhost.org" /> +<filesystem provider="coaster" url="gt2://grid.myhost.org" /> The url parameter should be a pseudo-URI formed with the URI scheme being the name of the provider to use to submit the coaster head job, and the From noreply at svn.ci.uchicago.edu Mon Jan 26 17:34:50 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 26 Jan 2009 17:34:50 -0600 (CST) Subject: [Swift-commit] r2451 - trunk/docs Message-ID: <20090126233450.CE12D22819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-26 17:34:49 -0600 (Mon, 26 Jan 2009) New Revision: 2451 Added: trunk/docs/swift-site-model.fig trunk/docs/swift-site-model.png Modified: trunk/docs/userguide.xml Log: xfig diagram and png export - this is a diagram I made for the hpdc paper that we did not submit. 
the text around it needs reworking to discuss how execution actually happens Added: trunk/docs/swift-site-model.fig =================================================================== --- trunk/docs/swift-site-model.fig (rev 0) +++ trunk/docs/swift-site-model.fig 2009-01-26 23:34:49 UTC (rev 2451) @@ -0,0 +1,53 @@ +#FIG 3.2 Produced by xfig version 3.2.5 +Landscape +Center +Inches +Letter +100.00 +Single +-2 +1200 2 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2100 3450 4425 2025 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2100 3675 4425 3825 +2 4 0 1 0 11 999 -1 20 0.000 0 0 7 0 0 5 + 8850 4800 4200 4800 4200 675 8850 675 8850 4800 +2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5 + 7425 2025 8400 2025 8400 2775 7425 2775 7425 2025 +2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5 + 7425 2850 8400 2850 8400 3600 7425 3600 7425 2850 +2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5 + 7425 3675 8400 3675 8400 4425 7425 4425 7425 3675 +2 2 0 1 0 5 50 -1 20 0.000 0 0 -1 0 0 5 + 7425 1200 8400 1200 8400 1950 7425 1950 7425 1200 +2 2 0 1 0 6 500 -1 20 0.000 0 0 -1 0 0 5 + 525 3000 2025 3000 2025 4050 525 4050 525 3000 +2 2 0 1 0 6 100 -1 20 0.000 0 0 -1 0 0 5 + 4500 1350 6300 1350 6300 2700 4500 2700 4500 1350 +2 2 0 1 0 6 100 -1 20 0.000 0 0 -1 0 0 5 + 4500 3225 6375 3225 6375 4425 4500 4425 4500 3225 +2 4 0 1 0 11 55 -1 20 0.000 0 0 7 0 0 5 + 8850 5850 4200 5850 4200 5100 8850 5100 8850 5850 +2 4 0 1 0 11 55 -1 20 0.000 0 0 7 0 0 5 + 8850 6675 4200 6675 4200 6075 8850 6075 8850 6675 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 2 + 7350 1575 6375 1950 + 0.000 0.000 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 2 + 7350 1725 6450 3975 + 0.000 0.000 +4 0 0 50 -1 0 12 0.0000 4 150 450 750 3300 Swift\001 +4 0 0 50 -1 0 12 0.0000 4 150 1155 750 3555 commandline\001 +4 0 0 50 -1 0 12 0.0000 4 150 465 750 3810 client\001 +4 0 0 50 -1 0 12 0.0000 4 195 1500 4725 2100 shared filesystem\001 +4 0 0 50 -1 0 12 0.0000 4 150 465 4800 3900 LRM\001 +4 0 0 50 -1 0 12 0.0000 4 150 1215 7350 900 Worker nodes\001 +4 0 0 50 -1 0 12 0.0000 4 
150 465 6525 1425 Posix\001 +4 0 0 50 -1 0 12 0.0000 4 150 1065 5325 5475 Another site\001 +4 0 0 50 -1 0 12 0.0000 4 150 1065 5250 6450 Another site\001 +4 0 0 50 -1 0 12 0.0000 4 150 810 2475 2475 remote fs\001 +4 0 0 50 -1 0 12 0.0000 4 105 600 2475 2730 access\001 +4 0 0 50 -1 0 12 0.0000 4 195 1260 2400 4125 job submission\001 +4 0 0 50 -1 0 12 0.0000 4 195 915 2400 4380 eg GRAM\001 +4 0 0 50 -1 0 12 0.0000 4 195 1080 3075 3000 eg. GridFTP\001 Added: trunk/docs/swift-site-model.png =================================================================== (Binary files differ) Property changes on: trunk/docs/swift-site-model.png ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-26 17:15:30 UTC (rev 2450) +++ trunk/docs/userguide.xml 2009-01-26 23:34:49 UTC (rev 2451) @@ -2019,6 +2019,7 @@ make modifications to external databases that causes their output to differ if they are run more than once. +
From noreply at svn.ci.uchicago.edu Mon Jan 26 17:52:29 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 26 Jan 2009 17:52:29 -0600 (CST) Subject: [Swift-commit] r2452 - trunk/docs Message-ID: <20090126235229.64A742281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-26 17:52:27 -0600 (Mon, 26 Jan 2009) New Revision: 2452 Modified: trunk/docs/build-chunked-userguide.sh Log: copy PNGs from parent directory for the chunked user guide Modified: trunk/docs/build-chunked-userguide.sh =================================================================== --- trunk/docs/build-chunked-userguide.sh 2009-01-26 23:34:49 UTC (rev 2451) +++ trunk/docs/build-chunked-userguide.sh 2009-01-26 23:52:27 UTC (rev 2452) @@ -3,6 +3,7 @@ mkdir -p userguide/ || exit 1 cd userguide/ || exit 2 rm -f *.html *.php +cp ../*.png . xsltproc --nonet ../formatting/swiftsh_html_chunked.xsl ../userguide.xml From noreply at svn.ci.uchicago.edu Wed Jan 28 03:04:29 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 03:04:29 -0600 (CST) Subject: [Swift-commit] r2453 - trunk/tests/sites/coaster Message-ID: <20090128090429.D849A22814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 03:04:28 -0600 (Wed, 28 Jan 2009) New Revision: 2453 Added: trunk/tests/sites/coaster/renci-engage-coaster.xml Log: working test for renci engagement site using coasters - needs cog >=r2262 to get coasterInternalIP profile Added: trunk/tests/sites/coaster/renci-engage-coaster.xml =================================================================== --- trunk/tests/sites/coaster/renci-engage-coaster.xml (rev 0) +++ trunk/tests/sites/coaster/renci-engage-coaster.xml 2009-01-28 09:04:28 UTC (rev 2453) @@ -0,0 +1,11 @@ + + + + + + + /nfs/osg-data/osgedu/benc/swift +192.168.1.11 + + + From noreply at svn.ci.uchicago.edu Wed Jan 28 05:14:34 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 05:14:34 -0600 
(CST) Subject: [Swift-commit] r2454 - trunk/docs Message-ID: <20090128111434.EC6652281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 05:14:32 -0600 (Wed, 28 Jan 2009) New Revision: 2454 Modified: trunk/docs/userguide.xml Log: document coaster profile entries introduced in cog r2262 and r2263 Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 09:04:28 UTC (rev 2453) +++ trunk/docs/userguide.xml 2009-01-28 11:14:32 UTC (rev 2454) @@ -2362,7 +2362,21 @@ the number of coaster workers to be run on each node. This profile entry is used by the coaster execution provider. + coasterWorkerMaxwalltime +specifies the maxwalltime to be used when submitting coaster workers. This +profile entry is used by the coaster execution +provider. If this entry is not specified, the coaster provider +will compute a maxwalltime based on the maxwalltime of jobs submitted. (since 0.9) + + coasterInternalIP +specifies the internal address of the coaster head node, to be used by +coaster workers to communicate with the coaster head node. This can be used +when the address determined automatically by the coaster provider +is inaccessible from coaster workers (for example, when the workers +reside on an unrouted internal network). (since 0.9) +
+
env namespace Profile keys set in the env namespace will be set in the unix environment of the From noreply at svn.ci.uchicago.edu Wed Jan 28 05:20:30 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 05:20:30 -0600 (CST) Subject: [Swift-commit] r2455 - trunk/docs Message-ID: <20090128112030.689E422814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 05:20:29 -0600 (Wed, 28 Jan 2009) New Revision: 2455 Modified: trunk/docs/userguide.xml Log: change formatting of profile section for slightly better readability Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 11:14:32 UTC (rev 2454) +++ trunk/docs/userguide.xml 2009-01-28 11:20:29 UTC (rev 2455) @@ -2303,7 +2303,7 @@
Profiles
Karajan namespace - maxSubmitRate - limits the maximum rate of job submission, in jobs per second. + maxSubmitRate limits the maximum rate of job submission, in jobs per second. For example: <profile namespace="karajan" key="maxSubmitRate">0.2</profile> @@ -2311,30 +2311,30 @@ will limit job submission to 0.2 jobs per second (or equivalently, one job every five seconds). - jobThrottle - + jobThrottle allows the job throttle factor (see Swift property throttle.score.job.factor) to be set per site. - initialScore - + initialScore allows the initial score for rate limiting and site selection to be set to a value other than 0. - delayBase - controls how much a site will be delayed when it performs poorly. With each reduction + delayBase controls how much a site will be delayed when it performs poorly. With each reduction in a sites score by 1, the delay between execution attempts will increase by a factor of delayBase. - status.mode - allows the status.mode property to be set per-site instead of for an entire run. + status.mode allows the status.mode property to be set per-site instead of for an entire run. See the Swift configuration properties section for more information. (since Swift 0.8)
swift namespace - storagesize limits the + storagesize limits the amount of space that will be used on the remote site for temporary files. When more than that amount of space is used, the remote temporary file cache will be cleared using the algorithm specified in the -caching.algorithm property. +caching.algorithm property.
Globus namespace - maxwalltime specifies a walltime limit for each job, in minutes. This profile setting also interacts + maxwalltime specifies a walltime limit for each job, in minutes. This profile setting also interacts with the clustering mechanism. @@ -2345,30 +2345,30 @@ Hours:Minutes:Seconds - queue + queue is used by the PBS, GRAM2 and GRAM4 providers. This profile entry specifies which queue jobs will be submitted to. The valid queue names are site-specific. - host_types + host_types specifies the types of host that are permissible for a job to run on. The valid values are site-specific. This profile entry is used by the GRAM2 and GRAM4 providers. - condor_requirements allows a requirements string to be specified + condor_requirements allows a requirements string to be specified when Condor is used as an LRM behind GRAM2. Example: <profile namespace="globus" key="condor_requirements">Arch == "X86_64" || Arch="INTEL"</profile> - coastersPerNode specifies + coastersPerNode specifies the number of coaster workers to be run on each node. This profile entry is used by the coaster execution provider. - coasterWorkerMaxwalltime + coasterWorkerMaxwalltime specifies the maxwalltime to be used when submitting coaster workers. This profile entry is used by the coaster execution provider. If this entry is not specified, the coaster provider will compute a maxwalltime based on the maxwalltime of jobs submitted. (since 0.9) - coasterInternalIP + coasterInternalIP specifies the internal address of the coaster head node, to be used by coaster workers to communicate with the coaster head node. 
This can be used when the address determined automatically by the coaster provider From noreply at svn.ci.uchicago.edu Wed Jan 28 05:23:25 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 05:23:25 -0600 (CST) Subject: [Swift-commit] r2456 - trunk/docs Message-ID: <20090128112325.D41C022814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 05:23:25 -0600 (Wed, 28 Jan 2009) New Revision: 2456 Modified: trunk/docs/userguide.xml Log: change styling on environment variables section for readbility Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 11:20:29 UTC (rev 2455) +++ trunk/docs/userguide.xml 2009-01-28 11:23:25 UTC (rev 2456) @@ -2727,20 +2727,20 @@ documented in this section: -PATHPREFIX - set in env namespace profiles. This path is prefixed onto the start -of the PATH when jobs are -executed. It can be more useful than setting the PATH environment variable directly, -because setting PATH will cause the execution site's default path to be lost. +PATHPREFIX - set in env namespace profiles. This path is prefixed onto the start +of the PATH when jobs are +executed. It can be more useful than setting the PATH environment variable directly, +because setting PATH will cause the execution site's default path to be lost. -GLOBUS_HOSTNAME, GLOBUS_TCP_PORT_RANGE - set in the environment before running +GLOBUS_HOSTNAME, GLOBUS_TCP_PORT_RANGE - set in the environment before running Swift. These can be set to inform Swift of the configuration of your local firewall. More information can be found in the Globus firewall How-to. -COG_OPTS - set in the environment before running Swift. Options set in this +COG_OPTS - set in the environment before running Swift. Options set in this variable will be passed as parameters to the Java Virtual Machine which will run Swift. 
The parameters vary between virtual machine imlementations, but can usually be used to alter settings such as maximum heap size. @@ -2748,7 +2748,7 @@ 1.4.2 command line options are documented here. -SWIFT_JOBDIR_PATH - set in env namespace profiles. If set, then Swift will +SWIFT_JOBDIR_PATH - set in env namespace profiles. If set, then Swift will use the path specified here as a worker-node local temporary directory to copy input files to before running a job. If unset, Swift will keep input files on the site-shared filesystem. In some cases, copying to a worker-node From noreply at svn.ci.uchicago.edu Wed Jan 28 05:26:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 05:26:26 -0600 (CST) Subject: [Swift-commit] r2457 - trunk/docs Message-ID: <20090128112626.A230222814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 05:26:24 -0600 (Wed, 28 Jan 2009) New Revision: 2457 Modified: trunk/docs/userguide.xml Log: styling in build options Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 11:23:25 UTC (rev 2456) +++ trunk/docs/userguide.xml 2009-01-28 11:26:24 UTC (rev 2457) @@ -2764,14 +2764,15 @@ be supplied on the ant commandline. These are summarised here: -with-provider-condor - build with CoG condor provider +with-provider-condor - build with CoG condor provider -with-provider-coaster - build with CoG coaster provider (see -the section on coasters) +with-provider-coaster - build with CoG coaster provider (see +the section on coasters). Since 0.8, +coasters are always built, and this option has no effect. -with-provider-deef - build with Falkon provider deef. In order for this +with-provider-deef - build with Falkon provider deef. 
In order for this option to work, it is necessary to check out the provider-deef code in the cog/modules directory alongside swift: @@ -2784,10 +2785,10 @@ -with-provider-wonky - build with provider-wonky, an execution provider +with-provider-wonky - build with provider-wonky, an execution provider that provides delays and unreliability for the purposes of testing Swift's fault tolerance mechanisms. In order for this option to work, it is -necessary to check out the provider-wonky code in the cog/modules +necessary to check out the provider-wonky code in the cog/modules directory alongside swift: @@ -2798,8 +2799,8 @@ -no-supporting - produces a distribution without supporting commands such -as grid-proxy-init. This is intended for when the Swift distribution will be +no-supporting - produces a distribution without supporting commands such +as grid-proxy-init. This is intended for when the Swift distribution will be used in an environment where those commands are already provided by other packages, where the Swift package should be providing only Swift commands, and where the presence of commands such as grid-proxy-init from From noreply at svn.ci.uchicago.edu Wed Jan 28 05:33:21 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 05:33:21 -0600 (CST) Subject: [Swift-commit] r2458 - trunk/docs Message-ID: <20090128113321.8E9B622814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 05:33:20 -0600 (Wed, 28 Jan 2009) New Revision: 2458 Modified: trunk/docs/userguide.xml Log: duplicate ID was stopping pdf generation Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 11:26:24 UTC (rev 2457) +++ trunk/docs/userguide.xml 2009-01-28 11:33:20 UTC (rev 2458) @@ -225,7 +225,7 @@
-
+
Procedures Datasets are operated on by procedures, which take input in the form of From noreply at svn.ci.uchicago.edu Wed Jan 28 07:05:35 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 07:05:35 -0600 (CST) Subject: [Swift-commit] r2459 - trunk/docs Message-ID: <20090128130535.3451D22814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 07:05:34 -0600 (Wed, 28 Jan 2009) New Revision: 2459 Modified: trunk/docs/userguide.xml Log: swift command return codes in a table, rather than a bulleted list - compactor, prettier Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 11:33:20 UTC (rev 2458) +++ trunk/docs/userguide.xml 2009-01-28 13:05:34 UTC (rev 2459) @@ -1876,14 +1876,24 @@
 Return codes
-The swift command may exit with the following return codes:
-
-0 - success
-1 - command line syntax error or missing project name
-2 - error during execution
-3 - error during compilation of SwiftScript program
-4 - input file does not exist
-
+The swift command may exit with the following return codes:
+
+
+
+
+ value
+ meaning
+
+
+
+ 0    success
+ 1    command line syntax error or missing project name
+ 2    error during execution
+ 3    error during compilation
+ 4    input file does not exist
+
+
+
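
The return codes tabulated in the r2459 diff above could be consumed by a wrapper script along the following lines. This is only a sketch: the dictionary contents are copied from the new table, while the names `SWIFT_RETURN_CODES` and `describe_return_code` are hypothetical and do not come from Swift itself.

```python
# Meanings copied from the return-code table added in r2459.
# The names below are illustrative assumptions, not part of Swift.
SWIFT_RETURN_CODES = {
    0: "success",
    1: "command line syntax error or missing project name",
    2: "error during execution",
    3: "error during compilation",
    4: "input file does not exist",
}


def describe_return_code(code: int) -> str:
    """Return the documented meaning of a swift exit status.

    Codes outside the documented table are reported as undocumented
    rather than guessed at.
    """
    return SWIFT_RETURN_CODES.get(code, "undocumented return code")
```

A caller would typically pass in the exit status obtained from running the `swift` command (for example via `subprocess.run(...).returncode`) and log the resulting description.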
From noreply at svn.ci.uchicago.edu Wed Jan 28 08:17:37 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Wed, 28 Jan 2009 08:17:37 -0600 (CST)
Subject: [Swift-commit] r2460 - trunk/docs
Message-ID: <20090128141737.5502722819E@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-28 08:17:35 -0600 (Wed, 28 Jan 2009)
New Revision: 2460

Modified:
   trunk/docs/userguide.xml
Log:
import a lot of work that was done for an HPDC paper that was not submitted

Modified: trunk/docs/userguide.xml
===================================================================
--- trunk/docs/userguide.xml 2009-01-28 13:05:34 UTC (rev 2459)
+++ trunk/docs/userguide.xml 2009-01-28 14:17:35 UTC (rev 2460)
@@ -35,6 +35,424 @@
The SwiftScript Language +
Language basics + +A Swift script describes data, application components, invocations +of applications components, and the inter-relations (data flow) +between those invocations. + + +Data is represented in a script by strongly-typed single-assignment +variables, using a C-like syntax. + + +Types in Swift can be atomic or +composite. An atomic type can be either a +primitive type or a mapped type. +Swift provides a fixed set of primitive types, such as +integer and string. A mapped +type indicates that the actual data does not reside in CPU addressable +memory (as it would in conventional programming languages), but in +POSIX-like files. Composite types are further subdivided into +structures and arrays. +Structures are similar in most respects to structure types in other languages. +Arrays use numeric indices, but are sparse. They can contain elements of +any type, including other array types, but all elements in an array must be +of the same type. We often refer to instances of composites of mapped types +as datasets. + + + +Mapped type and composite type variable declarations can be annotated with a +mapping descriptor indicating the file(s) that make up +that dataset. For example, the following line declares a variable named +photo with type image. It additionally +declares that the data for this variable is stored in a single file named +shane.jpeg. + + + + image photo <"shane.jpeg">; + + + +Conceptually, a parallel can be drawn between Swift mapped variables +and Java reference types. In both cases there is no syntactic distinction +between primitive types and mapped types or reference types respectively. +Additionally, the semantic distinction is also kept to a minimum. + + + +Component programs of scripts are declared in an app +declaration, with the description of the command line syntax +for that program and a list of input and output data. An app +block describes a functional/dataflow style interface to imperative +components. 
+ + + +For example, the following example lists a procedure which makes use of +the ImageMagick +convert command to rotate a supplied +image by a specified angle: + + + + app (image output) rotate(image input) { + convert "-rotate" angle @input @output; + } + + + +A procedure is invoked using the familiar syntax: + + + + rotated = rotate(photo, 180); + + + +While this looks like an assignment, the actual unix level execution +consists of invoking the command line specified in the app +declaration, with variables on the left of the assignment bound to the +output parameters, and variables to the right of the procedure +invocation passed as inputs. + + + +The examples above have used the type \verb|image| with out any +definition of that type. We can declare it as a \emph{marker type} +which has no structure exposed to SwiftScript: + + + + type image; + + + +This does not indicate that the data is unstructured; but it indicates +that the structure of the data is not exposed to SwiftScript. Instead, +SwiftScript will treat variables of this type as individual opaque +files. + + + +With mechanisms to declare types, map variables to data files, and +declare and invoke procedures, we can build a complete (albeit simple) +script: + + + + type image; + image photo <"shane.jpeg">; + image rotated <"rotated.jpeg">; + + app (image output) rotate(image input, int angle) { + convert "-rotate" angle @input @output; + } + + rotated = rotate(photo, 180); + + + +This script can be invoked from the command line: + + + + $ ls *.jpeg + shane.jpeg + $ swift example.swift + ... + $ ls *.jpeg + shane.jpeg rotated.jpeg + + + +This executes a single convert command, hiding from the +user features such as remote multisite execution and fault tolerance that +will be discussed in a later section. + + +
+ +
Arrays and Parallel Execution + +Arrays of values can be delcared using the [] suffix. An +array be mapped to a collection of files, one element per file, by using +a different form of mapping expression. For example, the +filesys_mapper maps all files matching a particular +unix glob pattern into an array: + + + + file frames[] <filesys_mapper; pattern="*.jpeg">; + + + +The foreach construct can be used +to apply the same block of code to each element of an array: + + + + foreach f,ix in frames { + output[ix] = rotate(frames, 180); + } + + + +Sequential iteration can be expressed using the iterate +construct: + + + + step[0] = initialCondition(); + iterate ix { + step[ix] = simulate(step[ix-1]); + } + + + +This fragment will initialise the 0-th element of the step +array to some initial condition, and then repeatedly run the +simulate procedure, using each execution's outputs as +input to the next step. + + +
+ +
Ordering of execution + + +Non-array variables are single-assignment, which +means that they must be assigned to exactly one value during execution. +A procedure or expression will be executed when all of its input parameters +have been assigned values. As a result of such execution, more variables may +become assigned, possibly allowing further parts of the script to +execute. + + + +In this way, scripts are implicitly parallel. Aside from serialisation +implied by these dataflow dependencies, execution of component programs +can proceed in parallel. + + + +In this fragment, execution of procedures p and +q can happen in parallel: + + + + y=p(x); + z=q(x); + + +while in this fragment, execution is serialised by the variable +y, with procedure p executing +before q: + + + y=p(x); + z=q(y); + + + +Arrays in SwiftScript are more generally +monotonic; that is, knowledge about the +content of an array increases during execution, but cannot otherwise +change. Each element of the array is single assignment. +Eventually, all values for an array are known, and that array +is regarded as closed. + + + +Statements which deal with the array as a whole will often wait for the array +to be closed before executing (thus, a closed array is the equivalent +of a non-array type being assigned). However, a foreach +statement will apply its body to elements of an array as they become +known. It will not wait until the array is closed. + + + +Consider this script: + + + + file a[]; + file b[]; + foreach v,i in a { + b[i] = p(v); + } + a[0] = r(); + a[1] = s(); + + + +Initially, the foreach statement will have nothing to +execute, as the array a has not been assigned any values. +The procedures r and s will execute. +As soon as either of them is finished, the corresponding invocation of +procedure p will occur. After both r +and s have completed, the array a will +be closed since no other statements in the script make an assignment to +a. + + +
+ +
Compound procedures + +As with many other programming languages, procedures consisting of SwiftScript +code can be defined. These differ from the previously mentioned procedures +declared with the app keyword, as they invoke other +SwiftScript procedures rather than a component program. + + + + (file output) process (file input) { + file intermediate; + intermediate = first(input); + output = second(intermediate); + } + + file x <"x.txt">; + file y <"y.txt">; + y = process(x); + + + +This will invoke two procedures, with an intermediate data file named +anonymously connecting the first and +second procedures. + + + +Ordering of execution is generally determined by execution of +app procedures, not by any containing compound procedures. +In this code block: + + + + (file a, file b) A() { + a = A1(); + b = A2(); + } + file x, y, s, t; + (x,y) = A(); + s = S(x); + t = S(y); + + + +then a valid execution order is: A1 S(x) A2 S(y). The +compound procedure A does not have to have fully completed +for its return values to be used by subsequent statements. + + +
+ +
More about types + +Each variable and procedure parameter in SwiftScript is strongly typed. +Types are used to structure data, to aid in debugging and checking program +correctness and to influence how Swift interacts with data. + + + +The image type declared in previous examples is a +marker type. Marker types indicate that data for a +variable is stored in a single file with no further structure exposed at +the SwiftScript level. + + + +Arrays have been mentioned above, in the arrays section. A code block +may be applied to each element of an array using foreach; +or individual elements may be references using [] notation. + + +There are a number of primitive types: + + + + typecontains + + intintegers + stringstrings of text + floatfloating point numbers + booleantrue/false + + +
+ + +Complex types may be defined using the typekeyword: + + + type headerfile; + type voxelfile; + type volume { + headerfile h; + voxelfile v; + } + + + +Members of a complex type can be accessed using the . +operator: + + + + volume brain; + o = p(brain.h); + + + +Sometimes data may be stored in a form that does not fit with Swift's +file-and-site model; for example, data might be stored in an RDBMS on some +database server. In that case, a variable can be declared to have +extern type. This indicates that +Swift should use the variable to determine execution dependency, but should +not attempt other data management; for example, it will not perform any form +of data stage-in or stage-out it will not manage local data caches on sites; +and it will not enforce component program atomicity on data output. This can +add substantial responsibility to component programs, in exchange for allowing +arbitrary data storage and access methods to be plugged in to scripts. + + + + type file; + + app (extern o) populateDatabase() { + populationProgram; + } + + app (file o) analyseDatabase(extern i) { + analysisProgram @o; + } + + extern database; + file result <"results.txt">; + + database = populateDatabase(); + result = analyseDatabase(database); + + + +Some external database is represented by the database +variable. The populateDatabase procedure populates the +database with some data, and the analyseDatabase procedure +performs some subsequent analysis on that database. The declaration of +database contains no mapping; and the procedures which +use database do not reference them in any way; the +description of database is entirely outside of the script. +The single assignment and execution ordering rules will still apply though; +populateDatabase will always be run before +analyseDatabase. + + +
+
Data model Data processed by Swift is strongly typed. It may be take the form of values in memory or as out-of-core files on disk. Language constructs From noreply at svn.ci.uchicago.edu Wed Jan 28 08:21:40 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:21:40 -0600 (CST) Subject: [Swift-commit] r2461 - trunk/docs Message-ID: <20090128142140.4EED02281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:21:39 -0600 (Wed, 28 Jan 2009) New Revision: 2461 Modified: trunk/docs/userguide.xml Log: old section on execution entirely superceded by hpdc import Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:17:35 UTC (rev 2460) +++ trunk/docs/userguide.xml 2009-01-28 14:21:39 UTC (rev 2461) @@ -541,19 +541,6 @@
-
Execution order based on data dependencies - -Procedures in swift are by default executed in parallel. If five separate -procedures are invoked, Swift will attempt to run them all at once. - - -The main exception to this is when one procedure produces a dataset as an -output and another procedure uses that dataset as an input. In that case, -the second procedure will be executed after the first procedure has -produced the intermediate dataset. - -
-
Type System From noreply at svn.ci.uchicago.edu Wed Jan 28 08:24:10 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:24:10 -0600 (CST) Subject: [Swift-commit] r2462 - trunk/docs Message-ID: <20090128142410.576F62281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:24:09 -0600 (Wed, 28 Jan 2009) New Revision: 2462 Modified: trunk/docs/userguide.xml Log: crosslink filesys mapper Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:21:39 UTC (rev 2461) +++ trunk/docs/userguide.xml 2009-01-28 14:24:09 UTC (rev 2462) @@ -180,8 +180,8 @@ Arrays of values can be delcared using the [] suffix. An array be mapped to a collection of files, one element per file, by using a different form of mapping expression. For example, the -filesys_mapper maps all files matching a particular -unix glob pattern into an array: +filesys_mapper +maps all files matching a particular unix glob pattern into an array: From noreply at svn.ci.uchicago.edu Wed Jan 28 08:25:47 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:25:47 -0600 (CST) Subject: [Swift-commit] r2463 - trunk/docs Message-ID: <20090128142547.633312281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:25:46 -0600 (Wed, 28 Jan 2009) New Revision: 2463 Modified: trunk/docs/userguide.xml Log: tidy up a few LaTeX cut-and-paste errors Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:24:09 UTC (rev 2462) +++ trunk/docs/userguide.xml 2009-01-28 14:25:46 UTC (rev 2463) @@ -120,8 +120,8 @@ -The examples above have used the type \verb|image| with out any -definition of that type. We can declare it as a \emph{marker type} +The examples above have used the type image with out any +definition of that type. 
We can declare it as a marker type which has no structure exposed to SwiftScript: From noreply at svn.ci.uchicago.edu Wed Jan 28 08:26:44 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:26:44 -0600 (CST) Subject: [Swift-commit] r2464 - trunk/docs Message-ID: <20090128142644.677732281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:26:43 -0600 (Wed, 28 Jan 2009) New Revision: 2464 Modified: trunk/docs/userguide.xml Log: typo Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:25:46 UTC (rev 2463) +++ trunk/docs/userguide.xml 2009-01-28 14:26:43 UTC (rev 2464) @@ -177,7 +177,7 @@
Arrays and Parallel Execution -Arrays of values can be delcared using the [] suffix. An +Arrays of values can be declared using the [] suffix. An array be mapped to a collection of files, one element per file, by using a different form of mapping expression. For example, the filesys_mapper From noreply at svn.ci.uchicago.edu Wed Jan 28 08:37:28 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:37:28 -0600 (CST) Subject: [Swift-commit] r2465 - trunk/docs Message-ID: <20090128143728.D1EFC22819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:37:27 -0600 (Wed, 28 Jan 2009) New Revision: 2465 Modified: trunk/docs/userguide.xml Log: old kinds of types section removed - it has been superceded by hpdc draft import Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:26:43 UTC (rev 2464) +++ trunk/docs/userguide.xml 2009-01-28 14:37:27 UTC (rev 2465) @@ -457,40 +457,7 @@ Data processed by Swift is strongly typed. It may be take the form of values in memory or as out-of-core files on disk. Language constructs called mappers specify how each piece of data is stored. -
- Data - -Data is represented in Swift by -DSHandles (Dataset handles). - - -In Swift, a DSHandle can represent data in one of three forms: - - - -an in-memory value, such as a string or integer. - - -a data file (on local disk or stored elsewhere on the internet). When a -DSHandle represents an on-disk data file, the name of that file is -provided by a mapper. - - -a container of other DSHandles - either an array or a defined type. -Such a DSHandles contains subordinate DSHandles; these may be nested -to arbitrary depth. - - - -The above three are mutually exclusive - for example, a data set -that is mapped to a file cannot have a value that can be used in a -SwiftScript expression; and a data item that is a -value cannot be passed into an application -executable as a data file using the @filename function. - -
-
Mappers When a DSHandle represents a data file (or container of datafiles), it is From noreply at svn.ci.uchicago.edu Wed Jan 28 08:50:23 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 08:50:23 -0600 (CST) Subject: [Swift-commit] r2466 - trunk/docs Message-ID: <20090128145023.B3F1F2281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 08:50:22 -0600 (Wed, 28 Jan 2009) New Revision: 2466 Added: trunk/docs/type-hierarchy.fig trunk/docs/type-hierarchy.png Modified: trunk/docs/userguide.xml Log: diagram of type hierarchy Added: trunk/docs/type-hierarchy.fig =================================================================== --- trunk/docs/type-hierarchy.fig (rev 0) +++ trunk/docs/type-hierarchy.fig 2009-01-28 14:50:22 UTC (rev 2466) @@ -0,0 +1,29 @@ +#FIG 3.2 Produced by xfig version 3.2.5 +Landscape +Center +Inches +Letter +100.00 +Single +-2 +1200 2 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 4950 1350 4050 1800 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 5325 1350 6225 1800 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3525 2100 3150 2850 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3975 2100 4275 2775 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 6525 2175 6225 2775 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 6900 2175 7125 2775 +4 0 0 50 -1 0 12 0.0000 4 195 960 4575 1200 Swift types\001 +4 0 0 50 -1 0 12 0.0000 4 195 1275 2475 3075 Primitive types\001 +4 0 0 50 -1 0 12 0.0000 4 195 615 2475 3330 (eg int)\001 +4 0 0 50 -1 0 12 0.0000 4 195 1170 3975 3075 Marker types\001 +4 0 0 50 -1 0 12 0.0000 4 195 615 5925 3000 Arrays\001 +4 0 0 50 -1 0 12 0.0000 4 150 900 6900 3000 Structures\001 +4 0 0 50 -1 0 12 0.0000 4 195 1440 6225 2100 Composite types\001 +4 0 0 50 -1 0 12 0.0000 4 195 1155 3300 2025 Atomic types\001 Added: trunk/docs/type-hierarchy.png =================================================================== (Binary files differ) Property changes on: trunk/docs/type-hierarchy.png 
___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:37:27 UTC (rev 2465) +++ trunk/docs/userguide.xml 2009-01-28 14:50:22 UTC (rev 2466) @@ -61,7 +61,7 @@ of the same type. We often refer to instances of composites of mapped types as datasets. - + Mapped type and composite type variable declarations can be annotated with a mapping descriptor indicating the file(s) that make up From noreply at svn.ci.uchicago.edu Wed Jan 28 09:26:44 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 28 Jan 2009 09:26:44 -0600 (CST) Subject: [Swift-commit] r2467 - trunk/docs Message-ID: <20090128152644.B61132281F3@www.ci.uchicago.edu> Author: benc Date: 2009-01-28 09:26:43 -0600 (Wed, 28 Jan 2009) New Revision: 2467 Added: trunk/docs/userguide-rotated.jpeg trunk/docs/userguide-shane.jpeg Modified: trunk/docs/userguide.xml Log: example input and output for basic swift program Added: trunk/docs/userguide-rotated.jpeg =================================================================== (Binary files differ) Property changes on: trunk/docs/userguide-rotated.jpeg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/docs/userguide-shane.jpeg =================================================================== (Binary files differ) Property changes on: trunk/docs/userguide-shane.jpeg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 14:50:22 UTC (rev 2466) +++ trunk/docs/userguide.xml 2009-01-28 15:26:43 UTC (rev 2467) @@ -172,7 +172,12 @@ user features such as remote 
multisite execution and fault tolerance that will be discussed in a later section. - +
shane.jpeg + +
+
rotated.jpeg + +
Arrays and Parallel Execution From noreply at svn.ci.uchicago.edu Thu Jan 29 06:06:50 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 06:06:50 -0600 (CST) Subject: [Swift-commit] r2468 - trunk/docs Message-ID: <20090129120650.293F922819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 06:06:49 -0600 (Thu, 29 Jan 2009) New Revision: 2468 Modified: trunk/docs/userguide.xml Log: remove old type section replaced by hpdc import Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-28 15:26:43 UTC (rev 2467) +++ trunk/docs/userguide.xml 2009-01-29 12:06:49 UTC (rev 2468) @@ -513,42 +513,7 @@
-
- Type System - -The SwiftScript type system consists of a number of simple -types, marker types for files, -arrays, and complex types composed of these types. - - -The simple types are: string, float, int and boolean. - - - -Complex types are specified using the type -keyword. The syntax is similar to struct in C or class in Java. -For example, the below example declares a complex type with two -members, a string called name and an integer called age. - -type person { - string name; - int age; -} - - - - -When referring to files on disk, the internal structure of the -file is irrelevant; but it is still useful to declare types for those -files so that Swift can perform type-checking on SwiftScript programs. -In this case, a marker type can be declared, like this: - - -type binaryfile; - -
-
 Variables
 Variables in SwiftScript are declared to be of a specific type.

From noreply at svn.ci.uchicago.edu Thu Jan 29 06:24:03 2009
From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu)
Date: Thu, 29 Jan 2009 06:24:03 -0600 (CST)
Subject: [Swift-commit] r2469 - trunk/docs
Message-ID: <20090129122404.4401E22819E@www.ci.uchicago.edu>

Author: benc
Date: 2009-01-29 06:24:02 -0600 (Thu, 29 Jan 2009)
New Revision: 2469

Modified:
   trunk/docs/userguide.xml
Log:
format operator information as a table

Modified: trunk/docs/userguide.xml
===================================================================
--- trunk/docs/userguide.xml 2009-01-29 12:06:49 UTC (rev 2468)
+++ trunk/docs/userguide.xml 2009-01-29 12:24:02 UTC (rev 2469)
@@ -799,17 +799,23 @@
 The following infix operators are available for use in SwiftScript expressions.
- + numeric addition; string concatenation
- - numeric subtraction
- * numeric multiplication
- / floating point division
- %/ integer division
- %% integer remainder-of-division
- == != comparison and not-comparison
- < > <= >=
- && || boolean and, or
- ! boolean not
-
+
+
+ operator       purpose
+
+ +              numeric addition; string concatenation
+ -              numeric subtraction
+ *              numeric multiplication
+ /              floating point division
+ %/             integer division
+ %%             integer remainder of division
+ == !=          comparison and not-equal-to
+ < > <= >=      numerical ordering
+ && ||          boolean and, or
+ !              boolean not
+
+
+
From noreply at svn.ci.uchicago.edu Thu Jan 29 06:48:53 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 06:48:53 -0600 (CST) Subject: [Swift-commit] r2470 - www/css Message-ID: <20090129124853.5DBE22281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 06:48:52 -0600 (Thu, 29 Jan 2009) New Revision: 2470 Modified: www/css/style1col.css Log: give h5 styling like h4 Modified: www/css/style1col.css =================================================================== --- www/css/style1col.css 2009-01-29 12:24:02 UTC (rev 2469) +++ www/css/style1col.css 2009-01-29 12:48:52 UTC (rev 2470) @@ -214,6 +214,20 @@ text-transform: uppercase; } +h5 { + margin: 0px; + font-size: 12px; + font-weight: bold; + color: #666; + letter-spacing: 1px; + padding-top: 5px; + padding-right: 5px; + padding-bottom: 5px; + padding-left: 20px; + font-family:"Trebuchet MS", Arial, sans-serif; + text-transform: uppercase; + +} #container { background-color: #FFFFFF; font-size: 10px; From noreply at svn.ci.uchicago.edu Thu Jan 29 07:57:57 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 07:57:57 -0600 (CST) Subject: [Swift-commit] r2471 - trunk/docs Message-ID: <20090129135757.A72742281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 07:57:56 -0600 (Thu, 29 Jan 2009) New Revision: 2471 Modified: trunk/docs/userguide.xml Log: some rearranging and a bit of rephrasing Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 12:48:52 UTC (rev 2470) +++ trunk/docs/userguide.xml 2009-01-29 13:57:56 UTC (rev 2471) @@ -514,7 +514,19 @@
+
+ Syntax +The syntax of SwiftScript has a superficial resemblance to C and +Java. For example, { and } characters are used to enclose blocks of +statements. + + +A SwiftScript program consists of a number of statements. +Statements may declare types, procedures and variables, assign values to +variables, and express operations over arrays. + +
Variables Variables in SwiftScript are declared to be of a specific type. Assignments to those variables must be data of that type. @@ -565,14 +577,10 @@ described in another section.
+
- -
- Procedures - -Datasets are operated on by procedures, which take input in the form of -mapped variables, perform computations, and produce typed data as output -that is again mapped to variables. +
Procedures + There are two kinds of procedure: An atomic procedure, which describes how an external program can be executed; and compound procedures which consist of a sequence of SwiftScript statements. @@ -581,40 +589,42 @@ A procedure declaration defines the name of a procedure and its input and output parameters. SwiftScript procedures can take multiple -inputs and produce multiple outputs. -Inputs are specified to the right of the function name where -outputs to the left. For instance: +inputs and produce multiple outputs. Inputs are specified to the right +of the function name, and outputs are specified to the left. For example: (type3 out1, type4 out2) myproc (type1 in1, type2 in2) -The above example declares a procedure called myproc, which -has two inputs in1 (of type type1) and in2 (of type type2) -and two outputs out1 (of type type3) and out2 (of type type4). +The above example declares a procedure called myproc, which +has two inputs in1 (of type type1) +and in2 (of type type2) +and two outputs out1 (of type type3) +and out2 (of type type4). A procedure input parameter can be an optional parameter in which case it must be declared with a default -value. When we call a -procedure, passing in the actual parameters, we allow both positional -parameter and named parameter passing, provided that all optional -parameters have to be declared after the required parameters and any -optional parameter has to be bound using keyword parameter passing. -So for instance if we declare a procedure myproc1: +value. When calling a procedure, both positional parameter and named +parameter passings can be passed, provided that all optional +parameters are declared after the required parameters and any +optional parameter is bound using keyword parameter passing. 
+For example, if myproc1 is defined as: (binaryfile bf) myproc1 (int i, string s="foo") -Then the procedure can be called like this +Then that procedure can be called like this, omitting the optional +parameter s: binaryfile mybf = myproc1(1); -or like this supplying the value for the optional parameter s: +or like this supplying a value for the optional parameter +s: binaryfile mybf = myproc1 (1, s="bar"); @@ -624,32 +634,29 @@
Atomic procedures -The body of an atomic procedure specifies how to invoke an -external executable program or Web Service, and how logical data -types are mapped to command line arguments. A complete specification -for myproc1 can be: +An atomic procedure specifies how to invoke an +external executable program, and how logical data +types are mapped to command line arguments. + + +Atomic procedures are defined with the app keyword: -app (binaryfile bf) myproc1 (int i, string s="foo") { - myapp1 i s @filename(bf); +app (binaryfile bf) myproc (int i, string s="foo") { + myapp i s @filename(bf); } -which specifies that myproc1 invokes an executable called myapp1, -passing the values of i, s and the file name of bf as command line arguments. -The @filename notation serves as a function denoting that the -argument should be mapped as a file name, and since the notation is -often required in invoking applications, a shorter syntax is defined where -we can omit the filename part and use the @ sign only. +which specifies that myproc invokes an executable +called myapp, +passing the values of i, s +and the filename of bf as command line arguments.
Compound procedures -A compound procedure contains a set of calls to other procedures. Shared -variables in the body of a compound procedure specify data dependencies -and thus the execution sequence of the procedure calls. For simple -illustration, we define a compound procedure in below: +A compound procedure contains a set of SwiftScript statements: (type2 b) foo_bar (type1 a) { @@ -659,41 +666,9 @@ } -
-
-
- Syntax, Statements -The syntax of SwiftScript has a superficial resemblance to C and -Java. For example, { and } characters are used to enclose blocks of -statements. - - -A SwiftScript program consists of a number of statements. -Statements may declare types, procedures and variables, assign values to -variables, and express operations over arrays. - -
-
- Multivalued procedure invocation statements - -Procedures can return more than one value. In such case, the previously -mentioned declaration and assignment statements are insufficient. A -multi-valued procedure invocation can be used. This has the general -form: - -'(' ((type)? variableName ( '=' binding ))+ ')' = procedureinvocation - - -Variables can be either declared (if a type is included) or assigned (if -a type is not included). If no bindings are specified, then variables -are assigned in the same order that they are specified in the -procedure declaration. If bindings are specified, then variables are -assigned to the named return parameter. - -
- +
Control Constructs @@ -793,6 +768,7 @@
+
Operators @@ -818,8 +794,7 @@
- - +
Mappers From noreply at svn.ci.uchicago.edu Thu Jan 29 08:07:25 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:07:25 -0600 (CST) Subject: [Swift-commit] r2472 - trunk/docs Message-ID: <20090129140726.06D512281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:07:25 -0600 (Thu, 29 Jan 2009) New Revision: 2472 Modified: trunk/docs/userguide.xml Log: formatting of control constructs section Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 13:57:56 UTC (rev 2471) +++ trunk/docs/userguide.xml 2009-01-29 14:07:25 UTC (rev 2472) @@ -672,13 +672,14 @@
Control Constructs -SwiftScript provides if, switch, foreach, and while constructs, +SwiftScript provides if, switch, +foreach, and iterate constructs, with syntax and semantics similar to comparable constructs in other high-level languages.
foreach -The foreach construct is used to apply a block of statements to +The foreach construct is used to apply a block of statements to each element in an array. For example: @@ -690,7 +691,7 @@ -foreach statements have the general form: +foreach statements have the general form: foreach controlvariable (,index) in expression { @@ -698,8 +699,10 @@ } -The block of statements is evaluated once for each element in 'expression', -with controlvariable set to the corresponding element and index set to the +The block of statements is evaluated once for each element in +expression which must be an array, +with controlvariable set to the corresponding element +and index (if specified) set to the integer position in the array that is being iterated over. @@ -707,8 +710,8 @@
if -The 'if' statement allows one of two blocks of statements to be -executed, based on a boolean predicate. 'if' statements generally +The if statement allows one of two blocks of statements to be +executed, based on a boolean predicate. if statements generally have the form: if(predicate) { @@ -718,14 +721,14 @@ } -where predicate is a boolean expression. +where predicate is a boolean expression.
switch -Switch expressions allow one of a selection of blocks to be chosen based on -the value of a numerical control expression. Switch statements take the +switch expressions allow one of a selection of blocks to be chosen based on +the value of a numerical control expression. switch statements take the general form: switch(controlExpression) { @@ -738,20 +741,21 @@ statements } -The control expression is evaluated and the resulting numerical value used to -select a corresponding case, and the statements belonging to that case +The control expression is evaluated, the resulting numerical value used to +select a corresponding case, and the statements belonging to that +case block are evaluated. If no case corresponds, then the statements belonging to -the default block are evaluated. +the default block are evaluated. Unlike C or Java switch statements, execution does not fall through to -subsequent case blocks, and no break statement is necessary at the end -of each block. +subsequent case blocks, and no break +statement is necessary at the end of each block.
iterate -Iterate expressions allow a block of code to be evaluated repeatedly, with an +iterate expressions allow a block of code to be evaluated repeatedly, with an integer parameter sweeping upwards from 0 until a termination condition holds. @@ -762,7 +766,8 @@ statements; } until (terminationExpression); -with the variable var starting at 0 and increasing each iteration. That +with the variable var starting at 0 and increasing +by one in each iteration. That variable is in scope in the statements block and when evaluating the termination expression. From noreply at svn.ci.uchicago.edu Thu Jan 29 08:09:05 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:09:05 -0600 (CST) Subject: [Swift-commit] r2473 - trunk/docs Message-ID: <20090129140905.71A3822819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:09:05 -0600 (Thu, 29 Jan 2009) New Revision: 2473 Modified: trunk/docs/userguide.xml Log: compound procedure example was incorrectly tagged Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:07:25 UTC (rev 2472) +++ trunk/docs/userguide.xml 2009-01-29 14:09:05 UTC (rev 2473) @@ -313,7 +313,7 @@ SwiftScript procedures rather than a component program. 
- + (file output) process (file input) { file intermediate; intermediate = first(input); @@ -323,7 +323,7 @@ file x <"x.txt">; file y <"y.txt">; y = process(x); - + This will invoke two procedures, with an intermediate data file named From noreply at svn.ci.uchicago.edu Thu Jan 29 08:26:50 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:26:50 -0600 (CST) Subject: [Swift-commit] r2474 - trunk/src/org/griphyn/vdl/engine Message-ID: <20090129142650.8C5FB22819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:26:47 -0600 (Thu, 29 Jan 2009) New Revision: 2474 Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java Log: fix null pointer exception when trying to typecheck an @function that does not exist - now give a proper compilation error Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java =================================================================== --- trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-29 14:09:05 UTC (rev 2473) +++ trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-29 14:26:47 UTC (rev 2474) @@ -808,7 +808,9 @@ StringTemplate funcST = template("function"); funcST.setAttribute("name", func.getName()); ProcedureSignature funcSignature = (ProcedureSignature) functionsMap.get(func.getName()); - + if(funcSignature == null) { + throw new CompilationException("Unknown function "+func.getName()); + } XmlObject[] arguments = func.getAbstractExpressionArray(); int noOfOptInArgs = 0; for (int i = 0; i < funcSignature.sizeOfInputArray(); i++) { From noreply at svn.ci.uchicago.edu Thu Jan 29 08:29:25 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:29:25 -0600 (CST) Subject: [Swift-commit] r2475 - trunk/docs Message-ID: <20090129142925.743CB2281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:29:24 -0600 (Thu, 29 Jan 2009) New Revision: 2475 Modified: trunk/docs/userguide.xml Log: missing space Modified: trunk/docs/userguide.xml 
=================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:26:47 UTC (rev 2474) +++ trunk/docs/userguide.xml 2009-01-29 14:29:24 UTC (rev 2475) @@ -391,7 +391,7 @@ -Complex types may be defined using the typekeyword: +Complex types may be defined using the type keyword: type headerfile; From noreply at svn.ci.uchicago.edu Thu Jan 29 08:39:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:39:26 -0600 (CST) Subject: [Swift-commit] r2477 - trunk/docs Message-ID: <20090129143927.11EFF22814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:39:26 -0600 (Thu, 29 Jan 2009) New Revision: 2477 Modified: trunk/docs/userguide.xml Log: move clustering and coasters next to each other Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:35:59 UTC (rev 2476) +++ trunk/docs/userguide.xml 2009-01-29 14:39:26 UTC (rev 2477) @@ -2759,52 +2759,6 @@ site-shared filesystem directly.
- -
Clustering - -Swift can group a number of short job submissions into a single larger -job submission to minimize overhead involved in launching jobs (for example, -caused by security negotiation and queuing delay). In general, -CoG coasters should be used in preference -to the clustering mechanism documented in this section. - - - -By default, clustering is disabled. It can be activated by setting the -clustering.enabled -property to true. - - - -A job is eligible for clustering if -the GLOBUS::maxwalltime profile is specified in the tc.data entry for that job, and its value is -less than the value of the -clustering.min.time -property. - - - -Two or more jobs are considered compatible if they share the same site -and do not have conflicting profiles (e.g. different values for the same -environment variable). - - - -When a submitted job is eligible for clustering, -it will be put in a clustering queue rather than being submitted to -a remote site. The clustering queue is processed at intervals -specified by the -clustering.queue.delay -property. The processing of the clustering queue consists of selecting -compatible jobs and grouping them into clusters whose maximum wall time does -not exceed twice the value of the clustering.min.time -property. - - - -
-
How-To Tips for Specific User Communities
Saving Logs - for UChicago CI Users @@ -3151,6 +3105,54 @@
+ +
Clustering + +Swift can group a number of short job submissions into a single larger +job submission to minimize overhead involved in launching jobs (for example, +caused by security negotiation and queuing delay). In general, +CoG coasters should be used in preference +to the clustering mechanism documented in this section. + + + +By default, clustering is disabled. It can be activated by setting the +clustering.enabled +property to true. + + + +A job is eligible for clustering if +the GLOBUS::maxwalltime profile is specified in the tc.data entry for that job, and its value is +less than the value of the +clustering.min.time +property. + + + +Two or more jobs are considered compatible if they share the same site +and do not have conflicting profiles (e.g. different values for the same +environment variable). + + + +When a submitted job is eligible for clustering, +it will be put in a clustering queue rather than being submitted to +a remote site. The clustering queue is processed at intervals +specified by the +clustering.queue.delay +property. The processing of the clustering queue consists of selecting +compatible jobs and grouping them into clusters whose maximum wall time does +not exceed twice the value of the clustering.min.time +property. + + + +
+ + +
Coasters Coasters were introduced in Swift v0.6 as an experimental feature. From noreply at svn.ci.uchicago.edu Thu Jan 29 08:46:26 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:46:26 -0600 (CST) Subject: [Swift-commit] r2478 - trunk/docs Message-ID: <20090129144626.DA2F52281D8@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:46:26 -0600 (Thu, 29 Jan 2009) New Revision: 2478 Modified: trunk/docs/userguide.xml Log: move kickstart and restart to just before clusters/coasters - these are all "fancy execution time features" Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:39:26 UTC (rev 2477) +++ trunk/docs/userguide.xml 2009-01-29 14:46:26 UTC (rev 2478) @@ -1465,97 +1465,6 @@
- -
Kickstart - - -Kickstart is a tool that can be used to gather various information -about the remote execution environment for each job that Swift tries -to run. - - - -For each job, Kickstart generates an XML invocation -record. By default this record is staged back to the submit -host if the job fails. - - - -Before it can be used it must be installed on the remote site and -the sites file must be configured to point to kickstart. - - - -Kickstart can be downloaded as part of the Pegasus 'worker package' available -from the worker packages section of the Pegasus download page. - - -Untar the relevant worker package somewhere where it is visible to all of the -worker nodes on the remote execution machine (such as in a shared application -filesystem). - - -Now configure the gridlaunch attribute of the sites catalog -to point to that path, by adding a gridlaunch -attribute to the pool element in the site -catalog: - - - -&lt;pool handle="example" gridlaunch="/usr/local/bin/kickstart" sysinfo="INTEL32::LINUX"&gt; -[...] -&lt;/pool&gt; - - - - - - -There are various kickstart.* properties, which have sensible default -values. These are documented in the -properties section. - - - - -
- -
Workflow restart/recovery - -If a run fails, Swift can resume the program from the point of -failure. When a run fails, a restart log file will be left behind in -a file named using the unique job ID and a .rlog extension. This restart log -can then be passed to a subsequent Swift invocation using the -resume -parameter. Swift will resume execution, avoiding execution of invocations -that have previously completed successfully. The SwiftScript source file -and input data files should not be modified between runs. - - -Every run creates a restart -log file with a name composed of the file name of the workflow -being executed, an invocation ID, a numeric ID, and the .rlog extension. For example, example.swift, when executed, could produce -the following restart log file: example-ht0adgi315l61.0.rlog. Normally, if -the run completes successfully, the restart log file is -deleted. If however the workflow fails, swift -can use the restart log file to continue -execution from a point before the -failure occurred. In order to restart from a restart log -file, the -resume argument can be -used after the SwiftScript program file name. Example: - - -&gt; swift . - - - -
-
Invoking an application from Swift There are certain requirements on the behaviour of application programs @@ -3106,6 +3015,97 @@
+
Kickstart + + +Kickstart is a tool that can be used to gather various information +about the remote execution environment for each job that Swift tries +to run. + + + +For each job, Kickstart generates an XML invocation +record. By default this record is staged back to the submit +host if the job fails. + + + +Before it can be used it must be installed on the remote site and +the sites file must be configured to point to kickstart. + + + +Kickstart can be downloaded as part of the Pegasus 'worker package' available +from the worker packages section of the Pegasus download page. + + +Untar the relevant worker package somewhere where it is visible to all of the +worker nodes on the remote execution machine (such as in a shared application +filesystem). + + +Now configure the gridlaunch attribute of the sites catalog +to point to that path, by adding a gridlaunch +attribute to the pool element in the site +catalog: + + + +&lt;pool handle="example" gridlaunch="/usr/local/bin/kickstart" sysinfo="INTEL32::LINUX"&gt; +[...] +&lt;/pool&gt; + + + + + + +There are various kickstart.* properties, which have sensible default +values. These are documented in the +properties section. + + + + +
+ +
Workflow restart/recovery + +If a run fails, Swift can resume the program from the point of +failure. When a run fails, a restart log file will be left behind in +a file named using the unique job ID and a .rlog extension. This restart log +can then be passed to a subsequent Swift invocation using the -resume +parameter. Swift will resume execution, avoiding execution of invocations +that have previously completed successfully. The SwiftScript source file +and input data files should not be modified between runs. + + +Every run creates a restart +log file with a name composed of the file name of the workflow +being executed, an invocation ID, a numeric ID, and the .rlog extension. For example, example.swift, when executed, could produce +the following restart log file: example-ht0adgi315l61.0.rlog. Normally, if +the run completes successfully, the restart log file is +deleted. If however the workflow fails, swift +can use the restart log file to continue +execution from a point before the +failure occurred. In order to restart from a restart log +file, the -resume argument can be +used after the SwiftScript program file name. Example: + + +&gt; swift . + + + +
+ +
Clustering Swift can group a number of short job submissions into a single larger From noreply at svn.ci.uchicago.edu Thu Jan 29 08:53:29 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 08:53:29 -0600 (CST) Subject: [Swift-commit] r2479 - trunk/docs Message-ID: <20090129145329.762F722819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 08:53:28 -0600 (Thu, 29 Jan 2009) New Revision: 2479 Modified: trunk/docs/userguide.xml Log: community-specific howtos to the very end Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:46:26 UTC (rev 2478) +++ trunk/docs/userguide.xml 2009-01-29 14:53:28 UTC (rev 2479) @@ -2668,79 +2668,6 @@ site-shared filesystem directly.
-
How-To Tips for Specific User Communities -
Saving Logs - for UChicago CI Users - -If you have a UChicago Computation Institute account, run this command in your -submit directory after each run. It will copy all your logs and kickstart -records into a directory at the CI for reporting, usage tracking, support and debugging. - - - -rsync --ignore-existing *.log *.d login.ci.uchicago.edu:/home/benc/swift-logs/ --verbose - - -
-
Specifying TeraGrid allocations -TeraGrid users with no default project or with several project -allocations can specify a project allocation using a profile key in -the site catalog entry for a TeraGrid site: - -<profile namespace="globus" key="project">TG-CCR080002N</profile> - - - - -More information on the TeraGrid allocations process can -be found here. - - -
-
Launching MPI jobs from Swift - -Here is an example of running a simple MPI program. - - -In SwiftScript, we make an invocation that does not look any different -from any other invocation. In the below code, we do not have any input -files, and have two output files on stdout and stderr: - -type file; - -(file o, file e) p() { - app { - mpi stdout=@filename(o) stderr=@filename(e); - } -} - -file mpiout <"mpi.out">; -file mpierr <"mpi.err">; - -(mpiout, mpierr) = p(); - - - -Now we define how 'mpi' will run in tc.data: - -tguc mpi /home/benc/mpi/mpi.sh INSTALLED INTEL32::LINUX GLOBUS::host_xcount=3 - - - -mpi.sh is a wrapper script that launches the MPI program. It must be installed -on the remote site: - -#!/bin/bash -mpirun -np 3 -machinefile $PBS_NODEFILE /home/benc/mpi/a.out - - - -Because of the way that Swift runs its server side code, provider-specific -MPI modes (such as GRAM jobType=mpi) should not be used. Instead, the -mpirun command should be explicitly invoked. - -
-
-
The Site Catalog - sites.xml The site catalog lists details of each site that Swift can use. The default @@ -3213,6 +3140,79 @@ endpoint should be specified here rather than a GridFTP endpoint.
+
How-To Tips for Specific User Communities +
Saving Logs - for UChicago CI Users + +If you have a UChicago Computation Institute account, run this command in your +submit directory after each run. It will copy all your logs and kickstart +records into a directory at the CI for reporting, usage tracking, support and debugging. + + + +rsync --ignore-existing *.log *.d login.ci.uchicago.edu:/home/benc/swift-logs/ --verbose + + +
+
Specifying TeraGrid allocations +TeraGrid users with no default project or with several project +allocations can specify a project allocation using a profile key in +the site catalog entry for a TeraGrid site: + +<profile namespace="globus" key="project">TG-CCR080002N</profile> + + + +More information on the TeraGrid allocations process can +be found here. + + +
+
Launching MPI jobs from Swift + +Here is an example of running a simple MPI program. + + +In SwiftScript, we make an invocation that does not look any different +from any other invocation. In the below code, we do not have any input +files, and have two output files on stdout and stderr: + +type file; + +(file o, file e) p() { + app { + mpi stdout=@filename(o) stderr=@filename(e); + } +} + +file mpiout <"mpi.out">; +file mpierr <"mpi.err">; + +(mpiout, mpierr) = p(); + + + +Now we define how 'mpi' will run in tc.data: + +tguc mpi /home/benc/mpi/mpi.sh INSTALLED INTEL32::LINUX GLOBUS::host_xcount=3 + + + +mpi.sh is a wrapper script that launches the MPI program. It must be installed +on the remote site: + +#!/bin/bash +mpirun -np 3 -machinefile $PBS_NODEFILE /home/benc/mpi/a.out + + + +Because of the way that Swift runs its server side code, provider-specific +MPI modes (such as GRAM jobType=mpi) should not be used. Instead, the +mpirun command should be explicitly invoked. + +
+
+ + From noreply at svn.ci.uchicago.edu Thu Jan 29 09:15:47 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 09:15:47 -0600 (CST) Subject: [Swift-commit] r2480 - trunk/docs Message-ID: <20090129151547.3A29122819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 09:15:46 -0600 (Thu, 29 Jan 2009) New Revision: 2480 Modified: trunk/docs/userguide.xml Log: formatting of site catalog doc Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 14:53:28 UTC (rev 2479) +++ trunk/docs/userguide.xml 2009-01-29 15:15:46 UTC (rev 2480) @@ -2677,29 +2677,29 @@ By default, the site catalog is stored in etc/sites.xml. -This path can be overridden with the sites.file configuration property, +This path can be overridden with the sites.file configuration property, either in the Swift configuration file or on the command line. -The sites file is formatted as XML. It consists of <pool> elements, +The sites file is formatted as XML. It consists of <pool> elements, one for each site that Swift will use.
Pool element -Each pool element must have a handle attribute, giving a symbolic name +Each pool element must have a handle attribute, giving a symbolic name for the site. This can be any name, but must correspond to entries for that site in the transformation catalog. -Optionally, the gridlaunch attribute can be used to specify the path to +Optionally, the gridlaunch attribute can be used to specify the path to kickstart on the site. -Each pool must specify a file transfer method, an execution method +Each pool must specify a file transfer method, an execution method and a remote working directory. Optionally, profile settings can be specified. @@ -2707,23 +2707,24 @@
File transfer method specification -Transfer methods are specified with the &lt;gridftp&gt; element or -with the &lt;filesystem&gt; element. +Transfer methods are specified with either +the &lt;gridftp&gt; element or the +&lt;filesystem&gt; element. -To use gridftp or local filesystem copy, use the &lt;gridftp&gt; +To use gridftp or local filesystem copy, use the &lt;gridftp&gt; element: &lt;gridftp url="gsiftp://evitable.ci.uchicago.edu" /&gt; -The URL attribute may specify a GridFTP server, using the gsiftp URI scheme; +The url attribute may specify a GridFTP server, using the gsiftp URI scheme; or it may specify that filesystem copying will be used (which assumes that the site has access to the same filesystem as the submitting machine) using -the URI local://localhost. +the URI local://localhost. Filesystem access using scp (the SSH copy protocol) can be specified using the -&lt;filesystem&gt; element: +&lt;filesystem&gt; element: &lt;filesystem url="www11.i2u2.org" provider="ssh"/&gt; @@ -2732,7 +2733,7 @@ Filesystem access using CoG coasters can -also be specified using the &lt;filesystem&gt; element. More detail about +also be specified using the &lt;filesystem&gt; element. More detail about configuring that can be found in the CoG coasters section. @@ -2741,52 +2742,55 @@
Execution method specification -Execution methods may be specified either with a <jobmanager> or -<execution> element. +Execution methods may be specified either with the <jobmanager> +or <execution> element. -The <jobmanager> element can be used to specify execution through -GRAM2. For example, +The <jobmanager> element can be used to specify +execution through GRAM2. For example, <jobmanager universe="vanilla" url="evitable.ci.uchicago.edu/jobmanager-fork" major="2" /> -The universe attribute should always be set to vanilla. The url attribute +The universe attribute should always be set to vanilla. The +url attribute should specify the name of the GRAM2 gatekeeper host, and the name of the jobmanager to use. The major attribute should always be set to 2. -The <execution> element can be used to specify execution through -other execution providers: +The <execution> element can be used to specify +execution through other execution providers: -To use GRAM4, specify the gt4 provider. For example: +To use GRAM4, specify the gt4 provider. For example: <execution provider="gt4" jobmanager="PBS" url="tg-grid.uc.teragrid.org" /> -The url attribute should specify the GRAM4 submission site. The jobmanager +The url attribute should specify the GRAM4 submission site. +The jobmanager attribute should specify which GRAM4 jobmanager will be used. -For local execution, the local provider should be used, like this: +For local execution, the local provider should be used, +like this: <execution provider="local" url="none" /> -For PBS execution, the pbs provider should be used: +For PBS execution, the pbs provider should be used: <execution provider="pbs" url="none" /> -The GLOBUS::queue profile key +The GLOBUS::queue profile key can be used to specify which PBS queue jobs will be submitted to. 
-For execution through SSH, the ssh provider should be used: +For execution through SSH, the ssh provider should be used: <execution url="www11.i2u2.org" provider="ssh"/> @@ -2801,7 +2805,7 @@ For execution using the -CoG Coaster mechanism, the coaster provider +CoG Coaster mechanism, the coaster provider should be used: <execution provider="coaster" url="tg-grid.uc.teragrid.org" @@ -2814,13 +2818,13 @@
Other site catalog parameters -The workdirectory element specifies where on the site files can be +The workdirectory element specifies where on the site files can be stored. <workdirectory>/home/benc</workdirectory> This file must be accessible through the transfer mechanism specified -in the <gridftp> element and also mounted on all worker nodes that +in the <gridftp> element and also mounted on all worker nodes that will be used for execution. A shared cluster scratch filesystem is appropriate for this. From noreply at svn.ci.uchicago.edu Thu Jan 29 09:38:46 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 09:38:46 -0600 (CST) Subject: [Swift-commit] r2481 - trunk/docs Message-ID: <20090129153846.ACE7022819E@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 09:38:45 -0600 (Thu, 29 Jan 2009) New Revision: 2481 Modified: trunk/docs/userguide.xml Log: some more formatting Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 15:15:46 UTC (rev 2480) +++ trunk/docs/userguide.xml 2009-01-29 15:38:45 UTC (rev 2481) @@ -1289,18 +1289,18 @@ The output of the executable should consist of two columns of data, separated by a space. The first column should be the path of the mapped variable, -in SwiftScript syntax (for example [2] means the 2nd element of an -array) or the symbol $ to represent the root of the mapped variable. +in SwiftScript syntax (for example [2] means the 2nd element of an +array) or the symbol $ to represent the root of the mapped variable. Example: -With the following in mapper.sh, - +With the following in mapper.sh, + #!/bin/bash echo "[2] qux" echo "[0] foo" echo "[1] bar" - + then a mapping statement: @@ -1814,14 +1814,14 @@ Swift properties are specified in the following format: - + <name>=<value> - + The value can contain variables which will be expanded when the properties file is read. 
Expansion is performed when the name of - the variable is used inside the "standard" shell dereference - construct: ${name}. The following variables + the variable is used inside the standard shell dereference + construct: ${name}. The following variables can be used in the Swift configuration file: @@ -2023,7 +2023,7 @@ swift:storagesize profile entry in the sites.xml file. Example: - + <pool handle="example" sysinfo="INTEL32::LINUX"> <gridftp url="gsiftp://example.org" storage="/scratch/swift" major="2" minor="4" patch="3"/> @@ -2032,7 +2032,7 @@ <profile namespace="SWIFT" key="storagesize">20000000</profile> </pool> - + The decision of which files to keep in the cache @@ -2104,10 +2104,10 @@ url="http://www.graphviz.org">Graphviz, for example with a command-line such as: - - swift -pgraph graph1.dot q1.swift - dot -ograph.png -Tpng graph1.dot - + +$ swift -pgraph graph1.dot q1.swift +$ dot -ograph.png -Tpng graph1.dot + @@ -2538,11 +2538,11 @@ Example: - + sites.file=${vds.home}/etc/sites.xml tc.file=${vds.home}/etc/tc.data ip.address=192.168.0.1 - + @@ -2604,7 +2604,7 @@ GRAM2 and GRAM4 providers. condor_requirements allows a requirements string to be specified -when Condor is used as an LRM behind GRAM2. Example: <profile namespace="globus" key="condor_requirements">Arch == "X86_64" || Arch="INTEL"</profile> +when Condor is used as an LRM behind GRAM2. Example: <profile namespace="globus" key="condor_requirements">Arch == "X86_64" || Arch="INTEL"</profile> coastersPerNode specifies the number of coaster workers to be run on each node. This profile entry @@ -2980,13 +2980,13 @@ attribute to the pool element in the site catalog: - + <pool handle="example" gridlaunch="/usr/local/bin/kickstart" sysinfo="INTEL32::LINUX"> [...] </pool> - + @@ -3197,17 +3197,17 @@ Now we define how 'mpi' will run in tc.data: - + tguc mpi /home/benc/mpi/mpi.sh INSTALLED INTEL32::LINUX GLOBUS::host_xcount=3 - + mpi.sh is a wrapper script that launches the MPI program. 
It must be installed on the remote site: - + #!/bin/bash mpirun -np 3 -machinefile $PBS_NODEFILE /home/benc/mpi/a.out - + Because of the way that Swift runs its server side code, provider-specific From noreply at svn.ci.uchicago.edu Thu Jan 29 09:40:38 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 09:40:38 -0600 (CST) Subject: [Swift-commit] r2482 - trunk/docs Message-ID: <20090129154038.F23DB22814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 09:40:38 -0600 (Thu, 29 Jan 2009) New Revision: 2482 Modified: trunk/docs/userguide.xml Log: formatting of tc.data section Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 15:38:45 UTC (rev 2481) +++ trunk/docs/userguide.xml 2009-01-29 15:40:38 UTC (rev 2482) @@ -2852,7 +2852,7 @@ By default, the site catalog is stored in etc/tc.data. -This path can be overridden with the tc.file configuration property, +This path can be overridden with the tc.file configuration property, either in the Swift configuration file or on the command line. @@ -2874,7 +2874,7 @@ catalog. The transformation name should correspond to the transformation name -used in a SwiftScript app {} block. +used in a SwiftScript app procedure. The executable path should specify where the particular executable is @@ -2882,10 +2882,10 @@ The installation status and platform fields are not used. Set them to -INSTALLED and INTEL32::LINUX respectively. +INSTALLED and INTEL32::LINUX respectively. -The profiles field should be set to 'null' if no profile entries are to be +The profiles field should be set to null if no profile entries are to be specified, or should contain the profile entries separated by semicolons.
From noreply at svn.ci.uchicago.edu Thu Jan 29 10:56:04 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 10:56:04 -0600 (CST) Subject: [Swift-commit] r2483 - trunk/src/org/griphyn/vdl/engine Message-ID: <20090129165605.115282281A0@www.ci.uchicago.edu> Author: benc Date: 2009-01-29 10:56:02 -0600 (Thu, 29 Jan 2009) New Revision: 2483 Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java Log: change error message format Modified: trunk/src/org/griphyn/vdl/engine/Karajan.java =================================================================== --- trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-29 15:40:38 UTC (rev 2482) +++ trunk/src/org/griphyn/vdl/engine/Karajan.java 2009-01-29 16:56:02 UTC (rev 2483) @@ -809,7 +809,7 @@ funcST.setAttribute("name", func.getName()); ProcedureSignature funcSignature = (ProcedureSignature) functionsMap.get(func.getName()); if(funcSignature == null) { - throw new CompilationException("Unknown function "+func.getName()); + throw new CompilationException("Unknown function: @"+func.getName()); } XmlObject[] arguments = func.getAbstractExpressionArray(); int noOfOptInArgs = 0; From noreply at svn.ci.uchicago.edu Thu Jan 29 12:24:39 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Jan 2009 12:24:39 -0600 (CST) Subject: [Swift-commit] r2484 - in trunk: docs libexec src/org/griphyn/vdl/engine src/org/griphyn/vdl/karajan/lib/swiftscript tests/language-behaviour Message-ID: <20090129182440.332352281FB@www.ci.uchicago.edu> Author: hategan Date: 2009-01-29 12:24:39 -0600 (Thu, 29 Jan 2009) New Revision: 2484 Added: trunk/tests/language-behaviour/0054-strsplit.out.expected trunk/tests/language-behaviour/0054-strsplit.swift Modified: trunk/docs/userguide.xml trunk/libexec/vdl-lib.xml trunk/src/org/griphyn/vdl/engine/ProcedureSignature.java trunk/src/org/griphyn/vdl/karajan/lib/swiftscript/Misc.java Log: added strsplit Modified: trunk/docs/userguide.xml 
=================================================================== --- trunk/docs/userguide.xml 2009-01-29 16:56:02 UTC (rev 2483) +++ trunk/docs/userguide.xml 2009-01-29 18:24:39 UTC (rev 2484) @@ -1682,6 +1682,32 @@ will output the message 'Your name is John'.
+ +
@strsplit + +@strsplit(input,pattern) will split the input string based on separators +that match the given pattern and return a string array. + + +Example: + + + +string t = "my name is John and i like puppies."; +string words[] = @strsplit(t, "\\s"); +foreach word in words { + print(word); +} + + + +will output one word of the sentence on each line (though +not necessarily in order, due to the fact that foreach +iterations execute in parallel). + +
+ +
@toint @toint(input) will parse its input string into an integer. This can be Modified: trunk/libexec/vdl-lib.xml =================================================================== --- trunk/libexec/vdl-lib.xml 2009-01-29 16:56:02 UTC (rev 2483) +++ trunk/libexec/vdl-lib.xml 2009-01-29 18:24:39 UTC (rev 2484) @@ -7,6 +7,7 @@ + Modified: trunk/src/org/griphyn/vdl/engine/ProcedureSignature.java =================================================================== --- trunk/src/org/griphyn/vdl/engine/ProcedureSignature.java 2009-01-29 16:56:02 UTC (rev 2483) +++ trunk/src/org/griphyn/vdl/engine/ProcedureSignature.java 2009-01-29 18:24:39 UTC (rev 2484) @@ -182,6 +182,15 @@ strcut.addOutputArg(strcutOut1); functionsMap.put(strcut.getName(), strcut); + ProcedureSignature strsplit = new ProcedureSignature("strsplit"); + FormalArgumentSignature strsplitIn1 = new FormalArgumentSignature("string"); + strsplit.addInputArg(strsplitIn1); + FormalArgumentSignature strsplitIn2 = new FormalArgumentSignature("string"); + strsplit.addInputArg(strsplitIn2); + FormalArgumentSignature strsplitOut1 = new FormalArgumentSignature("string[]"); + strsplit.addOutputArg(strsplitOut1); + functionsMap.put(strsplit.getName(), strsplit); + ProcedureSignature toint = new ProcedureSignature("toint"); FormalArgumentSignature tointIn1 = new FormalArgumentSignature("string"); toint.addInputArg(tointIn1); Modified: trunk/src/org/griphyn/vdl/karajan/lib/swiftscript/Misc.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/lib/swiftscript/Misc.java 2009-01-29 16:56:02 UTC (rev 2483) +++ trunk/src/org/griphyn/vdl/karajan/lib/swiftscript/Misc.java 2009-01-29 18:24:39 UTC (rev 2484) @@ -12,6 +12,8 @@ import org.griphyn.vdl.karajan.lib.SwiftArg; import org.griphyn.vdl.mapping.DSHandle; import org.griphyn.vdl.mapping.InvalidPathException; +import org.griphyn.vdl.mapping.Path; +import org.griphyn.vdl.mapping.RootArrayDataNode; import 
org.griphyn.vdl.mapping.RootDataNode; import org.griphyn.vdl.type.NoSuchTypeException; import org.griphyn.vdl.type.Types; @@ -28,6 +30,7 @@ setArguments("swiftscript_trace", new Arg[] { Arg.VARGS }); setArguments("swiftscript_strcat", new Arg[] { Arg.VARGS }); setArguments("swiftscript_strcut", new Arg[] { PA_INPUT, PA_PATTERN }); + setArguments("swiftscript_strsplit", new Arg[] { PA_INPUT, PA_PATTERN }); setArguments("swiftscript_regexp", new Arg[] { PA_INPUT, PA_PATTERN, PA_TRANSFORM }); setArguments("swiftscript_toint", new Arg[] { PA_INPUT }); } @@ -88,6 +91,22 @@ handle.closeShallow(); return handle; } + + public DSHandle swiftscript_strsplit(VariableStack stack) throws ExecutionException, NoSuchTypeException, + InvalidPathException { + String str = TypeUtil.toString(PA_INPUT.getValue(stack)); + String pattern = TypeUtil.toString(PA_PATTERN.getValue(stack)); + + String[] split = str.split(pattern); + + DSHandle handle = new RootArrayDataNode(Types.STRING.arrayType()); + for (int i = 0; i < split.length; i++) { + DSHandle el = handle.getField(Path.EMPTY_PATH.addFirst(String.valueOf(i), true)); + el.setValue(split[i]); + } + handle.closeDeep(); + return handle; + } public DSHandle swiftscript_regexp(VariableStack stack) throws ExecutionException, NoSuchTypeException, InvalidPathException { Added: trunk/tests/language-behaviour/0054-strsplit.out.expected =================================================================== --- trunk/tests/language-behaviour/0054-strsplit.out.expected (rev 0) +++ trunk/tests/language-behaviour/0054-strsplit.out.expected 2009-01-29 18:24:39 UTC (rev 2484) @@ -0,0 +1 @@ +ab , c , def , ghij Added: trunk/tests/language-behaviour/0054-strsplit.swift =================================================================== --- trunk/tests/language-behaviour/0054-strsplit.swift (rev 0) +++ trunk/tests/language-behaviour/0054-strsplit.swift 2009-01-29 18:24:39 UTC (rev 2484) @@ -0,0 +1,14 @@ +type messagefile {} + +(messagefile t) 
greeting(string a, string b, string c, string d) { + app { + echo a "," b "," c "," d stdout=@filename(t); + } +} + +messagefile outfile <"0054-strsplit.out">; + +string s[] = @strsplit("ab c def ghij", "\\s"); + +outfile = greeting(s[0], s[1], s[2], s[3]); + From noreply at svn.ci.uchicago.edu Fri Jan 30 03:57:38 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Jan 2009 03:57:38 -0600 (CST) Subject: [Swift-commit] r2485 - trunk/docs Message-ID: <20090130095738.A36EE228115@www.ci.uchicago.edu> Author: benc Date: 2009-01-30 03:57:37 -0600 (Fri, 30 Jan 2009) New Revision: 2485 Modified: trunk/docs/userguide.xml Log: clarification of when strsplit was added, and tidy a couple of previous indications Modified: trunk/docs/userguide.xml =================================================================== --- trunk/docs/userguide.xml 2009-01-29 18:24:39 UTC (rev 2484) +++ trunk/docs/userguide.xml 2009-01-30 09:57:37 UTC (rev 2485) @@ -1686,7 +1686,7 @@
@strsplit @strsplit(input,pattern) will split the input string based on separators -that match the given pattern and return a string array. +that match the given pattern and return a string array. (since Swift 0.9) Example: @@ -2640,14 +2640,14 @@ specifies the maxwalltime to be used when submitting coaster workers. This profile entry is used by the coaster execution provider. If this entry is not specified, the coaster provider -will compute a maxwalltime based on the maxwalltime of jobs submitted. (since 0.9) +will compute a maxwalltime based on the maxwalltime of jobs submitted. (since Swift 0.9) coasterInternalIP specifies the internal address of the coaster head node, to be used by coaster workers to communicate with the coaster head node. This can be used when the address determined automatically by the coaster provider is inaccessible from coaster workers (for example, when the workers -reside on an unrouted internal network). (since 0.9) +reside on an unrouted internal network). (since Swift 0.9)
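A note on the @strsplit implementation committed in r2484: swiftscript_strsplit hands the pattern directly to Java's String.split, so the separator is interpreted as a Java regular expression. The standalone sketch below shows what that implies for the test input and for repeated or trailing separators. The class name StrSplitSketch and its split helper are illustrative only and are not part of Swift; whether SwiftScript exposes the empty elements in exactly this way is an assumption that has not been verified here.

```java
import java.util.Arrays;

// Illustrative sketch: mirrors only the str.split(pattern) call in Misc.java,
// not the DSHandle array construction around it.
public class StrSplitSketch {

    // Hypothetical helper wrapping the same JDK call the commit uses.
    static String[] split(String input, String pattern) {
        return input.split(pattern);
    }

    public static void main(String[] args) {
        // Same input as tests/language-behaviour/0054-strsplit.swift.
        System.out.println(Arrays.toString(split("ab c def ghij", "\\s")));

        // Two adjacent separators produce an empty element between them.
        System.out.println(Arrays.toString(split("ab  c", "\\s")));

        // Trailing empty elements are dropped by String.split.
        System.out.println(Arrays.toString(split("ab c ", "\\s")));
    }
}
```

If runs of whitespace should count as a single separator, a pattern such as "\\s+" can be passed instead of "\\s".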
From noreply at svn.ci.uchicago.edu Fri Jan 30 06:33:29 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Jan 2009 06:33:29 -0600 (CST) Subject: [Swift-commit] r2486 - trunk Message-ID: <20090130123329.C350322814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-30 06:33:29 -0600 (Fri, 30 Jan 2009) New Revision: 2486 Modified: trunk/CHANGES.txt Log: 0.8 release version details in CHANGES Modified: trunk/CHANGES.txt =================================================================== --- trunk/CHANGES.txt 2009-01-30 09:57:37 UTC (rev 2485) +++ trunk/CHANGES.txt 2009-01-30 12:33:29 UTC (rev 2486) @@ -1,3 +1,6 @@ +(01/23/09) +*** Swift 0.8 built from Swift SVN r2448 and cog SVN r2261 + (01/14/09) *** Application success/failure status reporting can now be done using CoG provider status, rather than the previous only choice of From noreply at svn.ci.uchicago.edu Fri Jan 30 08:03:51 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Jan 2009 08:03:51 -0600 (CST) Subject: [Swift-commit] r2487 - www/downloads Message-ID: <20090130140351.75CB52281E9@www.ci.uchicago.edu> Author: benc Date: 2009-01-30 08:03:50 -0600 (Fri, 30 Jan 2009) New Revision: 2487 Added: www/downloads/release-notes-0.8.txt Modified: www/downloads/index.php Log: 0.8 link on website and 0.8 release notes Modified: www/downloads/index.php =================================================================== --- www/downloads/index.php 2009-01-30 12:33:29 UTC (rev 2486) +++ www/downloads/index.php 2009-01-30 14:03:50 UTC (rev 2487) @@ -27,12 +27,13 @@

DOWNLOADS

Latest Release

-

Swift 0.7 - 2008/11/11

-

Swift v0.7 is a development release intended to release functionality +

Swift 0.8 - 2009/01/30

+

Swift v0.8 is a development release intended to release functionality and fixes that have gone in to trunk since v0.7. -[vdsk-0.7.tar.gz] -[release-notes-0.7.txt] +[vdsk-0.8.tar.gz] +[release-notes-0.8.txt]

+

Nightly Builds and Tests

@@ -116,6 +117,36 @@

Historical releases

+

Swift 0.7 - 2008/11/11

+

Swift v0.7 is a development release intended to release functionality +and fixes that have gone in to trunk since v0.6. +[vdsk-0.7.tar.gz] +[release-notes-0.7.txt] +

+

+As an alternative to the above traditional Swift packaging, Swift can be +downloaded and installed using +pacman, a package +manager commonly used on the Open Science Grid. +

+

+There are two installation targets:
+The first will install Swift alongside an existing VDT installation
+

+pacman -get http://www.ci.uchicago.edu/~benc/pacman:swift-0.7
+

+

+The second will install Swift as well as a number +of supporting packages from the VDT software release to support use of the +DOE CA and use of VOMS. (These packages are also available in a regular OSG or +VDT installation):
+

+pacman -get http://www.ci.uchicago.edu/~benc/pacman:swift-tools
+
+

+ + +

Swift 0.6 - 2008/08/25

Swift v0.6 is a development release intended to release functionality and fixes that have gone in to trunk since v0.5. Added: www/downloads/release-notes-0.8.txt =================================================================== --- www/downloads/release-notes-0.8.txt (rev 0) +++ www/downloads/release-notes-0.8.txt 2009-01-30 14:03:50 UTC (rev 2487) @@ -0,0 +1,53 @@ +These are the release notes for Swift 0.8 + +More information about Swift can be found at http://www.ci.uchicago.edu/swift/ + +Swift 0.8 is built from Swift SVN r2448 and cog SVN r2261 + +The following are significant changes since Swift 0.7 was released: + +Log plotting: + * The swift log-processing package is now included in the release. This + can be used to plot graphs of Swift logs using the swift-plot-log + command: + + $ swift-plot-log first-200901010000-abcdefg.log + + which will create a report in report-first-200901010000-abcdefg/index.html + +Execution: + * Some job execution systems do not set the initial job working directory + as specified by Swift. Previously Swift was unable to execute on such + systems. As of 0.8, Swift sets the initial job working directory in a + more robust fashion. This problem commonly affected OSG sites running + Condor. + + * Application success/failure status can now be taken from the CoG + provider layer, instead of using success/failure files on the remote + file system. This can reduce the load on the remote file system, but + does not work with all job execution mechanisms. Specifically, this + mechanism will not work with GRAM2 or clustering. + +Local execution: + * A deadlock caused when jobs output large amounts of data to stderr has been + fixed. + +Commandline client: + * A number of error messages have been improved. + * Console output is much less verbose: the progress ticker appears more often, + and individual application start and end messages are no longer shown. 
This + improves the quality of console output when jobs are executed at a high rate. + +Language: + * Handling of assignment statements in declarations and out of declarations + has been made more homogenous. Previously, some assignments could only + be made in a declaration statement (such as assigning arrays); and some + assignments could only be made away from a declaration statement (such as + expressions whose values were computed by some slow process). These two + forms of assignment should now be interchangeable. + + * Mapper parameters can now be results of slow computations, rather than + needing to be known by the time that declarations are first encountered. + This remedies previous unintuitive behaviour. + + From noreply at svn.ci.uchicago.edu Fri Jan 30 08:10:38 2009 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Jan 2009 08:10:38 -0600 (CST) Subject: [Swift-commit] r2488 - www/downloads Message-ID: <20090130141038.CA27E22814F@www.ci.uchicago.edu> Author: benc Date: 2009-01-30 08:10:38 -0600 (Fri, 30 Jan 2009) New Revision: 2488 Modified: www/downloads/index.php Log: correct title of link to 0.8 download Modified: www/downloads/index.php =================================================================== --- www/downloads/index.php 2009-01-30 14:03:50 UTC (rev 2487) +++ www/downloads/index.php 2009-01-30 14:10:38 UTC (rev 2488) @@ -30,7 +30,7 @@

Swift 0.8 - 2009/01/30

Swift v0.8 is a development release intended to release functionality and fixes that have gone in to trunk since v0.7. -[vdsk-0.8.tar.gz] +[swift-0.8.tar.gz] [release-notes-0.8.txt]