From noreply at svn.ci.uchicago.edu Mon Apr 5 12:51:48 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Apr 2010 12:51:48 -0500 (CDT) Subject: [Swift-commit] r3269 - text Message-ID: <20100405175148.78A339CCBB@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-05 12:51:47 -0500 (Mon, 05 Apr 2010) New Revision: 3269 Added: text/swift_pc3_fgcs/ Log: From noreply at svn.ci.uchicago.edu Mon Apr 5 13:52:06 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Apr 2010 13:52:06 -0500 (CDT) Subject: [Swift-commit] r3271 - text/swift_pc3_fgcs Message-ID: <20100405185206.B224C9CC81@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-05 13:52:06 -0500 (Mon, 05 Apr 2010) New Revision: 3271 Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex Log: Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex =================================================================== --- text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-05 17:55:07 UTC (rev 3270) +++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-05 18:52:06 UTC (rev 3271) @@ -254,19 +254,13 @@ Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. {\em used} and {\em wasGeneratedBy} are explicitly stored in the relation. For instance, if the tuple $\langle P_{id}, D_{id}, {\text I}, R \rangle$ is in the {\tt dataset\_usage} relation then it is equivalent to say $D_{id} \xleftarrow{\text{used(R)}} P_{id}$ in OPM. If we had 'O' instead of 'I' as the value for attribute {\tt direction} it would be equivalent to $P_{id} \xleftarrow{\text{wasGeneratedBy(R)}} D_{id}$ in OPM. +One of the main concerns with using a relational model for representing provenance is the need for querying over the transitive relation expressed in the {\tt dataset\_usage} table. For example, after executing the SwiftScript code in listing \ref{transit}, it might be desirable to find all dataset handles that lead to {\tt c}: that is, {\tt a} and {\tt b}. However simple SQL queries over the {\tt dataset\_usage} relation can only go back one step, leading to the answer {\tt b} but not to the answer {\tt a}. To address this problem, we generate a transitive closure table by an incremental evaluation system \cite{SQLTRANS}. This approach makes it straightforward to query over transitive relations using natural SQL syntax, at the expense of larger database size and longer import time. - -One of the main concerns with using a relational model for representing provenance is the need for querying over the transitive relation expressed in the {\tt dataset\_usage} table. For example, after executing the fragment: - - -\begin{lstlisting}[float,caption=A floating example,frame=lines] +\begin{lstlisting}[float,caption=Transitivity of provenance relationships.,frame=lines,label=transit] b = p(a); c = q(b); \end{lstlisting} - -it might be desirable to find all dataset handles that lead to {\tt c}: that is, {\tt a} and {\tt b}. However simple SQL queries over the {\tt dataset\_usage} relation can only go back one step, leading to the answer {\tt b} but not to the answer {\tt a}. To address this problem, we generate a transitive closure table by an incremental evaluation system \cite{SQLTRANS}. This approach makes it straightforward to query over transitive relations using natural SQL syntax, at the expense of larger database size and longer import time. 
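For illustration, once such a transitive closure table is populated, the multi-step question becomes a single selection. The sketch below is not taken from the schema described in this paper: it assumes a closure table named trans with columns (before, after) holding the transitively closed predecessor relation, and uses placeholder URIs; only the 'I'/'O' values of the {\tt direction} attribute follow the description above.

    -- a one-step join over dataset_usage reaches only b:
    select du_in.dataset_id
      from dataset_usage du_out, dataset_usage du_in
     where du_out.dataset_id = '<URI of c>'
       and du_out.direction  = 'O'
       and du_in.process_id  = du_out.process_id
       and du_in.direction   = 'I';

    -- the assumed closure table reaches both a and b:
    select before from trans where after = '<URI of c>';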
- \section{Third Provenance Challenge Queries} The workflow selected for PC3 receives a set of CSV files containing astronomical data, stores the contents of these files in a relational database, and performs a series of validation steps on the database. This workflow makes extensive use of conditional and loop flow controls and database operations. A Java implementation of the component applications of the workflow was provided in the Provenance Challenge Wiki \cite{pc}, where our Swift implementation is also available. Swift has an application (local or remote) catalog where wrapper scripts that call these component applications were listed. Most of the inputs and outputs of the component applications are XML files, so we defined a mapped variable type called {\tt xmlfile} for handling these files. Component applications are declared in the SwiftScript program, such as: From noreply at svn.ci.uchicago.edu Mon Apr 5 14:18:25 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Apr 2010 14:18:25 -0500 (CDT) Subject: [Swift-commit] r3272 - text/swift_pc3_fgcs Message-ID: <20100405191825.E73D49CCBB@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-05 14:18:25 -0500 (Mon, 05 Apr 2010) New Revision: 3272 Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex Log: Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex =================================================================== --- text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-05 18:52:06 UTC (rev 3271) +++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-05 19:18:25 UTC (rev 3272) @@ -226,7 +226,7 @@ %\begin{verbatim} \lstset{basicstyle=\tt \footnotesize} -\begin{lstlisting}[float,caption=sortProg Swift program.,frame=lines,label=sortprog] +\begin{lstlisting}[float,caption={\tt sortProg} Swift program.,frame=lines,label=sortprog] app (file o) sortProg(file i) { sort stdin=@i stdout=@o; } @@ -237,9 +237,9 @@ %\end{verbatim} %\normalsize -The Swift provenance model is close to OPM, but there are some differences. Dataset handles correspond closely with OPM artifacts as immutable representations of data. However they do not correspond exactly. An OPM artifact has unique provenance. However, a dataset handle can have multiple provenance descriptions. For example, in this SwiftScript program: +The Swift provenance model is close to OPM, but there are some differences. Dataset handles correspond closely with OPM artifacts as immutable representations of data. However they do not correspond exactly. An OPM artifact has unique provenance. However, a dataset handle can have multiple provenance descriptions. For example, in the SwiftScript program displayed in listing \ref{multi}, the expression {\tt c[0]} evaluates to the dataset handle corresponding to the variable {\tt a}. That dataset handle has a provenance trace indicating it was assigned from the constant value {\tt 7}. However, that dataset handle now has additional provenance indicating that it was output by applying the array access operator {\tt []} to the array {\tt c} and the numerical value {\tt 0}. -\begin{lstlisting}[float,caption=Example generating ,frame=lines] +\begin{lstlisting}[float,caption=Multiple provenance descriptions for a dataset.,frame=lines, label=multi] int a = 7; int b = 10; int c[] = [a, b]; @@ -247,8 +247,8 @@ \normalsize -the expression {\tt c[0]} evaluates to the dataset handle corresponding to the variable {\tt a}. That dataset handle has a provenance trace indicating it was assigned from the constant value {\tt 7}. 
However, that dataset handle now has additional provenance indicating that it was output by applying the array access operator {\tt []} to the array {\tt c} and the numerical value {\tt 0}. + In OPM, the artifact resulting from evaluating {\tt c[0]} is distinct from the artifact resulting from evaluating {\tt a}, although they may be annotated with an isIdenticalTo arc \cite{OPMcollections}. Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. {\em used} and {\em wasGeneratedBy} are explicitly stored in the relation. For instance, if the tuple $\langle P_{id}, D_{id}, {\text I}, R \rangle$ is in the {\tt dataset\_usage} relation then it is equivalent to say $D_{id} \xleftarrow{\text{used(R)}} P_{id}$ in OPM. If we had 'O' instead of 'I' as the value for attribute {\tt direction} it would be equivalent to From noreply at svn.ci.uchicago.edu Mon Apr 5 22:26:43 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 5 Apr 2010 22:26:43 -0500 (CDT) Subject: [Swift-commit] r3273 - trunk/src/org/griphyn/vdl/mapping Message-ID: <20100406032643.635079CC81@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-05 22:26:43 -0500 (Mon, 05 Apr 2010) New Revision: 3273 Modified: trunk/src/org/griphyn/vdl/mapping/RootArrayDataNode.java Log: make dependence on future mapper params direct Modified: trunk/src/org/griphyn/vdl/mapping/RootArrayDataNode.java =================================================================== --- trunk/src/org/griphyn/vdl/mapping/RootArrayDataNode.java 2010-04-05 19:18:25 UTC (rev 3272) +++ trunk/src/org/griphyn/vdl/mapping/RootArrayDataNode.java 2010-04-06 03:26:43 UTC (rev 3273) @@ -9,9 +9,10 @@ public class RootArrayDataNode extends ArrayDataNode implements DSHandleListener { - private boolean initialized=false; + private boolean initialized = false; private Mapper mapper; private Map params; + private DSHandle waitingMapperParam; /** * Instantiate a root array data node with specified type. 
@@ -23,9 +24,10 @@ public void init(Map params) { this.params = params; - if(this.params == null) { + if (this.params == null) { initialized(); - } else { + } + else { innerInit(); } } @@ -35,9 +37,9 @@ while(i.hasNext()) { Map.Entry entry = (Map.Entry) i.next(); Object v = entry.getValue(); - if(v instanceof DSHandle && !( (DSHandle)v).isClosed()) { - DSHandle dh = (DSHandle)v; - dh.addListener(this); + if (v instanceof DSHandle && !((DSHandle) v).isClosed()) { + waitingMapperParam = (DSHandle) v; + waitingMapperParam.addListener(this); return; } } @@ -93,25 +95,26 @@ return null; } - public Mapper getMapper() { - if(initialized) { + public synchronized Mapper getMapper() { + if (initialized) { return mapper; - } else { - throw new VDL2FutureException(this); } + else { + assert(waitingMapperParam != null); + throw new VDL2FutureException(waitingMapperParam); + } } public boolean isArray() { return true; } - public void setValue(Object value) { - super.setValue(value); - initialized(); - } + public void setValue(Object value) { + super.setValue(value); + initialized(); + } - private void initialized() { - initialized=true; - } - + private void initialized() { + initialized = true; + } } From noreply at svn.ci.uchicago.edu Tue Apr 6 16:53:15 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Tue, 6 Apr 2010 16:53:15 -0500 (CDT) Subject: [Swift-commit] r3274 - text/swift_pc3_fgcs Message-ID: <20100406215315.28E8C9CC81@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-06 16:53:14 -0500 (Tue, 06 Apr 2010) New Revision: 3274 Added: text/swift_pc3_fgcs/sortProgGraph.odg text/swift_pc3_fgcs/sortProgGraph.png Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex Log: Updated "Data Model" section. Added: text/swift_pc3_fgcs/sortProgGraph.odg =================================================================== (Binary files differ) Property changes on: text/swift_pc3_fgcs/sortProgGraph.odg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: text/swift_pc3_fgcs/sortProgGraph.png =================================================================== (Binary files differ) Property changes on: text/swift_pc3_fgcs/sortProgGraph.png ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex =================================================================== --- text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-06 03:26:43 UTC (rev 3273) +++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-06 21:53:14 UTC (rev 3274) @@ -44,9 +44,9 @@ \usepackage{upquote} %% if you use PostScript figures in your article %% use the graphics package for simple commands -%% \usepackage{graphics} +%%\usepackage{graphics} %% or use the graphicx package for more complicated commands -%% \usepackage{graphicx} +%%\usepackage{graphicx} %% or use the epsfig package if you prefer to use the old commands %% \usepackage{epsfig} @@ -65,39 +65,10 @@ \begin{frontmatter} -%% Title, authors and addresses -%% use the tnoteref command within \title for footnotes; -%% use the tnotetext command for theassociated footnote; -%% use the fnref command within \author or \address for footnotes; -%% use the fntext command for theassociated footnote; -%% use the corref command within \author for corresponding author footnotes; -%% use the cortext command for theassociated footnote; -%% use the ead command for the email address, -%% and the form \ead[url] for 
the home page: -%% \title{Title\tnoteref{label1}} -%% \tnotetext[label1]{} -%% \author{Name\corref{cor1}\fnref{label2}} -%% \ead{email address} -%% \ead[url]{home page} -%% \fntext[label2]{} -%% \cortext[cor1]{} -%% \address{Address\fnref{label3}} -%% \fntext[label3]{} - \title{Provenance Management in Swift} -%% use optional labels to link authors explicitly to addresses: -%% \author[label1,label2]{} -%% \address[label1]{} -%% \address[label2]{} -%\author{Ben Clifford} -%\author{Luiz M. R. Gadelha Jr.} -%\author{Marta Mattoso} -%\author{Michael Wilde} -%\author{Ian Foster} - \author[no]{Ben Clifford} \ead{benc at hawaga.org.uk} \author[coppe]{Luiz M. R. Gadelha Jr.} @@ -136,7 +107,7 @@ \section{Introduction} -The automation of large scale computational scientific experiments can be accomplished through the use of workflow management systems, parallel scripting tools, and related systems that allow the definition of the activities, input and output data, and data dependencies of such experiments. The manual analysis of the data resulting from their execution is not feasible, due to the usually large amount of information. Provenance systems can be used to facilitate this task, since they gather details about the design \cite{FrSi06} \cite{DeGa09} and execution of these experiments, such as data artifacts consumed and produced by their activities. They also make it easier to reproduce an experiment for the purpose of verification. +The automation of large scale computational scientific experiments can be accomplished through the use of workflow management systems \cite{DeGa09}, parallel scripting tools \cite{WiFo09}, and related systems that allow the definition of the activities, input and output data, and data dependencies of such experiments. The manual analysis of the data resulting from their execution is not feasible, due to the usually large amount of information. Provenance systems can be used to facilitate this task, since they gather details about the design \cite{FrSi06} and execution of these experiments, such as data artifacts consumed and produced by their activities. They also make it easier to reproduce an experiment for the purpose of verification. The Open Provenance Model (OPM) \cite{opm1.1} is an ongoing effort to standardize the representation of provenance information. OPM defines the entities {\em artifact}, {\em process}, and {\em agent} and the relations {\em used} (between an artifact and a process), {\em wasGeneratedBy} (between a process and an artifact), {\em wasControlledBy} (between an agent and a process), {\em wasTriggeredBy} (between two processes), and {\em wasDerivedFrom} (between two artifacts). @@ -146,30 +117,20 @@ \section{Data Model} \label{datamodel} -In Swift, data is represented by strongly-typed single-assignment variables. Data types can be {\em atomic} or {\em composite}. Atomic types are given by {\em primitive} types, such as integers or strings, or {\em mapped} types. Mapped types are used for representing and accessing data stored in local or remote files. {\em Composite} types are given by structures and arrays. In the Swift runtime, data is represented by a {\em dataset handle}. It may have as attributes a value, a filename, a child dataset handle (when it is a structure or an array), or a parent dataset handle (when it is contained in a structure or an array). Swift processes are given by invocations of external programs, functions, and operators. Dataset handles are produced and consumed by Swift processes. 
+In Swift, data is represented by strongly-typed single-assignment variables. Data types can be {\em atomic} or {\em composite}. Atomic types are given by {\em primitive} types, such as integers or strings, or {\em mapped} types. Mapped types are used for representing and accessing data stored in local or remote files. {\em Composite} types are given by structures and arrays. In the Swift runtime, data is represented by a {\em dataset handle}. It may have as attributes a value, a file name, a child dataset handle (when it is a structure or an array), or a parent dataset handle (when it is contained in a structure or an array). Swift processes are given by invocations of external programs, functions, and operators. Dataset handles are produced and consumed by Swift processes. -% brief intro to Swift with sortProg? - In the Swift provenance model, dataset handles and processes are recorded, as are the relations between them (either a process consuming a dataset handle as input, or a process producing a dataset handle as output). Each dataset handle and process is uniquely identified in time and space by a URI. This information is stored persistently in a relational database; we have also experimented with other database layouts \cite{ClGaMa09}. The two key relational tables used to store the structure of the provenance graph are {\tt processes}, that stores brief information about processes (see table \ref{processes_table}), and {\tt dataset\_usage}, that stores produced and consumed relationships between processes and dataset handles (see table \ref{dataset_usage_table}). Other tables \cite{ClGaMa09} are used to record details about each process and dataset, and other relationships such as containment. - - -Consider the Swiftscript program in listing \ref{sortprog}, which first describes a procedure ({\tt sortProg}, which calls the external executable {\tt sort}); then declares references to two files ({\tt f}, a reference to {\tt inputfile}, and {\tt g}, a reference to {\tt outputfile}); and finally calls the procedure {\tt sortProg}. When this program is run, provenance records are generated as follows: a process record is generated for the initial call to the {\tt sortProg(f)} procedure; a process record is generated for the {\tt @i} inside {\tt sortProg}, representing the evaluation of the {\tt @filename} function that Swift uses to determine the physical file name corresponding to the reference {\tt f}; a process record is generated for the {\tt @o} inside {\tt sortProg}, again representing the evaluation of the {\tt @filename} function, this time for the reference {\tt g}. - -Dataset handles are recorded for: the string {\tt "inputfile"}; the string {\tt "outputfile"}; file variable {\tt f}; the file variable {\tt g}; the filename of {\tt i}; the filename of {\tt o}. - -Input/output relations are recorded as: {\tt sortProg(f)} takes {\tt f} as an input; {\tt sortProg(f)} produces {\tt g} as an output; the {\tt @filename} function takes {\tt f} as an input; the {\tt @filename} function takes {\tt g} as an input; the {\tt @filename} produces the filename of {\tt i} as an output. 
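As a concrete, purely hypothetical illustration of the records just listed, the rows stored for this run could be sketched as follows; the identifier strings stand in for the URIs generated at run time, and the column name param_name for the parameter-name attribute of {\tt dataset\_usage} is an assumption:

    insert into processes (id, type) values ('process:sortProg-1', 'execution');
    insert into processes (id, type) values ('process:filename-i', 'function');
    insert into processes (id, type) values ('process:filename-o', 'function');
    -- sortProg consumed f as parameter i and produced g as parameter o:
    insert into dataset_usage (process_id, dataset_id, direction, param_name)
      values ('process:sortProg-1', 'dataset:f', 'I', 'i');
    insert into dataset_usage (process_id, dataset_id, direction, param_name)
      values ('process:sortProg-1', 'dataset:g', 'O', 'o');
    -- the @filename evaluations consumed f and g and produced the file name
    -- datasets; the remaining rows follow the same pattern.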
- \begin{table} \begin{center} -\caption{Database table {\tt processes}.\label{processes_table}} -\begin{tabular}{ | l | p{11cm} | } +\caption{Database relation {\tt processes}.\label{processes_table}} +\begin{tabular}{ | l | p{10cm} | } \hline {\bf Attribute} & {\bf Definition}\\ \hline - {\tt id} & the URI identifying the process\\ + {\tt id} & the URI identifying the process\\ \hline - {\tt type} & the type of the process: execution, compound procedure, function, operator\\ + {\tt type} & the type of the process: execution, compound procedure, function, operator\\ \hline \end{tabular} \end{center} @@ -177,10 +138,10 @@ \begin{table} \begin{center} -\caption{Database table {\tt dataset\_usage}.\label{dataset_usage_table}} +\caption{Database relation {\tt dataset\_usage}.\label{dataset_usage_table}} \begin{tabular}{ | l | p{9.8cm} | } \hline - {\bf Attribute} & {\bf Definition}\\ + {\bf Attribute} & {\bf Definition}\\ \hline {\tt process\_id} & a URI identifying the process end of the relationship\\ \hline @@ -211,7 +172,6 @@ %\item[(V)] the filename of {\tt o}. %\end{enumerate} - %Input/output relations are recorded as: %\begin{itemize} @@ -222,9 +182,6 @@ %\item (B) produces (U) as an output. %\end{itemize} -%%\footnotesize - -%\begin{verbatim} \lstset{basicstyle=\tt \footnotesize} \begin{lstlisting}[float,caption={\tt sortProg} Swift program.,frame=lines,label=sortprog] app (file o) sortProg(file i) { @@ -234,11 +191,39 @@ file g <"outputfile">; g = sortProg(f); \end{lstlisting} -%\end{verbatim} -%\normalsize -The Swift provenance model is close to OPM, but there are some differences. Dataset handles correspond closely with OPM artifacts as immutable representations of data. However they do not correspond exactly. An OPM artifact has unique provenance. However, a dataset handle can have multiple provenance descriptions. For example, in the SwiftScript program displayed in listing \ref{multi}, the expression {\tt c[0]} evaluates to the dataset handle corresponding to the variable {\tt a}. That dataset handle has a provenance trace indicating it was assigned from the constant value {\tt 7}. However, that dataset handle now has additional provenance indicating that it was output by applying the array access operator {\tt []} to the array {\tt c} and the numerical value {\tt 0}. +Consider the Swiftscript program in listing \ref{sortprog}, which first describes a procedure ({\tt sortProg}, which calls the external executable {\tt sort}); then declares references to two files, ({\tt f}, a reference to {\tt inputfile}, and {\tt g}, a reference to {\tt outputfile}); and finally calls the procedure {\tt sortProg}. +When this program is run, provenance records are generated as follows: + a process record is generated for the initial call to the {\tt sortProg(f)} procedure; + a process record is generated for the {\tt @i} inside {\tt sortProg}, representing the evaluation of the {\tt @filename} function that Swift uses to determine the physical file name corresponding to the reference {\tt f}; + a process record is generated for the {\tt @o} inside {\tt sortProg}, again representing the evaluation of the {\tt @filename} function, this time for the reference {\tt g}. +Dataset handles are recorded for: + the string {\tt "inputfile"}; + the string {\tt "outputfile"}; + file variable {\tt f}; + the file variable {\tt g}; + the file name of {\tt i}; + the file name of {\tt o}. 
+Input and output relations are recorded as: + {\tt sortProg(f)} takes {\tt f} as an input; + {\tt sortProg(f)} produces {\tt g} as an output; + the {\tt @i} function takes {\tt f} as an input; + {\tt @i} produces the file name of {\tt i} as an output; + the {\tt @o} function takes {\tt g} as an input; + {\tt @o} produces the file name of {\tt o} as an output. +The Swift provenance model is close to OPM, but there are some differences. Dataset handles correspond closely with OPM artifacts as immutable representations of data. However they do not correspond exactly. An OPM artifact has unique provenance. However, a dataset handle can have multiple provenance descriptions. For example, given the SwiftScript program displayed in listing \ref{multi}, the expression {\tt c[0]} evaluates to the dataset handle corresponding to the variable {\tt a}. That dataset handle has a provenance trace indicating it was assigned from the constant value {\tt 7}. However, that dataset handle now has additional provenance indicating that it was output by applying the array access operator {\tt []} to the array {\tt c} and the numerical value {\tt 0}. In OPM, the artifact resulting from evaluating {\tt c[0]} is distinct from the artifact resulting from evaluating {\tt a}, although they may be annotated with an {\em isIdenticalTo} arc \cite{OPMcollections}. The OPM entity agent is currently not represented in Swift's provenance model. + +Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. It explicitly stores the {\em used} and {\em wasGeneratedBy} relationships. For instance, the provenance database for {\tt sortProg} contains the tuples $\langle \text{{\tt sortProg}}, \text{{\tt f}}, \text{In}, \text{{\tt i}} \rangle$ and $\langle \text{{\tt sortProg}}, \text{{\tt g}}, \text{Out}, \text{{\tt o}} \rangle$. In OPM, this is equivalent to say $\text{{\tt f}} \xleftarrow{\text{used(\text{{\tt i}})}} \text{{\tt sortProg}}$ and $\text{{\tt sortProg}} \xleftarrow{\text{wasGeneratedBy(\text{{\tt o}})}} \text{{\tt g}}$ respectively. Figure \ref{sortProgGraph} shows an OPM graph containing the relationships stored in the provenance database for the {\tt sortProg} example. + +\begin{figure*} +\caption{Provenance graph of {\tt sortProg}.\label{sortProgGraph}} +\begin{center} +\includegraphics[width=13.5cm]{sortProgGraph} +\end{center} +\label{tsp} +\end{figure*} + \begin{lstlisting}[float,caption=Multiple provenance descriptions for a dataset.,frame=lines, label=multi] int a = 7; int b = 10; int c[] = [a, b]; @@ -246,14 +231,6 @@ \end{lstlisting} \normalsize - - - -In OPM, the artifact resulting from evaluating {\tt c[0]} is distinct from the artifact resulting from evaluating {\tt a}, although they may be annotated with an isIdenticalTo arc \cite{OPMcollections}. - -Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. {\em used} and {\em wasGeneratedBy} are explicitly stored in the relation. For instance, if the tuple $\langle P_{id}, D_{id}, {\text I}, R \rangle$ is in the {\tt dataset\_usage} relation then it is equivalent to say $D_{id} \xleftarrow{\text{used(R)}} P_{id}$ in OPM. If we had 'O' instead of 'I' as the value for attribute {\tt direction} it would be equivalent to -$P_{id} \xleftarrow{\text{wasGeneratedBy(R)}} D_{id}$ in OPM. 
- One of the main concerns with using a relational model for representing provenance is the need for querying over the transitive relation expressed in the {\tt dataset\_usage} table. For example, after executing the SwiftScript code in listing \ref{transit}, it might be desirable to find all dataset handles that lead to {\tt c}: that is, {\tt a} and {\tt b}. However simple SQL queries over the {\tt dataset\_usage} relation can only go back one step, leading to the answer {\tt b} but not to the answer {\tt a}. To address this problem, we generate a transitive closure table by an incremental evaluation system \cite{SQLTRANS}. This approach makes it straightforward to query over transitive relations using natural SQL syntax, at the expense of larger database size and longer import time. \begin{lstlisting}[float,caption=Transitivity of provenance relationships.,frame=lines,label=transit] @@ -291,11 +268,11 @@ In our first attempt to implement LoadWorkflow in Swift, we found the use of the foreach loop problematic because the database routines are internal to the Java implementation and, therefore, Swift has no control over them. Since Swift tries to parallelize the {\tt foreach} iterations it ended up incorrectly parallelizing the database operations. It was necessary to serialize the execution of the workflow to avoid this problem. Most of the PC3 queries are for row-level database provenance. A workaround for gathering provenance about database operations was implemented by modifying the application database so that for every row inserted or modified, an entry containing the execution identifier of the Swift process that performed the corresponding database operation is also inserted. -{\em Core Query 1}. The first query asks, for a given application database row, which CSV files contributed to it. The strategy used to answer this query is to determine input CSV files that precede, in the transitivity table, the process that inserted the row. This query can be answered by first obtaining the Swift process identifier of the process that inserted the row from the annotations included in the application database. Finally, we query for filenames of datasets that contain CSV inputs in the set of predecessors of the process that inserted the row. +{\em Core Query 1}. The first query asks, for a given application database row, which CSV files contributed to it. The strategy used to answer this query is to determine input CSV files that precede, in the transitivity table, the process that inserted the row. This query can be answered by first obtaining the Swift process identifier of the process that inserted the row from the annotations included in the application database. Finally, we query for file names of datasets that contain CSV inputs in the set of predecessors of the process that inserted the row. -{\em Core Query 2}. This query asks if the range check (IsMatchColumnRanges) was performed in a particular table, given that a user found values that were not expected in it. This is implemented with the SQL query: +{\em Core Query 2}. This query asks if the range check (IsMatchColumnRanges) was performed in a particular table, given that a user found values that were not expected in it. This is implemented with the SQL query displayed in listing \ref{qc2}. It returns the input parameter XML for all IsMatchColumnRanges calls. These are XML values, and it is necessary to examine the resulting XML to determine if it was invoked for the specific table. 
There is unpleasant cross-format joining necessary here to get an actual yes/no result properly, although we could use a {\tt LIKE} clause to examine the value. -\begin{lstlisting}[float,caption=A floating example,frame=lines] +\begin{lstlisting}[float,caption=Core query 2.,frame=lines, label=qc2] > select dataset_values.value from processes, invocation_procedure_names, dataset_usage, @@ -310,8 +287,8 @@ dataset_usage.dataset_id = dataset_values.dataset_id; \end{lstlisting} -This returns the input parameter XML for all IsMatchColumnRanges calls. These are XML values, and it is necessary to examine the resulting XML to determine if it was invoked for the specific table. There is unpleasant cross-format joining necessary here to get an actual yes/no result properly, although we probably could use a {\tt LIKE} clause to examine the value. + {\em Core Query 3}. The third core query asks which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value. This uses the additional annotations made, that only store which process originally inserted a row, not which processes have modified a row. So to some extent, rows are regarded a bit like artifacts (though not first order artifacts in the provenance database); and we can only answer questions about the provenance of rows, not the individual fields within those rows. That is sufficient for this query, though. First find the row that contains the interesting value and extract its {\tt IMAGEID}. Then find the process that created the {\tt IMAGEID} by querying the Derby database table {\tt P2IMAGEPROV}. This gives the process ID for the process that created the row. Now query the transitive closure table for all predecessors for that process (as in the first core query). This will produce all processes and artifacts that preceded this row creation. Our answer differs from the sample answer because we have sequenced access to the database, rather than regarding each row as a proper first-order artifact. The entire database state at a particular time is a successor to all previous database accessing operations, so any process which led to any database access before the row in question is regarded as a necessary operation. This is undesirable in some respects, but desirable in others. For example, a row insert only works because previous database operations which inserted other rows did not insert a conflicting primary key - so there is data dependency between the different operations even though they operate on different rows. {\em Optional Query 1}. The computation halts due to failing an IsMatchTable-ColumnRanges check. How many tables were loaded successfully before the computation halted due to the failed check? The answer was given by querying how many load processes are known to the database (over all recorded computation), which can be restricted to a particular computation. 
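Under the same illustrative assumptions as before, Core Query 1 can be sketched in SQL: a closure table trans(before, after), a detail table dataset_filenames(dataset_id, filename) whose name is assumed rather than taken from the text above, and a placeholder for the process identifier recovered from the application-database annotations.

    select df.filename
      from trans t, dataset_filenames df
     where t.after  = '<URI of the inserting process>'
       and t.before = df.dataset_id
       and df.filename like '%.csv';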
From noreply at svn.ci.uchicago.edu Wed Apr 7 16:03:50 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 7 Apr 2010 16:03:50 -0500 (CDT) Subject: [Swift-commit] r3275 - text/swift_pc3_fgcs Message-ID: <20100407210350.304159CC81@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-07 16:03:49 -0500 (Wed, 07 Apr 2010) New Revision: 3275 Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex Log: Update to Data Model section Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex =================================================================== --- text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-06 21:53:14 UTC (rev 3274) +++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-07 21:03:49 UTC (rev 3275) @@ -111,7 +111,7 @@ The Open Provenance Model (OPM) \cite{opm1.1} is an ongoing effort to standardize the representation of provenance information. OPM defines the entities {\em artifact}, {\em process}, and {\em agent} and the relations {\em used} (between an artifact and a process), {\em wasGeneratedBy} (between a process and an artifact), {\em wasControlledBy} (between an agent and a process), {\em wasTriggeredBy} (between two processes), and {\em wasDerivedFrom} (between two artifacts). -The Swift parallel scripting system \cite{swift} \cite{WiFo09} is a successor of the Virtual Data System (VDS) \cite{chimera} \cite{ZhWiFo06} \cite{ClFo08}. It allows the specification, management and execution of large-scale scientific workflows on parallel and distributed environments. The SwiftScript language is used for high-level specification of computations, it has features such as data types, data mappers, conditional and repetition flow controls, and sub-workflow composition. Its data model and type system are derived from XDTM \cite{xdtm}, which allows for the definition of abstract data types and objects without referring to their physical representation. If some dataset does not reside in main memory, its materialization is done through the use of data mappers. Procedures perform logical operations on input data, without modifying them. Swiftscript also allows procedures to be composed to define more complex computations. By analyzing the inputs and outputs of these procedures, the system determines data dependencies between them. This information is used to execute procedures that have no mutual data dependencies in parallel. It supports common execution managers for clustered systems and for grid environments, such as Falkon \cite{falkon}, which provides high job execution throughput. Swift logs a variety of information about each computation. This information can be exported to a relational database that uses a data model similar to OPM. +The Swift parallel scripting system \cite{swift} \cite{WiFo09} is a successor of the Virtual Data System (VDS) \cite{chimera} \cite{ZhWiFo06} \cite{ClFo08}. It allows the specification, management and execution of large-scale scientific workflows on parallel and distributed environments. The SwiftScript language is used for high-level specification of computations, it has features such as data types, data mappers, conditional and repetition flow controls, and sub-workflow composition. Its data model and type system are derived from XDTM \cite{xdtm}, which allows for the definition of abstract data types and objects without referring to their physical representation. If some dataset does not reside in main memory, its materialization is done through the use of data mappers. 
Procedures perform logical operations on input data, without modifying them. SwiftScript also allows procedures to be composed to define more complex computations. By analyzing the inputs and outputs of these procedures, the system determines data dependencies between them. This information is used to execute procedures that have no mutual data dependencies in parallel. It supports common execution managers for clustered systems and for grid environments, such as Falkon \cite{falkon}, which provides high job execution throughput. Swift logs a variety of information about each computation. This information can be exported to a relational database that uses a data model similar to OPM. The objective of this paper is to present the local and remote provenance recording and analysis capabilities of Swift. In the sections that follow, we demonstrate the provenance capabilities of the Swift system and evaluate its interoperability with other systems through the use of OPM. We describe the provenance data model of the Swift system and compare it to OPM. We also describe activities performed within the Third Provenance Challenge (PC3) which consisted of implementing a specific scientific workflow (LoadWorkflow), performing provenance queries, and exchanging provenance information with other systems. @@ -119,7 +119,7 @@ In Swift, data is represented by strongly-typed single-assignment variables. Data types can be {\em atomic} or {\em composite}. Atomic types are given by {\em primitive} types, such as integers or strings, or {\em mapped} types. Mapped types are used for representing and accessing data stored in local or remote files. {\em Composite} types are given by structures and arrays. In the Swift runtime, data is represented by a {\em dataset handle}. It may have as attributes a value, a file name, a child dataset handle (when it is a structure or an array), or a parent dataset handle (when it is contained in a structure or an array). Swift processes are given by invocations of external programs, functions, and operators. Dataset handles are produced and consumed by Swift processes. -In the Swift provenance model, dataset handles and processes are recorded, as are the relations between them (either a process consuming a dataset handle as input, or a process producing a dataset handle as output). Each dataset handle and process is uniquely identified in time and space by a URI. This information is stored persistently in a relational database; we have also experimented with other database layouts \cite{ClGaMa09}. +In the Swift provenance model, dataset handles and processes are recorded, as are the relations between them (either a process consuming a dataset handle as input, or a process producing a dataset handle as output). Each dataset handle and process is uniquely identified in time and space by a URI. This information is stored persistently in a relational database; we have also experimented with other database layouts \cite{ClGaMa09}. 
The two key relational tables used to store the structure of the provenance graph are {\tt processes}, that stores brief information about processes (see table \ref{processes_table}), and {\tt dataset\_usage}, that stores produced and consumed relationships between processes and dataset handles (see table \ref{dataset_usage_table}). Other tables (see \cite{ClGaMa09} for details) are used to record details about each process and dataset, and other relationships such as containment. \begin{table} \begin{center} @@ -192,7 +192,7 @@ g = sortProg(f); \end{lstlisting} -Consider the Swiftscript program in listing \ref{sortprog}, which first describes a procedure ({\tt sortProg}, which calls the external executable {\tt sort}); then declares references to two files, ({\tt f}, a reference to {\tt inputfile}, and {\tt g}, a reference to {\tt outputfile}); and finally calls the procedure {\tt sortProg}. +Consider the SwiftScript program in listing \ref{sortprog}, which first describes a procedure ({\tt sortProg}, which calls the external executable {\tt sort}); then declares references to two files, ({\tt f}, a reference to {\tt inputfile}, and {\tt g}, a reference to {\tt outputfile}); and finally calls the procedure {\tt sortProg}. When this program is run, provenance records are generated as follows: a process record is generated for the initial call to the {\tt sortProg(f)} procedure; a process record is generated for the {\tt @i} inside {\tt sortProg}, representing the evaluation of the {\tt @filename} function that Swift uses to determine the physical file name corresponding to the reference {\tt f}; @@ -214,10 +214,10 @@ The Swift provenance model is close to OPM, but there are some differences. Dataset handles correspond closely with OPM artifacts as immutable representations of data. However they do not correspond exactly. An OPM artifact has unique provenance. However, a dataset handle can have multiple provenance descriptions. For example, given the SwiftScript program displayed in listing \ref{multi}, the expression {\tt c[0]} evaluates to the dataset handle corresponding to the variable {\tt a}. That dataset handle has a provenance trace indicating it was assigned from the constant value {\tt 7}. However, that dataset handle now has additional provenance indicating that it was output by applying the array access operator {\tt []} to the array {\tt c} and the numerical value {\tt 0}. In OPM, the artifact resulting from evaluating {\tt c[0]} is distinct from the artifact resulting from evaluating {\tt a}, although they may be annotated with an {\em isIdenticalTo} arc \cite{OPMcollections}. The OPM entity agent is currently not represented in Swift's provenance model. -Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. It explicitly stores the {\em used} and {\em wasGeneratedBy} relationships. For instance, the provenance database for {\tt sortProg} contains the tuples $\langle \text{{\tt sortProg}}, \text{{\tt f}}, \text{In}, \text{{\tt i}} \rangle$ and $\langle \text{{\tt sortProg}}, \text{{\tt g}}, \text{Out}, \text{{\tt o}} \rangle$. In OPM, this is equivalent to say $\text{{\tt f}} \xleftarrow{\text{used(\text{{\tt i}})}} \text{{\tt sortProg}}$ and $\text{{\tt sortProg}} \xleftarrow{\text{wasGeneratedBy(\text{{\tt o}})}} \text{{\tt g}}$ respectively. Figure \ref{sortProgGraph} shows an OPM graph containing the relationships stored in the provenance database for the {\tt sortProg} example. 
+Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. It explicitly stores the {\em used} and {\em wasGeneratedBy} relationships. For instance, the provenance database for {\tt sortProg} contains the tuples $\langle \text{{\tt sortProg}}, \text{{\tt f}}, \text{In}, \text{{\tt i}} \rangle$ and $\langle \text{{\tt sortProg}}, \text{{\tt g}}, \text{Out}, \text{{\tt o}} \rangle$. In OPM, this is equivalent to saying $\text{{\tt f}} \xleftarrow{\text{used(\text{{\tt i}})}} \text{{\tt sortProg}}$ and $\text{{\tt sortProg}} \xleftarrow{\text{wasGeneratedBy(\text{{\tt o}})}} \text{{\tt g}}$ respectively. {\em wasTriggeredBy} and {\em wasDerivedFrom} dependency relationships can be inferred from {\tt dataset\_usage}; in the {\tt sortProg} example we have ${\tt f} \xleftarrow{\text{wasDerivedFrom}} {\tt g}$. Figure \ref{sortProgGraph} shows the relationships stored in Swift's provenance database for the {\tt sortProg} example using OPM notation. \begin{figure*} -\caption{Provenance graph of {\tt sortProg}.\label{sortProgGraph}} +\caption{Provenance relationships of {\tt sortProg}.\label{sortProgGraph}} \begin{center} \includegraphics[width=13.5cm]{sortProgGraph} \end{center} \label{tsp} \end{figure*} @@ -287,8 +287,6 @@ dataset_usage.dataset_id = dataset_values.dataset_id; \end{lstlisting} - - {\em Core Query 3}. The third core query asks which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value. This uses the additional annotations made, that only store which process originally inserted a row, not which processes have modified a row. So to some extent, rows are regarded a bit like artifacts (though not first order artifacts in the provenance database); and we can only answer questions about the provenance of rows, not the individual fields within those rows. That is sufficient for this query, though. First find the row that contains the interesting value and extract its {\tt IMAGEID}. Then find the process that created the {\tt IMAGEID} by querying the Derby database table {\tt P2IMAGEPROV}. This gives the process ID for the process that created the row. Now query the transitive closure table for all predecessors for that process (as in the first core query). This will produce all processes and artifacts that preceded this row creation. Our answer differs from the sample answer because we have sequenced access to the database, rather than regarding each row as a proper first-order artifact. The entire database state at a particular time is a successor to all previous database accessing operations, so any process which led to any database access before the row in question is regarded as a necessary operation. This is undesirable in some respects, but desirable in others. For example, a row insert only works because previous database operations which inserted other rows did not insert a conflicting primary key - so there is data dependency between the different operations even though they operate on different rows. {\em Optional Query 1}. The computation halts due to failing an IsMatchTable-ColumnRanges check. How many tables were loaded successfully before the computation halted due to the failed check? The answer was given by querying how many load processes are known to the database (over all recorded computation), which can be restricted to a particular computation. 
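The derivation of OPM relations described in the Data Model update above can be sketched as SQL views over {\tt dataset\_usage}. The view names below are illustrative only; the 'I'/'O' literals follow the earlier description of the {\tt direction} attribute, and the param_name column is an assumption.

    -- used and wasGeneratedBy read directly off the direction attribute:
    create view used as
      select process_id, dataset_id, param_name as role
        from dataset_usage where direction = 'I';
    create view was_generated_by as
      select dataset_id, process_id, param_name as role
        from dataset_usage where direction = 'O';
    -- an output of a process is derived from each input of that process:
    create view was_derived_from as
      select g.dataset_id as derived, u.dataset_id as source
        from dataset_usage g, dataset_usage u
       where g.process_id = u.process_id
         and g.direction = 'O' and u.direction = 'I';
    -- wasTriggeredBy would join was_generated_by to used on dataset_id.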
From noreply at svn.ci.uchicago.edu Thu Apr 8 21:44:06 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 8 Apr 2010 21:44:06 -0500 (CDT) Subject: [Swift-commit] r3276 - trunk/libexec Message-ID: <20100409024406.45B179CC99@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-08 21:44:05 -0500 (Thu, 08 Apr 2010) New Revision: 3276 Modified: trunk/libexec/_swiftwrap Log: fixed mv -v issues on busybox Modified: trunk/libexec/_swiftwrap =================================================================== --- trunk/libexec/_swiftwrap 2010-04-07 21:03:49 UTC (rev 3275) +++ trunk/libexec/_swiftwrap 2010-04-09 02:44:05 UTC (rev 3276) @@ -1,3 +1,4 @@ +#!/bin/bash # this script must be invoked inside of bash, not plain sh # note that this script modifies $IFS @@ -533,7 +534,7 @@ logstate "MOVING_OUTPUTS $OUTF" for O in $OUTF ; do if ! contains SKIPPED_OUTPUT $O ; then - mv -v "$DIR/$O" "$WFDIR/shared/$O" 2>&1 >& "$INFO" + mv "$DIR/$O" "$WFDIR/shared/$O" 2>&1 >& "$INFO" checkError 254 "Failed to move output file $O to shared directory" fi done From noreply at svn.ci.uchicago.edu Sat Apr 10 14:16:57 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sat, 10 Apr 2010 14:16:57 -0500 (CDT) Subject: [Swift-commit] r3277 - trunk/libexec Message-ID: <20100410191657.8043D9CCAA@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-10 14:16:56 -0500 (Sat, 10 Apr 2010) New Revision: 3277 Modified: trunk/libexec/_swiftwrap trunk/libexec/cdm_broadcast.sh trunk/libexec/vdl-int.k Log: Corrections to CDM BROADCAST for BG/P. Modified: trunk/libexec/_swiftwrap =================================================================== --- trunk/libexec/_swiftwrap 2010-04-09 02:44:05 UTC (rev 3276) +++ trunk/libexec/_swiftwrap 2010-04-10 19:16:56 UTC (rev 3277) @@ -177,9 +177,9 @@ log "CDM[LOCAL]: Copying $DIRECT_DIR/$FILE to $JOBDIR/$FILE" if [ $MODE == "INPUT" ]; then [ -f "$DIRECT_DIR/$FILE" ] - checkError 254 "CDM[DIRECT]: $REMOTE_DIR/$FILE does not exist!" + checkError 254 "CDM[LOCAL]: $REMOTE_DIR/$FILE does not exist!" $TOOL $FLAGS $REMOTE_DIR/$FILE $JOBDIR/$FILE - checkError 254 "CDM[DIRECT]: Tool failed!" + checkError 254 "CDM[LOCAL]: Tool failed!" elif [ $MODE == "OUTPUT" ]; then log "CDM[LOCAL]..." else @@ -188,14 +188,15 @@ ;; BROADCAST) BROADCAST_DIR=${ARGS[0]} - log "CDM[BROADCAST]: Linking $JOBDIR/$FILE to $BROADCAST_DIR/$FILE" + if [ $MODE == "INPUT" ]; then + log "CDM[BROADCAST]: Linking $JOBDIR/$FILE to $BROADCAST_DIR/$FILE" [ -f "$BROADCAST_DIR/$FILE" ] checkError 254 "CDM[BROADCAST]: $BROADCAST_DIR/$FILE does not exist!" ln -s $BROADCAST_DIR/$FILE $JOBDIR/$FILE checkError 254 "CDM[BROADCAST]: Linking to $BROADCAST_DIR/$FILE failed!" else - fail 254 "Cannot BROADCAST an output file!" 
+ echo "CDM[BROADCAST]: Skipping output file: ${FILE}" fi ;; GATHER) @@ -214,7 +215,9 @@ fi CDM_POLICY=$( cdm_lookup $L $CDM_FILE ) - cdm_local_output_perform $L $CDM_POLICY + if [[ $CDM_POLICY == "LOCAL" ]]; then + cdm_local_output_perform $L $CDM_POLICY + fi } cdm_local_output_perform() @@ -448,7 +451,8 @@ GATHER_OUTPUT=() for L in $OUTF ; do CDM_POLICY=$( cdm_lookup $L $CDM_FILE ) - if [ $CDM_POLICY != "DEFAULT" ]; then + if [[ $CDM_POLICY != "DEFAULT" && + $CDM_POLICY != "BROADCAST"* ]]; then log "CDM_POLICY: $L -> $CDM_POLICY" eval cdm_action $DIR "OUTPUT" $L $CDM_POLICY SKIPPED_OUTPUT=( $SKIPPED_OUTPUT $L ) Modified: trunk/libexec/cdm_broadcast.sh =================================================================== --- trunk/libexec/cdm_broadcast.sh 2010-04-09 02:44:05 UTC (rev 3276) +++ trunk/libexec/cdm_broadcast.sh 2010-04-10 19:16:56 UTC (rev 3277) @@ -2,13 +2,53 @@ SWIFT_HOME=$( dirname $( dirname $0 ) ) LOG=${SWIFT_HOME}/etc/cdm_broadcast.log + +bgp_broadcast() { + DIR=$1 + FILE=$2 + DEST=$3 + if [[ ! -f ips.list ]] + then + BLOCKS=$( qstat -u ${USER} | grep ${USER} | awk '{ print $6 }' ) + IPS=$( listip ${BLOCKS} ) + for IP in ${IPS} + do + echo ${IP} + done >> ip.list + else + while read T + do + BLOCKS="$BLOCKS $T" + done < ip.list + fi + for IP in ${BLOCKS} + do + ssh ${IP} /bin.rd/f2cn ${DIR}/${FILE} ${DEST}/${FILE} + done +} + +local_broadcast() +{ + DIR=$1 + FILE=$2 + DEST=$3 + cp -v ${FILE} ${DEST}/${FILE} +} + +{ + declare -p PWD set -x FILE=$1 DIR=$2 DEST=$3 - - cp -v ${DIR}/${FILE} ${DEST} + if [[ $( uname -p ) == "ppc64" ]] + then + bgp_broadcast ${DIR} ${FILE} ${DEST} + else + bgp_local ${DIR} ${FILE} ${DEST} + fi + } >> ${LOG} 2>&1 Modified: trunk/libexec/vdl-int.k =================================================================== --- trunk/libexec/vdl-int.k 2010-04-09 02:44:05 UTC (rev 3276) +++ trunk/libexec/vdl-int.k 2010-04-10 19:16:56 UTC (rev 3277) @@ -314,8 +314,8 @@ log(LOG:DEBUG, "FILE_STAGE_IN_BROADCAST file={srcfile} policy={policy}") cdm:broadcast(srcfile=srcfile, srcdir=srcdir)) else(log(LOG:DEBUG, "FILE_STAGE_IN_SKIP file={srcfile} policy={policy}"))) - log(LOG:DEBUG, "FILE_STAGE_IN_END file={srcfile} srchost={srchost} srcdir={srcdir} srcname={srcfile} ", - "desthost={desthost} destdir={destdir} provider={provider}") + log(LOG:DEBUG, "FILE_STAGE_IN_END file={srcfile} srchost={srchost} srcdir={srcdir} srcname={srcfile} ", + "desthost={desthost} destdir={destdir} provider={provider}") ) ) @@ -338,10 +338,11 @@ //make sure we do have the directory on the client side dir:make(ldir, host=dhost, provider=provider) policy := cdm:query(query=file) - if (policy == "DEFAULT" then( + if (sys:or(policy == "DEFAULT", policy == "BROADCAST") + then( restartOnError(".*", 2 - task:transfer(srchost=host, srcfile=bname, - srcdir=rdir, destdir=ldir, desthost=dhost, destprovider=provider))) + task:transfer(srchost=host, srcfile=bname,srcdir=rdir, + destdir=ldir, desthost=dhost, destprovider=provider))) else(log(LOG:DEBUG, "FILE_STAGE_OUT_SKIP srcname={bname}")) ) log(LOG:DEBUG, "FILE_STAGE_OUT_END srcname={bname} srcdir={rdir} srchost={host} ", From noreply at svn.ci.uchicago.edu Sun Apr 11 09:15:00 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Apr 2010 09:15:00 -0500 (CDT) Subject: [Swift-commit] r3278 - trunk/src/org/globus/swift/data/policy Message-ID: <20100411141500.3A8269CC97@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-11 09:14:59 -0500 (Sun, 11 Apr 2010) New Revision: 3278 Added: 
trunk/src/org/globus/swift/data/policy/AllocationHook.java Log: Draft of hook for Coasters integration with CDM. Added: trunk/src/org/globus/swift/data/policy/AllocationHook.java =================================================================== --- trunk/src/org/globus/swift/data/policy/AllocationHook.java (rev 0) +++ trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:14:59 UTC (rev 3278) @@ -0,0 +1,12 @@ + +import org.globus.cog.abstraction.interfaces.Status; +import org.globus.cog.abstraction.impl.common.StatusEvent; +import org.globus.cog.abstraction.coaster.service.job.manager.Hook; + +public class AllocationHook extends Hook +{ + public void blockActive(StatusEvent e) + { + System.out.println("blockActive: " + e.getStatus().getMessage()); + } +} From noreply at svn.ci.uchicago.edu Sun Apr 11 09:29:38 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Apr 2010 09:29:38 -0500 (CDT) Subject: [Swift-commit] r3279 - trunk/src/org/globus/swift/data/policy Message-ID: <20100411142938.DFC3E9CC97@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-11 09:29:38 -0500 (Sun, 11 Apr 2010) New Revision: 3279 Modified: trunk/src/org/globus/swift/data/policy/AllocationHook.java Log: Package fix. Modified: trunk/src/org/globus/swift/data/policy/AllocationHook.java =================================================================== --- trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:14:59 UTC (rev 3278) +++ trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:29:38 UTC (rev 3279) @@ -1,4 +1,6 @@ +package org.globus.swift.data.AllocationHook; + import org.globus.cog.abstraction.interfaces.Status; import org.globus.cog.abstraction.impl.common.StatusEvent; import org.globus.cog.abstraction.coaster.service.job.manager.Hook; From noreply at svn.ci.uchicago.edu Sun Apr 11 09:38:29 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Apr 2010 09:38:29 -0500 (CDT) Subject: [Swift-commit] r3280 - trunk/src/org/globus/swift/data/policy Message-ID: <20100411143830.097479CC97@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-11 09:38:29 -0500 (Sun, 11 Apr 2010) New Revision: 3280 Modified: trunk/src/org/globus/swift/data/policy/AllocationHook.java Log: Actual package fix. 
Modified: trunk/src/org/globus/swift/data/policy/AllocationHook.java =================================================================== --- trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:29:38 UTC (rev 3279) +++ trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:38:29 UTC (rev 3280) @@ -1,5 +1,5 @@ -package org.globus.swift.data.AllocationHook; +package org.globus.swift.data.policy; import org.globus.cog.abstraction.interfaces.Status; import org.globus.cog.abstraction.impl.common.StatusEvent; From noreply at svn.ci.uchicago.edu Sun Apr 11 16:50:15 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Sun, 11 Apr 2010 16:50:15 -0500 (CDT) Subject: [Swift-commit] r3281 - in trunk: libexec src/org/globus/swift/data src/org/globus/swift/data/policy Message-ID: <20100411215015.37B8E9CC97@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-11 16:50:14 -0500 (Sun, 11 Apr 2010) New Revision: 3281 Modified: trunk/libexec/cdm_broadcast.sh trunk/libexec/vdl-int.k trunk/libexec/vdl-lib.xml trunk/src/org/globus/swift/data/Action.java trunk/src/org/globus/swift/data/Director.java trunk/src/org/globus/swift/data/Query.java trunk/src/org/globus/swift/data/policy/AllocationHook.java trunk/src/org/globus/swift/data/policy/Broadcast.java Log: Connect CDM BROADCAST functionality to Coasters notification. Modified: trunk/libexec/cdm_broadcast.sh =================================================================== --- trunk/libexec/cdm_broadcast.sh 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/libexec/cdm_broadcast.sh 2010-04-11 21:50:14 UTC (rev 3281) @@ -3,29 +3,80 @@ SWIFT_HOME=$( dirname $( dirname $0 ) ) LOG=${SWIFT_HOME}/etc/cdm_broadcast.log +# For each given location, broadcast the given files to it +# Input: bgp_broadcast [-l *]* bgp_broadcast() { - DIR=$1 - FILE=$2 - DEST=$3 - if [[ ! -f ips.list ]] + while [[ ${*} != "" ]] + do + L=$1 # -l + shift + ARGS=$1 # Location + shift + while true + do + if [[ $1 == "-l" || $1 == "" ]] + then + break + fi + ARGS="${ARGS} $1" + shift + done + bgp_broadcast_perform ${ARGS} + done +} + +# Broadcast the given files to the given location +# Input: bgp_broadcast_perform * +bgp_broadcast_perform() +{ + LOCATION=$1 + shift + WORK=( ${*} ) + + IP=$( listip ${LOCATION} ) + + if [[ ${#WORK[@]} > 3 ]] then - BLOCKS=$( qstat -u ${USER} | grep ${USER} | awk '{ print $6 }' ) - IPS=$( listip ${BLOCKS} ) - for IP in ${IPS} + SCRIPT=$( mktemp ) + { + echo "#!/bin/sh" + while [[ ${*} != "" ]] + do + FILE=$1 + DEST=$2 + shift 2 + echo "/bin.rd/f2cn ${FILE} ${DEST}" + done + } > ${SCRIPT} + scp ${SCRIPT} ${IP}:${SCRIPT} + ssh ${IP} ${SCRIPT} + else + while [[ ${*} != "" ]] do - echo ${IP} - done >> ip.list - else - while read T - do - BLOCKS="$BLOCKS $T" - done < ip.list + FILE=$1 + DEST=$2 + shift 2 + ssh_until_success 120 ${IP} /bin.rd/f2cn ${FILE} ${DEST} + done fi - for IP in ${BLOCKS} +} + +# Repeat command N times until success +ssh_until_success() +{ + N=$1 + shift + for (( i=0 ; i < N ; i++ )) do - ssh ${IP} /bin.rd/f2cn ${DIR}/${FILE} ${DEST}/${FILE} + ssh -o PasswordAuthentication=no ${*} + if [[ $? 
== 0 ]]; + then + break + fi + sleep 1 done + return 0 } local_broadcast() @@ -36,19 +87,15 @@ cp -v ${FILE} ${DEST}/${FILE} } +set -x { - declare -p PWD - set -x - - FILE=$1 - DIR=$2 - DEST=$3 - - if [[ $( uname -p ) == "ppc64" ]] - then - bgp_broadcast ${DIR} ${FILE} ${DEST} - else - bgp_local ${DIR} ${FILE} ${DEST} - fi - -} >> ${LOG} 2>&1 + declare -p PWD LOG + + if [[ $( uname -p ) == "ppc64" ]] + then + bgp_broadcast ${*} + else + bgp_local ${*} + fi + +} >> /tmp/cdm_broadcast.log 2>&1 # ${LOG} 2>&1 Modified: trunk/libexec/vdl-int.k =================================================================== --- trunk/libexec/vdl-int.k 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/libexec/vdl-int.k 2010-04-11 21:50:14 UTC (rev 3281) @@ -267,8 +267,6 @@ log(LOG:INFO, "START jobid={jobid} - Staging in files") cdmfile := cdm:file() - log(LOG:DEBUG, "cdmfile: {cdmfile}") - log(LOG:DEBUG, "swift.home: {swift.home}") libexec := "{swift.home}/libexec" if (cdmfile != "" then( @@ -292,7 +290,7 @@ size := file:size("{srcdir}/{filename}", host=srchost, provider=provider) policy := cdm:query(query=file) - log(LOG:DEBUG, "policy: {file} : {policy}") + log(LOG:DEBUG, "CDM: {file} : {policy}") doStageinFile(provider=provider, srchost=srchost, srcfile=filename, srcdir=srcdir, desthost=host, destdir=destdir, size=size, policy=policy) @@ -304,8 +302,10 @@ vdl:cacheAddAndLockFile(srcfile, destdir, desthost, size cleanupFiles(cacheFilesToRemove, desthost) - log(LOG:DEBUG, "FILE_STAGE_IN_START file={srcfile} srchost={srchost} srcdir={srcdir} srcname={srcfile} ", - "desthost={desthost} destdir={destdir} provider={provider}") + log(LOG:DEBUG, "FILE_STAGE_IN_START file={srcfile} ", + "srchost={srchost} srcdir={srcdir} srcname={srcfile} ", + "desthost={desthost} destdir={destdir} provider={provider} ", + "policy={policy}") if (policy == "DEFAULT" then( restartOnError(".*", 2 task:transfer(srcprovider=provider, srchost=srchost, srcfile=srcfile, @@ -314,9 +314,12 @@ log(LOG:DEBUG, "FILE_STAGE_IN_BROADCAST file={srcfile} policy={policy}") cdm:broadcast(srcfile=srcfile, srcdir=srcdir)) else(log(LOG:DEBUG, "FILE_STAGE_IN_SKIP file={srcfile} policy={policy}"))) - log(LOG:DEBUG, "FILE_STAGE_IN_END file={srcfile} srchost={srchost} srcdir={srcdir} srcname={srcfile} ", - "desthost={desthost} destdir={destdir} provider={provider}") - ) + log(LOG:DEBUG, "FILE_STAGE_IN_END file={srcfile} ", + "srchost={srchost} srcdir={srcdir} srcname={srcfile} ", + "desthost={desthost} destdir={destdir} provider={provider}") + ) + cdm:wait() + echo("doStageinFile: complete: {srcfile}") ) element(doStageout, [jobid, stageouts, dir, host] Modified: trunk/libexec/vdl-lib.xml =================================================================== --- trunk/libexec/vdl-lib.xml 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/libexec/vdl-lib.xml 2010-04-11 21:50:14 UTC (rev 3281) @@ -106,6 +106,7 @@ + Modified: trunk/src/org/globus/swift/data/Action.java =================================================================== --- trunk/src/org/globus/swift/data/Action.java 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/src/org/globus/swift/data/Action.java 2010-04-11 21:50:14 UTC (rev 3281) @@ -9,6 +9,9 @@ import org.globus.swift.data.policy.Broadcast; import org.globus.swift.data.policy.Policy; +/** + * Karajan-accessible CDM functions that change something. 
+ * */ public class Action extends FunctionsCollection { public static final Arg PA_FILE = new Arg.Positional("srcfile"); @@ -16,23 +19,39 @@ static { setArguments("cdm_broadcast", new Arg[]{ PA_FILE, PA_DIR }); + setArguments("cdm_wait", new Arg[]{}); } + /** + Register a file for broadcast by CDM. + The actual broadcast is triggered by {@link cdm_wait}. + */ public void cdm_broadcast(VariableStack stack) throws ExecutionException { String srcfile = (String) PA_FILE.getValue(stack); String srcdir = (String) PA_DIR.getValue(stack); + + System.out.println("cdm_broadcast()"); Policy policy = Director.lookup(srcfile); if (!(policy instanceof Broadcast)) { - throw new RuntimeException("Attempting to BROADCAST the wrong file"); + throw new RuntimeException + ("Attempting to BROADCAST the wrong file: directory: `" + + srcdir + "' `" + srcfile + "' -> " + policy); } if (srcdir == "") { srcdir = "."; } - - Broadcast broadcast = (Broadcast) policy; - broadcast.action(srcfile, srcdir); + + Director.addBroadcast(srcdir, srcfile); } + + /** + Wait until CDM has ensured that all data has been propagated. + */ + public void cdm_wait(VariableStack stack) throws ExecutionException { + System.out.println("cdm_wait()"); + Director.doBroadcast(); + } } Modified: trunk/src/org/globus/swift/data/Director.java =================================================================== --- trunk/src/org/globus/swift/data/Director.java 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/src/org/globus/swift/data/Director.java 2010-04-11 21:50:14 UTC (rev 3281) @@ -3,19 +3,23 @@ import java.io.File; import java.io.IOException; +import java.util.ArrayList; import java.util.Arrays; import java.util.HashMap; import java.util.HashSet; import java.util.Iterator; import java.util.LinkedHashMap; +import java.util.LinkedHashSet; import java.util.List; import java.util.Map; +import java.util.Map.Entry; import java.util.Set; import java.util.regex.Matcher; import java.util.regex.Pattern; import org.apache.log4j.Logger; import org.globus.swift.data.policy.Policy; +import org.globus.swift.data.policy.Broadcast; import org.globus.swift.data.util.LineReader; import org.griphyn.vdl.karajan.Loader; @@ -28,6 +32,11 @@ private static final Logger logger = Logger.getLogger(Director.class); /** + Has a CDM policy file been provided? + */ + static boolean enabled = false; + + /** Save the location of the given CDM policy file */ static File policyFile; @@ -43,12 +52,35 @@ static Map properties = new HashMap(); /** - Remember the files we have broadcasted + Remember the files we have broadcasted. + Map from allocations to filenames. + NOTE: must be accessed only using synchronized Director methods */ - static Set broadcasted = new HashSet(); + private static Map<String, Set<String>> broadcasted = + new LinkedHashMap<String, Set<String>>(); + + /** + Set of files to be broadcasted. + NOTE: must be accessed only using synchronized Director methods + */ + private static Set<String> broadcastWork = new LinkedHashSet<String>(); + /** + Remember all known allocations + */ + private static List<String> allocations = new ArrayList<String>(); + + public static boolean isEnabled() + { + return enabled; + } + + /** + Read in the user-supplied CDM policy file.
+ */ public static void load(File policyFile) throws IOException { - logger.info("loading: " + policyFile); + logger.info("CDM file: " + policyFile); + enabled = true; Director.policyFile = policyFile; LineReader lines = new LineReader(); List list = lines.read(policyFile); @@ -67,7 +99,6 @@ else if (type.equals("property")) { addProperty(tokens); } - } static void addRule(String[] tokens) { @@ -91,38 +122,127 @@ } return result.toString(); } - + + /** + Obtain the CDM policy for a given file. + */ public static Policy lookup(String file) { + logger.debug("Director.lookup(): map: " + map); for (Pattern pattern : map.keySet()) { - Matcher matcher = pattern.matcher(file); - if (matcher.matches()) - return map.get(pattern); + Matcher matcher = pattern.matcher(file); + if (matcher.matches()) + return map.get(pattern); } + return Policy.DEFAULT; } - + + /** + Obtain the value of a CDM property. + */ public static String property(String name) { String result = properties.get(name); if (result == null) result = "UNSET"; return result; } + + /** + Add a file to the list of files to be broadcasted. + */ + public static synchronized void addBroadcast(String srcdir, String srcfile) { + logger.debug("addBroadcast(): " + srcdir + " " + srcfile); + String path = srcdir+"/"+srcfile; + broadcastWork.add(path); + } + + /** + Add a location to the list of allocations. + If the location is added twice, the second addition is considered to be an + empty allocation with no CDM state + */ + public static synchronized void addAllocation(String allocation) { + logger.debug("addAllocation(): " + allocation); + allocations.add(allocation); + broadcasted.put(allocation, new HashSet<String>()); + doBroadcast(); + } + /** + Create a batch of broadcast work to do and send it to be performed. + */ + public static synchronized void doBroadcast() { + logger.debug("doBroadcast: broadcasted: " + broadcasted); + // Map from locations to files + Map<String, List<String>> batch = getBroadcastBatch(); + if (batch.size() == 0) + return; + logger.debug("doBroadcast(): batch: " + batch); + Broadcast.perform(batch); + markBroadcasts(batch); + logger.debug("marked: " + broadcasted); + } + + /** + Obtain a map from allocations to files. + For each allocation, its corresponding files should be broadcasted to it. + Should only be called by {@link doBroadcast}. + */ + private static Map<String, List<String>> getBroadcastBatch() { + logger.debug("getBroadcastBatch(): "); + Map<String, List<String>> batch = new LinkedHashMap<String, List<String>>(); + for (String file : broadcastWork) { + logger.debug("file: " + file); + logger.debug("allocations: " + allocations); + for (String allocation : allocations) { + Set<String> files = broadcasted.get(allocation); + logger.debug("files: " + files); + if (! files.contains(file)) { + logger.debug("adding: " + file + " to: " + allocation); + List<String> work = batch.get(allocation); + if (work == null) { + work = new ArrayList<String>(); + batch.put(allocation, work); + } + work.add(file); + } + } + } + return batch; + } + + /** + Mark that the files in the given batch have been successfully broadcasted. + Should only be called by {@link doBroadcast}. + */ + private static void markBroadcasts(Map<String, List<String>> batch) { + logger.debug("markBroadcasts: batch: " + batch); + for (Map.Entry<String, List<String>> entry : batch.entrySet()) { + String location = entry.getKey(); + logger.debug("markBroadcasts: location: " + location); + List<String> files = entry.getValue(); + for (String file : files) { + Set<String> contents = broadcasted.get(location); + assert (!
contents.contains(file)); + logger.debug("markBroadcasts: add: " + file); + contents.add(file); + } + } + } + + /* public static boolean broadcasted(String file, String dir) { return broadcasted.contains(dir+"/"+file); } + */ - public static void broadcast(String file, String dir) { - broadcasted.add(dir+"/"+file); - } - /** - * Check the policy effect of name with respect to policy_file - * @param args {name, policy_file} - */ + Check the policy effect of name with respect to policy_file + @param args {name, policy_file} + */ public static void main(String[] args) { if (args.length != 2) { - System.out.println("Incorrect args"); + logger.debug("Incorrect args"); System.exit(1); } @@ -131,12 +251,12 @@ String name = args[0]; File policyFile = new File(args[1]); if (! policyFile.exists()) { - System.out.println("Policy file does not exist: " + + logger.debug("Policy file does not exist: " + args[1]); } load(policyFile); Policy policy = lookup(name); - System.out.println(name + ": " + policy); + logger.debug(name + ": " + policy); } catch (Exception e) { e.printStackTrace(); System.exit(2); Modified: trunk/src/org/globus/swift/data/Query.java =================================================================== --- trunk/src/org/globus/swift/data/Query.java 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/src/org/globus/swift/data/Query.java 2010-04-11 21:50:14 UTC (rev 3281) @@ -12,6 +12,9 @@ import org.globus.swift.data.policy.Policy; +/** + Karajan-accessible read-queries to CDM functionality. +*/ public class Query extends FunctionsCollection { public static final Arg PA_QUERY = new Arg.Positional("query"); @@ -23,21 +26,29 @@ setArguments("cdm_file", new Arg[]{}); } + /** + Do CDM policy lookup based on the CDM file. + */ public String cdm_query(VariableStack stack) throws ExecutionException { String file = (String) PA_QUERY.getValue(stack); Policy policy = Director.lookup(file); + System.out.println("Director.lookup(): " + file + " -> " + policy); return policy.toString(); } /** - Get a CDM property + Get a CDM property */ public String cdm_get(VariableStack stack) throws ExecutionException { String name = (String) PA_NAME.getValue(stack); String value = Director.property(name); return value; } - + + /** + Obtain the CDM policy file given on the command-line, + conventionally "fs.data". If not set, returns an empty String. + */ public String cdm_file(VariableStack stack) throws ExecutionException { String file = ""; if (Director.policyFile != null) Modified: trunk/src/org/globus/swift/data/policy/AllocationHook.java =================================================================== --- trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/src/org/globus/swift/data/policy/AllocationHook.java 2010-04-11 21:50:14 UTC (rev 3281) @@ -5,10 +5,22 @@ import org.globus.cog.abstraction.impl.common.StatusEvent; import org.globus.cog.abstraction.coaster.service.job.manager.Hook; +import org.globus.swift.data.Director; + +/** + * Re-apply CDM policies when we obtain a new allocation from Coasters. 
+ * */ public class AllocationHook extends Hook { public void blockActive(StatusEvent e) { + if (!Director.isEnabled()) + return; + System.out.println("blockActive: " + e.getStatus().getMessage()); + String msg = e.getStatus().getMessage(); + String[] tokens = msg.split("="); + String allocation = tokens[1]; + Director.addAllocation(allocation); } } Modified: trunk/src/org/globus/swift/data/policy/Broadcast.java =================================================================== --- trunk/src/org/globus/swift/data/policy/Broadcast.java 2010-04-11 14:38:29 UTC (rev 3280) +++ trunk/src/org/globus/swift/data/policy/Broadcast.java 2010-04-11 21:50:14 UTC (rev 3281) @@ -1,6 +1,11 @@ package org.globus.swift.data.policy; +import java.util.ArrayList; +import java.util.Arrays; import java.util.List; +import java.util.Map; +import java.util.Map; +import java.util.Map.Entry; import org.globus.swift.data.Director; @@ -17,32 +22,66 @@ throw new RuntimeException("Incorrect settings for BROADCAST"); } } - - public void action(String srcfile, String srcdir) { - if (! Director.broadcasted(srcfile, srcdir)) - callScript(srcfile, srcdir, destination); - } - - void callScript(String srcfile, String srcdir, String destination) { - String home = System.getProperties().getProperty("swift.home"); + + /** + Call the external script to perform the broadcast for this batch. + */ + public static void perform(Map<String, List<String>> batch) { + String[] line = commandLine(batch); + System.out.println("Broadcast.perform(): " + Arrays.toString(line)); + Process process = null; try { - String[] line = new String[4]; - line[0] = home+"/libexec/cdm_broadcast.sh"; - line[1] = srcfile; - line[2] = srcdir; - line[3] = destination; - Process process = Runtime.getRuntime().exec(line); + process = Runtime.getRuntime().exec(line); process.waitFor(); } catch (Exception e) { e.printStackTrace(); throw new RuntimeException("Could not launch external broadcast"); } - } - + int code = process.exitValue(); + if (code != 0) + throw new RuntimeException("External broadcast failed!"); + } + + /** + Generate the command line for the external broadcast script. + */ + static String[] commandLine(Map<String, List<String>> batch) { + String home = System.getProperties().getProperty("swift.home"); + List<String> line = new ArrayList<String>(); + line.add(home+"/libexec/cdm_broadcast.sh"); + for (Map.Entry<String, List<String>> entry : batch.entrySet()) { + line.add("-l"); + String location = entry.getKey(); + List<String> files = entry.getValue(); + line.add(location); + for (String file : files) { + line.add(file); + line.add(getDestination(file)+"/"+file); + } + } + String[] result = new String[line.size()]; + line.toArray(result); + return result; + } + + /** + Return the remote destination directory for this policy. + */ public String getDestination() { return destination; } + + /** + Return the remote destination directory for this broadcasted file.
+ */ + public static String getDestination(String file) { + String result = null; + Policy policy = Director.lookup(file); + Broadcast broadcast = (Broadcast) policy; + result = broadcast.getDestination(); + return result; + } public String toString() { return "BROADCAST"; From noreply at svn.ci.uchicago.edu Mon Apr 12 09:55:42 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 12 Apr 2010 09:55:42 -0500 (CDT) Subject: [Swift-commit] r3282 - in trunk: etc libexec src/org/globus/swift/data Message-ID: <20100412145542.5C1AA9CC86@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-12 09:55:42 -0500 (Mon, 12 Apr 2010) New Revision: 3282 Modified: trunk/etc/log4j.properties trunk/libexec/vdl-int.k trunk/src/org/globus/swift/data/Action.java trunk/src/org/globus/swift/data/Query.java Log: Clean up debugging output. Modified: trunk/etc/log4j.properties =================================================================== --- trunk/etc/log4j.properties 2010-04-11 21:50:14 UTC (rev 3281) +++ trunk/etc/log4j.properties 2010-04-12 14:55:42 UTC (rev 3282) @@ -28,5 +28,5 @@ log4j.logger.org.griphyn.vdl.engine.Karajan=INFO log4j.logger.org.globus.cog.abstraction.coaster.rlog=INFO -log4j.logger.org.globus.swift.data.Director=DEBUG +# log4j.logger.org.globus.swift.data.Director=DEBUG log4j.logger.org.griphyn.vdl.karajan.lib=INFO Modified: trunk/libexec/vdl-int.k =================================================================== --- trunk/libexec/vdl-int.k 2010-04-11 21:50:14 UTC (rev 3281) +++ trunk/libexec/vdl-int.k 2010-04-12 14:55:42 UTC (rev 3282) @@ -319,7 +319,6 @@ "desthost={desthost} destdir={destdir} provider={provider}") ) cdm:wait() - echo("doStageinFile: complete: {srcfile}") ) element(doStageout, [jobid, stageouts, dir, host] Modified: trunk/src/org/globus/swift/data/Action.java =================================================================== --- trunk/src/org/globus/swift/data/Action.java 2010-04-11 21:50:14 UTC (rev 3281) +++ trunk/src/org/globus/swift/data/Action.java 2010-04-12 14:55:42 UTC (rev 3282) @@ -2,6 +2,8 @@ import java.io.IOException; +import org.apache.log4j.Logger; + import org.globus.cog.karajan.arguments.Arg; import org.globus.cog.karajan.stack.VariableStack; import org.globus.cog.karajan.workflow.ExecutionException; @@ -13,7 +15,8 @@ * Karajan-accessible CDM functions that change something. * */ public class Action extends FunctionsCollection { - + private static final Logger logger = Logger.getLogger(Action.class); + public static final Arg PA_FILE = new Arg.Positional("srcfile"); public static final Arg PA_DIR = new Arg.Positional("srcdir"); @@ -30,7 +33,7 @@ String srcfile = (String) PA_FILE.getValue(stack); String srcdir = (String) PA_DIR.getValue(stack); - System.out.println("cdm_broadcast()"); + logger.debug("cdm_broadcast()"); Policy policy = Director.lookup(srcfile); @@ -51,7 +54,7 @@ Wait until CDM has ensured that all data has been propagated. 
*/ public void cdm_wait(VariableStack stack) throws ExecutionException { - System.out.println("cdm_wait()"); + logger.debug("cdm_wait()"); Director.doBroadcast(); } } Modified: trunk/src/org/globus/swift/data/Query.java =================================================================== --- trunk/src/org/globus/swift/data/Query.java 2010-04-11 21:50:14 UTC (rev 3281) +++ trunk/src/org/globus/swift/data/Query.java 2010-04-12 14:55:42 UTC (rev 3282) @@ -4,6 +4,8 @@ import java.io.File; import java.io.FileReader; +import org.apache.log4j.Logger; + import org.globus.cog.karajan.arguments.Arg; import org.globus.cog.karajan.stack.VariableStack; import org.globus.cog.karajan.util.TypeUtil; @@ -16,7 +18,8 @@ Karajan-accessible read-queries to CDM functionality. */ public class Query extends FunctionsCollection { - + private static final Logger logger = Logger.getLogger(Query.class); + public static final Arg PA_QUERY = new Arg.Positional("query"); public static final Arg PA_NAME = new Arg.Positional("name"); @@ -32,7 +35,7 @@ public String cdm_query(VariableStack stack) throws ExecutionException { String file = (String) PA_QUERY.getValue(stack); Policy policy = Director.lookup(file); - System.out.println("Director.lookup(): " + file + " -> " + policy); + logger.debug("Director.lookup(): " + file + " -> " + policy); return policy.toString(); } From noreply at svn.ci.uchicago.edu Mon Apr 12 19:09:50 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 12 Apr 2010 19:09:50 -0500 (CDT) Subject: [Swift-commit] r3283 - in branches/1.0/src/org/griphyn/vdl: karajan/lib mapping Message-ID: <20100413000950.D96C89CC86@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-12 19:09:50 -0500 (Mon, 12 Apr 2010) New Revision: 3283 Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/CloseDataset.java branches/1.0/src/org/griphyn/vdl/karajan/lib/SetFieldValue.java branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java branches/1.0/src/org/griphyn/vdl/mapping/AbstractDataNode.java branches/1.0/src/org/griphyn/vdl/mapping/ArrayDataNode.java branches/1.0/src/org/griphyn/vdl/mapping/RootDataNode.java Log: fixed problems with returning fixed arrays of structures Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/CloseDataset.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/karajan/lib/CloseDataset.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/karajan/lib/CloseDataset.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -6,6 +6,7 @@ import org.apache.log4j.Logger; import org.globus.cog.karajan.arguments.Arg; import org.globus.cog.karajan.stack.VariableStack; +import org.globus.cog.karajan.util.TypeUtil; import org.globus.cog.karajan.workflow.ExecutionException; import org.griphyn.vdl.mapping.DSHandle; import org.griphyn.vdl.mapping.InvalidPathException; @@ -13,12 +14,13 @@ public class CloseDataset extends VDLFunction { public static final Logger logger = Logger.getLogger(CloseDataset.class); + + public static final Arg OA_CHILDREN_ONLY = new Arg.Optional("childrenOnly", Boolean.FALSE); static { - setArguments(CloseDataset.class, new Arg[] { PA_VAR, OA_PATH }); + setArguments(CloseDataset.class, new Arg[] { PA_VAR, OA_PATH, OA_CHILDREN_ONLY }); } - // TODO path is not used! 
public Object function(VariableStack stack) throws ExecutionException { Path path = parsePath(OA_PATH.getValue(stack), stack); DSHandle var = (DSHandle) PA_VAR.getValue(stack); @@ -27,7 +29,13 @@ logger.info("Closing " + var); } var = var.getField(path); - closeChildren(stack, var); + + if (TypeUtil.toBoolean(OA_CHILDREN_ONLY.getValue(stack))) { + closeChildren(stack, var); + } + else { + closeDeep(stack, var); + } } catch (InvalidPathException e) { throw new ExecutionException(e); Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/SetFieldValue.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/karajan/lib/SetFieldValue.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/karajan/lib/SetFieldValue.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -29,7 +29,7 @@ public Object function(VariableStack stack) throws ExecutionException { DSHandle var = (DSHandle) PA_VAR.getValue(stack); try { - Path path = parsePath(OA_PATH.getValue(stack), stack); + Path path = parsePath(OA_PATH.getValue(stack), stack); DSHandle leaf = var.getField(path); DSHandle value = (DSHandle) PA_VALUE.getValue(stack); if (logger.isInfoEnabled()) { @@ -47,6 +47,9 @@ } deepCopy(leaf, value, stack); } + if (var.getParent() != null && var.getParent().getType().isArray()) { + markAsAvailable(stack, leaf.getParent(), leaf.getPathFromRoot().getLast()); + } } return null; } Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -438,8 +438,31 @@ markToRoot(stack, handle); } } + + protected void closeDeep(VariableStack stack, DSHandle handle) + throws ExecutionException, InvalidPathException { + synchronized(handle.getRoot()) { + closeDeep(stack, handle, getFutureWrapperMap(stack)); + } + } - private void markToRoot(VariableStack stack, DSHandle handle) throws ExecutionException { + private void closeDeep(VariableStack stack, DSHandle handle, + WrapperMap hash) throws InvalidPathException, ExecutionException { + handle.closeShallow(); + hash.close(handle); + try { + // Mark all leaves + Iterator it = handle.getFields(Path.CHILDREN).iterator(); + while (it.hasNext()) { + closeDeep(stack, (DSHandle) it.next(), hash); + } + } + catch (HandleOpenException e) { + throw new ExecutionException("Handle open in closeChildren",e); + } + } + + private void markToRoot(VariableStack stack, DSHandle handle) throws ExecutionException { // Also mark all arrays from root Path fullPath = handle.getPathFromRoot(); DSHandle root = handle.getRoot(); Modified: branches/1.0/src/org/griphyn/vdl/mapping/AbstractDataNode.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/mapping/AbstractDataNode.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -504,6 +504,26 @@ public boolean isClosed() { return closed; } + + /** + * Recursively closes arrays through a tree of arrays and complex types. 
+ */ + public void closeArraySizes() { + if (!this.closed && this.getType().isArray()) { + closeShallow(); + } + synchronized (handles) { + Iterator i = handles.entrySet().iterator(); + while (i.hasNext()) { + Map.Entry e = (Map.Entry) i.next(); + AbstractDataNode child = (AbstractDataNode) e.getValue(); + if (child.getType().isArray() || + child.getType().getFields().size() > 0) { + child.closeArraySizes(); + } + } + } + } public void closeDeep() { if (!this.closed) { @@ -519,25 +539,6 @@ } } - /** Recursively closes arrays through a tree of arrays and complex - types. */ - public void closeDeepStructure() { - if (!this.closed && this.getType().isArray()) { - closeShallow(); - } - synchronized (handles) { - Iterator i = handles.entrySet().iterator(); - while (i.hasNext()) { - Map.Entry e = (Map.Entry) i.next(); - AbstractDataNode child = (AbstractDataNode) e.getValue(); - if(child.getType().isArray() || - child.getType().getFields().size() > 0 ) { - child.closeDeepStructure(); - } - } - } - } - public synchronized Path getPathFromRoot() { if (pathFromRoot == null) { AbstractDataNode parent = (AbstractDataNode) this.getParent(); Modified: branches/1.0/src/org/griphyn/vdl/mapping/ArrayDataNode.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/mapping/ArrayDataNode.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/mapping/ArrayDataNode.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -42,6 +42,24 @@ } } } + + /** Recursively closes arrays through a tree of arrays and complex + types. */ + public void closeDeep() { + assert(this.getType().isArray()); + if (!this.isClosed()) { + closeShallow(); + } + Map handles = getHandles(); + synchronized (handles) { + Iterator i = handles.entrySet().iterator(); + while (i.hasNext()) { + Map.Entry e = (Map.Entry) i.next(); + AbstractDataNode child = (AbstractDataNode) e.getValue(); + child.closeDeep(); + } + } + } public boolean isArray() { Modified: branches/1.0/src/org/griphyn/vdl/mapping/RootDataNode.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/mapping/RootDataNode.java 2010-04-12 14:55:42 UTC (rev 3282) +++ branches/1.0/src/org/griphyn/vdl/mapping/RootDataNode.java 2010-04-13 00:09:50 UTC (rev 3283) @@ -113,6 +113,8 @@ checkConsistency(root); } else if (mapper.isStatic()) { + // Static mappers are (array) mappers which know the size of + // an array statically. 
A good example is the fixed array mapper Iterator i = mapper.existing().iterator(); while (i.hasNext()) { Path p = (Path) i.next(); @@ -129,7 +131,7 @@ } } if (root.isArray()) { - root.closeDeepStructure(); + root.closeArraySizes(); } checkConsistency(root); } From noreply at svn.ci.uchicago.edu Mon Apr 12 23:00:17 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Mon, 12 Apr 2010 23:00:17 -0500 (CDT) Subject: [Swift-commit] r3284 - trunk/src/org/griphyn/vdl/karajan Message-ID: <20100413040017.3E3D29CC97@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-12 23:00:16 -0500 (Mon, 12 Apr 2010) New Revision: 3284 Modified: trunk/src/org/griphyn/vdl/karajan/VDL2ExecutionContext.java Log: Log source element on error Modified: trunk/src/org/griphyn/vdl/karajan/VDL2ExecutionContext.java =================================================================== --- trunk/src/org/griphyn/vdl/karajan/VDL2ExecutionContext.java 2010-04-13 00:09:50 UTC (rev 3283) +++ trunk/src/org/griphyn/vdl/karajan/VDL2ExecutionContext.java 2010-04-13 04:00:16 UTC (rev 3284) @@ -26,7 +26,7 @@ protected void printFailure(FailureNotificationEvent e) { if (logger.isDebugEnabled()) { - logger.debug(e.getMessage(), e.getException()); + logger.debug(e.getFlowElement() + ": " + e.getMessage(), e.getException()); } String msg = e.getMessage(); if (!"Execution completed with errors".equals(msg)) { From noreply at svn.ci.uchicago.edu Tue Apr 20 09:57:59 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Tue, 20 Apr 2010 09:57:59 -0500 (CDT) Subject: [Swift-commit] r3287 - text/swift_pc3_fgcs Message-ID: <20100420145759.F02E69CCA8@vm-125-59.ci.uchicago.edu> Author: lgadelha Date: 2010-04-20 09:57:59 -0500 (Tue, 20 Apr 2010) New Revision: 3287 Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex Log: Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex =================================================================== --- text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-20 05:09:24 UTC (rev 3286) +++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex 2010-04-20 14:57:59 UTC (rev 3287) @@ -68,15 +68,15 @@ \title{Provenance Management in Swift} \author{Ben Clifford} -%\ead{benc at hawaga.org.uk} +\ead{benc at hawaga.org.uk} \author[coppe]{Luiz Gadelha Jr.} \ead{gadelha at cos.ufrj.br} \author[coppe]{Marta Mattoso} -%\ead{marta at cos.ufrj.br} +\ead{marta at cos.ufrj.br} \author[uc,anl]{Michael Wilde} -%\ead{wilde at mcs.anl.gov} +\ead{wilde at mcs.anl.gov} \author[uc,anl]{Ian Foster} -%\ead{foster at mcs.anl.gov} +\ead{foster at mcs.anl.gov} %\address[no]{No affiliation} \address[coppe]{PESC/COPPE, Federal University of Rio de Janeiro, Brazil} \address[uc]{Computation Institute, University of Chicago, USA} @@ -214,14 +214,11 @@ In our initial attempts to implement LoadWorkflow, we found the use of the parallel {\tt foreach} loop problematic because the database routines executed by the external application procedures are opaque to Swift. Due to dependencies between iterations of the loop, these routines were being incorrectly executed in parallel. It was necessary to serialize the loop execution to keep the database consistent. For the same reason, since most of the PC3 queries are for row-level database provenance, we had to implement a workaround for gathering this provenance by modifying the application database so that for every row inserted, an entry containing the execution identifier of the Swift process that performed this insertion is recorded on a separate annotation table. 
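As an illustration of this annotation scheme, the sketch below pairs each application insert with an insert into the annotation table; the {\tt row\_annotation} table and its column names here are invented for the example and do not reflect the actual PC3 schema.

\begin{verbatim}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AnnotatedInsert {
    // Insert an application row and record, in a separate annotation
    // table, the Swift process that performed the insertion.
    public static void insertImage(Connection db, String imageId,
            String swiftProcessId) throws SQLException {
        PreparedStatement ins = db.prepareStatement(
            "INSERT INTO image (imageid) VALUES (?)");
        ins.setString(1, imageId);
        ins.executeUpdate();

        PreparedStatement ann = db.prepareStatement(
            "INSERT INTO row_annotation (table_name, row_id, swift_process)"
            + " VALUES (?, ?, ?)");
        ann.setString(1, "image");
        ann.setString(2, imageId);
        ann.setString(3, swiftProcessId);
        ann.executeUpdate();
    }
}
\end{verbatim}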
A detailed description of the LoadWorkflow implementation in SwiftScript, and the SQL queries to the provenance database can be found in \cite{ClGaMa09}. Core query 1, for instance, consists of determining, for a given application database row, which CSV files contributed to it. The strategy used to answer this query is to determine input CSV files that precede, in the transitivity table, the process that inserted the row. This query can be answered by first obtaining the identifier of the Swift process that inserted the row from the annotations included in the application database. Then, we query for file names of datasets that contain CSV inputs in the set of predecessors of the process that inserted the row. -%Core query 2 asks if the range check workflow component (IsMatchColumnRanges) was performed in a particular table of the application database, given that a user found values that were not expected in it. This is implemented by querying for input parameters for all IsMatchColumnRanges calls. These are XML values, and it is necessary to examine the resulting XML to determine if it was invoked for the specific table. There is unpleasant cross-format joining necessary here to get an actual yes/no result properly, although we could use a {\tt LIKE} clause to examine the value. -%{\em Core Query 3}. The query asks which operation executions were strictly necessary for an application database table (Image) to contain a particular (non-computed) value. This uses the additional annotations made, that only store which process originally inserted a row, not which processes have modified a row. So to some extent, rows are regarded a bit like artifacts (though not first order artifacts in the provenance database); and we can only answer questions about the provenance of rows, not the individual fields within those rows. That is sufficient for this query, though. First find the row that contains the interesting value and extract its identifier ({\tt IMAGEID}). Then find the process that created the row by querying the annotations. This gives the process identifier for the process that created the row. Now query the transitive closure table for all predecessors for that process. This will produce all processes and artifacts that preceded this row creation. - The OPM output for a LoadWorkflow run in Swift was generated by a script that maps Swift's provenance data model to OPM's XML schema. Since OPM and Swift's provenance database use similar data models, it is fairly straightforward to build a tool to import data from an OPM graph into the Swift provenance database. However we observed that the OPM outputs from the various participating teams, including Swift, carry many details of the LoadWorkflow implementation that are system specific, such as auxiliary tasks that are not necessarily related to the workflow. To answer the same queries, it would be necessary to perform some manual interpretation of the imported OPM graph in order to identify the relevant processes and artifacts. -A number of other forms were briefly experimented with during development. The two most developed and interesting models were XML and Prolog. XML provides a semi-structured tree form for data. A benefit of this approach is that new data can be added to the database without needing an explicit schema to be known to the database. In addition, when used with a query language such as XPath, certain transitive queries become straightforward with the use of the {\tt //} operator of XPath. 
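As an illustration of the {\tt //} idiom (the XML vocabulary below, with nested {\tt process} and {\tt artifact} elements, is invented for the example and is not a schema used by Swift), a single descendant-axis expression follows a derivation chain transitively:

\begin{verbatim}
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TransitiveQuery {
    public static void main(String[] args) throws Exception {
        // p produced b; a process q consuming b produced c.
        String xml = "<process id='p'><artifact name='b'>"
                   + "<process id='q'><artifact name='c'/></process>"
                   + "</artifact></process>";
        XPath xpath = XPathFactory.newInstance().newXPath();
        // '//' matches descendants at any depth, so one expression
        // reaches both b and c.
        NodeList names = (NodeList) xpath.evaluate(
            "//process[@id='p']//artifact/@name",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(names.item(i).getNodeValue());
        }
    }
}
\end{verbatim}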
Representing the data as Prolog tuples is a different representation than a traditional database, but provides a query interface that can express interesting queries flexibly. +Swift's provenance data model is not dependent on a particular database system. A number of other forms were briefly experimented with during development. The two most developed and interesting models were XML and Prolog. XML provides a semi-structured tree form for data. A benefit of this approach is that new data can be added to the database without needing an explicit schema to be known to the database. In addition, when used with a query language such as XPath, certain transitive queries become straightforward with the use of the {\tt //} operator. Representing the data as Prolog tuples differs from a traditional database representation, but provides a query interface that can express interesting queries flexibly. PC3 provided an opportunity to use OPM in practice. This also enabled us to evaluate OPM and compare it to Swift's provenance data model. OPM originally did not specify a naming mechanism for globally identifying artifacts outside of an OPM graph. In Swift, dataset handles are given a URI; OPM now has an annotation for this purpose \cite{opm1.1}. @@ -236,7 +233,7 @@ int c[] = [a, b]; \end{lstlisting} -The Swift entry made a minor proposal \cite{pc} to change the XML schema to better reflect the perceived intentions of the OPM authors. It was apparent that the present representation of hierarchical processes in OPM is insufficiently rich for some groups and that it would be useful to represent hierarchy of individual processes and their containing processes more directly. In Swift this is given by two categories: at the highest level, SwiftScript language constructs, such as procedures and functions; below that, the mechanics of Swift's execution, such as moving files to and from computational resources, and interactions with job execution. +The Swift entry made a minor proposal \cite{pc} to change the XML schema to better reflect the perceived intentions of the OPM authors. It was apparent that the present representation of hierarchical processes in OPM is insufficiently rich for some groups and that it would be useful to represent the hierarchy of individual processes and their containing processes more directly. In Swift this is given by two categories: at the highest level, SwiftScript language constructs, such as procedures and functions; below that, the mechanics of Swift's execution, such as moving files to and from computational resources, and interactions with job execution.
Swift provenance work so far has concentrated in the high-level representation, treating all of the low-level behavior as opaque and exposing neither processes nor artifacts. An OPM modification proposal for this is forthcoming. In Swift, this information is often available through the Karajan \cite{karajan} execution engine thread identifier which closely maps to the Swift process execution hierarchy: a Swift process contains another Swift process if its Karajan thread identifier is a prefix of the second process's Karajan thread identifier. The Swift provenance database stores values of dataset handles when those values exist in-memory (for example, when a dataset handle represents an integer or a string). There was some desire in the PC3 workshop for a standard way to represent this. \section{Related Work} @@ -255,10 +252,124 @@ {\em Provenance query system}. It was clear from PC3 that although it is possible to express the provenance queries in SQL, it is not always practical to do so, due to its poor transitivity support. One future objective is to make the provenance query system, which should include a specialized provenance query language, capable of being readily queried by scientists to let them do better science through validation, collaboration, and discovery. -%\footnotesize \bibliographystyle{plain} -\bibliography{ref} +\begin{thebibliography}{10} +\bibitem{pc} +{Provenance Challenge Wiki}. +\newblock http://twiki.ipaw.info, 2009. + +\bibitem{karma} +B.~Cao, B.~Plale, G.~Subramanian, E.~Robertson, and Y.~Simmhan. +\newblock {Provenance Information Model of Karma Version 3}. +\newblock In {\em Proc. IEEE Congress on Services}, pages 348--351, 2009. + +\bibitem{ClFo08} +B.~Clifford, I.~Foster, J.~Voeckler, M.~Wilde, and Y.~Zhao. +\newblock Tracking provenance in a virtual data grid. +\newblock {\em Concurrency and Computation: Practice and Experience}, + 20(5):565--575, 2008. + +\bibitem{ClGaMa09} +B.~Clifford, L.~Gadelha, M.~Mattoso, M.~Wilde, and I.~Foster. +\newblock {Tracking Provenance in Swift}. +\newblock Technical Report ANL/MCS-P1703-1209, Argonne National Laboratory, + 2009. + +\bibitem{CrCa09} +S.~da~Cruz, M.~Campos, and M.~Mattoso. +\newblock {Towards a Taxonomy of Provenance in Scientific Workflow Management + Systems}. +\newblock In {\em Proc. IEEE Congress on Services, Part I, (SERVICES I 2009)}, + pages 259--266, 2009. + +\bibitem{DeGa09} +E.~Deelman, D.~Gannon, M.~Shields, and I.~Taylor. +\newblock {Workflows in e-Science: An overview of workflow system features and + capabilities}. +\newblock {\em Future Generation Computer Systems}, 25(5):528--540, 2009. + +\bibitem{SQLTRANS} +G.~Dong, L.~Libkin, J.~Su, and L.~Wong. +\newblock {Maintaining Transitive Closure of Graphs in SQL}. +\newblock {\em Intl. Journal of Information Technology}, 5, 1999. + +\bibitem{chimera} +I.~Foster, J.~Vockler, M.~Wilde, and Y.~Zhao. +\newblock {Chimera: A Virtual Data System for Representing, Querying and + Automating Data Derivation}. +\newblock In {\em Proc. 14th International Conference on Scientific and + Statistical Database Management (SSDBM'02)}, pages 37--46, 2002. + +\bibitem{FrSi06} +J.~Freire, C.~Silva, S.~Callahan, E.~Santos, C.~Scheidegger, and H.~Vo. +\newblock {Managing Rapidly-Evolving Scientific Workflows}. +\newblock In {\em International Provenance and Annotation Workshop (IPAW + 2006)}, volume 4145 of {\em LNCS}, pages 10--18, 2006. + +\bibitem{OPMcollections} +P.~Groth, S.~Miles, P.~Missier, and L.~Moreau.
+\newblock {A Proposal for Handling Collections in the Open Provenance Model}. +\newblock + http://mailman.ecs.soton.ac.uk/pipermail/provenance-challenge-ipaw-info/2009% +-June/000120.html, 2009. + +\bibitem{karajan} +G.~Laszewski, M.~Hategan, and D.~Kodeboyina. +\newblock {Java CoG Kit Workflow}. +\newblock In I.~Taylor, E.~Deelman, D.~Gannon, and M.~Shields, editors, {\em + Workflows for e-Science}, pages 340--356. Springer, 2007. + +\bibitem{opm1.1} +L.~Moreau, B.~Clifford, J.~Freire, Y.~Gil, P.~Groth, J.~Futrelle, + N.~Kwasnikowska, S.~Miles, P.~Missier, J.~Myers, Y.~Simmhan, E.~Stephan, and + J.~Van den Bussche. +\newblock {The Open Provenance Model - Core Specification (v1.1)}. +\newblock {\em Future Generation Computer Systems}, 2009 (Submitted). + +\bibitem{xdtm} +L.~Moreau, Y.~Zhao, I.~Foster, J.~Voeckler, and M.~Wilde. +\newblock {XDTM: XML Dataset Typing and Mapping for Specifying Datasets}. +\newblock European Grid Conference (EGC 2005), 2005. + +\bibitem{tupelo} +J.~Myers, J.~Futrelle, J.~Plutchak, P.~Bajcsy, J.~Kastner, L.~Marini, + R.~Kooper, R.~McGrath, T.~McLaren, A.~Rodr\'{\i}guez, and Y.~Liu. +\newblock {Embedding Data within Knowledge Spaces}. +\newblock {\em CoRR}, abs/0902.0744, 2009. + +\bibitem{falkon} +I.~Raicu, Y.~Zhao, C.~Dumitrescu, I.~Foster, and M.~Wilde. +\newblock {Falkon: A Fast and Lightweight Task Execution Framework}. +\newblock In {\em Proc. ACM/IEEE Conference on High Performance Networking and + Computing (Supercomputing 2007)}, 2007. + +\bibitem{SiPlGa05} +Y.~Simmhan, B.~Plale, and D.~Gannon. +\newblock {A Survey of Data Provenance in e-Science}. +\newblock {\em SIGMOD Record}, 34(3):31--36, 2005. + +\bibitem{WiFo09} +M.~Wilde, I.~Foster, K.~Iskra, P.~Beckman, A.~Espinosa, M.~Hategan, + B.~Clifford, and I.~Raicu. +\newblock {Parallel Scripting for Applications at the Petascale and Beyond}. +\newblock {\em IEEE Computer}, 42(11):50--60, November 2009. + +\bibitem{swift} +Y.~Zhao, M.~Hategan, B.~Clifford, I.~Foster, G.~Laszewski, I.~Raicu, + T.~Stef-Praun, and M.~Wilde. +\newblock {Swift: Fast, Reliable, Loosely Coupled Parallel Computation}. +\newblock In {\em Proc. 1st IEEE International Workshop on Scientific Workflows + (SWF 2007)}, pages 199--206, 2007. + +\bibitem{ZhWiFo06} +Y.~Zhao, M.~Wilde, and I.~Foster. +\newblock {Applying the Virtual Data Provenance Model}. +\newblock In {\em International Provenance and Annotation Workshop (IPAW + 2006)}, volume 4145 of {\em LNCS}, pages 148--161. Springer, 2006. + +\end{thebibliography} + \end{document} \endinput From noreply at svn.ci.uchicago.edu Wed Apr 21 10:11:51 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Wed, 21 Apr 2010 10:11:51 -0500 (CDT) Subject: [Swift-commit] r3288 - trunk/src/org/globus/swift/data Message-ID: <20100421151151.A74E39CC7E@vm-125-59.ci.uchicago.edu> Author: wozniak Date: 2010-04-21 10:11:51 -0500 (Wed, 21 Apr 2010) New Revision: 3288 Modified: trunk/src/org/globus/swift/data/Director.java Log: More docs. Modified: trunk/src/org/globus/swift/data/Director.java =================================================================== --- trunk/src/org/globus/swift/data/Director.java 2010-04-20 14:57:59 UTC (rev 3287) +++ trunk/src/org/globus/swift/data/Director.java 2010-04-21 15:11:51 UTC (rev 3288) @@ -35,7 +35,7 @@ Has a CDM policy file been provided? 
*/ static boolean enabled = false; - + /** Save the location of the given CDM policy file */ @@ -45,13 +45,13 @@ Maps from Patterns to Policies for fs.data rules */ static Map map = new LinkedHashMap(); - + /** Maps from String names to String values for fs.data properties */ static Map properties = new HashMap(); - /** + /** Remember the files we have broadcasted. Map from allocations to filenames. NOTE: must be accessed only using synchronized Director methods @@ -64,7 +64,7 @@ NOTE: must be accessed only using synchronized Director methods */ private static Set broadcastWork = new LinkedHashSet(); - + /** Remember all known allocations */ @@ -74,7 +74,7 @@ { return enabled; } - + /** Read in the user-supplied CDM policy file. */ @@ -90,34 +90,52 @@ } } + /** + A line is either a rule or a property. + */ static void addLine(String s) { String[] tokens = LineReader.tokenize(s); String type = tokens[0]; - if (type.equals("rule")) { + if (type.equals("rule")) { addRule(tokens); } - else if (type.equals("property")) { + else if (type.equals("property")) { addProperty(tokens); } } - static void addRule(String[] tokens) { + /** + A rule has a pattern, a policy token, and extra arguments. +
+ E.g.: rule .*.txt BROADCAST arg arg arg + */ + static void addRule(String[] tokens) { Pattern pattern = Pattern.compile(tokens[1]); Policy policy = Policy.valueOf(tokens[2]); - List tokenList = Arrays.asList(tokens); + List tokenList = Arrays.asList(tokens); policy.settings(tokenList.subList(3,tokenList.size())); map.put(pattern, policy); } - - static void addProperty(String[] tokens) { + + /** + A property has a name and a value. + Properties can be overwritten. +
+ E.g.: property X 3 + */ + static void addProperty(String[] tokens) { String name = tokens[1]; - String value = concat(tokens, 2); + String value = concat(tokens, 2); properties.put(name, value); } - + + /** + Utility to concatenate all strings from array {@link tokens} + starting at index {@link start}. + */ static String concat(String[] tokens, int start) { StringBuilder result = new StringBuilder(); - for (int i = start; i < tokens.length; i++) { + for (int i = start; i < tokens.length; i++) { result.append(tokens[i]); } return result.toString(); @@ -133,7 +151,7 @@ if (matcher.matches()) return map.get(pattern); } - + return Policy.DEFAULT; } @@ -142,7 +160,7 @@ */ public static String property(String name) { String result = properties.get(name); - if (result == null) + if (result == null) result = "UNSET"; return result; } @@ -158,8 +176,8 @@ /** Add a location to the list of allocations. - If the location is added twice, the second addition is considered to be an - empty allocation with no CDM state + If the location is added twice, the second addition + is considered to be an empty allocation with no CDM state. */ public static synchronized void addAllocation(String allocation) { logger.debug("addAllocation(): " + allocation); @@ -167,7 +185,7 @@ broadcasted.put(allocation, new HashSet()); doBroadcast(); } - + /** Create a batch of broadcast work to do and send it to be performed. */ @@ -185,7 +203,8 @@ /** Obtain a map from allocations to files. - For each allocation, its corresponding files should be broadcasted to it. + For each allocation, its corresponding files should + be broadcasted to it. Should only be called by {@link doBroadcast}. */ private static Map> getBroadcastBatch() { @@ -226,7 +245,7 @@ assert (! contents.contains(file)); logger.debug("markBroadcasts: add: " + file); contents.add(file); - } + } } } @@ -235,29 +254,29 @@ return broadcasted.contains(dir+"/"+file); } */ - - /** + + /** Check the policy effect of name with respect to policy_file - @param args {name, policy_file} + @param args {name, policy_file} */ public static void main(String[] args) { if (args.length != 2) { logger.debug("Incorrect args"); System.exit(1); } - + try { - - String name = args[0]; + + String name = args[0]; File policyFile = new File(args[1]); if (! policyFile.exists()) { - logger.debug("Policy file does not exist: " + + logger.debug("Policy file does not exist: " + args[1]); } load(policyFile); Policy policy = lookup(name); logger.debug(name + ": " + policy); - } catch (Exception e) { + } catch (Exception e) { e.printStackTrace(); System.exit(2); } From noreply at svn.ci.uchicago.edu Thu Apr 29 19:40:58 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Apr 2010 19:40:58 -0500 (CDT) Subject: [Swift-commit] r3289 - in branches/1.0/src/org/globus/swift/catalog: . 
util Message-ID: <20100430004058.8EB769CCE6@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-29 19:40:58 -0500 (Thu, 29 Apr 2010) New Revision: 3289 Modified: branches/1.0/src/org/globus/swift/catalog/Catalog.java branches/1.0/src/org/globus/swift/catalog/CatalogEntry.java branches/1.0/src/org/globus/swift/catalog/util/Escape.java branches/1.0/src/org/globus/swift/catalog/util/ProfileParser.java branches/1.0/src/org/globus/swift/catalog/util/ProfileParserException.java branches/1.0/src/org/globus/swift/catalog/util/Separator.java Log: "fixed" Jens' name Modified: branches/1.0/src/org/globus/swift/catalog/Catalog.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/Catalog.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/Catalog.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -21,7 +21,7 @@ * This interface create a common ancestor for all cataloging * interfaces. * - * @author Jens-S. V?ckler + * @author Jens-S. Voeckler * @author Yong Zhao * @version $Revision: 1.2 $ */ Modified: branches/1.0/src/org/globus/swift/catalog/CatalogEntry.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/CatalogEntry.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/CatalogEntry.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -18,7 +18,7 @@ /** * This interface create a common ancestor for all catalog entries. * - * @author Jens-S. V?ckler + * @author Jens-S. Voeckler * @author Yong Zhao * @version $Revision: 1.1 $ */ Modified: branches/1.0/src/org/globus/swift/catalog/util/Escape.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/util/Escape.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/util/Escape.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -33,7 +33,7 @@ * * @author Gaurang Mehta * @author Karan Vahi - * @author Jens-S. V?ckler + * @author Jens-S. Voeckler * @version $Revision: 1.1 $ */ public class Escape Modified: branches/1.0/src/org/globus/swift/catalog/util/ProfileParser.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/util/ProfileParser.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/util/ProfileParser.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -26,7 +26,7 @@ * and the parsed triples and back again. * * @author Gaurang Mehta - * @author Jens-S. V???ckler + * @author Jens-S. Voeckler */ public class ProfileParser { Modified: branches/1.0/src/org/globus/swift/catalog/util/ProfileParserException.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/util/ProfileParserException.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/util/ProfileParserException.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -20,7 +20,7 @@ * @see ProfileParser * * @author Gaurang Mehta - * @author Jens-S. V?ckler + * @author Jens-S. 
Voeckler * @version $Revision: 1.1 $ */ public class ProfileParserException Modified: branches/1.0/src/org/globus/swift/catalog/util/Separator.java =================================================================== --- branches/1.0/src/org/globus/swift/catalog/util/Separator.java 2010-04-21 15:11:51 UTC (rev 3288) +++ branches/1.0/src/org/globus/swift/catalog/util/Separator.java 2010-04-30 00:40:58 UTC (rev 3289) @@ -22,7 +22,7 @@ * representation of a definition looks like ns::name:version, and * a textual representation of a uses like ns::name:min,max.

* - * @author Jens-S. V?ckler + * @author Jens-S. Voeckler * @author Yong Zhao * @version $Revision: 1.6 $ * From noreply at svn.ci.uchicago.edu Thu Apr 29 20:43:14 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Apr 2010 20:43:14 -0500 (CDT) Subject: [Swift-commit] r3290 - branches/1.0/src/org/griphyn/vdl/mapping Message-ID: <20100430014314.8BE6C9CCA0@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-29 20:43:14 -0500 (Thu, 29 Apr 2010) New Revision: 3290 Modified: branches/1.0/src/org/griphyn/vdl/mapping/Path.java Log: added missing method Modified: branches/1.0/src/org/griphyn/vdl/mapping/Path.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/mapping/Path.java 2010-04-30 00:40:58 UTC (rev 3289) +++ branches/1.0/src/org/griphyn/vdl/mapping/Path.java 2010-04-30 01:43:14 UTC (rev 3290) @@ -177,6 +177,10 @@ public String getFirst() { return ((Entry) elements.get(0)).name; } + + public String getLast() { + return ((Entry) elements.get(elements.size() - 1)).name; + } public boolean isEmpty() { return elements == null || elements.size() == 0; From noreply at svn.ci.uchicago.edu Thu Apr 29 20:54:27 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Apr 2010 20:54:27 -0500 (CDT) Subject: [Swift-commit] r3291 - branches/1.0/src/org/griphyn/vdl/karajan/lib Message-ID: <20100430015427.1F17E9CCA0@vm-125-59.ci.uchicago.edu> Author: hategan Date: 2010-04-29 20:54:26 -0500 (Thu, 29 Apr 2010) New Revision: 3291 Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java Log: made method protected since it is used by SetFieldValue Modified: branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java =================================================================== --- branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2010-04-30 01:43:14 UTC (rev 3290) +++ branches/1.0/src/org/griphyn/vdl/karajan/lib/VDLFunction.java 2010-04-30 01:54:26 UTC (rev 3291) @@ -513,7 +513,7 @@ return getFutureWrapperMap(stack).addFutureListListener(handle, value).futureIterator(stack); } - private void markAsAvailable(VariableStack stack, DSHandle handle, Object key) + protected void markAsAvailable(VariableStack stack, DSHandle handle, Object key) throws ExecutionException { getFutureWrapperMap(stack).markAsAvailable(handle, key); } From noreply at svn.ci.uchicago.edu Thu Apr 29 22:44:37 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Apr 2010 22:44:37 -0500 (CDT) Subject: [Swift-commit] r3292 - text Message-ID: <20100430034437.C36E99CCE6@vm-125-59.ci.uchicago.edu> Author: aespinosa Date: 2010-04-29 22:44:37 -0500 (Thu, 29 Apr 2010) New Revision: 3292 Added: text/internals/ Log: Initial tree for internals paper From noreply at svn.ci.uchicago.edu Thu Apr 29 22:44:52 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Thu, 29 Apr 2010 22:44:52 -0500 (CDT) Subject: [Swift-commit] r3293 - text/internals Message-ID: <20100430034452.9FAF19CCE6@vm-125-59.ci.uchicago.edu> Author: aespinosa Date: 2010-04-29 22:44:52 -0500 (Thu, 29 Apr 2010) New Revision: 3293 Added: text/internals/trunk/ Log: Trunk branch From noreply at svn.ci.uchicago.edu Fri Apr 30 10:19:36 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Apr 2010 10:19:36 -0500 (CDT) Subject: [Swift-commit] r3295 - text/internals/trunk Message-ID: <20100430151936.8B3B29CC80@vm-125-59.ci.uchicago.edu> 
Author: aespinosa Date: 2010-04-30 10:19:36 -0500 (Fri, 30 Apr 2010) New Revision: 3295 Added: text/internals/trunk/Makefile Log: Makefile for building the manuscript Added: text/internals/trunk/Makefile =================================================================== --- text/internals/trunk/Makefile (rev 0) +++ text/internals/trunk/Makefile 2010-04-30 15:19:36 UTC (rev 3295) @@ -0,0 +1,39 @@ +# set latexfile to the name of the main file without the .tex +latexfile = internals +# put the names of figure files here. include the .eps +figures = +TEX = latex +BUILDIR = build + +# *.fig files may be in ./Figs +vpath %.fig Figs + +# reruns latex if needed. to get rid of this capability, delete the +# three lines after the rule. +# idea from http://ctan.unsw.edu.au/help/uk-tex-faq/Makefile +$(latexfile).dvi : $(figures) $(latexfile).tex + while ($(TEX) -output-directory=$(BUILDIR) $(latexfile); \ + grep -q "Rerun to get cross" $(BUILDIR)/$(latexfile).log ) do true ; \ + done + + +%.eps : %.fig + fig2dev -L eps $< > $@ + +$(latexfile).pdf : $(latexfile).ps + ps2pdf $(latexfile).ps $(latexfile).pdf + +pdf : $(latexfile).pdf + +$(latexfile).ps : $(latexfile).dvi + dvips $(latexfile) + +ps : $(latexfile).ps + +$(latexfile).tar.gz : $(figures) $(latexfile).tex + tar -czvf $(latexfile).tar.gz $(figures) $(latexfile).tex Figs/*.fig + +tarball: $(latexfile).tar.gz + +clean: + @rm -f $(BUILDIR)/*.dvi $(BUILDIR)/*.ps $(BUILDIR)/*.pdf From noreply at svn.ci.uchicago.edu Fri Apr 30 10:19:39 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Apr 2010 10:19:39 -0500 (CDT) Subject: [Swift-commit] r3296 - text/internals/trunk Message-ID: <20100430151939.37E239CC80@vm-125-59.ci.uchicago.edu> Author: aespinosa Date: 2010-04-30 10:19:39 -0500 (Fri, 30 Apr 2010) New Revision: 3296 Modified: text/internals/trunk/internals.tex Log: First draft of notes from first meeting Modified: text/internals/trunk/internals.tex =================================================================== --- text/internals/trunk/internals.tex 2010-04-30 15:19:36 UTC (rev 3295) +++ text/internals/trunk/internals.tex 2010-04-30 15:19:39 UTC (rev 3296) @@ -1,87 +1,112 @@ \documentclass{article} +\usepackage{url} + \title{Swift Internals} \author{ Mihael Hategan \\ scribe: Allan Espinosa } +\newcommand{\filename}[1]{\textit{#1}} +\newcommand{\snippet}[1]{\texttt{#1}} + \begin{document} \maketitle \section{Provider model} + Providers are asynchronous. Internally these are implemented as lightweight threads. It consumes less resources when a job is executing. Jobs only consume resources when submitting a job and receiving a notification when a job -finished. In between, the thread is simply waiting for the job to finish. By -using lightweight threads, less resources are consumed. +finished. In between the two events, the thread is simply waiting for the job +to finish. By using lightweight threads, no memory is consumed in this period. +This is the basic architecture on how providers work. -not native threads. consumes smaller resources when a job is executing. light -weight threading (green threads: don't consume memory when nothing is running). -This is how a provider works +If it is desired to monitor a task, a listener such as \snippet{StatusListener} +is added to the task. In the example code in +\url{http://wiki.cogkit.org/wiki/Java_CoG_Kit_Abstraction_Guide#How_to_execute_a_remote_job_execution_task}, +the \snippet{addStatusListener()} method is called in the \snippet{Task} object. 
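As an illustrative sketch of such a listener (an illustration added here, not code from the CoG tree: the package paths and the \snippet{Status} constants below are assumptions that should be verified against the \filename{abstract-common} module):

\begin{verbatim}
import org.globus.cog.abstraction.impl.common.StatusEvent;
import org.globus.cog.abstraction.interfaces.Status;
import org.globus.cog.abstraction.interfaces.StatusListener;

// A listener that reports terminal task states. Register it with
// task.addStatusListener(new LoggingStatusListener()) before submission;
// statusChanged() is invoked on each status transition of the task.
// NOTE: import paths and constants are assumptions, not verified API.
public class LoggingStatusListener implements StatusListener {
    public void statusChanged(StatusEvent event) {
        Status status = event.getStatus();
        if (status.getStatusCode() == Status.COMPLETED) {
            System.out.println("task completed");
        } else if (status.getStatusCode() == Status.FAILED) {
            System.out.println("task failed: " + status.getMessage());
        }
    }
}
\end{verbatim}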
-{
-Submit()
-addListener()-> called when the job is done. add when you make a submit() job
-}
-vs
-just runJob()
+A remote execution task can be summarized in a few steps: (1) create a new
+\snippet{Task} object with the corresponding execution provider using the
+constructor from \snippet{TaskImpl}, (2) add a listener to monitor the task, (3)
+create a \snippet{JobSpecification} and attach this to the \snippet{Task}
+object, (4) set the \snippet{Service} specification for the remote execution
+site, and (5) create a \snippet{TaskHandler} to submit the task. These classes
+can be found in the \filename{abstract-common} module of the CoG source tree.
-CPU (on submit site): fill with as much of submission and notification.
-Task
-Handler -> that can actually execute these tasks
-Handler.submit(Task)
+\begin{verbatim}
+Task task = new TaskImpl(...);
+task.addStatusListener(public statusChanged()...);
+JobSpecification spec = new JobSpecification(...); spec.set...;
+Service serv = new Service(...);
+task.setSpecification(spec);
+task.setService(serv);
+TaskHandler handler = AbstractionFactory...;
+handler.submit(task);
+\end{verbatim}
-in \verb1 modules/abstract-common1
-src/org/globus/cog/abstraction/impl/common/task
+The reference classes are found in
+\filename{src/org/globus/cog/abstraction/impl/common/task}.
-public interface TaskHandler
-public interface Task
-
- \section{Karajan}
-Stackframes. Every element (execution unit) is a function
- - java hashtables
+Every element (execution unit) is a function and has its own stackframe. The
+stackframes are currently implemented as Java hashtables.
-procedure invocations gets these
+Procedure invocations get these internal functions: \snippet{start()},
+\snippet{waitDone()}. All of these are executed asynchronously.
-start()
-waitDone();
+\begin{verbatim}
+sequential(
+ job1
+ job2
+)
+\end{verbatim}
+job2 will only be executed when job1 triggers the \snippet{Done} event. It can
+be logically viewed as
-everything is asynchronous.
-
-sequential:(
- job1,
- job2)
- job1.start()
- job1.done() {
- job2.start()
- }
- job2.done() {
- "i'm done"
- }
+\begin{verbatim}
+job1.start()
+job1.done() {
+ job2.start()
+}
+job2.done() {
+ "i'm done"
+}
+\end{verbatim}
parallel: (decrements until 0 => done)
That Thread1 Thread2
-a parenthesis creates a local scope (1 stack frame)
+A parenthesis creates a local scope (1 stack frame)
-Threads creates a new scopes
+Each thread creates a new scope
-partial arguments
-for(i in range(0,10), (do something) )
+An element features partial arguments:
+\snippet{for(i in range(0,10), (do something) )}
\section{Swift karajan}
+
+The \filename{bin/swift} program can execute pre-compiled karajan .k files.
+
+Basic components used by swift:
+
+\begin{verbatim}
import("sys.k")
wait()
delay()
sequential()
parallel()
+\end{verbatim}
\section{Futures}
+Used by swift variables to wait for the availability of data sources.
+
+\begin{verbatim}
a := future(wait(delay=2000), "A")
[a,b] := seq(1,2)
@@ -99,9 +124,11 @@
wait(delay=1000)
wait(delay=2000)
) -> slightly parallel
+\end{verbatim}
\section{Channels}
+\begin{verbatim}
import("sys.k")
c := futureChannel(
@@ -109,16 +136,21 @@
 wait(delay=1000), i
 )
)
+\end{verbatim}
-not this: [ future(wait(1000), 0), future(2), future(3)... future(9) ]
-but this: [], [0], [0, 1], [0, 2]
+Channels are not implemented as an array of futures \snippet{[
+future(wait(1000), 0), future(2), future(3)... future(9) ]}, but rather as an array
+whose content changes over time \snippet{[], [0], [0, 1], [0, 2]}
for(j, c, echo(j))
basic concurrency book
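To make the channel semantics concrete, the following plain-Java sketch (invented class names; an illustration of the behavior described above, not the Karajan \snippet{futureChannel} implementation) models a channel as a list that grows over time, where a consumer running ahead of the producer blocks until the next element arrives or the channel is closed:

\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the [], [0], [0, 1], ... behavior sketched above.
class GrowingChannel<T> {
    private final List<T> items = new ArrayList<T>();
    private boolean closed = false;

    // Producer side: append an element and wake any waiting consumers.
    public synchronized void append(T item) {
        items.add(item);
        notifyAll();
    }

    public synchronized void close() {
        closed = true;
        notifyAll();
    }

    // Consumer side: block until element i exists; returns null once the
    // channel is closed and element i will never be produced.
    public synchronized T get(int i) throws InterruptedException {
        while (items.size() <= i && !closed) {
            wait();
        }
        return i < items.size() ? items.get(i) : null;
    }
}

public class ChannelDemo {
    public static void main(String[] args) throws InterruptedException {
        final GrowingChannel<Integer> c = new GrowingChannel<Integer>();
        new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i <= 9; i++) {
                    try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
                    c.append(i);   // contents grow: [], [0], [0, 1], ...
                }
                c.close();
            }
        }).start();
        for (int j = 0; ; j++) {   // analogous to for(j, c, echo(j))
            Integer v = c.get(j);
            if (v == null) break;  // closed and drained
            System.out.println(v);
        }
    }
}
\end{verbatim}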
-how swift works:
-Alg 1 (channels)
+\subsection{Core swift}
+
+Basic karajan structure of a compiled SwiftScript program
+
+\begin{verbatim}
c := channel()
parallel(
 sequential(
 wait(delay=1000)
 append(c,1)
 )
 sequential(
 echo(first(c))
 )
)
+\end{verbatim}
equivalent swift code:
+\begin{verbatim}
c = generate();
consume(c);
+\end{verbatim}
Used for swift arrays
+
+\begin{verbatim}
import("sys.k")
c := futureChannel(
 for(i, ranger(0,9)
 wait(delay=1000)
 list(i, chr(i))
 )
)
+\end{verbatim}
\end{document}
From noreply at svn.ci.uchicago.edu Fri Apr 30 10:19:33 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Apr 2010 10:19:33 -0500 (CDT) Subject: [Swift-commit] r3294 - text/internals/trunk Message-ID: <20100430151933.ACE259CC80@vm-125-59.ci.uchicago.edu> Author: aespinosa Date: 2010-04-30 10:19:33 -0500 (Fri, 30 Apr 2010) New Revision: 3294 Added: text/internals/trunk/internals.tex Log: Initial draft notes from last week Added: text/internals/trunk/internals.tex
===================================================================
--- text/internals/trunk/internals.tex (rev 0)
+++ text/internals/trunk/internals.tex 2010-04-30 15:19:33 UTC (rev 3294)
@@ -0,0 +1,148 @@
+\documentclass{article}
+
+\title{Swift Internals}
+\author{
+ Mihael Hategan \\
+ scribe: Allan Espinosa
+}
+
+\begin{document}
+\maketitle
+\section{Provider model}
+Providers are asynchronous. Internally these are implemented as lightweight
+threads. It consumes less resources when a job is executing. Jobs only consume
+resources when submitting a job and receiving a notification when a job
+finished. In between, the thread is simply waiting for the job to finish. By
+using lightweight threads, less resources are consumed.
+
+not native threads. consumes smaller resources when a job is executing. light
+weight threading (green threads: don't consume memory when nothing is running).
+This is how a provider works
+
+{
+Submit()
+addListener()-> called when the job is done. add when you make a submit() job
+}
+vs
+just runJob()
+
+CPU (on submit site): fill with as much of submission and notification.
+Task
+Handler -> that can actually execute these tasks
+Handler.submit(Task)
+
+in \verb1 modules/abstract-common1
+src/org/globus/cog/abstraction/impl/common/task
+
+public interface TaskHandler
+public interface Task
+
+
+\section{Karajan}
+
+Stackframes. Every element (execution unit) is a function
+ - java hashtables
+
+procedure invocations gets these
+
+start()
+waitDone();
+
+everything is asynchronous.
+ +sequential:( + job1, + job2) + job1.start() + job1.done() { + job2.start() + } + job2.done() { + "i'm done" + } +parallel: (decrements until 0 => done) + +That + Thread1 + Thread2 + +a parenthesis creates a local scope (1 stack frame) + +Threads creates a new scopes + +partial arguments +for(i in range(0,10), (do something) ) + +\section{Swift karajan} +import("sys.k") +wait() +delay() +sequential() +parallel() + +\section{Futures} + +a := future(wait(delay=2000), "A") + +[a,b] := seq(1,2) + +brackets to combine a multple of l values + +echo("o") +a := future(wait(delay=2000), "A") +echo("B") executes at t=0 +wait(delay=1000) +echo("a") - > executes at t=1000 +echo(a) -> executes at t=2000 + +race( + wait(delay=1000) + wait(delay=2000) +) -> slightly parallel + +\section{Channels} + +import("sys.k") + +c := futureChannel( + for(i, ranger(0,9) + wait(delay=1000), i + ) +) + +not this: [ future(wait(1000), 0), future(2), future(3)... future(9) ] +but this: [], [0], [0, 1], [0, 2] + +for(j, c, echo(j)) + +basic concurency book + +how swift works: +Alg 1 (channels) +c := channel() +parallel( + sequential( + wait(delay=1000) + append(c,1) + ) + sequential( + echo(first(c)) + ) +) + +equivalent swift code: +c = generate(); +consume(c); + +Used for swift arrays +import("sys.k") + +c := futureChannel( + for(i, ranger(0,9) + wait(delay=1000) + list(i, chr(i)) + ) +) + + +\end{document} From noreply at svn.ci.uchicago.edu Fri Apr 30 11:36:30 2010 From: noreply at svn.ci.uchicago.edu (noreply at svn.ci.uchicago.edu) Date: Fri, 30 Apr 2010 11:36:30 -0500 (CDT) Subject: [Swift-commit] r3297 - in SwiftApps/SwiftR: . Swift Swift/R Swift/exec Swift/man Swift/tests Message-ID: <20100430163630.6D3219CC80@vm-125-59.ci.uchicago.edu> Author: wilde Date: 2010-04-30 11:36:30 -0500 (Fri, 30 Apr 2010) New Revision: 3297 Added: SwiftApps/SwiftR/Swift/ SwiftApps/SwiftR/Swift/DESCRIPTION SwiftApps/SwiftR/Swift/R/ SwiftApps/SwiftR/Swift/R/Swift.R SwiftApps/SwiftR/Swift/exec/ SwiftApps/SwiftR/Swift/exec/RunR.sh SwiftApps/SwiftR/Swift/exec/RunSwiftScript.sh SwiftApps/SwiftR/Swift/exec/swiftapply.swift SwiftApps/SwiftR/Swift/man/ SwiftApps/SwiftR/Swift/man/Swift-package.Rd SwiftApps/SwiftR/Swift/tests/ SwiftApps/SwiftR/Swift/tests/TestSwift.R Removed: SwiftApps/SwiftR/RunR.sh SwiftApps/SwiftR/RunSwiftScript.sh SwiftApps/SwiftR/Swift.R SwiftApps/SwiftR/TestSwift.R SwiftApps/SwiftR/swiftapply.swift Modified: SwiftApps/SwiftR/TODO Log: Created initial version of Swift package dir and relocated files to it. Deleted: SwiftApps/SwiftR/RunR.sh =================================================================== --- SwiftApps/SwiftR/RunR.sh 2010-04-30 15:19:39 UTC (rev 3296) +++ SwiftApps/SwiftR/RunR.sh 2010-04-30 16:36:30 UTC (rev 3297) @@ -1,13 +0,0 @@ -#! 
/usr/bin/env Rscript
-
-argv = commandArgs(TRUE)
-
-load(argv[1]);
-
-result=list()
-for(c in 1:length(rcall$arglistbatch)) {
- # FIXME: run this under try/catch and save error status in results object (need to make it a list: rval + error status)
- result[[c]] = do.call( rcall$func, rcall$arglistbatch[[c]] )
-}
-
-save(result,file=argv[2])
Deleted: SwiftApps/SwiftR/RunSwiftScript.sh
===================================================================
--- SwiftApps/SwiftR/RunSwiftScript.sh 2010-04-30 15:19:39 UTC (rev 3296)
+++ SwiftApps/SwiftR/RunSwiftScript.sh 2010-04-30 16:36:30 UTC (rev 3297)
@@ -1,32 +0,0 @@
-rundir=$1
-site=$2
-
-cd $rundir
-
-cat >tc <sites.xml < - - - 10000 - .11 - - $(pwd) - - - 00:00:10 - 1800 - - 1 - 10000 - 5.99 - - $(pwd) - - -EOF
-
-swift -tc.file tc -sites.file sites.xml ../swiftapply.swift
Added: SwiftApps/SwiftR/Swift/DESCRIPTION
===================================================================
--- SwiftApps/SwiftR/Swift/DESCRIPTION (rev 0)
+++ SwiftApps/SwiftR/Swift/DESCRIPTION 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,10 @@
+Package: Swift
+Type: Package
+Title: R interface to Swift parallel scripting language
+Version: 0.1
+Date: 2010-02-25
+Author: Michael Wilde
+Maintainer: Michael Wilde
+Description: Routines to invoke R functions on remote resources through Swift.
+License: Apache License
+LazyLoad: yes
Copied: SwiftApps/SwiftR/Swift/R/Swift.R (from rev 3254, SwiftApps/SwiftR/Swift.R)
===================================================================
--- SwiftApps/SwiftR/Swift/R/Swift.R (rev 0)
+++ SwiftApps/SwiftR/Swift/R/Swift.R 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,98 @@
+swiftapply <- function( func, arglists, site=NULL, callsperbatch=NULL )
+{
+ # Move swiftprops into a Swift namespace
+
+# if(!exists("swiftprops")) {
+# swiftprops <<-list()
+# swiftprops$site <<- "local"
+# swiftprops$callsperbatch <<- 1
+# }
+
+ if (is.null(getOption("swift.site")))
+ options(swift.site="local");
+ if (is.null(getOption("swift.callsperbatch")))
+ options(swift.callsperbatch=1)
+
+ # Set Swift properties
+
+# if(is.null(site))
+# if( is.null(swiftprops) || is.null(swiftprops$site))
+# site <- "local"
+# else
+# site <- swiftprops$site
+# if(is.null(callsperbatch))
+# if( is.null(swiftprops) || is.null(swiftprops$callsperbatch))
+# callsperbatch <- 1
+# else
+# Callsperbatch <- swiftprops$callsperbatch
+
+ if(is.null(site))
+ site <- getOption("swift.site")
+ if(is.null(callsperbatch))
+ callsperbatch <- getOption("swift.callsperbatch")
+
+ cat("\nSwift R properties:\n")
+ cat(" site =",site,"\n");
+ cat(" callsperbatch =",callsperbatch,"\n")
+ cat("\nCurrent dir: ",getwd(),"\n");
+
+ # Execute the calls in batches
+
+ rundir <- system("mktemp -d SwiftR.run.XXX",intern=TRUE)
+ cat("Swift running in",rundir,"\n")
+ narglists <- length(arglists) # number of arglists to process
+ batch <- 1 # Next arglist batch number to fill
+ arglist <- 1 # Next arglist number to insert
+ while(arglist <= narglists) {
+ arglistsleft <- narglists - arglist + 1
+ if(arglistsleft >= callsperbatch) {
+ batchsize <- callsperbatch
+ }
+ else {
+ batchsize <- arglistsleft
+ }
+ arglistbatch <- list()
+ for(i in 1 : batchsize) {
+ arglistbatch[[i]] <- arglists[[arglist]]
+ arglist <- arglist +1
+ }
+ rcall <- list(func=func,arglistbatch=arglistbatch)
+ save(rcall,file=paste(rundir,"/cbatch.",as.character(batch),".Rdata",sep=""))
+ batch <- batch + 1;
+ }
+ nbatches <- batch - 1
+ RunSwiftScript <- system.file(package="Swift","exec/RunSwiftScript.sh")
+ RunRScript <-
system.file(package="Swift","exec/RunR.sh")
+ swiftapplyScript <- system.file(package="Swift","exec/swiftapply.swift")
+ system(paste(RunSwiftScript,rundir,site,swiftapplyScript,RunRScript,sep=" "))
+
+ # Fetch the batch results
+
+ rno <- 1
+ rlist <- list()
+ for(batch in 1:nbatches) {
+ result <- NULL
+ load(paste(rundir,"/rbatch.",as.character(batch),".Rdata",sep=""))
+ nresults <- length(result)
+ for(r in 1:nresults) {
+ rlist[[rno]] <- result[[r]]
+ rno <- rno + 1
+ }
+ }
+ return(rlist)
+}
+
+swiftLapply <- function( tlist, func, ... )
+{
+ arglists <- list()
+ narglists <- length(tlist)
+ for(i in 1 : narglists) {
+ arglists[[i]] <- list(tlist[[i]], ...);
+ }
+ swiftapply(func, arglists)
+}
+#####
+#* checking R code for possible problems ... NOTE
+#swiftapply: no visible binding for '<<-' assignment to 'swiftprops'
+#swiftapply: no visible binding for global variable 'swiftprops'
+#swiftapply: no visible binding for global variable 'result'
Copied: SwiftApps/SwiftR/Swift/exec/RunR.sh (from rev 3251, SwiftApps/SwiftR/RunR.sh)
===================================================================
--- SwiftApps/SwiftR/Swift/exec/RunR.sh (rev 0)
+++ SwiftApps/SwiftR/Swift/exec/RunR.sh 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,13 @@
+#! /usr/bin/env Rscript
+
+argv = commandArgs(TRUE)
+
+load(argv[1]);
+
+result=list()
+for(c in 1:length(rcall$arglistbatch)) {
+ # FIXME: run this under try/catch and save error status in results object (need to make it a list: rval + error status)
+ result[[c]] = do.call( rcall$func, rcall$arglistbatch[[c]] )
+}
+
+save(result,file=argv[2])
Copied: SwiftApps/SwiftR/Swift/exec/RunSwiftScript.sh (from rev 3254, SwiftApps/SwiftR/RunSwiftScript.sh)
===================================================================
--- SwiftApps/SwiftR/Swift/exec/RunSwiftScript.sh (rev 0)
+++ SwiftApps/SwiftR/Swift/exec/RunSwiftScript.sh 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,35 @@
+rundir=$1
+site=$2
+script=$3
+runR=$4
+
+cd $rundir
+
+cat >tc <sites.xml < + + + 10000 + .11 + + $(pwd) + + + 00:00:10 + 1800 + + 1 + 10000 + 5.99 + + $(pwd) + + +EOF
+
+swift -tc.file tc -sites.file sites.xml $script
Copied: SwiftApps/SwiftR/Swift/exec/swiftapply.swift (from rev 3251, SwiftApps/SwiftR/swiftapply.swift)
===================================================================
--- SwiftApps/SwiftR/Swift/exec/swiftapply.swift (rev 0)
+++ SwiftApps/SwiftR/Swift/exec/swiftapply.swift 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,13 @@
+type RFile;
+
+app (RFile result) RunR (RFile rcall)
+{
+ RunR @rcall @result;
+}
+
+RFile rcalls[] ;
+RFile results[] ;
+
+foreach c, i in rcalls {
+ results[i] = RunR(c);
+}
Added: SwiftApps/SwiftR/Swift/man/Swift-package.Rd
===================================================================
--- SwiftApps/SwiftR/Swift/man/Swift-package.Rd (rev 0)
+++ SwiftApps/SwiftR/Swift/man/Swift-package.Rd 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,62 @@
+\name{Swift-package}
+\alias{Swift-package}
+\alias{Swift}
+\alias{swiftapply}
+\alias{swiftLapply}
+\docType{package}
+\title{
+R interface to Swift parallel scripting language
+}
+\description{
+Description: Routines to invoke R functions and Swift scripts on remote resources through Swift.
+R functions can be remotely executed in parallel in a manner similar to Snow using a list of argument lists.
+Eventually more general Swift functions can be embedded and invoked remotely as well.
+}
+\details{
+\tabular{ll}{
+Package: \tab Swift\cr
+Type: \tab Package\cr
+Version: \tab 1.0\cr
+Date: \tab 2010-02-25\cr
+License: \tab Globus Toolkit Public License v3 (based on Apache License 2.0): http://www.globus.org/toolkit/legal/4.0/license-v3.html \cr
+LazyLoad: \tab yes\cr
+}
+To use this package, create a list of argument lists, and then invoke: swiftapply(function,arglists).
+
+As a preliminary interface, you can set R options() to control Swift's operation:
+
+options(swift.callsperbatch=n) # n = number of R calls to perform in each Swift job.
+
+options(swift.site=sitename) # sitename = "local" to run on the current host and "pbs" to submit to a local PBS cluster.
+
+}
+\author{
+Michael Wilde
+
+Maintainer: Michael Wilde
+}
+\references{
+http://www.ci.uchicago.edu/swift
+}
+\keyword{ parallel and distributed execution }
+%%\seealso{
+%%~~ Optional links to other man pages, e.g. ~~
+%%~~ \code{\link[:-package]{}} ~~
+%%}
+\examples{
+
+require("boot")
+sumcrits <- function(duckdata,dogdata) { sum( duckdata$plumage, dogdata$mvo ) }
+
+args=list(ducks,dogs)
+arglist = rep(list(args),9)
+res = swiftapply(sumcrits,arglist)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=10)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=2)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=3)
+
+# res = swiftapply(sumcrits,arglist,callsperbatch=2,site="pbs")
+}
Copied: SwiftApps/SwiftR/Swift/tests/TestSwift.R (from rev 3253, SwiftApps/SwiftR/TestSwift.R)
===================================================================
--- SwiftApps/SwiftR/Swift/tests/TestSwift.R (rev 0)
+++ SwiftApps/SwiftR/Swift/tests/TestSwift.R 2010-04-30 16:36:30 UTC (rev 3297)
@@ -0,0 +1,68 @@
+require(boot)
+#source("Swift.R")
+require(Swift)
+
+sumcrits <- function(duckdata,dogdata) { sum( duckdata$plumage, dogdata$mvo ) }
+
+args=list(ducks,dogs)
+arglist = rep(list(args),9)
+
+###
+require("boot")
+sumcrits <- function(duckdata,dogdata) { sum( duckdata$plumage, dogdata$mvo ) }
+
+args=list(ducks,dogs)
+arglist = rep(list(args),9)
+res = swiftapply(sumcrits,arglist)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=10)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=2)
+
+res = swiftapply(sumcrits,arglist,callsperbatch=3)
+
+# res = swiftapply(sumcrits,arglist,callsperbatch=2,site="pbs")
+###
+
+
+if(TRUE) { # Basic tests
+
+ res = do.call(sumcrits,args)
+ cat("Test of do.call(sumcrits)\n")
+ print(res)
+
+ cat("\nTest of swiftapply(sumcrits,arglist)\n")
+ res = swiftapply(sumcrits,arglist)
+ print(res)
+}
+
+if(FALSE) { # Test various batch sizes
+
+ cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=10)\n")
+ res = swiftapply(sumcrits,arglist,callsperbatch=10)
+ print(res)
+
+ cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=2)\n")
+ res = swiftapply(sumcrits,arglist,callsperbatch=2)
+ print(res)
+
+ cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=3)\n")
+ res = swiftapply(sumcrits,arglist,callsperbatch=3)
+ print(res)
+
+ cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=20)\n")
+ res = swiftapply(sumcrits,arglist,callsperbatch=20)
+ print(res)
+}
+
+if(FALSE) { # Larger-scale tests
+
+ cat("\nTest of swiftapply(sumcrits,arglist[1000],callsperbatch=1)\n")
+
+ arglist = rep(list(args),100)
+# res = swiftapply(sumcrits,arglist,callsperbatch=2,site="pbs")
+ res = swiftapply(sumcrits,arglist,callsperbatch=2)
+
+ print(res[[1]])
+ print(res[[1000]])
+}
Deleted: SwiftApps/SwiftR/Swift.R
===================================================================
---
SwiftApps/SwiftR/Swift.R 2010-04-30 15:19:39 UTC (rev 3296)
+++ SwiftApps/SwiftR/Swift.R 2010-04-30 16:36:30 UTC (rev 3297)
@@ -1,67 +0,0 @@
-swiftapply <- function( func, arglists, site=NULL, callsperbatch=NULL )
-{
- # Move swiftprops into a Swift namespace
-
- if(!exists("swiftprops")) {
- swiftprops <<-list()
- swiftprops$site <<- "local"
- swiftprops$callsperbatch <<- 1
- }
-
- # Set Swift properties
-
- if(is.null(site))
- if( is.null(swiftprops) || is.null(swiftprops$site))
- site <- "local"
- else
- site <- swiftprops$site
- if(is.null(callsperbatch))
- if( is.null(swiftprops) || is.null(swiftprops$callsperbatch))
- callsperbatch <- 1
- else
- callsperbatch <- swiftprops$callsperbatch
- cat("\nSwift R properties:\n")
- cat(" site =",site,"\n");
- cat(" callsperbatch =",callsperbatch,"\n")
-
- # Execute the calls in batches
-
- rundir <- system("mktemp -d SwiftR.run.XXX",intern=TRUE)
- cat("Swift running in",rundir,"\n")
- narglists <- length(arglists) # number of arglists to process
- batch <- 1 # Next arglist batch number to fill
- arglist <- 1 # Next arglist number to insert
- while(arglist <= narglists) {
- arglistsleft <- narglists - arglist + 1
- if(arglistsleft >= callsperbatch) {
- batchsize <- callsperbatch
- }
- else {
- batchsize <- arglistsleft
- }
- arglistbatch <- list()
- for(i in 1 : batchsize) {
- arglistbatch[[i]] <- arglists[[arglist]]
- arglist <- arglist +1
- }
- rcall <- list(func=func,arglistbatch=arglistbatch)
- save(rcall,file=paste(rundir,"/cbatch.",as.character(batch),".Rdata",sep=""))
- batch <- batch + 1;
- }
- nbatches <- batch - 1
- system(paste("./RunSwiftScript.sh",rundir,site,sep=" "))
-
- # Fetch the batch results
-
- rno <- 1
- rlist <- list()
- for(batch in 1:nbatches) {
- load(paste(rundir,"/rbatch.",as.character(batch),".Rdata",sep=""))
- nresults <- length(result)
- for(r in 1:nresults) {
- rlist[[rno]] <- result[[r]]
- rno <- rno + 1
- }
- }
- return(rlist)
-}
Modified: SwiftApps/SwiftR/TODO
===================================================================
--- SwiftApps/SwiftR/TODO 2010-04-30 15:19:39 UTC (rev 3296)
+++ SwiftApps/SwiftR/TODO 2010-04-30 16:36:30 UTC (rev 3297)
@@ -1,4 +1,42 @@
+*** NOTES on where everything lives:
+
+Am testing on PADS
+~/SwiftR is my "project" main working dir
+
+R is under ~/R and ~/R/pads (compiled for PADS; the ~/R/bin/R executable gets a library error on pads)
+OpenMx source tree checked out under: ~/SwiftR/OpenMx
+
+R packages are installed under: ~/RPackages
+
+--
+
+(Note: don't yet know if we do or do not need separate compiles between
+communicado, bridles, pads, teraport, and other systems on the CI
+net.
Hopefully not; if we do, we will need to create a tree of R
+releases, each with a separate subtree for user-installed packages)
+Seems that we do; at least for PADS, we get this error:
+
+login1$ ~/R/bin/R
+/home/wilde/R/lib64/R/bin/exec/R:
+ error while loading shared libraries: libreadline.so.4:
+ cannot open shared object file: No such file or directory
+login1$
+
+--
+
+
+Swift package *source* (tbd) is under:
+
+~/SwiftR/Swift (the "Swift" package)
+
+Swift package is installed under:
+
+
+
+
+*** TO DO LIST:
+
x n args
x batch
x into svn
Deleted: SwiftApps/SwiftR/TestSwift.R
===================================================================
--- SwiftApps/SwiftR/TestSwift.R 2010-04-30 15:19:39 UTC (rev 3296)
+++ SwiftApps/SwiftR/TestSwift.R 2010-04-30 16:36:30 UTC (rev 3297)
@@ -1,48 +0,0 @@
-require(boot)
-source("Swift.R")
-
-sumcrits <- function(duckdata,dogdata) { sum( duckdata$plumage, dogdata$mvo ) }
-
-args=list(ducks,dogs)
-arglist = rep(list(args),9)
-
-if(TRUE) { # Basic tests
-
- res = do.call(sumcrits,args)
- cat("Test of do.call(sumcrits)\n")
- print(res)
-
- cat("\nTest of swiftapply(sumcrits,arglist)\n")
- res = swiftapply(sumcrits,arglist)
- print(res)
-}
-
-if(FALSE) { # Test various batch sizes
-
- cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=10)\n")
- res = swiftapply(sumcrits,arglist,callsperbatch=10)
- print(res)
-
- cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=2)\n")
- res = swiftapply(sumcrits,arglist,callsperbatch=2)
- print(res)
-
- cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=3)\n")
- res = swiftapply(sumcrits,arglist,callsperbatch=3)
- print(res)
-
- cat("\nTest of swiftapply(sumcrits,arglist,callsperbatch=20)\n")
- res = swiftapply(sumcrits,arglist,callsperbatch=20)
- print(res)
-}
-
-if(FALSE) { # Larger-scale tests
-
- cat("\nTest of swiftapply(sumcrits,arglist[1000],callsperbatch=1)\n")
-
- arglist = rep(list(args),1000)
- res = swiftapply(sumcrits,arglist,callsperbatch=2,site="pbs")
-
- print(res[[1]])
- print(res[[1000]])
-}
Deleted: SwiftApps/SwiftR/swiftapply.swift
===================================================================
--- SwiftApps/SwiftR/swiftapply.swift 2010-04-30 15:19:39 UTC (rev 3296)
+++ SwiftApps/SwiftR/swiftapply.swift 2010-04-30 16:36:30 UTC (rev 3297)
@@ -1,13 +0,0 @@
-type RFile;
-
-app (RFile result) RunR (RFile rcall)
-{
- RunR @rcall @result;
-}
-
-RFile rcalls[] ;
-RFile results[] ;
-
-foreach c, i in rcalls {
- results[i] = RunR(c);
-}