[Swift-commit] r3271 - text/swift_pc3_fgcs

noreply at svn.ci.uchicago.edu
Mon Apr 5 13:52:06 CDT 2010


Author: lgadelha
Date: 2010-04-05 13:52:06 -0500 (Mon, 05 Apr 2010)
New Revision: 3271

Modified:
   text/swift_pc3_fgcs/swift_pc3_fgcs.tex
Log:


Modified: text/swift_pc3_fgcs/swift_pc3_fgcs.tex
===================================================================
--- text/swift_pc3_fgcs/swift_pc3_fgcs.tex	2010-04-05 17:55:07 UTC (rev 3270)
+++ text/swift_pc3_fgcs/swift_pc3_fgcs.tex	2010-04-05 18:52:06 UTC (rev 3271)
@@ -254,19 +254,13 @@
 Except for {\em wasControlledBy}, the dependency relationships defined in OPM can be derived from the {\tt dataset\_usage} database relation. The {\em used} and {\em wasGeneratedBy} relationships are explicitly stored in the relation. For instance, if the tuple $\langle P_{id}, D_{id}, \text{I}, R \rangle$ is in the {\tt dataset\_usage} relation, this is equivalent to saying $D_{id} \xleftarrow{\text{used(R)}} P_{id}$ in OPM. If the attribute {\tt direction} had the value `O' instead of `I', the tuple would be equivalent to 
 $P_{id} \xleftarrow{\text{wasGeneratedBy(R)}} D_{id}$ in OPM.
 
+One of the main concerns with using a relational model for representing provenance is the need to query over the transitive relation expressed in the {\tt dataset\_usage} table. For example, after executing the SwiftScript code in Listing~\ref{transit}, it might be desirable to find all dataset handles that lead to {\tt c}: that is, {\tt a} and {\tt b}. However, simple SQL queries over the {\tt dataset\_usage} relation can only go back one step, leading to the answer {\tt b} but not to the answer {\tt a}. To address this problem, we generate a transitive closure table with an incremental evaluation system \cite{SQLTRANS}. This approach makes it straightforward to query over transitive relations using natural SQL syntax, at the expense of larger database size and longer import time.
 
-
-One of the main concerns with using a relational model for representing provenance is the need for querying over the transitive relation expressed in the {\tt dataset\_usage} table. For example, after executing the fragment:
-
-
-\begin{lstlisting}[float,caption=A floating example,frame=lines]
+\begin{lstlisting}[float,caption=Transitivity of provenance relationships.,frame=lines,label=transit]
 b = p(a);
 c = q(b);
 \end{lstlisting}
 
-
-it might be desirable to find all dataset handles that lead to {\tt c}: that is, {\tt a} and {\tt b}. However simple SQL queries over the {\tt dataset\_usage} relation can only go back one step, leading to the answer {\tt b} but not to the answer {\tt a}. To address this problem, we generate a transitive closure table by an incremental evaluation system \cite{SQLTRANS}. This approach makes it straightforward to query over transitive relations using natural SQL syntax, at the expense of larger database size and longer import time.
-
 \section{Third Provenance Challenge Queries}
 
 The workflow selected for PC3 receives a set of CSV files containing astronomical data, stores the contents of these files in a relational database, and performs a series of validation steps on the database. This workflow makes extensive use of conditional and loop flow controls and database operations. A Java implementation of the component applications of the workflow was provided on the Provenance Challenge Wiki \cite{pc}, where our Swift implementation is also available. Swift has a catalog of applications (local or remote) in which the wrapper scripts that call these component applications are listed. Most of the inputs and outputs of the component applications are XML files, so we defined a mapped variable type called {\tt xmlfile} for handling these files. Component applications are declared in the SwiftScript program, such as:




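Note on the transitive closure paragraph in the hunk above: the sketch below illustrates why a closure table makes the "which datasets lead to c" query a single lookup. It is only an illustration under stated assumptions, not the incremental evaluation system cited as \cite{SQLTRANS}: it rebuilds the closure by naive repeated joins using Python's sqlite3 module. The column names process_id and dataset_id, the table name dataset_closure, and the helper names build_closure and ancestors are hypothetical; only the dataset_usage relation and its direction attribute ('I' or 'O') come from the text.

```python
import sqlite3


def build_closure(usage_rows):
    """Load (process_id, dataset_id, direction) rows into an in-memory
    database and fill a dataset_closure(ancestor, descendant) table.

    Assumption: column and table names other than dataset_usage and
    direction are hypothetical, chosen for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset_usage "
        "(process_id TEXT, dataset_id TEXT, direction TEXT)"
    )
    conn.executemany("INSERT INTO dataset_usage VALUES (?, ?, ?)", usage_rows)
    conn.execute(
        "CREATE TABLE dataset_closure "
        "(ancestor TEXT, descendant TEXT, PRIMARY KEY (ancestor, descendant))"
    )

    # One-step edges: an input of a process leads to each output of it.
    conn.execute(
        "INSERT OR IGNORE INTO dataset_closure "
        "SELECT i.dataset_id, o.dataset_id "
        "FROM dataset_usage i JOIN dataset_usage o "
        "ON i.process_id = o.process_id "
        "WHERE i.direction = 'I' AND o.direction = 'O'"
    )

    def closure_size():
        return conn.execute("SELECT COUNT(*) FROM dataset_closure").fetchone()[0]

    # Naive fixpoint: keep joining the closure with itself until no new
    # (ancestor, descendant) pair appears.
    while True:
        before = closure_size()
        conn.execute(
            "INSERT OR IGNORE INTO dataset_closure "
            "SELECT c1.ancestor, c2.descendant "
            "FROM dataset_closure c1 JOIN dataset_closure c2 "
            "ON c1.descendant = c2.ancestor"
        )
        if closure_size() == before:
            break
    return conn


def ancestors(conn, dataset):
    """Return every dataset that leads to `dataset`, in sorted order."""
    rows = conn.execute(
        "SELECT ancestor FROM dataset_closure "
        "WHERE descendant = ? ORDER BY ancestor",
        (dataset,),
    )
    return [row[0] for row in rows]


# The two-step example from the listing: b = p(a); c = q(b);
example_rows = [
    ("p", "a", "I"), ("p", "b", "O"),
    ("q", "b", "I"), ("q", "c", "O"),
]
example_conn = build_closure(example_rows)
print(ancestors(example_conn, "c"))  # prints ['a', 'b']
```

A one-step self-join on dataset_usage alone would return only b for c; the closure table trades extra rows and import time for a single, non-recursive SELECT, which is the trade-off the paragraph describes.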