[Swift-commit] r5953 - provenancedb

lgadelha at ci.uchicago.edu
Mon Oct 1 14:38:47 CDT 2012


Author: lgadelha
Date: 2012-10-01 14:38:47 -0500 (Mon, 01 Oct 2012)
New Revision: 5953

Added:
   provenancedb/provdb-uml.dia
Modified:
   provenancedb/README.asciidoc
Log:


Modified: provenancedb/README.asciidoc
===================================================================
--- provenancedb/README.asciidoc	2012-09-27 20:04:54 UTC (rev 5952)
+++ provenancedb/README.asciidoc	2012-10-01 19:38:47 UTC (rev 5953)
@@ -6,9 +6,9 @@
 
 . A set of scripts for extracting provenance information from Swift's log files. The extracted data is imported into a relational database, currently PotgreSQL, where it can queried.
 
-. A query interface for provenance with a built-in query language called SPQL (Swift Provenance Query Language). SPQL is similar to SQL except for not having +FROM+-clauses and join expressions on the +WHERE+-clause, which are automatically computed for the user. A number of functions and stored procedures that abstract common provenance query patterns are available both in SPQL and SQL.
+. A query interface for provenance with a built-in query language called SPQL (Swift Provenance Query Language). SPQL is similar to SQL except for not having +FROM+-clauses and join expressions on the +WHERE+-clause, which are automatically computed for the user. A number of functions and stored procedures that abstract common provenance query patterns are available in both SPQL and SQL.
 	
-It addresses the characteristics of many-task computing, where concurrent component tasks are submitted to parallel and distributed computational resources. Such resources are subject to failures, and are usually under high demand for executing tasks and transferring data. Science-level performance information, which describes the behavior of an experiment from the  point of view of the scientific domain, is critical for the management of such experiments (for instance, by determining how accurate the outcome of a scientific simulation was, and whether accuracy varies between execution environments). Recording the resource-level performance of such workloads can also assist scientists in managing the life cycle of their computational experiments. Features :
+The tools for managing provenance information in Swift have the following features:
 
 - Gathering of producer-consumer relationships between data sets and processes. 
 
@@ -22,8 +22,11 @@
 
 - Provides a usable and useful query interface for provenance information. 
 
-A UML diagram of this provenance model is presented in Figure . We simplify the UML notation to abbreviate the information that each annotated entity set (script run, function call, and variable) has one annotation entity set per data type. We define entities that correspond to the Open Provenance Model (OPM) notions of artifact, process, and artifact usage (either being consumed or produced by a process). These are augmented with entities used to represent many-task scientific computations, and to allow for entity annotations. Such annotations, which can be added post-execution, represent information about provenance entities  such as object version tags and scientific parameters. 
+A UML diagram of this provenance model is presented in figure <<provdb_schema>>. We simplify the UML notation to abbreviate the information that each annotated entity set (script run, function call, and variable) has one annotation entity set per data type. We define entities that correspond to the Open Provenance Model (OPM) notions of artifact, process, and artifact usage (either being consumed or produced by a process). These are augmented with entities used to represent many-task scientific computations, and to allow for entity annotations. Such annotations, which can be added post-execution, represent information about provenance entities  such as object version tags and scientific parameters. 
 
+[[provdb_schema]]
+image::provdb-uml.svg["Swift provenance database schema",width=1000]
+
 +script_run+: refers to the execution (successful or unsuccessful) of an entire many-task scientific computation, which in Swift is specified as the execution of a complete parallel script from start to finish. 
 
 +function_call+: records calls to Swift functions. These calls take as input data sets, such as values stored in primitive variables or files referenced by mapped variables; perform some computation specified in the respective function declaration; and produce data sets as output. In Swift, function calls can represent invocations of external applications, built-in functions, and operators; each function call is associated with the script run that invoked it.

Added: provenancedb/provdb-uml.dia
===================================================================
(Binary files differ)


Property changes on: provenancedb/provdb-uml.dia
___________________________________________________________________
Added: svn:mime-type
   + application/octet-stream



