[Swift-devel] Re: More questions on Provenance
Tanu Malik
tmalik at purdue.edu
Tue Jul 28 11:12:07 CDT 2009
Thanks Ben,
This is very helpful. I wish I could hunt you down.
Interesting to know about the recent OPM work.
We have defined network nodes in our model to explicitly demonstrate those.
I did not know about OPM.
Thanks
Ben Clifford wrote:
> Hi Tanu. I'm long gone. But here are a few brief comments. I added
> swift-devel.
>
> On Mon, 27 Jul 2009, Tanu Malik wrote:
>
>
>> 1. How do you model the provenance for transfers across the network?
>> In that case the input is some file, the process is the file transfer process
>> and the
>> output would be on another machine. The output will have to be created
>> manually
>> which either mentions the success of the transfer or failure.
>>
>
> The level at which provenance is recorded is more abstract than the level
> at which file transfers exist. A procedure takes input files which
> are described by URLs relative to the submit-side run directory and
> produces output files described by the same.
>
> The internal mechanisms of moving those files around to runtime sites as
> needed and managing the cache of those happens internally to the procedure
> execution and is not exposed as explicit activity.
>
> Information is logged about such transfers though, so if desired it might be
> possible to make another level of description about what happened there
> (one of the interesting things with ongoing OPM work is how to describe
> the same activity at multiple levels like this).
>
>
>> 2. Also you mention something about the number of runs in your
>> presentation. "extra records ∝ depth of graph x number of runs". What
>> does the number of runs correspond to and how is that modeled in the DB?
>>
>
> This is about constructing an explicit transitive closure of the
> procedure/dataset graph.
>
> If you have an explicit graph A->B, B->C then constructing the closure
> means you need to add A->C as an edge. That's what I mean by roughly
> proportional to depth of graph - the deeper the graph, the more edges you
> need to add.
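
For reference, the A->B, B->C example above can be sketched in a few lines of Python. This is only an illustration of building an explicit transitive closure over an edge list; the function name and the edge representation are my own assumptions and are not taken from the Swift provenance code.

```python
from collections import defaultdict

def transitive_closure(edges):
    """Return every (src, dst) pair where dst is reachable from src.

    Hypothetical helper for illustration; edges are (src, dst) tuples.
    """
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)

    closure = set()
    for start in list(succ):
        # Depth-first walk from each node that has outgoing edges.
        stack = list(succ[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))
    return closure

explicit = {("A", "B"), ("B", "C")}
closed = transitive_closure(explicit)
extra = closed - explicit   # edges that the closure had to add
print(extra)                # {('A', 'C')}
```

Extending the chain to A->B->C->D adds three extra edges instead of one, so the number of added records grows as the graph gets deeper.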
>
> In the most recent implementation, each invocation of Swift is a subgraph
> disconnected from the subgraphs of all other invocations of Swift. So (if
> you make the often invalid but also often valid assumption that each
> invocation of Swift generates roughly the same size provenance output),
> size of the graph put together is roughly proportional to the number of
> runs.
>
> If further work was done to identify datasets from the graphs of different
> runs (using some identity relation such as same filename or something
> else), then generating a transitive closure would possibly generate graphs
> that are proportional to more-than-the-number-of-runs.
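
A rough sketch of that difference, under assumptions of my own: each run is a short chain of file -> procedure -> file edges, nodes are kept apart by tagging them with a run id, and "same filename" is used as the identity relation when merging. The file and procedure names are invented for the example.

```python
def transitive_closure(edges):
    """Naive fixpoint closure over (src, dst) tuples; fine for tiny graphs."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c} - reach
        if not new:
            return reach
        reach |= new

# Two hypothetical runs; the second consumes the file the first produced.
run1 = [("in.dat", "proc1"), ("proc1", "mid.dat")]
run2 = [("mid.dat", "proc2"), ("proc2", "out.dat")]

def tag(run_id, edges):
    """Keep each run's nodes distinct, as in disconnected per-run subgraphs."""
    return [((run_id, a), (run_id, b)) for a, b in edges]

# Disconnected subgraphs: closure size is the sum of the per-run closures.
separate = transitive_closure(tag(1, run1) + tag(2, run2))

# Datasets identified by filename: the runs join into one longer chain.
merged = transitive_closure(run1 + run2)

print(len(separate), len(merged))   # 6 10
```

With the runs kept separate, the closure holds 3 edges per run, so it scales with the number of runs. Once mid.dat is treated as the same dataset in both runs, the closure also has to record the edges that cross from one run into the other.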
>
>
>> I was also wondering if we can chat on the phone or I come up again to
>> discuss a possible collaboration on this project and present some of our
>> new results.
>>
>
> Nothing involving me except by very occasional email or if you hunt me
> down in person and ply me with alcohol.
>
> --