[Swift-devel] Re: oops provenance
Michael Wilde
wilde at mcs.anl.gov
Wed Apr 15 11:44:55 CDT 2009
On 4/15/09 10:58 AM, Ben Clifford wrote:
> If you want to look at the provenance db as it is now, read sections 2 and
> 3 of this page:
>
> http://www.ci.uchicago.edu/~benc/provenance.html#owndb
>
> I recommend if you try this at home to use sqlite3, not postgres.
>
>> I think a starting point for oops provenance is this: For every run, you want
>> to know:
>
> many of these are straightforward to add, and i will look at doing so
> after pc3 stuff
>
>> - an ID for the run
>
> this exists now
yes, "but". It's long and hard to manage. We have experience now with
both Falkon and Swift in giving runs simple short IDs, and that has
worked well. It's so much easier to talk about oops run 0042 than run
*imqvgr8. The long ID is also useful but should be more hidden and internal.
How we do this should tie in with where we go with swift run management
conventions.
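
(A minimal sketch of the short-ID idea, not an existing Swift or Falkon feature: it assumes a small sqlite3 table that hands out sequential run numbers and keeps the long ID alongside for internal use. The table name `run_alias`, the function names, and the long IDs in the usage lines are all hypothetical.)

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open the alias database and create the mapping table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_alias ("
        " short_id INTEGER PRIMARY KEY,"
        " long_id TEXT UNIQUE NOT NULL)"
    )
    return conn


def register_run(conn, long_id):
    """Return the 4-digit short label for long_id, allocating one if new."""
    row = conn.execute(
        "SELECT short_id FROM run_alias WHERE long_id = ?", (long_id,)
    ).fetchone()
    if row is None:
        # INTEGER PRIMARY KEY lets SQLite pick the next free integer.
        cur = conn.execute(
            "INSERT INTO run_alias (long_id) VALUES (?)", (long_id,)
        )
        conn.commit()
        short_id = cur.lastrowid
    else:
        short_id = row[0]
    return f"{short_id:04d}"


def lookup_long_id(conn, short_label):
    """Map a short label such as '0042' back to the long internal ID."""
    row = conn.execute(
        "SELECT long_id FROM run_alias WHERE short_id = ?", (int(short_label),)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_registry()
    # Hypothetical long IDs, for illustration only.
    print(register_run(conn, "oops-run-long-id-A"))
    print(register_run(conn, "oops-run-long-id-B"))
    print(lookup_long_id(conn, "0002"))
```

Registering the same long ID twice returns the same short label, so the short number stays stable for conversation while the long ID remains the internal key.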
>
>> - analyzed scores of the run output
>
> not sure what that is - is this application specific output?
yes
>
>> - what version of oops was used
>
> the extrainfo stuff I implemented previously for the oops app may be used
> here. I've heard no feedback about it actually being used, though.
right, that is the solution. we need to test it.
>
>> - what version of the oops.swift script was used
>
> For all the version stuff, you need to figure out what version semantics
> you want (eg md5sum of swift script, which gives fine grained version
> distinction but no order; user specified version numbering which is pretty
> much guaranteed to be wrong but you might think you want that, and also
> gives ordering; ... there are lots of schemes ...)
hmmm - all those sound good - can we have them all? ;)
seriously, though - a few thoughts on this:
- i lean toward close integration with svn on versions, i.e. use svn to
version code, including swift scripts, and use svn revision IDs as well
as software release numbers to define versions of code. E.g., oops rev
0428 or oops release 1.2.4, depending on what you were running.
- i can now see the merits and use of the old vdl constructs
namespace::name:version, and would like to explore how to use and
integrate that into Swift.
- I think the md5sum etc. stuff is useful, and also good for research into
"airtight" provenance, but less immediately needed by users. And when
added, it seems like the kind of thing that's nice to have always running
in the background, to resolve thorny provenance questions, but should
seldom be visible to the end user.
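
(A rough sketch of how the version schemes discussed above differ, under stated assumptions: a content digest distinguishes any two scripts but carries no order, while a dotted release number can be ordered. All function names are hypothetical, the svn revision is simply passed in as a value rather than queried from svn, and `parse_qualified_name` only follows the `namespace::name:version` form as quoted in this message, not any verified VDL grammar.)

```python
import hashlib


def script_digest(script_text):
    """md5 of the script text: fine-grained identity, but no ordering."""
    return hashlib.md5(script_text.encode("utf-8")).hexdigest()


def release_key(release):
    """Turn '1.2.4' into (1, 2, 4) so releases compare numerically."""
    return tuple(int(part) for part in release.split("."))


def parse_qualified_name(text):
    """Split 'namespace::name:version' into its three parts."""
    namespace, rest = text.split("::", 1)
    name, version = rest.rsplit(":", 1)
    return namespace, name, version


def version_record(script_text, svn_revision=None, release=None):
    """Bundle all schemes together so none has to be chosen up front."""
    return {
        "md5": script_digest(script_text),  # recorded in the background
        "svn_revision": svn_revision,       # e.g. supplied by the caller
        "release": release,                 # user-facing release number
    }


if __name__ == "__main__":
    rec = version_record("// oops.swift contents here", release="1.2.4")
    print(rec)
    # Numeric ordering, not string ordering: 1.2.10 is newer than 1.2.9.
    print(release_key("1.2.10") > release_key("1.2.9"))
```

Storing the digest alongside the revision and release number keeps the "airtight" identifier available for later questions without putting it in front of the user.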
>> Given this in a database, you could also compare structure scores for one
>> version of code or one algorithm vs another
>
> This is more application specific data?
I think it's a join of app-specific and swift-maintained. E.g., in the
current oops.swift script, the user can specify via cmd line arg which
of 2 oops algorithms to use ("classic" or "rama"). So I could easily see
a parameter sweep that says: for each protein in plist, do the full
sweep for both algorithms, and give me tables, plots etc. that let me
compare them. So far, that is more "application" than provenance. But
now, do the same thing but compare rama 1.2.4 with rama 1.2.6. Depending
on how that's expressed, it could utilize provenance info. Especially if
the question was asked "retrospectively" on the provenance data, as
opposed to set up in advance as a comparative workflow. I.e., look at the
runtime-per-simulation of each of the last 3 rama versions.
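
(A sketch of what such a retrospective query might look like, assuming the provenance data lived in a sqlite3 table. The `runs` table, its columns, and the sample rows are invented for illustration and do not reflect the actual provenance db schema.)

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with made-up sample runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (algorithm TEXT, version TEXT,"
        " runtime_seconds REAL, n_simulations INTEGER)"
    )
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        [
            ("rama", "1.2.2", 100, 10),
            ("rama", "1.2.4", 200, 10),
            ("rama", "1.2.4", 100, 5),
            ("rama", "1.2.6", 90, 10),
            ("rama", "1.2.10", 80, 10),
            ("classic", "1.0.0", 500, 10),
        ],
    )
    return conn


def runtime_per_simulation(conn, algorithm, last_n=3):
    """Seconds per simulation for the last_n versions of one algorithm."""
    rows = conn.execute(
        "SELECT version, SUM(runtime_seconds) * 1.0 / SUM(n_simulations)"
        " FROM runs WHERE algorithm = ? GROUP BY version",
        (algorithm,),
    ).fetchall()
    # Order versions numerically in Python; SQL text ordering would
    # place '1.2.10' before '1.2.4'.
    rows.sort(key=lambda r: tuple(int(p) for p in r[0].split(".")))
    return rows[-last_n:]


if __name__ == "__main__":
    for version, secs in runtime_per_simulation(make_demo_db(), "rama"):
        print(version, secs)
```

The point of the sketch is that nothing about the comparison had to be planned when the runs were made; it only needs the algorithm, version, and timing to have been recorded per run.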
> Are you looking here to have data output from a run end up in a database?
Yes, that's being considered, as an application thing, in addition to and
separate from the provenance data.
I will send the OOPS paper to swift and try to get it posted on the swift
web soon. It's got some nice stats in sec 5 on #runs that would be great
to derive on a running basis from collected provenance data.
- Mike