
Provenance Challenge



Notes on PC4 Scoping Workshop


We had roughly 25 attendees.

Discussion about challenge options

We began by discussing the Challenge Proposals and how broadly to organize the challenge. Two options were discussed.

1) Chop up a single process and interoperate provenance information across systems.

2) Each team generates its own provenance information from its chosen domain, following a general problem pattern. Other teams must then read another team's provenance and answer a set of general provenance questions.

Comments during this discussion about these options

Types/Patterns of Processes to Consider

There was a general agreement that all the scenarios proposed had several types of process (or "Process Patterns") that were viewed as important for PC4 to address. The group listed these patterns as follows:

Teams identified which patterns they would be interested in supporting or implementing. The following is a tentative list of the teams that said they would be interested.

Interested Teams

For each pattern, we came up with example provenance queries that illustrated the need for this sort of process.

Provenance Queries

Scenario Selection

There was a strong debate about whether we should adopt a single scenario (e.g. the crystallography scenario), as in past challenges, or allow multiple different scenarios. The two key issues were:
  1. If one scenario were used, some teams would not have the bandwidth to participate.
  2. If multiple scenarios were allowed, interoperation across many teams would most likely suffer. Teams would probably choose to work only with scenarios where interoperation was easy or where teams already worked closely together.

The resolution of this problem was the following compromise:

Organization of the Challenge

Organization of Teams


  1. Teams generate OPM, in either XML or OWL, for the part of the process they are responsible for. These graphs are uploaded to the wiki. A convention for grabbing a bunch of OPM graphs is still to be decided (tar ball or wget?)
  2. Teams load all OPM into their provenance systems and perform queries (for one run, and then see what happens)
  3. Decide if/how we go about implementing distributed provenance query across systems
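
The first two steps (each team publishes a graph for its part of the process, then everyone loads all graphs and queries them) can be illustrated with a minimal sketch. It assumes a simplified edge-list representation with only "used" and "wasGeneratedBy" edges, not the actual OPM XML or OWL serializations, and every identifier in it (team names, artifact and process ids, function names) is hypothetical.

```python
from collections import defaultdict

# Hypothetical per-team graph fragments as (edge_type, source, target) triples.
# "used" points process -> artifact; "wasGeneratedBy" points artifact -> process.
team_a = [
    ("used", "p:collect", "a:raw_image"),
    ("wasGeneratedBy", "a:reduced_data", "p:collect"),
]
team_b = [
    ("used", "p:solve", "a:reduced_data"),
    ("wasGeneratedBy", "a:structure", "p:solve"),
]


def merge_graphs(*graphs):
    """Union the edges of several graphs; shared identifiers join them."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def ancestors(edges, node):
    """Return every node reachable from `node` by following the edges."""
    successors = defaultdict(set)
    for _, src, dst in edges:
        successors[src].add(dst)
    seen, stack = set(), [node]
    while stack:
        for nxt in successors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


merged = merge_graphs(team_a, team_b)
lineage = ancestors(merged, "a:structure")
print(sorted(lineage))
```

In this sketch the lineage of `a:structure` only reaches team A's part of the process because both teams use the same identifier, `a:reduced_data`, for the artifact handed between them. Queried against team B's graph alone, the lineage stops at that artifact.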


  1. Abstract Scenario
  2. Identify all the data flowing in the system with respect to the crystallography scenario (this can be mocked up); where possible, we have example data (August 30)
  3. For each pattern of the process, produce a mock-up of the OPM graph with respect to the data in step 2 and make sure the graphs stitch together (Nov 30)
  4. Finalize queries with respect to scenario (Dec 15)
  5. Import and implement queries over the mock-up (Feb 28)
  6. Generate and publish provenance for each pattern (Feb 28)
  7. Import and implement queries over the generated provenance (Mar 30)
  8. Decide whether to do API compatibility
  9. Prepare slides for challenge (Jun 1 - Jun 8)
  10. PC4 Workshop (June 10)
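
Step 3 asks that the mock-up graphs stitch together. One possible check is sketched below, assuming a simplified representation of each graph as (edge type, source, target) triples rather than the OPM XML or OWL serializations. The function name, team names, and identifiers are all hypothetical. The check reports artifacts that some graph uses but that no graph generates and that are not declared as external inputs.

```python
def unstitched_inputs(graphs, external_inputs=frozenset()):
    """Return artifacts that some graph uses but no graph generates.

    `graphs` maps a team name to (edge_type, source, target) triples, where
    "used" points process -> artifact and "wasGeneratedBy" points
    artifact -> process. Artifacts listed in `external_inputs` are expected
    to have no generating process and are not reported.
    """
    used, generated = set(), set()
    for edges in graphs.values():
        for kind, src, dst in edges:
            if kind == "used":
                used.add(dst)
            elif kind == "wasGeneratedBy":
                generated.add(src)
    return used - generated - set(external_inputs)


# Hypothetical mock-ups from two teams covering adjacent parts of the process.
mockups = {
    "team_a": [
        ("used", "p:collect", "a:crystal"),
        ("wasGeneratedBy", "a:raw_image", "p:collect"),
    ],
    "team_b": [
        ("used", "p:reduce", "a:raw_images"),  # identifier differs from team_a's
        ("wasGeneratedBy", "a:reduced_data", "p:reduce"),
    ],
}

gaps = unstitched_inputs(mockups, external_inputs={"a:crystal"})
print(gaps)
```

Here the check flags `a:raw_images`, because team B refers to the hand-off artifact under a different identifier than the one team A generates. Agreeing on shared identifiers for the data identified in step 2 is what would make the graphs join.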

It was suggested that we try to co-locate with SIGMOD on June 12.

-- PaulGroth - 07 Jul 2010
