
Provenance Challenge



The Open Provenance Model (v1.01)


  1. Introduction (Below)
  2. Basics
  3. Overlapping and Hierarchical Descriptions
  4. Provenance Graph Definition
  5. Timeless Formal Model
  6. Inferences
  7. Formal Model and Time Annotations
  8. Time Constraints and Inferences
  9. Support for Collections
  10. Example of Representation
  11. Conclusion
  12. Best Practice on the Use of Agents
  13. References

Notes on the Wiki Version

This is a wiki version of the Open Provenance Model version 1.01. This is based on the authoritative pdf that can be found at . It is designed to help track comments and suggestions for the next revision of the OPM.

Each section has its own comment area; please leave your comments there. Add your signature as well, so that we know the provenance of the comments. If you do not want to modify the wiki itself, there is a comment box that you can use.

If you really need to leave comments within the text itself, please use another color and try to make your comment stand out from the rest of the text.


Luc Moreau (Editor) (U. of Southampton)
Beth Plale (Indiana U.)
Simon Miles (King’s College)
Carole Goble, Paolo Missier (Manchester U.)
Roger Barga, Yogesh Simmhan (Microsoft)
Joe Futrelle, Robert E. McGrath, Jim Myers (NCSA)
Patrick Paulson (PNNL)
Shawn Bowers, Bertram Ludaescher (U. Davis)
Natalia Kwasnikowska, Jan Van den Bussche (U. Hasselt)
Tommy Ellkvist, Juliana Freire (U. Utah)
Paul Groth (USC)


In this paper, we introduce the Open Provenance Model, a model for provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define the model in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To define a core set of rules that identify the valid inferences that can be made on provenance graphs.

1 Introduction

Provenance is well understood in the context of art or digital libraries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle [5]. Interest in provenance is also growing in the "e-science community" [13], since provenance is perceived as a crucial component of workflow systems [2] that can help scientists ensure reproducibility of their scientific analyses and processes.

Against this background, the International Provenance and Annotation Workshop (IPAW'06), held on May 3-5, 2006 in Chicago, involved some 50 participants interested in the issues of data provenance, process documentation, data derivation, and data annotation [8, 1]. During a session on provenance standardization, a consensus began to emerge, whereby the provenance research community needed to understand better the capabilities of the different systems, the representations they used for provenance, their similarities, their differences, and the rationale that motivated their designs.

Hence, the first Provenance Challenge was born. From the outset, it was set up to be informative rather than competitive: a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. Participants simulated or ran a Functional Magnetic Resonance Imaging workflow, over which they implemented and executed a pre-identified set of "provenance queries". Sixteen teams responded to the challenge and reported their experience in a journal special issue [10].

The first Provenance Challenge was followed by the second Provenance Challenge, which aimed to establish interoperability between systems through the exchange of provenance information. Thirteen teams [12] responded to this second challenge. Discussions indicated that there was substantial agreement on a core representation of provenance. As a result, following a workshop in August 2007 in Salt Lake City, a data model was crafted and released as the Open Provenance Model (v1.00) [9].

The starting point of this work is the community agreement summarized by Miles [7]. We assume that provenance of objects (whether digital or not) is represented by an annotated causality graph, which is a directed acyclic graph, enriched with annotations capturing further information pertaining to execution. For the purpose of this paper, a provenance graph is defined to be a record of a past execution (or current execution), and not a description of something that could happen in the future.
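As an illustration of the assumption above, the following is a minimal sketch of an annotated causality graph with an acyclicity check. It is not part of the OPM specification: the class name `ProvenanceGraph`, its methods, the node names, the annotation keys, and the choice to point edges from an effect to its cause are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical annotated causality graph (illustrative, not the OPM API)."""

    # node id -> free-form annotations about that node
    nodes: dict = field(default_factory=dict)
    # (effect, cause) -> free-form annotations about that dependency
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, **annotations):
        self.nodes[node_id] = dict(annotations)

    def add_edge(self, effect, cause, **annotations):
        # Assumption for this sketch: an edge points from an effect to its cause.
        if effect not in self.nodes or cause not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(effect, cause)] = dict(annotations)

    def is_acyclic(self):
        # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
        # The graph is acyclic exactly when every node can be removed.
        indegree = {node: 0 for node in self.nodes}
        for _, cause in self.edges:
            indegree[cause] += 1
        ready = [node for node, degree in indegree.items() if degree == 0]
        removed = 0
        while ready:
            node = ready.pop()
            removed += 1
            for effect, cause in self.edges:
                if effect == node:
                    indegree[cause] -= 1
                    if indegree[cause] == 0:
                        ready.append(cause)
        return removed == len(self.nodes)


# A record of a past execution: a step consumed one item and produced another.
graph = ProvenanceGraph()
graph.add_node("raw_scan", kind="data")
graph.add_node("align", kind="step", started="2008-06-19T10:00")
graph.add_node("aligned_scan", kind="data")
graph.add_edge("aligned_scan", "align", note="produced by")
graph.add_edge("align", "raw_scan", note="consumed")
```

In this sketch the annotations are plain dictionaries attached to nodes and edges, which keeps the causal structure separate from the execution details that enrich it. Calling `graph.is_acyclic()` is one way a tool could reject a record that claims an item caused itself.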

The Open Provenance Model (OPM) is a model for provenance that is designed to meet the following requirements:

While specifying this model, we also have some non-requirements:

On June 19th, 2008, twenty participants attended the first OPM workshop [3] to discuss version 1.00 of the specification. Minutes of the workshop and recommendations [4] were published, and led to the current version (v1.01) of the Open Provenance Model.



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.