The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define provenance in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any 'thing', whether produced by computer systems or not. (5) To allow multiple levels of description to coexist. (6) To define a core set of rules that identify the valid inferences that can be made on provenance representation.
This document presents an XML Schema for the Open Provenance Model (v1.1) [OPM V1.1].
The purpose of this document is to define an XML Schema that captures the concepts of the Open Provenance Model [OPM V1.1]. Valid inferences are not captured by this specification; for those, we refer the reader to the OPMO ontology [OPMO].
A design goal of this XML Schema is that the XML serialization should be convertible into RDF (as per the OPMO ontology [OPMO]) and, conversely, that the RDF representation should be convertible into XML. The OWL ontology and the XML schemas were co-evolved to ensure this convertibility.
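To make "convertible in both directions" concrete, here is a minimal sketch of a lossless round trip for a single edge: a simplified XML element is read into a record, mapped to subject-predicate-object triples, and recovered unchanged. The namespace `http://example.org/opm-sketch#`, the predicate names, and the simplified XML shape are all hypothetical; they are neither the OPMO vocabulary nor the OPMX element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for this sketch; it is NOT the OPMO vocabulary.
EX = "http://example.org/opm-sketch#"

Triple = tuple[str, str, str]


def xml_to_edge(xml_text: str) -> dict[str, str]:
    """Read one simplified edge element, e.g. <used id="u1">...</used>."""
    elem = ET.fromstring(xml_text)
    return {
        "id": elem.get("id", ""),
        "kind": elem.tag,
        "effect": elem.find("effect").get("ref", ""),
        "cause": elem.find("cause").get("ref", ""),
    }


def edge_to_triples(edge: dict[str, str]) -> set[Triple]:
    """Map an edge record to three triples sharing the edge as subject."""
    subject = EX + edge["id"]
    return {
        (subject, EX + "type", EX + edge["kind"]),
        (subject, EX + "effect", EX + edge["effect"]),
        (subject, EX + "cause", EX + edge["cause"]),
    }


def triples_to_edge(triples: set[Triple]) -> dict[str, str]:
    """Inverse of edge_to_triples for the triples of a single edge."""
    record: dict[str, str] = {}
    for subject, predicate, obj in triples:
        record["id"] = subject.removeprefix(EX)
        key = predicate.removeprefix(EX)
        # The "type" predicate carries the edge kind.
        record["kind" if key == "type" else key] = obj.removeprefix(EX)
    return record
```

Since `triples_to_edge(edge_to_triples(e)) == e` for any such record, no information is lost in either direction, which is the property the co-evolution of the ontology and the schemas is meant to preserve at full scale.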
We adopt the following XML prefixes and namespaces:
OPM defines a notion of graph. There are three kinds of nodes: Artifacts, Agents, and Processes. (Note that the schema does not define the type Node.)
Five kinds of edges are supported: Used, WasGeneratedBy (WGB), WasDerivedFrom (WDF), WasControlledBy (WCB) and WasTriggeredBy (WTB). (Note that the schema does not define the type Edge.)
Edges have a specific source (the effect) and a specific destination (the cause). Used has a Process as its effect and an Artifact as its cause; WasGeneratedBy (WGB) has an Artifact as its effect and a Process as its cause; WasDerivedFrom (WDF) has Artifacts as both cause and effect; WasControlledBy (WCB) has a Process as its effect and an Agent as its cause; WasTriggeredBy (WTB) has Processes as both cause and effect. Some edges also carry Role and Time information.
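The effect/cause constraints can be expressed as a lookup table plus a check. The sketch below is illustrative only: the `Node`, `Edge`, and `check_edge` names are hypothetical and are not defined by the schema (which, as noted, has no Node or Edge type). For Used, the Process is the effect and the Artifact the cause, since the edge records that a process used an artifact.

```python
from dataclasses import dataclass

# The three node kinds of OPM (illustrative; the schema has no Node type).
NODE_KINDS = {"Artifact", "Process", "Agent"}

# Allowed (effect kind, cause kind) for each of the five edge kinds.
EDGE_SIGNATURES = {
    "Used": ("Process", "Artifact"),
    "WasGeneratedBy": ("Artifact", "Process"),
    "WasDerivedFrom": ("Artifact", "Artifact"),
    "WasControlledBy": ("Process", "Agent"),
    "WasTriggeredBy": ("Process", "Process"),
}


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # one of NODE_KINDS

    def __post_init__(self) -> None:
        if self.kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {self.kind}")


@dataclass(frozen=True)
class Edge:
    kind: str     # one of EDGE_SIGNATURES
    effect: Node  # source of the edge
    cause: Node   # destination of the edge


def check_edge(edge: Edge) -> None:
    """Raise ValueError if the edge connects the wrong kinds of nodes."""
    if edge.kind not in EDGE_SIGNATURES:
        raise ValueError(f"unknown edge kind: {edge.kind}")
    effect_kind, cause_kind = EDGE_SIGNATURES[edge.kind]
    if edge.effect.kind != effect_kind or edge.cause.kind != cause_kind:
        raise ValueError(
            f"{edge.kind} expects {effect_kind} -> {cause_kind}, "
            f"got {edge.effect.kind} -> {edge.cause.kind}"
        )
```

For example, `check_edge(Edge("Used", effect=Node("p1", "Process"), cause=Node("a1", "Artifact")))` passes, while swapping effect and cause raises `ValueError`.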
Nodes, edges, and annotations can belong to Accounts. A graph enumerates the nodes, edges, annotations and accounts it contains.
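As a minimal sketch of how accounts group the contents of a graph, the hypothetical `Graph` and `Element` classes below tag each node, edge, and annotation with the set of accounts it belongs to, and `view()` returns the identifiers belonging to one account. The class names, and the choice to list only elements explicitly tagged with that account, are assumptions for illustration rather than part of OPMX.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    element_id: str
    accounts: frozenset[str] = frozenset()  # accounts this element belongs to


@dataclass
class Graph:
    # The graph enumerates its accounts, nodes, edges, and annotations.
    accounts: set[str] = field(default_factory=set)
    nodes: list[Element] = field(default_factory=list)
    edges: list[Element] = field(default_factory=list)
    annotations: list[Element] = field(default_factory=list)

    def view(self, account: str) -> dict[str, list[str]]:
        """Return the ids of the nodes, edges, and annotations in one account."""
        if account not in self.accounts:
            raise KeyError(account)

        def pick(items: list[Element]) -> list[str]:
            return [e.element_id for e in items if account in e.accounts]

        return {
            "nodes": pick(self.nodes),
            "edges": pick(self.edges),
            "annotations": pick(self.annotations),
        }
```

Because an element may list several accounts, the same node can appear in more than one view, which is how multiple levels of description can coexist over shared nodes.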
Annotable entities can be associated with Annotations. (Note that the schema does not define the type Annotable.)
The OPMX XML schema uses xsd:ID attributes to identify nodes, edges, and accounts in an OPM graph, and xsd:IDREF attributes to refer to them.
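A validating parser enforces the ID/IDREF rules automatically; ElementTree in the Python standard library does not validate, so the sketch below checks them by hand: every `id` must be unique and every `ref` must name a declared `id`. The namespace-free `<graph>`, `<artifact>`, `<used>`, `<effect>`, and `<cause>` elements are a hypothetical simplification, not the actual OPMX element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free document loosely modelled on OPMX:
# declarations carry an "id" (xsd:ID), references carry a "ref" (xsd:IDREF).
SAMPLE = """
<graph>
  <artifact id="a1"/>
  <process id="p1"/>
  <used>
    <effect ref="p1"/>
    <cause ref="a1"/>
  </used>
</graph>
"""


def check_references(xml_text: str) -> dict[str, str]:
    """Map each declared id to its element tag; raise on duplicate or dangling ids."""
    root = ET.fromstring(xml_text.strip())
    declared: dict[str, str] = {}
    for elem in root.iter():
        ident = elem.get("id")
        if ident is not None:
            if ident in declared:
                # xsd:ID values must be unique within a document.
                raise ValueError(f"duplicate id: {ident}")
            declared[ident] = elem.tag
    for elem in root.iter():
        ref = elem.get("ref")
        if ref is not None and ref not in declared:
            # Every xsd:IDREF must match some xsd:ID in the document.
            raise ValueError(f"dangling ref: {ref}")
    return declared
```

Here `check_references(SAMPLE)` returns `{"a1": "artifact", "p1": "process"}`, so the edge's `ref` values can be resolved to the nodes they point at.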
3.1. Example
Here is a simple OPM graph, inspired by the First Provenance Challenge workflow. In the OPM graphical notation, it is depicted as follows:
Two representations of this OPM graph have been produced. The first maps the graph to RDF, according to the OPMO ontology, and is written in N3 notation. The second is an XML serialization compatible with the OPMX schema.
Observed time allows for an interval of observation, in which an event is said to occur no earlier than a given time t1 and no later than a given time t2. When the event is observed to occur at a specific time, an interval is inconvenient; instead, one can use the alternative exactlyAt attribute. Note that exactlyAt is disjoint from noEarlierThan and noLaterThan: it cannot be used together with either of them.
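The two ways of stating an observed time can be sketched as a small value type. The hypothetical `ObservedTime` class below rejects `exactlyAt` combined with either interval bound, and `bounds()` collapses an exact time to a zero-width interval. Two behaviours are assumptions not stated above: an interval may have only one bound, and `noEarlierThan` must not be later than `noLaterThan`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ObservedTime:
    no_earlier_than: Optional[datetime] = None  # t1
    no_later_than: Optional[datetime] = None    # t2
    exactly_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        interval_used = self.no_earlier_than is not None or self.no_later_than is not None
        if self.exactly_at is not None and interval_used:
            raise ValueError("exactlyAt cannot be combined with noEarlierThan/noLaterThan")
        # Assumed sanity check: t1 must not come after t2.
        if (self.no_earlier_than is not None and self.no_later_than is not None
                and self.no_earlier_than > self.no_later_than):
            raise ValueError("noEarlierThan must not be after noLaterThan")

    def bounds(self) -> tuple[Optional[datetime], Optional[datetime]]:
        """Return (earliest, latest); an exact time yields a zero-width interval."""
        if self.exactly_at is not None:
            return (self.exactly_at, self.exactly_at)
        return (self.no_earlier_than, self.no_later_than)
```

Treating exactlyAt as a degenerate interval lets consumers compare times uniformly, while the constructor still keeps the two forms mutually exclusive as the schema requires.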
[OPM V1.1] Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche. The open provenance model core specification (v1.1). Future Generation Computer Systems, July 2010. (doi: 10.1016/j.future.2010.07.005), (www: http://eprints.ecs.soton.ac.uk/21449/).