The Open Provenance Model has defined the notion of OPM graph based on a set of syntactic rules and the notion of Provenance Graph adding a set of topological constraints. Provenance graphs are aimed at representing causality graphs explaining how processes and artifacts came out to be. It is expected that a variety of reasoning algorithms will exploit this data model, in order to provide novel and powerful functionality to users. It is beyond the scope of this document to include an extensive coverage of relevant reasoning algorithms. However, provenance graphs, by means of edges, capture causal dependencies, which can be summarised by means of transitive closure that we describe in this section.
In Section 2, we have introduced the two causal dependencies triggeredBy and wasDerivedFrom acting as abbreviation for causal dependencies used and wasGeneratedBy. Figure 9 shows their exact meaning.
Figure 9: One Step Inference in the Provenance Model
Figures 10 and 11 formalize Figure 9 by introducing rules for each inference that can be performed in the Open Provenance Model. A rule consists of two expressions separated by a horizontal line. The expression above the line is a hypothesis, whereas the expression below the line is a conclusion that can be inferred from the hypothesis.
In Equation (1), a wasTriggeredBy edge is inferred from the existence of a used and wasGeneratedBy edges, as per described in Figure 9. We note that the inferred \triggeredBy edge relies on both accounts acc_{2} and acc_{3}, hence, it is given _acc_{2} ∪ acc_{3} as account.
Figure 10: One Step Inference Rules (1)
Equation (2) is the reverse of Equation (1): it allows us to establish that the edge < p_{2},p_{1},acc > ∈ WasTriggeredBy is hiding the existence of some artifact a_{2}, used by p_{2} and generated by p_{1}. The inferred edges used and wasGeneratedBy are asserted in the context of some account acc_{2} and acc_{3}, whose union is the original account acc. We note that Equation (2) allows us to establish the existence of some artifact a_{2} (and r_{1},r_{2},acc_{1},acc_{2}) but it does not tell us what their ids and values are. This is the consequence of using wasTriggeredBy , which is a lossy summary of the composition of used and wasGeneratedBy.
The kind of inferences that can be made about wasDerivedFrom is of a different nature. Indeed, without any internal knowledge of P_{1} in Figure 9, it is impossible to ascertain there is an actual data dependency between A_{1} and A_{2}.
Remark. Concretely, a rule such as the following would lead to incorrect inferences since it allows arbitrary outputs to a process to be inferred to be dependent on arbitrary inputs to the same process.
⟨a_{2},r_{2},p_{1},acc_{2}⟩∈WasGeneratedBy ∧ ⟨p_{1},r_{1},a_{1},acc_{1}⟩∈Used |
⟨a_{2},a_{1},acc_{1}⋃ acc_{2}⟩∈WasDerivedFrom |
While it is unreasonable to infer an exact dependency by means of wasDerivedFrom , it is useful to be able to infer that a dependency may exist. To this end, we introduce the edge mayHaveBeenDerivedFrom that marks such a potential dependency. Hence, if < a_{1},a_{2} > ∈ WasDerivedFrom, then < a_{1},a_{2} > ∈ MayHaveBeenDerivedFrom, but not vice-versa. Hence, Equation (3) states that a mayHaveBeenDerivedFrom edge can be derived from the existence of a succession of wasGeneratedBy and used edges. Equation (4) is to (2) what wasDerivedFrom is to wasTriggeredBy.
Figure 11: One Step Inference Rules (2)
In rules 1 and 3, the inferred edges have accounts acc_{2} ∪ acc_{3} and acc_{1} ∪ acc_{2}, respectively. Hence, the artifacts and processes connected by these edges will have an effective account membership modified accordingly. We note that rules 1 and 3 effectively creates relationships in the union of multiple account views.
Users want to find out the causes of an artifact, not due to one process, but potentially, due to an unknown number of them.
Hence, for the purpose of expressing queries or expressing inferences about provenance graphs, we introduce four new relationships, which are transitive versions of existing relationships, namely Used^{*}, WasGeneratedBy^{*}, WasDerivedFrom^{*} and TriggeredBy^{*}. Their definitions are displayed in Figure 12. We note that Figure 12 contains definitions (as opposed to inference rules of Figures 10 and 11, which specify which edges can be inferred from which edges). For convenience, we have also introduced a generic causal dependency wasDependentOn^{∗} (see equations (9) to (12)). Note that similar inference rules can be defined for MayHaveBeenDerivedFrom.
Figure 12: Transitive Closures
Equations 7 and 8 are one of the multiple possible ways of defining edges used^{*} and wasGeneratedBy^{*}. Other definitions could be expressed and proved equivalent (such as used^{*} can be derived from a single used and _wasDerivedFrom^{*}).
-- PatrickPaulson - 18 Aug 2008
I | Attachment | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|
graph.jpg | manage | 113.5 K | 30 Jul 2008 - 18:47 | PaulGroth | ||
fig10.jpg | manage | 44.1 K | 31 Jul 2008 - 03:09 | PaulGroth | ||
fig11.jpg | manage | 45.2 K | 31 Jul 2008 - 03:10 | PaulGroth | ||
fig12.jpg | manage | 140.9 K | 31 Jul 2008 - 03:10 | PaulGroth |