Open Provenance Model Contents
- Introduction
- Basics
- Overlapping and Hierarchichal Descriptions
- Provenance Graph Definition
- Timeless Formal Model
- Inferences
- Formal Model and Time Annotations
- Time Constraints and Inferences
- Support for Collections
- Example of Representation
- Conclusion
- Best Practice on the Use of Agensts
- References
5 Timeless Formal Model
Figure 8 provides a set-theoretic definition [11, 6] of the open provenance model, based on the concepts introduced so far. The model of causality we propose is timeless since time precedence does not imply causality: if a process
P1 occurs before a process
P2, in general, we cannot infer that
P1 caused
P2 to happen. However, the converse implication holds assuming time is measured according to a single clock.
Even though the provenance model is timeless, we recognize the importance of time, since time is easily observable by computer systems or users. Hence, in Section 7, we examine how the causality graph can be annotated with time. We will also specify constraints that one would expect time annotations to satisfy (in terms of monotonicity with respect to time) in sound causality graphs.
We assume the existence of a few primitive sets: identifiers for processes, artifacts and agents, roles, and accounts. These sets of identifiers provide indentifies to the corresponding entities within the scope of a given provenance graph. A given serialization will standardize on these sets, and provide concrete representations for them.
It is important to stress that the purpose of these identifiers is to define the structure of graphs: they are not meant to define identities that are persistent and reliably resolvable over time.
In the model, processes, artifacts and agents are identified by their IDs, and are associated with a value and zero or more accounts --- noted
P(Account), the powerset notation. In the set-theoretic notation, identifiers map to the corresponding value and account membership. In other words, with a database perspective, elements of
ProcessId,
ArtifactId and
AgentId are keys to processes, artifacts and agents, respectively.
The five causality edges can be easily specified by sets
used,
wasGeneratedBy,
triggeredBy,
wasDerivedFrom, and
wasControlledBy making use of identifiers for artifacts, processes or agents, roles, and the associated accounts.
Finally, an
OPM graph needs to identify explicitly which accounts are overlapping or refinements. For this, we use a set
Overlaps enumerating lists of overlapping accounts, and a set
Refines enumerating lists of refined accounts.
Figure 8: Timeless Causality Graph Data Model
The model of Figure 8 specifies all the necessary building blocks for creating
OPM graphs. We now revisit the definition provided by Section 4, re-examining each item, and explaining it in terms of the formal model.
-
Accounts are elements of the set Account.
- All artifacts of a graph must have identifiers belonging to the
set ArtifactId. A function A of type Artifact is total on the set ArtifactId. For an artifact id a, account membership is
A(a).acc. In OPM, the artifact entity contains a placeholder,
A(a).value, for application specific values or references to the
actual piece of state.
- All processes of a graph must have identifiers belonging to the
set ProcessId. A function P of type Process is total on the set of ProcessId. For a process
id p, account memberhsip is P(p).acc. A process contains a
placeholder P(p).value for application specific valuers or
references to the actual process.
- All agents of a graph must have identifiers belonging to the set
AgentId. For the total function AG, and for an agent id ag,
account memberhsip is AG(ag).acc. Placeholder for the actual
agent value is AG(ag).value.
- Equality on edges is defined as follows:
For any used edges u1=⟨ p1,r1,a1, acc1⟩∈ Used
and u2=⟨ p2,r2,a2, acc2⟩∈ Used, u1=u2 if p1=p2,
a1=a2, r1=r2, acc1=acc2.
For any wasGeneratedBy edges g1=⟨ a1,r1,p1, acc1⟩∈ WasGeneratedBy
and g2=⟨ a2,r2, acc2⟩∈ Used, g1=g2 if p1=p2,
a1=a2, r1=r2, acc1=acc2.
For any wasControlledBy edges c1=⟨ p1,r1,ag1, acc1⟩∈ WasControlledBy
and ag2=⟨ p2,r2,ag2, acc2⟩∈ WasControlledBy, c1=c2 if p1=p2,
ag1=ag2, r1=r2, acc1=acc2.
For any wasDerivedFrom edges d1=⟨ a1,a′1, acc1⟩∈ WasDerivedFrom
and d2=⟨ a2,a′2, acc2⟩∈ DerivedFrom, d1=d2 if a1=a2,
a′1=a′2, acc1=acc2.
For any wasTriggeredBy edges t1=⟨ p1,p′1, acc1⟩∈ WasTriggeredBy
and t2=⟨ p2,p′2, acc2⟩∈ WasTriggeredBy, t1=t2 if p1=p2,
p′1=p′2, acc1=acc2.
- The model does not place any constraints on roles, beyond their
membership to the set Role.
- We introduce a convenience function accountOfgr operating on entities of a graph gr.
For a given OPM graph gr=⟨ A, P, AG, U, G, T, D, C, Ov, Re⟩, where
A∈
Artifact, P∈ Process, AG∈ Agent, and U⊆ Used, G⊆
WasGeneratedBy, T⊆ WasTriggeredBy, D⊆ WasDerivedFrom,C⊆
WasControlledBy, Ov⊆ Overlapping, Re⊆ Refinement
accountOfgr(p) | = | P(p).acc |
accountOfgr(a) | = | A(a).acc |
accountOfgr(ag) | = | AG(ag).acc |
accountOfgr(u) | = | acc where u=⟨ p,r,a,acc⟩∈ U |
accountOfgr(g) | = | acc where g=⟨ a,r,p,acc⟩∈ G |
accountOfgr(t) | = | acc where t=⟨ p1,p2,acc⟩∈ T |
accountOfgr(d) | = | acc where d=⟨ a1,a2,acc⟩∈ D |
accountOfgr(c) | = | acc where c=⟨ p,r,ag,acc⟩∈ C |
|
We then introduce effectiveAccountOf:
effectiveAccountOfgr(p) |
| = | accountOfgr(p) |
| | ⋃i,j,k accountOfgr (ui,j,k) where ui,j,k=⟨ p,ri,aj,acck⟩ ∈ U |
| | ⋃i,j,k accountOfgr (di,j,k) where di,j,k⟨ ai,rj,p,acck⟩ ∈ G |
| | ⋃i,j accountOfgr (ti,j) where ti,j=⟨ p, pi,accj⟩ ∈ T |
| | ⋃i,j accountOfgr (ti,j) where ti,j=⟨ pi, p,accj⟩ ∈ T |
| | ⋃i,j,k accountOfgr (ci,j,k) where ci,j,k=⟨ p,ri, agj,acck⟩ ∈ C |
|
(It is defined similarly for artifacts and agents.) - No topological restriction is placed on OPM graphs. For instance,
⟨ p,r1,a,∅⟩ ∈ U and ⟨
a,r2,p,∅⟩
∈ G are two acceptable edges of an OPM graph, which would create a
circularity.
If gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1, Ov1, Re1⟩
and
gr2=⟨ A2,P2,AG2, U2, G2,T2,D2,C2, Ov2, Re2⟩,
then
gr1∪ gr2=⟨ A1⊔ A2,P1⊔ P2,AG1⊔ AG2, U1∪ U2,G1∪ G2,T1∪
T2,D1∪ D2,C1∪ C2, Ov1∪ Ov2, Re1∪ Re2⟩,
where the ⊔ operator is define as: A1⊔ A2(x)=⟨ v,a1∪ a2⟩ with A1(x)=⟨
v,a1⟩ and A2(x)=⟨ v,a2⟩.
If gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1, Ov1, Re1⟩
and
gr2=⟨ A2,P2,AG2, U2, G2,T2,D2,C2, Ov2, Re2⟩,
then
gr1∩ gr2=⟨ A1⊓ A2,P1⊓ P2,AG1⊓ AG2, U1∩ U2,G1∩ G2,T1∩
T2,D1∩ D2,C1∩ C2, Ov1∩ Ov2, Re1∩ Re2⟩,
where the ⊓ operator is define as: A1⊓ A2(x)=⟨ v,a1∩ a2⟩ with A1(x)=⟨
v,a1⟩ and A2(x)=⟨ v,a2⟩.
If gr1,gr2∈ OPMGraph, then
and
- For an OPMGraph gr=⟨ A,P,AG, U,G,T,D, C, Ov, Re⟩, for an account α,
view(α,gr) is ⟨ Aα,Pα,AGα,
Uα,Gα,Tα,Dα,Cα,Ov, Re⟩,
where:
Aα⊆ A | with | Aα={ (a,acc)∈ A such that α∈ effectiveAccountOfgr(a)} |
Pα⊆ P | with | Pα={ (p,acc)∈ P such that α∈ effectiveAccountOfgr(p)} |
AGα⊆ AG | with | AGα={ (ag,acc)∈ AG such that α∈ effectiveAccountOfgr(ag)} |
Uα⊆ U | with | Uα={ ⟨ p,r,a,acc⟩ ∈ U such that α∈ acc} |
Gα⊆ G | with | Gα={ ⟨ a,r,p,acc⟩ ∈ G such that α∈ acc} |
Tα⊆ T | with | Tα={ ⟨ p1,p2,acc⟩ ∈ T such that α∈ acc} |
Dα⊆ D | with | Dα={ ⟨ a1,a2,acc⟩ ∈ D such that α∈ acc} |
Cα⊆ C | with | Cα={ ⟨ p,ag,acc⟩ ∈ C such that α∈ acc} |
|
- A legal account view gr=⟨ A,P,AG, U,G,T,D,C,Ov,
Re⟩ is such that there is no cycle in U, G, T, D and if
⟨ a1,r1,p1, acc1⟩∈ G and ⟨ a1,r2,p2,
acc1⟩∈ G, then ⟨ a1,r1,p1, acc1⟩=⟨
a1,r2,p2, acc1⟩, where acc1 is a singleton.
- Two accounts α1,α2 are declared to be overlapping in an OPMgraph
gr=⟨ A,P, AG, U,G,T,D,C, Ov, Re⟩, if ⟨ α1,α2⟩ ∈ Ov
or ⟨ α2,α1⟩ ∈ Ov.
- Two accounts α1,α2 are declared to be legally overlapping in an OPMgraph
if they are overlapping and if their respective account views
⟨ A1,P1,AG1, U1,G1,T1,D1,C1,Ov1,Re1⟩ and
⟨ A2,P2,AG2, U2,G2,T2,D2,C2,Ov2,Re2⟩ are such that
| | Domain(A1)⋂ Domain(A2)≠∅ |
| or | Domain(P1)⋂ Domain(P2)≠∅ |
| or | Domain(AG1)⋂ Domain(AG2)≠∅. |
|
Hence, the overlapping relationship is reflexive, symmetric but
not transitive. - An account α1 is declared to refine account α2 in an OPMgraph
gr=⟨ A,P, AG, U,G,T,D,C, Ov, Re⟩, if ⟨ α1,α2⟩ ∈ Re.
- An account α1 is declared to be legally refining account α2 in an OPMgraph
if they are overlapping and if their respective account views
gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1,Ov1,Re1⟩ and
gr2=⟨ A2,P2,AG2, U2,G2, T2,D2,C2,Ov2,Re2⟩ are such that
| | source(gr2)⊆ source(gr1) |
| and | sink(gr2)⊆ sink(gr1) |
|
Concept is currently ill-defined. Definition remaining to be finalised. Can we define refinement just on syntactic properties of the graphs? Hence, the refinement relationship is reflexive, asymmetric and transitive.
Comments
to top