OPM1-01Review-TimelessFormalModel < Challenge


5 Timeless Formal Model

Figure 8 provides a set-theoretic definition [11, 6] of the open provenance model, based on the concepts introduced so far. The model of causality we propose is timeless because time precedence does not imply causality: if a process P₁ occurs before a process P₂, we cannot in general infer that P₁ caused P₂ to happen. However, the converse implication holds, assuming time is measured according to a single clock: if P₁ caused P₂, then P₁ occurred before P₂.

Even though the provenance model is timeless, we recognize the importance of time, since time is easily observable by computer systems or users. Hence, in Section 7, we examine how the causality graph can be annotated with time. We will also specify constraints that one would expect time annotations to satisfy (in terms of monotonicity with respect to time) in sound causality graphs.

We assume the existence of a few primitive sets: identifiers for processes, artifacts and agents, roles, and accounts. These identifiers identify the corresponding entities within the scope of a given provenance graph. A given serialization will standardize on these sets and provide concrete representations for them.

It is important to stress that the purpose of these identifiers is to define the structure of graphs: they are not meant to define identities that are persistent and reliably resolvable over time.

In the model, processes, artifacts and agents are identified by their IDs, and each is associated with a value and a set of zero or more accounts, that is, an element of 𝒫(Account) in powerset notation. In the set-theoretic notation, identifiers map to the corresponding value and account membership. In other words, from a database perspective, elements of ProcessId, ArtifactId and AgentId are keys to processes, artifacts and agents, respectively.
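A minimal sketch of this keyed view, assuming a hypothetical Python encoding in which identifiers and accounts are plain strings (the model leaves their concrete representation to a serialization): each of ProcessId, ArtifactId and AgentId becomes the key set of a dictionary, and each entry pairs an application value with a frozen set of accounts. The names `Entity`, `value` and `acc` are illustrative, chosen to echo the `.value`/`.acc` notation used later in this section.

```python
from typing import Dict, FrozenSet, NamedTuple

class Entity(NamedTuple):
    """Hypothetical record for a process, artifact or agent."""
    value: object            # application-specific value or reference
    acc: FrozenSet[str]      # an element of the powerset of Account

# Each dictionary plays the role of a function from an identifier set.
P: Dict[str, Entity] = {"p1": Entity("bake", frozenset({"acc1"}))}
A: Dict[str, Entity] = {
    "a1": Entity("flour", frozenset({"acc1"})),
    "a2": Entity("cake", frozenset()),   # zero accounts is allowed
}
AG: Dict[str, Entity] = {"ag1": Entity("baker", frozenset({"acc1", "acc2"}))}

# The identifier is the key: looking it up yields the value and the accounts.
print(A["a1"].value, sorted(A["a1"].acc))  # flour ['acc1']
```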

The five causality edges can easily be specified by the sets Used, WasGeneratedBy, WasTriggeredBy, WasDerivedFrom and WasControlledBy, making use of identifiers for artifacts, processes or agents, roles, and the associated accounts.

Finally, an OPM graph needs to identify explicitly which accounts overlap and which accounts refine others. For this, we use a set Overlaps enumerating pairs of overlapping accounts, and a set Refines enumerating pairs of accounts in which the first refines the second.
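A minimal sketch, assuming a hypothetical Python encoding with string identifiers: each edge set becomes a set of named tuples whose fields follow the component order used later in this section (for instance ⟨p, r, a, acc⟩ for used), and Overlaps and Refines become sets of account pairs. The class names mirror the set names; field names such as `p1`/`p2` are illustrative assumptions.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Acc = FrozenSet[str]

class Used(NamedTuple):             # <p, r, a, acc>: process p used artifact a in role r
    p: str
    r: str
    a: str
    acc: Acc

class WasGeneratedBy(NamedTuple):   # <a, r, p, acc>: artifact a generated by process p in role r
    a: str
    r: str
    p: str
    acc: Acc

class WasTriggeredBy(NamedTuple):   # <p1, p2, acc>: process p1 triggered by process p2
    p1: str
    p2: str
    acc: Acc

class WasDerivedFrom(NamedTuple):   # <a1, a2, acc>: artifact a1 derived from artifact a2
    a1: str
    a2: str
    acc: Acc

class WasControlledBy(NamedTuple):  # <p, r, ag, acc>: process p controlled by agent ag in role r
    p: str
    r: str
    ag: str
    acc: Acc

acc1 = frozenset({"acc1"})
U: Set[Used] = {Used("p1", "in", "a1", acc1)}
G: Set[WasGeneratedBy] = {WasGeneratedBy("a2", "out", "p1", acc1)}
T: Set[WasTriggeredBy] = set()
D: Set[WasDerivedFrom] = {WasDerivedFrom("a2", "a1", acc1)}
C: Set[WasControlledBy] = {WasControlledBy("p1", "chef", "ag1", acc1)}

# Overlaps and Refines enumerate pairs of accounts.
Ov: Set[Tuple[str, str]] = {("acc1", "acc2")}
Re: Set[Tuple[str, str]] = set()

print(len(U) + len(G) + len(T) + len(D) + len(C))  # 4
```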

Figure 8: Timeless Causality Graph Data Model

The model of Figure 8 specifies all the building blocks needed to create OPM graphs. We now revisit the definitions given in Section 4, re-examining each item and explaining it in terms of the formal model.


Accounts are elements of the set Account.

All artifacts of a graph must have identifiers belonging to the set ArtifactId. A function A of type Artifact is total on the set ArtifactId. For an artifact id a, account membership is A(a).acc. In OPM, the artifact entity contains a placeholder, A(a).value, for application-specific values or references to the actual piece of state.

All processes of a graph must have identifiers belonging to the set ProcessId. A function P of type Process is total on the set ProcessId. For a process id p, account membership is P(p).acc. A process contains a placeholder, P(p).value, for application-specific values or references to the actual process.

All agents of a graph must have identifiers belonging to the set AgentId. For the total function AG of type Agent and an agent id ag, account membership is AG(ag).acc. The placeholder for the actual agent value is AG(ag).value.
Equality on edges is defined as follows.

For any used edges u₁=⟨p₁,r₁,a₁,acc₁⟩ ∈ Used and u₂=⟨p₂,r₂,a₂,acc₂⟩ ∈ Used, u₁=u₂ if p₁=p₂, a₁=a₂, r₁=r₂ and acc₁=acc₂.

For any wasGeneratedBy edges g₁=⟨a₁,r₁,p₁,acc₁⟩ ∈ WasGeneratedBy and g₂=⟨a₂,r₂,p₂,acc₂⟩ ∈ WasGeneratedBy, g₁=g₂ if p₁=p₂, a₁=a₂, r₁=r₂ and acc₁=acc₂.

For any wasControlledBy edges c₁=⟨p₁,r₁,ag₁,acc₁⟩ ∈ WasControlledBy and c₂=⟨p₂,r₂,ag₂,acc₂⟩ ∈ WasControlledBy, c₁=c₂ if p₁=p₂, ag₁=ag₂, r₁=r₂ and acc₁=acc₂.

For any wasDerivedFrom edges d₁=⟨a₁,a′₁,acc₁⟩ ∈ WasDerivedFrom and d₂=⟨a₂,a′₂,acc₂⟩ ∈ WasDerivedFrom, d₁=d₂ if a₁=a₂, a′₁=a′₂ and acc₁=acc₂.

For any wasTriggeredBy edges t₁=⟨p₁,p′₁,acc₁⟩ ∈ WasTriggeredBy and t₂=⟨p₂,p′₂,acc₂⟩ ∈ WasTriggeredBy, t₁=t₂ if p₁=p₂, p′₁=p′₂ and acc₁=acc₂.
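If edges are encoded as Python named tuples (an assumption; any record type with component-wise equality would do), the equality rule above is exactly tuple equality, and keeping edges in a set means two equal edges are counted once. A minimal sketch for used edges:

```python
from typing import FrozenSet, NamedTuple

class Used(NamedTuple):
    """Hypothetical encoding of a used edge <p, r, a, acc>."""
    p: str
    r: str
    a: str
    acc: FrozenSet[str]

u1 = Used("p1", "in", "a1", frozenset({"acc1"}))
u2 = Used("p1", "in", "a1", frozenset({"acc1"}))
u3 = Used("p1", "other", "a1", frozenset({"acc1"}))

# Tuple equality compares every component (p, r, a and acc),
# which is the equality rule stated for used edges.
print(u1 == u2)  # True
print(u1 == u3)  # False: the roles differ

# Because U is a set, equal edges collapse into one element.
U = {u1, u2, u3}
print(len(U))    # 2
```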
The model does not place any constraints on roles, beyond their
membership to the set Role.
We introduce a convenience function accountOf^gr operating on the entities of a graph gr. For a given OPM graph gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩, where A ∈ Artifact, P ∈ Process, AG ∈ Agent, U ⊆ Used, G ⊆ WasGeneratedBy, T ⊆ WasTriggeredBy, D ⊆ WasDerivedFrom, C ⊆ WasControlledBy, Ov ⊆ Overlapping and Re ⊆ Refinement:

accountOf^gr(p) = P(p).acc 

accountOf^gr(a) = A(a).acc 

accountOf^gr(ag) = AG(ag).acc 

accountOf^gr(u) = acc  where  u=⟨ p,r,a,acc⟩∈ U

accountOf^gr(g) = acc  where  g=⟨ a,r,p,acc⟩∈ G

accountOf^gr(t) = acc  where  t=⟨ p₁,p₂,acc⟩∈ T

accountOf^gr(d) = acc  where  d=⟨ a₁,a₂,acc⟩∈ D

accountOf^gr(c) = acc  where  c=⟨ p,r,ag,acc⟩∈ C
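A minimal Python sketch of accountOf^gr, assuming the hypothetical encoding of entities as identifier-keyed maps and edges as named tuples. It further assumes that ProcessId, ArtifactId and AgentId are disjoint, so that a bare identifier determines which map to consult; only two edge types are shown, and the others would be handled identically.

```python
from typing import Dict, FrozenSet, NamedTuple, Union

Acc = FrozenSet[str]

class Entity(NamedTuple):
    value: object
    acc: Acc

class Used(NamedTuple):
    p: str
    r: str
    a: str
    acc: Acc

class WasGeneratedBy(NamedTuple):
    a: str
    r: str
    p: str
    acc: Acc

# The other three edge types would be added to this tuple in the same way.
EDGE_TYPES = (Used, WasGeneratedBy)

def account_of(A: Dict[str, Entity], P: Dict[str, Entity],
               AG: Dict[str, Entity], x: Union[str, tuple]) -> Acc:
    """Sketch of accountOf^gr for an entity identifier or an edge."""
    if isinstance(x, EDGE_TYPES):
        return x.acc                  # an edge carries its accounts directly
    # Assumption: the identifier sets are disjoint.
    for entities in (P, A, AG):
        if x in entities:
            return entities[x].acc
    raise KeyError(f"unknown identifier {x!r}")

A = {"a1": Entity("flour", frozenset({"acc1"}))}
P = {"p1": Entity("bake", frozenset({"acc1", "acc2"}))}
AG: Dict[str, Entity] = {}
u = Used("p1", "in", "a1", frozenset({"acc2"}))
print(sorted(account_of(A, P, AG, "p1")))  # ['acc1', 'acc2']
print(sorted(account_of(A, P, AG, u)))     # ['acc2']
```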


We then introduce effectiveAccountOf:

effectiveAccountOf^gr(p)
  = accountOf^gr(p)
    ∪ ⋃_{i,j,k} accountOf^gr(u_{i,j,k})  where  u_{i,j,k}=⟨p,r_i,a_j,acc_k⟩ ∈ U
    ∪ ⋃_{i,j,k} accountOf^gr(g_{i,j,k})  where  g_{i,j,k}=⟨a_i,r_j,p,acc_k⟩ ∈ G
    ∪ ⋃_{i,j} accountOf^gr(t_{i,j})  where  t_{i,j}=⟨p,p_i,acc_j⟩ ∈ T
    ∪ ⋃_{i,j} accountOf^gr(t′_{i,j})  where  t′_{i,j}=⟨p_i,p,acc_j⟩ ∈ T
    ∪ ⋃_{i,j,k} accountOf^gr(c_{i,j,k})  where  c_{i,j,k}=⟨p,r_i,ag_j,acc_k⟩ ∈ C


(It is defined similarly for artifacts and agents.)
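The union for a process can be sketched in Python as follows, assuming a hypothetical encoding of entities as identifier-keyed maps and edges as named tuples. The function name `effective_account_of_process` is illustrative. Each loop corresponds to one line of the definition, with wasTriggeredBy edges contributing whether p is the triggered or the triggering process.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple

Acc = FrozenSet[str]

class Entity(NamedTuple):
    value: object
    acc: Acc

class Used(NamedTuple):
    p: str
    r: str
    a: str
    acc: Acc

class WasGeneratedBy(NamedTuple):
    a: str
    r: str
    p: str
    acc: Acc

class WasTriggeredBy(NamedTuple):
    p1: str
    p2: str
    acc: Acc

class WasControlledBy(NamedTuple):
    p: str
    r: str
    ag: str
    acc: Acc

def effective_account_of_process(
    p: str,
    P: Dict[str, Entity],
    U: Iterable[Used],
    G: Iterable[WasGeneratedBy],
    T: Iterable[WasTriggeredBy],
    C: Iterable[WasControlledBy],
) -> Acc:
    """Sketch of effectiveAccountOf^gr(p)."""
    accs = set(P[p].acc)                  # accountOf^gr(p)
    for u in U:
        if u.p == p:
            accs |= u.acc
    for g in G:
        if g.p == p:
            accs |= g.acc
    for t in T:
        if p in (t.p1, t.p2):             # p at either end of the edge
            accs |= t.acc
    for c in C:
        if c.p == p:
            accs |= c.acc
    return frozenset(accs)

P = {"p1": Entity("bake", frozenset())}
U = {Used("p1", "in", "a1", frozenset({"acc1"}))}
G = {WasGeneratedBy("a2", "out", "p1", frozenset({"acc2"}))}
T = {WasTriggeredBy("p2", "p1", frozenset({"acc3"}))}
C: set = set()
print(sorted(effective_account_of_process("p1", P, U, G, T, C)))  # ['acc1', 'acc2', 'acc3']
```

Even though p1 itself belongs to no account here, it effectively belongs to every account in which one of its edges appears.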
No topological restriction is placed on OPM graphs. For instance, ⟨p,r₁,a,∅⟩ ∈ U and ⟨a,r₂,p,∅⟩ ∈ G are two acceptable edges of an OPM graph, even though together they create a circularity.
circularity. If gr₁=⟨ A₁,P₁,AG₁, U₁,G₁,T₁,D₁,C₁, Ov₁, Re₁⟩
and
gr₂=⟨ A₂,P₂,AG₂, U₂, G₂,T₂,D₂,C₂, Ov₂, Re₂⟩,
then 
gr₁∪ gr₂=⟨ A₁⊔ A₂,P₁⊔ P₂,AG₁⊔ AG₂, U₁∪ U₂,G₁∪ G₂,T₁∪
T₂,D₁∪ D₂,C₁∪ C₂, Ov₁∪ Ov₂, Re₁∪ Re₂⟩,
where the ⊔ operator is define as: A₁⊔ A₂(x)=⟨ v,a₁∪ a₂⟩ with A₁(x)=⟨
v,a₁⟩ and A₂(x)=⟨ v,a₂⟩.
If gr₁=⟨A₁,P₁,AG₁,U₁,G₁,T₁,D₁,C₁,Ov₁,Re₁⟩ and gr₂=⟨A₂,P₂,AG₂,U₂,G₂,T₂,D₂,C₂,Ov₂,Re₂⟩, then

  gr₁ ∩ gr₂ = ⟨A₁⊓A₂, P₁⊓P₂, AG₁⊓AG₂, U₁∩U₂, G₁∩G₂, T₁∩T₂, D₁∩D₂, C₁∩C₂, Ov₁∩Ov₂, Re₁∩Re₂⟩,

where the ⊓ operator is defined as (A₁⊓A₂)(x) = ⟨v, a₁∩a₂⟩ with A₁(x)=⟨v,a₁⟩ and A₂(x)=⟨v,a₂⟩.
If gr₁, gr₂ ∈ OPMGraph, then gr₁ ∪ gr₂ ∈ OPMGraph and gr₁ ∩ gr₂ ∈ OPMGraph.
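The ⊔ and ⊓ operators can be sketched in Python over identifier-keyed maps, assuming a hypothetical encoding in which each entry is a (value, accounts) pair. The definitions above only cover an identifier present in both graphs with the same value, so three choices here are assumptions: `join` copies an identifier present in only one graph unchanged, `meet` drops it, and both raise an error if the values disagree. Edge sets, Ov and Re combine with ordinary set union and intersection.

```python
from typing import Dict, FrozenSet, NamedTuple

class Entity(NamedTuple):
    """Hypothetical (value, accounts) record for one identifier."""
    value: object
    acc: FrozenSet[str]

EntityMap = Dict[str, Entity]

def join(m1: EntityMap, m2: EntityMap) -> EntityMap:
    """Sketch of ⊔: shared identifiers get the union of their account sets."""
    out = dict(m1)                      # assumption: ids only in m1 are kept
    for x, e2 in m2.items():
        if x not in out:
            out[x] = e2                 # assumption: ids only in m2 are kept
            continue
        e1 = out[x]
        if e1.value != e2.value:
            raise ValueError(f"identifier {x!r} has different values")
        out[x] = Entity(e1.value, e1.acc | e2.acc)
    return out

def meet(m1: EntityMap, m2: EntityMap) -> EntityMap:
    """Sketch of ⊓: shared identifiers get the intersection of their account sets."""
    out = {}
    for x in m1.keys() & m2.keys():     # assumption: ids in only one map are dropped
        e1, e2 = m1[x], m2[x]
        if e1.value != e2.value:
            raise ValueError(f"identifier {x!r} has different values")
        out[x] = Entity(e1.value, e1.acc & e2.acc)
    return out

A1 = {"a1": Entity("flour", frozenset({"acc1"}))}
A2 = {"a1": Entity("flour", frozenset({"acc2"})),
      "a2": Entity("cake", frozenset({"acc2"}))}
print(sorted(join(A1, A2)["a1"].acc))   # ['acc1', 'acc2']
print(meet(A1, A2))                     # {'a1': Entity(value='flour', acc=frozenset())}

# Edge sets, Ov and Re combine with plain set operators.
U1 = {("p1", "in", "a1", frozenset({"acc1"}))}
U2 = {("p2", "in", "a1", frozenset({"acc2"}))}
print(len(U1 | U2), len(U1 & U2))       # 2 0
```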
For an OPMGraph gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩ and an account α, view(α,gr) is ⟨A_α,P_α,AG_α,U_α,G_α,T_α,D_α,C_α,Ov,Re⟩, where:

A_α⊆ A with A_α={ (a,acc)∈ A such that α∈ effectiveAccountOf^gr(a)}

P_α⊆ P with P_α={ (p,acc)∈ P such that α∈ effectiveAccountOf^gr(p)}

AG_α⊆ AG with AG_α={ (ag,acc)∈ AG such that α∈ effectiveAccountOf^gr(ag)}

U_α⊆ U with U_α={ ⟨ p,r,a,acc⟩ ∈ U such that α∈ acc}

G_α⊆ G with G_α={ ⟨ a,r,p,acc⟩ ∈ G such that α∈ acc}

T_α⊆ T with T_α={ ⟨ p₁,p₂,acc⟩ ∈ T such that  α∈ acc}

D_α⊆ D with D_α={ ⟨ a₁,a₂,acc⟩ ∈ D such that  α∈ acc}

C_α⊆ C with C_α={ ⟨ p,r,ag,acc⟩ ∈ C such that  α∈ acc}
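A reduced Python sketch of view(α, gr), assuming a hypothetical encoding of entities as identifier-keyed maps and edges as named tuples. To stay short it covers only A, P, U and G; AG, T, D and C follow the same pattern, and Ov and Re are passed through unchanged. The helper `mentions` computes effectiveAccountOf generically by checking the identifier positions of each edge, which assumes the identifier sets are disjoint.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set

Acc = FrozenSet[str]

class Entity(NamedTuple):
    value: object
    acc: Acc

class Used(NamedTuple):
    p: str
    r: str
    a: str
    acc: Acc

class WasGeneratedBy(NamedTuple):
    a: str
    r: str
    p: str
    acc: Acc

def mentions(edge: tuple, x: str) -> bool:
    """True if x appears in an identifier position of the edge
    (the role r and the accounts acc are not identifiers)."""
    return any(getattr(edge, f) == x for f in edge._fields if f not in ("r", "acc"))

def effective_account_of(x: str, entities: Dict[str, Entity],
                         edges: Iterable[tuple]) -> Set[str]:
    accs = set(entities[x].acc)
    for e in edges:
        if mentions(e, x):
            accs |= e.acc
    return accs

def view(alpha: str, A: Dict[str, Entity], P: Dict[str, Entity],
         U: Set[Used], G: Set[WasGeneratedBy]):
    """Reduced sketch of view(alpha, gr) over A, P, U and G."""
    edges = [*U, *G]
    A_alpha = {a: e for a, e in A.items() if alpha in effective_account_of(a, A, edges)}
    P_alpha = {p: e for p, e in P.items() if alpha in effective_account_of(p, P, edges)}
    U_alpha = {u for u in U if alpha in u.acc}
    G_alpha = {g for g in G if alpha in g.acc}
    return A_alpha, P_alpha, U_alpha, G_alpha

A = {"a1": Entity("flour", frozenset()), "a2": Entity("cake", frozenset())}
P = {"p1": Entity("bake", frozenset())}
U = {Used("p1", "in", "a1", frozenset({"acc1"}))}
G = {WasGeneratedBy("a2", "out", "p1", frozenset({"acc2"}))}
A_a, P_a, U_a, G_a = view("acc1", A, P, U, G)
print(sorted(A_a), sorted(P_a), len(U_a), len(G_a))  # ['a1'] ['p1'] 1 0
```

Entities are kept through effectiveAccountOf, while edges are kept only if they carry α themselves; this is why p1 appears in the view for acc1 but its wasGeneratedBy edge does not.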

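A legal account view gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩ is one in which there is no cycle in U, G, T and D, and in which, if ⟨a₁,r₁,p₁,acc₁⟩ ∈ G and ⟨a₁,r₂,p₂,acc₁⟩ ∈ G, where acc₁ is a singleton, then ⟨a₁,r₁,p₁,acc₁⟩ = ⟨a₁,r₂,p₂,acc₁⟩. In other words, within a single account, an artifact is generated by at most one process, in one role.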
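Both conditions can be checked mechanically. The following Python sketch assumes a hypothetical named-tuple encoding of edges and that artifact and process identifiers are disjoint strings. It treats every edge in U, G, T and D as an arc from effect to cause and tests acyclicity with Kahn's algorithm (a standard topological-sort technique, chosen here for illustration), then checks the unique-generation condition for singleton accounts.

```python
from collections import defaultdict, deque
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Acc = FrozenSet[str]

class Used(NamedTuple):
    p: str
    r: str
    a: str
    acc: Acc

class WasGeneratedBy(NamedTuple):
    a: str
    r: str
    p: str
    acc: Acc

class WasTriggeredBy(NamedTuple):
    p1: str
    p2: str
    acc: Acc

class WasDerivedFrom(NamedTuple):
    a1: str
    a2: str
    acc: Acc

def has_cycle(arcs: Iterable[Tuple[str, str]]) -> bool:
    """Kahn's algorithm: acyclic iff every node is removed in topological order."""
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst in arcs:
        nodes.update((src, dst))
        if dst not in succ[src]:          # ignore duplicate arcs
            succ[src].add(dst)
            indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return removed < len(nodes)

def is_legal_view(U, G, T, D) -> bool:
    arcs = ([(u.p, u.a) for u in U] + [(g.a, g.p) for g in G]
            + [(t.p1, t.p2) for t in T] + [(d.a1, d.a2) for d in D])
    if has_cycle(arcs):
        return False
    # Within a singleton account, an artifact has at most one generating edge.
    generator = {}
    for g in G:
        if len(g.acc) == 1:
            key = (g.a, g.acc)
            if key in generator and generator[key] != g:
                return False
            generator[key] = g
    return True

empty = frozenset()
# The circular example from the text: p used a, and a was generated by p.
print(is_legal_view({Used("p", "r1", "a", empty)},
                    {WasGeneratedBy("a", "r2", "p", empty)}, set(), set()))  # False
```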
Two accounts α₁, α₂ are declared to be overlapping in an OPM graph gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩ if ⟨α₁,α₂⟩ ∈ Ov or ⟨α₂,α₁⟩ ∈ Ov.
Two accounts α₁, α₂ are declared to be legally overlapping in an OPM graph if they are overlapping and their respective account views ⟨A₁,P₁,AG₁,U₁,G₁,T₁,D₁,C₁,Ov₁,Re₁⟩ and ⟨A₂,P₂,AG₂,U₂,G₂,T₂,D₂,C₂,Ov₂,Re₂⟩ are such that

    Domain(A₁) ∩ Domain(A₂) ≠ ∅
  or Domain(P₁) ∩ Domain(P₂) ≠ ∅
  or Domain(AG₁) ∩ Domain(AG₂) ≠ ∅.


Hence, the overlapping relationship is reflexive and symmetric, but not transitive.
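A minimal Python sketch of the two overlapping tests, assuming Ov is a set of account pairs and summarising each account view by the domains of its A, P and AG components through a hypothetical `ViewDomains` record. Checking both orders of the pair is what makes the declared relation symmetric.

```python
from typing import AbstractSet, NamedTuple, Set, Tuple

class ViewDomains(NamedTuple):
    """Hypothetical summary of an account view: the domains of A, P and AG."""
    artifacts: AbstractSet[str]
    processes: AbstractSet[str]
    agents: AbstractSet[str]

def overlapping(a1: str, a2: str, Ov: Set[Tuple[str, str]]) -> bool:
    # The pair may be declared in either order.
    return (a1, a2) in Ov or (a2, a1) in Ov

def legally_overlapping(a1: str, a2: str, Ov: Set[Tuple[str, str]],
                        v1: ViewDomains, v2: ViewDomains) -> bool:
    # The two views must share at least one artifact, process or agent.
    shares_entity = bool(v1.artifacts & v2.artifacts
                         or v1.processes & v2.processes
                         or v1.agents & v2.agents)
    return overlapping(a1, a2, Ov) and shares_entity

Ov = {("acc1", "acc2")}
v1 = ViewDomains({"a1", "a2"}, {"p1"}, set())
v2 = ViewDomains({"a2"}, {"p2"}, set())
print(legally_overlapping("acc2", "acc1", Ov, v2, v1))  # True: shared artifact a2
```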
An account α₁ is declared to refine account α₂ in an OPM graph gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩ if ⟨α₁,α₂⟩ ∈ Re.

An account α₁ is declared to be legally refining account α₂ in an OPM graph if they are overlapping and their respective account views gr₁=⟨A₁,P₁,AG₁,U₁,G₁,T₁,D₁,C₁,Ov₁,Re₁⟩ and gr₂=⟨A₂,P₂,AG₂,U₂,G₂,T₂,D₂,C₂,Ov₂,Re₂⟩ are such that

    source(gr₂) ⊆ source(gr₁)
  and sink(gr₂) ⊆ sink(gr₁)

Hence, the refinement relationship is reflexive, antisymmetric and transitive. However, this concept is currently ill-defined, and its definition remains to be finalised: can refinement be defined on syntactic properties of the graphs alone?

Comments


Attachment: fig8.jpg (141.3 K), uploaded 31 Jul 2008 by PaulGroth.


Provenance Challenge
