Start of topic | Skip to actions

- Introduction
- Basics
- Overlapping and Hierarchichal Descriptions
- Provenance Graph Definition
- Timeless Formal Model
- Inferences
- Formal Model and Time Annotations
- Time Constraints and Inferences
- Support for Collections
- Example of Representation
- Conclusion
- Best Practice on the Use of Agensts
- References

Figure 8 provides a set-theoretic definition [11, 6] of the open provenance model, based on the concepts introduced so far. The model of causality we propose is timeless since time precedence does not imply causality: if a process *P _{1}* occurs before a process

Even though the provenance model is timeless, we recognize the importance of time, since time is easily observable by computer systems or users. Hence, in Section 7, we examine how the causality graph can be annotated with time. We will also specify constraints that one would expect time annotations to satisfy (in terms of monotonicity with respect to time) in sound causality graphs.

We assume the existence of a few primitive sets: identifiers for processes, artifacts and agents, roles, and accounts. These sets of identifiers provide indentifies to the corresponding entities within the scope of a given provenance graph. A given serialization will standardize on these sets, and provide concrete representations for them.

It is important to stress that the purpose of these identifiers is to define the structure of graphs: they are not meant to define identities that are persistent and reliably resolvable over time.

In the model, processes, artifacts and agents are identified by their IDs, and are associated with a value and zero or more accounts --- noted *P(Account)*, the powerset notation. In the set-theoretic notation, identifiers map to the corresponding value and account membership. In other words, with a database perspective, elements of *ProcessId*, *ArtifactId* and *AgentId* are keys to processes, artifacts and agents, respectively.

The five causality edges can be easily specified by sets *used*, *wasGeneratedBy*, *triggeredBy*, *wasDerivedFrom*, and *wasControlledBy* making use of identifiers for artifacts, processes or agents, roles, and the associated accounts.

Finally, an OPM graph needs to identify explicitly which accounts are overlapping or refinements. For this, we use a set *Overlaps* enumerating lists of overlapping accounts, and a set *Refines* enumerating lists of refined accounts.

Figure 8: Timeless Causality Graph Data Model

The model of Figure 8 specifies all the necessary building blocks for creating OPM graphs. We now revisit the definition provided by Section 4, re-examining each item, and explaining it in terms of the formal model.

- Accounts are elements of the set
Account.- All artifacts of a graph must have identifiers belonging to the set
ArtifactId. A functionAof typeArtifactis total on the setArtifactId. For an artifact ida, account membership isA(a).acc. In OPM, the artifact entity contains a placeholder,A(a).value, for application specific values or references to the actual piece of state.- All processes of a graph must have identifiers belonging to the set
ProcessId. A functionPof typeProcessis total on the set ofProcessId. For a process idp, account memberhsip isP(p).acc. A process contains a placeholderP(p).valuefor application specific valuers or references to the actual process.- All agents of a graph must have identifiers belonging to the set
AgentId. For the total functionAG, and for an agent idag, account memberhsip isAG(ag).acc. Placeholder for the actual agent value isAG(ag).value.- Equality on edges is defined as follows:
For any

usededgesu_{1}=⟨p_{1},r_{1},a_{1},acc_{1}⟩∈Usedandu_{2}=⟨p_{2},r_{2},a_{2},acc_{2}⟩∈Used,u_{1}=u_{2}ifp_{1}=p_{2},a_{1}=a_{2},r_{1}=r_{2},acc_{1}=acc_{2}.For any

wasGeneratedByedgesg_{1}=⟨a_{1},r_{1},p_{1},acc_{1}⟩∈WasGeneratedByandg_{2}=⟨a_{2},r_{2},acc_{2}⟩∈Used,g_{1}=g_{2}ifp_{1}=p_{2},a_{1}=a_{2},r_{1}=r_{2},acc_{1}=acc_{2}.For any

wasControlledByedgesc_{1}=⟨p_{1},r_{1},ag_{1},acc_{1}⟩∈WasControlledByandag_{2}=⟨p_{2},r_{2},ag_{2},acc_{2}⟩∈WasControlledBy,c_{1}=c_{2}ifp_{1}=p_{2},ag_{1}=ag_{2},r_{1}=r_{2},acc_{1}=acc_{2}.For any

wasDerivedFromedgesd_{1}=⟨a_{1},a′_{1},acc_{1}⟩∈WasDerivedFromandd_{2}=⟨a_{2},a′_{2},acc_{2}⟩∈DerivedFrom,d_{1}=d_{2}ifa_{1}=a_{2},a′_{1}=a′_{2},acc_{1}=acc_{2}.For any

wasTriggeredByedgest_{1}=⟨p_{1},p′_{1},acc_{1}⟩∈WasTriggeredByandt_{2}=⟨p_{2},p′_{2},acc_{2}⟩∈WasTriggeredBy,t_{1}=t_{2}ifp_{1}=p_{2},p′_{1}=p′_{2},acc_{1}=acc_{2}.- The model does not place any constraints on roles, beyond their membership to the set
Role.- We introduce a convenience function
accountOf^{gr}operating on entities of a graphgr. For a given OPM graphgr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩, whereA∈Artifact,P∈Process,AG∈Agent, andU⊆Used,G⊆WasGeneratedBy,T⊆WasTriggeredBy,D⊆WasDerivedFrom,C⊆

WasControlledBy,Ov⊆Overlapping,Re⊆RefinementWe then introduce

accountOf^{gr}(p)= P(p).accaccountOf^{gr}(a)= A(a).accaccountOf^{gr}(ag)= AG(ag).accaccountOf^{gr}(u)= accwhereu=⟨p,r,a,acc⟩∈UaccountOf^{gr}(g)= accwhereg=⟨a,r,p,acc⟩∈GaccountOf^{gr}(t)= accwheret=⟨p_{1},p_{2},acc⟩∈TaccountOf^{gr}(d)= accwhered=⟨a_{1},a_{2},acc⟩∈DaccountOf^{gr}(c)= accwherec=⟨p,r,ag,acc⟩∈CeffectiveAccountOf:(It is defined similarly for artifacts and agents.)

effectiveAccountOf^{gr}(p)= accountOf^{gr}(p)⋃ _{i,j,k}accountOf^{gr}(u_{i,j,k}) whereu_{i,j,k}=⟨p,r_{i},a_{j},acc_{k}⟩ ∈U⋃ _{i,j,k}accountOf^{gr}(d_{i,j,k}) whered_{i,j,k}⟨a_{i},r_{j},p,acc_{k}⟩ ∈G⋃ _{i,j}accountOf^{gr}(t_{i,j}) wheret_{i,j}=⟨p,p_{i},acc_{j}⟩ ∈T⋃ _{i,j}accountOf^{gr}(t_{i,j}) wheret_{i,j}=⟨p_{i},p,acc_{j}⟩ ∈T⋃ _{i,j,k}accountOf^{gr}(c_{i,j,k}) wherec_{i,j,k}=⟨p,r_{i},ag_{j},acc_{k}⟩ ∈C- No topological restriction is placed on OPM graphs. For instance, ⟨
p,r_{1},a,∅⟩ ∈Uand ⟨a,r_{2},p,∅⟩ ∈Gare two acceptable edges of an OPM graph, which would create a circularity.If

gr_{1}=⟨A_{1},P_{1},AG_{1},U_{1},G_{1},T_{1},D_{1},C_{1},Ov_{1},Re_{1}⟩ andgr_{2}=⟨A_{2},P_{2},AG_{2},U_{2},G_{2},T_{2},D_{2},C_{2},Ov_{2},Re_{2}⟩, thengr_{1}∪gr_{2}=⟨A_{1}⊔A_{2},P_{1}⊔P_{2},AG_{1}⊔AG_{2},U_{1}∪U_{2},G_{1}∪G_{2},T_{1}∪T_{2},D_{1}∪D_{2},C_{1}∪C_{2},Ov_{1}∪Ov_{2},Re_{1}∪Re_{2}⟩, where the ⊔ operator is define as:A_{1}⊔A_{2}(x)=⟨v,a_{1}∪a_{2}⟩ withA_{1}(x)=⟨v,a_{1}⟩ andA_{2}(x)=⟨v,a_{2}⟩.If

gr_{1}=⟨A_{1},P_{1},AG_{1},U_{1},G_{1},T_{1},D_{1},C_{1},Ov_{1},Re_{1}⟩ andgr_{2}=⟨A_{2},P_{2},AG_{2},U_{2},G_{2},T_{2},D_{2},C_{2},Ov_{2},Re_{2}⟩, thengr_{1}∩gr_{2}=⟨A_{1}⊓A_{2},P_{1}⊓P_{2},AG_{1}⊓AG_{2},U_{1}∩U_{2},G_{1}∩G_{2},T_{1}∩T_{2},D_{1}∩D_{2},C_{1}∩C_{2},Ov_{1}∩Ov_{2},Re_{1}∩Re_{2}⟩, where the ⊓ operator is define as:A_{1}⊓A_{2}(x)=⟨v,a_{1}∩a_{2}⟩ withA_{1}(x)=⟨v,a_{1}⟩ andA_{2}(x)=⟨v,a_{2}⟩.If

gr_{1},gr_{2}∈OPMGraph, then

gr_{1}⋃gr_{2}∈OPMGraphand

gr_{1}⋂gr_{2}∈OPMGraph.- For an OPMGraph
gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩, for an account α,view(α,gr) is ⟨A_{α},P_{α},AG_{α},U_{α},G_{α},T_{α},D_{α},C_{α},Ov,Re⟩, where:

A_{α}⊆Awith A_{α}={ (a,acc)∈Asuch that α∈effectiveAccountOf^{gr}(a)}P_{α}⊆Pwith P_{α}={ (p,acc)∈Psuch that α∈effectiveAccountOf^{gr}(p)}AG_{α}⊆AGwith AG_{α}={ (ag,acc)∈AGsuch that α∈effectiveAccountOf^{gr}(ag)}U_{α}⊆Uwith U_{α}={ ⟨p,r,a,acc⟩ ∈Usuch that α∈acc}G_{α}⊆Gwith G_{α}={ ⟨a,r,p,acc⟩ ∈Gsuch that α∈acc}T_{α}⊆Twith T_{α}={ ⟨p_{1},p_{2},acc⟩ ∈Tsuch that α∈acc}D_{α}⊆Dwith D_{α}={ ⟨a_{1},a_{2},acc⟩ ∈Dsuch that α∈acc}C_{α}⊆Cwith C_{α}={ ⟨p,ag,acc⟩ ∈Csuch that α∈acc}- A legal account view
gr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩ is such that there is no cycle inU,G,T,Dand if ⟨a_{1},r_{1},p_{1},acc_{1}⟩∈Gand ⟨a_{1},r_{2},p_{2},acc_{1}⟩∈G, then ⟨a_{1},r_{1},p_{1},acc_{1}⟩=⟨a_{1},r_{2},p_{2},acc_{1}⟩, whereacc_{1}is a singleton.- Two accounts α
_{1},α_{2}are declared to be overlapping in an OPMgraphgr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩, if ⟨ α_{1},α_{2}⟩ ∈Ovor ⟨ α_{2},α_{1}⟩ ∈Ov.- Two accounts α
_{1},α_{2}are declared to be legally overlapping in an OPMgraph if they are overlapping and if their respective account views ⟨A_{1},P_{1},AG_{1},U_{1},G_{1},T_{1},D_{1},C_{1},Ov_{1},Re_{1}⟩ and ⟨A_{2},P_{2},AG_{2},U_{2},G_{2},T_{2},D_{2},C_{2},Ov_{2},Re_{2}⟩ are such thatHence, the overlapping relationship is reflexive, symmetric but

Domain(A_{1})⋂Domain(A_{2})≠∅orDomain(P_{1})⋂Domain(P_{2})≠∅orDomain(AG_{1})⋂Domain(AG_{2})≠∅.nottransitive.- An account α
_{1}is declared to refine account α_{2}in an OPMgraphgr=⟨A,P,AG,U,G,T,D,C,Ov,Re⟩, if ⟨ α_{1},α_{2}⟩ ∈Re.- An account α
_{1}is declared to be legally refining account α_{2}in an OPMgraph if they are overlapping and if their respective account viewsgr_{1}=⟨A_{1},P_{1},AG_{1},U_{1},G_{1},T_{1},D_{1},C_{1},Ov_{1},Re_{1}⟩ andgr_{2}=⟨A_{2},P_{2},AG_{2},U_{2},G_{2},T_{2},D_{2},C_{2},Ov_{2},Re_{2}⟩ are such that

source(gr_{2})⊆source(gr_{1})andsink(gr_{2})⊆sink(gr_{1})

** Concept is currently ill-defined. Definition remaining to be finalised. Can we define refinement just on syntactic properties of the graphs?** Hence, the refinement relationship is reflexive, asymmetric and transitive.

Copyright © 1999-2012 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.