Open Provenance Model Contents
- Introduction
- Basics
- Overlapping and Hierarchical Descriptions
- Provenance Graph Definition
- Timeless Formal Model
- Inferences
- Formal Model and Time Annotations
- Time Constraints and Inferences
- Support for Collections
- Example of Representation
- Conclusion
- Best Practice on the Use of Agents
- References
7 Formal Model and Time Annotations
The Open Provenance Model allows causality graphs to be annotated with time information. In this model, time is not intended to be used for deriving causality: if causal dependencies exist, they need to be made explicit with the appropriate edges. However, time may have been observed during the course of a process, and we would expect such time information to be compatible with causal dependencies: the time of an effect should be greater than the time of its cause (for the same clock). Hence, time is useful in validating causality claims.
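As an illustration of using time to validate (rather than derive) causality, the following minimal Python sketch flags explicit causal edges whose effect was observed before its cause. The function name, the occurrence labels and the dictionary of timestamps are hypothetical and not part of OPM. Timestamps are assumed to be plain numbers read from the same clock, occurrences without an observed time are skipped, and equal timestamps are tolerated on the assumption that the clock may be too coarse to separate them.

```python
from typing import Dict, Iterable, List, Tuple

# An explicit causal dependency between two occurrences, written as
# (effect, cause). This representation is a hypothetical simplification.
Edge = Tuple[str, str]


def find_time_violations(edges: Iterable[Edge],
                         times: Dict[str, float]) -> List[Edge]:
    """Return the edges whose effect was observed before its cause."""
    violations = []
    for effect, cause in edges:
        if effect not in times or cause not in times:
            continue  # no observation for one end: nothing to check
        if times[effect] < times[cause]:
            violations.append((effect, cause))
    return violations


# Illustrative occurrences: the creation of a2 is claimed to depend on the
# start of p1, and the use of a1 on the creation of a1.
edges = [("a2.created", "p1.started"), ("a1.used", "a1.created")]
times = {"a1.created": 10.0, "a1.used": 12.5,
         "p1.started": 12.0, "a2.created": 11.0}
violations = find_time_violations(edges, times)
print(violations)
```

Note that the check never adds an edge: it only reports causality claims that the observed times contradict.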
In the Open Provenance Model, time may be associated with instantaneous occurrences in a process. We currently recognize four instantaneous occurrences, which have a reasonable shared understanding in real life and computer systems. Two of them pertain to artifacts, whereas the other two relate to processes. For artifacts, we consider the occurrences of creation and use, whereas for processes, we consider their starting and ending.
The rationale for choosing instantaneous time for OPM is the same as for adopting artifacts as immutable pieces of state. At a specific time, an object we consider will be in a specific state, which we refer to as an artifact, and for which we can express the causality path that led to the object being in such a state.
In some scenarios, occurrences of use or creation of objects and occurrences of starting or ending of processes may not be instantaneous. To capture such scenarios, detailed processes and artifacts, and their respective causal dependencies, need to be made explicit in order to be expressible in OPM. For instance, the starting of a nuclear power plant is not usefully modelled as an instantaneous occurrence when one tries to understand failures that occurred during this activity; hence, this whole starting occurrence must be modelled by one or more processes, which in turn have instantaneous beginnings and endings.
In the Open Provenance Model, time information is expected to be obtained by observing a clock when an occurrence takes place. Given that time is observed, time accuracy is limited by the granularity of the clock and the granularity of the observer's activities. Hence, while the notion of time we consider is instantaneous, the model allows for an interval of accuracy to support the granularity of clocks and observers. In OPM, an instantaneous occurrence happening at time t is annotated by two observation times tm, tM, such that the occurrence is known to have occurred no later than tM and no earlier than tm. Hence, t ∈ [tm, tM].
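The observation interval can be sketched as a small Python value type. This is only an illustration: the class name `ObservedTime`, the `contains` method and the use of floating-point numbers for time values are assumptions, not part of the OPM definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedTime:
    """Observation interval [tm, tM] for one instantaneous occurrence."""
    tm: float  # the occurrence happened no earlier than tm
    tM: float  # the occurrence happened no later than tM

    def __post_init__(self) -> None:
        # An interval with tm > tM could not contain any time t.
        if self.tm > self.tM:
            raise ValueError("tm must not be later than tM")

    def contains(self, t: float) -> bool:
        """True if t is a possible time of the occurrence, i.e. t in [tm, tM]."""
        return self.tm <= t <= self.tM


# Hypothetical observation: a clock read just before and just after an occurrence.
obs = ObservedTime(tm=100.0, tM=100.5)
print(obs.contains(100.2), obs.contains(101.0))
```

A perfectly accurate observation corresponds to the degenerate interval where tm and tM are equal.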
Figure 13: Causality Graph Data Model and Time Annotations
Concretely, for an artifact, we will be able to state that it was used (or generated) no earlier than time t1 and no later than time t2. For a process, we will be able to state that it was started (or terminated) no earlier than time t1 and no later than time t2.
In Figure 13, we revisit our formal model, examining where time annotations are permitted. We first introduce a new primitive set Time, for which a given serialization will specify a format (such as the standard coordinated universal time, UTC). We then introduce Observed Time as a pair of time values (whose set is OTime). All time annotations are optional, which we note by OTime0 in the definitions.
Edges involve OTime in their cartesian product. Edges in WasGeneratedBy and Used can be annotated by an optional timestamp, marking when the associated artifact was known to be generated or used (expressed as an observation interval).
For WasControlledBy, we allow two optional timestamps marking when the process was known to be started or terminated, respectively.
For WasDerivedFrom, we also allow one optional timestamp. Given Figure 9 and associated inferences, for a given edge < a1,a2,acc > ∈ WasDerivedFrom, there is an implicit process that generated a1 and that consumed a2. The time annotation indicates when the artifact a1 was generated.
Likewise, for WasTriggeredBy, we also allow one optional timestamp. Given Figure 9 and associated inferences, for a given edge < p1,p2,acc > ∈ WasTriggeredBy, there is an implicit artifact that was used by p1 and generated by p2. The time annotation indicates the time when the artifact was used by p1.
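The optional annotations described above can be sketched as Python data types. This is a simplified illustration rather than the definitions of Figure 13: the field names are hypothetical, identifiers and accounts are plain strings, any edge components not discussed in this section are omitted, and Time is assumed to be a number read from a single clock (a real serialization would fix its own format, such as UTC).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Time = float               # assumption: a number read from a single clock
OTime = Tuple[Time, Time]  # observed time, as a pair (tm, tM)


@dataclass(frozen=True)
class Used:
    process: str
    artifact: str
    account: str
    time: Optional[OTime] = None   # when the artifact was known to be used


@dataclass(frozen=True)
class WasGeneratedBy:
    artifact: str
    process: str
    account: str
    time: Optional[OTime] = None   # when the artifact was known to be generated


@dataclass(frozen=True)
class WasControlledBy:
    process: str
    agent: str
    account: str
    start: Optional[OTime] = None  # when the process was known to be started
    end: Optional[OTime] = None    # when the process was known to be terminated


@dataclass(frozen=True)
class WasDerivedFrom:
    a1: str
    a2: str
    account: str
    time: Optional[OTime] = None   # when a1 was generated


@dataclass(frozen=True)
class WasTriggeredBy:
    p1: str
    p2: str
    account: str
    time: Optional[OTime] = None   # when the implicit artifact was used by p1


# Every annotation is optional: here only the start of p1 was observed.
controlled = WasControlledBy("p1", "ag1", "acc", start=(0.0, 0.5))
derived = WasDerivedFrom("a1", "a2", "acc")
print(controlled.start, controlled.end, derived.time)
```

Defaulting each annotation to `None` mirrors the statement that all time annotations are optional, so a graph without any time information remains valid.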