Open Provenance Model Contents
- Introduction
- Basics
- Overlapping and Hierarchical Descriptions
- Provenance Graph Definition
- Timeless Formal Model
- Inferences
- Formal Model and Time Annotations
- Time Constraints and Inferences
- Support for Collections
- Example of Representation
- Conclusion
- Best Practice on the Use of Agents
- References
9 Support for Collections
Collections represent groups of objects. Computer programs in general, and workflows in particular, usually offer primitives to manipulate such collections. It is therefore important that OPM offers the means to represent collections and their provenance. Specifically, it is crucial to be able to distinguish the provenance of collections from the provenance of the items contained in them.
Collections are represented by artifacts, and an OPM graph can express that a collection was used or was generated by a process. (Likewise, a summary edge can also express that a collection was derived from another.)
At any point in a computation, a collection consists of a group of member artifacts, which can be enumerated by means of a collection accessor, and individually used by processes. Symmetrically, a group of artifacts generated by processes can be grouped into a collection by means of a collection constructor.
Collection types are defined by means of collection accessors and constructors; such operations are expressed by OPM processes, and the algebraic properties of these operations define the properties of collections, e.g. ordered or unordered collections, bags or sets, indexable collections or not.
Over time, in order to promote interoperability, OPM needs to define accessors and constructors for common collections.
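The accessor and constructor processes described above can be sketched as follows. This is only an illustration, not part of the OPM specification: the `Edge` and `Graph` classes and the `record_accessor` and `record_constructor` helpers are hypothetical names, and the process and artifact identifiers are made up. The role names follow the convention stated in this section ('collection' for the collection itself, a path such as an index for each member).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    kind: str    # "used" or "wasGeneratedBy"
    effect: str  # node the edge starts from
    cause: str   # node the edge points to
    role: str    # role annotating the edge


@dataclass
class Graph:
    edges: list = field(default_factory=list)

    def used(self, process, artifact, role):
        self.edges.append(Edge("used", process, artifact, role))

    def was_generated_by(self, artifact, process, role):
        self.edges.append(Edge("wasGeneratedBy", artifact, process, role))


def record_accessor(graph, process, collection, members):
    """Accessor: the process uses the collection and generates each member."""
    graph.used(process, collection, "collection")
    for index, member in enumerate(members):
        graph.was_generated_by(member, process, str(index))


def record_constructor(graph, process, members, collection):
    """Constructor: the process uses each member and generates the collection."""
    for index, member in enumerate(members):
        graph.used(process, member, str(index))
    graph.was_generated_by(collection, process, "collection")


graph = Graph()
record_accessor(graph, "access", "A", ["a1", "a2", "a3"])
record_constructor(graph, "construct", ["b1", "b2", "b3"], "B")
for edge in graph.edges:
    print(edge)
```

Because the collection and its members are separate artifacts, the edges recorded for `A` can be queried independently of those recorded for `a1`, `a2`, `a3`, which is the distinction between collection provenance and member provenance that this section asks for.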
Figure 16 illustrates an example of a collection, whose provenance consists of two overlapping views (refinements). In the high-level view, the collection [b1, b2, b3, ...] is described as resulting from mapping a function f over a collection [a1, a2, a3, ...].
Figure 16: Provenance of a Collection
In the detailed view, the individual members of collection [b1, b2, b3, ...] were generated by application of process p to the members of collection [a1, a2, a3, ...].
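The two overlapping views of Figure 16 can be written down as one edge list in which every edge is tagged with the view it belongs to. The sketch below is an assumption-laden illustration rather than a transcription of the figure: the view labels ("high-level", "detailed"), the node names (`map_f`, `accessor`, `constructor`, `p#1`, ...), the use of one invocation of p per member, and the `in`/`out` roles are all hypothetical.

```python
def figure16_edges(n):
    """Return (view, kind, effect, cause, role) tuples for n members."""
    edges = [
        # High-level view: B results from mapping f over A.
        ("high-level", "used", "map_f", "A", "in"),
        ("high-level", "wasGeneratedBy", "B", "map_f", "out"),
        # Detailed view: A is taken apart and B is put together.
        ("detailed", "used", "accessor", "A", "collection"),
        ("detailed", "wasGeneratedBy", "B", "constructor", "collection"),
    ]
    for i in range(n):
        role = str(i)  # index of the member within its collection
        a, b, p = f"a{i + 1}", f"b{i + 1}", f"p#{i + 1}"
        edges += [
            ("detailed", "wasGeneratedBy", a, "accessor", role),
            ("detailed", "used", p, a, "in"),
            ("detailed", "wasGeneratedBy", b, p, "out"),
            ("detailed", "used", "constructor", b, role),
        ]
    return edges


def nodes_in(edges, view):
    """Collect every node mentioned by the edges of one view."""
    return {
        node
        for v, _kind, effect, cause, _role in edges
        if v == view
        for node in (effect, cause)
    }


edges = figure16_edges(3)
shared = nodes_in(edges, "high-level") & nodes_in(edges, "detailed")
print(sorted(shared))  # ['A', 'B']
```

The only nodes common to both views are the two collection artifacts, which is what makes the views overlap: they describe the same derivation of B from A at different levels of detail.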
The convention is that the role associated with each individual member of a collection is the path that gives access to that artifact within the collection. It could be a simple index (0, 1, ...) when the collection is an ordered list, or an XPath expression when the collection is an XML document. The 'collection' role marks the used edge of the accessor and the generated edge of the constructor. Algebraic definitions of constructors and accessors must also define the roles that are permitted.
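To make the "role as access path" convention concrete, the following sketch resolves a role against a collection. It is an illustration under stated assumptions: `access_member` is a hypothetical helper, the sample data is invented, and the XML case relies on Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET


def access_member(collection, role):
    """Return the member of `collection` designated by the path in `role`."""
    if isinstance(collection, (list, tuple)):
        # Ordered list: the role is a simple index such as "0" or "1".
        return collection[int(role)]
    if isinstance(collection, ET.Element):
        # XML document: the role is a path expression relative to the root.
        return collection.find(role)
    raise TypeError(f"unsupported collection type: {type(collection).__name__}")


ordered = ["a1", "a2", "a3"]
print(access_member(ordered, "1"))  # a2

doc = ET.fromstring("<items><item id='x'>a1</item><item id='y'>a2</item></items>")
member = access_member(doc, "./item[@id='y']")
print(member.text)  # a2
```

In both cases the role alone is enough to locate the member again, which is why the same role can annotate the generated edge of an accessor and the used edge of a constructor.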
Comments
Inclusion of collections seems to be overreaching; it can be handled at the annotation/serialization level. For example, I don't see why collections--things that are composed of other things--are any more essential than ideas like 'is the average of' or 'is a super class of'.
-- PatrickPaulson - 18 Aug 2008