
Provenance Challenge


Open Provenance Model Contents
  1. Introduction
  2. Basics
  3. Overlapping and Hierarchical Descriptions
  4. Provenance Graph Definition
  5. Timeless Formal Model
  6. Inferences
  7. Formal Model and Time Annotations
  8. Time Constraints and Inferences
  9. Support for Collections
  10. Example of Representation
  11. Conclusion
  12. Best Practice on the Use of Agents
  13. References

9 Support for Collections

Collections represent groups of objects. Computer programs in general, and workflows in particular, usually offer primitives to manipulate such collections. It is therefore important that OPM offers the means to represent collections and their provenance. Specifically, it is crucial to be able to distinguish the provenance of collections from the provenance of the items contained in them.

Collections are represented by artifacts, so an OPM graph can express that a collection was used or generated by a process. (Likewise, a summary edge can express that one collection was derived from another.)

At any point in a computation, a collection consists of a group of member artifacts, which can be enumerated by means of a collection accessor, and individually used by processes. Symmetrically, a group of artifacts generated by processes can be grouped into a collection by means of a collection constructor.
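As a rough illustration of the paragraph above, the sketch below writes an accessor and a constructor as sets of OPM used and wasGeneratedBy edges. The class and function names are hypothetical and not part of OPM. The sketch assumes an ordered collection, with each member's role being its index and the collection itself carrying the role `collection`, and it assumes that members are recorded as generated by the accessor.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class Used:
    """used(process, role, artifact) edge."""
    process: str
    role: str
    artifact: str


@dataclass(frozen=True)
class WasGeneratedBy:
    """wasGeneratedBy(artifact, role, process) edge."""
    artifact: str
    role: str
    process: str


Edge = Union[Used, WasGeneratedBy]


def accessor_edges(process: str, collection: str, members: Sequence[str]) -> List[Edge]:
    """Accessor: uses the collection, yields each member under its index."""
    edges: List[Edge] = [Used(process, "collection", collection)]
    edges.extend(WasGeneratedBy(m, str(i), process) for i, m in enumerate(members))
    return edges


def constructor_edges(process: str, members: Sequence[str], collection: str) -> List[Edge]:
    """Constructor: uses each member under its index, yields the collection."""
    edges: List[Edge] = [Used(process, str(i), m) for i, m in enumerate(members)]
    edges.append(WasGeneratedBy(collection, "collection", process))
    return edges
```

The two functions are mirror images of each other, which reflects the symmetry between accessors and constructors.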

Collection types are defined by means of collection accessors and constructors; such operations are expressed by OPM processes, and the algebraic properties of these operations define the properties of collections: e.g. ordered or unordered collections, bags or sets, indexable collections or not.
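To make the role of algebraic properties concrete, here is a small sketch of three hypothetical constructors (list, bag, set) modelled with ordinary Python values. The function names are illustrative assumptions; OPM does not define them. The assertions at the end state the properties that distinguish the three collection types.

```python
from collections import Counter
from typing import FrozenSet, Sequence, Tuple


def construct_list(members: Sequence[str]) -> Tuple[str, ...]:
    """Ordered, indexable: both order and multiplicity matter."""
    return tuple(members)


def construct_bag(members: Sequence[str]) -> Counter:
    """Unordered: multiplicity matters, order does not."""
    return Counter(members)


def construct_set(members: Sequence[str]) -> FrozenSet[str]:
    """Unordered: neither order nor multiplicity matters."""
    return frozenset(members)


def access_list(collection: Tuple[str, ...], index: int) -> str:
    """Indexed accessor, meaningful only for the ordered collection."""
    return collection[index]


xs = ["a1", "a2", "a2"]
ys = ["a2", "a1", "a2"]  # same members as xs, different order

# Lists distinguish the two orderings; bags and sets do not.
assert construct_list(xs) != construct_list(ys)
assert construct_bag(xs) == construct_bag(ys)
assert construct_set(xs) == construct_set(ys)

# Bags keep duplicates, sets collapse them.
assert construct_bag(xs) != construct_bag(["a1", "a2"])
assert construct_set(xs) == construct_set(["a1", "a2"])

# Accessor/constructor law for lists: reading index i returns the i-th member.
assert all(access_list(construct_list(xs), i) == x for i, x in enumerate(xs))
```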

Over time, to promote interoperability, OPM needs to define accessors and constructors for common collection types.

Figure 16 illustrates an example of a collection whose provenance consists of two overlapping views (refinements). In the high-level view, the collection [b1, b2, b3, ...] is described as resulting from mapping a function f over a collection [a1, a2, a3, ...].

Figure 16: Provenance of a Collection

In the fine-grained view, the individual members of collection [b1, b2, b3, ...] were generated by application of process p to the members of collection [a1, a2, a3, ...]. The convention is that the role associated with each individual member of a collection is the path that allows us to access that artifact within the collection. It can be a simple index (0, 1, ...) when the collection is an ordered list, or an XPath expression when the collection is an XML document. The 'collection' role is used to mark the used edge in the accessor and the generated edge in the constructor. Algebraic definitions of constructors and accessors must also define the roles that are permitted.
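The two views of the mapping example can be sketched as edge lists, which also shows how the provenance of a member differs from the provenance of its collection. This is only an illustration: the process names (`map_f`, `accessor`, `constructor`, `p0`, `p1`, ...) and the `in`/`out` roles are hypothetical, and the traversal is naive reachability over used and wasGeneratedBy edges, not OPM's inference rules.

```python
# Edges are plain tuples:
#   ("used", process, role, artifact)
#   ("wasGeneratedBy", artifact, role, process)

def coarse_view(a_coll, b_coll):
    """High-level view: one process maps f over the whole collection."""
    return [
        ("used", "map_f", "in", a_coll),
        ("wasGeneratedBy", b_coll, "out", "map_f"),
    ]


def fine_view(a_coll, a_items, b_coll, b_items):
    """Refined view: accessor, one application of p per member, constructor."""
    edges = [("used", "accessor", "collection", a_coll)]
    for i, (a, b) in enumerate(zip(a_items, b_items)):
        p = f"p{i}"  # hypothetical name for the i-th application of p
        edges.append(("wasGeneratedBy", a, str(i), "accessor"))  # role = index path
        edges.append(("used", p, "in", a))
        edges.append(("wasGeneratedBy", b, "out", p))
        edges.append(("used", "constructor", str(i), b))  # role = index path
    edges.append(("wasGeneratedBy", b_coll, "collection", "constructor"))
    return edges


def upstream_artifacts(edges, artifact):
    """Artifacts reachable backwards through wasGeneratedBy then used edges."""
    generated_by, used_by = {}, {}
    for kind, first, _role, second in edges:
        if kind == "wasGeneratedBy":
            generated_by.setdefault(first, set()).add(second)  # artifact -> processes
        else:
            used_by.setdefault(first, set()).add(second)  # process -> artifacts
    seen, stack = set(), [artifact]
    while stack:
        current = stack.pop()
        for process in generated_by.get(current, ()):
            for source in used_by.get(process, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
    return seen
```

With `fine_view("A", ["a1", "a2"], "B", ["b1", "b2"])`, the upstream artifacts of `b1` include `a1` and `A` but not `a2`, whereas the upstream artifacts of `B` include every member of both collections. The coarse view can only say that `B` depends on `A`.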


Inclusion of collections seems to be overreaching; it can be handled at the annotation/serialization level. For example, I don't see why collections--things that are composed of other things--are any more essential than ideas like 'is the average of' or 'is a super class of'.

-- PatrickPaulson - 18 Aug 2008



Attachment: collection.jpg (233.0 K), 30 Jul 2008 - 18:46, PaulGroth



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.