First OPM Workshop Minutes
Introduction
This is a summary of the discussion about the Open Provenance Model (OPM) from the first Open Provenance Model Workshop. The discussion broadly covered the following areas of the OPM: Agents, Streams, Inference Rules, Collections, Artifacts, Edges, Alternates and Time. The result of this discussion was a suggestion to provide a minimal update to the OPM (OPM 1.01), which could serve as the basis for developing serializations.
- The issues that can be addressed in OPM 1.01 are marked by .
- Issues that were controversial are denoted by .
- Actual changes introduced in 1.01.
In addition to these notes, there is also a chat log (see FirstOPMWorkshopChatLog).
Agents
At the beginning of the meeting, there was a general consensus that the use of Agents was verbose, since the same concept could be represented using other elements of the OPM. It was felt that the important part was the wasControlledBy edge, so it was suggested that the term agent could be dropped and that artifact (see Luc's slides, http://twiki.ipaw.info/pub/Challenge/OpenProvenanceModelWorkshop/Luc-Moreau-agent-meaning.pptx) or process, or perhaps both, could be used in its place. It was also suggested that an Agent might just be a subclass of process.
However, it was pointed out that if agent is dropped, the distinction between wasControlledBy and wasTriggeredBy becomes unclear and may need to be redefined.
Later in the meeting, Jim Myers argued for the need for agents especially in terms of modeling users and outside world entities (e.g. the NSF). It was felt that the agent provides some sort of context switch between these broad entities and digital processes.
Essentially, it was felt that there needs to be some mechanism for capturing the idea of a controlling hierarchy between entities, whether or not agents are used.
Streams
- OPM doesn't currently handle streams. It was felt that there needed to be some mechanism to represent provenance produced by streaming systems. A note was added to 1.01 to that effect.
- It was not clear how streams could be introduced. However, it was thought that maybe the same mechanism that was used to deal with agents or long running processes could be used.
Unclear Inference Rules
- Figure 9, Rule (3) may introduce unintuitive inferences, particularly when translating from processes to wasDerivedFrom relationships.
- It was suggested that this rule be dropped as it currently stands to remove these unintuitive inference possibilities. In OPM 1.01, definition 8 about wasDerivedFrom is made stronger, original rule (3) is dropped (with an explanation of why it is not suitable), and instead, another rule (3) is proposed, with a weaker inference. Such changes are reflected in Figure 7.
- Two consensus suggestions were given:
- Introduce a mayHaveBeenDerivedFrom relationship, to be produced by an inference rule similar to rule (3). Done.
- Say that all the outputs of a process wereDerivedFrom all the inputs to that process. This implies the need for collections.
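The second suggestion can be illustrated with a minimal sketch, assuming edges are stored as plain tuples. The function name `infer_may_have_been_derived_from` and the tuple layout are hypothetical and are not taken from the OPM document. The inference is deliberately weak: an output need not actually depend on every input, which is why the result is labelled "may have been" derived.

```python
def infer_may_have_been_derived_from(used, was_generated_by):
    """Pair every output of a process with every input of that process.

    used: iterable of (process, artifact) pairs, read "process used artifact".
    was_generated_by: iterable of (artifact, process) pairs.
    Returns a set of (output_artifact, input_artifact) pairs.
    """
    # Group the inputs by the process that used them.
    inputs_of = {}
    for process, artifact in used:
        inputs_of.setdefault(process, set()).add(artifact)

    # Every output of a process may have been derived from each of its inputs.
    derived = set()
    for artifact, process in was_generated_by:
        for source in inputs_of.get(process, ()):
            derived.add((artifact, source))
    return derived


# Hypothetical graph: process p1 used a1 and a2, and generated a3 and a4.
used = [("p1", "a1"), ("p1", "a2")]
was_generated_by = [("a3", "p1"), ("a4", "p1")]
edges = infer_may_have_been_derived_from(used, was_generated_by)
```

With two inputs and two outputs the sketch yields four inferred pairs, which hints at why a collection mechanism becomes attractive as the number of inputs and outputs grows.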
Collections
- There was agreement in the room that there needed to be a mechanism in OPM to deal with collections, in particular for linking elements of input lists to elements of output lists. There was debate about whether there should be extra syntax to achieve this. It was demonstrated that collections could be modeled explicitly in the current OPM syntax using the wasDerivedFrom edge and accounts. However, there were two concerns about this approach: scalability, in terms of the amount of documentation that would need to be produced for collections with many elements, and verbosity, in terms of the inability to easily understand the relationship between collections. To acknowledge the importance of collections, a whole section on collections was introduced in 1.01. Experience with the model needs to be obtained, and the scalability issue needs to be investigated.
- One suggestion was to introduce intentional properties through annotations to the graph. However, some felt that this would go against the notion that OPM represents an explicit account of provenance. It could be that rules may be used to explain how the refined view of figure 14 can be derived automatically from the higher level description.
- Another suggestion was to introduce a set of subclasses of process from which one can infer the wasDerivedFrom relationships using rules.
- The final agreed upon suggestion was to develop a set of patterns for translating from collections to the current representation of OPM in particular serializations. For example, one might use annotations and rules in OWL to represent relationships between input and output collections. Patterns would then express how this syntax could be translated into the larger but more explicit OPM model. OPM 1.01, section 9, now offers the broad framework for expressing collections. Specific accessor/constructor pairs and associated roles need to be defined.
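To make the idea of a translation pattern concrete, here is a minimal sketch of one possible pattern, assuming an element-wise relationship in which the i-th output element was derived from the i-th input element. The function name `expand_elementwise` and the triple layout are hypothetical; no such pattern was fixed at the workshop.

```python
def expand_elementwise(input_ids, output_ids):
    """Expand a collection-level derivation into explicit element-level edges.

    Assumes the i-th output element was derived from the i-th input element.
    Returns a list of (effect, edge_type, cause) triples.
    """
    if len(input_ids) != len(output_ids):
        raise ValueError("element-wise pattern needs lists of equal length")
    return [(out, "wasDerivedFrom", inp)
            for inp, out in zip(input_ids, output_ids)]


# Hypothetical collections of three elements each.
inputs = [f"in{i}" for i in range(3)]
outputs = [f"out{i}" for i in range(3)]
explicit_edges = expand_elementwise(inputs, outputs)
```

The compact description is a single statement about two collections, while the explicit form needs one edge per element. This is the scalability and verbosity trade-off raised in the discussion.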
Artifacts
- Many people were confused about the meaning of artifact ids. Thus, the revised OPM document should explicitly say that artifact ids are for the purpose of representing graphs and not for dereferencing the data.
- Given that artifact ids are there for the graph structure, it was felt that there needed to be a mechanism to retrieve actual data from an artifact either by reference or by value. In 1.01, a *Value* field is introduced as a placeholder for a reference to, or the value of, the actual application data/processes/agents that artifacts/processes/agents refer to.
- The suggested mechanism was annotations on both artifacts and processes to hook to "actual" data by reference or by value. The convention for representing these annotations was not made clear. In 1.01, it is now part of the model; serialisations will need to specify how this should be encoded. In 1.01, Figure 24 now provides an example of values for artifacts and processes.
- It was debated whether these extensibility points should be in OPM or left to serializations. The majority felt that these points needed to be in OPM. In 1.01, the concept of a placeholder for values/references is introduced; serialisations will express how this should be encoded.
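The distinction between an artifact id and its value can be sketched as follows. This is only an illustration under stated assumptions: the class name `Artifact`, its field names, and the example URL are hypothetical, and the actual encoding is left to each serialisation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    # Identifies the node within the provenance graph only;
    # it is not meant to be dereferenced to obtain data.
    id: str
    # Placeholder for the application data itself, or a reference to it.
    value: Optional[str] = None


# Hypothetical examples: data held by value, by reference, and not at all.
by_value = Artifact(id="a1", value="42")
by_reference = Artifact(id="a2", value="http://example.org/data/result.csv")
no_value = Artifact(id="a3")
```

Keeping the value optional reflects that a graph remains meaningful for structural purposes even when the underlying data is unavailable.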
Edges
- wasTriggeredBy may imply that the cause is both necessary and sufficient for the effect. The name may need to be changed to remove this implication. At a minimum, the document should clarify (e.g. state in bold) that the cause is necessary but may not be sufficient for the effect. A note was added to 1.01 to that effect, after Definition 7.
- The semantics of roles was unclear to, or misunderstood by, a number of people. It was made clear that roles have no semantics; they are just tags. It was felt that this should be clarified and made stronger in the OPM document. The remark on the meaning of roles (section 2.2) was expanded in 1.01.
- David Holland suggested introducing a new edge type to denote that artifacts are different versions of each other, in order to denote consistency across artifacts referring to the same data. It was suggested that, instead of introducing a new edge, a subclass of wasDerivedFrom be used to achieve this.
Long Running Processes
- There needs to be a way to represent long running processes with no definite end. This can be expressed using the time annotations in the current model as well as time annotations on links, essentially saying that a process caused a particular artifact at a different time.
- Thus, it was agreed that the OPM is a model about artifacts in the past, while processes can be currently running (e.g. processes documented in the OPM model document the past and the present but not the future). This should be clarified in the document. A note was added to section 2.1 of OPM 1.01.
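As a rough illustration of the points above, the sketch below models a process with no recorded end time and a time-annotated edge. All class, field and function names are hypothetical and are not part of the OPM; the check simply tests that an edge describes something that has already happened.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Process:
    id: str
    started: datetime
    ended: Optional[datetime] = None  # None means the process is still running


@dataclass
class WasGeneratedBy:
    artifact_id: str
    process_id: str
    time: datetime  # time annotation on the edge itself


def describes_past(edge: WasGeneratedBy, now: datetime) -> bool:
    """An edge may only document an artifact generated at or before 'now'."""
    return edge.time <= now


# Hypothetical long running process that has produced one artifact so far.
server = Process(id="p1", started=datetime(2008, 6, 1))
edge = WasGeneratedBy("a1", "p1", time=datetime(2008, 6, 10))
ok = describes_past(edge, now=datetime(2008, 6, 20))
```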
Alternates
- Currently, alternate accounts appear to be constrained to a symmetric relation. This may result in systems inferring that some accounts are alternates of each other when this was not intended by the creator of the account, in particular when alternate accounts are created by different users (i.e. asserters). This problem arises because the OPM does not support attribution. Furthermore, it is unclear from the current document whether the alternate relation is transitive and reflexive as well as being symmetric. In 1.01, we now have overlapping and refinement relationships. The former is reflexive, symmetric and non-transitive, whereas the latter is reflexive, asymmetric and transitive.
- It was agreed that the current definition of alternates is too weak and that more notions of alternates are needed: the current version, as well as a version of alternates that caters for hierarchical accounts. Thus, it was suggested that two notions of alternates be expressed in OPM:
- The current notion of alternates, called overlapping alternates. Done in 1.01.
- A new hierarchical/refinement alternate, which is a subclass of an overlapping alternate. However, it was unclear what constraints this alternate relationship requires: must refinement alternates share all of the same inputs and outputs, or just some of them? Should they require the same incoming and outgoing edges? The concept was introduced, but the definition is not final yet.
- It was proposed that for the next version of OPM alternates should only be defined in terms of commonality of nodes and not edges. Based on experience, a new notion of alternates that considers edges could be added later, if necessary. In 1.01, edges are not referred to in overlapping and refinement.
- Some felt that the term alternates was too strong and that a better term should be used. Alternate is replaced by overlapping.
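The stated properties of the overlapping relation can be checked with a small sketch. It assumes, for illustration only, that an account is represented as a set of node ids and that two accounts overlap when they share at least one node; the function name `overlaps` and the example accounts are hypothetical. Refinement is left out because its definition was not final.

```python
def overlaps(account_a, account_b):
    """Two accounts overlap if they have at least one node in common."""
    return bool(set(account_a) & set(account_b))


# Hypothetical accounts, each given as the set of node ids it contains.
a = {"n1", "n2"}
b = {"n2", "n3"}
c = {"n3", "n4"}

reflexive = overlaps(a, a)                    # a non-empty account overlaps itself
symmetric = overlaps(a, b) == overlaps(b, a)  # argument order does not matter
# Transitivity would require: a~b and b~c implies a~c. It fails here,
# because a and c share no node even though each overlaps with b.
transitive_here = not (overlaps(a, b) and overlaps(b, c)) or overlaps(a, c)
```

Defining the relation on nodes alone, as proposed for the next version, keeps the check to a simple set intersection with no need to compare edges.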
Time
- Some felt that the time section roughly duplicates existing standards documents and is redundant. In particular, David Holland will make comments identifying the duplicate sections and which standards might be useful to cite or use.
--
PaulGroth - 20 Jun 2008