Model Discussion: Basics
In Section 1, you say "annotated with further information pertaining to execution". This seems a bit out of the blue. Execution of what? Why solely of execution, given the (possible to read as) broader scope of the introductory paragraph and the explicit inclusion of non-digital artefacts just beforehand?
I don't see the difference between a stream and an object (Section 2). Isn't a stream just something which changes state (next readable input and sequence of incoming data) over time? Why would we want a stream to be an artefact?
Jim (from ModelDiscussionTime):
>
I would have no problem with us defining inference rules for common annotations from other schema based on our core model as a convenience, or generally allowing people to add non-causal annotations as additional metadata - that could be very helpful in mapping to workflow and digital library views of the world. But if we're adding them to the model itself, I think we need a good causality/provenance reason to do so.
It also makes sense to me, for both getting consensus and keeping a clean, widely applicable model, to have a light-weight view of provenance and then optional additions of common annotations. As has been discussed in ModelDiscussionTime, the inclusion of time in the core model is not (yet) justified and seems a ripe candidate for a separate section on common but optional "Time Annotations". Another possible candidate is roles: it seems they need to be either justified by requirements in relation to causality/provenance or made optional. Specifically, you state "Given that a process may have used several artefacts, it is important to identify the roles under which these artefacts were used." Important to meet what requirement? In all cases?
Shouldn't "alternate" be "alternative"? Alternate suggests you cannot have both at once, which, as in Figure 5, we do?
A minor point, but Constraint 2 is unnecessary as it stands: a provenance graph cannot be cyclic by definition (in Section 1). To aid discussion, perhaps it would be helpful to distinguish a "provenance graph" (which may not fulfil all constraints) from a "sound provenance graph", or something like that. We could also make explicit that the ability of the model to include unsound graphs allows for flexibility and detection and repair of errors when constructing a model.
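To make the suggested distinction concrete, here is a minimal sketch of what checking a graph for soundness might look like, assuming a provenance graph is held as a list of (effect, cause) pairs. The function name `is_sound` and the edge representation are hypothetical and not taken from the model document; the check covers only acyclicity, not the other constraints.

```python
from collections import defaultdict


def is_sound(edges):
    """Return True if the causal graph is acyclic.

    `edges` is an iterable of (effect, cause) pairs (a hypothetical
    representation; the model document does not prescribe one).
    """
    graph = defaultdict(list)
    nodes = set()
    for effect, cause in edges:
        graph[effect].append(cause)
        nodes.update((effect, cause))

    # Depth-first search with three colours: a GREY node reached again
    # means we have walked back onto the current path, i.e. a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for cause in graph[node]:
            if colour[cause] == GREY:
                return False
            if colour[cause] == WHITE and not visit(cause):
                return False
        colour[node] = BLACK
        return True

    return all(visit(node) for node in nodes if colour[node] == WHITE)


# An unsound graph can still be constructed and inspected; it is only
# flagged when checked, which is what allows detection and repair.
sound_graph = [("artifact2", "process1"), ("process1", "artifact1")]
unsound_graph = sound_graph + [("artifact1", "artifact2")]
```

Under this reading, any list of edges is a "provenance graph", and only those for which `is_sound` holds are "sound provenance graphs".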
"isCausedBy" in Section 6 is a different tense to every other relationship ("used", "generatedBy"...)
In Constraint 3, should the < between times be <= to allow for simultaneous causal relationships? For example, "I can see over the wall because I am over 7 foot tall"? Or "The processor's memory is in the state of being 90% full because the input data's size is 800 MB."
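The difference between the two readings can be sketched as follows, assuming Constraint 3 compares the time of a cause with the time of its effect. The function and the `allow_simultaneous` flag are hypothetical names used only for illustration.

```python
def satisfies_time_constraint(cause_time, effect_time, allow_simultaneous=False):
    """Check a single cause/effect pair against the time constraint.

    With allow_simultaneous=False the comparison is strict (<), as the
    question above reads the constraint; with True it is relaxed to <=.
    """
    if allow_simultaneous:
        return cause_time <= effect_time
    return cause_time < effect_time
```

With the strict form, the "over 7 foot tall" example (cause and effect holding at the same instant) would be rejected; with the relaxed form it would be accepted.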
Accounts are introduced a bit out of the blue in Section 7. What are they? Given their apparent intended function, why are they annotations to the graph rather than views over it (sub-graphs or subsets of annotations)? What requirement means that it is good that implementations do not have to associate accounts with processes, artifacts and agents (Section 7)?
Can annotations themselves capture some causal relationships, as in PASS's model?
--
SimonMiles - 17 Aug 2007
>
Accounts are introduced a bit out of the blue in Section 7. What are they? Given their apparent intended function, why are they annotations to the graph rather than views over it (sub-graphs or subsets of annotations)? What requirement means that it is good that implementations do not have to associate accounts with processes, artifacts and agents (Section 7)?
Accounts are a means to represent partial agreement between provenance descriptions of the same process, among other things to aid in assembling provenance descriptions from independently-produced parts. They're not necessarily different views of the same graph; they might represent conflicting accounts of the same process. The requirement that processes, artifacts, and agents are stateless is what I think makes it necessary to keep account information off of them; if two accounts disagree e.g., about the attributes of a file, then they're really talking about two different artifacts, in this model.
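As a rough illustration of this reading (a sketch only; the class, the helper names and the edge representation are all hypothetical, not part of the model), stateless artifacts can be modelled as immutable values and accounts as sets of edges. Two accounts that disagree about a file's attributes then refer to two distinct artifacts, and their partial agreement is simply the set of edges they share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """An immutable (stateless) artifact: its attributes are part of its identity."""
    name: str
    attributes: frozenset = frozenset()


def artifact(name, **attributes):
    # Hypothetical helper: build an Artifact from keyword attributes.
    return Artifact(name, frozenset(attributes.items()))


def shared_edges(account_a, account_b):
    """Edges on which two accounts agree."""
    return account_a & account_b


result = artifact("result")
file_v1 = artifact("input.dat", size_mb=800)
file_v2 = artifact("input.dat", size_mb=900)

# Each account is a set of (effect, relation, cause) triples.
account_1 = {(result, "generatedBy", "process-1"), ("process-1", "used", file_v1)}
account_2 = {(result, "generatedBy", "process-1"), ("process-1", "used", file_v2)}

agreed = shared_edges(account_1, account_2)
```

Here `file_v1` and `file_v2` compare unequal, so the two "used" edges do not coincide, and only the "generatedBy" edge is common to both accounts.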
--
JoeFutrelle - 21 Aug 2007
Joe:
>
Accounts are a means to represent partial agreement between provenance descriptions of the same process, among other things to aid in assembling provenance descriptions from independently-produced parts. They're not necessarily different views of the same graph; they might represent conflicting accounts of the same process.
OK, thanks. The latter statement seems to assume that a graph does not represent a conflicting account in itself. Is a single graph always non-conflicting? Or a single account? If the latter, is this lack of conflict automatically a property of it being an account or a constraint of it being a "sound account"? What does conflicting mean: subject to the constraints listed in the document, or something more domain-specific?
>
The requirement that processes, artifacts, and agents are stateless is what I think makes it necessary to keep account information off of them; if two accounts disagree e.g., about the attributes of a file, then they're really talking about two different artifacts, in this model.
OK. In the current document, you say "Artifacts are required to be an instance of an object's state because we aim to model the provenance of artifacts." I didn't find (but might have just missed) the rationale for the statement: presumably, that if an artifact/agent/process wasn't stateless it would be ambiguous as to what was caused or effected?
--
SimonMiles - 22 Aug 2007
One extra thought after the discussion today. In the same way that it may be good to have a common annotation of the "agent" to the processes it catalyses, it may be good to have a common annotation of an "object" to the artifacts that denote its states, e.g. the patient going into surgery with two organs is the same object as the patient coming out of surgery with one organ (otherwise you could interpret the surgery process as something replacing one patient with another).
--
SimonMiles - 22 Aug 2007