Second Provenance Challenge Template
Participating Team
- Slides on the knowledge-oriented provenance environment, used during the Second Provenance Challenge workshop (June 26th 2007, Monterey Ca): KOPE-2ndProvenanceChallenge.ppt
OntoGrid Approach Overview
According to the myGrid provenance pyramid, we can structure provenance information as a pyramid with four main levels: Data, Organization, Process, and Knowledge. The first three levels have been widely addressed in the available literature and in the first edition of the Challenge. Thus, our focus is primarly on the Knowledge level. Knowledge provenance, settled on top of the provenance pyramid, is focused on the interpretation of the information registered by the other three.
This is the first participation of
OntoGrid in the Provenance Challenge. We aim at semantically interpreting the process documentation recorded by third party provenance systems by means of knowledge representation techniques from the field of Semantic Web that help users unacquainted with provenance systems to understand the results of a provenance query and therefore the execution of a distributed process.
Our provenance interpretation system is compliant with the provenance datamodel of the University of Southampton.
OntoGrid's distributed process documentation technology, accordingly to the S-OGSA architecture, the Business Process Monitor (BPM), records process documentation in a format compliant with such datamodel. We intend to use BPM as the infrastructure for documenting processes in distributed environments. BPM also provides a series of inspection methods which allow querying such process documentation. More information about BPM can be obtained in
http://www.insurancegrid.org/bpm/ (click on "About").
Our approach to analysing past processes through their provenance logs is based on
ProblemSolvingMethods (PSMs). A PSM is defined as a sequence of actions that accomplish a task in a specific domain. PSM have been traditionally used in Artificial Intelligence as a tool to model, establish, and control the sequence of actions required to solve domain-specific tasks by means of decomposing them into simpler subtasks down to the level of primitive actions.
Beyond execution of a reasoning process, PSM can also be used to analyze computational problem-solving behavior or reasoning in a service-oriented scenario. In this regard, PSM are high-level, domain-independent, knowledge templates that explain the underlying reasoning going on within a process execution. Our aim is to interpret provenance data (logs) at the knowledge level of the provenance pyramid, using the PSM analytic approach. Our provenance system detects occurrences of PSM amidst process documentation resulting from the execution of a particular task.
There are two different approaches towards using PSM for knowledge provenance. On the one hand, domain-independent PSM describing different reasoning processes, like those identified in
CommonKADS, e.g. validation, monitoring, or diagnostics, can be used to come up with a description, in terms of the particular domain of application, of the reasoning process which has taken place during execution. On the other hand, domain-level (instead of reasoning-level), process-oriented PSM can be used to specify the knowledge flow of distributed applications and validate their execution by means of interpreting their provenance logs against such specification.
In the case of the Challenge, we have followed the first approach in its simpler flavour: we use a single, domain-independent PSM category which describes a
Catalogation reasoning process to interpret, in terms of the population-based brain atlases domain, the execution of the Challenge workflow as recorded by preexisting process documentation infrastructure.
Provenance Data for Workflow Parts
Interpretation of provenance in a given domain requires a number of knowledge resources which, in the case of the Challenge, are the following:
- The OntoGrid PSM meta-model (psmontology.rdfs). It describes the basic entities of a PSM, and how they are related with each other. These key components can be found in the ProblemSolvingMethods section and are represented as follows:
- The catalogation PSM library (psmontologyModel.rdf) which provides a hierarchy of methods describing strategies to solve catalogation tasks. This PSM library is an instance of the OntoGrid PSM meta-model.
- The Brain Atlas ontology (BrainAtlasDomainOntology.rdfs, BrainAtlasDomainOntology.rdf). It describes the domain to be interpreted by the PSM library.
Explicit representation of domain and PSM meta-model as ontologies allows bridging domain and PSM entities, hence using domain-independent PSM to interpret domain-specific provenance. We have exploited the flexibility of the
content tag of the Southampton datamodel to include in the interacion p-assertions of the process documentation corresponding to the challenge workflow semantic annotations on the data exchanged by the services involved. These annotations allow us to relate domain entities, e.g.
brain atlas images, with the input and output roles of the PSM category, e.g.
initial observation, used to interpret the provenance log.
Roles serve two purposes, first they act as containers for domain concepts and, second, as pointers to the types of domain concepts that can play this role. For example,
brain image can be the input role of an
initial observation during the catalogation process. Roles can be either static or dynamic. Static roles contain concepts that are persistent across the process. Dynamic roles contain concepts that change during the reasoning process. Dynamic roles characterize the process because they are constantly manipulated by the PSM in which they occur.
The prime catalogue method of our PSM library decomposes the overall population-based brain atlas creation task into simpler subtasks, as shown in the next figure, where ellipses represent tasks and rectangles are methods that can solve those tasks. Tasks achievable by several different methods are linked to them with dashed lines.
At the top level, the brain atlas creation process is described as an overall catalogation task whose input role is
initial observation and its output role is
information representation. After applying the
prime catalogue method, a refinement of such task is produced, as shown in the next figure. This refinement shows the subtasks (ellipses) into which the method decomposes the current task and how they are connected by means of roles (squares). Subsequent refinements of each of these subtasks are provided by other PSM.
During this iterative refinement process, we match, for each level of the task-method decomposition, the data flow of the current PSM with the provenance DAG (i.e. the p-DAG, a directed acyclic graph whose nodes are data pieces exchanged by the services comprised in the distributed application and edges represent the precedence relation between them) by means of detecting paths (in fact, twigs) in the p-DAG between data sets which correspond to the data flow between the input and output roles of the PSM. The hierarchical structure of PSM i) guides the search process within the p-DAG and ii) allows producing interpretations of provenance at different levels of detail. As a result, the roles of the succeeding PSM are enriched with domain information extracted from the the p-DAG nodes against which they are successfully matched. The ouput of the process is an XML structure representing the PSM hierarchy and their enriched roles. This structure is later used to produce a graphic representation of the provenance interpretation.
- AlgorithmResult.xml: XML structure representing the PSM hierarchy and their enriched roles resulting from the provenance interpretation process using PSM.
The original process documentation, compliant with the Southampton datamodel, for the three parts of the original workflow can be found at
http://www.insurancegrid.org/bpm/. Select "Provenance Challenge13:48:13,496 ,1502" from the list of available Virtual Organizations , then select option "sniffer" on the left, and, in the "Actors" tab select the subprocess of the brain atlas workflow in which you are interested. Clicking on "Filter" will return the process documentation relevant to those actors. A file containing this provenance information can be retrieved by selecting "p-struct export".
The next are links to the semantically enhanced individual logs for each workflow stage:
- Stage1-2.xml: Part 1: align_warp and reslice (stages 1 and 2)
Model Integration Results
The provenance interpretation approach uses process documentation produced accordingly to the provenance datamodel of the University of Southampton. The provenance to be interpreted with our PMS libraries is obtained by means of provenance queries, using the pre-existing querying facilities.
Translation Details
The semantically-enhanced provenance log of the brain atlas workflow has been processed in order to produce an explicit representation of the p-DAG. This pre-processing eases the matching process between (in this case) the Catalogation PSM library and the Brain Atlas p-DAG, which is the core of the provenance interpretation based on PSM. The resulting structure simplifies creating XPath queries against the p-DAG XML, which relate PSM input and output role sets.
Benchmarks
Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system
Our approach to provenance interpretation is not focused on querying provenance but rather on interpreting, i.e. explaining, distributed process executions by the provenance information returned by pre-existing provenance query facilities. Thus, it is orthogonal to any system compliant with the University of Southampton datamodel. Thus, benhmarking can not appeal to its querying capabilities (as they are the same of the underlying provenance infrastructure) but rather to the quantity, i.e. does the system detect all the expected occurrences of the available PSM in the provenance logs?, and quality of the intepretations produced, i.e. do they contribute for a better understanding of process executions by non-provenance knowledgeable users?
Further Comments
Provide here further comments.
Conclusions
Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.
--
SimonMiles - 11 Dec 2006
--
JoseManuel - 28 May 2007
to top