Second Provenance Challenge Template
Participating Team
- Short team name: SDG
- Participant names: Karen Schuchardt, Tara Gibson, Eric Stephan
- Project URL: http://sdg.pnl.gov
- Reference to first challenge results (if participated): SDG
Differences from First Challenge
For the second challenge, we have modified our provenance repository: it is now an Alfresco content management system extended to support the URIQA protocol for getting, putting, and deleting RDF triples. Alfresco provides content management, but in our modified version, metadata and provenance are managed through a Sesame RDF store.
The extensible DASL search interface has been modified to take SPARQL queries.
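As an illustration, a client can interact with the repository over plain HTTP. The sketch below is minimal and hypothetical: the host name, resource URL, search path, and sdg namespace URI are placeholders, and it assumes the standard URIQA extension method MGET for retrieving a resource description and the WebDAV SEARCH method for submitting a SPARQL query to the DASL interface.

```python
# Minimal sketch of talking to the modified repository over HTTP.
# Host, paths, resource URI, and namespace are hypothetical placeholders.
import http.client

conn = http.client.HTTPConnection("sdg.pnl.gov")  # placeholder host

# URIQA MGET: retrieve the RDF description of a resource.
conn.request("MGET", "/provenance/workflow/1234")  # hypothetical resource URL
response = conn.getresponse()
print(response.status, response.read().decode())   # RDF/XML description

# DASL SEARCH carrying a SPARQL query, per the modified search interface.
sparql = """
PREFIX sdg: <http://sdg.pnl.gov/schema#>
SELECT ?actor WHERE { ?actor sdg:isPartOf <http://sdg.pnl.gov/workflow/1234> }
"""
conn.request("SEARCH", "/provenance/search",       # hypothetical search path
             body=sparql,
             headers={"Content-Type": "application/sparql-query"})
print(conn.getresponse().read().decode())
conn.close()
```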
Our data model is unchanged with the following exceptions:
- real namespaces are now used
- an rdf:type is now defined for each resource
Provenance Data for Workflow Parts
Data Model Description
We assign unique IDs to the resources we want to describe. Currently the IDs are simply generated URLs. RDF is then used to describe the resources, which for the provenance challenge include workflow instances, actor instances, data port instances, and parameter instances.
The workflow execution graph is captured primarily through the links isInput and hasOutput. The naming is somewhat inconsistent in form because the names reflect links that flow in a single direction. A naming convention such as hasInput and hasOutput would be clearer, but would make the graph more difficult to generate. Important link properties are shown in italics in the table.
The data model is listed in the table below (also shown graphically at http://twiki.grimoires.org/pub/Challenge/SDG2/provmodel.tif); a minimal example instance follows the table.
| Name | Description | Applies to |
| dc:title | a non-unique identifying name for a resource | all |
| dc:format | a type identifying the content type of a resource | all |
| rdf:type | ontological categorization | all |
| dc:creator | identifier for the person responsible for creating the workflow | workflow |
| sdg:created | date/time of creation | workflow |
| sdg:wasRunBy | identifier for the person who ran the workflow | workflow |
| sdg:owningInstitution | name of the organization responsible for the workflow | workflow |
| sdg:hasStatus | value of the workflow execution status | workflow |
| _sdg:instantiationOf_ | currently refers to the name of the class of object for which this resource is an instantiation; in a full system, this would be a link to the resource | workflow, actor |
| sdg:startedExecution | date/time at which execution started | workflow, actor |
| sdg:finishedExecution | date/time at which execution completed | workflow, actor |
| _sdg:hasParameter_ | link to a resource that fully describes a parameter instance | actor |
| _sdg:hasOutput_ | link to a resource that fully describes a data (port) output | actor |
| _sdg:isInput_ | link to the resource that receives this resource as input | parameter, data |
| _sdg:isPartOf_ | link to the workflow instance that the resource is associated with (grouping mechanism) | actor, data, parameter |
| _sdg:hasSource_ | link to workflow sources (actors with no inputs) | workflow |
| sdg:value | optional value of a data item | data, parameter |
| sdg:hasHashOfValue | optional hash of the value of a data item | data, parameter |
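To make the model concrete, the following is a minimal, hand-written sketch of how an actor instance and one of its outputs might be described. The resource URLs and namespace URI are hypothetical placeholders, not values from our actual store.

```xml
<!-- Hypothetical example; resource URLs and sdg namespace are placeholders. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sdg="http://sdg.pnl.gov/schema#">
  <rdf:Description rdf:about="http://sdg.pnl.gov/actor/align_warp_1">
    <dc:title>align_warp</dc:title>
    <rdf:type rdf:resource="http://sdg.pnl.gov/schema#Actor"/>
    <sdg:isPartOf rdf:resource="http://sdg.pnl.gov/workflow/1234"/>
    <sdg:startedExecution>2007-06-01T10:00:00</sdg:startedExecution>
    <sdg:finishedExecution>2007-06-01T10:00:05</sdg:finishedExecution>
    <sdg:hasOutput rdf:resource="http://sdg.pnl.gov/data/warp_1"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://sdg.pnl.gov/data/warp_1">
    <dc:title>warp_1</dc:title>
    <sdg:isInput rdf:resource="http://sdg.pnl.gov/actor/reslice_1"/>
    <sdg:hasHashOfValue>d41d8cd98f00b204e9800998ecf8427e</sdg:hasHashOfValue>
  </rdf:Description>
</rdf:RDF>
```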
Stage Data
Output of the three stages of our queries is provided in RDF/XML notation.
Primary Workflow Stage Data
Secondary Workflow Stage Data
After evaluating several other provenance models, we reformatted our own model; the new model and data for the first workflow are given below.
New Primary Workflow Stage Data
Model Integration Results
We translated and performed the queries over data from Mindswap and VisTrails.
VisTrails
Query1:
Mindswap
Query1:
Translation Details
The translation of VisTrails was performed primarily using XSLT: the VisTrails XML was mapped to RDF adhering to our schema. We extracted only a subset of the data represented; there was additional information, representing the workflow specification itself, that we did not import. After translating to RDF we also needed to infer object types and certain properties (such as title) to represent the workflow correctly in our result image. This information was inferred from rdf:type values and other values extracted from the XML. Several relationship properties, such as hasInput/hasOutput, were also inferred based on object type.
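The following is a small, hypothetical sketch of this kind of translation written in Python rather than XSLT. The element and attribute names are placeholders, not the real VisTrails schema; the actual mapping was done with stylesheets over the real VisTrails XML.

```python
# Hypothetical sketch of mapping workflow XML to RDF triples.
# Element/attribute names are placeholders, not the real VisTrails schema;
# the actual translation was done with XSLT.
import xml.etree.ElementTree as ET

SDG = "http://sdg.pnl.gov/schema#"   # assumed namespace
triples = []

tree = ET.parse("vistrails_export.xml")        # placeholder file name
for module in tree.iter("module"):             # placeholder element name
    uri = f"http://sdg.pnl.gov/actor/{module.get('id')}"
    # Infer rdf:type and dc:title from the extracted values.
    triples.append((uri, "rdf:type", SDG + "Actor"))
    triples.append((uri, "dc:title", module.get("name", "unknown")))

for conn in tree.iter("connection"):           # placeholder element name
    src = f"http://sdg.pnl.gov/actor/{conn.get('source')}"
    dst = f"http://sdg.pnl.gov/actor/{conn.get('dest')}"
    data = f"http://sdg.pnl.gov/data/{conn.get('id')}"
    # Infer hasOutput/isInput links from the connection's endpoints.
    triples.append((src, SDG + "hasOutput", data))
    triples.append((data, SDG + "isInput", dst))

for t in triples:
    print(t)
```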
The translation of Mindswap was performed primarily using OWL. The mapping between the two schemas was described in OWL, and the queries were performed over properties inferred from that mapping. We noticed that information seemed to be lacking about the initial files of the workflow (the anatomy headers and images); for this reason they could only be represented as URIs in the query result.
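A mapping of this kind can be expressed with OWL equivalence axioms. The sketch below is illustrative only; the mindswap property and class names, and both namespaces, are placeholders rather than the actual schemas.

```xml
<!-- Illustrative OWL mapping; the mindswap: names are placeholders. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="http://sdg.pnl.gov/schema#isInput">
    <owl:equivalentProperty
        rdf:resource="http://example.org/mindswap#inputTo"/>
  </owl:ObjectProperty>
  <owl:Class rdf:about="http://sdg.pnl.gov/schema#Actor">
    <owl:equivalentClass
        rdf:resource="http://example.org/mindswap#Process"/>
  </owl:Class>
</rdf:RDF>
```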
In both cases, when combining across data sources, we needed to assert which nodes in one graph corresponded to nodes in a different graph (for example, the reslice headers in our data were associated with the reslice headers in the Mindswap data).
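One standard way to state such a correspondence is an owl:sameAs assertion; the sketch below uses placeholder URIs for the two reslice-header resources.

```xml
<!-- Placeholder URIs; asserts that two graphs describe the same resource. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://sdg.pnl.gov/data/reslice_header_1">
    <owl:sameAs rdf:resource="http://example.org/mindswap/resliceHeader1"/>
  </rdf:Description>
</rdf:RDF>
```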
Benchmarks
Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system
Further Comments
Provide here further comments.
Conclusions
Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.
--
KarenSchuchardt - 22 Jun 2007