Second Provenance Challenge Template

Participating Team

Differences from First Challenge

For the second challenge, our provenance repository is an Alfresco content management system that we modified to support the URIQA protocol for getting, putting, and deleting RDF triples. Alfresco manages the content itself, but in our modified version, metadata and provenance are managed through a Sesame RDF store. The extensible DASL search interface has been modified to accept SPARQL queries.

Our data model is unchanged with the following exceptions:

Provenance Data for Workflow Parts

Data Model Description

We assign unique ids to the resources we want to describe. Currently the ids are simply generated URLs. RDF is then used to describe the resources, which for the provenance challenge include workflow instances, actor instances, data port instances, and parameter instances.
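As a rough illustration of the id scheme described above, the following sketch mints a URL for each resource and records a few describing triples. The base URL, the sdg: namespace URI, and the class names (WorkflowInstance, ActorInstance) are placeholders assumed for the example; the page itself only states that ids are generated URLs.

```python
import uuid

# Placeholder values: the actual base URL and sdg: namespace are not given on this page.
BASE = "http://example.org/provenance/"
SDG = "http://example.org/sdg#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def new_id(kind):
    """Mint a unique URL for a resource of the given kind."""
    return f"{BASE}{kind}/{uuid.uuid4()}"


def describe(triples, kind, title):
    """Mint an id, then record its type and title as (subject, predicate, object) triples."""
    rid = new_id(kind)
    triples.add((rid, RDF_TYPE, SDG + kind))  # class names are hypothetical
    triples.add((rid, DC_TITLE, title))
    return rid


triples = set()
workflow = describe(triples, "WorkflowInstance", "example workflow run")
actor = describe(triples, "ActorInstance", "reslice")
# Group the actor under its workflow instance, as sdg:isPartOf does in the table.
triples.add((actor, SDG + "isPartOf", workflow))
```

For brevity the sketch stores literals and URIs as plain strings; a real RDF store such as Sesame distinguishes the two.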

The workflow execution graph is captured primarily through the links isInput and hasOuput. The two names are inconsistent in form because both links point in a single direction, the direction of data flow. A naming convention such as hasInput and hasOutput would be clearer, but it would make the graph more difficult to generate. Important link properties are shown in italics in the table.
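Because both links point in the direction of execution, the lineage of a data item can be recovered by following them backwards. The sketch below is one possible way to do that over plain tuples; the resource names are hypothetical, and the property spellings follow the table on this page.

```python
from collections import defaultdict

# Hypothetical resources; sdg:isInput points data -> actor, sdg:hasOuput points actor -> data.
triples = [
    ("data:anatomy_image", "sdg:isInput", "actor:align"),
    ("data:anatomy_header", "sdg:isInput", "actor:align"),
    ("actor:align", "sdg:hasOuput", "data:warp_params"),
    ("data:warp_params", "sdg:isInput", "actor:reslice"),
    ("actor:reslice", "sdg:hasOuput", "data:resliced_image"),
    ("actor:reslice", "sdg:isPartOf", "workflow:run1"),  # not a flow link, ignored below
]

FLOW_LINKS = ("sdg:isInput", "sdg:hasOuput")


def upstream(triples, target):
    """Return every resource from which `target` is reachable along the flow links."""
    predecessors = defaultdict(set)
    for s, p, o in triples:
        if p in FLOW_LINKS:
            predecessors[o].add(s)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for prev in predecessors[node]:
            if prev not in seen:
                seen.add(prev)
                stack.append(prev)
    return seen


lineage = upstream(triples, "data:resliced_image")
```

Since every flow edge runs forward, a single reverse index is enough for lineage queries; with a hasInput/hasOutput convention the two link types would point in opposite directions and the traversal would need to treat them differently.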

The data model (also shown graphically in

| Name | Description | Applies to |
| dc:title | a non-unique identifying name for some resource | all |
| dc:format | a type identifying the content/type of a resource | all |
| rdf:type | ontological categorization | all |
| dc:creator | identifier for the person responsible for creating the workflow | workflow |
| sdg:creaed | date/time of creation | workflow |
| sdg:wasRunBy | identifier for the person who ran the workflow | workflow |
| sdg:owningInstitution | name of the organization responsible for the workflow | workflow |
| sdg:hasStatus | value of the workflow execution status | workflow |
| sdg:instantiationOf | Currently refers to the name of the class of object of which this resource is an instantiation. In a full system, this would be a link to the resource. | workflow, actor |
| sdg:startedExectuion | date/time at which execution started | workflow, actor |
| sdg:finishedExecution | date/time at which execution completed | workflow, actor |
| sdg:hasParameter | link to the resource that fully describes a parameter instance | actor |
| sdg:hasOuput | link to the resource that fully describes a data (port) output | actor |
| sdg:isInput | link to the resource that receives this resource as input | parameter, data |
| sdg:isPartOf | link to the workflow instance that the resource is associated with (grouping mechanism) | actor, data, parameter |
| sdg:hasSource | link to workflow sources (actors with no inputs) | workflow |
| sdg:value | optional value of the data item | data, parameter |
| sdg:hasHashOfValue | optional hash of the value of the data | data, parameter |

Stage Data

The output of the three stages of our queries is provided in RDF/XML notation.

Primary Workflow Stage Data

Secondary Workflow Stage Data

After evaluating several different provenance models, we reformatted our own model. Here are the new model and the data for the first workflow.

New Primary Workflow Stage Data

Model Integration Results

We translated data from Mindswap and VisTrails into our model and performed the queries over the translated data.





Translation Details

The translation of the VisTrails data was performed primarily with XSLT: the XML was mapped to RDF that adheres to our schema. We extracted only a subset of the data represented; additional information describing the workflow representation was not imported. After translating to RDF, we also needed to infer object types and certain properties (such as title) to represent the workflow correctly within our result image. This information was inferred from RDF types and other values extracted from the XML. Several relationship properties, such as the input/output links, were also inferred based on object type.
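The extract-then-infer pattern described above can be sketched as follows. This is not the XSLT that was actually used: it substitutes Python's ElementTree for XSLT, and the XML layout, element names, and class names are invented for illustration rather than taken from the real VisTrails schema.

```python
import xml.etree.ElementTree as ET

# Invented log format; the real VisTrails XML is not reproduced here.
SAMPLE = """
<log>
  <exec id="1" module="reslice">
    <output id="10"/>
  </exec>
  <exec id="2" module="average">
    <input id="10"/>
    <output id="11"/>
  </exec>
</log>
"""


def extract(xml_text):
    """Map execution records to triples, inferring types, titles, and flow links."""
    triples = set()
    for record in ET.fromstring(xml_text).iter("exec"):
        actor = "actor:" + record.get("id")
        # Inferred: each exec element is an actor, titled by its module name.
        triples.add((actor, "rdf:type", "sdg:ActorInstance"))  # hypothetical class
        triples.add((actor, "dc:title", record.get("module")))
        for port in record:
            data = "data:" + port.get("id")
            triples.add((data, "rdf:type", "sdg:DataInstance"))  # hypothetical class
            # Inferred from the element kind: which flow link to emit.
            if port.tag == "input":
                triples.add((data, "sdg:isInput", actor))
            elif port.tag == "output":
                triples.add((actor, "sdg:hasOuput", data))
    return triples


extracted = extract(SAMPLE)
```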

The translation of the Mindswap data was performed primarily using OWL. A mapping from one schema to the other was described, and the queries were performed over the properties inferred from that mapping. Information about the initial files of the workflow (the anatomy headers and images) seemed to be lacking, so these files could only be represented as URIs in the query result.
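A schema mapping of this kind can be approximated by materializing the triples that the mapping implies, so that queries written against our properties also match the foreign data. The sketch below stands in for what an OWL reasoner would derive from equivalence and inverse axioms; the ms: property names are hypothetical and do not come from the Mindswap schema.

```python
# Hypothetical foreign properties mapped to properties from our table.
# The boolean marks mappings where subject and object swap, as an inverse axiom would imply.
MAPPING = {
    "ms:hasInput": ("sdg:isInput", True),    # actor -> data becomes data -> actor
    "ms:hasOutput": ("sdg:hasOuput", False),
    "ms:label": ("dc:title", False),
}


def materialize(triples, mapping):
    """Return the input triples plus every triple implied by the property mapping."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in mapping:
            target, swap = mapping[p]
            inferred.add((o, target, s) if swap else (s, target, o))
    return inferred


foreign = {
    ("ms:reslice1", "ms:hasInput", "ms:warp1"),
    ("ms:reslice1", "ms:hasOutput", "ms:header1"),
    ("ms:reslice1", "ms:label", "reslice"),
}
combined = materialize(foreign, MAPPING)
```

The original triples are kept alongside the inferred ones, so nothing from the source data is lost by the translation.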

In both cases, when combining across data sources, we needed to assert which nodes in one graph corresponded to nodes in a different graph (for example, the reslice headers in our data were associated with the reslice headers in the Mindswap data).
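One way to apply such correspondence assertions is to treat them as equivalence pairs (in the spirit of owl:sameAs) and rewrite every node to a single representative before merging the graphs. The following is a minimal sketch under that assumption, using a small union-find; the node names are hypothetical.

```python
def canonicalizer(same_as_pairs):
    """Build a function mapping each node to one representative of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


def merge(graphs, same_as_pairs):
    """Union several triple sets, rewriting equivalent nodes to a shared id."""
    find = canonicalizer(same_as_pairs)
    return {(find(s), p, find(o)) for graph in graphs for s, p, o in graph}


ours = {("ours:reslice_header", "sdg:isInput", "ours:next_actor")}
theirs = {("ms:reslice", "sdg:hasOuput", "ms:reslice_header")}
asserted = [("ours:reslice_header", "ms:reslice_header")]
merged = merge([ours, theirs], asserted)
```

After merging, a lineage query that starts in one data set can cross into the other through the shared node.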


