Provenance Challenge: Tetherless World Constellation (RPI)

Participating Team

Team and Project Details

Introduction

For the Provenance Challenge, our team demonstrates a system known as ProtoProv.

This system is designed to perform the following tasks: (i) take in provenance metadata in either the OPM or the Proof Markup Language (PML) format and store it in an RDF-based format, also called ProtoProv, which is designed for easy conversion back to OPM or PML; (ii) facilitate the modeling and querying of the ProtoProv RDF data, using Jena and SPARQL respectively.

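The following equivalences hold between ProtoProv, OPM, and PML terms:

--------------------------------------------------------------------
| ProtoProv            | OPM             | PML                     |
====================================================================
| ProtoProv:Variable   | Artifact        | pmlj:NodeSet            |
| ProtoProv:Function   | Process         | pmlp:InferenceRule      |
| ProtoProv:Controller | Agent           | pmlp:InferenceEngine    |
| ProtoProv:Usd        | Used            | pmlj:hasAntecedentList  |
| ProtoProv:Wgb        | WasGeneratedBy  | pmlj:isConsequentOf     |
| ProtoProv:Wcb        | WasControlledBy | pmlj:hasInferenceEngine |
--------------------------------------------------------------------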

Where the following prefix mappings apply:

ProtoProv <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl>
pmlp <http://inference-web.org/2.0/pml-provenance.owl>
pmlj <http://inference-web.org/2.0/pml-justification.owl>

Workflow Representation

Syntax: Our workflow representation is obtained by running a modified version of Yogesh's Java-based workflow demonstrator. Specifically, we examined this code and added annotations for recording ProtoProv relations (as outlined above). Objects in the OPM graph are assigned IDs and values using the following notation:

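-----------------------------------------------------------------------------------------
| Object   | ID                                        | Value                          |
=========================================================================================
| Artifact | <Artifact Name>_<Instance Number>_<Scope> | <Scope>_<Datatype>_<Datavalue> |
| Process  | <Process Name>_<Instance Number>_<Scope>  | <Scope>_<Process Name>         |
| Agent    | <Agent Name>_<Instance Number>            | <Agent Name>                   |
-----------------------------------------------------------------------------------------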

Where <Instance Number> is derived from a counter of all instances of <X Name>, <Datatype> corresponds to a variable datatype (e.g., boolean), <Datavalue> corresponds to a variable's value (e.g., true), and <Scope> corresponds to the scope in which the object existed. The four defined scopes are as follows:

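-------------------------------------------------------
| Scope ID | Definition                               |
=======================================================
| main     | outside the workflow for loop            |
| ForIter1 | first iteration of the workflow for loop |
| ForIter2 | second iteration                         |
| ForIter3 | third iteration                          |
-------------------------------------------------------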
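
The ID notation above can be sketched in a few lines of Python. This is a minimal illustration, not the annotation code added to the workflow demonstrator: the class `IdMinter` and the function `artifact_value` are hypothetical names, and we assume the instance counter is kept per name and starts at 0 (which is consistent with IDs such as LoadCSVFileIntoTable_1_ForIter2 in the query results).

```python
from collections import defaultdict

# The four scopes defined in the table above.
SCOPES = ("main", "ForIter1", "ForIter2", "ForIter3")


class IdMinter:
    """Hypothetical helper that mints IDs in the notation described above."""

    def __init__(self):
        # One counter per name; assumed to start at 0.
        self._counters = defaultdict(int)

    def _next_instance(self, name):
        number = self._counters[name]
        self._counters[name] += 1
        return number

    def scoped_id(self, name, scope):
        """ID for an artifact or process: <Name>_<Instance Number>_<Scope>."""
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        return f"{name}_{self._next_instance(name)}_{scope}"

    def agent_id(self, name):
        """ID for an agent: <Agent Name>_<Instance Number> (agents carry no scope)."""
        return f"{name}_{self._next_instance(name)}"


def artifact_value(scope, datatype, datavalue):
    """Value for an artifact: <Scope>_<Datatype>_<Datavalue>."""
    return f"{scope}_{datatype}_{datavalue}"


minter = IdMinter()
first = minter.scoped_id("LoadCSVFileIntoTable", "ForIter1")
second = minter.scoped_id("LoadCSVFileIntoTable", "ForIter2")
print(first, second)
print(artifact_value("ForIter1", "boolean", "true"))
```

Keeping the counter per name rather than per scope means the instance number alone distinguishes repeated calls to the same process across loop iterations.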

Control Flow Representation: In cases where a control flow check must pass before a function can be reached (for instance, checking that IsCSVReadyFileExistsOutput evaluates to true before proceeding to the function ReadCSVReadyFile), we establish a ProtoProv:usd relation between the control flow variable and the following function. We do this for two reasons: (i) to eliminate the need for declaring additional ProtoProv:Function instances for the control checks, and (ii) to highlight the necessity of control flow checks to reach upcoming functions.

Unclear Variable Values: In a number of situations, it was unclear what <Datavalue> to assign certain artifacts. These situations (along with current assigned datavalues) are enumerated below:

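------------------------------------------------------------------------
| Artifact                    | Datavalue                              |
========================================================================
| DatabaseEntry FileName      | The name of the variable itself        |
| List<CSVFileEntry> FileName | The name of the variable itself        |
| CSVFileEntry FileName       | FileName.FilePath_FileName.TargetTable |
| Sample Detection Entry      | DBEntry_P2Detection_<detectID>         |
| Sample Image Entry          | DBEntry_P2Detection_<imageID>          |
------------------------------------------------------------------------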

Open Provenance Model Output

The OPM graph exported by our system can be found in the attachments to this page (opm.xml). The representation is based on the OPM v1.01 Specification and was generated through the OPM API (build 1.0-20080826.123926-3) available at http://openprovenance.org/. At present, neither Agent nor WasControlledBy instances are encoded in our OPM representation, due to a limitation of the OPM API that we are trying to work around.

Query Results

To answer the queries below, our system ran SPARQL queries against an RDF model of the ProtoProv data, built with the Jena Semantic Web Framework. For each query, we provide the SPARQL query used and a description of what it does.

Core Query 1
For this query, we created an additional ProtoProv:Variable node in our workflow to represent a detection entry. In the workflow, each table in the DatabaseEntry object was populated through the function LoadCSVFileIntoTable. Therefore, we assumed that a ProtoProv:wgb relationship could be made between the detection entry and this function.

Description:

  1. Get the ProtoProv:wgb instance (?wgb) which has the detection entry pc:DBEntryP2Detection_0_ForIter3 as its source
  2. Get the ProtoProv:Function instance (?fxn) corresponding to (?wgb) - this should be a LoadCSVFileIntoTable instance
  3. Find any ProtoProv:Variable instances (?var) which were used by the LoadCSVFileIntoTable instance
  4. Check the datatypes of these (?var), and filter out any which do not have the datatype CSVFileEntry

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?value
      WHERE {
         ?wgb ProtoProv:wgbSource pc:DBEntryP2Detection_0_ForIter3 .
         ?wgb ProtoProv:wgbTarget ?fxn .
         ?usd ProtoProv:usdSource ?fxn .
         ?usd ProtoProv:usdTarget ?var .
         ?var ProtoProv:hasType ?type .
         FILTER(?type = "CSVFileEntry")
         ?var ProtoProv:hasValue ?value
      }

Output: [./Data/J062941/P2_J062941_B001_P2fits0_20081115_P2Detection.csv-P2Detection]

Core Query 2
To answer this query, we simply check whether the process IsMatchTableColumnRanges was carried out on the CSVFileEntry in the first iteration of the workflow for loop. In our ProtoProv representation, this CSVFileEntry corresponds to ID pc:ReadCSVFileColumnNamesOutput_0_ForIter1.

Description:

  1. Get the ProtoProv:usd instance (?usd) which has the CSVFileEntry pc:ReadCSVFileColumnNamesOutput_0_ForIter1 as its target
  2. Get the ProtoProv:Function instance (?fxn) corresponding to (?usd) - this should be an IsMatchTableColumnRanges instance
  3. Check the value (name) of each (?fxn), and filter out any whose value is not IsMatchTableColumnRanges
  4. After the SPARQL execution completes, output YES if any result was returned.

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?usd
      WHERE {
         ?usd ProtoProv:usdTarget pc:ReadCSVFileColumnNamesOutput_0_ForIter1 .
         ?usd ProtoProv:usdSource ?fxn .
         ?fxn ProtoProv:hasValue ?val .
         FILTER(?val="IsMatchTableColumnRanges")
      }

Output: [YES]

Core Query 3
As with Core Query 1, we created an additional ProtoProv:Variable node in our workflow, this time to represent an image entry. In the workflow, each table in the DatabaseEntry object was populated through the function LoadCSVFileIntoTable. Therefore, we assumed that a ProtoProv:wgb relationship could be made between the image entry and this function.

Ultimately, we chose to handle this query through a combination of SPARQL querying and recursive function calls. Initially, the placeholder <r> in the query (see below) is assigned the image entry pc:DBEntryP2ImageMeta_0_ForIter2. From here, each query execution returns any non-control-flow (i.e., non-boolean) variables (?var) used by the function (?fxn) which generated <r>. In turn, the SPARQL query is re-executed for each (?var). This recursion proceeds until the SPARQL query returns no results.

Description:

  1. Get the ProtoProv:wgb instance (?wgb) which has the ProtoProv:Variable <r> as its source
  2. Get the ProtoProv:Function instance (?fxn) corresponding to (?wgb)
  3. Store the value (?value) of this (?fxn) for later reference
  4. Find any ProtoProv:Variable instances (?var) which were used by the (?fxn) instance
  5. Check the datatypes of these (?var), and filter out any which have the datatype boolean
  6. After the SPARQL execution completes, do two things with each returned query result:
    • If (?value) equals (ForEach), discard the entry. Else, put (?fxn) in the solution set.
    • Run the procedure again, substituting each (?var) for <r>

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?fxn ?value ?var
      WHERE {
         ?wgb ProtoProv:wgbSource <r> .
         ?wgb ProtoProv:wgbTarget ?fxn .
         ?fxn ProtoProv:hasValue ?value .
         ?usd ProtoProv:usdSource ?fxn .
         ?usd ProtoProv:usdTarget ?var .
         ?var ProtoProv:hasType ?type .
         FILTER(?type != "boolean") .
      }

Output: [LoadCSVFileIntoTable_1_ForIter2, CreateEmptyLoadDB_0_main, ReadCSVFileColumnNames_1_ForIter2, ReadCSVReadyFile_0_main]
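
The recursive procedure for Core Query 3 can be sketched without a triple store. This is an illustration only: our system re-executes the SPARQL query through Jena at each step, whereas here plain dictionaries stand in for the ProtoProv RDF data. The function `upstream_functions`, the toy graph, and all names in it (`img`, `F_load`, and so on) are hypothetical.

```python
def upstream_functions(r, generated_by, used, var_type, fxn_value, seen=None):
    """Collect the functions that contributed to variable r.

    generated_by: variable -> function that generated it (wgb)
    used:         function -> variables it used (usd)
    var_type:     variable -> datatype
    fxn_value:    function -> value (name)
    """
    seen = set() if seen is None else seen
    solution = []
    if r in seen:  # do not re-run the procedure for a variable already handled
        return solution
    seen.add(r)
    fxn = generated_by.get(r)
    if fxn is None:
        return solution
    for var in used.get(fxn, []):
        if var_type.get(var) == "boolean":
            continue  # control-flow variable, filtered out
        # One "query result" (fxn, value, var): keep fxn unless it is ForEach.
        if fxn_value[fxn] != "ForEach" and fxn not in solution:
            solution.append(fxn)
        # Run the procedure again with var substituted for r.
        for found in upstream_functions(var, generated_by, used,
                                        var_type, fxn_value, seen):
            if found not in solution:
                solution.append(found)
    return solution


# Hypothetical toy graph, loosely shaped like the challenge workflow.
generated_by = {"img": "F_load", "cols": "F_each", "entries": "F_read",
                "db": "F_create", "job": "F_assert", "flag": "F_check"}
used = {"F_load": ["cols", "db", "flag"], "F_each": ["entries"],
        "F_read": ["job"], "F_create": ["job"], "F_check": ["cols"]}
var_type = {"img": "DBEntry", "cols": "CSVFileEntry", "entries": "List",
            "db": "DatabaseEntry", "job": "String", "flag": "boolean"}
fxn_value = {"F_load": "LoadCSVFileIntoTable", "F_each": "ForEach",
             "F_read": "ReadCSVFileColumnNames", "F_create": "CreateEmptyLoadDB",
             "F_assert": "DirectAssertion", "F_check": "IsMatchTableColumnRanges"}

result = upstream_functions("img", generated_by, used, var_type, fxn_value)
print(result)
```

As in the procedure above, a function that used no non-boolean variables produces no query result and therefore never enters the solution set; in the toy graph this is why `F_assert` is absent from the result.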

Optional Query 8
For this query, we simply return a listing of processes which were recorded in the ProtoProv RDF data (and hence completed in the workflow).

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?fxn
      WHERE {
         ?fxn rdf:type ProtoProv:Function .
      }

Output: [DirectAssertion_0_main, ForEach_2_ForIter3, ReadCSVFileColumnNames_0_ForIter1, IsMatchTableRowCount_1_ForIter2, ForEach_0_ForIter1, IsMatchCSVFileColumnNames_2_ForIter3, LoadCSVFileIntoTable_0_ForIter1, ReadCSVFileColumnNames_2_ForIter3, IsMatchCSVFileColumnNames_0_ForIter1, IsExistsCSVFile_0_ForIter1, CompactDatabase_0_main, IsMatchTableColumnRanges_2_ForIter3, IsMatchCSVFileTables_0_main, IsMatchTableRowCount_0_ForIter1, IsCSVReadyFileExists_0_main, IsExistsCSVFile_2_ForIter3, LoadCSVFileIntoTable_2_ForIter3, IsExistsCSVFile_1_ForIter2, LoadCSVFileIntoTable_1_ForIter2, ReadCSVFileColumnNames_1_ForIter2, UpdateComputedColumns_1_ForIter2, IsMatchTableRowCount_2_ForIter3, IsMatchTableColumnRanges_0_ForIter1, UpdateComputedColumns_0_ForIter1, UpdateComputedColumns_2_ForIter3, IsMatchCSVFileColumnNames_1_ForIter2, IsMatchTableColumnRanges_1_ForIter2, ForEach_1_ForIter2, CreateEmptyLoadDB_0_main, ReadCSVReadyFile_0_main]

Optional Query 10
To solve this query, we search for ProtoProv:Variable instances (?var) generated through direct assertion by a user.

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?var
      WHERE {
         ?var rdf:type ProtoProv:Variable .
         ?wgb ProtoProv:wgbSource ?var .
         ?wgb ProtoProv:wgbTarget ?fxn .
         ?fxn ProtoProv:hasValue ?value .
         FILTER(?value = "DirectAssertion")   
      }

Output: [JobId_0_main, CSVRootPath_0_main]

Optional Query 11
Here, we search for functions (?fxn) which used variables (?var) which were in turn generated through direct assertion by a user.

SPARQL Query:

      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
      PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
      SELECT ?fxn
      WHERE {
         ?fxn rdf:type ProtoProv:Function .
         ?usd ProtoProv:usdSource ?fxn .
         ?usd ProtoProv:usdTarget ?var .
         ?wgb ProtoProv:wgbSource ?var .
         ?wgb ProtoProv:wgbTarget ?fxn2 .
         ?fxn2 ProtoProv:hasValue ?value .
         FILTER(?value = "DirectAssertion")
      }

Output: [CreateEmptyLoadDB_0_main, ReadCSVReadyFile_0_main, IsCSVReadyFileExists_0_main]

Query Results - Second Batch

Core Query 1

Description:

  1. Identify a function which generated an artifact PC3:provVarDbEntryP2Detection_0 (the database entry) (WGB).
  2. Identify any variables of type PC3OPM:CSVFileEntry that this function used (USD).
  3. Return the values attached to these variables (PC3OPM:hasValue).

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VALUE
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE { 
   ?WGB PC3OPM:wgbSource PC3:provVarDbEntryP2Detection_0 .
   ?WGB PC3OPM:wgbTarget ?FXN . 
   ?USD PC3OPM:usdSource ?FXN .
   ?USD PC3OPM:usdTarget ?VAR .
   ?VAR rdf:type PC3OPM:CSVFileEntry .
   ?VAR PC3OPM:hasValue ?VALUE
}

Output:

----------------------------------------------------------------------------------------------------------------------------
| VALUE                                                                                                                    |
============================================================================================================================
| "/Data/J062941//P2_J062941_B001_P2fits0_20081115_P2Detection.csv-P2Detection"^^<http://www.w3.org/2001/XMLSchema#string> |
----------------------------------------------------------------------------------------------------------------------------

Core Query 2

Description:

  1. Identify any functions which used the artifact PC3:ReadCSVFileColumnNamesOutput_2 (this is a CSVFileEntry corresponding to the table).
  2. Get the values of these processes, and only consider those with value (name) equal to "IsMatchTableColumnRanges".

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN 
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {          
   ?USD PC3OPM:usdTarget PC3:ReadCSVFileColumnNamesOutput_2 .
   ?USD PC3OPM:usdSource ?FXN . 
   ?FXN PC3OPM:hasValue ?VALUE 
   FILTER(?VALUE = "IsMatchTableColumnRanges") .
   ?WGB PC3OPM:wgbTarget ?FXN .
}

Output:

----------------------------------
| FXN                            |
==================================
| PC3:IsMatchTableColumnRanges_2 |
----------------------------------

Core Query 3

Note: this relies upon the Construct queries ConstructOpWTB and ConstructOpWTBForEach

Description:

  1. Identify a process which generated an artifact PC3:provVarDbEntryP2ImageMeta_0 (the image table entry) (WGB).
  2. List any data returning (as opposed to check returning or control flow) processes that directly or indirectly triggered the process above.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN1 ?FXN2
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM  <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
FROM  <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTBForEach.sparql>
WHERE {
   ?WGB PC3OPM:wgbSource PC3:provVarDbEntryP2ImageMeta_0 .
   ?WGB PC3OPM:wgbTarget ?FXN1 .
   ?FXN1 PC3OPM:opWasTriggeredBy ?FXN2 .
   ?FXN2 a PC3OPM:DataRetProc
}

Output:

-------------------------------------------------------------
| FXN1                       | FXN2                         |
=============================================================
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:LoadCSVFileIntoTable_1 | PC3:CreateEmptyLoadDB_0      |
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVReadyFile_0       |
-------------------------------------------------------------

Optional Query 1

Description:

  1. Identify functions with value (name) equal to "IsMatchTableColumnRanges", and which generate an artifact with value equal to "true".
  2. Since this is the last check done on each table, its completion indicates a table was both loaded and error free.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
WHERE { 
   ?WGB PC3OPM:wgbTarget ?FXN .
   ?FXN PC3OPM:hasValue ?VALUE2
   FILTER (?VALUE2 = "IsMatchTableColumnRanges") .
   ?WGB PC3OPM:wgbSource ?VAR .
   ?VAR PC3OPM:hasValue ?VALUE1
   FILTER (?VALUE1 = "true") .
}

Output:

----------------------------------
| FXN                            |
==================================
| PC3:IsMatchTableColumnRanges_1 |
| PC3:IsMatchTableColumnRanges_0 |
----------------------------------

Optional Query 3

Description:

  1. Log the time when the call to IsMatchCSVFileTables (PC3:IsMatchCSVFileTables_0) was completed.
  2. In turn, log the time when the second call to IsExistsCSVFile (PC3:IsExistsCSVFile_1) was completed (and failed).

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?Time_IsMatchCSVFileTables ?Time_IsExistsCSVFile 
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
WHERE { 
   ?WGB1 PC3OPM:wgbTarget PC3:IsMatchCSVFileTables_0 .
   ?WGB1 PC3OPM:hasTime ?TIME1 .
   ?TIME1 PC3OPM:stopTime ?Time_IsMatchCSVFileTables .        
   ?WGB2 PC3OPM:wgbTarget PC3:IsExistsCSVFile_1 .
   ?WGB2 PC3OPM:hasTime ?TIME2 .
   ?TIME2 PC3OPM:stopTime ?Time_IsExistsCSVFile .              
}

Output:

-----------------------------------------------------------------------------------------------------------------------
| Time_IsMatchCSVFileTables                                | Time_IsExistsCSVFile                                     |
=======================================================================================================================
| "1244069850697"^^<http://www.w3.org/2001/XMLSchema#long> | "1244069852894"^^<http://www.w3.org/2001/XMLSchema#long> |
-----------------------------------------------------------------------------------------------------------------------
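
The two stop times in the table above can be compared by simple subtraction. The sketch below also converts one of them to a calendar date under the assumption that the xsd:long values are Unix epoch timestamps in milliseconds; the unit is not stated in the data, so treat the conversion as illustrative. The variable and function names are hypothetical.

```python
from datetime import datetime, timezone

# Stop times copied from the output table above (xsd:long values).
time_is_match_csv_file_tables = 1244069850697
time_is_exists_csv_file = 1244069852894

# Difference between the two completed calls, in the same unit as the inputs.
elapsed = time_is_exists_csv_file - time_is_match_csv_file_tables


def to_utc(epoch_ms):
    """Read a value as Unix epoch milliseconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


print(f"elapsed: {elapsed}")
print(to_utc(time_is_match_csv_file_tables).isoformat())
```

Under the millisecond assumption, the difference corresponds to roughly 2.2 seconds between the two calls.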

Optional Query 5

Description:

  1. Fetch each instance of PC3OPM:EndState.
  2. Of these, filter out those that were triggered by the completion of the CompactDatabase function (which indicates a successful workflow completion).
  3. The remaining workflows (or jobs) are those which halted.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3ALLHalt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?ACCOUNT
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3ALLHalt.owl>
WHERE { 
   ?ENDSTATE rdf:type PC3OPM:EndState .
   ?WTB PC3OPM:wtbSource ?ENDSTATE .
   ?WTB PC3OPM:wtbTarget ?FXN .
   ?FXN PC3OPM:hasValue ?VALUE        
   FILTER (?VALUE != "CompactDatabase") .
   ?ENDSTATE PC3OPM:hasAccount ?ACCOUNT .
}

Output:

---------------
| ACCOUNT     |
===============
| PC3:J062943 |
---------------

Optional Query 6

Description:

  1. Identify any functions which generate artifacts used in the control flow checks (PC3OPM:ControlFlowArtifact) that evaluate to false.
  2. Since a workflow will halt on the first failed control flow check, only one such function should be found.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE { 
   ?WGB PC3OPM:wgbTarget ?FXN .
   ?WGB PC3OPM:wgbSource ?VAR .
   ?VAR PC3OPM:hasValue ?VALUE
   FILTER (?VALUE = "false") .
   ?VAR rdf:type PC3OPM:ControlFlowArtifact .
}

Output:

----------------------------------
| FXN                            |
==================================
| PC3:IsMatchTableColumnRanges_2 |
----------------------------------

Optional Query 8

Description:

  1. Return any processes in the data flow part of the workflow. Control flow checks can be disregarded for this.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE { 
   ?FXN rdf:type PC3OPM:DataFlowProc 
}

Output:

-----------------------------------
| FXN                             |
===================================
| PC3:IsMatchTableColumnRanges_0  |
| PC3:CreateEmptyLoadDB_0         |
| PC3:LoadCSVFileIntoTable_2      |
| PC3:IsExistsCSVFile_0           |
| PC3:IsMatchTableRowCount_1      |
| PC3:ReadCSVFileColumnNames_1    |
| PC3:ReadCSVFileColumnNames_0    |
| PC3:UpdateComputedColumns_2     |
| PC3:IsMatchTableRowCount_2      |
| PC3:IsMatchCSVFileTables_0      |
| PC3:IsMatchCSVFileColumnNames_0 |
| PC3:UpdateComputedColumns_0     |
| PC3:IsMatchCSVFileColumnNames_2 |
| PC3:IsExistsCSVFile_2           |
| PC3:IsMatchTableColumnRanges_1  |
| PC3:IsExistsCSVFile_1           |
| PC3:LoadCSVFileIntoTable_1      |
| PC3:LoadCSVFileIntoTable_0      |
| PC3:ReadCSVFileColumnNames_2    |
| PC3:IsMatchCSVFileColumnNames_1 |
| PC3:IsCSVReadyFileExists_0      |
| PC3:IsMatchTableRowCount_0      |
| PC3:ReadCSVReadyFile_0          |
| PC3:IsMatchTableColumnRanges_2  |
| PC3:UpdateComputedColumns_1     |
-----------------------------------

Optional Query 10

Description:

  1. Identify the process with value (name) equal to "DirectAssertion", which represents direct assertion by a user.
  2. Return any artifacts generated by this process.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VAR
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {
   ?FXN rdf:type PC3OPM:Process .
   ?FXN PC3OPM:hasValue ?VALUE
   FILTER (?VALUE = "DirectAssertion") .   
   ?WGB PC3OPM:wgbTarget ?FXN .
   ?WGB PC3OPM:wgbSource ?VAR .
}

Output:

---------------------
| VAR               |
=====================
| PC3:CSVRootPath_0 |
| PC3:JobId_0       |
---------------------

Optional Query 11

Description:

  1. Identify artifacts created by a user (indicated by the process “DirectAssertion”).
  2. In turn, identify processes which used these artifacts.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VAR ?FXN
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE { 
   ?FXN1 rdf:type PC3OPM:Process .
   ?FXN1 PC3OPM:hasValue ?VALUE
   FILTER (?VALUE = "DirectAssertion") .
   ?WGB PC3OPM:wgbSource ?VAR .
   ?WGB PC3OPM:wgbTarget ?FXN1 .
   ?USD PC3OPM:usdSource ?FXN .
   ?USD PC3OPM:usdTarget ?VAR
}

Output:

--------------------------------------------------
| VAR               | FXN                        |
==================================================
| PC3:CSVRootPath_0 | PC3:ReadCSVReadyFile_0     |
| PC3:CSVRootPath_0 | PC3:IsCSVReadyFileExists_0 |
| PC3:JobId_0       | PC3:CreateEmptyLoadDB_0    |
--------------------------------------------------

Optional Query 12

Description:

  1. Identify functions with value (name) equal to "IsMatchTableColumnRanges", and which generate an artifact with value equal to "true". Since this is the last check done on each table, its completion indicates a table was both loaded and error free.
  2. For these functions, identify variables of type CSVFileEntry that they used. The values of these variables correspond to the CSV file which was processed.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VALUE
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE { 
   ?WGB PC3OPM:wgbTarget ?FXN .
   ?FXN PC3OPM:hasValue ?VALUE2
   FILTER (?VALUE2 = "IsMatchTableColumnRanges") .
   ?WGB PC3OPM:wgbSource ?VAR1 .
   ?VAR1 PC3OPM:hasValue ?VALUE1
   FILTER (?VALUE1 = "true") .
   ?USD PC3OPM:usdSource ?FXN .
   ?USD PC3OPM:usdTarget ?VAR .
   ?VAR rdf:type PC3OPM:CSVFileEntry .
   ?VAR PC3OPM:hasValue ?VALUE .
}

Output:

--------------------------------------------------
| VAR               | FXN                        |
==================================================
| PC3:CSVRootPath_0 | PC3:ReadCSVReadyFile_0     |
| PC3:CSVRootPath_0 | PC3:IsCSVReadyFileExists_0 |
| PC3:JobId_0       | PC3:CreateEmptyLoadDB_0    |
--------------------------------------------------

Optional Query 13

Note: this relies upon the Construct queries ConstructOpWTB and ConstructOpWTBForEach

Description:

  1. Identify all processes in the data flow part of the workflow.
  2. List any data returning (as opposed to check returning or control flow) processes that directly or indirectly triggered the processes above.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?fxn1 ?fxn2
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM  <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
FROM  <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTBForEach.sparql>
WHERE { 
   ?fxn1 PC3OPM:opWasTriggeredBy ?fxn2 .
   ?fxn2 a PC3OPM:DataRetProc .
   ?fxn1 a PC3OPM:DataFlowProc
}

Output:

------------------------------------------------------------------
| fxn1                            | fxn2                         |
==================================================================
| PC3:IsMatchTableColumnRanges_1  | PC3:ReadCSVFileColumnNames_1 |
| PC3:CompactDatabase_0           | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsMatchTableRowCount_1      | PC3:ReadCSVFileColumnNames_1 |
| PC3:LoadCSVFileIntoTable_1      | PC3:ReadCSVFileColumnNames_1 |
| PC3:UpdateComputedColumns_1     | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsExistsCSVFile_1           | PC3:CreateEmptyLoadDB_0      |
| PC3:ReadCSVFileColumnNames_1    | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableRowCount_0      | PC3:CreateEmptyLoadDB_0      |
| PC3:ReadCSVFileColumnNames_0    | PC3:CreateEmptyLoadDB_0      |
| PC3:LoadCSVFileIntoTable_0      | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableColumnRanges_0  | PC3:CreateEmptyLoadDB_0      |
| PC3:IsExistsCSVFile_0           | PC3:CreateEmptyLoadDB_0      |
| PC3:UpdateComputedColumns_0     | PC3:CreateEmptyLoadDB_0      |
| PC3:CompactDatabase_0           | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:CreateEmptyLoadDB_0      |
| PC3:ReadCSVFileColumnNames_2    | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableRowCount_2      | PC3:CreateEmptyLoadDB_0      |
| PC3:UpdateComputedColumns_2     | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableRowCount_1      | PC3:CreateEmptyLoadDB_0      |
| PC3:LoadCSVFileIntoTable_1      | PC3:CreateEmptyLoadDB_0      |
| PC3:UpdateComputedColumns_1     | PC3:CreateEmptyLoadDB_0      |
| PC3:IsExistsCSVFile_2           | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableColumnRanges_1  | PC3:CreateEmptyLoadDB_0      |
| PC3:LoadCSVFileIntoTable_2      | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchTableColumnRanges_2  | PC3:CreateEmptyLoadDB_0      |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:CreateEmptyLoadDB_0      |
| PC3:IsExistsCSVFile_1           | PC3:ReadCSVReadyFile_0       |
| PC3:ReadCSVFileColumnNames_1    | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableRowCount_0      | PC3:ReadCSVReadyFile_0       |
| PC3:ReadCSVFileColumnNames_0    | PC3:ReadCSVReadyFile_0       |
| PC3:LoadCSVFileIntoTable_0      | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableColumnRanges_0  | PC3:ReadCSVReadyFile_0       |
| PC3:IsExistsCSVFile_0           | PC3:ReadCSVReadyFile_0       |
| PC3:UpdateComputedColumns_0     | PC3:ReadCSVReadyFile_0       |
| PC3:CompactDatabase_0           | PC3:ReadCSVReadyFile_0       |
| PC3:CreateEmptyLoadDB_0         | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:ReadCSVReadyFile_0       |
| PC3:ReadCSVFileColumnNames_2    | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableRowCount_2      | PC3:ReadCSVReadyFile_0       |
| PC3:LoadCSVFileIntoTable_1      | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableRowCount_1      | PC3:ReadCSVReadyFile_0       |
| PC3:UpdateComputedColumns_2     | PC3:ReadCSVReadyFile_0       |
| PC3:UpdateComputedColumns_1     | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchCSVFileTables_0      | PC3:ReadCSVReadyFile_0       |
| PC3:IsExistsCSVFile_2           | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableColumnRanges_1  | PC3:ReadCSVReadyFile_0       |
| PC3:LoadCSVFileIntoTable_2      | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchTableColumnRanges_2  | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:ReadCSVReadyFile_0       |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:CompactDatabase_0           | PC3:ReadCSVFileColumnNames_2 |
| PC3:LoadCSVFileIntoTable_2      | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableColumnRanges_2  | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableRowCount_2      | PC3:ReadCSVFileColumnNames_2 |
| PC3:UpdateComputedColumns_2     | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableRowCount_0      | PC3:ReadCSVFileColumnNames_0 |
| PC3:IsMatchTableColumnRanges_0  | PC3:ReadCSVFileColumnNames_0 |
| PC3:LoadCSVFileIntoTable_0      | PC3:ReadCSVFileColumnNames_0 |
| PC3:CompactDatabase_0           | PC3:ReadCSVFileColumnNames_0 |
| PC3:UpdateComputedColumns_0     | PC3:ReadCSVFileColumnNames_0 |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:ReadCSVFileColumnNames_0 |
------------------------------------------------------------------

Insert Transitive WasTriggeredBy Relation

Description: Find one of two patterns in the workflow data

  1. A function X was triggered by another function X2 (where the relation itself is an instance of the class PC3OPM:wasTriggeredBy)
  2. A function X used a variable Y, which was generated by another function X2
For each of these patterns, create a direct transitive relationship between X and X2 (called PC3OPM:opWasTriggeredBy)

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT { ?FXN PC3OPM:opWasTriggeredBy ?FXN2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
{ ?WTB PC3OPM:wtbSource ?FXN . ?WTB PC3OPM:wtbTarget ?FXN2 }
UNION
{ 
   ?USD PC3OPM:usdSource ?FXN . ?USD PC3OPM:usdTarget ?VAR .
   ?WGB PC3OPM:wgbSource ?VAR . ?WGB PC3OPM:wgbTarget ?FXN2
} 
}
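
The two patterns matched by this CONSTRUCT query can be mirrored over plain sets of pairs. This is a sketch for illustration, not the mechanism our system uses (the query is evaluated by a SPARQL engine); the function `construct_op_wtb` and the single-letter example names are hypothetical.

```python
def construct_op_wtb(wtb, usd, wgb):
    """Return the (fxn, fxn2) pairs the CONSTRUCT query would emit.

    wtb: set of (fxn, fxn2) pairs, explicit wasTriggeredBy (source, target)
    usd: set of (fxn, var) pairs, function used variable (source, target)
    wgb: set of (var, fxn2) pairs, variable was generated by function (source, target)
    """
    # Pattern 1: explicit wasTriggeredBy relations are copied over.
    edges = set(wtb)

    # Index generators by variable so pattern 2 is a simple lookup.
    generators = {}
    for var, fxn2 in wgb:
        generators.setdefault(var, set()).add(fxn2)

    # Pattern 2: fxn used var, and var was generated by fxn2.
    for fxn, var in usd:
        for fxn2 in generators.get(var, ()):
            edges.add((fxn, fxn2))
    return edges


# Hypothetical data: B was triggered by A; C used v1 (made by B); D used v2 (made by C).
wtb = {("B", "A")}
usd = {("C", "v1"), ("D", "v2")}
wgb = {("v1", "B"), ("v2", "C")}

edges = construct_op_wtb(wtb, usd, wgb)
print(sorted(edges))
```

Note that each pattern spans a single hop: in the example, no pair links D to B, because that would require chaining two of the derived pairs.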

Insert Transitive WasTriggeredBy - ForEach Relation

Description:

  1. Find a function X which was triggered by the process ForEach (PC3:ForEach_0)
  2. In turn, find a process X2 which triggered the ForEach process, and create a PC3OPM:opWasTriggeredBy relationship between X and X2.

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT { ?FXN1 PC3OPM:opWasTriggeredBy ?FXN2 }
FROM  <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM  <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
WHERE { 
   ?FXN1 PC3OPM:opWasTriggeredBy PC3:ForEach_0 .
   PC3:ForEach_0 PC3OPM:opWasTriggeredBy ?FXN2 .
}

Insert Transitive WasDerivedFrom Relation

Description: Find one of two patterns in the workflow data

  1. A variable X was derived from another variable X2 (where the relation itself is an instance of the class PC3OPM:wasDerivedFrom)
  2. A variable X was generated by a function Y, which used another variable X2
For each of these patterns, create a direct transitive relationship between X and X2 (called PC3OPM:opWasDerivedFrom)

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT 
{ ?VAR PC3OPM:opWasDerivedFrom ?VAR2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
{ ?WDF PC3OPM:wdfSource ?VAR . ?WDF PC3OPM:wdfTarget ?VAR2 }
UNION
{ 
   ?WGB PC3OPM:wgbSource ?VAR . ?WGB PC3OPM:wgbTarget ?FXN .
   ?USD PC3OPM:usdSource ?FXN . ?USD PC3OPM:usdTarget ?VAR2
} 
}

Suggested Workflow Variants

None Yet

Suggested Queries

None Yet

Suggestions for Modification of the Open Provenance Model

None Yet

Conclusions


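-----------------------------------------------------------------------------------------------------------------------------
| Attachment | Size   | Date                | Who            | Comment                                                      |
=============================================================================================================================
| opm.xml    | 39.1 K | 21 Apr 2009 - 21:05 | JamesMichaelis | Exported OPM Graph for TetherlessPC3                         |
| opm.pdf    | 33.5 K | 21 Apr 2009 - 21:05 | JamesMichaelis | Rendering of OPM Graph, using opm2dot                        |
| PML-A.owl  | 64.1 K | 21 Apr 2009 - 21:06 | JamesMichaelis | PML Proof for entire workflow                                |
| PML-B.owl  | 58.4 K | 21 Apr 2009 - 21:06 | JamesMichaelis | PML Proof for existence of database detection (Core Query 1) |
| PML-C.owl  | 39.0 K | 21 Apr 2009 - 21:07 | JamesMichaelis | PML Proof for existence of image entry (Core Query 3)        |
| PC3.owl    | 95.7 K | 21 Apr 2009 - 21:07 | JamesMichaelis | ProtoProv (pre-provenance) representation, in RDF            |
| OPMV2.xml  | 41.4 K | 10 Jun 2009 - 07:18 | JamesMichaelis |                                                              |
-----------------------------------------------------------------------------------------------------------------------------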


Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.