Provenance Challenge: Tetherless World Constellation (RPI)
Participating Team
Team and Project Details
- Short team name: RPI/TWC
- Participant names: James Michaelis, Li Ding, Zhenning Shangguan, Rui Huang
- Project URL: http://tw.rpi.edu/wiki/TetherlessPC3
- Project Overview:
- Relevant Publications:
Introduction
For our work on the Provenance Challenge, our team will demonstrate a system known as ProtoProv.
This system is designed to perform the following tasks:
(i) Take in provenance metadata in either the OPM or Proof Markup Language (PML) format, and store it in an RDF-based format, also known as ProtoProv (a format designed for easy conversion back to OPM or PML).
(ii) Facilitate the modeling and querying of the ProtoProv RDF data, using Jena and SPARQL respectively.
The following equivalences can be observed between ProtoProv, OPM, and PML syntax:
ProtoProv | OPM | PML |
ProtoProv:Variable | Artifact | pmlj:NodeSet |
ProtoProv:Function | Process | pmlp:InferenceRule |
ProtoProv:Controller | Agent | pmlp:InferenceEngine |
ProtoProv:Usd | Used | pmlj:hasAntecedentList |
ProtoProv:Wgb | WasGeneratedBy | pmlj:isConsequentOf |
ProtoProv:Wcb | WasControlledBy | pmlj:hasInferenceEngine |
Where the following prefix mappings apply:
ProtoProv | <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl> |
pmlp | <http://inference-web.org/2.0/pml-provenance.owl> |
pmlj | <http://inference-web.org/2.0/pml-justification.owl> |
Workflow Representation
Syntax:
Our workflow representation is obtained by running a modified version of Yogesh's Java-based workflow demonstrator. Specifically, we examined this code and added special annotations for recording ProtoProv relations (as outlined above). Objects in the OPM graph are assigned the following ID notation:
Object | ID | Value |
Artifact | <Artifact Name>_<Instance Number>_<Scope> | <Scope>_<Datatype>_<Datavalue> |
Process | <Process Name>_<Instance Number>_<Scope> | <Scope>_<Process Name> |
Agent | <Agent Name>_<Instance Number> | <Agent Name> |
Where <Instance Number> is derived from a counter of all instances of a given name (Artifact Name, Process Name, or Agent Name), <Datatype> corresponds to a variable datatype (e.g., boolean), <Datavalue> corresponds to a variable's value (e.g., true), and <Scope> corresponds to the scope in which the object existed. The four defined scopes are as follows:
Scope ID | Definition |
main | outside the workflow for loop |
ForIter1 | first iteration of the workflow for loop |
ForIter2 | second iteration of the workflow for loop |
ForIter3 | third iteration of the workflow for loop |
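The ID notation above can be sketched in a few lines of Python. This is only an illustration, not the team's Java annotation code: the class name `IdAllocator` and the helper `artifact_value` are hypothetical, and the assumption that each name's counter starts at 0 is inferred from the IDs that appear in the query outputs on this page (e.g., LoadCSVFileIntoTable_0_ForIter1).

```python
from collections import defaultdict


class IdAllocator:
    """Hands out IDs of the form <Name>_<Instance Number>[_<Scope>]."""

    def __init__(self):
        # One counter per name; assumed to start at 0 (see lead-in).
        self._counters = defaultdict(int)

    def next_id(self, name, scope=None):
        instance = self._counters[name]
        self._counters[name] += 1
        parts = [name, str(instance)]
        if scope is not None:  # agent IDs carry no scope
            parts.append(scope)
        return "_".join(parts)


def artifact_value(scope, datatype, datavalue):
    """Builds an artifact value: <Scope>_<Datatype>_<Datavalue>."""
    return f"{scope}_{datatype}_{datavalue}"


alloc = IdAllocator()
ids = [alloc.next_id("LoadCSVFileIntoTable", s)
       for s in ("ForIter1", "ForIter2", "ForIter3")]
print(ids)
print(artifact_value("main", "boolean", "true"))
```

Keeping one counter per name, rather than one per scope, is what makes the instance number and the loop iteration advance together in IDs such as LoadCSVFileIntoTable_1_ForIter2.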
Control Flow Representation:
In cases where a control flow check is necessary to reach a function (for instance, checking that IsCSVReadyFileExistsOutput evaluates to true before proceeding to the function ReadCSVReadyFile), we establish a ProtoProv:usd relation between the control flow variable and the following function. We do this for two reasons: (i) to eliminate the need for declaring additional ProtoProv:Function instances for the control checks, and (ii) to highlight the necessity of control flow checks to reach upcoming functions.
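As a rough illustration of this convention, the sketch below records a control flow check as an ordinary usd relation. The `Usd` tuple and the `guard` helper are hypothetical names; the source/target orientation follows the usdSource (function) and usdTarget (variable) properties used in the queries on this page, and the ID IsCSVReadyFileExistsOutput_0_main is an assumed instance of the ID notation rather than one taken from the data.

```python
from typing import NamedTuple


class Usd(NamedTuple):
    """One usd relation: `source` (a function) used `target` (a variable)."""
    source: str
    target: str


def guard(function_id: str, control_variable_id: str) -> Usd:
    # The guarded function is recorded as *using* the boolean that gates it,
    # so no separate Function instance is needed for the check itself.
    return Usd(source=function_id, target=control_variable_id)


edge = guard("ReadCSVReadyFile_0_main", "IsCSVReadyFileExistsOutput_0_main")
print(edge)
```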
Unclear Variable Values:
In a number of situations, it was unclear what <Datavalue> to assign to certain artifacts. These situations (along with the currently assigned datavalues) are enumerated below:
Artifact | Datavalue |
DatabaseEntry FileName | The name of the variable itself |
List<CSVFileEntry> FileName | The name of the variable itself |
CSVFileEntry FileName | FileName.FilePath_FileName.TargetTable |
Sample Detection Entry | DBEntry_P2Detection_<detectID> |
Sample Image Entry | DBEntry_P2Detection_<imageID> |
Open Provenance Model Output
The OPM graph exported by our system can be found here. The representation is based on the OPM v1.01 Specification, and was generated through the OPM API (build 1.0-20080826.123926-3) available at http://openprovenance.org/. At present, neither Agent nor WasControlledBy instances are encoded in our OPM representation; this is due to a limitation of the OPM API that we are trying to work around.
Query Results
To answer the queries below, our system ran SPARQL queries against an RDF model of the ProtoProv data, built with the Jena Semantic Web Framework. For each query, we provide the SPARQL query used as well as a description of what it does.
Core Query 1
For this query, we created an additional ProtoProv:Variable node in our workflow to represent a detection entry. In the workflow, each table in the DatabaseEntry object was populated through the function LoadCSVFileIntoTable. Therefore, we assumed that a ProtoProv:wgb relationship could be made between the detection entry and this function.
Description:
- Get the ProtoProv:wgb instance (?wgb) which has the detection entry pc:DBEntryP2Detection_0_ForIter3 as its source
- Get the ProtoProv:Function instance (?fxn) corresponding to (?wgb) - this should be a LoadCSVFileIntoTable instance
- Find any ProtoProv:Variable instances (?var) which were used by the LoadCSVFileIntoTable instance
- Check the datatypes of these (?var), and filter out any which do not have the datatype CSVFileEntry
- Return the value (?value) of each remaining (?var)
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?value
WHERE {
?wgb ProtoProv:wgbSource pc:DBEntryP2Detection_0_ForIter3 .
?wgb ProtoProv:wgbTarget ?fxn .
?usd ProtoProv:usdSource ?fxn .
?usd ProtoProv:usdTarget ?var .
?var ProtoProv:hasType ?type .
FILTER(?type = "CSVFileEntry")
?var ProtoProv:hasValue ?value
}
Output:
[./Data/J062941/P2_J062941_B001_P2fits0_20081115_P2Detection.csv-P2Detection]
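The join performed by this query can be traced by hand with a small in-memory triple store. The sketch below is illustrative only: the triples, the relation node names (wgb1, usd1, usd2), and the variable names are made-up toy data, and the helpers `objects`, `subjects`, and `csv_inputs` are hypothetical. The actual system runs the SPARQL query above through Jena.

```python
# Toy triples (subject, predicate, object); all values are invented.
TRIPLES = {
    ("wgb1", "wgbSource", "DBEntryP2Detection_0_ForIter3"),
    ("wgb1", "wgbTarget", "LoadCSVFileIntoTable_2_ForIter3"),
    ("usd1", "usdSource", "LoadCSVFileIntoTable_2_ForIter3"),
    ("usd1", "usdTarget", "csvEntry"),
    ("csvEntry", "hasType", "CSVFileEntry"),
    ("csvEntry", "hasValue", "example.csv-P2Detection"),
    ("usd2", "usdSource", "LoadCSVFileIntoTable_2_ForIter3"),
    ("usd2", "usdTarget", "flag"),
    ("flag", "hasType", "boolean"),
    ("flag", "hasValue", "true"),
}


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def csv_inputs(entry):
    """Values of CSVFileEntry variables used by the function that generated `entry`."""
    values = []
    for wgb in subjects("wgbSource", entry):           # ?wgb wgbSource <entry>
        for fxn in objects(wgb, "wgbTarget"):          # ?wgb wgbTarget ?fxn
            for usd in subjects("usdSource", fxn):     # ?usd usdSource ?fxn
                for var in objects(usd, "usdTarget"):  # ?usd usdTarget ?var
                    if "CSVFileEntry" in objects(var, "hasType"):
                        values.extend(objects(var, "hasValue"))
    return sorted(values)


print(csv_inputs("DBEntryP2Detection_0_ForIter3"))
```

The boolean control flow input (flag) is reached by the join but dropped by the type check, which is the role the FILTER plays in the SPARQL version.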
Core Query 2
To answer this query, we simply check whether the process IsMatchTableColumnRanges was carried out on the CSVFileEntry in the first iteration of the workflow for loop. In our ProtoProv representation, this CSVFileEntry corresponds to the ID pc:ReadCSVFileColumnNamesOutput_0_ForIter1.
Description:
- Get the ProtoProv:usd instance (?usd) which has the CSVFileEntry pc:ReadCSVFileColumnNamesOutput_0_ForIter1 as its target
- Get the ProtoProv:Function instance (?fxn) corresponding to (?usd) - this should be an IsMatchTableColumnRanges instance
- Check the value (name) of these (?fxn), and filter out any which do not have the value IsMatchTableColumnRanges
- After SPARQL execution completes, output YES if any results were returned.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?usd
WHERE {
?usd ProtoProv:usdTarget pc:ReadCSVFileColumnNamesOutput_0_ForIter1 .
?usd ProtoProv:usdSource ?fxn .
?fxn ProtoProv:hasValue ?val .
FILTER(?val="IsMatchTableColumnRanges")
}
Output:
[YES]
Core Query 3
As with Core Query 1, we created an additional ProtoProv:Variable node in our workflow - this time to represent an image entry. In the workflow, each table in the DatabaseEntry object was populated through the function LoadCSVFileIntoTable. Therefore, we assumed that a ProtoProv:wgb relationship could be made between the image entry and this function.
Ultimately, we chose to handle this query through a combination of SPARQL querying and recursive function calls. Initially, the query assigns <r> (see below) the value of the image entry pc:DBEntryP2ImageMeta_0_ForIter2. From here, each query execution returns any non-control-flow (i.e., non-boolean) variables (?var) used by the function (?fxn) which generated <r>. In turn, the SPARQL query is re-executed for each (?var). This recursion proceeds until the SPARQL query returns no results.
Description:
- Get the ProtoProv:wgb instance (?wgb) which has the ProtoProv:Variable <r> as its source
- Get the ProtoProv:Function instance (?fxn) corresponding to (?wgb)
- Store the types (?value) of these (?fxn) for later reference
- Find any ProtoProv:Variable instances (?var) which were used by the (?fxn) instances
- Check the datatypes of these (?var), and filter out any which have the datatype boolean
- After the SPARQL execution completes, do two things with each returned query result:
  - If (?value) equals (ForEach), discard the entry. Else, put (?fxn) in the solution set.
  - Run the procedure again, substituting each (?var) for <r>
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?fxn ?value ?var
WHERE {
?wgb ProtoProv:wgbSource <r> .
?wgb ProtoProv:wgbTarget ?fxn .
?fxn ProtoProv:hasValue ?value .
?usd ProtoProv:usdSource ?fxn .
?usd ProtoProv:usdTarget ?var .
?var ProtoProv:hasType ?type .
FILTER(?type != "boolean") .
}
Output:
[LoadCSVFileIntoTable_1_ForIter2, CreateEmptyLoadDB_0_main, ReadCSVFileColumnNames_1_ForIter2, ReadCSVReadyFile_0_main]
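The recursive procedure described for this query can be sketched as follows. This is a minimal Python approximation, not the team's Jena code: the dictionaries stand in for the wgb, usd, hasValue, and hasType data, every ID in the toy graph is invented, and the `visited` set is an addition of this sketch to guard against cycles.

```python
def upstream_functions(start, generated_by, name_of, used_by):
    """Collect the functions that fed `start`, skipping ForEach steps."""
    solution, visited = [], set()

    def visit(var):
        if var in visited:  # cycle guard (not part of the described procedure)
            return
        visited.add(var)
        fxn = generated_by.get(var)
        if fxn is None:  # the query would return no rows: recursion stops
            return
        # Mirror FILTER(?type != "boolean"): ignore control flow inputs.
        inputs = [v for v, dtype in used_by.get(fxn, []) if dtype != "boolean"]
        # The query yields a row only when ?fxn has a non-boolean input.
        if inputs and name_of[fxn] != "ForEach" and fxn not in solution:
            solution.append(fxn)
        for v in inputs:
            visit(v)

    visit(start)
    return solution


# Toy graph; all IDs below are hypothetical.
generated_by = {"image": "Load_1", "cols": "ReadCols_1", "entry": "ForEach_1",
                "entries": "ReadReady_0", "root": "Assert_0"}
name_of = {"Load_1": "LoadCSVFileIntoTable", "ReadCols_1": "ReadCSVFileColumnNames",
           "ForEach_1": "ForEach", "ReadReady_0": "ReadCSVReadyFile",
           "Assert_0": "DirectAssertion"}
used_by = {"Load_1": [("cols", "CSVFileEntry"), ("ok", "boolean")],
           "ReadCols_1": [("entry", "CSVFileEntry")],
           "ForEach_1": [("entries", "List")],
           "ReadReady_0": [("root", "String"), ("ready", "boolean")]}

print(upstream_functions("image", generated_by, name_of, used_by))
```

In this toy graph the ForEach step is traversed but not reported, and the DirectAssertion step is never reported because it uses no variables, so the query returns no rows for it.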
Optional Query 8
For this query, we simply return a listing of processes which were recorded in the ProtoProv RDF data (and hence completed in the workflow).
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?fxn
WHERE {
?fxn rdf:type ProtoProv:Function .
}
Output:
[DirectAssertion_0_main, ForEach_2_ForIter3, ReadCSVFileColumnNames_0_ForIter1, IsMatchTableRowCount_1_ForIter2, ForEach_0_ForIter1, IsMatchCSVFileColumnNames_2_ForIter3, LoadCSVFileIntoTable_0_ForIter1, ReadCSVFileColumnNames_2_ForIter3, IsMatchCSVFileColumnNames_0_ForIter1, IsExistsCSVFile_0_ForIter1, CompactDatabase_0_main, IsMatchTableColumnRanges_2_ForIter3, IsMatchCSVFileTables_0_main, IsMatchTableRowCount_0_ForIter1, IsCSVReadyFileExists_0_main, IsExistsCSVFile_2_ForIter3, LoadCSVFileIntoTable_2_ForIter3, IsExistsCSVFile_1_ForIter2, LoadCSVFileIntoTable_1_ForIter2, ReadCSVFileColumnNames_1_ForIter2, UpdateComputedColumns_1_ForIter2, IsMatchTableRowCount_2_ForIter3, IsMatchTableColumnRanges_0_ForIter1, UpdateComputedColumns_0_ForIter1, UpdateComputedColumns_2_ForIter3, IsMatchCSVFileColumnNames_1_ForIter2, IsMatchTableColumnRanges_1_ForIter2, ForEach_1_ForIter2, CreateEmptyLoadDB_0_main, ReadCSVReadyFile_0_main]
Optional Query 10
To solve this query, we search for ProtoProv:Variable instances (?var) generated through direct assertion by a user.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?var
WHERE {
?var rdf:type ProtoProv:Variable .
?wgb ProtoProv:wgbSource ?var .
?wgb ProtoProv:wgbTarget ?fxn .
?fxn ProtoProv:hasValue ?value .
FILTER(?value = "DirectAssertion")
}
Output:
[JobId_0_main, CSVRootPath_0_main]
Optional Query 11
Here, we search for functions (?fxn) which used variables (?var) that were in turn generated through direct assertion by a user.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ProtoProv: <http://www.cs.rpi.edu/~michaj6/ProtoProv.owl#>
PREFIX pc: <http://www.cs.rpi.edu/~michaj6/PC3/PC3.owl#>
SELECT ?fxn
WHERE {
?fxn rdf:type ProtoProv:Function .
?usd ProtoProv:usdSource ?fxn .
?usd ProtoProv:usdTarget ?var .
?wgb ProtoProv:wgbSource ?var .
?wgb ProtoProv:wgbTarget ?fxn2 .
?fxn2 ProtoProv:hasValue ?value .
FILTER(?value = "DirectAssertion")
}
Output:
[CreateEmptyLoadDB_0_main, ReadCSVReadyFile_0_main, IsCSVReadyFileExists_0_main]
Query Results - Second Batch
Core Query 1
Description:
- Identify a function which generated the artifact PC3:provVarDbEntryP2Detection_0 (the database entry) (WGB).
- Identify any variables of type PC3OPM:CSVFileEntry that this function used (USD).
- Return the values attached to these variables (PC3OPM:hasValue).
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VALUE
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {
?WGB PC3OPM:wgbSource PC3:provVarDbEntryP2Detection_0 .
?WGB PC3OPM:wgbTarget ?FXN .
?USD PC3OPM:usdSource ?FXN .
?USD PC3OPM:usdTarget ?VAR .
?VAR rdf:type PC3OPM:CSVFileEntry .
?VAR PC3OPM:hasValue ?VALUE
}
Output:
----------------------------------------------------------------------------------------------------------------------------
| VALUE |
============================================================================================================================
| "/Data/J062941//P2_J062941_B001_P2fits0_20081115_P2Detection.csv-P2Detection"^^<http://www.w3.org/2001/XMLSchema#string> |
----------------------------------------------------------------------------------------------------------------------------
Core Query 2
Description:
- Identify any functions which used the artifact PC3:ReadCSVFileColumnNamesOutput_2 (this is a CSVFileEntry corresponding to the table).
- Get the values of these processes, and only consider those with value (name) equal to "IsMatchTableColumnRanges".
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {
?USD PC3OPM:usdTarget PC3:ReadCSVFileColumnNamesOutput_2 .
?USD PC3OPM:usdSource ?FXN .
?FXN PC3OPM:hasValue ?VALUE
FILTER(?VALUE = "IsMatchTableColumnRanges") .
?WGB PC3OPM:wgbTarget ?FXN .
}
Output:
----------------------------------
| FXN |
==================================
| PC3:IsMatchTableColumnRanges_2 |
----------------------------------
Core Query 3
Note: this relies upon the Construct queries ConstructOpWTB and ConstructOpWTBForEach.
Description:
- Identify a process which generated the artifact PC3:provVarDbEntryP2ImageMeta_0 (the image table entry) (WGB).
- List any data returning (as opposed to check returning or control flow) processes that directly or indirectly triggered the process above.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN1 ?FXN2
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTBForEach.sparql>
WHERE {
?WGB PC3OPM:wgbSource PC3:provVarDbEntryP2ImageMeta_0 .
?WGB PC3OPM:wgbTarget ?FXN1 .
?FXN1 PC3OPM:opWasTriggeredBy ?FXN2 .
?FXN2 a PC3OPM:DataRetProc
}
Output:
-------------------------------------------------------------
| FXN1 | FXN2 |
=============================================================
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:LoadCSVFileIntoTable_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVReadyFile_0 |
-------------------------------------------------------------
Optional Query 1
Description:
- Identify functions with value (name) equal to "IsMatchTableColumnRanges", and which generate an artifact with value equal to “true”.
- Since this is the last check done on each table, its completion indicates a table was both loaded and error free.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
WHERE {
?WGB PC3OPM:wgbTarget ?FXN .
?FXN PC3OPM:hasValue ?VALUE2
FILTER (?VALUE2 = "IsMatchTableColumnRanges") .
?WGB PC3OPM:wgbSource ?VAR .
?VAR PC3OPM:hasValue ?VALUE1
FILTER (?VALUE1 = "true") .
}
Output:
----------------------------------
| FXN |
==================================
| PC3:IsMatchTableColumnRanges_1 |
| PC3:IsMatchTableColumnRanges_0 |
----------------------------------
Optional Query 3
Description:
- Log the time when the call to IsMatchCSVFileTables (PC3:IsMatchCSVFileTables_0) was completed.
- In turn, log the time when the second call to IsExistsCSVFile (PC3:IsExistsCSVFile_1) was completed (and failed).
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?Time_IsMatchCSVFileTables ?Time_IsExistsCSVFile
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
WHERE {
?WGB1 PC3OPM:wgbTarget PC3:IsMatchCSVFileTables_0 .
?WGB1 PC3OPM:hasTime ?TIME1 .
?TIME1 PC3OPM:stopTime ?Time_IsMatchCSVFileTables .
?WGB2 PC3OPM:wgbTarget PC3:IsExistsCSVFile_1 .
?WGB2 PC3OPM:hasTime ?TIME2 .
?TIME2 PC3OPM:stopTime ?Time_IsExistsCSVFile .
}
Output:
-----------------------------------------------------------------------------------------------------------------------
| Time_IsMatchCSVFileTables | Time_IsExistsCSVFile |
=======================================================================================================================
| "1244069850697"^^<http://www.w3.org/2001/XMLSchema#long> | "1244069852894"^^<http://www.w3.org/2001/XMLSchema#long> |
-----------------------------------------------------------------------------------------------------------------------
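The two stop times in the output can be turned into an elapsed interval with a single subtraction. The short sketch below assumes the xsd:long values are milliseconds since the Unix epoch (as produced, for example, by Java's System.currentTimeMillis()); the page itself does not state the unit, and the constant names are chosen here for illustration.

```python
# Stop times copied from the query output above (unit assumed: milliseconds).
T_IS_MATCH_CSV_FILE_TABLES = 1244069850697
T_IS_EXISTS_CSV_FILE = 1244069852894

elapsed_ms = T_IS_EXISTS_CSV_FILE - T_IS_MATCH_CSV_FILE_TABLES
print(elapsed_ms)         # 2197
print(elapsed_ms / 1000)  # 2.197
```

Under that assumption, roughly 2.2 seconds passed between the completion of IsMatchCSVFileTables and the completion of the second IsExistsCSVFile call.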
Optional Query 5
Description:
- Fetch each instance of PC3OPM:EndState.
- Of these, filter out those that were triggered by the completion of the CompactDatabase function (which indicates a successful workflow completion).
- The remaining workflows (or jobs) are those which halted.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3ALLHalt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?ACCOUNT
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3ALLHalt.owl>
WHERE {
?ENDSTATE rdf:type PC3OPM:EndState .
?WTB PC3OPM:wtbSource ?ENDSTATE .
?WTB PC3OPM:wtbTarget ?FXN .
?FXN PC3OPM:hasValue ?VALUE
FILTER (?VALUE != "CompactDatabase") .
?ENDSTATE PC3OPM:hasAccount ?ACCOUNT .
}
Output:
---------------
| ACCOUNT |
===============
| PC3:J062943 |
---------------
Optional Query 6
Description:
- Identify any functions which generate artifacts used in the control flow checks (PC3OPM:ControlFlowArtifact) that evaluate to false.
- Since a workflow will halt on the first failed control flow check, only one such function should be found.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE {
?WGB PC3OPM:wgbTarget ?FXN .
?WGB PC3OPM:wgbSource ?VAR .
?VAR PC3OPM:hasValue ?VALUE
FILTER (?VALUE = "false") .
?VAR rdf:type PC3OPM:ControlFlowArtifact .
}
Output:
----------------------------------
| FXN |
==================================
| PC3:IsMatchTableColumnRanges_2 |
----------------------------------
Optional Query 8
Description:
- Return any processes in the data flow part of the workflow. Control flow checks can be disregarded for this.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE {
?FXN rdf:type PC3OPM:DataFlowProc
}
Output:
-----------------------------------
| FXN |
===================================
| PC3:IsMatchTableColumnRanges_0 |
| PC3:CreateEmptyLoadDB_0 |
| PC3:LoadCSVFileIntoTable_2 |
| PC3:IsExistsCSVFile_0 |
| PC3:IsMatchTableRowCount_1 |
| PC3:ReadCSVFileColumnNames_1 |
| PC3:ReadCSVFileColumnNames_0 |
| PC3:UpdateComputedColumns_2 |
| PC3:IsMatchTableRowCount_2 |
| PC3:IsMatchCSVFileTables_0 |
| PC3:IsMatchCSVFileColumnNames_0 |
| PC3:UpdateComputedColumns_0 |
| PC3:IsMatchCSVFileColumnNames_2 |
| PC3:IsExistsCSVFile_2 |
| PC3:IsMatchTableColumnRanges_1 |
| PC3:IsExistsCSVFile_1 |
| PC3:LoadCSVFileIntoTable_1 |
| PC3:LoadCSVFileIntoTable_0 |
| PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchCSVFileColumnNames_1 |
| PC3:IsCSVReadyFileExists_0 |
| PC3:IsMatchTableRowCount_0 |
| PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableColumnRanges_2 |
| PC3:UpdateComputedColumns_1 |
-----------------------------------
Optional Query 10
Description:
- Identify artifacts generated by the process “DirectAssertion”, which indicates they were asserted directly by a user.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VAR
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {
?FXN rdf:type PC3OPM:Process .
?FXN PC3OPM:hasValue ?VALUE
FILTER (?VALUE = "DirectAssertion") .
?WGB PC3OPM:wgbTarget ?FXN .
?WGB PC3OPM:wgbSource ?VAR .
}
Output:
---------------------
| VAR |
=====================
| PC3:CSVRootPath_0 |
| PC3:JobId_0 |
---------------------
Optional Query 11
Description:
- Identify artifacts created by a user (indicated by the process “DirectAssertion”).
- In turn, identify processes which used these artifacts.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VAR ?FXN
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
WHERE {
?FXN1 rdf:type PC3OPM:Process .
?FXN1 PC3OPM:hasValue ?VALUE
FILTER (?VALUE = "DirectAssertion") .
?WGB PC3OPM:wgbSource ?VAR .
?WGB PC3OPM:wgbTarget ?FXN1 .
?USD PC3OPM:usdSource ?FXN .
?USD PC3OPM:usdTarget ?VAR
}
Output:
--------------------------------------------------
| VAR | FXN |
==================================================
| PC3:CSVRootPath_0 | PC3:ReadCSVReadyFile_0 |
| PC3:CSVRootPath_0 | PC3:IsCSVReadyFileExists_0 |
| PC3:JobId_0 | PC3:CreateEmptyLoadDB_0 |
--------------------------------------------------
Optional Query 12
Description:
- Identify functions with value (name) equal to "IsMatchTableColumnRanges", and which generate an artifact with value equal to “true”. Since this is the last check done on each table, its completion indicates a table was both loaded and error free.
- For these functions, identify variables of type CSVFileEntry that they used. The values of these variables correspond to the CSV File which was processed.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?VALUE
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3Halt.owl>
WHERE {
?WGB PC3OPM:wgbTarget ?FXN .
?FXN PC3OPM:hasValue ?VALUE2
FILTER (?VALUE2 = "IsMatchTableColumnRanges") .
?WGB PC3OPM:wgbSource ?VAR1 .
?VAR1 PC3OPM:hasValue ?VALUE1
FILTER (?VALUE1 = "true") .
?USD PC3OPM:usdSource ?FXN .
?USD PC3OPM:usdTarget ?VAR .
?VAR rdf:type PC3OPM:CSVFileEntry .
?VAR PC3OPM:hasValue ?VALUE .
}
Output:
--------------------------------------------------
| VAR | FXN |
==================================================
| PC3:CSVRootPath_0 | PC3:ReadCSVReadyFile_0 |
| PC3:CSVRootPath_0 | PC3:IsCSVReadyFileExists_0 |
| PC3:JobId_0 | PC3:CreateEmptyLoadDB_0 |
--------------------------------------------------
Optional Query 13
Note: this relies upon the Construct queries ConstructOpWTB and ConstructOpWTBForEach.
Description:
- Identify all processes in the data flow part of the workflow.
- List any data returning (as opposed to check returning or control flow) processes that directly or indirectly triggered the processes above.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?fxn1 ?fxn2
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTBForEach.sparql>
WHERE {
?fxn1 PC3OPM:opWasTriggeredBy ?fxn2 .
?fxn2 a PC3OPM:DataRetProc .
?fxn1 a PC3OPM:DataFlowProc
}
Output:
------------------------------------------------------------------
| fxn1 | fxn2 |
==================================================================
| PC3:IsMatchTableColumnRanges_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:CompactDatabase_0 | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsMatchTableRowCount_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:UpdateComputedColumns_1 | PC3:ReadCSVFileColumnNames_1 |
| PC3:IsExistsCSVFile_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:ReadCSVFileColumnNames_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableRowCount_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:ReadCSVFileColumnNames_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:LoadCSVFileIntoTable_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableColumnRanges_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsExistsCSVFile_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:UpdateComputedColumns_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:CompactDatabase_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:CreateEmptyLoadDB_0 |
| PC3:ReadCSVFileColumnNames_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableRowCount_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:UpdateComputedColumns_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableRowCount_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:LoadCSVFileIntoTable_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:UpdateComputedColumns_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsExistsCSVFile_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableColumnRanges_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:LoadCSVFileIntoTable_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchTableColumnRanges_2 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:CreateEmptyLoadDB_0 |
| PC3:IsExistsCSVFile_1 | PC3:ReadCSVReadyFile_0 |
| PC3:ReadCSVFileColumnNames_1 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableRowCount_0 | PC3:ReadCSVReadyFile_0 |
| PC3:ReadCSVFileColumnNames_0 | PC3:ReadCSVReadyFile_0 |
| PC3:LoadCSVFileIntoTable_0 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableColumnRanges_0 | PC3:ReadCSVReadyFile_0 |
| PC3:IsExistsCSVFile_0 | PC3:ReadCSVReadyFile_0 |
| PC3:UpdateComputedColumns_0 | PC3:ReadCSVReadyFile_0 |
| PC3:CompactDatabase_0 | PC3:ReadCSVReadyFile_0 |
| PC3:CreateEmptyLoadDB_0 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:ReadCSVReadyFile_0 |
| PC3:ReadCSVFileColumnNames_2 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableRowCount_2 | PC3:ReadCSVReadyFile_0 |
| PC3:LoadCSVFileIntoTable_1 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableRowCount_1 | PC3:ReadCSVReadyFile_0 |
| PC3:UpdateComputedColumns_2 | PC3:ReadCSVReadyFile_0 |
| PC3:UpdateComputedColumns_1 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchCSVFileTables_0 | PC3:ReadCSVReadyFile_0 |
| PC3:IsExistsCSVFile_2 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableColumnRanges_1 | PC3:ReadCSVReadyFile_0 |
| PC3:LoadCSVFileIntoTable_2 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchTableColumnRanges_2 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchCSVFileColumnNames_1 | PC3:ReadCSVReadyFile_0 |
| PC3:IsMatchCSVFileColumnNames_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:CompactDatabase_0 | PC3:ReadCSVFileColumnNames_2 |
| PC3:LoadCSVFileIntoTable_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableColumnRanges_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableRowCount_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:UpdateComputedColumns_2 | PC3:ReadCSVFileColumnNames_2 |
| PC3:IsMatchTableRowCount_0 | PC3:ReadCSVFileColumnNames_0 |
| PC3:IsMatchTableColumnRanges_0 | PC3:ReadCSVFileColumnNames_0 |
| PC3:LoadCSVFileIntoTable_0 | PC3:ReadCSVFileColumnNames_0 |
| PC3:CompactDatabase_0 | PC3:ReadCSVFileColumnNames_0 |
| PC3:UpdateComputedColumns_0 | PC3:ReadCSVFileColumnNames_0 |
| PC3:IsMatchCSVFileColumnNames_0 | PC3:ReadCSVFileColumnNames_0 |
------------------------------------------------------------------
Insert Transitive WasTriggeredBy Relation
Description:
Find one of two patterns in the workflow data:
- A function X was triggered by another function X2 (where the relation itself is an instance of the class PC3OPM:wasTriggeredBy)
- A function X used a variable Y, which was generated by another function X2
For each of these patterns, create a direct transitive relationship between X and X2 (called PC3OPM:opWasTriggeredBy).
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT { ?FXN PC3OPM:opWasTriggeredBy ?FXN2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
{ ?WTB PC3OPM:wtbSource ?FXN . ?WTB PC3OPM:wtbTarget ?FXN2 }
UNION
{
?USD PC3OPM:usdSource ?FXN . ?USD PC3OPM:usdTarget ?VAR .
?WGB PC3OPM:wgbSource ?VAR . ?WGB PC3OPM:wgbTarget ?FXN2
}
}
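The two UNION branches of this CONSTRUCT query can be mirrored over plain Python sets, which may make the derived edges easier to check by hand. The function name `op_was_triggered_by` and the single-letter process names are hypothetical; each input is a set of (source, target) pairs read off the corresponding wtb, usd, and wgb relation instances.

```python
def op_was_triggered_by(wtb, usd, wgb):
    """Derive (fxn, fxn2) edges.

    wtb: set of (fxn, fxn2)  - fxn was triggered by fxn2
    usd: set of (fxn, var)   - fxn used var
    wgb: set of (var, fxn2)  - var was generated by fxn2
    """
    edges = set(wtb)  # first UNION branch: explicit wasTriggeredBy relations

    # Index the producers of each variable for the second branch.
    producers = {}
    for var, fxn2 in wgb:
        producers.setdefault(var, set()).add(fxn2)

    # Second UNION branch: fxn used a variable generated by fxn2.
    for fxn, var in usd:
        for fxn2 in producers.get(var, ()):
            edges.add((fxn, fxn2))
    return edges


derived = op_was_triggered_by(wtb={("B", "A")}, usd={("C", "v")}, wgb={("v", "B")})
print(sorted(derived))
```

Like the query, the sketch only derives single-step edges; it does not compute a closure over them.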
Insert Transitive WasTriggeredBy - ForEach Relation
Description:
- Find a function X which was triggered by the process ForEach (PC3:ForEach_0)
- In turn, find a process X2 which triggered the ForEach process.
- Create a direct PC3OPM:opWasTriggeredBy relationship between X and X2.
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT { ?FXN1 PC3OPM:opWasTriggeredBy ?FXN2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://www.cs.rpi.edu/~michaj6/provenance/queries/general/ConstructOpWTB.sparql>
WHERE {
?FXN1 PC3OPM:opWasTriggeredBy PC3:ForEach_0 .
PC3:ForEach_0 PC3OPM:opWasTriggeredBy ?FXN2 .
}
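The effect of this query is to bridge over the ForEach process: anything triggered by ForEach becomes directly linked to whatever triggered ForEach. A minimal sketch over (source, target) pairs, with a hypothetical helper name `bridge_through` and invented process names:

```python
def bridge_through(edges, hub):
    """Link every process triggered by `hub` to every process that triggered `hub`.

    edges: set of (fxn, fxn2) pairs meaning "fxn opWasTriggeredBy fxn2".
    """
    downstream = {a for a, b in edges if b == hub}  # ?FXN1 opWasTriggeredBy hub
    upstream = {b for a, b in edges if a == hub}    # hub opWasTriggeredBy ?FXN2
    return {(a, b) for a in downstream for b in upstream}


toy_edges = {("Load", "ForEach"), ("ForEach", "ReadReady"), ("X", "Y")}
print(bridge_through(toy_edges, "ForEach"))
```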
Insert Transitive WasDerivedFrom Relation
Description:
Find one of two patterns in the workflow data:
- A variable X was derived from another variable X2 (where the relation itself is an instance of the class PC3OPM:wasDerivedFrom)
- A variable X was generated by a function Y, which used another variable X2
For each of these patterns, create a direct transitive relationship between X and X2 (called PC3OPM:opWasDerivedFrom).
SPARQL Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT
{ ?VAR PC3OPM:opWasDerivedFrom ?VAR2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
{ ?WDF PC3OPM:wdfSource ?VAR . ?WDF PC3OPM:wdfTarget ?VAR2 }
UNION
{
?WGB PC3OPM:wgbSource ?VAR . ?WGB PC3OPM:wgbTarget ?FXN .
?USD PC3OPM:usdSource ?FXN . ?USD PC3OPM:usdTarget ?VAR2
}
}
Suggested Workflow Variants
None Yet
Suggested Queries
None Yet
Suggestions for Modification of the Open Provenance Model
None Yet
Conclusions