NCSA Provenance Challenge, CyberIntegrator
- NCSA Provenance Challenge, CyberIntegrator
- Participating Team
- Workflow implementation and provenance trace
- How provenance was captured
- Provenance queries
- #1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
- #2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
- #3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
- #4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
- #5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
- #6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
- #7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
- #8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
- #9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
Participating Team
- Short team name: National Center for Supercomputing Applications
- Participant names: Joe Futrelle, Jim Myers, Peter Bajcsy, Luigi Marini, Sang-Chul Lee
- Project URL: http://cleaner.ncsa.uiuc.edu/, http://isda.ncsa.uiuc.edu/ecid/intro.html
- Project Overview: Environmental observatory with meta-workflow, social networking, and content management
- Provenance-specific Overview: harvesting triples from workflow and portal components
- Relevant Publications:
- Bajcsy P., R. Kooper, D. Marini, D. Clutter and M. Markus, "Visualization and data mining tools applied to algal biomass prediction in Illinois streams," The 7th Intern. Conference on Hydroinformatics, September 4-8, 2006, Nice, France.
- Marini L., R. Kooper, B. Minsker, J. Myers and P. Bajcsy, "CyberIntegrator: A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools," the NSF EO Modeling Workshop, poster, May 16-18, 2006, Tucson, AZ.
- Bajcsy P., R. Kooper, L. Marini, B. Minsker and J. Myers, "CyberIntegrator: A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools," the Geoinformatics conference, May 10-12, 2006, the USGS National Center in Reston, Virginia.
- Kooper R., L. Marini, B. Minsker, J. Myers and P. Bajcsy, "A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools," the 2006 Winter Federation of Earth Science Information Partners ("Federation") Conference, poster, January 4-6, 2006 in Washington, DC.
- Bajcsy P., R. Kooper, L. Marini, B. Minsker and J. Myers, "A Meta-Workflow Cyber-infrastructure System Designed for Environmental Observatories," Technical Report: NCSA Cyber-environments Division, ISDA01-2005, December 30, 2005.
Workflow implementation and provenance trace
A detailed narrative and provenance trace are attached. In the case of the example workflow, the task was run interactively rather than in batch mode; in other words, there is no record of the workflow structure other than the execution trace.
How provenance was captured
CyberIntegrator is instrumented to push triples via JDBC to an intermediate Oracle store, from which they are harvested into multiple Kowari servers. This is a completely different way of getting the triples into Kowari than the one we employed for the NcsaD2k case.
Provenance queries
#1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
To do this we need the transitive closure of the relation "one step takes as input an output of another step", which we'll call "precedence". Kowari can only compute transitive closure over a single predicate, so we first collapse this two-hop pattern into a single #precedes predicate as follows:
insert
select $this <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $next
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$this <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$next <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $out
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
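For readers more at home in general-purpose code, the join that this insert performs can be sketched in Python over an in-memory list of triples. This is a minimal sketch, not how CyberIntegrator or Kowari compute it: it assumes triples are (subject, predicate, object) string tuples, uses shortened predicate names in place of the mwf# URIs, and the step and file names are hypothetical.

```python
# Triples are (subject, predicate, object) string tuples. Predicate names such
# as "hasOutput" stand in for the full http://ecid.ncsa.uiuc.edu/md/mwf# URIs.


def derive_precedes(triples):
    """Return (this, next) pairs where step `next` takes as input an output of step `this`."""
    producers = {}  # data object -> steps that output it
    for s, p, o in triples:
        if p == "hasOutput":
            producers.setdefault(o, set()).add(s)
    edges = set()
    for s, p, o in triples:
        if p == "hasInput":
            # Join on the shared data object, as the iTQL query does with $out
            for producer in producers.get(o, ()):
                edges.add((producer, s))
    return edges


# Hypothetical fragment of a trace: align_warp -> reslice -> softmean
trace = [
    ("warp1", "hasOutput", "warp1.warp"),
    ("reslice1", "hasInput", "warp1.warp"),
    ("reslice1", "hasOutput", "resliced1.img"),
    ("softmean", "hasInput", "resliced1.img"),
]
print(sorted(derive_precedes(trace)))  # [('reslice1', 'softmean'), ('warp1', 'reslice1')]
```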
Note that because of how CyberIntegrator represents inputs and outputs, we can only match on pathnames, not on true file identity. Looking at the #hasFilename predicate, we see that there's a file named "D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif". The following query starts with the step that output this file and walks the provenance graph back from it to find all the steps that preceded it in the workflow:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif')
or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)));
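Outside iTQL, the same backward walk is a breadth-first search over the #precedes edges, with an explicit search standing in for Kowari's trans(). The Python sketch below assumes the edges and a map from filename to the step(s) that output it have already been pulled out of the store; all step names are hypothetical.

```python
from collections import deque


def upstream_steps(precedes, producers, filename):
    """Return the steps that output `filename` plus every step that precedes
    them in the #precedes graph, directly or transitively."""
    start = set(producers.get(filename, ()))
    predecessors = {}
    for before, after in precedes:
        predecessors.setdefault(after, set()).add(before)
    seen = set(start)
    queue = deque(start)
    while queue:  # breadth-first walk backwards along #precedes
        step = queue.popleft()
        for prev in predecessors.get(step, ()):
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return seen


# Hypothetical #precedes edges: one softmean feeding two slicer/convert branches
precedes = {
    ("warp1", "reslice1"), ("reslice1", "softmean"),
    ("softmean", "slicer_x"), ("slicer_x", "convert_x"),
    ("softmean", "slicer_y"), ("slicer_y", "convert_y"),
}
producers = {"atlas-X.gif": {"convert_x"}}
print(sorted(upstream_steps(precedes, producers, "atlas-X.gif")))
# ['convert_x', 'reslice1', 'slicer_x', 'softmean', 'warp1']
```

The y branch is excluded because nothing on it leads to atlas-X.gif.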
To describe the process, we can return all triples on those steps:
select $step $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif')
or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end))))
and $step $p $o;
This is informative, but it's more informative when property key/value pairs are also returned. Unlike D2K, CyberIntegrator groups properties into higher-level structures called parameters, so we have to walk a little further in the graph from the steps that we find.
Here we exploit the fact that parameters are not shared between steps to simplify our query.
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif')
or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end))))
and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
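The three alternatives at the end of that query collect triples about the step itself, its parameters, and their properties; because parameters are not shared between steps, the first alternative only ever matches the step itself. A minimal Python sketch of the same expansion, over hypothetical in-memory triples with shortened predicate names:

```python
def describe_steps(triples, steps):
    """Return every triple whose subject is one of `steps`, one of their
    parameters, or one of those parameters' properties."""
    steps = set(steps)
    params = {o for s, p, o in triples if p == "hasParameter" and s in steps}
    props = {o for s, p, o in triples if p == "hasProperty" and s in params}
    subjects = steps | params | props
    return [t for t in triples if t[0] in subjects]


# Hypothetical triples for two steps, each with its own parameter and property
triples = [
    ("s1", "hasParameter", "param1"),
    ("param1", "hasProperty", "prop1"),
    ("prop1", "hasPropertyName", "function"),
    ("prop1", "hasPropertyValue", "Prov3(softMean)"),
    ("s2", "hasParameter", "param2"),
    ("param2", "hasProperty", "prop2"),
]
for triple in describe_steps(triples, {"s1"}):
    print(triple)
```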
#2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
In the execution trace, steps are annotated with a property indicating what function was run. To find the step that executed softmean, we need to find one having the property "function=Prov3(softMean)". We can do this with the following query:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)';
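Using the transitive closure of the #precedes predicate, we can find softmean and all the steps that follow it. The query starts from a step that precedes softmean so that softmean itself is included in the result: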
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean
and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step));
Now we can constrain it, as in query #1, to capture which of those modules contributed to Atlas X Graphic:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean
and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step))
and (
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif')
or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)))
);
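Query #2 amounts to intersecting two closures: the steps reachable forward from softmean and the steps that lead backward to the producer of atlas-X.gif. A minimal Python sketch, assuming the #precedes edges are already in memory (step names are hypothetical). Unlike the iTQL version, it starts at softmean directly rather than at a step that precedes it, so the two could differ if softmean's predecessor also fed other steps.

```python
def closure(adjacency, start):
    """All nodes reachable from `start` (inclusive) by following `adjacency`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def steps_between(precedes, first, last):
    """Steps that follow `first` and lead to `last`, both endpoints included."""
    forward, backward = {}, {}
    for before, after in precedes:
        forward.setdefault(before, set()).add(after)
        backward.setdefault(after, set()).add(before)
    return closure(forward, first) & closure(backward, last)


precedes = {
    ("warp1", "reslice1"), ("reslice1", "softmean"),
    ("softmean", "slicer_x"), ("slicer_x", "convert_x"),
    ("softmean", "slicer_y"), ("slicer_y", "convert_y"),
}
print(sorted(steps_between(precedes, "softmean", "convert_x")))
# ['convert_x', 'slicer_x', 'softmean']
```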
Now we can get all triples on these steps and their properties, as in #1:
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean
and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step))
and (
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif')
or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)))
)
and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param3 and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
#3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
CyberIntegrator doesn't have a concept of workflow "stages," so our knowledge of the strategy the author used is external to CyberIntegrator, and we need to add that information as annotations. We can characterize the stages as follows: stage 3 is the softmean stage, stage 4 is the slicer stage, and stage 5 is the convert stage.
The following query adds an #inStage predicate to every stage 3, 4, and 5 step, recording which stage it belongs to. The query keys on the function property because in the example workflow that property is sufficient to identify the steps; this is of course not true in the general case.
insert
select $step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> $stage
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$stage <http://tucana.org/tucana#is> '3') or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov4(3DSlice)' and
$stage <http://tucana.org/tucana#is> '4') or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$stage <http://tucana.org/tucana#is> '5')
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
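The insert above is essentially a lookup from the function property value to a stage number. A Python sketch of the same mapping over in-memory triples, assuming each property node carries exactly one name and one value (step and node names are hypothetical):

```python
# Stage assignment used in the text: function property value -> stage number
STAGE_BY_FUNCTION = {
    "Prov3(softMean)": "3",
    "Prov4(3DSlice)": "4",
    "Prov5(convert)": "5",
}


def stage_annotations(triples):
    """Return (step, "inStage", stage) triples for steps whose "function"
    property maps to a stage in STAGE_BY_FUNCTION."""
    names, values, props_of = {}, {}, {}
    for s, p, o in triples:
        if p == "hasPropertyName":
            names[s] = o
        elif p == "hasPropertyValue":
            values[s] = o
        elif p == "hasProperty":
            props_of.setdefault(s, []).append(o)
    annotations = set()
    for s, p, o in triples:
        if p != "hasParameter":
            continue
        # s is a step, o one of its parameters; inspect that parameter's properties
        for prop in props_of.get(o, []):
            if names.get(prop) == "function" and values.get(prop) in STAGE_BY_FUNCTION:
                annotations.add((s, "inStage", STAGE_BY_FUNCTION[values[prop]]))
    return annotations


triples = [
    ("slicer_x", "hasParameter", "pa"), ("pa", "hasProperty", "qa"),
    ("qa", "hasPropertyName", "function"), ("qa", "hasPropertyValue", "Prov4(3DSlice)"),
    ("warp1", "hasParameter", "pb"), ("pb", "hasProperty", "qb"),
    ("qb", "hasPropertyName", "function"), ("qb", "hasPropertyValue", "Prov1(warp)"),
]
print(stage_annotations(triples))  # {('slicer_x', 'inStage', '4')}
```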
This query retrieves the IDs of all the modules in stages 3, 4, and 5, along with the statements and properties associated with them:
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '3' or
$step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '4' or
$step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '5')
and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
#4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
We can't answer this query exactly from the execution trace data, because there the command-line arguments are not separated from one another but appear together in a single string. Our technique would be no different if they had been split up, so for the purposes of the challenge we search for "-m 12 -q" instead of "-m 12". Oddly, in the execution trace the property holding the command-line arguments is called "name", and it is the parameter (named "optionString"), not the property, that identifies it as a command-line option.
iTQL doesn't support date arithmetic, so here we just match the align_warp steps that have the given options and return their start and end timestamps; the Monday check has to be done by the client:
select $step $start $end
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameterName> 'optionString' and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'name' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> '-m 12 -q' and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#executionOf> $step and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#startOn> $start and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#endOn> $end;
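Since the Monday test has to happen outside iTQL, a client can post-process the returned rows. The Python sketch below assumes the #startOn values are ISO 8601 strings such as 2006-09-11T10:15:00 (the trace's actual timestamp format is not shown here) and counts a run as "on a Monday" when it started on one; the step names and times are hypothetical.

```python
from datetime import datetime


def ran_on_monday(rows):
    """Keep (step, start, end) rows whose start timestamp falls on a Monday.

    Assumes ISO 8601 timestamps; datetime.weekday() returns 0 for Monday.
    """
    return [row for row in rows if datetime.fromisoformat(row[1]).weekday() == 0]


rows = [
    ("warp1", "2006-09-11T10:15:00", "2006-09-11T10:16:30"),  # a Monday
    ("warp2", "2006-09-12T09:00:00", "2006-09-12T09:01:10"),  # a Tuesday
]
print(ran_on_monday(rows))
```

Using the end timestamp instead would only matter for runs that cross midnight.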
#5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
The workflow doesn't run scanheader (it is not part of the example workflow, so we didn't add it to ours). However, it does identify header files as inputs, so if we have the file data at hand we can extract the header values and add nodes to the execution trace containing the header keys and values.
CyberIntegrator identifies objects that are used as inputs and outputs and associates those objects with pathnames. This query gets the pathnames associated with inputs to align_warp. In this implementation of the workflow, only the header files are given as inputs:
select $input $path
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $input and
$input <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $path;
Given the output of this query, we can scan the header files and produce RDF describing them with the following Perl script. Note that the script has to know a priori how to map local paths to the ones used in the workflow description. A better solution would be for each dataset to have a globally unique ID independent of where it is physically stored.
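#!/usr/bin/perl
use strict;
use warnings;

# Location of the AIR binaries and of the local copies of the data files
my $AIR_BIN = "../AIR/bin";
my $LOCAL_DATA_DIR = "../data";
# Workflow-side data directory. Eight backslashes in a double-quoted string
# become four characters, which the regex below reads as two literal
# backslashes per path separator.
my $WORKFLOW_DATA_DIR = "D:\\\\\\\\sclee\\\\\\\\Provenance\\\\\\\\output\\\\\\\\prov-chal6\\\\\\\\";
my $ix = 1;   # counter used to mint unique header node URIs

# Each input line is "<input id>\t<quoted workflow path>", from the query above
while (my $line = <>) {
    chomp $line;
    my ($input, $workflowFile) = split /\t/, $line;
    # Map the workflow pathname onto the corresponding local file
    (my $localFile = $workflowFile) =~ s/^"${WORKFLOW_DATA_DIR}(.*)"/${LOCAL_DATA_DIR}\/$1/;
    open(my $scan, '-|', "$AIR_BIN/scanheader", $localFile)
        or die "cannot run scanheader on $localFile: $!";
    while (my $entry = <$scan>) {
        chomp $entry;
        next if $entry =~ /^$/;
        # Split on the first '=' only, so values containing '=' survive
        my ($name, $value) = split /=/, $entry, 2;
        my $header = "http://ecid.ncsa.uiuc.edu/md/mwf#header_${ix}";
        print "<${input}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> <${header}>\n";
        print "<${header}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> '${name}'\n";
        print "<${header}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '${value}'\n";
        $ix++;
    }
    close($scan);
}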
The script's output can be inserted directly into Kowari (note that it is formatted as iTQL, not as a standard RDF serialization).
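One way to load those lines is to wrap them in an iTQL insert statement of the same shape as the annotation inserts used for query #8. A minimal Python sketch, assuming the script's output lines are already in memory; it does no escaping, so a header value containing a single quote would still break the statement.

```python
def wrap_as_insert(triple_lines, model):
    """Wrap triple lines (as printed by the header script) in one iTQL insert statement."""
    body = "\n".join(line.rstrip() for line in triple_lines if line.strip())
    return f"insert\n{body}\ninto <{model}>;"


lines = [
    "<http://ecid.ncsa.uiuc.edu/md/mwf#header_1> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> 'global maximum'\n",
    "<http://ecid.ncsa.uiuc.edu/md/mwf#header_1> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '4095'\n",
]
print(wrap_as_insert(lines, "rmi://badger.ncsa.uiuc.edu/server1#cipc"))
```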
Now we can do the query ("Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095"). We know that align_warp takes anatomy images as inputs, so we can look at those inputs to see if they have matching header values, and find the associated steps:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> 'global maximum' and
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '4095' and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> $header and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)';
Now we need to find all the atlas graphic images resulting from any of these modules. We walk from the modules with the files-of-interest as inputs until we hit a "convert" step, which we know has an atlas graphic as an output:
select $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> 'global maximum' and
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '4095' and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> $header and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end) and
$end <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname;
(This returns all three atlas graphic images, because every align_warp step feeds the single softmean step from which all three slices are taken.)
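Query #5 chains a header filter, a forward walk along #precedes, and a test for convert steps. A Python sketch under simplifying assumptions: header values, inputs, functions, edges, and outputs have already been pulled into dictionaries, the reslice steps are omitted, and every name and value in the example (including the non-matching 4096) is hypothetical.

```python
def atlas_graphics_for_header(headers, inputs_of, function_of, precedes, outputs_of,
                              name="global maximum", value="4095"):
    """Pathnames output by convert steps downstream of an align_warp step
    that read an input whose header has name=value."""
    successors = {}
    for before, after in precedes:
        successors.setdefault(before, set()).add(after)
    # align_warp steps with at least one matching input header
    stack = [
        step for step, fn in function_of.items()
        if fn == "Prov1(warp)"
        and any(headers.get(i, {}).get(name) == value for i in inputs_of.get(step, ()))
    ]
    downstream = set()
    while stack:  # depth-first walk forward along #precedes
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in downstream:
                downstream.add(nxt)
                stack.append(nxt)
    return sorted(
        path
        for step in downstream
        if function_of.get(step) == "Prov5(convert)"
        for path in outputs_of.get(step, ())
    )


headers = {"anatomy1.hdr": {"global maximum": "4095"}, "anatomy2.hdr": {"global maximum": "4096"}}
inputs_of = {"warp1": ["anatomy1.hdr"], "warp2": ["anatomy2.hdr"]}
function_of = {
    "warp1": "Prov1(warp)", "warp2": "Prov1(warp)", "softmean": "Prov3(softMean)",
    "slicer_x": "Prov4(3DSlice)", "convert_x": "Prov5(convert)",
    "slicer_y": "Prov4(3DSlice)", "convert_y": "Prov5(convert)",
}
precedes = {
    ("warp1", "softmean"), ("warp2", "softmean"),
    ("softmean", "slicer_x"), ("slicer_x", "convert_x"),
    ("softmean", "slicer_y"), ("slicer_y", "convert_y"),
}
outputs_of = {"convert_x": ["atlas-x.gif"], "convert_y": ["atlas-y.gif"]}
print(atlas_graphics_for_header(headers, inputs_of, function_of, precedes, outputs_of))
# ['atlas-x.gif', 'atlas-y.gif']
```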
#6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
We can answer this query by combining the function and option-string conditions used in #4 with a traversal of the transitive closure of the #precedes predicate (see #1):
select $softmean $alignWarp $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname and
trans ($alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean) and
$alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param3 and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameterName> 'optionString' and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop3 and
$prop3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'name' and
$prop3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> '-m 12 -q';
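The same combination can be sketched in Python: for each softmean step, collect every upstream step and keep the align_warp steps with the "-m 12 -q" option string, producing one row per combination as the select clause does. The dictionaries, step names, and the non-matching "-m 9 -q" option below are hypothetical.

```python
def softmean_outputs_after(function_of, options_of, precedes, outputs_of, options="-m 12 -q"):
    """(softmean step, align_warp step, output path) rows, one per align_warp with
    the given option string that precedes the softmean step, directly or indirectly."""
    predecessors = {}
    for before, after in precedes:
        predecessors.setdefault(after, set()).add(before)
    rows = []
    for softmean in sorted(s for s, fn in function_of.items() if fn == "Prov3(softMean)"):
        ancestors, stack = set(), [softmean]
        while stack:  # collect every step upstream of this softmean
            for prev in predecessors.get(stack.pop(), ()):
                if prev not in ancestors:
                    ancestors.add(prev)
                    stack.append(prev)
        for warp in sorted(ancestors):
            if function_of.get(warp) == "Prov1(warp)" and options_of.get(warp) == options:
                rows.extend((softmean, warp, path) for path in outputs_of.get(softmean, ()))
    return rows


function_of = {"warp1": "Prov1(warp)", "warp2": "Prov1(warp)", "softmean": "Prov3(softMean)"}
options_of = {"warp1": "-m 12 -q", "warp2": "-m 9 -q"}
precedes = {("warp1", "reslice1"), ("reslice1", "softmean"),
            ("warp2", "reslice2"), ("reslice2", "softmean")}
outputs_of = {"softmean": ["atlas.img"]}
print(softmean_outputs_after(function_of, options_of, precedes, outputs_of))
# [('softmean', 'warp1', 'atlas.img')]
```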
#7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
The challenge leaves the level of detail open. Possible approaches include computing a graph diff over the compute nodes and their input/output edges, comparing the parameters of corresponding steps, or profiling the distribution of execution times across the runs.
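A graph diff, the first of those options, can be sketched by labelling each #precedes edge with the functions of its two steps, since step identifiers differ from run to run. The Python sketch below assumes such labelled edge lists have already been extracted for both runs; the labels "pgmtoppm" and "pnmtojpeg" for the replacement procedures are hypothetical, because no trace of the second run is shown here.

```python
from collections import Counter


def diff_runs(edges_a, edges_b):
    """Compare two runs given as (upstream function, downstream function) edge lists.

    Returns (only_in_a, only_in_b) as Counters, so repeated edges are compared
    by multiplicity and edges common to both runs cancel out.
    """
    a, b = Counter(edges_a), Counter(edges_b)
    return a - b, b - a


shared = [("Prov3(softMean)", "Prov4(3DSlice)")] * 3
run1 = shared + [("Prov4(3DSlice)", "Prov5(convert)")] * 3
run2 = shared + [("Prov4(3DSlice)", "pgmtoppm"), ("pgmtoppm", "pnmtojpeg")] * 3
only1, only2 = diff_runs(run1, run2)
print(dict(only1))  # {('Prov4(3DSlice)', 'Prov5(convert)'): 3}
print(dict(only2))
```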
#8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
We can annotate any input or output whose pathname we recognize as one of the anatomy images. To find the inputs/outputs associated with "anatomy1.hdr" or "anatomy3.hdr" (remember, only the headers are represented as inputs), we can use this query:
select $io
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$io <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\anatomy1.hdr' or
$io <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\anatomy3.hdr';
Given the output of this query, we can insert annotations. For example:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_e74eb542-264d-4a23-8406-0a1aa8a4ef95> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_2820fa46-d2c7-416e-b01c-df80db3ab63a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
Now we can perform the query:
select $out $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center' and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago';
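In Python terms, query #8 filters align_warp steps by an annotation on any of their inputs and returns those steps' outputs. A minimal sketch, assuming annotations are held as a map from data object to a set of (name, value) pairs; for readability the data objects are keyed by hypothetical filenames rather than the data_... URIs.

```python
def warp_outputs_for(function_of, inputs_of, outputs_of, annotations,
                     key="center", value="UChicago"):
    """Outputs of align_warp steps with at least one input annotated key=value.

    `annotations` maps a data object to a set of (name, value) pairs.
    """
    results = []
    for step in sorted(function_of):
        if function_of[step] != "Prov1(warp)":
            continue
        if any((key, value) in annotations.get(i, set()) for i in inputs_of.get(step, ())):
            results.extend(sorted(outputs_of.get(step, ())))
    return results


function_of = {"warp1": "Prov1(warp)", "warp2": "Prov1(warp)"}
inputs_of = {"warp1": ["anatomy1.hdr"], "warp2": ["anatomy2.hdr"]}
outputs_of = {"warp1": ["warp1.warp"], "warp2": ["warp2.warp"]}
annotations = {"anatomy1.hdr": {("center", "UChicago")}}
print(warp_outputs_for(function_of, inputs_of, outputs_of, annotations))  # ['warp1.warp']
```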
#9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
From the workflow, we can infer that any file output by "convert" is an atlas graphic, so finding the atlas graphics amounts to this query:
select $out $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname;
Now we need to add the annotations, using the same strategy as query #8:
For atlas x:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'speech'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'bar'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann3
$ann3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'quux'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
For atlas y:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_282191c5-319a-4eb1-8a6f-d35ff34cc02a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'tactile'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_282191c5-319a-4eb1-8a6f-d35ff34cc02a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'fnord'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
For atlas z:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_c3cc7405-a148-4c17-93a0-4b2ec1c36c7d> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'visual'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
Now we can perform the query. The subquery produces a nested table that groups the annotation key/value pairs by the output they're associated with.
select $out
subquery(select $name $value
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $otherAnn and
$otherAnn <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> $name and
$otherAnn <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> $value)
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality' and
($ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'speech' or
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'audio' or
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'visual');
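The grouping that the subquery performs can be sketched in Python using the example annotations inserted above, with short labels standing in for the data_... URIs. The sketch assumes annotations are attached only to atlas graphics, so it skips the query's check that the file was output by convert; like the subquery, it returns the studyModality annotation along with the others.

```python
WANTED_MODALITIES = {"speech", "visual", "audio"}


def atlases_by_modality(annotations):
    """Group (data, name, value) annotations by data object and keep the objects
    that have a studyModality in WANTED_MODALITIES, with all their annotations."""
    grouped = {}
    for data, name, value in annotations:
        grouped.setdefault(data, []).append((name, value))
    return {
        data: sorted(pairs)
        for data, pairs in grouped.items()
        if any(n == "studyModality" and v in WANTED_MODALITIES for n, v in pairs)
    }


annotations = [
    ("atlas-x", "studyModality", "speech"), ("atlas-x", "foo", "bar"), ("atlas-x", "foo", "quux"),
    ("atlas-y", "studyModality", "tactile"), ("atlas-y", "foo", "fnord"),
    ("atlas-z", "studyModality", "visual"),
]
print(atlases_by_modality(annotations))
# {'atlas-x': [('foo', 'bar'), ('foo', 'quux'), ('studyModality', 'speech')], 'atlas-z': [('studyModality', 'visual')]}
```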
--
JoeFutrelle - 12 Sep 2006