
Provenance Challenge



First Provenance Challenge


The provenance challenge aims to establish an understanding of the capabilities of available provenance-related systems and, in particular, the following details.

To help achieve these aims, we define a simple example workflow that forms the basis of the challenge. It is inspired by a real experiment in the area of Functional Magnetic Resonance Imaging (fMRI). Here, we use the term workflow to denote a series of procedures performed in a system, each taking some data as input and producing other data as output. We do not assume that these procedures use any particular form of technology (EXE files, Web Services, etc.) or that the workflow is explicitly defined in a workflow technology (BPEL, compiled executable, Scufl, batch file, etc.); individual participants will adopt their technology of choice.

Our focus in this challenge is on provenance, not on running the experiment. Hence, to facilitate take-up, while the workflow is based on a real experiment, the procedures can be implemented as "dummies": we provide the input, output and intermediate data, and participants can use fake procedures that take the right input and produce the right output. Alternatively, participants can execute the real workflow after installing the necessary libraries. In addition, we define a set of core queries that all participants should demonstrate how they address, so that systems can be compared.

Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the other participants of their efforts in meeting the challenge. During the provenance challenge, we expect participants to upload the following to their page, to allow comparison.

Optionally, the participants may like to contribute the following.

Participants should not be too concerned about whether extensions to the workflow are scientifically realistic: they are explicitly contrived to demonstrate aspects of each system.

Example Workflow

We propose an example workflow for creating population-based "brain atlases" from the fMRI Data Center's archive of high resolution anatomical data. The workflow is shown in the figure below (a PDF version of the image is attached).


It comprises procedures, shown as orange ovals, and data items flowing between them, shown as rectangles. It can be seen as five stages, each depicted in the figure as a horizontal row of instances of the same procedure. Note that the term stage is introduced only to help describe the workflow; we do not dictate how stages are apparent in a concrete implementation. The procedures employ the AIR (automated image registration) suite to create an averaged brain from a collection of high resolution anatomical data, and the FSL suite to create 2D images across each sliced dimension of the brain. In addition to the data items shown in the figure, there are other inputs to procedures (constant string options), defined below.

The inputs to the workflow are a set of new brain images (Anatomy Image 1 to 4) and a single reference brain image (Reference Image). All input images are 3D brain scans of varying resolutions, so that different features are evident. For each image, there is the actual image and the metadata for that image (Anatomy Header 1 to 4, Reference Header). The image data was published with the article Frontal-Hippocampal Double Dissociation Between Normal Aging and Alzheimer's Disease by Head, D., Snyder, A.Z., Girton, L.E., Morris, J.C., and Buckner, R.L., in the fMRI Data Center, Accession Number 2-2004-1168X.

The stages of the workflow are as follows.

  1. For each new brain image, align_warp compares it to the reference image to determine how the new image should be warped, i.e. how the position and shape of the image should be adjusted to match the reference brain. The output of each procedure in this stage is a warp parameter set defining the spatial transformation to be performed (Warp Params 1 to 4).
  2. For each warp parameter set, the actual transformation of the image is done by reslice, which creates a new version of the original brain image transformed according to the warp parameter set. The output is a resliced image.
  3. All the resliced images are averaged into one single image using softmean.
  4. For each dimension (x, y and z), the averaged image is sliced by slicer to give a 2D atlas along a plane in that dimension, taken through the centre of the 3D image. The output is an atlas data set. The slicer tool is part of the FSL suite.
  5. Each atlas data set is converted into a graphical atlas image using convert (an ImageMagick utility).
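As the challenge text above allows, the five stages can be implemented with "dummy" procedures that take the right inputs and produce the right outputs. The sketch below does exactly that in Python, recording a simple provenance log as it runs; the file names and the (procedure, inputs, output) log format are illustrative assumptions made here, not part of the challenge.

```python
# Dummy implementation of the five-stage workflow: each procedure is a fake
# that accepts the right inputs and yields the right outputs, while recording
# which inputs produced which output. File names are illustrative only.

provenance = []  # (procedure, inputs, output) triple per invocation

def run(procedure, inputs, output):
    provenance.append((procedure, tuple(inputs), output))
    return output

# Stage 1: compare each anatomy image with the reference to derive warp params.
warps = [run("align_warp",
             [f"anatomy{i}.img", f"anatomy{i}.hdr", "reference.img", "reference.hdr"],
             f"warp{i}.warp")
         for i in range(1, 5)]

# Stage 2: apply each warp parameter set to produce a resliced image.
resliced = [run("reslice", [f"warp{i}.warp"], f"resliced{i}.img")
            for i in range(1, 5)]

# Stage 3: average all resliced images into a single atlas image.
atlas = run("softmean", resliced, "atlas.img")

# Stage 4: slice the atlas through the centre of each of the x, y, z dimensions.
slices = [run("slicer", [atlas], f"atlas-{d}.pgm") for d in "xyz"]

# Stage 5: convert each atlas slice into a graphical image.
graphics = [run("convert", [s], s.replace(".pgm", ".gif")) for s in slices]

print(graphics)  # ['atlas-x.gif', 'atlas-y.gif', 'atlas-z.gif']
```

The recorded triples are exactly the raw material the core queries below interrogate: 4 align_warp, 4 reslice, 1 softmean, 3 slicer and 3 convert invocations.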

The full steps, procedures, data and parameters are enumerated in the table below. The procedure names are linked to the manual pages for those utilities, and the input and output names to the actual data exchanged between procedures.

| Step | Procedure | Data Role | Item 1 | Item 2 | Item 3 | Item 4 |
|------|-----------|-----------|--------|--------|--------|--------|
| 1 | align_warp | Inputs | Anatomy Image 1 | Anatomy Header 1 | Reference Image | Reference Header |
|   |            | Outputs | Warp Parameters 1 | | | |
|   |            | Parameters | -m 12 -q | | | |
| 2 | align_warp | Inputs | Anatomy Image 2 | Anatomy Header 2 | Reference Image | Reference Header |
|   |            | Outputs | Warp Parameters 2 | | | |
|   |            | Parameters | -m 12 -q | | | |
| 3 | align_warp | Inputs | Anatomy Image 3 | Anatomy Header 3 | Reference Image | Reference Header |
|   |            | Outputs | Warp Parameters 3 | | | |
|   |            | Parameters | -m 12 -q | | | |
| 4 | align_warp | Inputs | Anatomy Image 4 | Anatomy Header 4 | Reference Image | Reference Header |
|   |            | Outputs | Warp Parameters 4 | | | |
|   |            | Parameters | -m 12 -q | | | |
| 5 | reslice | Inputs | Warp Parameters 1 | | | |
|   |         | Outputs | Resliced Image 1 | Resliced Header 1 | | |
| 6 | reslice | Inputs | Warp Parameters 2 | | | |
|   |         | Outputs | Resliced Image 2 | Resliced Header 2 | | |
| 7 | reslice | Inputs | Warp Parameters 3 | | | |
|   |         | Outputs | Resliced Image 3 | Resliced Header 3 | | |
| 8 | reslice | Inputs | Warp Parameters 4 | | | |
|   |         | Outputs | Resliced Image 4 | Resliced Header 4 | | |
| 9 | softmean | Inputs | Resliced Image 1 | Resliced Header 1 | Resliced Image 2 | Resliced Header 2 |
|   |          | Inputs | Resliced Image 3 | Resliced Header 3 | Resliced Image 4 | Resliced Header 4 |
|   |          | Outputs | Atlas Image | Atlas Header | | |
|   |          | Parameters | y null | | | |
| 10 | slicer | Inputs | Atlas Image | Atlas Header | | |
|    |        | Outputs | Atlas X Slice | | | |
|    |        | Parameters | -x .5 | | | |
| 11 | slicer | Inputs | Atlas Image | Atlas Header | | |
|    |        | Outputs | Atlas Y Slice | | | |
|    |        | Parameters | -y .5 | | | |
| 12 | slicer | Inputs | Atlas Image | Atlas Header | | |
|    |        | Outputs | Atlas Z Slice | | | |
|    |        | Parameters | -z .5 | | | |
| 13 | convert | Inputs | Atlas X Slice | | | |
|    |         | Outputs | Atlas X Graphic | | | |
| 14 | convert | Inputs | Atlas Y Slice | | | |
|    |         | Outputs | Atlas Y Graphic | | | |
| 15 | convert | Inputs | Atlas Z Slice | | | |
|    |         | Outputs | Atlas Z Graphic | | | |

Core Provenance Queries

An initial set of provenance-related queries is given below.

  1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
  2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
  3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
  4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (i.e. with argument "-m 12"; see the align_warp manual for the possible model values) that ran on a Monday.
  5. Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
  6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
  7. A user has run the workflow twice, in the second run replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference detected by a system is up to each participant.
  8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
  9. A user has annotated some atlas graphics with a key-value pair where the key is studyModality. Find all the graphical atlas sets that have a metadata annotation studyModality with value speech, visual or audio, and return all other annotations on these files.
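As an illustration of how a system might answer a query like no. 1, the sketch below walks a toy provenance record backwards from Atlas X Graphic to collect everything that led to it. The (procedure, inputs, output) triple representation and the abbreviated set of records are assumptions made for this example; the challenge does not mandate any particular representation.

```python
# Toy provenance record: one (procedure, inputs, output) triple per invocation,
# abbreviated to a single anatomy image for brevity.
records = [
    ("align_warp", ["Anatomy Image 1", "Anatomy Header 1", "Reference Image"], "Warp Params 1"),
    ("reslice", ["Warp Params 1"], "Resliced Image 1"),
    ("softmean", ["Resliced Image 1"], "Atlas Image"),
    ("slicer", ["Atlas Image"], "Atlas X Slice"),
    ("convert", ["Atlas X Slice"], "Atlas X Graphic"),
]

def lineage(item):
    """Return every procedure and data item that transitively led to item."""
    producers = {out: (proc, ins) for proc, ins, out in records}
    causes, frontier = set(), [item]
    while frontier:
        current = frontier.pop()
        if current in producers:
            proc, ins = producers[current]
            causes.add(proc)
            for i in ins:
                if i not in causes:
                    causes.add(i)
                    frontier.append(i)
    return causes

# Query 1: everything that caused Atlas X Graphic to be as it is.
print(sorted(lineage("Atlas X Graphic")))
```

Queries 2 and 3 amount to cutting this traversal off at a given procedure or stage, and queries 4 to 9 additionally filter on invocation parameters, timestamps or annotations attached to the recorded items.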

Participant Instructions

Here we give the specific steps that we expect each participating team to perform in completing the challenge.

Sample Workflow Implementations

As it may be useful to some, we provide sample implementations of the workflow here. This should not preclude the use of any other technology. The implementations assume that the executables referenced above are installed; they are provided by the AIR (automated image registration) suite, the FSL suite and ImageMagick.

A minor caution: one sample implementation is a DOS text file, and if run on Unix the extra carriage returns at the ends of lines make their way into the filenames and cause everything to break. Strip the CRs with tr (e.g. tr -d '\r') before running.


-- SimonMiles - 21 Aug 2006


Attachments:
  * BrainAtlas.png (5.1 K, 16 May 2006, SimonMiles) - Brain Atlas workflow (original vdt display)
  * BrainAtlas.pdf (118.8 K, 30 May 2006, SimonMiles) - Brain Atlas workflow (hi-res)
  * Shell script version of workflow (0.8 K, 31 May 2006, SimonMiles)
  * BrainAtlas.gif (40.1 K, 06 Jun 2006, LucMoreau)



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.