Third Provenance Challenge
The top level page for the third provenance challenge.
Current Status
- The Third Provenance Challenge Workshop was a success. It resulted in several proposals for changes in the OPM specification, additional profiles for OPM, a governance model, a CFP for a journal paper based on PC3 results, and thoughts on a future fourth provenance challenge.
Workshop Details
Participating Teams
Pages for each participating team can be found on the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the Test Team page as a template for what should be included in a team page.
Sponsors
Thanks to our sponsors, the Virtual Laboratory for e-Science and Microsoft.
Schedule
1. Review of code and provenance query proposals (to Feb 27)
March 2 - PC3 Starts
2. Make the workflow work with individual teams' systems [Mar 2 - Mar 30]
3. Generate provenance for the challenge workflow & run queries on it [Mar 30 - Apr 13]
4. Export OPM Graphs and import from others [Apr 13 - May 4]
5. Run queries on imported OPM graphs [May 4 - Jun 1]
6. Prepare slides for challenge [Jun 1 - Jun 8]
PC3 Workshop June 10 - 11 held in Amsterdam
Challenge Goals
1. identify weaknesses and strengths of the OPM specification
2. encourage the development of concrete bindings for OPM in a variety of languages
3. determine how well OPM can represent provenance for a variety of technologies (scientific workflow, databases, etc.)
4. demonstrate that a complex data product's provenance can be constructed from provenance documentation produced by multiple combinations of heterogeneous applications
5. bring together the community to further discuss the interoperability of provenance systems.
Provenance Questions
Please list possible provenance queries for the Challenge here. If a query requires any additions to the workflow, please detail them as well.
Provenance Challenge Workflow
The PC3 workflow and its software implementations in .NET, Java, and shell scripts can be found on the ThirdPCWorkflow page. Below is the background of the workflow.
Background
The Pan-STARRS project is building and operating the next-generation sky survey, which can continuously scan the visible sky once a week and build a time series of data. This helps detect moving objects that could impact Earth, in addition to building a massive catalog of the solar system and 99% of the visible stars in the northern hemisphere. The collaboration is led by the University of Hawai'i, which operates the telescope and image pipeline, while Johns Hopkins University is building the object data management (ODM) framework that is exposed to astronomers. The load workflow used in PC3 sits at the handoff between the image pipeline and the ODM, and uses the Trident workbench to ingest incoming CSV files into SQL Server databases.
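As a rough illustration of what recording provenance for such a load step might look like, the sketch below ingests a small CSV batch into a database and records two OPM-style edges, `used` and `wasGeneratedBy`. It is not the PC3 workflow code: SQLite stands in for SQL Server, and the sample data, column names, identifiers, and the functions `load_csv_with_provenance` and `sources_of` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for one incoming CSV batch.
SAMPLE_CSV = "object_id,ra_deg,dec_deg\n1,10.5,-3.2\n2,11.0,-3.1\n"


def load_csv_with_provenance(csv_text, csv_name, table):
    """Load CSV rows into an in-memory SQLite table and record OPM-style edges.

    Each edge is a tuple (kind, effect, cause), following the OPM convention
    that edges point from the effect back to its cause.
    """
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
    conn.execute(
        f"CREATE TABLE {table} (object_id INTEGER, ra_deg REAL, dec_deg REAL)"
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        f"INSERT INTO {table} VALUES (:object_id, :ra_deg, :dec_deg)", rows
    )
    conn.commit()

    process = f"load:{csv_name}"
    edges = [
        ("used", process, f"file:{csv_name}"),            # process used artifact
        ("wasGeneratedBy", f"table:{table}", process),    # artifact generated by process
    ]
    return conn, edges


def sources_of(artifact, edges):
    """Return the artifacts used by the processes that generated `artifact`."""
    processes = {
        cause for kind, effect, cause in edges
        if kind == "wasGeneratedBy" and effect == artifact
    }
    return sorted(
        cause for kind, effect, cause in edges
        if kind == "used" and effect in processes
    )
```

Calling `load_csv_with_provenance(SAMPLE_CSV, "batch1.csv", "detections")` and then `sources_of("table:detections", edges)` answers a simple one-step provenance query: which input file a loaded table came from.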
Acknowledgement
Jim Heasley (University of Hawai'i)
Alex Szalay (Johns Hopkins University)