The PROV-JSONLD Serialization

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about the data or thing's quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. This document specifies PROV-JSONLD, a serialization of PROV in JSON, which exploits JSON-LD to define a semantic mapping so it can also be processed as Linked Data. Overall, PROV-JSONLD is designed to be suitable for interchanging provenance in Web and Linked Data applications, to offer a natural encoding of provenance for its targeted audience, and to allow for fast processing.

The following files are available:

The JSON Schema for PROV-JSONLD: schema.json
The JSON-LD Context for PROV-JSONLD: context.jsonld
The OWL ontology extension supporting PROV-JSONLD: provext.ttl

Introduction

Since their release in 2013, the PROV Recommendations [[?PROV-OVERVIEW]] by the World Wide Web Consortium (W3C) have been adopted by flagship deployments such as the Global Change Information System, the Gazette in the UK, and other Linked Data sets. PROV, which is used as the data model to describe the provenance of data, is made available in several different representations: PROV-N [[PROV-N]], PROV-XML [[PROV-XML]], or in an RDF serialization using the PROV Ontology [[PROV-O]]. The latter is most suitable for Linked Data [[LINKED-DATA]], given that it can readily be consumed by existing Semantic Web tools and comes with the semantic grounding provided by PROV-O [[PROV-O]].

Subsequently, the PROV-JSON [[?PROV-JSON]] serialization has gained traction, despite being only a member submission that has not gone through the stages of a standardization activity. We conjecture that the primary reason is that many web applications are built to be lightweight, working mainly with simple data formats such as JSON [[RFC8259]].

The very existence of all these serializations is a testament to the approach to standardization taken by the Provenance Working Group, by which a conceptual data model for PROV was defined, the PROV data model [[PROV-DM]], alongside its mapping to different technologies, to suit users and developers. However, the family of PROV specifications lacks a serialization capable of simultaneously addressing all of the following requirements.

[Lightweight] A serialization MUST support lightweight Web applications.
[Natural] A serialization MUST look natural to its targeted community of users.
[Semantic] A serialization MUST allow for semantic markup and integration with linked data applications.
[Efficient] A serialization MUST be efficiently processable.

In our view, none of the existing PROV serializations supports all these requirements simultaneously. While PROV-JSON is the only serialization to support lightweight web applications, it does not have any semantic markup, its internal structure does not exhibit the natural structure of the PROV data structures, and its grouping of expressions by category (e.g. all entities, all activities, ...) is not conducive to incremental processing. The RDF serialization compatible with PROV-O has been architected to be natural to the Semantic Web community: all influence relations have been given the same directionality, consistently aligned with their time ordering. However, its decomposition of data structures (essentially n-ary relations) into individual triples, which can occur anywhere in the serialization, is not conducive to efficient parsing. It is reasonable to say that the world has moved on from XML, while the PROV-N notation was aimed at humans rather than efficient processing.

JSON-LD [[JSON-LD]] allows a semantic structure to be overlaid on a JSON structure [[RFC8259]], thereby enabling the conversion of JSON serializations into linked data. This was exploited in an early version of this work [[?IPAW-POSTER]], which applied the JSON-LD approach to a JSON serialization of PROV. That solution did not lead to a natural encoding of the PROV data structures, because a property occurring in different types of JSON objects had to be named differently in each so that it could be mapped to the appropriate RDF property; what is natural in JSON is not necessarily natural in RDF, and vice versa. The ability to define contextual mappings was introduced in JSON-LD 1.1 [[JSON-LD11]] and is a key enabler of this specification: it allows the same natural PROV property names to be used in different contexts while still mapping each to the appropriate RDF property.

Thus, this specification proposes PROV-JSONLD, a PROV serialization compatible with [[PROV-DM]] that addresses all four of our key requirements. It is, first and foremost, a JSON structure supporting lightweight Web applications. It is structured so that each PROV expression is encoded as a self-contained JSON object and, therefore, is natural to JavaScript programmers. Exploiting JSON-LD 1.1, we defined contextual semantic mappings, allowing PROV-JSONLD to be seen as linked data. Finally, PROV-JSONLD allows for efficient processing, since each JSON object can be readily mapped to a data structure without requiring unbounded lookahead or search within the data structure.
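The self-contained nature of each expression can be sketched as follows. This is a minimal illustration in Python, not part of the specification: the identifiers in the ex namespace and the summarize helper are hypothetical, and the @context is omitted for brevity.

```python
import json

# Hypothetical PROV-JSONLD fragment; all "ex:" identifiers are invented.
DOC = json.loads("""
{
  "@graph": [
    {"@type": "Entity", "@id": "ex:report"},
    {"@type": "Activity", "@id": "ex:writing"},
    {"@type": "Generation", "entity": "ex:report", "activity": "ex:writing"}
  ]
}
""")

def summarize(expression):
    """Summarize one expression using only that object, never its siblings."""
    kind = expression["@type"]
    if kind in ("Entity", "Activity", "Agent"):
        return f"{kind} {expression['@id']}"
    # A relation carries all of its arguments as properties of the same object.
    args = {k: v for k, v in expression.items() if not k.startswith("@")}
    return f"{kind} {args}"

# Each element is handled in one pass, in document order.
summaries = [summarize(e) for e in DOC["@graph"]]
```

Because the Generation object names both its entity and its activity, a consumer can build the corresponding data structure as soon as that object has been read.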

In the rest of this document, we illustrate PROV-JSONLD, we characterize its structure using a JSON Schema [[JSON-SCHEMA]], we define its semantic mappings using JSON-LD 1.1, and we outline the interoperability testing we put in place to check its compatibility with the PROV data model.

Namespace

The following namespace prefixes are used throughout this document.

Table 1: Prefixes and namespaces used in this specification

prefix	namespace IRI	definition
prov	http://www.w3.org/ns/prov#	The PROV namespace [[PROV-DM]]
provext	https://openprovenance.org/ns/provext#	Extension namespace for PROV used in this specification
xsd	http://www.w3.org/2000/10/XMLSchema#	XML Schema Namespace [[XMLSCHEMA11-2]]
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#	The RDF namespace [[RDF-CONCEPTS]]
(others)	(various)	All other namespace prefixes are used in examples only. In particular, IRIs starting with "http://example.com" represent some application-dependent IRI [[RFC3987]]
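As an illustration of how these prefixes are used, the following Python sketch expands a prefixed name into a full IRI. The bindings are a subset of Table 1; the ex prefix and the expand helper are hypothetical and serve the example only.

```python
# Subset of the bindings in Table 1; "ex" is a hypothetical example prefix.
NAMESPACES = {
    "prov": "http://www.w3.org/ns/prov#",
    "provext": "https://openprovenance.org/ns/provext#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.com/",
}

def expand(qualified_name, namespaces=NAMESPACES):
    """Replace the prefix of a prefixed name with its namespace IRI."""
    prefix, colon, local = qualified_name.partition(":")
    if not colon or prefix not in namespaces:
        raise KeyError(f"no namespace bound for {qualified_name!r}")
    return namespaces[prefix] + local

entity_class = expand("prov:Entity")
example_iri = expand("ex:e1")
```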

Schema

This section provides an overview of the JSON schema [[JSON-SCHEMA]] for PROV-JSONLD; its full details are in Appendix A.

For each object property identified in the JSON schema, we provide the corresponding normative attribute definition in [[PROV-DM]].

Preliminary Definitions

Some primitive types occur in PROV serializations, namely DateTime and QualifiedName. We define their schemas as follows.

The production rules for qualified names are more complex than the simple regular expression outlined here. A post-processor will need to check that qualified names comply with the definition in [[PROV-N]].
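A first-pass check of this kind might look as follows in Python. The regular expression is an assumption made for this sketch (a prefix, a colon, and a non-empty local part), not the pattern used in the schema, and the date-time check relies on Python's ISO 8601 parser rather than on the XML Schema definition.

```python
import re
from datetime import datetime

# Assumed, deliberately loose pattern: prefix, colon, local part without whitespace.
QUALIFIED_NAME = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")

def looks_like_qualified_name(text):
    """Cheap syntactic screen; a post-processor still has to apply the full rules."""
    return QUALIFIED_NAME.match(text) is not None

def looks_like_datetime(text):
    """Accept strings with a date part, a 'T' separator, and a time part."""
    if "T" not in text:
        return False
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True
```

A pattern this loose accepts strings that a complete check may reject, which is why the post-processing step remains necessary.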

Typed values (typed_value) are JSON objects with properties @value and @type. String values are JSON objects with properties @value and @language.
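For illustration, the two shapes can be written and told apart as in the Python sketch below. The sample values, the xsd:int type, the en language tag, and the classify helper (including the string_value label it returns) are hypothetical.

```python
# Hypothetical sample values, written as Python dictionaries mirroring the JSON objects.
typed = {"@value": "42", "@type": "xsd:int"}
lang_string = {"@value": "A report", "@language": "en"}

def classify(value):
    """Tell the two object shapes apart by the property that accompanies @value."""
    if isinstance(value, dict) and "@value" in value:
        if "@type" in value:
            return "typed_value"
        if "@language" in value:
            return "string_value"
    return "other"

kinds = [classify(typed), classify(lang_string), classify("plain")]
```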

We also define general types for property values, which can be arrays of values (ArrayOfValues) or arrays of labels (ArrayOfLabelValues).

With these preliminary definitions in place, we can now present the specification of PROV-JSONLD's core data structures.

prov:Entity

In the Schema for prov:Entity, an entity MUST contain an identifier (property @id) and a property @type with value Entity. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM entity attributes). (The presence of a colon ":" in the patternProperties element forces all other properties to have the structure of a prefix, a colon, and a local name.)
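The constraints described above can be approximated by the following Python sketch. It is not the normative schema: the check_entity helper and the sample entity are hypothetical, and the set of unprefixed property names is limited to those named in the paragraph above, so the actual schema may permit others.

```python
# Unprefixed property names mentioned in the text; the real schema may allow more.
KNOWN = {"@id", "@type", "type", "location", "label"}

def check_entity(obj):
    """Return a list of problems found in a candidate entity object."""
    problems = []
    if "@id" not in obj:
        problems.append("missing @id")
    if obj.get("@type") != "Entity":
        problems.append("@type must be Entity")
    for key in obj:
        # Any other property has to look like prefix:localname.
        if key not in KNOWN and ":" not in key:
            problems.append(f"property {key!r} needs a prefix")
    return problems

# Hypothetical entity with a label and one application-specific attribute.
entity = {
    "@type": "Entity",
    "@id": "ex:report",
    "label": [{"@value": "Report", "@language": "en"}],
    "ex:version": "2",
}
```

Calling check_entity(entity) returns an empty list, whereas an object without @id, or with an unprefixed extra property, yields one message per problem.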

Schema for prov:Entity

prov:Activity

In the Schema for prov:Activity, an activity MUST contain an identifier (property @id) and a property @type with value Activity. It MAY contain a start time (property startTime, see PROV-DM startTime), an end time (property endTime, see PROV-DM endTime), further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM activity attributes).

Schema for prov:Activity

prov:Agent

In the Schema for prov:Agent, an agent MUST contain an identifier (property @id) and a property @type with value Agent. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM agent attributes).

Schema for prov:Agent

prov:Derivation

In the Schema for prov:Derivation, a derivation MUST contain a property @type with value Derivation. It SHOULD contain a generated entity (property generatedEntity, see PROV-DM generatedEntity) and used entity (property usedEntity, see PROV-DM usedEntity). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), a generation (property generation, see PROV-DM generation), a usage (property usage, see PROV-DM usage), further type information (property type, see PROV-DM prov:type), a label (property label), or other properties with an explicit prefix (see PROV-DM derivation attributes).
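The difference between the MUST and SHOULD requirements can be sketched in Python as follows: a violated MUST is reported as an error, a missing SHOULD property only as a warning. The check_derivation helper and the ex identifiers are hypothetical.

```python
def check_derivation(obj):
    """Separate hard errors (MUST) from advisory warnings (SHOULD)."""
    errors, warnings = [], []
    if obj.get("@type") != "Derivation":
        errors.append("@type must be Derivation")
    for name in ("generatedEntity", "usedEntity"):
        if name not in obj:
            warnings.append(f"{name} is recommended")
    return errors, warnings

# Hypothetical derivation; the identifier (@id) is optional and omitted here.
derivation = {
    "@type": "Derivation",
    "generatedEntity": "ex:report-v2",
    "usedEntity": "ex:report-v1",
    "activity": "ex:editing",
}

result = check_derivation(derivation)
```

The same split applies to the other relations in this section, each with its own pair of recommended properties.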

Schema for prov:Derivation

prov:Attribution

In the Schema for prov:Attribution, an attribution MUST contain a property @type with value Attribution. It SHOULD contain the entity that is the subject of the attribution (property entity, see PROV-DM entity) and the associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM attribution attributes).

Schema for prov:Attribution

prov:Association

In the Schema for prov:Association, an association MUST contain a property @type with value Association. It SHOULD contain an activity (property activity, see PROV-DM activity) and its associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), a plan (property plan, see PROV-DM plan), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM association attributes).

Schema for prov:Association

prov:Delegation

In the Schema for prov:Delegation, a delegation MUST contain a property @type with value Delegation. It SHOULD contain a delegate agent (property delegate, see PROV-DM delegate) and a responsible agent (property responsible, see PROV-DM responsible). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM delegation attributes).

Schema for prov:Delegation

prov:Usage

In the Schema for prov:Usage, a usage MUST contain a property @type with value Usage. It SHOULD contain an activity (property activity, see PROV-DM activity) and an entity (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM usage attributes).

Schema for prov:Usage

prov:Generation

In the Schema for prov:Generation, a generation MUST contain a property @type with value Generation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM generation attributes).

Schema for prov:Generation

prov:Invalidation

In the Schema for prov:Invalidation, an invalidation MUST contain a property @type with value Invalidation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM invalidation attributes).

Schema for prov:Invalidation

prov:Start

In the Schema for prov:Start, a start MUST contain a property @type with value Start. It SHOULD contain an activity that was started (property activity, see PROV-DM activity); it MAY contain a starter activity (property starter, see PROV-DM starter) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM start attributes).

Schema for prov:Start

prov:End

In the Schema for prov:End, an end MUST contain a property @type with value End. It SHOULD contain an activity that was ended (property activity, see PROV-DM activity); it MAY contain an ender activity (property ender, see PROV-DM ender) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM end attributes).

Schema for prov:End

prov:Communication

In the Schema for prov:Communication, a communication MUST contain a property @type with value Communication. It SHOULD contain an informed activity (property informed, see PROV-DM informed) and an informant activity (property informant, see PROV-DM informant). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM communication attributes).

Schema for prov:Communication

prov:Influence

In the Schema for prov:Influence, an influence MUST contain a property @type with value Influence. It SHOULD contain an influencee (property influencee, see PROV-DM influencee) and an influencer (property influencer, see PROV-DM influencer). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM influence attributes).

Schema for prov:Influence

prov:Specialization

In the Schema for prov:Specialization, a specialization MUST contain a property @type with value Specialization. It SHOULD contain a specific entity (property specificEntity, see PROV-DM specificEntity) and a general entity (property generalEntity, see PROV-DM generalEntity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)

Schema for prov:Specialization

prov:Alternate

In the Schema for prov:Alternate, an alternate MUST contain a property @type with value Alternate. It SHOULD contain a first alternate (property alternate1, see PROV-DM alternate1) and a second alternate (property alternate2, see PROV-DM alternate2). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)

Schema for prov:Alternate

prov:Membership

In the Schema for prov:Membership, a membership MUST contain a property @type with value Membership. It SHOULD contain a collection (property collection, see PROV-DM collection) and a single entity or an array of them (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)
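Since the entity property of a membership may hold either one entity or an array of them, a consumer will typically normalize it before use. A minimal Python sketch, with a hypothetical members helper and invented ex identifiers:

```python
def members(membership):
    """Return the member entities as a list, whichever form the document used."""
    value = membership.get("entity", [])
    return value if isinstance(value, list) else [value]

# Hypothetical memberships: one with a single entity, one with an array.
single = {"@type": "Membership", "collection": "ex:c1", "entity": "ex:e1"}
several = {"@type": "Membership", "collection": "ex:c1", "entity": ["ex:e1", "ex:e2"]}

single_members = members(single)
several_members = members(several)
```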

Schema for prov:Membership

prov:Bundle and prov:Document

In the Schemas for prov:Bundle and prov:Document, a bundle and a document MUST contain a property @type with value Bundle and Document, respectively, a context (property @context), and a set of PROV expressions (property @graph). The names of the properties @context and @graph are specified by JSON-LD [[JSON-LD11]]. In addition, a bundle MUST contain an identifier (property @id).

Schemas for prov:Bundle and prov:Document

Bundles contain statements (definition prov:Statement), whereas documents contain statements or bundles (definition prov:StatementOrBundle).

Finally, contexts are defined as follows. They take the shape of an array, containing either mappings of prefixes to URIs or URIs of further JSON-LD contexts.
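Putting these rules together, a document with a nested bundle might be checked as in the Python sketch below. The check_container helper, the ex identifiers, and the context URL http://example.com/context.jsonld are placeholders chosen for this illustration; they are not defined by this specification.

```python
def check_container(obj):
    """Check the structural rules stated above for a document or a bundle."""
    kind = obj.get("@type")
    if kind not in ("Document", "Bundle"):
        return [f"unexpected @type {kind!r}"]
    problems = []
    for key in ("@context", "@graph"):
        if key not in obj:
            problems.append(f"{kind} is missing {key}")
    if kind == "Bundle" and "@id" not in obj:
        problems.append("Bundle is missing @id")
    for statement in obj.get("@graph", []):
        if statement.get("@type") == "Bundle":
            if kind == "Bundle":
                # Bundles hold statements only; nesting is reserved for documents.
                problems.append("a bundle cannot contain another bundle")
            else:
                problems.extend(check_container(statement))
    return problems

# Hypothetical document: the context is an array of a prefix mapping and a context URL.
document = {
    "@context": [{"ex": "http://example.com/"}, "http://example.com/context.jsonld"],
    "@type": "Document",
    "@graph": [
        {"@type": "Entity", "@id": "ex:report"},
        {
            "@context": [{"ex": "http://example.com/"}],
            "@type": "Bundle",
            "@id": "ex:bundle1",
            "@graph": [{"@type": "Agent", "@id": "ex:alice"}],
        },
    ],
}

problems = check_container(document)
```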

JSON-LD Context

In this section, we describe the JSON-LD context that maps the PROV-JSONLD structures to linked data. Full details of the context can be found in Appendix B.

Introduction: Qualification Pattern

The PROV-O ontology [[PROV-O]] defines the Qualification Pattern, which restates a binary property between two resources (referred to as an unqualified influence relation) by using an intermediate class that represents the influence between the two resources. An instance of this class can, in turn, be annotated with additional descriptions of the influence that one resource had upon another. The following figure, borrowed from [[PROV-O]], summarises the PROV relations and how they are encoded in RDF using the Qualification Pattern. Note that the figure does not include the Qualification Pattern for Influence; in addition, PROV-O does not define a Qualification Pattern for specialization, alternate and membership.

Figure 2 (taken from [[PROV-O]]): Illustration of the properties and classes to use (in blue) to qualify the binary influence relations (dotted black). The diagram depicts entities as ovals, activities as rectangles, and agents as pentagons. The Qualified Resource is represented as a left-pointy shape: in PROV-JSONLD, a Qualified Resource is represented as a JSON Object.

Default and generic Context Elements

The following JSON properties have a default meaning, unless they are redefined in a specific context of a PROV-JSONLD document: entity, activity and agent respectively map to PROV-O object properties prov:entity, prov:activity and prov:agent.

The following JSON properties have the same meaning in all contexts of a PROV-JSONLD document: role, type, label and location respectively map to the RDF properties prov:hadRole, rdf:type, rdfs:label, and prov:atLocation.
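The precedence between generic terms, context-specific redefinitions, and defaults can be mimicked with plain dictionaries, as in the Python sketch below. This is an analogy only, not JSON-LD processing: the resolve helper is hypothetical, and the Usage redefinition of activity is taken from the Usage context described later in this section.

```python
# Defaults: may be redefined in a specific context.
DEFAULT_TERMS = {
    "entity": "prov:entity",
    "activity": "prov:activity",
    "agent": "prov:agent",
}

# Generic terms: same meaning in every context.
GENERIC_TERMS = {
    "role": "prov:hadRole",
    "type": "rdf:type",
    "label": "rdfs:label",
    "location": "prov:atLocation",
}

def resolve(term, scoped=None):
    """Look up a JSON property name: generic first, then scoped, then default."""
    if term in GENERIC_TERMS:
        return GENERIC_TERMS[term]
    if scoped and term in scoped:
        return scoped[term]
    return DEFAULT_TERMS.get(term)

# Redefinition applied inside a Usage object.
USAGE_SCOPE = {"activity": "prov:qualifiedUsage"}

in_usage = resolve("activity", USAGE_SCOPE)
by_default = resolve("activity")
```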

Entity

In the mapping Context for Entity, the JSON property value maps to PROV-O prov:value.

Context for Entity

Activity

In the mapping Context for Activity, the JSON properties startTime and endTime map to the RDF data properties prov:startedAtTime and prov:endedAtTime, respectively, and have a range of type xsd:dateTime.
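The effect of this mapping can be sketched in Python by listing the triples that an activity object would give rise to. The activity_triples helper and the sample activity are hypothetical; triples are modelled as tuples, typed literals as (lexical form, datatype) pairs, and the mapping of @type Activity to the class prov:Activity is assumed here.

```python
# JSON property name -> RDF data property (PROV-O names).
TIME_PROPERTIES = (
    ("startTime", "prov:startedAtTime"),
    ("endTime", "prov:endedAtTime"),
)

def activity_triples(activity):
    """List (subject, predicate, object) tuples for one activity object."""
    subject = activity["@id"]
    # Assumption: the value Activity of @type denotes the class prov:Activity.
    triples = [(subject, "rdf:type", "prov:Activity")]
    for json_name, rdf_name in TIME_PROPERTIES:
        if json_name in activity:
            literal = (activity[json_name], "xsd:dateTime")
            triples.append((subject, rdf_name, literal))
    return triples

# Hypothetical activity with both timestamps.
activity = {
    "@type": "Activity",
    "@id": "ex:writing",
    "startTime": "2012-03-31T09:21:00",
    "endTime": "2012-04-01T15:21:00",
}

triples = activity_triples(activity)
```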

Context for Activity

Agent

The mapping Context for Agent does not define further properties.

Context for Agent

Derivation

The mapping Context for Derivation supports the Qualification Pattern of Figure 2, g. Each of the JSON properties generatedEntity, usedEntity, activity, generation, and usage maps to an object property: namely, prov:qualifiedDerivation, prov:entity, prov:hadActivity, prov:hadGeneration, and prov:hadUsage, respectively.

Context for Derivation

Attribution

The mapping Context for Attribution supports the Qualification Pattern of Figure 2, i. The JSON property entity maps to the object property prov:qualifiedAttribution.

Context for Attribution

Association

The mapping Context for Association supports the Qualification Pattern of Figure 2, j. The JSON properties activity and plan map to the object properties prov:qualifiedAssociation and prov:hadPlan, respectively.

Context for Association

Delegation

The mapping Context for Delegation supports the Qualification Pattern of Figure 2, h. The JSON properties responsible, delegate and activity map to the object properties prov:agent, prov:qualifiedDelegation and prov:hadActivity, respectively.

Context for Delegation

Usage

The mapping Context for Usage supports the Qualification Pattern of Figure 2, a. The JSON properties activity and time map to the object property prov:qualifiedUsage and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
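To make the Qualification Pattern concrete, the Python sketch below lists the triples a usage object would correspond to. It assumes that the activity property is read in the reverse direction, with the activity as the subject of prov:qualifiedUsage and the usage as its object, which is how PROV-O orients that property; the usage_triples helper, the blank node label _:u1, and the ex identifiers are hypothetical.

```python
def usage_triples(usage, blank_node="_:u1"):
    """List (subject, predicate, object) tuples for one usage object."""
    # A usage without @id is represented by a blank node.
    node = usage.get("@id", blank_node)
    triples = [(node, "rdf:type", "prov:Usage")]
    if "activity" in usage:
        # Assumed reverse reading: the activity points at the qualified usage.
        triples.append((usage["activity"], "prov:qualifiedUsage", node))
    if "entity" in usage:
        # Default meaning of the entity property.
        triples.append((node, "prov:entity", usage["entity"]))
    if "time" in usage:
        triples.append((node, "prov:atTime", (usage["time"], "xsd:dateTime")))
    return triples

# Hypothetical usage: the writing activity used a dataset at a given time.
usage = {
    "@type": "Usage",
    "activity": "ex:writing",
    "entity": "ex:dataset",
    "time": "2012-03-31T09:30:00",
}

triples = usage_triples(usage)
```

Under these assumptions, the single JSON object yields the intermediate prov:Usage resource together with its links to the activity and the entity, matching pattern a of Figure 2.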

Context for Usage

Generation

The mapping Context for Generation supports the Qualification Pattern of Figure 2, b. The JSON properties entity and time map to the object property prov:qualifiedGeneration and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Generation

Invalidation

The mapping Context for Invalidation supports the Qualification Pattern of Figure 2, c. The JSON properties entity and time map to the object property prov:qualifiedInvalidation and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Invalidation

Start

The mapping Context for Start supports the Qualification Pattern of Figure 2, e. The JSON properties activity, trigger, starter, and time map to the object properties prov:qualifiedStart, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Start

End

The mapping Context for End supports the Qualification Pattern of Figure 2, f. The JSON properties activity, trigger, ender, and time map to the object properties prov:qualifiedEnd, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for End

Communication

The mapping Context for Communication supports the Qualification Pattern of Figure 2, d. The JSON properties informed and informant map to the object properties prov:qualifiedCommunication and prov:activity, respectively.

Context for Communication

Influence

In the mapping Context for Influence, the JSON properties influencee and influencer map to the object properties prov:qualifiedInfluence and prov:influencer, respectively.

Context for Influence

Specialization

While [[PROV-O]] does not define a Qualification Pattern for Specialization, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Specialization). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties specificEntity and generalEntity map to the object properties provext:qualifiedSpecialization and provext:generalEntity, respectively.

Context for Specialization

Alternate

While [[PROV-O]] does not define a Qualification Pattern for Alternate, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Alternate). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties alternate1 and alternate2 map to the object properties provext:qualifiedAlternate and provext:alternate, respectively.

Context for Alternate

Membership

While [[PROV-O]] does not define a Qualification Pattern for Membership, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Membership). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties collection and entity map to the object properties provext:qualifiedMembership and provext:collection, respectively.

Context for Membership
