Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about the data or thing's quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. This document specifies PROV-JSONLD, a serialization of PROV in JSON, which exploits JSON-LD to define a semantic mapping, so that it can also be processed as Linked Data. Overall, PROV-JSONLD is designed to be suitable for interchanging provenance in Web and Linked Data applications, to offer a natural encoding of provenance for its targeted audience, and to allow for fast processing.


Introduction

Since their release in 2013, the PROV Recommendations [[?PROV-OVERVIEW]] by the World Wide Web Consortium (W3C) have started being adopted by flagship deployments such as the Global Change Information System, the Gazette in the UK, and other Linked Data sets. PROV, which is used as the data model to describe the provenance of data, is made available in several different representations: PROV-N [[!PROV-N]], PROV-XML [[PROV-XML]], or in an RDF serialization using the PROV Ontology [[!PROV-O]]. The latter, arguably, is most suitable for Linked Data [[LINKED-DATA]], given that it can readily be consumed by existing Semantic Web tools and comes with the semantic grounding provided by PROV-O [[!PROV-O]].

Surprisingly, the PROV-JSON [[?PROV-JSON]] serialization has also gained traction, despite being only a W3C Member Submission that has not gone through the stages of a standardization activity. We conjecture that the primary reason is that many web applications are built to be lightweight and work mainly with simple data formats such as JSON [[RFC8259]].

The very existence of all these serializations is a testament to the approach to standardization taken by the Provenance Working Group, by which a conceptual data model for PROV was defined, the PROV data model [[!PROV-DM]], alongside its mapping to different technologies, to suit users and developers. However, the family of PROV specifications lacks a serialization that is capable of addressing simultaneously all of the following requirements.

  1. [Lightweight] A serialization MUST support lightweight Web applications.
  2. [Natural] A serialization MUST look natural to its targeted community of users.
  3. [Semantic] A serialization MUST allow for semantic markup and integration with linked data applications.
  4. [Efficient] A serialization MUST be efficiently processable.

None of the existing PROV serializations supports all these requirements simultaneously. While PROV-JSON is the only serialization to support lightweight web applications, it does not have any semantic markup, its internal structure does not exhibit the natural structure of the PROV data structures, and its grouping of expressions by category (e.g. all entities, all activities, ...) is not conducive to incremental processing. The RDF serialization compatible with PROV-O has been architected to be natural to the Semantic Web community: all influence relations have been given the same directionality with respect to their time ordering. However, the decomposition of data structures (essentially n-ary relations) into individual triples, which can occur anywhere in the serialization, is not conducive to efficient parsing. XML is no longer the format of choice for lightweight Web applications, while the PROV-N notation was aimed at humans rather than efficient processing.

JSON-LD [[!JSON-LD]] allows a semantic structure to be overlaid on a JSON structure [[RFC8259]], thereby enabling the conversion of JSON serializations into linked data. This was exploited in an early version of this work [[?IPAW-POSTER]], which applied the JSON-LD approach to a JSON serialization of PROV. That solution did not lead to a natural encoding of the PROV data structure, because a property occurring in different types of JSON objects had to be named differently, so that it could be mapped to the appropriate RDF property; we see here that what is natural in JSON is not necessarily natural in RDF, and vice versa. The ability to define contextual mappings was introduced in JSON-LD 1.1 [[!JSON-LD11]] and is a key enabler of this work, allowing the same natural PROV property names to be used in different contexts while still maintaining their correct mappings to the appropriate RDF properties.

Thus, this specification proposes PROV-JSONLD, a serialization of PROV that is compatible with [[!PROV-DM]] and that addresses all four requirements above. It is first and foremost a JSON structure, so it supports lightweight Web applications. It is structured in such a way that each PROV expression is encoded as a self-contained JSON object, and it is therefore natural to JavaScript programmers. Exploiting JSON-LD 1.1, we define contextual semantic mappings, allowing PROV-JSONLD to be seen as linked data. Finally, PROV-JSONLD allows for efficient processing, since each JSON object can be readily mapped to a data structure, without requiring unbounded lookahead or search within the data structure.

In the rest of this specification, we provide an illustration of PROV-JSONLD, we then define its structure by means of a JSON Schema [[!JSON-SCHEMA]], we define its semantic mappings using JSON-LD 1.1, and we outline the interoperability testing we put in place to check its compatibility with the PROV data model.

Namespace

The following namespace prefixes are used throughout this document.

Table 1: Prefixes and namespaces used in this specification

prefix     namespace IRI                                  definition
prov       http://www.w3.org/ns/prov#                     The PROV namespace [[PROV-DM]]
provext    https://openprovenance.org/ns/provext#         Extension namespace for PROV used in this specification
xsd        http://www.w3.org/2001/XMLSchema#              XML Schema Namespace [[XMLSCHEMA11-2]]
rdf        http://www.w3.org/1999/02/22-rdf-syntax-ns#    The RDF namespace [[RDF-CONCEPTS]]
(others)   (various)                                      All other namespace prefixes are used in examples only. In particular, IRIs starting with "http://example.com" represent some application-dependent IRI [[RFC3987]]

Example

We assume the reader to be familiar with PROV, JSON, and JSON-LD.

To illustrate the PROV-JSONLD serialization, we consider a subset of the example of [[PROV-PRIMER]], depicted below. It can be paraphrased as follows: agent Derek was responsible for composing an article based on an existing dataset.

[Figure: provenance graph. Nodes: http://example/compose (compose), http://example/dataSet1 (dataSet1), http://example/derek (derek), http://example/article1 (article1). Edges: compose to dataSet1 (use), compose to derek (assoc), article1 to compose (gen), article1 to dataSet1 (der). Attributes on article1: title: Crime rises in cities@EN. Attributes on derek: type: prov:Person, mbox: <mailto:derek@example.org>, givenName: Derek.]
Figure 1: Provenance expressing that Derek was responsible for composing an article based on a data set.

The PROV-JSONLD representation of this example can be seen in Example 1. At the top-level, a PROV-JSONLD document is a JSON object with two properties @context and @graph, as per JSON-LD. A context contains mappings of prefixes to namespaces, and also an explicit reference to https://openprovenance.org/prov-jsonld/context.json — the JSON-LD 1.1 context defining the semantic mapping for PROV-JSONLD. (This context is fully described in section 5.) The @graph property has an array of PROV expressions as value. Each PROV expression is itself a JSON object with at least a @type property (for instance, prov:Entity, prov:Agent or prov:Derivation). Each of these PROV expressions provides a description for a resource, some of which are identified by the @id property (for instance, ex:article1 or ex:derek). Some of the resources are anonymous and therefore do not have a property @id, for instance, the prov:Derivation between the dataset and the article.

PROV expressions can be enriched with a variety of properties. Some are "reserved", such as activity and agent in a prov:Association. Others may be defined in a different namespace, such as foaf:givenName, for which we expect the prefix foaf to be declared in the @context property. Finally, further PROV attributes are allowed, for instance prov:type with an array of further types, to better describe the resource.

The property @type is mandatory and is associated with a single value, expected to be one of the predefined PROV expressions. From an efficiency viewpoint, this property is critical in determining which internal data structure a PROV expression should map to, and it therefore facilitates efficient processing. By contrast, prov:type is optional and can contain as many types as required; their order is not significant.
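As a rough illustration of the structure described in this section, the following Python sketch builds a small document for the Derek scenario and groups its expressions by @type in a single pass. It is not a reproduction of Example 1: the bindings chosen for the prefixes ex and foaf, the use of plain strings inside prov:type, and the helper name group_by_type are assumptions made for this sketch.

```python
import json
from collections import defaultdict

# Illustrative document only; prefix bindings and the shape of the
# prov:type values are assumptions, not taken from Example 1.
document = {
    "@context": [
        {"ex": "http://example/", "foaf": "http://xmlns.com/foaf/0.1/"},
        "https://openprovenance.org/prov-jsonld/context.json",
    ],
    "@graph": [
        {"@type": "prov:Entity", "@id": "ex:dataSet1"},
        {"@type": "prov:Entity", "@id": "ex:article1"},
        {"@type": "prov:Activity", "@id": "ex:compose"},
        {"@type": "prov:Agent", "@id": "ex:derek",
         "foaf:givenName": "Derek", "prov:type": ["prov:Person"]},
        {"@type": "prov:Usage", "activity": "ex:compose", "entity": "ex:dataSet1"},
        {"@type": "prov:Generation", "entity": "ex:article1", "activity": "ex:compose"},
        {"@type": "prov:Association", "activity": "ex:compose", "agent": "ex:derek"},
        # Anonymous resource: no @id property.
        {"@type": "prov:Derivation",
         "generatedEntity": "ex:article1", "usedEntity": "ex:dataSet1"},
    ],
}


def group_by_type(doc):
    """One pass over @graph; the @type value alone decides the bucket."""
    buckets = defaultdict(list)
    for expression in doc["@graph"]:
        buckets[expression["@type"]].append(expression)
    return dict(buckets)


# Serialize and parse again to show that the structure is plain JSON.
groups = group_by_type(json.loads(json.dumps(document)))
```

Because every expression carries its own @type, a consumer in this style never needs to look ahead or search elsewhere in the document before choosing a target data structure.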

      

Schema

In this section, we provide an overview of the JSON schema [[!JSON-SCHEMA]] for PROV-JSONLD; its full details can be found in Appendix A.

Preliminary Definitions

Some primitive types occur in PROV serializations, namely DateTime and QualifiedName. We define their schemas as follows.


	  
	  
	
The production rules for qualified names are more complex than the simple regular expression outlined here. A post-processor will need to check that qualified names comply with the definition in [[PROV-N]].
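To make the idea of a first-stage check concrete, here is a minimal Python sketch that splits a candidate qualified name into prefix and local part. The pattern QNAME_PATTERN is a deliberately loose placeholder assumed for this sketch; it is not the regular expression used in the schema, and it does not implement the PROV-N production rules.

```python
import re

# Hypothetical, deliberately loose pattern: a non-empty prefix, a colon,
# and a non-empty local part without whitespace.
QNAME_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z_][\w.\-]*):(?P<local>\S+)$")


def split_qualified_name(text):
    """Return (prefix, local), or None when the loose pattern does not match."""
    match = QNAME_PATTERN.match(text)
    if match is None:
        return None
    return match.group("prefix"), match.group("local")
```

A string accepted by such a pattern would still need to be checked by a post-processor against the definition in [[PROV-N]].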

Typed values (typed_value) are JSON objects with properties @value and @type. String values are JSON objects with properties @value and @language.
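The two shapes can be told apart by their property names alone, as the Python sketch below suggests. It assumes that both properties of each shape are present and that no other property occurs; the schema may be more permissive. The sample values, including the datatype xsd:int, are illustrative.

```python
def is_typed_value(obj):
    """True for objects shaped like {"@value": ..., "@type": ...}."""
    return isinstance(obj, dict) and set(obj) == {"@value", "@type"}


def is_lang_string(obj):
    """True for objects shaped like {"@value": ..., "@language": ...}."""
    return isinstance(obj, dict) and set(obj) == {"@value", "@language"}


# Sample values: the title comes from Figure 1, the count is made up.
title = {"@value": "Crime rises in cities", "@language": "EN"}
count = {"@value": "42", "@type": "xsd:int"}
```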


	  
	  
	

We also define general types for property values, which can be arrays of values ArrayOfValues or arrays of labels ArrayOfLabelValues.


	

With these preliminary definitions in place, we can now present the specification of the core data structures of PROV-JSONLD.

prov:Entity

An entity MUST contain an identifier (property @id) and a property @type with value prov:Entity. It MAY contain further type information (property prov:type), a location (property prov:location), a label (property prov:label), or other properties with an explicit prefix. (The presence of a colon ":" in the patternProperties element forces all other properties to have the structure of a prefix, a colon, and a local name.)
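The constraints above can be approximated without a JSON Schema validator. The Python sketch below is such an approximation, not the schema itself: the function name check_entity and the wording of the messages are assumptions, and values are not inspected.

```python
RESERVED = {"@id", "@type", "prov:type", "prov:location", "prov:label"}


def check_entity(obj):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if "@id" not in obj:
        problems.append("missing @id")
    if obj.get("@type") != "prov:Entity":
        problems.append("@type must be prov:Entity")
    for name in obj:
        if name in RESERVED:
            continue
        # Any other property needs a prefix, a colon and a local name.
        prefix, colon, local = name.partition(":")
        if not (prefix and colon and local):
            problems.append(f"property without prefix: {name}")
    return problems
```

For instance, an object with a bare title property is reported, whereas the same property written with a declared prefix (say ex:title) passes.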

Schema for prov:Entity

	

prov:Activity

An activity MUST contain an identifier (property @id) and a property @type with value prov:Activity. It MAY contain a start time (property startTime), an end time (property endTime), further type information (property prov:type), a location (property prov:location), a label (property prov:label), or other properties with an explicit prefix.
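The following Python sketch shows one way such an object might be mapped to an internal data structure. The class Activity, the function activity_from_json and the timestamp are assumptions made for illustration. Note that datetime.fromisoformat accepts only a subset of the xsd:dateTime lexical forms (older Python versions reject a trailing Z, for example), so a production parser would need a more complete routine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Activity:
    id: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def activity_from_json(obj):
    """Build an Activity; startTime and endTime are optional."""
    if obj.get("@type") != "prov:Activity" or "@id" not in obj:
        raise ValueError("an activity needs @id and @type prov:Activity")

    def parse(name):
        # Absent properties simply map to None.
        return datetime.fromisoformat(obj[name]) if name in obj else None

    return Activity(obj["@id"], parse("startTime"), parse("endTime"))


compose = activity_from_json({
    "@type": "prov:Activity",
    "@id": "ex:compose",
    "startTime": "2012-03-31T09:21:00",  # illustrative timestamp
})
```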

Schema for prov:Activity

	

prov:Agent

An agent MUST contain an identifier (property @id) and a property @type with value prov:Agent. It MAY contain further type information (property prov:type), a location (property prov:location), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Agent

	

prov:Derivation

A derivation MUST contain a property @type with value prov:Derivation. It SHOULD contain a generated entity (property generatedEntity) and used entity (property usedEntity). It MAY contain an identifier (property @id), an activity (property activity), a generation (property generation), a usage (property usage), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.
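The difference between the MUST and SHOULD levels can be reflected in a checker that separates errors from warnings. The Python sketch below assumes this two-level reporting; the function name and messages are hypothetical, and the optional (MAY) properties are not examined.

```python
def check_derivation(obj):
    """Return (errors, warnings) for a candidate derivation object."""
    errors, warnings = [], []
    if obj.get("@type") != "prov:Derivation":
        errors.append("@type must be prov:Derivation")   # MUST
    for name in ("generatedEntity", "usedEntity"):
        if name not in obj:
            warnings.append(f"{name} is recommended")    # SHOULD
    return errors, warnings
```

Under this reading, a derivation lacking usedEntity is still processable and only produces a warning, which matches the tolerance for incomplete provenance discussed in the interoperability considerations.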

Schema for prov:Derivation

	

prov:Attribution

An attribution MUST contain a property @type with value prov:Attribution. It SHOULD contain the entity that is the subject of the attribution (property entity) and the associated agent (property agent). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Attribution

	

prov:Association

An association MUST contain a property @type with value prov:Association. It SHOULD contain an activity (property activity) and its associated agent (property agent). It MAY contain an identifier (property @id), a plan (property plan), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Association

	

prov:Delegation

A delegation MUST contain a property @type with value prov:Delegation. It SHOULD contain a delegate agent (property delegate) and a responsible agent (property responsible). It MAY contain an identifier (property @id), an activity (property activity), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Delegation

	

prov:Usage

A usage MUST contain a property @type with value prov:Usage. It SHOULD contain an activity (property activity) and an entity (property entity). It MAY contain an identifier (property @id), a time (property time), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Usage

	

prov:Generation

A generation MUST contain a property @type with value prov:Generation. It SHOULD contain an entity (property entity) and an activity (property activity). It MAY contain an identifier (property @id), a time (property time), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Generation

	

prov:Invalidation

An invalidation MUST contain a property @type with value prov:Invalidation. It SHOULD contain an entity (property entity) and an activity (property activity). It MAY contain an identifier (property @id), a time (property time), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Invalidation

	

prov:Start

A start MUST contain a property @type with value prov:Start. It SHOULD contain an activity that was started (property activity); it MAY contain a starter activity (property starter) and a triggering entity (property trigger). It MAY also contain an identifier (property @id), a time (property time), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Start

	

prov:End

An end MUST contain a property @type with value prov:End. It SHOULD contain an activity that was ended (property activity); it MAY contain an ender activity (property ender) and a triggering entity (property trigger). It MAY also contain an identifier (property @id), a time (property time), a location (property location), a role (property role), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:End

	

prov:Communication

A communication MUST contain a property @type with value prov:Communication. It SHOULD contain an informed activity (property informed) and an informant activity (property informant). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Communication

	

prov:Influence

An influence MUST contain a property @type with value prov:Influence. It SHOULD contain an influencee (property influencee) and an influencer (property influencer). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Influence

	

prov:Specialization

A specialization MUST contain a property @type with value prov:Specialization. It SHOULD contain a specific entity (property specificEntity) and a general entity (property generalEntity). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Specialization

	

prov:Alternate

An alternate MUST contain a property @type with value prov:Alternate. It SHOULD contain a first alternate (property alternate1) and a second alternate (property alternate2). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.

Schema for prov:Alternate

	

prov:Membership

A membership MUST contain a property @type with value prov:Membership. It SHOULD contain a collection (property collection) and a single entity or an array of them (property entity). It MAY contain an identifier (property @id), further type information (property prov:type), a label (property prov:label), or other properties with an explicit prefix.
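Since the property entity may hold either a single entity or an array, consumers will typically normalize it before use. A minimal Python sketch, with a hypothetical helper name and made-up identifiers:

```python
def member_entities(membership):
    """Return the members as a list, whether entity was a single value or an array."""
    value = membership.get("entity", [])
    return list(value) if isinstance(value, list) else [value]


single = {"@type": "prov:Membership", "collection": "ex:c1", "entity": "ex:e1"}
several = {"@type": "prov:Membership", "collection": "ex:c1",
           "entity": ["ex:e1", "ex:e2"]}
```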

Schema for prov:Membership

	  
	  
	

prov:Bundle and prov:Document

A bundle and a document MUST contain a property @type with value prov:Bundle and prov:Document, respectively, a context (property @context), and a set of PROV expressions (property @graph). The names of the properties @context and @graph are specified by JSON-LD [[!JSON-LD11]]. In addition, a bundle MUST contain an identifier (property @id).

Schemas for prov:Bundle and prov:Document

	  

	

Bundles contain statements (definition prov:Statement), whereas documents contain statements or bundles (definition prov:StatementOrBundle).
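The nesting rule can be checked recursively. The Python sketch below is an assumption-laden approximation: check_nesting is a hypothetical helper, it only looks at the @type and @id of containers, and it does not validate the statements themselves.

```python
def check_nesting(container):
    """Bundles may hold statements only; documents may hold statements or bundles."""
    problems = []
    is_bundle = container.get("@type") == "prov:Bundle"
    if is_bundle and "@id" not in container:
        problems.append("bundle without @id")
    for item in container.get("@graph", []):
        if item.get("@type") == "prov:Bundle":
            if is_bundle:
                problems.append("bundle nested inside a bundle")
            else:
                # A bundle inside a document is allowed; check its own content.
                problems.extend(check_nesting(item))
    return problems


good = {"@type": "prov:Document", "@context": [], "@graph": [
    {"@type": "prov:Bundle", "@id": "ex:b1", "@context": [], "@graph": [
        {"@type": "prov:Entity", "@id": "ex:e1"}]}]}
bad = {"@type": "prov:Bundle", "@id": "ex:b2", "@context": [], "@graph": [
    {"@type": "prov:Bundle", "@id": "ex:b3", "@context": [], "@graph": []}]}
```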


	  

	

Finally, contexts are defined as follows. They take the shape of an array, containing either mappings of prefixes to URIs or URIs to further JSON-LD contexts.
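A structural check of this shape might look as follows in Python. The helper name check_context is hypothetical, and the sketch assumes that prefix mappings have string values only; it does not verify that the strings are well-formed IRIs.

```python
def check_context(context):
    """Each item must be a prefix-to-IRI mapping or a string naming another context."""
    if not isinstance(context, list):
        return ["@context must be an array"]
    problems = []
    for index, item in enumerate(context):
        if isinstance(item, str):
            continue  # URI of a further JSON-LD context
        if isinstance(item, dict) and all(isinstance(v, str) for v in item.values()):
            continue  # mapping of prefixes to URIs
        problems.append(f"item {index} is neither a mapping nor an IRI string")
    return problems


sample_context = [
    {"ex": "http://example/"},
    "https://openprovenance.org/prov-jsonld/context.json",
]
```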


	

JSON-LD Context

In this section, we provide a description of the JSON-LD context that maps the PROV-JSONLD structures to linked data. Full details of the context can be found in Appendix B.

Introduction: Qualification Pattern

The Ontology PROV-O [[PROV-O]] defines the Qualification Pattern, which restates a binary property between two resources (referred to as an unqualified influence relation) by using an intermediate class that represents the influence between two resources. This new instance, in turn, can be annotated with additional descriptions of the influence that one resource had upon another. The following figure, borrowed from [[PROV-O]], summarises the PROV relations, and how they are encoded in RDF using the Qualification Pattern. Note that the figure does not include the Qualification Pattern for Influence; in addition, PROV-O does not define the Qualification Pattern for specialization, alternate and membership.

[Figure: diagram with panels a) to j) showing, for Usage, Generation, Invalidation, Communication, Start, End, Derivation, Delegation, Attribution and Association respectively, the unqualified property (for example prov:used, prov:wasGeneratedBy, prov:wasDerivedFrom) alongside its qualified form (for example prov:qualifiedUsage with prov:entity and prov:atTime; prov:qualifiedDerivation with prov:entity, prov:hadActivity, prov:hadUsage and prov:hadGeneration; prov:qualifiedAssociation with prov:agent, prov:hadRole and prov:hadPlan).]
Figure 2: Illustration of the properties and classes to use (in blue) to qualify the binary influence relations (dotted black). The diagram depicts entities as ovals, activities as rectangles, and agents as pentagons. The Qualified Resource is represented as a [what's that shape!!!].
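To make the Qualification Pattern concrete, the Python sketch below lists the triples that PROV-O associates with a usage: the unqualified relation and its qualified restatement. The triples are written as plain tuples of compact names, the blank node label _:u1 and the function name are assumptions, and the sketch does not claim to reproduce what a JSON-LD processor emits for a PROV-JSONLD document.

```python
def usage_triples(activity, entity, time=None, node="_:u1"):
    """Unqualified triple plus its qualified restatement for a usage."""
    triples = [
        (activity, "prov:used", entity),            # unqualified influence relation
        (activity, "prov:qualifiedUsage", node),    # link to the intermediate instance
        (node, "rdf:type", "prov:Usage"),
        (node, "prov:entity", entity),
    ]
    if time is not None:
        # Additional description attached to the influence itself.
        triples.append((node, "prov:atTime", time))
    return triples
```

Calling usage_triples("ex:compose", "ex:dataSet1") yields four triples; supplying a time adds a fifth, attached to the intermediate node rather than to the activity or the entity.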

Default and generic Context Elements

The following JSON properties have a default meaning, unless they are redefined in a specific context of a PROV-JSONLD document: entity, activity and agent respectively map to PROV-O object properties prov:entity, prov:activity and prov:agent.

The following JSON properties have the same meaning in all contexts of a PROV-JSONLD document: prov:role, prov:type, prov:label and prov:location respectively map to the RDF properties prov:hadRole, rdf:type, rdfs:label, and prov:atLocation.
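The interplay between default, generic and type-specific mappings can be pictured as a layered lookup. The Python sketch below is only a model of that lookup, not a JSON-LD context or processor: the table names and the function resolve are assumptions, only two type-specific entries (taken from the subsections on prov:Usage and prov:Generation) are shown, and the direction of each property is ignored.

```python
# Default meanings, which a specific context may redefine.
DEFAULTS = {
    "entity": "prov:entity",
    "activity": "prov:activity",
    "agent": "prov:agent",
}
# Properties with the same meaning in all contexts.
GENERIC = {
    "prov:role": "prov:hadRole",
    "prov:type": "rdf:type",
    "prov:label": "rdfs:label",
    "prov:location": "prov:atLocation",
}
# Type-specific redefinitions; only two types are listed in this sketch.
SCOPED = {
    "prov:Usage": {"activity": "prov:qualifiedUsage", "time": "prov:atTime"},
    "prov:Generation": {"entity": "prov:qualifiedGeneration", "time": "prov:atTime"},
}


def resolve(expression_type, name):
    """Look up the RDF property for a JSON property within a given @type."""
    if name in GENERIC:
        return GENERIC[name]
    scoped = SCOPED.get(expression_type, {})
    return scoped.get(name, DEFAULTS.get(name))
```

In this model, activity resolves to prov:qualifiedUsage inside a prov:Usage but keeps its default meaning prov:activity inside a prov:Generation, which is the behaviour that contextual mappings make possible.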


	

prov:Entity

There is no contextual definition that is specific to entities.

prov:Activity

The JSON properties startTime and endTime map to the RDF data properties prov:startedAtTime and prov:endedAtTime, respectively, and have a range of type xsd:dateTime.

Context for prov:Activity

	

prov:Agent

There is no contextual definition that is specific to agents.

prov:Derivation

The mapping below supports the Qualification Pattern of Figure 2, g. Each of the JSON properties generatedEntity, usedEntity, activity, generation, and usage maps to an object property: namely, prov:qualifiedDerivation, prov:entity, prov:hadActivity, prov:hadGeneration, and prov:hadUsage, respectively.

Context for prov:Derivation

	

prov:Attribution

The mapping below supports the Qualification Pattern of Figure 2, i. The JSON property entity maps to the object property prov:qualifiedAttribution.

Context for prov:Attribution

	

prov:Association

The mapping below supports the Qualification Pattern of Figure 2, j. The JSON properties activity and plan map to the object properties prov:qualifiedAssociation and prov:hadPlan, respectively.

Context for prov:Association

	

prov:Delegation

The mapping below supports the Qualification Pattern of Figure 2, h. The JSON properties responsible, delegate and activity map to the object properties prov:agent, prov:qualifiedDelegation and prov:hadActivity, respectively.

Context for prov:Delegation

	

prov:Usage

The mapping below supports the Qualification Pattern of Figure 2, a. The JSON properties activity and time map to the object property prov:qualifiedUsage and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for prov:Usage

	

prov:Generation

The mapping below supports the Qualification Pattern of Figure 2, b. The JSON properties entity and time map to the object property prov:qualifiedGeneration and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for prov:Generation

	

prov:Invalidation

The mapping below supports the Qualification Pattern of Figure 2, c. The JSON properties entity and time map to the object property prov:qualifiedInvalidation and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for prov:Invalidation

	

prov:Start

The mapping below supports the Qualification Pattern of Figure 2, e. The JSON properties activity, trigger, starter, and time map to the object properties prov:qualifiedStart, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for prov:Start

	

prov:End

The mapping below supports the Qualification Pattern of Figure 2, f. The JSON properties activity, trigger, ender, and time map to the object properties prov:qualifiedEnd, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for prov:End

	

prov:Communication

The mapping below supports the Qualification Pattern of Figure 2, d. The JSON properties informed and informant map to the object properties prov:qualifiedCommunication and prov:activity, respectively.

Context for prov:Communication

	

prov:Influence

The JSON properties influencee and influencer map to the object properties prov:qualifiedInfluence and prov:influencer, respectively.

Context for prov:Influence

	

prov:Specialization

While [[PROV-O]] does not define a Qualification Pattern for Specialization, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern. However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties specificEntity and generalEntity map to the object properties provext:qualifiedSpecialization and prov:entity, respectively.

Context for prov:Specialization

	

prov:Alternate

While [[PROV-O]] does not define a Qualification Pattern for Alternate, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern. However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties alternate1 and alternate2 map to the object properties provext:qualifiedAlternate and prov:entity, respectively.

Context for prov:Alternate

	

prov:Membership

While [[PROV-O]] does not define a Qualification Pattern for Membership, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern. However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties collection and entity map to the object properties provext:qualifiedMembership and prov:entity, respectively.

Context for prov:Membership

	

Interoperability Considerations

IC1:
There are differences between PROV-DM and PROV-O in terms of the level of requirements set on some expressions. For instance, PROV-DM mandates the presence of an entity in a generation, whereas it defines an activity as optional. Compliance requirements are not the same in PROV-O as one could define a qualified generation with an activity but without an entity. Experience shows that there may be good reasons why a generation may not refer to an entity; for instance, because the recorded provenance is not "complete" yet, and further provenance expressions still need to be asserted, received or merged; in the meantime, we still want to be able to process such provenance, despite being "incomplete". Thus, in PROV-JSONLD, the presence of an entity and an activity in a generation expression is RECOMMENDED (we use the term SHOULD), while other properties are optional (we use the term MAY), and its @type is REQUIRED (we use the term MUST).
IC2:
In PROV-DM, all relations are n-ary except for specialization, alternate and membership, which are binary, meaning that no identifier or extra properties are allowed for these. In PROV-O, this design decision translates to the lack of qualified relations for specialization, alternate and membership. In PROV-JSONLD, in order to keep the regular structure of JSON objects and the natural encoding of relations, but also to ensure the simplicity and efficiency of parsers, these three relations are encoded using the same pattern as for other relations. Therefore, their mapping to RDF via the JSON-LD context relies on a PROV extension namespace (denoted by the prefix provext) in which classes for Specialization, Alternate and Membership are defined. The PROV-JSONLD serialization also allows identifiers and properties to be encoded for these relations.
IC3:
The notion of a PROV document is not present in PROV-DM or PROV-O, but is introduced in PROV-N as a housekeeping construct, and is defined in PROV-XML as the root of a PROV-XML document. A document in PROV-JSONLD is also a JSON object, allowing for a JSON-LD @context property to be specified.
IC4:
The PROV-JSONLD specification does not introduce constructs for some PROV subtypes, such as prov:Person, prov:Organization, prov:SoftwareAgent and prov:Collection, or subrelations, such as prov:Quotation, prov:PrimarySource and prov:Revision. Instead, the example of Section 3 illustrates how they can be accommodated within the existing structures. We copy below an agent expression of type prov:Person and a derivation of type prov:Revision. These subtypes and subrelations are specified inside the prov:type property. PROV-XML offers a similar way of encoding such subtypes and subrelations, alongside specialized structures. We opted for this single approach to ensure simplicity and efficiency of parsers.

	
IC5:
The interoperability of the PROV-JSONLD serialization can be tested in different ways:
  1. In a first round-trip test, an internal representation in some programming language is serialized to PROV-JSONLD and then deserialized from PROV-JSONLD back to the same programming language; the source and target representations are expected to be equal.
  2. In a second round-trip test, an internal representation in some programming language is serialized to PROV-JSONLD, the PROV-JSONLD is converted to another RDF representation such as Turtle, and the Turtle representation is read back into the same programming language; the source and target representations are also expected to be equal.
  3. Both interoperability tests have been implemented in the Java-based ProvToolbox:
    1. The first round-trip test is implemented in https://github.com/lucmoreau/ProvToolbox/blob/master/modules-core/prov-jsonld/src/test/java/org/openprovenance/prov/core/RoundTripFromJavaJSONLD11Test.java
    2. The second round-trip test is implemented in https://github.com/lucmoreau/ProvToolbox/blob/master/modules-legacy/roundtrip/src/test/java/org/openprovenance/prov/core/roundtrip/RoundTripFromJavaJSONLD11LegacyTest.java
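
The first kind of round-trip test can be sketched in a few lines of Python. This is not the ProvToolbox implementation, which is written in Java: the class Usage and the functions to_jsonld, from_jsonld and roundtrip are hypothetical, and only a single expression type with two properties is covered.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical internal representation of a usage."""
    activity: str
    entity: str


def to_jsonld(usage):
    """Serialize the internal representation to a PROV-JSONLD style object."""
    return {"@type": "prov:Usage", "activity": usage.activity, "entity": usage.entity}


def from_jsonld(obj):
    """Deserialize; the @type value selects the target data structure."""
    if obj.get("@type") != "prov:Usage":
        raise ValueError("not a usage")
    return Usage(obj["activity"], obj["entity"])


def roundtrip(usage):
    """Serialize to JSON text, parse it again, and rebuild the representation."""
    text = json.dumps(to_jsonld(usage))
    return from_jsonld(json.loads(text))


source = Usage("ex:compose", "ex:dataSet1")
```

The test passes when roundtrip(source) equals source; the second kind of test would insert a conversion to and from an RDF syntax between the two steps, which requires an RDF library and is not shown here.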

JSON Schema for PROV-JSONLD

      

JSON-LD Context