Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about the data or thing's quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. This document specifies PROV-JSONLD, a serialization of PROV in JSON, which exploits JSON-LD to define a semantic mapping so it can also be processed as Linked Data. Overall, PROV-JSONLD is designed to be suitable for interchanging provenance in Web and Linked Data applications, to offer a natural encoding of provenance for its targeted audience, and to allow for fast processing.
A member submission by King's College London
Since their release in 2013, the PROV Recommendations [[?PROV-OVERVIEW]] by the World Wide Web Consortium (W3C) have been adopted by flagship deployments such as the Global Change Information System, the Gazette in the UK, and other Linked Data sets. PROV, which is used as the data model to describe the provenance of data, is made available in several different representations: PROV-N [[PROV-N]], PROV-XML [[PROV-XML]], or in an RDF serialization using the PROV Ontology [[PROV-O]]. The latter is most suitable for Linked Data [[LINKED-DATA]], given that it can readily be consumed by existing Semantic Web tools and comes with the semantic grounding provided by PROV-O [[PROV-O]].
Subsequently, the PROV-JSON [[?PROV-JSON]] serialization has gained traction, despite being only a member submission that has not gone through the stages of a standardization activity. We conjecture that the primary reason for this is that many web applications are built to be lightweight, working mainly with simple data formats such as JSON [[RFC8259]].
The very existence of all these serializations is a testament to the approach to standardization taken by the Provenance Working Group, by which a conceptual data model for PROV was defined, the PROV data model [[PROV-DM]], alongside its mapping to different technologies, to suit users and developers. However, the family of PROV specifications lacks a serialization capable of simultaneously addressing all of the following requirements.
In our view, none of the existing PROV serializations supports all these requirements simultaneously. While PROV-JSON is the only serialization to support lightweight web applications, it does not have any semantic markup, its internal structure does not exhibit the natural structure of the PROV data structures, and its grouping of expressions by category (e.g. all entities, all activities, ...) is not conducive to incremental processing. The RDF serialization compatible with PROV-O has been architected to be natural to the Semantic Web community: all influence relations have been given the same directionality, consistently aligned with their time ordering, but the decomposition of data structures (essentially n-ary relations) into individual triples, which can occur anywhere in the serialization, is not conducive to efficient parsing. It is reasonable to say that the world has moved on from XML, while the PROV-N notation was aimed at humans rather than efficient processing.
JSON-LD [[JSON-LD]] allows a semantic structure to be overlaid over a JSON structure [[RFC8259]], thereby enabling the conversion of JSON serializations into linked data. This was exploited in an early version of this work [[?IPAW-POSTER]], which applied the JSON-LD approach to a JSON serialization of PROV. The solution did not lead to a natural encoding of the PROV data structure because a property occurring in different types of JSON objects had to be named differently so that it could be mapped to the appropriate RDF property; we see here that what is natural in JSON is not necessarily natural in RDF, and vice-versa. The ability to define contextual mappings was introduced in JSON-LD 1.1 [[JSON-LD11]] and is a key enabler of this specification, allowing for the same natural PROV property names to be used in different contexts while still maintaining their correct mappings to the appropriate RDF properties.
Thus, this specification proposes PROV-JSONLD, a PROV serialization compatible with [[PROV-DM]] that addresses all four of our key requirements. It is, first and foremost, a JSON structure supporting lightweight Web applications. It is structured so that each PROV expression is encoded as a self-contained JSON object and is, therefore, natural to JavaScript programmers. Exploiting JSON-LD 1.1, we defined contextual semantic mappings, allowing PROV-JSONLD to be seen as linked data. Finally, PROV-JSONLD allows for efficient processing, since each JSON object can be readily mapped to a data structure without requiring unbounded lookaheads or search within the data structure.
In the rest of this document, we illustrate PROV-JSONLD, we characterize its structure using a JSON Schema [[JSON-SCHEMA]], we define its semantic mappings using JSON-LD 1.1, and we outline the interoperability testing we put in place to check its compatibility with the PROV data model.
The following namespace prefixes are used throughout this document.
prefix | namespace IRI | definition |
prov | http://www.w3.org/ns/prov# | The PROV namespace [[PROV-DM]] |
provext | https://openprovenance.org/ns/provext# | Extension namespace for PROV used in this specification |
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema Namespace [[XMLSCHEMA11-2]] |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | The RDF namespace [[RDF-CONCEPTS]] |
(others) | (various) | All other namespace prefixes are used in examples only. In particular, IRIs starting with "http://example.com" represent some application-dependent IRI [[RFC3987]] |
We assume the reader to be familiar with PROV, JSON, and JSON-LD.
To illustrate the PROV-JSONLD serialization, we consider a subset of the example of [[PROV-PRIMER]], depicted below. It can be paraphrased as follows: agent Derek was responsible for composing an article based on an existing dataset.
The PROV-JSONLD representation of this example can be seen in Example 1. At the top level, a PROV-JSONLD document is a JSON object with two properties @context and @graph, as per JSON-LD. A context contains mappings of prefixes to namespaces, and also an explicit reference to https://openprovenance.org/prov-jsonld/context.json — the JSON-LD 1.1 context defining the semantic mapping for PROV-JSONLD. (This context is fully described in section 5.) The @graph property has an array of PROV expressions as value. Each PROV expression is itself a JSON object with at least a @type property (for instance, Entity, Agent or Derivation). Each of these PROV expressions provides a description of a resource. Some of these resources have an identity provided by the @id property (for instance, ex:article1 or ex:derek). Other resources are anonymous and, therefore, do not have a property @id, for instance, the Derivation between the dataset and the article.
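As a rough illustration of the structure just described, the following Python sketch builds a PROV-JSONLD document for the article example as a plain dictionary and serializes it with the standard json module. Only ex:article1, ex:derek, the property names and the context URL are taken from this document; the identifier ex:dataset1, the ex namespace IRI, the encoding of the foaf:givenName value, and the helper name anonymous_expressions are assumptions made for illustration.

```python
import json

# Hypothetical PROV-JSONLD document for the article example.
# ex:dataset1, the "ex" namespace IRI and the value encoding of
# foaf:givenName are illustrative assumptions, not normative content.
document = {
    "@context": [
        {"ex": "http://example.com/", "foaf": "http://xmlns.com/foaf/0.1/"},
        "https://openprovenance.org/prov-jsonld/context.json",
    ],
    "@graph": [
        {"@type": "Entity", "@id": "ex:dataset1"},
        {"@type": "Entity", "@id": "ex:article1"},
        {
            "@type": "Agent",
            "@id": "ex:derek",
            "foaf:givenName": [{"@value": "Derek", "@language": "en"}],
        },
        # Anonymous resources: these expressions carry no @id property.
        {
            "@type": "Derivation",
            "generatedEntity": "ex:article1",
            "usedEntity": "ex:dataset1",
        },
        {"@type": "Attribution", "entity": "ex:article1", "agent": "ex:derek"},
    ],
}


def anonymous_expressions(doc):
    """Return the PROV expressions of doc that have no @id property."""
    return [expr for expr in doc["@graph"] if "@id" not in expr]


# Plain JSON serialization; no JSON-LD processing is performed here.
serialized = json.dumps(document, indent=2)
```

Because every expression is a self-contained object in the @graph array, a consumer can handle them one at a time without first collecting them by category.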
PROV expressions can be enriched with various properties. Some properties are predefined by PROV-JSONLD, such as activity and agent in an Association. Further PROV attributes are allowed, for instance type with an array of further types, to better describe the resource. Others may be defined in a different namespace, such as foaf:givenName, for which we expect the prefix foaf to be declared in the @context property.
The property @type is mandatory and is associated with a single value, expected to be one of the predefined PROV expressions. From an efficiency viewpoint, this property is critical in determining which internal data structure a PROV expression should map to, and it therefore facilitates efficient processing. By contrast, type is optional and can contain as many types as required; their order is not significant.
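One way to picture the role of @type is a dispatch table keyed on its value. The sketch below is a minimal illustration under that assumption, not a prescribed implementation: the classes Entity and Derivation, the table BUILDERS and the function build are hypothetical names, and only two expression types are covered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    id: str
    types: List[str] = field(default_factory=list)


@dataclass
class Derivation:
    generated_entity: Optional[str] = None
    used_entity: Optional[str] = None
    id: Optional[str] = None


# One constructor per @type value; only two are sketched here.
BUILDERS = {
    "Entity": lambda o: Entity(o["@id"], list(o.get("type", []))),
    "Derivation": lambda o: Derivation(
        o.get("generatedEntity"), o.get("usedEntity"), o.get("@id")
    ),
}


def build(expression):
    """Map one PROV expression to a data structure by inspecting @type only."""
    kind = expression["@type"]
    builder = BUILDERS.get(kind)
    if builder is None:
        raise ValueError(f"unsupported @type: {kind!r}")
    return builder(expression)
```

A single dictionary lookup selects the target structure, so no other property of the object needs to be examined before construction starts.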
This section provides an overview of the JSON schema [[JSON-SCHEMA]] for PROV-JSONLD; its full details are in Appendix A.
For each object property identified in the JSON schema, we provide the corresponding normative attribute definition in [[PROV-DM]].
Some primitive types occur in PROV serializations, namely DateTime and QualifiedName. We define their schemas as follows.
Typed values (typed_value) are JSON objects with properties @value and @type. String values are JSON objects with properties @value and @language.
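The distinction between the two kinds of value objects can be sketched as a check on their keys. The sample values and the function name classify_value below are illustrative assumptions; the schema in Appendix A remains the reference for what is actually permitted.

```python
def classify_value(value):
    """Classify a JSON value as 'typed', 'string' or 'other' from its keys."""
    if isinstance(value, dict) and "@value" in value:
        if "@type" in value:
            return "typed"    # object with @value and @type
        if "@language" in value:
            return "string"   # object with @value and @language
    return "other"


# Illustrative sample values (assumed, not taken from the specification).
typed_example = {"@value": "2011-11-16T16:05:00", "@type": "xsd:dateTime"}
string_example = {"@value": "An article", "@language": "en"}
```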
We also define general types for property values, which can be arrays of values (ArrayOfValues) or arrays of labels (ArrayOfLabelValues).
With these preliminary definitions in place, we can now present the specification of PROV-JSONLD's core data structures.
In the Schema for prov:Entity, an entity MUST contain an identifier (property @id) and a property @type with value Entity. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM entity attributes). (The presence of a colon ":" in the patternProperties element forces all other properties to have the structure of a prefix, a colon, and a local name.)
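The constraints in the paragraph above can be approximated by a small informal check. This is a sketch only: it covers just the properties named in that paragraph, does not validate property values, and is no substitute for the JSON Schema in Appendix A. The names ENTITY_KNOWN and entity_problems are hypothetical.

```python
# Properties named in the entity paragraph; everything else must be prefixed.
ENTITY_KNOWN = {"@id", "@type", "type", "location", "label"}


def entity_problems(obj):
    """Return a list of human-readable problems found in an entity object."""
    problems = []
    if "@id" not in obj:
        problems.append("missing @id")
    if obj.get("@type") != "Entity":
        problems.append("@type must be Entity")
    for name in obj:
        # Mirrors the colon requirement: prefix, colon, local name.
        if name not in ENTITY_KNOWN and ":" not in name:
            problems.append(f"property {name!r} has no prefix")
    return problems
```

An empty list means that none of the checked constraints is violated; it does not establish full schema validity.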
Schema for prov:Entity
In the Schema for prov:Activity, an activity MUST contain an identifier (property @id) and a property @type with value Activity. It MAY contain a start time (property startTime, see PROV-DM startTime), an end time (property endTime, see PROV-DM endTime), further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM activity attributes).
Schema for prov:Activity
In the Schema for prov:Agent, an agent MUST contain an identifier (property @id) and a property @type with value Agent. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM agent attributes).
Schema for prov:Agent
In the Schema for prov:Derivation, a derivation MUST contain a property @type with value Derivation. It SHOULD contain a generated entity (property generatedEntity, see PROV-DM generatedEntity) and a used entity (property usedEntity, see PROV-DM usedEntity). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), a generation (property generation, see PROV-DM generation), a usage (property usage, see PROV-DM usage), further type information (property type, see PROV-DM prov:type), a label (property label), or other properties with an explicit prefix (see PROV-DM derivation attributes).
Schema for prov:Derivation
In the Schema for prov:Attribution, an attribution MUST contain a property @type with value Attribution. It SHOULD contain the entity that is the subject of the attribution (property entity, see PROV-DM entity) and the associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM attribution attributes).
Schema for prov:Attribution
In the Schema for prov:Association, an association MUST contain a property @type with value Association. It SHOULD contain an activity (property activity, see PROV-DM activity) and its associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), a plan (property plan, see PROV-DM plan), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM association attributes).
Schema for prov:Association
In the Schema for prov:Delegation, a delegation MUST contain a property @type with value Delegation. It SHOULD contain a delegate agent (property delegate, see PROV-DM delegate) and a responsible agent (property responsible, see PROV-DM responsible). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM delegation attributes).
Schema for prov:Delegation
In the Schema for prov:Usage, a usage MUST contain a property @type with value Usage. It SHOULD contain an activity (property activity, see PROV-DM activity) and an entity (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM usage attributes).
Schema for prov:Usage
In the Schema for prov:Generation, a generation MUST contain a property @type with value Generation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM generation attributes).
Schema for prov:Generation
In the Schema for prov:Invalidation, an invalidation MUST contain a property @type with value Invalidation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM invalidation attributes).
Schema for prov:Invalidation
In the Schema for prov:Start, a start MUST contain a property @type with value Start. It SHOULD contain an activity that was started (property activity, see PROV-DM activity); it MAY contain a starter activity (property starter, see PROV-DM starter) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM start attributes).
Schema for prov:Start
In the Schema for prov:End, an end MUST contain a property @type with value End. It SHOULD contain an activity that was ended (property activity, see PROV-DM activity); it MAY contain an ender activity (property ender, see PROV-DM ender) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM end attributes).
Schema for prov:End
In the Schema for prov:Communication, a communication MUST contain a property @type with value Communication. It SHOULD contain an informed activity (property informed, see PROV-DM informed) and an informant activity (property informant, see PROV-DM informant). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM communication attributes).
Schema for prov:Communication
In the Schema for prov:Influence, an influence MUST contain a property @type with value Influence. It SHOULD contain an influencee (property influencee, see PROV-DM influencee) and an influencer (property influencer, see PROV-DM influencer). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM influence attributes).
Schema for prov:Influence
In the Schema for prov:Specialization, a specialization MUST contain a property @type with value Specialization. It SHOULD contain a specific entity (property specificEntity, see PROV-DM specificEntity) and a general entity (property generalEntity, see PROV-DM generalEntity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)
Schema for prov:Specialization
In the Schema for prov:Alternate, an alternate MUST contain a property @type with value Alternate. It SHOULD contain a first alternate (property alternate1, see PROV-DM alternate1) and a second alternate (property alternate2, see PROV-DM alternate2). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)
Schema for prov:Alternate
In the Schema for prov:Membership, a membership MUST contain a property @type with value Membership. It SHOULD contain a collection (property collection, see PROV-DM collection) and a single entity or an array of them (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)
Schema for prov:Membership
In the Schemas for prov:Bundle and prov:Document, a bundle and a document MUST contain a property @type with value Bundle and Document respectively, a context (property @context), and a set of PROV expressions (property @graph). The names of the properties @context and @graph are specified by JSON-LD [[JSON-LD11]]. In addition, a bundle must contain an identifier (property @id).
Schemas for prov:Bundle and prov:Document
Bundles contain statements (definition prov:Statement), whereas documents contain statements or bundles (definition prov:StatementOrBundle).
Finally, contexts are defined as follows. They take the shape of an array, containing either mappings of prefixes to URIs or URIs to further JSON-LD contexts.
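To show how such a context array might be consumed, the sketch below collects the prefix mappings it declares and uses them to expand a qualified name into a full IRI. It is an assumption-laden simplification: referenced contexts are not fetched, only string-valued entries are treated as prefixes, and the function names prefix_map and expand are hypothetical.

```python
def prefix_map(context):
    """Collect prefix-to-namespace mappings from a context array."""
    prefixes = {}
    for item in context:
        if isinstance(item, dict):
            # Keep only simple prefix declarations (string values).
            for key, value in item.items():
                if isinstance(value, str):
                    prefixes[key] = value
        # String items are IRIs of further JSON-LD contexts; not fetched here.
    return prefixes


def expand(qualified_name, prefixes):
    """Expand prefix:local into a full IRI using the given prefix map."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {qualified_name!r}")
    return prefixes[prefix] + local


context = [
    {"ex": "http://example.com/"},
    "https://openprovenance.org/prov-jsonld/context.json",
]
```

With this context, expand("ex:article1", prefix_map(context)) yields the IRI formed by concatenating the ex namespace and the local name article1.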
In this section, we describe the JSON-LD context that maps the PROV-JSONLD structures to linked data. Full details of the context can be found in Appendix B.
The PROV Ontology [[PROV-O]] defines the Qualification Pattern, which restates a binary property between two resources (referred to as an unqualified influence relation) by using an intermediate class that represents the influence between the two resources. This new instance, in turn, can be annotated with additional descriptions of the influence that one resource had upon another. The following figure, borrowed from [[PROV-O]], summarises the PROV relations and how they are encoded in RDF using the Qualification Pattern.
Note that the figure does not include the Qualification Pattern for Influence; in addition, PROV-O does not define the Qualification Pattern for specialization, alternate and membership.
The following JSON properties have a default meaning, unless they are redefined in a specific context of a PROV-JSONLD document: entity, activity and agent map to the PROV-O object properties prov:entity, prov:activity and prov:agent, respectively.
The following JSON properties have the same meaning in all contexts of a PROV-JSONLD document: role, type, label and location map to the RDF properties prov:hadRole, rdf:type, rdfs:label and prov:atLocation, respectively.
In the mapping Context for Entity, the JSON property value maps to PROV-O prov:value.
In the mapping Context for Activity, the JSON properties startTime and endTime map to the RDF data properties prov:startedAtTime and prov:endedAtTime, respectively, and have a range of type xsd:dateTime.
The mapping Context for Agent does not define further properties.
The mapping Context for Derivation supports the Qualification Pattern of Figure 2, g. The JSON properties generatedEntity, usedEntity, activity, generation, and usage map to the object properties prov:qualifiedDerivation, prov:entity, prov:hadActivity, prov:hadGeneration, and prov:hadUsage, respectively.
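The interplay between default, generic and type-specific mappings can be pictured with plain lookup tables. The sketch below only simulates the effect of contextual mappings for the Derivation case; it is not a JSON-LD processor, and the names DEFAULTS, GENERIC, SCOPED and rdf_property are hypothetical. The table entries are transcribed from the mappings stated in this section.

```python
# Default meanings, which a specific context may redefine.
DEFAULTS = {
    "entity": "prov:entity",
    "activity": "prov:activity",
    "agent": "prov:agent",
}

# Meanings that are the same in every context.
GENERIC = {
    "role": "prov:hadRole",
    "type": "rdf:type",
    "label": "rdfs:label",
    "location": "prov:atLocation",
}

# Type-specific mappings; only Derivation is sketched here.
SCOPED = {
    "Derivation": {
        "generatedEntity": "prov:qualifiedDerivation",
        "usedEntity": "prov:entity",
        "activity": "prov:hadActivity",
        "generation": "prov:hadGeneration",
        "usage": "prov:hadUsage",
    },
}


def rdf_property(expr_type, json_property):
    """Resolve a JSON property name to an RDF property for a given @type."""
    for table in (GENERIC, SCOPED.get(expr_type, {}), DEFAULTS):
        if json_property in table:
            return table[json_property]
    raise KeyError(json_property)
```

The same JSON name, activity, resolves to prov:hadActivity inside a Derivation and to prov:activity where no type-specific mapping redefines it, which is the behaviour that contextual mappings make possible.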
The mapping Context for Attribution supports the Qualification Pattern of Figure 2, i. The JSON property entity maps to the object property prov:qualifiedAttribution.
The mapping Context for Association supports the Qualification Pattern of Figure 2, j. The JSON properties activity and plan map to the object properties prov:qualifiedAssociation and prov:hadPlan, respectively.
The mapping Context for Delegation supports the Qualification Pattern of Figure 2, h. The JSON properties responsible, delegate and activity map to the object properties prov:agent, prov:qualifiedDelegation and prov:hadActivity, respectively.
The mapping Context for Usage supports the Qualification Pattern of Figure 2, a. The JSON properties activity and time map to the object property prov:qualifiedUsage and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
The mapping Context for Generation supports the Qualification Pattern of Figure 2, b. The JSON properties entity and time map to the object property prov:qualifiedGeneration and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
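To make the Qualification Pattern concrete for Generation, the sketch below expands a Generation object into RDF-like triples, represented as Python tuples. Several points are assumptions: the direction of prov:qualifiedGeneration (from the entity to the Generation node) and the class prov:Generation follow PROV-O, the activity property is taken to keep its default mapping to prov:activity, the time is assumed to be a plain string, and the blank node label, the sample identifiers and the function name generation_triples are hypothetical.

```python
def generation_triples(gen, node="_:gen"):
    """Expand a Generation object into qualification-pattern triples."""
    # Use the declared identifier, or a blank node label for anonymous ones.
    subject = gen.get("@id", node)
    triples = [(subject, "rdf:type", "prov:Generation")]
    if "entity" in gen:
        # prov:qualifiedGeneration points from the entity to the Generation.
        triples.append((gen["entity"], "prov:qualifiedGeneration", subject))
    if "activity" in gen:
        triples.append((subject, "prov:activity", gen["activity"]))
    if "time" in gen:
        literal = f'"{gen["time"]}"^^xsd:dateTime'
        triples.append((subject, "prov:atTime", literal))
    return triples


# Hypothetical Generation expression used for illustration.
example_generation = {
    "@type": "Generation",
    "entity": "ex:article1",
    "activity": "ex:compose",
    "time": "2012-03-02T10:30:00",
}
```

The intermediate Generation node is what allows the time to be attached to the relation itself rather than to the entity or the activity.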
The mapping Context for Invalidation supports the Qualification Pattern of Figure 2, c. The JSON properties entity and time map to the object property prov:qualifiedInvalidation and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
The mapping Context for Start supports the Qualification Pattern of Figure 2, e. The JSON properties activity, trigger, starter, and time map to the object properties prov:qualifiedStart, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
The mapping Context for End supports the Qualification Pattern of Figure 2, f. The JSON properties activity, trigger, ender, and time map to the object properties prov:qualifiedEnd, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.
The mapping Context for Communication supports the Qualification Pattern of Figure 2, d. The JSON properties informed and informant map to the object properties prov:qualifiedCommunication and prov:activity, respectively.
In the mapping Context for Influence, the JSON properties influencee and influencer map to the object properties prov:qualifiedInfluence and prov:influencer, respectively.
While [[PROV-O]] does not define a Qualification Pattern for Specialization, for uniformity and usability reasons we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Specialization). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.
The JSON properties specificEntity and generalEntity map to the object properties provext:qualifiedSpecialization and provext:generalEntity, respectively.
While [[PROV-O]] does not define a Qualification Pattern for Alternate, for uniformity and usability reasons we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Alternate). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.
The JSON properties alternate1 and alternate2 map to the object properties provext:qualifiedAlternate and provext:alternate, respectively.
While [[PROV-O]] does not define a Qualification Pattern for Membership, for uniformity and usability reasons we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Membership). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.
The JSON properties collection and entity map to the object properties provext:qualifiedMembership and provext:collection, respectively.
We provide here a minimal definition of the classes and properties introduced in provext in the context of PROV-JSONLD. They allow the qualification pattern to be applied to Membership, Specialization and Alternate. For each, we define one class and two object properties.
Thank you to Pierre-Antoine Champin for his input on JSON-LD and to Denis Ah-Kang for his assistance with the ReSpec tool.