Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about the data or thing's quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. This document specifies PROV-JSONLD, a serialization of PROV in JSON, which exploits JSON-LD to define a semantic mapping so it can also be processed as Linked Data. Overall, PROV-JSONLD is designed to be suitable for interchanging provenance in Web and Linked Data applications, to offer a natural encoding of provenance for its targeted audience, and to allow for fast processing.


A member submission by King's College London

Introduction

Since their release in 2013, the PROV Recommendations [[?PROV-OVERVIEW]] by the World Wide Web Consortium (W3C) have been adopted by flagship deployments such as the Global Change Information System, the Gazette in the UK, and other Linked Data sets. PROV, which is used as the data model to describe the provenance of data, is made available in several different representations: PROV-N [[PROV-N]], PROV-XML [[PROV-XML]], or in an RDF serialization using the PROV Ontology [[PROV-O]]. The latter is most suitable for Linked Data [[LINKED-DATA]], given that it can readily be consumed by existing Semantic Web tools and comes with the semantic grounding provided by PROV-O [[PROV-O]].

Subsequently, the PROV-JSON [[?PROV-JSON]] serialization has gained traction, despite simply being a member submission, and not having gone through the various stages of a standardization activity. We conjecture that the primary reason for this is that many web applications are built to be light-weight, working mainly with simple data formats such as JSON [[RFC8259]].

The very existence of all these serializations is a testament to the approach to standardization taken by the Provenance Working Group, by which a conceptual data model for PROV was defined, the PROV data model [[PROV-DM]], alongside its mapping to different technologies, to suit users and developers. However, the family of PROV specifications lacks a serialization capable of simultaneously addressing all of the following requirements.

  1. [Lightweight] A serialization MUST support lightweight Web applications.
  2. [Natural] A serialization MUST look natural to its targeted community of users.
  3. [Semantic] A serialization MUST allow for semantic markup and integration with linked data applications.
  4. [Efficient] A serialization MUST be efficiently processable.

In our view, none of the existing PROV serializations supports all these requirements simultaneously. While PROV-JSON is the only serialization to support lightweight Web applications, it does not have any semantic markup, its internal structure does not reflect the natural structure of the PROV data structures, and its grouping of expressions by category (e.g. all entities, all activities, ...) is not conducive to incremental processing. The RDF serialization compatible with PROV-O has been architected to be natural to the Semantic Web community: all influence relations have been given the same directionality, consistently aligned with their time ordering. However, its decomposition of data structures (essentially n-ary relations) into individual triples, which can occur anywhere in the serialization, is not conducive to efficient parsing. It is reasonable to say that the world has moved on from XML, while the PROV-N notation was aimed at humans rather than efficient processing.

JSON-LD [[JSON-LD]] allows a semantic structure to be overlaid over a JSON structure [[RFC8259]], thereby enabling the conversion of JSON serializations into linked data. This was exploited in an early version of this work [[?IPAW-POSTER]], which applied the JSON-LD approach to a JSON serialization of PROV. The solution did not lead to a natural encoding of the PROV data structure because a property occurring in different types of JSON objects had to be named differently so that it could be mapped to the appropriate RDF property; we see here that what is natural in JSON is not necessarily natural in RDF, and vice-versa. The ability to define contextual mappings was introduced in JSON-LD 1.1 [[JSON-LD11]] and is a key enabler of this specification, allowing for the same natural PROV property names to be used in different contexts while still maintaining their correct mappings to the appropriate RDF properties.

Thus, this specification proposes PROV-JSONLD, a PROV serialization compatible with [[PROV-DM]] that addresses all four of our key requirements. It is, first and foremost, a JSON structure supporting lightweight Web applications. It is structured so that each PROV expression is encoded as a self-contained JSON object and, therefore, is natural to JavaScript programmers. Exploiting JSON-LD 1.1, we define contextual semantic mappings, allowing PROV-JSONLD to be seen as linked data. Finally, PROV-JSONLD allows for efficient processing, since each JSON object can be readily mapped to a data structure without requiring unbounded lookaheads or searches within the data structure.

In the rest of this document, we illustrate PROV-JSONLD, we characterize its structure using a JSON Schema [[JSON-SCHEMA]], we define its semantic mappings using JSON-LD 1.1, and we outline the interoperability testing we put in place to check its compatibility with the PROV data model.

Namespace

The following namespace prefixes are used throughout this document.

Table 1: Prefixes and namespaces used in this specification

prefix | namespace IRI | definition
prov | http://www.w3.org/ns/prov# | The PROV namespace [[PROV-DM]]
provext | https://openprovenance.org/ns/provext# | Extension namespace for PROV used in this specification
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema namespace [[XMLSCHEMA11-2]]
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | The RDF namespace [[RDF-CONCEPTS]]
(others) | (various) | All other namespace prefixes are used in examples only. In particular, IRIs starting with "http://example.com" represent some application-dependent IRI [[RFC3987]].

Example

We assume the reader to be familiar with PROV, JSON, and JSON-LD.

To illustrate the PROV-JSONLD serialization, we consider a subset of the example of [[PROV-PRIMER]], depicted below. It can be paraphrased as follows: agent Derek was responsible for composing an article based on an existing dataset.

[Figure 1 (graph): the activity compose used the entity dataSet1 and was associated with the agent derek; the entity article1 was generated by compose and derived from dataSet1. Attributes shown: article1 has title "Crime rises in cities"@EN; derek has type prov:Person, mbox <mailto:derek@example.org>, and givenName Derek.]
Figure 1: Provenance expressing that Derek was responsible for composing an article based on a data set.

The PROV-JSONLD representation of this example can be seen in Example 1. At the top level, a PROV-JSONLD document is a JSON object with two properties @context and @graph, as per JSON-LD. A context contains mappings of prefixes to namespaces, and also an explicit reference to https://openprovenance.org/prov-jsonld/context.json — the JSON-LD 1.1 context defining the semantic mapping for PROV-JSONLD. (This context is fully described in section 5.) The @graph property has an array of PROV expressions as value. Each PROV expression is itself a JSON object with at least a @type property (for instance, Entity, Agent or Derivation). Each of these PROV expressions provides a description of a resource. Some of these resources have an identity provided by the @id property (for instance, ex:article1 or ex:derek). Other resources are anonymous and, therefore, do not have a property @id, for instance, the Derivation between the dataset and the article.
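As a minimal sketch of the document shape just described (not the normative Example 1), the following Python snippet builds the provenance of Figure 1 as a dictionary and prints it as JSON. The prefix IRIs for ex, foaf and dcterms, the use of dcterms:title for the title, and the datatype names attached to typed values are assumptions made for illustration; only the context URL and the property names come from this specification.

```python
import json

# Prefix declarations are illustrative assumptions; the second context entry
# is the PROV-JSONLD context URL given in the text. The prefixes prov and xsd
# are assumed to be declared by that referenced context.
document = {
    "@context": [
        {
            "ex": "http://example.com/",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "https://openprovenance.org/prov-jsonld/context.json",
    ],
    "@graph": [
        {"@type": "Entity", "@id": "ex:dataSet1"},
        {
            "@type": "Entity",
            "@id": "ex:article1",
            # String value: a JSON object with @value and @language.
            "dcterms:title": [
                {"@value": "Crime rises in cities", "@language": "EN"}
            ],
        },
        {"@type": "Activity", "@id": "ex:compose"},
        {
            "@type": "Agent",
            "@id": "ex:derek",
            # Typed values: the datatype names below are assumptions.
            "type": [{"@value": "prov:Person", "@type": "prov:QUALIFIED_NAME"}],
            "foaf:givenName": [{"@value": "Derek", "@type": "xsd:string"}],
            "foaf:mbox": [
                {"@value": "mailto:derek@example.org", "@type": "xsd:anyURI"}
            ],
        },
        # The four relations below are anonymous: they carry no @id.
        {"@type": "Usage", "activity": "ex:compose", "entity": "ex:dataSet1"},
        {"@type": "Generation", "entity": "ex:article1", "activity": "ex:compose"},
        {"@type": "Association", "activity": "ex:compose", "agent": "ex:derek"},
        {
            "@type": "Derivation",
            "generatedEntity": "ex:article1",
            "usedEntity": "ex:dataSet1",
        },
    ],
}

print(json.dumps(document, indent=2))
```

Every element of @graph is self-contained: a consumer can read the array one object at a time without consulting any other part of the document.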

PROV expressions can be enriched with various properties. Some properties are predefined by PROV-JSONLD, such as activity and agent in an Association. Further PROV attributes are allowed, for instance type with an array of further types, to better describe the resource. Others may be defined in a different namespace, such as foaf:givenName, for which we expect the prefix foaf to be declared in the @context property.

The property @type is mandatory and is associated with a single value, expected to be one of the predefined PROV expressions. From an efficiency viewpoint, this property is critical in determining which internal data structure a PROV expression should map to and, therefore, facilitates efficient processing. By contrast, type is optional and can contain as many types as required; their order is not significant.
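To illustrate why a single-valued @type helps a parser, here is a small Python sketch that selects a constructor with one dictionary lookup on @type. The class names, field names and the BUILDERS table are hypothetical and cover only two kinds of expression; a real implementation would cover all of them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Entity:
    id: str
    types: List[Any] = field(default_factory=list)


@dataclass
class Derivation:
    generated_entity: Optional[str] = None
    used_entity: Optional[str] = None
    id: Optional[str] = None


def _entity(obj: Dict[str, Any]) -> Entity:
    return Entity(id=obj["@id"], types=list(obj.get("type", [])))


def _derivation(obj: Dict[str, Any]) -> Derivation:
    return Derivation(
        generated_entity=obj.get("generatedEntity"),
        used_entity=obj.get("usedEntity"),
        id=obj.get("@id"),
    )


# One entry per supported value of @type (hypothetical, deliberately partial).
BUILDERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "Entity": _entity,
    "Derivation": _derivation,
}


def decode(obj: Dict[str, Any]) -> Any:
    """Map one PROV-JSONLD object to a data structure using only its @type."""
    kind = obj.get("@type")
    builder = BUILDERS.get(kind) if isinstance(kind, str) else None
    if builder is None:
        raise ValueError(f"unsupported @type: {kind!r}")
    return builder(obj)
```

For instance, decode({"@type": "Entity", "@id": "ex:article1"}) yields an Entity without inspecting any other object in the document.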

      

Schema

This section provides an overview of the JSON Schema [[JSON-SCHEMA]] for PROV-JSONLD; its full details are in Appendix A.

For each object property identified in the JSON Schema, we provide the corresponding normative attribute definition in [[PROV-DM]].

Preliminary Definitions

Some primitive types occur in PROV serializations, namely DateTime and QualifiedName. We define their schemas as follows.


	  
	  
	
The production rules for qualified names are more complex than the simple regular expression outlined here. A post-processor will need to check that qualified names comply with the definition in [[PROV-N]].
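The two-stage approach described above (a cheap syntactic check first, a full check later) might be sketched in Python as follows. The regular expression is a simplified stand-in chosen for this illustration, not the pattern of the schema, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Simplified stand-in: a prefix, a colon, and a non-empty local name.
# It accepts more strings than the PROV-N production rules allow.
QUALIFIED_NAME = re.compile(r"^[A-Za-z_][\w.\-]*:\S+$")


def looks_like_qualified_name(value: str) -> bool:
    """First-pass check only; a post-processor must still apply PROV-N rules."""
    return QUALIFIED_NAME.match(value) is not None


def parse_date_time(value: str) -> datetime:
    """Parse an ISO 8601 date-time such as 2012-04-03T10:00:00Z."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

A string such as ex:article1 passes the first check, while article1 (no prefix) does not.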

Typed values (typed_value) are JSON objects with properties @value and @type. String values are JSON objects with properties @value and @language.


	  
	  
	

We also define general types for property values, which can be arrays of values (ArrayOfValues) or arrays of labels (ArrayOfLabelValues).


	

With these preliminary definitions in place, we can now present the specification of PROV-JSONLD's core data structures.

prov:Entity

In the Schema for prov:Entity, an entity MUST contain an identifier (property @id) and a property @type with value Entity. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM entity attributes). (The presence of a colon ":" in the patternProperties element forces all other properties to have the structure of a prefix, a colon, and a local name.)
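The constraints stated for an entity can be approximated by a short hand-written check. The Python sketch below is an illustration under stated assumptions rather than a substitute for the JSON Schema: the function name is hypothetical, property values are not inspected, and accepting value is an assumption based on the JSON-LD context for Entity.

```python
from typing import Any, Dict, List

# Properties named in the paragraph above. "value" is not listed there but
# appears in the JSON-LD context for Entity; accepting it is an assumption.
ENTITY_PROPERTIES = {"@id", "@type", "type", "location", "label", "value"}


def check_entity(obj: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    if "@id" not in obj:
        problems.append("missing @id")
    if obj.get("@type") != "Entity":
        problems.append("@type must be 'Entity'")
    for name in obj:
        # Any other property must look like prefix:localName.
        if name not in ENTITY_PROPERTIES and ":" not in name:
            problems.append(f"property {name!r} has no prefix")
    return problems
```

An object such as {"@id": "ex:e", "@type": "Entity", "ex:colour": "red"} passes, whereas replacing ex:colour with colour is reported as a property without a prefix.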

Schema for prov:Entity

	

prov:Activity

In the Schema for prov:Activity, an activity MUST contain an identifier (property @id) and a property @type with value Activity. It MAY contain a start time (property startTime, see PROV-DM startTime), an end time (property endTime, see PROV-DM endTime), further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM activity attributes).

Schema for prov:Activity

	

prov:Agent

In the Schema for prov:Agent, an agent MUST contain an identifier (property @id) and a property @type with value Agent. It MAY contain further type information (property type, see PROV-DM prov:type), a location (property location, see PROV-DM prov:location), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM agent attributes).

Schema for prov:Agent

	

prov:Derivation

In the Schema for prov:Derivation, a derivation MUST contain a property @type with value Derivation. It SHOULD contain a generated entity (property generatedEntity, see PROV-DM generatedEntity) and a used entity (property usedEntity, see PROV-DM usedEntity). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), a generation (property generation, see PROV-DM generation), a usage (property usage, see PROV-DM usage), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM derivation attributes).

Schema for prov:Derivation

	

prov:Attribution

In the Schema for prov:Attribution, an attribution MUST contain a property @type with value Attribution. It SHOULD contain the entity that is the subject of the attribution (property entity, see PROV-DM entity) and the associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM attribution attributes).

Schema for prov:Attribution

	

prov:Association

In the Schema for prov:Association, an association MUST contain a property @type with value Association. It SHOULD contain an activity (property activity, see PROV-DM activity) and its associated agent (property agent, see PROV-DM agent). It MAY contain an identifier (property @id), a plan (property plan, see PROV-DM plan), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM association attributes).

Schema for prov:Association

	

prov:Delegation

In the Schema for prov:Delegation, a delegation MUST contain a property @type with value Delegation. It SHOULD contain a delegate agent (property delegate, see PROV-DM delegate) and a responsible agent (property responsible, see PROV-DM responsible). It MAY contain an identifier (property @id), an activity (property activity, see PROV-DM activity), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM delegation attributes).

Schema for prov:Delegation

	

prov:Usage

In the Schema for prov:Usage, a usage MUST contain a property @type with value Usage. It SHOULD contain an activity (property activity, see PROV-DM activity) and an entity (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM usage attributes).

Schema for prov:Usage

	

prov:Generation

In the Schema for prov:Generation, a generation MUST contain a property @type with value Generation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM generation attributes).

Schema for prov:Generation

	

prov:Invalidation

In the Schema for prov:Invalidation, an invalidation MUST contain a property @type with value Invalidation. It SHOULD contain an entity (property entity, see PROV-DM entity) and an activity (property activity, see PROV-DM activity). It MAY contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM invalidation attributes).

Schema for prov:Invalidation

	

prov:Start

In the Schema for prov:Start, a start MUST contain a property @type with value Start. It SHOULD contain an activity that was started (property activity, see PROV-DM activity); it MAY contain a starter activity (property starter, see PROV-DM starter) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM start attributes).

Schema for prov:Start

	

prov:End

In the Schema for prov:End, an end MUST contain a property @type with value End. It SHOULD contain an activity that was ended (property activity, see PROV-DM activity); it MAY contain an ender activity (property ender, see PROV-DM ender) and a triggering entity (property trigger, see PROV-DM trigger). It MAY also contain an identifier (property @id), a time (property time, see PROV-DM time), a location (property location, see PROV-DM prov:location), a role (property role, see PROV-DM prov:role), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM end attributes).

Schema for prov:End

	

prov:Communication

In the Schema for prov:Communication, a communication MUST contain a property @type with value Communication. It SHOULD contain an informed activity (property informed, see PROV-DM informed) and an informant activity (property informant, see PROV-DM informant). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM communication attributes).

Schema for prov:Communication

	

prov:Influence

In the Schema for prov:Influence, an influence MUST contain a property @type with value Influence. It SHOULD contain an influencee (property influencee, see PROV-DM influencee) and an influencer (property influencer, see PROV-DM influencer). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix (see PROV-DM influence attributes).

Schema for prov:Influence

	

prov:Specialization

In the Schema for prov:Specialization, a specialization MUST contain a property @type with value Specialization. It SHOULD contain a specific entity (property specificEntity, see PROV-DM specificEntity) and a general entity (property generalEntity, see PROV-DM generalEntity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)

Schema for prov:Specialization

	

prov:Alternate

In the Schema for prov:Alternate, an alternate MUST contain a property @type with value Alternate. It SHOULD contain a first alternate (property alternate1, see PROV-DM alternate1) and a second alternate (property alternate2, see PROV-DM alternate2). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)

Schema for prov:Alternate

	

prov:Membership

In the Schema for prov:Membership, a membership MUST contain a property @type with value Membership. It SHOULD contain a collection (property collection, see PROV-DM collection) and a single entity or an array of them (property entity, see PROV-DM entity). It MAY contain an identifier (property @id), further type information (property type, see PROV-DM prov:type), a label (property label, see PROV-DM prov:label), or other properties with an explicit prefix. (There is no equivalent for those properties in PROV-DM, see section Interoperability.)
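The relation schemas from prov:Derivation to prov:Membership share one pattern: @type is required, and a small set of properties is recommended. As a sketch, the Python table below collects the SHOULD properties stated in those sections and reports a missing one as a warning rather than an error. The function name and the message wording are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Recommended (SHOULD) properties per relation, as listed in the sections above.
RECOMMENDED: Dict[str, Tuple[str, ...]] = {
    "Derivation": ("generatedEntity", "usedEntity"),
    "Attribution": ("entity", "agent"),
    "Association": ("activity", "agent"),
    "Delegation": ("delegate", "responsible"),
    "Usage": ("activity", "entity"),
    "Generation": ("entity", "activity"),
    "Invalidation": ("entity", "activity"),
    "Start": ("activity",),
    "End": ("activity",),
    "Communication": ("informed", "informant"),
    "Influence": ("influencee", "influencer"),
    "Specialization": ("specificEntity", "generalEntity"),
    "Alternate": ("alternate1", "alternate2"),
    "Membership": ("collection", "entity"),
}


def check_relation(obj: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one relation object."""
    errors: List[str] = []
    warnings: List[str] = []
    kind = obj.get("@type")
    if kind not in RECOMMENDED:
        errors.append(f"unknown or missing relation @type: {kind!r}")
        return errors, warnings
    for name in RECOMMENDED[kind]:
        if name not in obj:
            # SHOULD, not MUST: incomplete provenance remains processable.
            warnings.append(f"{kind} SHOULD contain {name}")
    return errors, warnings
```

Treating a missing recommended property as a warning keeps incomplete provenance processable while still flagging it.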

Schema for prov:Membership

	  
	  
	

prov:Bundle and prov:Document

In the Schemas for prov:Bundle and prov:Document, a bundle and a document MUST contain a property @type with value Bundle and Document, respectively, a context (property @context), and a set of PROV expressions (property @graph). The names of the properties @context and @graph are specified by JSON-LD [[JSON-LD11]]. In addition, a bundle MUST contain an identifier (property @id).

Schemas for prov:Bundle and prov:Document

	  

	

Bundles contain statements (definition prov:Statement), whereas documents contain statements or bundles (definition prov:StatementOrBundle).


	  

	

Finally, contexts are defined as follows. They take the shape of an array, containing either mappings of prefixes to URIs or URIs to further JSON-LD contexts.
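Given that shape, a consumer could separate the two kinds of context entries as in the following Python sketch. The function names are hypothetical, and the expansion helper only handles prefixes declared inline; terms defined by a referenced context would require a JSON-LD 1.1 processor.

```python
from typing import Any, Dict, List, Tuple


def split_context(context: List[Any]) -> Tuple[Dict[str, str], List[str]]:
    """Split a context array into inline prefix mappings and context URIs."""
    prefixes: Dict[str, str] = {}
    remote: List[str] = []
    for item in context:
        if isinstance(item, str):
            remote.append(item)      # URI of a further JSON-LD context
        elif isinstance(item, dict):
            prefixes.update(item)    # mapping of prefixes to namespace URIs
        else:
            raise TypeError(f"unexpected context entry: {item!r}")
    return prefixes, remote


def expand(qualified_name: str, prefixes: Dict[str, str]) -> str:
    """Expand prefix:local using inline prefix declarations only."""
    prefix, _, local = qualified_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local
```

With the inline mapping {"ex": "http://example.com/"}, the qualified name ex:article1 expands to http://example.com/article1.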


	

JSON-LD Context

In this section, we describe the JSON-LD context that maps the PROV-JSONLD structures to linked data. Full details of the context can be found in Appendix B.

Introduction: Qualification Pattern

The Ontology PROV-O [[PROV-O]] defines the Qualification Pattern, which restates a binary property between two resources (referred to as an unqualified influence relation) by using an intermediate class that represents the influence between two resources. This new instance, in turn, can be annotated with additional descriptions of the influence that one resource had upon another. The following figure, borrowed from [[PROV-O]], summarises the PROV relations, and how they are encoded in RDF using the Qualification Pattern. Note that the figure does not include the Qualification Pattern for Influence; in addition, PROV-O does not define the Qualification Pattern for specialization, alternate and membership.

[Figure 2 (diagram): ten panels, one per qualified relation: a) Usage, b) Generation, c) Invalidation, d) Communication, e) Start, f) End, g) Derivation, h) Delegation, i) Attribution, j) Association.]
Figure 2 (taken from [[PROV-O]]): Illustration of the properties and classes to use (in blue) to qualify the binary influence relations (dotted black). The diagram depicts entities as ovals, activities as rectangles, and agents as pentagons. The Qualified Resource is represented as a left-pointing shape: in PROV-JSONLD, a Qualified Resource is represented as a JSON object.

Default and generic Context Elements

The following JSON properties have a default meaning, unless they are redefined in a specific context of a PROV-JSONLD document: entity, activity and agent respectively map to PROV-O object properties prov:entity, prov:activity and prov:agent.

The following JSON properties have the same meaning in all contexts of a PROV-JSONLD document: role, type, label and location respectively map to the RDF properties prov:hadRole, rdf:type, rdfs:label, and prov:atLocation.
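The mappings listed above could be written as JSON-LD term definitions along the following lines, shown here as Python dictionaries. This is a sketch, not the context of Appendix B: the use of "@type": "@id" to mark object properties, the variable names, and the rdfs prefix IRI are assumptions of this illustration.

```python
from typing import Dict

PREFIXES: Dict[str, str] = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Default meaning, which a specific context may redefine.
DEFAULT_TERMS: Dict[str, Dict[str, str]] = {
    "entity": {"@id": "prov:entity", "@type": "@id"},
    "activity": {"@id": "prov:activity", "@type": "@id"},
    "agent": {"@id": "prov:agent", "@type": "@id"},
}

# Same meaning in all contexts.
GENERIC_TERMS: Dict[str, Dict[str, str]] = {
    "role": {"@id": "prov:hadRole"},
    "type": {"@id": "rdf:type"},
    "label": {"@id": "rdfs:label"},
    "location": {"@id": "prov:atLocation"},
}


def term_iri(term: str) -> str:
    """Return the full IRI that a default or generic term maps to."""
    definition = {**DEFAULT_TERMS, **GENERIC_TERMS}[term]
    prefix, _, local = definition["@id"].partition(":")
    return PREFIXES[prefix] + local
```

For example, term_iri("location") returns http://www.w3.org/ns/prov#atLocation.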


	

Entity

In the mapping Context for Entity, the JSON property value maps to PROV-O prov:value.

Context for Entity

	

Activity

In the mapping Context for Activity, the JSON properties startTime and endTime map to the RDF data properties prov:startedAtTime and prov:endedAtTime, respectively, and have a range of type xsd:dateTime.

Context for Activity

	

Agent

The mapping Context for Agent does not define further properties.

Context for Agent

	

Derivation

The mapping Context for Derivation supports the Qualification Pattern of Figure 2, g. Each of the JSON properties generatedEntity, usedEntity, activity, generation, and usage maps to an object property: namely, prov:qualifiedDerivation, prov:entity, prov:hadActivity, prov:hadGeneration, and prov:hadUsage, respectively.
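To make the Qualification Pattern concrete, the Python sketch below hand-translates one Derivation object into the triples suggested by the mapping above. In practice this translation is performed by a JSON-LD 1.1 processor using the context; the function name, the blank node label, and the assumption that @type Derivation yields rdf:type prov:Derivation are illustrative.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]

# JSON properties whose subject is the Derivation node itself.
DERIVATION_PROPERTIES: Dict[str, str] = {
    "usedEntity": "prov:entity",
    "activity": "prov:hadActivity",
    "generation": "prov:hadGeneration",
    "usage": "prov:hadUsage",
}


def derivation_triples(obj: Dict[str, Any], blank_label: str = "_:d0") -> List[Triple]:
    """Translate one Derivation object into (subject, predicate, object) triples."""
    node = obj.get("@id", blank_label)  # anonymous derivations get a blank node
    triples: List[Triple] = [(node, "rdf:type", "prov:Derivation")]
    if "generatedEntity" in obj:
        # In PROV-O, the generated entity points at the qualified Derivation.
        triples.append((obj["generatedEntity"], "prov:qualifiedDerivation", node))
    for name, predicate in DERIVATION_PROPERTIES.items():
        if name in obj:
            triples.append((node, predicate, obj[name]))
    return triples
```

Note the direction of prov:qualifiedDerivation: the generated entity is the subject and the Derivation node is the object, whereas the remaining properties have the Derivation node as their subject.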

Context for Derivation

	

Attribution

The mapping Context for Attribution supports the Qualification Pattern of Figure 2, i. The JSON property entity maps to the object property prov:qualifiedAttribution.

Context for Attribution

	

Association

The mapping Context for Association supports the Qualification Pattern of Figure 2, j. The JSON properties activity and plan map to the object properties prov:qualifiedAssociation and prov:hadPlan, respectively.

Context for Association

	

Delegation

The mapping Context for Delegation supports the Qualification Pattern of Figure 2, h. The JSON properties responsible, delegate and activity map to the object properties prov:agent, prov:qualifiedDelegation and prov:hadActivity, respectively.

Context for Delegation

	

Usage

The mapping Context for Usage supports the Qualification Pattern of Figure 2, a. The JSON properties activity and time map to the object property prov:qualifiedUsage and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Usage

	

Generation

The mapping Context for Generation supports the Qualification Pattern of Figure 2, b. The JSON properties entity and time map to the object property prov:qualifiedGeneration and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Generation

	

Invalidation

The mapping Context for Invalidation supports the Qualification Pattern of Figure 2, c. The JSON properties entity and time map to the object property prov:qualifiedInvalidation and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Invalidation

	

Start

The mapping Context for Start supports the Qualification Pattern of Figure 2, e. The JSON properties activity, trigger, starter, and time map to the object properties prov:qualifiedStart, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for Start

	

End

The mapping Context for End supports the Qualification Pattern of Figure 2, f. The JSON properties activity, trigger, ender, and time map to the object properties prov:qualifiedEnd, prov:entity, prov:hadActivity, and the data property prov:atTime, respectively. The range of the latter is xsd:dateTime.

Context for End

	

Communication

The mapping Context for Communication supports the Qualification Pattern of Figure 2, d. The JSON properties informed and informant map to the object properties prov:qualifiedCommunication and prov:activity, respectively.

Context for Communication

	

Influence

In the mapping Context for Influence, the JSON properties influencee and influencer map to the object properties prov:qualifiedInfluence and prov:influencer, respectively.

Context for Influence

	

Specialization

While [[PROV-O]] does not define a Qualification Pattern for Specialization, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Specialization). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties specificEntity and generalEntity map to the object properties provext:qualifiedSpecialization and provext:generalEntity, respectively.

Context for Specialization

	

Alternate

While [[PROV-O]] does not define a Qualification Pattern for Alternate, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Alternate). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties alternate1 and alternate2 map to the object properties provext:qualifiedAlternate and provext:alternate, respectively.

Context for Alternate

	

Membership

While [[PROV-O]] does not define a Qualification Pattern for Membership, for uniformity and usability reasons, we adopt a mapping similar to that of other PROV relations, via a Qualification Pattern (see Definition Context for Membership). However, the mapping is to new classes and properties in the PROV extension namespace (denoted by the prefix provext). See Section 6 for interoperability considerations.

The JSON properties collection and entity map to the object properties provext:qualifiedMembership and provext:collection, respectively.

Context for Membership

	

Interoperability Considerations

IC1:
There are differences between PROV-DM and PROV-O in terms of the level of requirements set on some expressions. For instance, PROV-DM mandates the presence of an entity in a generation, whereas it defines an activity as optional. Compliance requirements are not the same in PROV-O as one could define a qualified generation with an activity but without an entity. Experience shows that there may be good reasons why a generation may not refer to an entity; for instance, because the recorded provenance is not "complete" yet, and further provenance expressions still need to be asserted, received or merged; in the meantime, we still want to be able to process such provenance, despite being "incomplete". Thus, in PROV-JSONLD, the presence of an entity and an activity in a generation expression is RECOMMENDED (we use the term SHOULD), while other properties are optional (we use the term MAY), and its @type is REQUIRED (we use the term MUST).
IC2:
In PROV-DM, all relations are n-ary except for specialization, alternate and membership, which are binary, meaning that no identifier or extra properties are allowed for these. In PROV-O, this design decision translates to the lack of qualified relations for specialization, alternate and membership. In PROV-JSONLD, in order to keep the regular structure of JSON objects and the natural encoding of relations, but also to ensure the simplicity and efficiency of parsers, these three relations are encoded using the same pattern as for other relations. Therefore, their mapping to RDF via the JSON-LD context relies on a PROV extension namespace (denoted by the prefix provext) in which classes for Specialization, Alternate and Membership are defined. The PROV-JSONLD serialization also allows for identifiers and properties to be encoded for these relations.
IC3:
The notion of a PROV document is not present in PROV-DM or PROV-O, but is introduced in PROV-N as a housekeeping construct, and is defined in PROV-XML as the root of a PROV-XML document. A document in PROV-JSONLD is also a JSON object, allowing for a JSON-LD @context property to be specified.
IC4:
The PROV-JSONLD specification does not introduce constructs for some PROV subtypes and subrelations, such as prov:Person, prov:Organization, prov:SoftwareAgent, prov:Collection, or prov:Quotation, prov:PrimarySource, prov:Revision. Instead, the example of Section 3 illustrates how they can be accommodated within the existing structures. We copy below an agent expression of type prov:Person and a derivation of type prov:Revision. These subtypes and subrelations are specified inside the prov:type property. PROV-XML offers a similar way of encoding such subtypes and subrelations, alongside specialized structures. We opted for this single approach to ensure simplicity and efficiency of parsers.

	
IC5:
The interoperability of the PROV-JSONLD serialization can be tested in different ways:
  1. In a first roundtrip test, an internal representation in some programming language is serialized to PROV-JSONLD and then deserialized from PROV-JSONLD back to the same programming language; the source and target representations are expected to be equal.
  2. Likewise, in a second roundtrip test, an internal representation in some programming language is serialized to PROV-JSONLD, the PROV-JSONLD is converted to another RDF representation such as Turtle, and the Turtle representation is read back into the same programming language; the source and target representations are also expected to be equal.
  3. Both interoperability tests have been implemented in the Java-based ProvToolbox:
    1. The first roundtrip test is implemented in https://github.com/lucmoreau/ProvToolbox/blob/ProvToolbox-1.0.0/modules-core/prov-jsonld/src/test/java/org/openprovenance/prov/core/RoundTripFromJavaJSONLD11Test.java
    2. The second roundtrip test is implemented in https://github.com/lucmoreau/ProvToolbox/blob/ProvToolbox-1.0.0/modules-legacy/roundtrip/src/test/java/org/openprovenance/prov/core/roundtrip/RoundTripFromJavaJSONLD11LegacyTest.java
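
The first kind of roundtrip test described under IC5 can be sketched independently of ProvToolbox. The Python snippet below is not the ProvToolbox test code: the Usage class and the helper functions are hypothetical, and only a single kind of expression is covered.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Usage:
    activity: Optional[str]
    entity: Optional[str]
    time: Optional[str] = None
    id: Optional[str] = None


def to_jsonld(usage: Usage) -> Dict[str, Any]:
    """Serialize the internal representation to a PROV-JSONLD object."""
    obj: Dict[str, Any] = {"@type": "Usage"}
    if usage.id is not None:
        obj["@id"] = usage.id
    for name in ("activity", "entity", "time"):
        value = getattr(usage, name)
        if value is not None:
            obj[name] = value
    return obj


def from_jsonld(obj: Dict[str, Any]) -> Usage:
    """Deserialize a PROV-JSONLD object back to the internal representation."""
    if obj.get("@type") != "Usage":
        raise ValueError("expected @type Usage")
    return Usage(
        activity=obj.get("activity"),
        entity=obj.get("entity"),
        time=obj.get("time"),
        id=obj.get("@id"),
    )


def round_trips(usage: Usage) -> bool:
    """Serialize to JSON text, parse it back, and compare with the source."""
    text = json.dumps(to_jsonld(usage))
    return from_jsonld(json.loads(text)) == usage
```

The comparison relies on the structural equality that dataclasses provide; the second kind of roundtrip would additionally pass through an RDF representation, which requires a JSON-LD 1.1 processor and is not shown here.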

JSON Schema for PROV-JSONLD

      

JSON-LD Context

      

PROVEXT Ontology

We provide here a minimal definition of the classes and properties introduced in provext in the context of PROV-JSONLD. They allow the qualification pattern to be applied to Membership, Specialization and Alternate. For each, we define one class and two object properties.

      

Acknowledgements

Thank you to Pierre-Antoine Champin for his input on JSON-LD and to Denis Ah-Kang for his assistance with the ReSpec tool.