Introduction

A JSON-LD document is a representation of a directed graph. A single directed graph can have many different serializations, each expressing exactly the same information. Developers who want to work with interconnected JSON objects can use the connect algorithm described here to "stitch together" node definitions and to create an index of important values mapped to each node. This enables navigation between node definitions via interlinking keys (representing arcs in graph nomenclature).

Technically, connect works by replacing each node reference in the input data with a real, programmatic object reference to a single, merged node definition. The result is a simple in-memory graph.
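To illustrate the end effect described above, the following minimal Python sketch assumes an index (here called `id_map`, a hypothetical name) that already maps @id values to merged node definitions, and swaps a node reference for the indexed object itself. It is not the full Connect Algorithm, which also builds the index and handles references that appear before their definitions.

```python
from typing import Any, Dict

NodeDefinition = Dict[str, Any]


def resolve_reference(value: Any, id_map: Dict[str, NodeDefinition]) -> Any:
    """Return the merged node definition for a node reference, else the value unchanged.

    `resolve_reference` and `id_map` are hypothetical names used for illustration.
    """
    # A node reference is a JSON object whose only key is @id.
    if isinstance(value, dict) and set(value) == {"@id"}:
        return id_map.get(value["@id"], value)
    return value


bob = {"@id": "ex:bob", "ex:name": "Bob"}
id_map = {"ex:bob": bob}
alice = {"@id": "ex:alice", "ex:knows": {"@id": "ex:bob"}}

# Replace the reference with a real object reference to Bob's node definition.
alice["ex:knows"] = resolve_reference(alice["ex:knows"], id_map)
print(alice["ex:knows"] is bob)  # True
```

Because the replacement is an object reference rather than a copy, `alice["ex:knows"]["ex:name"]` reaches Bob's data directly.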

Out of the box, the connect process creates an index that maps node @id values to their node definitions. The process can also be configured to create additional indexes and to add a local reverse key map to each node definition.

How to Read this Document

This document is a detailed specification for Linked Data in JSON. The document is primarily intended for the following audiences:

Software developers that want to implement processors and APIs for JSON-LD.
Web developers who want to navigate JSON-LD documents as interconnected nodes.

To understand the basics of this specification, you must first be familiar with JSON, which is detailed in [[!RFC4627]]. You must also understand the JSON-LD Syntax [[!json-ld]], which is the base syntax used by all of the algorithms in this document, and the JSON-LD API [[!JSON-LD-API]]. To understand the API and how it is intended to operate in a programming environment, it is useful to have a working knowledge of the JavaScript programming language [[ECMA-262]] and WebIDL [[!WEBIDL]]. To understand how JSON-LD maps to RDF, it is helpful to be familiar with the basic RDF concepts [[!RDF-CONCEPTS]].

General Terminology

The intent of the Working Group and the Editors of this specification is to eventually align terminology used in this document with the terminology used in the RDF Concepts document to the extent to which it makes sense to do so. In general, if there is an analogue to terminology used in this document in the RDF Concepts document, the preference is to use the terminology in the RDF Concepts document.

The following is an explanation of the general terminology used in this document:

JSON object: An object structure is represented as a pair of curly brackets surrounding zero or more name-value pairs. A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.
array: An array is represented as square brackets surrounding zero or more values that are separated by commas.
string: A string is a sequence of zero or more Unicode (UTF-8) characters, wrapped in double quotes, using backslash escapes (if necessary). A character is represented as a single character string.
number: A number is similar to that used in most programming languages, except that the octal and hexadecimal formats are not used and that leading zeros are not allowed.
true and false: Values that are used to express one of two possible boolean states.
null: The null value is used within JSON-LD to ignore or reset values.
keyword: A JSON key that is specific to JSON-LD, specified in the JSON-LD 1.1 Syntax specification [[!json-ld]] in the section titled Syntax Tokens and Keywords.
context: A set of rules for interpreting a JSON-LD document, as specified in The Context of the [[json-ld]] specification.
IRI: An Internationalized Resource Identifier as described in [[!RFC3987]].
Linked Data: A technique for creating a network of inter-connected data across different documents and Web sites.
JSON-LD graph: An unordered labeled directed graph, where nodes are IRIs, blank nodes, or other values. A JSON-LD graph is a generalized representation of an RDF graph as defined in [[!RDF-CONCEPTS]].
named graph: A JSON-LD graph that is identified by an IRI.
graph name: The IRI identifying a named graph.
default graph: When executing an algorithm, the graph where data should be placed if a named graph is not specified.
node: A piece of information that is represented in a JSON-LD graph.
node definition: A JSON object used to represent a node and one or more properties of that node. A JSON object is a node definition if it does not contain the keys @value, @list or @set and it has one or more keys other than @id.
node reference: A JSON object that has only the @id key, used to reference a node.
blank node: A node in a JSON-LD graph that does not contain a de-referenceable identifier because it is either ephemeral in nature or does not contain information that needs to be linked to from outside of the JSON-LD graph. A blank node is assigned an identifier starting with the prefix _:.
property: The IRI label of an edge in a JSON-LD graph.
subject: A node in a JSON-LD graph with at least one outgoing edge, related to an object node through a property.
object: A node in a JSON-LD graph with at least one incoming edge.
quad: A piece of information that contains four items: a subject, a property, an object, and a graph name.
literal: An object expressed as a value, such as a string or number, or in expanded form.
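The node definition, node reference, and blank node terms above can be expressed as simple predicates. This Python sketch assumes the keywords appear literally as keys (no aliasing through a context); the function names are hypothetical.

```python
from typing import Any


def is_node_reference(obj: Any) -> bool:
    """A node reference is a JSON object whose only key is @id."""
    return isinstance(obj, dict) and set(obj) == {"@id"}


def is_node_definition(obj: Any) -> bool:
    """No @value, @list or @set key, and at least one key other than @id."""
    if not isinstance(obj, dict):
        return False
    if any(keyword in obj for keyword in ("@value", "@list", "@set")):
        return False
    return any(key != "@id" for key in obj)


def is_blank_node_identifier(value: Any) -> bool:
    """Blank node identifiers start with the prefix "_:"."""
    return isinstance(value, str) and value.startswith("_:")


print(is_node_reference({"@id": "ex:bob"}))                     # True
print(is_node_definition({"@id": "ex:bob", "ex:name": "Bob"}))  # True
print(is_node_definition({"@value": "Bob"}))                     # False
print(is_blank_node_identifier("_:b0"))                          # True
```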

Algorithm

The algorithm described in this section is intended to operate on language-native data structures. That is, the serialization to a text-based JSON document isn't required as input or output, and language-native data structures MUST be used where applicable.
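In Python, for example, the standard `json` module turns a serialized document into native structures once, after which an implementation can operate on dicts and lists directly. The type mapping in the comments is the standard library's documented behavior; the sample document is invented for illustration.

```python
import json

text = """
{
  "@id": "ex:bob",
  "ex:age": 42,
  "ex:height": 1.8,
  "ex:active": true,
  "ex:nickname": null,
  "ex:tags": ["a", "b"]
}
"""

# json.loads maps: object -> dict, array -> list, string -> str,
# integer -> int, real number -> float, true/false -> bool, null -> None.
node = json.loads(text)

for key, value in node.items():
    print(key, type(value).__name__)
```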

Syntax Tokens and Keywords

@rev: Used in Connect to set the default key for the reverse key map.

All JSON-LD tokens and keywords are case-sensitive.

Connect

Connecting is the process of turning an input JSON-LD document into an interconnected and indexed data structure.

Connect Algorithm Terms

current result object: A merged node definition.
map of result objects: The map of merged node definitions, indexed by @id, that this algorithm produces.
type map: A map of types to sets (represented by arrays) of result objects.
reverse key map: A map from each key to an array of the subjects that link to the current node using that key.
connected output: The resulting object containing the map of result objects.
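A minimal sketch of these terms as Python data structures follows. The field names `id_map` and `type_map` and the helper `nodes_of_type` are hypothetical, since the keys used in the connected output are still undecided; reverse key maps live on individual node definitions and so are not part of this container.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

NodeDefinition = Dict[str, Any]


@dataclass
class ConnectedOutput:
    """Hypothetical container for the connected output."""

    # Map of result objects: @id value -> merged node definition.
    id_map: Dict[str, NodeDefinition] = field(default_factory=dict)
    # Type map: type -> array of result objects having that type.
    type_map: Dict[str, List[NodeDefinition]] = field(default_factory=dict)

    def nodes_of_type(self, type_iri: str) -> List[NodeDefinition]:
        return self.type_map.get(type_iri, [])


output = ConnectedOutput()
bob = {"@id": "ex:bob", "@type": "ex:Person"}
output.id_map[bob["@id"]] = bob
output.type_map.setdefault("ex:Person", []).append(bob)
print(output.nodes_of_type("ex:Person")[0] is bob)  # True
```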

Connect Algorithm

Create an idMap and a typeMap, which are the map of result objects and the type map of the connected output, respectively.

For each object in the input, do the following series of steps:

If the object has a key whose resolved meaning is @id, look up an existing object indexed by that id in the idMap. If one is found, use it as the current result object.
Otherwise (if there is no such key, or no object is indexed by that id), create a new object and use it as the current result object.
For each key-value pair in the current object, do the following:
1. If the resolved key meaning is @id, add the result object to the idMap, indexed by the id value.
2. If the resolved key meaning is @type, and the current object has no key whose resolved meaning is @value, get or add an array for the type value in the typeMap. Append the result object to that array.
3. If the value is an object, create a new value by using the result of running these steps recursively with the value as input.
4. Otherwise, if the value is an array, create a new array. Then apply these steps recursively to each object in the array, appending each result to the new array.
5. Otherwise, use the value as the new value.
6. Add the key and the new value to the result object.
Return the result object.
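The steps above can be sketched in Python as follows. This is an illustrative reading of the prose, not a reference implementation, and it makes several assumptions: keys are compared literally, so "resolved meaning" reduces to the key being exactly @id or @type (no context processing or aliasing); an array-valued @type indexes the node under each type; and when fragments are merged, a later value for the same key overwrites an earlier one, as step 6 implies. The `idMap` and `typeMap` output keys are placeholders.

```python
from typing import Any, Dict, List

NodeDefinition = Dict[str, Any]


def connect(data: Any) -> Dict[str, Any]:
    """Sketch of the Connect Algorithm over already-parsed JSON-LD data.

    Assumption: "@id" and "@type" appear literally as keys (no aliases).
    """
    id_map: Dict[str, NodeDefinition] = {}
    type_map: Dict[str, List[NodeDefinition]] = {}

    def connect_value(value: Any) -> Any:
        if isinstance(value, dict):
            return connect_object(value)
        if isinstance(value, list):
            # Step 4: build a new array from the connected items.
            return [connect_value(item) for item in value]
        # Step 5: scalars are used as they are.
        return value

    def connect_object(obj: Dict[str, Any]) -> NodeDefinition:
        # Reuse the merged node for this @id if one exists; otherwise start a new one.
        node_id = obj.get("@id")
        result = id_map.get(node_id) if isinstance(node_id, str) else None
        if result is None:
            result = {}
        for key, value in obj.items():
            if key == "@id":
                # Step 1: index the result object by its id.
                id_map[value] = result
            if key == "@type" and "@value" not in obj:
                # Step 2: value objects are not indexed by type.
                for type_value in value if isinstance(value, list) else [value]:
                    type_map.setdefault(type_value, []).append(result)
            # Steps 3 to 6: connect the value and store it on the result object.
            result[key] = connect_value(value)
        return result

    connect_value(data)
    return {"idMap": id_map, "typeMap": type_map}


doc = [
    {"@id": "ex:alice", "@type": "ex:Person", "ex:knows": {"@id": "ex:bob"}},
    {"@id": "ex:bob", "ex:name": "Bob"},
]
out = connect(doc)
alice = out["idMap"]["ex:alice"]
print(alice["ex:knows"]["ex:name"])  # Bob
```

Bob's node reference is seen before his full definition, so it creates the object that the later definition is then merged into; both end up as one shared object.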

If the @rev flag is used, pass the current key (the reverse link) and the current object (the linking object) to each recursive call in the process above. At the end of that recursive call, get or create a revMap on its current object, get or create an array for the reverse link key in that revMap, and append the linking object to the array. Then set the revMap on the current object.
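The reverse-link bookkeeping at the end of each recursive call can be isolated into a small helper. This Python sketch assumes, following the @rev keyword described above, that the revMap is stored on each node under the key "@rev"; the helper's name and signature are hypothetical. In a full implementation it would be called with the reverse link and linking object passed down by the parent call.

```python
from typing import Any, Dict, List

NodeDefinition = Dict[str, Any]

# Assumed storage key for the reverse key map (the default named by @rev).
REV_KEY = "@rev"


def add_reverse_link(current: NodeDefinition, reverse_link: str,
                     linking_object: NodeDefinition) -> None:
    """Record that `linking_object` points at `current` through `reverse_link`."""
    # Get or create the revMap on the current object.
    rev_map: Dict[str, List[NodeDefinition]] = current.get(REV_KEY, {})
    # Get or create the array for the reverse link key, then append.
    rev_map.setdefault(reverse_link, []).append(linking_object)
    # Set the revMap on the current object.
    current[REV_KEY] = rev_map


alice = {"@id": "ex:alice"}
bob = {"@id": "ex:bob"}
alice["ex:knows"] = bob
add_reverse_link(bob, "ex:knows", alice)

# Navigate backwards from Bob to whoever "knows" him.
print([node["@id"] for node in bob[REV_KEY]["ex:knows"]])  # ['ex:alice']
```

Because linking objects are stored by reference, the revMap introduces cycles (Alice points to Bob, and Bob's revMap points back to Alice), so such a structure cannot be passed to `json.dumps` as is.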

The resulting idMap and typeMap are added to the final connected output.

This algorithm is a work in progress. It is currently undefined whether the creation of typeMaps and revMaps should be optional. It is also not yet decided which keys will label these maps in the connected output.

The Application Programming Interface

JsonLdProcessor

The JSON-LD processor interface is the high-level programming structure that developers use to access the JSON-LD transformation methods. The definition below is an experimental extension of the interface defined in the [[JSON-LD-API]].

The JSON-LD API signatures are the same across all programming languages. Because asynchronous programming is uncommon in some languages, developers MAY implement a processor with a synchronous interface instead. In that case, the callback parameter MUST NOT be included and the result MUST be returned as the return value instead.

void connect(input, context, callback, options)

Connects the given input according to the steps in the Connect Algorithm. The input is used to build the connected output, which is returned if there are no errors. Exceptions MUST be thrown if there are errors.

object or object[] or IRI input: The JSON-LD object or array of JSON-LD objects to perform the connecting upon or an IRI referencing the JSON-LD document to connect.
object or IRI? context: An optional external context to use in addition to the context embedded in input when expanding the input.
JsonLdCallback callback: A callback that is called when processing is complete on the given input.
optional JsonLdOptions? options: A set of options that MAY affect the connect algorithm, such as the input document's base IRI.
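The synchronous and callback-based forms described above might look as follows in Python. This is a hedged sketch: the class shape, the method names `connect_sync` and `connect`, and the single-argument callback are assumptions for illustration, not the WebIDL from [[JSON-LD-API]]. The hypothetical `algorithm` parameter stands in for an implementation of the Connect Algorithm; `context` and `options` are accepted but ignored, IRI inputs are not dereferenced, and for brevity the callback is invoked before `connect` returns rather than asynchronously.

```python
from typing import Any, Callable, Dict, Optional

ConnectedOutput = Dict[str, Any]
# Hypothetical callback type: receives the connected output.
Callback = Callable[[ConnectedOutput], None]


class JsonLdProcessor:
    def __init__(self, algorithm: Callable[[Any], ConnectedOutput]) -> None:
        # `algorithm` is a hypothetical hook that runs the Connect Algorithm.
        self._algorithm = algorithm

    def connect_sync(self, input_doc: Any, context: Any = None,
                     options: Optional[Dict[str, Any]] = None) -> ConnectedOutput:
        """Synchronous form: no callback; the result is the return value."""
        if input_doc is None:
            # Errors are raised as exceptions, as the text requires.
            raise ValueError("input must be a JSON-LD object, array, or IRI")
        return self._algorithm(input_doc)

    def connect(self, input_doc: Any, context: Any, callback: Callback,
                options: Optional[Dict[str, Any]] = None) -> None:
        """Callback form: the connected output is passed to `callback`."""
        callback(self.connect_sync(input_doc, context, options))


# Toy stand-in algorithm that only builds an index by @id.
processor = JsonLdProcessor(lambda doc: {"idMap": {n["@id"]: n for n in doc}})
print(processor.connect_sync([{"@id": "ex:a"}])["idMap"])
processor.connect([{"@id": "ex:b"}], None, lambda out: print(sorted(out["idMap"])))
```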

Callbacks

JsonLdCallback

The JsonLdCallback is used to return a processed JSON-LD representation as the result of processing an API method.

See JsonLdCallback definition in [[!JSON-LD-API]].

Data Structures

This section describes datatype definitions used within the JSON-LD API.

JsonLdOptions

The JsonLdOptions type is used to pass a set of options to an interface method.

See JsonLdOptions definition in [[!JSON-LD-API]].