JSON-LD CG

Minutes for 2022-10-12

Present

Anatoly Scherbakov, Gregg Kellogg, Pierre-Antoine Champin, Ted Thibodeau Jr., Leonard Rosenthol, Niklas Lindström, David I. Lehn

Chair(s)

Gregg Kellogg

Scribe(s)

Anatoly Scherbakov

Agenda

https://www.w3.org/events/meetings/164de972-9959-44a0-a925-3905f7685c0f/20221012T120000

Topics

Announcements and Introductions
Approve Minutes
NDJSON-LD
YAML-LD
Next call

Topic: Announcements and Introductions

Anatoly Scherbakov is scribing.

Gregg Kellogg: Introductions! Leonard — would you please introduce yourself?

Leonard Rosenthol: Senior principal architect @ Adobe, responsibilities: PDF architecture, Content Authenticity Initiative. C2PA is an open standards body we founded. Representing Adobe at W3C, ISO, IETF, etc.

Topic: Approve Minutes

Gregg Kellogg: https://json-ld.org/minutes/2022-09-28/

Gregg Kellogg: Approving the minutes from last time... without objections, we'll accept those and move on.

Topic: NDJSON-LD

Gregg Kellogg: https://github.com/json-ld/ndjson-ld

Gregg Kellogg: https://github.com/json-ld/ndjson-ld/issues/1

Gregg Kellogg: We have a repo for NDJSON-LD, and Niklas has volunteered to spend time on it. That prompted Leonard to jump in and discuss. It is based on NDJSON, which closely resembles JSON Lines and a few other formats. One spec could handle several of these formats, though only one of them has a well-defined media type.

Leonard Rosenthol: I don't have a preference between them. I'm trying to understand what the use case is. What are we trying to accomplish specifically? Obviously JSON-LD and YAML-LD make sense for this group. Why does it make sense to tackle the problem of JSON records, and how are we going to solve it?

Gregg Kellogg: YAML is a string format that supports multiple embedded documents in a file. In order to do an LD version of YAML, we need to decide how to deal with them. If there were a line-based streaming format for JSON-LD (or a streaming format for RDF in general), it would be useful here: we'd just process each document in a YAML stream using JSON-LD methods.

Gregg Kellogg: Potentially, any RDF format might have a way to describe a stream of records. One use case might be an open connection for actual streaming of real time data. There's a JSON-LD streaming spec. If you want to operate upon completely separate records it would be useful.

Pierre-Antoine Champin: Another use case comes from the SOLID system. They have performance issues serializing large "containers" (connections). Potentially, having a streaming JSON-LD format could help that.

Gregg Kellogg: https://github.com/json-ld/ndjson-ld/issues/3

Gregg Kellogg: Would be useful to put these into issue 3 to start collecting them. Ultimately, the spec (itself or a companion document) should list a few use cases.

Gregg Kellogg: Niklas, could you describe, for instance, your motivation for NDJSON-LD?

Niklas Lindström: We've used a line-based format for internal purposes. Thinking of publishing data in such a format. We had raw dumps we published and just declared every line was a separate JSON-LD document.

Niklas Lindström: We are missing a clear definition though, and it is unspecified how to associate a JSON-LD context with such a document.

Niklas Lindström: For us, HTTP response header specified the context; the document itself didn't.
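
A minimal sketch of the arrangement Niklas describes, assuming each non-empty line is one JSON-LD document and that a context obtained out of band (for example from an HTTP response header) applies only to records that carry no `@context` of their own. The function name `iter_ndjson_ld` and the parameter `external_context` are hypothetical; no NDJSON-LD spec defines this behavior yet.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_ndjson_ld(
    lines: Iterable[str], external_context: Optional[Any] = None
) -> Iterator[dict]:
    """Yield one JSON-LD document per non-empty line of input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("this sketch only accepts JSON objects as records")
        # Assumption: the external context is a fallback, not an override.
        if external_context is not None and "@context" not in record:
            record = {"@context": external_context, **record}
        yield record
```

Because the function is a generator, a consumer can handle each record as soon as its line has been read instead of waiting for the whole dump.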

Subtopic: Blank Node Scope

Gregg Kellogg: https://github.com/json-ld/ndjson-ld/issues/4

Gregg Kellogg: If every record is a JSON-LD document, it might have its own context. Or it can be provided externally, as an HTTP header or an API parameter.

... Now, regarding the blank nodes scope. We've discussed it for YAML-LD before.

Gregg Kellogg: Tag definitions and blank node names might be independent between documents. For instance, if we're collecting random documents, overlapping labels or tags could have unexpected consequences. In streaming applications, though, we might want to share labels (for computing differences, etc.), and with independent scopes we wouldn't be able to.
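
One way to picture the "independent scopes" option is to relabel blank nodes per record before merging, so that `_:b0` in one record cannot collide with `_:b0` in another. This is only a sketch under stated assumptions: the function `scope_blank_nodes` and the `r<index>_` label prefix are hypothetical, only labels appearing as `@id` values are handled (aliases of `@id`, or blank nodes in other positions, are not), and an input that already uses a label of the form `r<index>_...` could still clash.

```python
from typing import Any


def scope_blank_nodes(node: Any, record_index: int) -> Any:
    """Return a copy of `node` whose "@id" blank node labels are unique to one record."""
    if isinstance(node, dict):
        scoped = {}
        for key, value in node.items():
            if key == "@id" and isinstance(value, str) and value.startswith("_:"):
                # Prefix the label with the record index (hypothetical scheme).
                scoped[key] = f"_:r{record_index}_{value[2:]}"
            else:
                scoped[key] = scope_blank_nodes(value, record_index)
        return scoped
    if isinstance(node, list):
        return [scope_blank_nodes(item, record_index) for item in node]
    return node  # strings, numbers, booleans, and None pass through unchanged
```

The shared-scope alternative discussed for streaming would simply skip this step, which is why the two use cases pull in opposite directions.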

Gregg Kellogg: Next steps on NDJSON-LD, Niklas?

Niklas Lindström: I do not lack anything other than time. I should write the simplest thing imaginable first, probably... Blank nodes: I do not have a strong opinion on this, we're not using blank node identifiers. A JSON-LD document can be an RDF dataset, and that is an interesting complication. So every line can represent multiple datasets/graphs.

Niklas Lindström: This does not influence blank nodes question though.

Niklas Lindström: In TriG documents, afaik blank node ids are shared throughout the whole document.

Gregg Kellogg: Different use cases might drive conflicting requirements.

Leonard Rosenthol: There is a difference between addressing a homogeneous case (all documents share a context or grammar) and a heterogeneous case (documents potentially unrelated to each other that happen to share one data stream). Do we want to solve both cases? What are we gaining or losing by doing so?

Gregg Kellogg: NDJSON-LD should be an extension of the JSON-LD API because YAML-LD calls upon it. The algorithms operate by transforming input into the JSON-LD internal representation. Consequently, we rely upon that. Regarding the purpose of all this: if we presuppose an API, and one of this API's entry points relates to RDF transformation, then doing something multi-dataset becomes really challenging.

Gregg Kellogg: Especially when we think we're collecting unrelated and maybe conflicting records. It would simplify the problem if all the records relate to a single dataset. Unless we have compelling use cases which suggest otherwise.

Gregg Kellogg: We might introduce a "meta" record to specify meta parameters, like Turtle with its `@prefix` directives in the header.

Niklas Lindström: +1 For "meta-records"; could e.g. set the context initially (corollary: first row of a TSV as columns)
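
The "meta-record" idea can be sketched as follows. The convention used here is purely hypothetical, since nothing was decided on the call: a first record whose only key is `@context` is treated as a meta-record, and its context is applied to every later record that lacks one. The function name `iter_with_meta_record` is likewise made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_with_meta_record(lines: Iterable[str]) -> Iterator[dict]:
    """Yield data records, applying the context from an optional leading meta-record."""
    shared_context = None
    seen_first_record = False
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not seen_first_record:
            seen_first_record = True
            # Hypothetical convention: a first record holding only "@context"
            # is a meta-record and is not itself emitted as data.
            if isinstance(record, dict) and set(record) == {"@context"}:
                shared_context = record["@context"]
                continue
        if (
            shared_context is not None
            and isinstance(record, dict)
            and "@context" not in record
        ):
            record = {"@context": shared_context, **record}
        yield record
```

This mirrors the TSV analogy: the first row describes how to read the rows that follow, and is not itself a data row.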

Niklas Lindström: Going to continue the work on NDJSON-LD.

Topic: YAML-LD

Subtopic: test suite

Gregg Kellogg: yaml-ld#87

Gregg Kellogg: This is an issue to work on the test suite. Quite a bit remains to be done on it. It would be useful to link normative statements in the spec to the tests that exercise them (a practice used by Ivan), but that adds more work.

Pierre-Antoine Champin: +1 Linking the spec text to the tests is really cool

Subtopic: YAML Streams and JSON Sequences yaml-ld#63

Gregg Kellogg: That's what pushed NDJSON-LD

Gregg Kellogg: Roberto proposes to map a YAML-LD stream to a sequence of JSON-LD files

Gregg Kellogg: Proposing to update the spec with a hypothetical mapping to NDJSON-LD so that we can start to flesh out the missing components of the spec right now. I will spend some time on that.

Leonard Rosenthol: Does this only apply to streams, or also for a YAML-LD file that contains multiple documents?

Gregg Kellogg: In YAML, a stream is a sequence of documents separated by "---". This has a well-defined meaning within YAML. In the YAML-LD spec, part of the process is to convert YAML-LD into the internal representation, which includes splitting the stream into individual documents.

Gregg Kellogg: What if a stream contains a single document? Does it yield that document, or a stream with that document? For NDJSON-LD probably that's the latter, and for YAML-LD this might depend upon HTTP media type or an API method perhaps (different methods for streams vs documents). This is a subject of consideration.
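
To make the single-document question concrete, here is a deliberately naive sketch that splits a YAML stream on bare `---` marker lines and always returns a list, so a one-document stream yields a one-element list (the "stream with that document" reading). It does not parse YAML: directives, `...` end markers, content on the same line as `---`, and empty documents are all outside what it handles, and a real implementation would use a YAML parser. The name `split_yaml_stream` is hypothetical.

```python
from typing import List


def split_yaml_stream(text: str) -> List[str]:
    """Split a YAML stream into document texts on bare '---' lines (naive)."""
    documents: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Keep a chunk only if it has content; empty documents are dropped here.
        if any(line.strip() for line in current):
            documents.append("\n".join(current))
        current.clear()

    for line in text.splitlines():
        if line.rstrip() == "---":
            flush()  # a marker line ends the document collected so far
        else:
            current.append(line)
    flush()  # the final document has no trailing marker
    return documents
```

An API that instead returned the bare document when the list has exactly one element would give the other reading; which one applies might depend on the media type or entry point, as noted above.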

Leonard Rosenthol: Makes sense. I am thinking of this in respect to having physical files more than something else.

Gregg Kellogg: Whether it is a file, a multipart MIME email, or a stream where you process records as they come through, this can be hard in the API sense. API endpoints create promises, and you might expect a promise to be fulfilled only once the entire stream is processed. That might not be adequate for a real-time stream. But we might just focus on the "closed" use case and leave the "open stream" use case for later.

Gregg Kellogg: We need to list use cases for both and look at the other W3C work on realtime processing and open data streams to see if we can find any relevance.

Subtopic: Extended Internal Representation yaml-ld#84

Gregg Kellogg: The current spec describes the extended internal representation. The motivation is that, when parsing YAML-LD in extended mode, node tags can be passed through the JSON-LD algorithms without interpretation. Pierre-Antoine had another idea: add information to JSON objects and have the algorithms ignore that information. Pierre-Antoine referred to related work by Niklas.

Gregg Kellogg: Niklas's work on the LDTR project: https://github.com/niklasl/ldtr

Pierre-Antoine Champin: I came across that only recently and it rang a bell. For YAML-LD, another option would be to stick to the existing internal representation, but we would lose the ability to round-trip from and to YAML-specific notations (tags to represent data types, for instance).

Pierre-Antoine Champin: Extending the internal representation aims to convey YAML-specific extended syntactic constructs which don't exist in JSON.

Pierre-Antoine Champin: At TPAC, we toyed with the idea that an extended representation could also support Turtle and other RDF serializations.

Niklas Lindström: That's right. That was the point of my idea, which I implemented twice. Originally, I wrote a very simple Turtle and TriG parser using a parser generator library for JS. I thought of it as a teaching tool for developers and metadata librarians.

Niklas Lindström: To some extent, it has worked like that. Then, I got carried away and did the same for RDF/XML as well. It's a bit dangerous as an idea because it is not what RDF is about. RDF is about semantics and triples.

Niklas Lindström: The attraction of JSON-LD is to get away from abstract model and to get into something concrete.

Niklas Lindström: Abstract syntax tree for RDF is a viable idea.

Gregg Kellogg: Is the intermediate representation itself transcribable? It is probably printable for debugging purposes, but does it relate somehow to the extended JSON-LD representation? How does that match?

Niklas Lindström: JSON is a string representation but it is materialized in (especially dynamic) programming languages very similarly

Niklas Lindström: I didn't consider a formal internal representation at all when doing that

Niklas Lindström: The implementation in question: https://github.com/niklasl/ldtr

Gregg Kellogg: In my SPARQL and N3 parsers, the abstract syntax tree formed S-expressions, which are serializable in a Lisp-like fashion. This might be extended to other formats and be a way of expressing these internal formats. You want similar things everywhere: URIs, prefixes, IRIs, literals.

Gregg Kellogg: Having S-expression representation of RDF and then of SPARQL might be useful.

Gregg Kellogg: At RDF level, a triple is a fundamental building block. In SPARQL you also have operators. This doesn't deal with recursive statements though but still might be worth exploring.
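
As a rough illustration of an S-expression view of RDF with the triple as the building block, the sketch below prints triples in a Lisp-like form. The `(graph ...)` and `(triple ...)` wrappers, the function names, and the term conventions (blank nodes start with `_:`, literals arrive already quoted, everything else is an IRI) are assumptions made for this example, not the notation of any existing parser.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def term_to_sexp(term: str) -> str:
    """Blank nodes and pre-quoted literals pass through; other terms are treated as IRIs."""
    if term.startswith("_:") or term.startswith('"'):
        return term
    return f"<{term}>"


def triples_to_sexp(triples: Iterable[Triple]) -> str:
    """Serialize triples as one (triple s p o) form per line inside a (graph ...) form."""
    lines = []
    for triple in triples:
        terms = " ".join(term_to_sexp(term) for term in triple)
        lines.append(f"  (triple {terms})")
    return "(graph\n" + "\n".join(lines) + ")"
```

SPARQL operators would appear as further forms wrapping such triples, which is where the two representations would meet.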

Niklas Lindström: Was thinking of doing something similar for SPARQL.

Topic: Next call

Leonard Rosenthol: +1 To skip

Gregg Kellogg: The next date would be Oct 27. I won't be available at that time. Maybe someone else can lead that meeting, or otherwise we can skip it.

Pierre-Antoine Champin: No objection to skipping

Niklas Lindström: No objection

Gregg Kellogg: Next meeting on November 9. https://www.w3.org/events/meetings/164de972-9959-44a0-a925-3905f7685c0f/20221109T120000

Gregg Kellogg: W3C calendar should automatically adjust to daylight saving time shifts.

Gregg Kellogg: May be one hour later outside of the US.

Pierre-Antoine Champin: In Europe this call will be one hour later.

Gregg Kellogg: Continuing to work on the test suites.

The W3C JSON-LD Community Group
