JSON-LD Community Group Telecon

Minutes for 2012-11-27

Agenda
http://lists.w3.org/Archives/Public/public-linked-json/2012Nov/0019.html
Topics
  1. ISSUE-182: Dataset vs. Graph
  2. ISSUE-113: Define exactly how (IRI) compaction is supposed to work
  3. ISSUE-172: Should each member in a list contribute to term rank?
  4. ISSUE-200: JSON-LD API Review by Robin Berjon
Resolutions
  1. When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected.
  2. The callback signature for the .toRDF() method should accept Quad[]. That is, the callback is called once after all processing has been completed.
Chair
Manu Sporny
Scribe
François Daoust
Present
François Daoust, Markus Lanthaler, Manu Sporny, Gregg Kellogg, Niklas Lindström, David I. Lehn
Audio Log
audio.ogg
François Daoust is scribing.
Markus Lanthaler: We should discuss https://github.com/json-ld/json-ld.org/issues/182 first, there was some discussion about it on last week's RDF WG telecon.

Topic: ISSUE-182: Dataset vs. Graph

Markus Lanthaler: I gave a quick update on JSON-LD during last week's RDF telecon
… and we came across issue 105 about dataset syntaxes vs. graph syntaxes
… the issue is that if we dereference a URI and get a graph, it wouldn't be the same as getting a dataset even if the data is the same.
… One solution is that we put in JSON-LD spec that we treat the data in the default graph in the JSON-LD Dataset as the graph in a usual graph-based serialization.
Manu Sporny: That seems a reasonable way to address the issue.
… What you're saying is that the RDF WG wouldn't have a problem with that?
Markus Lanthaler: The RDF WG does not say anything about these semantics.
… Richard made the comment that this could be generalized. The idea would be that we come up with a proposal and push it to the RDF WG. They don't have a lot of interest in the issue otherwise and might just close it.
Gregg Kellogg: I wonder if default graph is really the right choice.
… Let's say that you have a datasource in Turtle that describes a book.
… It would be natural to put the metadata about the book in the default graph in JSON-LD.
… and you could put the description of the book in the named graph whose name is the location of the document.
… If the @graph keyword is used, then perhaps it makes things more explicit.
Manu Sporny: It seems that it would be more of a Best Practice thing.
… Does not seem to require any MUST or SHOULD.
… It seems that what Markus is proposing is easier.
Gregg Kellogg: I think it's less a JSON-LD issue than a dataset issue.
… If you use a dataset in a graph, you could use all the data, and it's not wrong. You'd have more data.
Markus Lanthaler: This was also discussed during the telecon, but that would mean that you could not generically do content negotiation with JSON-LD, because it would be up to the application to decide where it puts the information.
… By default, if you put all info in a named graph that has the same URI, you end up with sort of two default graphs, which sounds weird.
Gregg Kellogg: that would be two named graphs.
… Question is what does the receiver do with the data it receives?
Markus Lanthaler: problem is that there would be no way to interpret the data in a generic way.
Gregg Kellogg: we just need to describe what the behavior should be on the client side.
… I could see an argument for just flattening, basically just stripping off the named graphs.
Manu Sporny: concerned about data loss, meaning references to the named graph.
… We wouldn't know where the triples originate from.
Gregg Kellogg: the only other solution would be to reify it. No, thank you.
Manu Sporny: The most natural thing would be to use the default graph. If the server is mixing and matching datasets and graphs, the lowest common denominator should be used, which means the default graph.
[discussion about Payswarm implementation]
Gregg Kellogg: it seems to me that there is a trend towards supporting named graphs.
… I can certainly see that happening. I think it would be a natural way to do things. Signing information is a useful use case.
… The source of the important info is likely to be in a named graph unless we add more semantics to the default graph.
Manu Sporny: in the use case where there is signature, the "default" graph is effectively going to be named.
Gregg Kellogg: yes, and the name could be the URI of the document.
Manu Sporny: In PaySwarm, we actually don't use named graphs yet because RDFa doesn't support them yet. We talk about the signature on the graph as another set of triples, which is a bit awkward, but it works.
Gregg Kellogg: We could support some of it in the RDF conversion algorithms. One of Robin's comments is about calling only one callback. We could do some magic there if we have that.
… I think we really want to push JSON-LD to the mainstream of RDF, not to the fringe.
Niklas Lindström: +1, this is the crucial part
Gregg Kellogg: It's not just JSON-LD. For JSON-LD, documents are generally limited in size, but in quads, a source can be gigabytes, and you cannot wait until you have ingested the whole thing before asserting things.
[scribe missed some of Gregg's comments]
Niklas Lindström: It would be good if we could formulate some concrete suggestion to the RDF WG.
… For one, if I understood correctly, the concept of datasets within RDF 1.1 does not allow nesting datasets.
Gregg Kellogg: correct.
Gregg Kellogg: Basically, the argument is that if expecting a graph, a consumer should extract the graph with the name equivalent to the location.
… We can change the to/from RDF algorithm to take a JSON-LD document with only a default graph and output it using a name based on the location.
Niklas Lindström: To me, it's a clear indication that the grouping of triples is outside the notion of graphs. It's just a way to group sets. There should be no semantics between the set of triples and the groups that contain these triples.
… The union of triples should be treated the same way as if they were together.
… If we make a difference, it's http-range-14 times 10.
Manu Sporny: So I'm having a hard time finding the difference between your two views. Could you formulate something?
Gregg Kellogg: I pasted my proposal on IRC: "Basically, the argument is that if expecting a graph, a consumer should extract the graph with the name equivalent to the location."
Manu Sporny: How does that translate to JSON-LD? Content-negotiating between Turtle and JSON-LD, what would the resulting JSON-LD graph contain?
Gregg Kellogg: with my proposal, if you have a named graph, you use that, otherwise you use the default graph.
Manu Sporny: How does that affect the JSON-LD document?
Markus Lanthaler: My proposal would be to say that you can use JSON-LD as a graph source. The consumer would just use the default graph in that case
Gregg Kellogg: It doesn't. If we're returning quads in JSON-LD with no name, the intent is clear. If the name is the same-document relative URI, then that's the same thing.
Markus Lanthaler: The problem is (as I found out last week) that graphs can be treated as logical expressions, but datasets cannot.
Gregg Kellogg: It does not have implications on the JSON-LD syntax.
Manu Sporny: I guess I'm unclear about the differences between what you're proposing and what Markus is proposing.
… It seems that your proposals are parallel. Neither of them requires us to change JSON-LD at all.
Markus Lanthaler: If I understood Gregg correctly, there would be no default graph when turning to RDF
Gregg Kellogg: when turning to RDF, that's correct. It would return quads that are named according to the document location. This would address the use case where the default graph is used to provide provenance information.
Markus Lanthaler: You prevent another use case. You cannot put anything in the default graph.
Gregg Kellogg: No, you can! In JSON-LD, you can have an empty named graph: @graph with an empty object as a value. It doesn't put any triples in the graph.
Markus Lanthaler: you would put the data in the named graph if there is no such named graph in the first place?
Gregg Kellogg: yes.
Markus Lanthaler: I don't really like that. It means your data moves if you later decide to change the graph and add such a named graph.
Niklas Lindström: I think the problem here is that the notion of graph is the domain of the keeper of information. In Gregg's example, you have a URI for the document, and you return a dataset with assertions in a named graph that uses that URI. From a consumer's perspective, you would want to put provenance information in your default graph. There is a clash of two worlds: a conflict between the default graph and the source of each graph.
Gregg Kellogg: we could just say that provenance information should not be written in the default graph.
… That would allow us to use the default graph as now.
… We have examples that might be worth re-writing, in particular when we talk about signing information.
… It's a chicken-and-egg situation, as a named graph needs to be included in the default graph in JSON-LD.
Manu Sporny: I suggest we push the issue off to the issue tracker. Niklas, Gregg, Markus, please put some proposals there.
Niklas Lindström: named graph with provenance data. I have minted special URIs for Atom entries. Sort of similar to distinct named graph with provenance information as Gregg suggests.
… There may be something substantially useful there.
Manu Sporny: OK, let's see concrete proposals and get back to it next week.
Niklas Lindström: Sandro: "you can treat this as a graph source, if you want, and when you do, you get the default graph"
Niklas Lindström: Sandro said something that looks like Markus's proposal.
Gregg Kellogg: yes, but we need to think through the provenance issues.
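
The two readings discussed under this topic can be contrasted with a minimal sketch. It is not taken from any spec text; the `Quad` type and both function names are hypothetical, and a dataset is modelled simply as a collection of quads whose graph name is `None` for the default graph. The first function follows the idea that a consumer expecting a graph uses the default graph; the second follows the idea that the consumer extracts the graph named after the document location and falls back to the default graph otherwise.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    """Hypothetical quad type; graph is None for the default graph."""
    subject: str
    predicate: str
    object: str
    graph: Optional[str]


def default_graph_as_source(quads: Iterable[Quad]) -> Set[Triple]:
    """Treat the dataset as a graph source by keeping only the default graph."""
    return {(q.subject, q.predicate, q.object) for q in quads if q.graph is None}


def location_graph_as_source(quads: Iterable[Quad], location: str) -> Set[Triple]:
    """Use the graph named after the document location, else the default graph.

    Assumption: an empty named graph cannot be told apart from a missing one
    in a plain list of quads, so an empty result falls back to the default graph.
    """
    quads = list(quads)
    named = {(q.subject, q.predicate, q.object) for q in quads if q.graph == location}
    return named if named else default_graph_as_source(quads)
```

With the second function, the triples a consumer sees change as soon as a graph named after the document location is added, which is the "data moves" concern raised above.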

Topic: ISSUE-113: Define exactly how (IRI) compaction is supposed to work

Manu Sporny: two proposals on the table with concerns from Markus that we may be missing the point.
… This is the whole term-ranking discussion. Markus proposes updates to the algorithm. Gregg and Dave thought it would just be different, not necessarily better.
Manu Sporny: PROPOSAL 1: Clarify parts of the IRI compaction algorithm that need to change, but do not change the algorithm in any large way as it works and has been implemented by two different people.
Manu Sporny: PROPOSAL 2: Adopt Markus' proposed algorithm above for the IRI compaction algorithm.
Manu Sporny: It seems the first proposal has the most support.
… I guess Markus's point is that clarification is not enough.
Markus Lanthaler: It's not clear to me what this proposal means. It's too abstract for me.
Manu Sporny: The main thing that proposal is trying to convey is that the algorithm is the one that is in the spec. So it's about clarifying the parts that are not clear.
Gregg Kellogg: This also intersects with possible changes we need to make to deal with property generators.
… and language maps. It's possible that the term ranking algorithm may need to be revisited in light of these. If it does, it could be good to improve it if we can.
Manu Sporny: If we work on it heavily, it could modify a number of test cases.
Gregg Kellogg: It's easy to find test cases that will be more appropriately dealt with by a given algorithm, but that's not the point of test cases which should test the actual algorithm that is in the spec.
… If you're abusing term ranking with lists, I guess we should make things much simpler in such cases.
Markus Lanthaler: but we never discussed that. It says something in between.
Manu Sporny: The best way to solve it might be to re-write the algorithm. If it addresses the compaction issues, I don't really care what it looks like. It needs to be simple and do the job. Someone just needs to do it.
Markus Lanthaler: I don't care if it's my algorithm but I do care what the output of the algorithm is. That's why I would like to decide what the desired output is.
Gregg Kellogg: I think it's clear for everything but lists.
… It's really when you get to what is the best term to use for a list that things get tricky.
… I can certainly see that I might want to select a term to express that list. When you have a list with different languages, it's a bit nonsensical.
Niklas Lindström: The only applicable term with mixed content should be the one that has no type and language. You can't split the list. That's the simplest solution to me.
… If it's a mixed list, you must treat that data with lots of inline knowledge in your code.
Gregg Kellogg: That would alter the algorithm as it is written now to reject a term [scribe missed exact change, it's kind of hard to scribe algorithms expressed orally ;)]
Niklas Lindström: The only case where I used mixed lists was to report errors. I have to pick up the specific details of that, so no coercion.
Manu Sporny: Going back, I think we have agreement on how this should work. Someone needs to sit down and re-write the algorithm. Whoever does it first and implements it wins :)
… I'm fine with Markus re-writing the algorithm if he takes other people's comments into account.
Gregg Kellogg: This should be the final version.
Manu Sporny: Right, it should include everything.
PROPOSAL: When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected.
Gregg Kellogg: +1
Manu Sporny: +1
François Daoust: +1
Niklas Lindström: +1
Markus Lanthaler: +1
RESOLUTION: When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected.
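
One possible reading of this resolution, as a rough sketch only. The `Term` class, the numeric ranks, and the tie-break (shortest name, then alphabetical) are assumptions made for illustration and are not the algorithm in the spec. List elements are modelled as expanded value objects with `@value` and an optional `@type` or `@language`. A term with a type or language coercion applies only if every element agrees with it; a term with no coercion always applies, and counts as equally specific when every element is a string in the default language.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical term definition: at most one of type/language is set."""
    name: str
    type: Optional[str] = None
    language: Optional[str] = None


def rank(term: Term, items: List[Dict], default_language: Optional[str]) -> Optional[int]:
    """Return a specificity rank, or None if the term cannot represent the list."""
    if term.type is not None:
        return 2 if all(i.get("@type") == term.type for i in items) else None
    if term.language is not None:
        return 2 if all(i.get("@language") == term.language for i in items) else None
    # No coercion: always applicable. Assumed to be as specific as a coerced
    # term when every element is a string in the default language.
    in_default = default_language is not None and all(
        "@type" not in i and i.get("@language") == default_language for i in items
    )
    return 2 if in_default else 0


def select_list_term(terms: List[Term], items: List[Dict],
                     default_language: Optional[str] = None) -> Optional[str]:
    """Pick the most specific term that matches all elements of the list."""
    candidates = []
    for term in terms:
        r = rank(term, items, default_language)
        if r is not None:
            # Higher rank first; ties broken by shortest, then alphabetical name.
            candidates.append((-r, len(term.name), term.name))
    return min(candidates)[2] if candidates else None
```

Under these assumptions a list mixing languages can only be represented by the term without type or language coercion, which matches the point made about mixed lists earlier in the discussion.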
Manu Sporny: do we need to do anything else to address this issue here?
… OK, moving on, then.

Topic: ISSUE-172: Should each member in a list contribute to term rank?

Manu Sporny: Basically, that's what we just discussed. The answer is "yes" but not quite straightforward. Each member in the list is checked and the most specific term that matches all the elements in the list is taken.

Topic: ISSUE-200: JSON-LD API Review by Robin Berjon

Manu Sporny: Review by Robin Berjon.
… Ivan felt that it would be good to have an API review by someone who has a lot of experience with WebIDL and JavaScript APIs.
… I see that Markus has already responded.
Gregg Kellogg: I certainly think we should talk about the use of IRI vs. URL.
Manu Sporny: Robin suggests we use URL instead of IRI, even though IRI is more correct.
Gregg Kellogg: HTML5 modifies what URL means, at least last time I checked, and we put some provision in RDFa I think about that.
Manu Sporny: The plan is to update the URL spec to absorb the IRI spec, but I'm not positive about that.
François Daoust: One thing that wasn't said - we said we're using URL to mean IRI. [scribe assist by Manu Sporny]
David I. Lehn: can i vote for URI? :)
Gregg Kellogg: Maybe URI, as it's more commonly understood than IRI. We could use URI and say that we conform to the IRI spec.
Niklas Lindström: The problem is that, technically, URI and IRI are not the same thing. I think we should stick to IRI until someone is really pushing for the change.
Manu Sporny: Agree, let's move on.
[Manu going over Robin's comments]
Manu Sporny: changing JSON Object to reference JSON spec?
Markus Lanthaler: yes, much clearer in Syntax spec.
[discussion on NoInterfaceObject on JsonLdProcessor]
Manu Sporny: I'm going to push back on that.
… that's how JSON works. JSON.parse, JSON.stringify.
… that's probably what we want to follow.
Markus Lanthaler: You could have a private constructor.
Manu Sporny: we might want to ask the whatwg channel. I'm not convinced that constructors are the right way to go. That's what I did previously and received a lot of pushback.
Manu Sporny: ref. asynchronous/synchronous. We could say that this is an asynchronous API but that implementations in other languages may use a synchronous version.
… I don't think that adding a synchronous API buys us a lot of things.
Niklas Lindström: Do we need to rephrase the note to say that it applies only when you don't want to implement the API but want to follow the gist of it?
Manu Sporny: Yes, we should clarify the wording. I also think we should not specify a synchronous API and we should also not claim that the API is the only way to implement the algorithms.
Markus Lanthaler: I think the spec is quite clear on this: http://json-ld.org/spec/latest/json-ld-api/#jsonldprocessor
Manu Sporny: ref. error constants, that's true, something we never have had time to review so far.
Manu Sporny: ref. losing information, I'm pretty sure that's what we're doing.
Gregg Kellogg: we lose information for terms that are not defined.
Markus Lanthaler: we still have a constant that is not used anywhere. That may have triggered the comment.
… "lossy compaction", let's remove that.
Manu Sporny: re. modification in place, it's true. We should probably be modifying a copy of the provided input.
Markus Lanthaler: yes.
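
A small sketch of the "modify a copy of the provided input" idea, assuming a Python implementation. The wrapper and the `strip_context` transform are hypothetical stand-ins for a real processing step; the only point is that the caller's document is deep-copied before anything mutates it.

```python
import copy
from typing import Any, Callable, Dict


def process_without_mutation(document: Dict[str, Any],
                             transform: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Run a (possibly mutating) transform on a deep copy of the input."""
    working_copy = copy.deepcopy(document)
    return transform(working_copy)


def strip_context(document: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical in-place step: removes @context from the given document."""
    document.pop("@context", None)
    return document
```

The cost is an extra copy of the input in memory, in exchange for callers being able to reuse the document they passed in.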
Manu Sporny: re. "string" and "number" in WebIDL. OK, we'll have a look at WebIDL for numbers.
Manu Sporny: re. toRDF designed wrong, true for the final call. We wanted to provide feedback about how many triples have been generated. I'm afraid that if we call back with an array of quads, that would make a lot of data. That said, we'll need to keep that data in memory, so that memory is needed anyway. Does anyone have a feeling about one callback total vs. one callback per quad?
Markus Lanthaler: It's much easier to pass all the quads at once.
Gregg Kellogg: Agree.
Niklas Lindström: any way to say that it's an enumerable of any kind in WebIDL?
Manu Sporny: I don't think so.
PROPOSAL: The callback signature for the .toRDF() method should accept Quad[]. That is, the callback is called once after all processing has been completed.
Gregg Kellogg: +1
Manu Sporny: +1
François Daoust: +1
Niklas Lindström: +1
Markus Lanthaler: +1
David I. Lehn: +0
RESOLUTION: The callback signature for the .toRDF() method should accept Quad[]. That is, the callback is called once after all processing has been completed.
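
The shape of this resolution, sketched in Python rather than WebIDL. Everything here is an assumption for illustration: the `Quad` type, the `to_rdf` name, the `(err, quads)` callback arguments, and the toy input (a mapping from graph name to triples, with `None` for the default graph) that stands in for the real conversion algorithm. What the sketch shows is only the calling convention: all quads are collected first and the callback is invoked exactly once.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    """Hypothetical quad type; graph is None for the default graph."""
    subject: str
    predicate: str
    object: str
    graph: Optional[str]


Callback = Callable[[Optional[Exception], Optional[List[Quad]]], None]


def to_rdf(dataset: Dict[Optional[str], List[Tuple[str, str, str]]],
           callback: Callback) -> None:
    """Collect every quad, then call back once with the complete list."""
    quads: List[Quad] = []
    try:
        for graph_name, triples in dataset.items():
            for subject, predicate, obj in triples:
                quads.append(Quad(subject, predicate, obj, graph_name))
    except Exception as err:  # report the failure through the same callback
        callback(err, None)
        return
    callback(None, quads)
```

As noted in the discussion, the full list has to be held in memory either way, so a single call mainly simplifies the consumer: it no longer needs a separate signal to learn that processing has finished.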
Markus Lanthaler: One quick question about the error handler that Dave was to work on?
Manu Sporny: No news up until the end of the year, I think. Maybe we should simplify that. Markus, is that what you would suggest?
Markus Lanthaler: yes.
Manu Sporny: Feel free to do that and let's see what it looks like after that. If fixing the data really ends up being necessary, we can always improve that later on, but I would expect people to lint the data before they pass it on to the processor.
[Call adjourned]