JSON-LD Community Group Telecon

Minutes for 2012-09-18

Agenda
http://lists.w3.org/Archives/Public/public-linked-json/2012Sep/0006.html
Topics
  1. ISSUE-113: IRI compaction algorithm
  2. ISSUE-140: Consider objectify/link API method
  3. Timeframe?
Chair
Manu Sporny
Scribe
Manu Sporny
Present
Manu Sporny, Markus Lanthaler, Niklas Lindström, David I. Lehn
Audio Log
audio.ogg
Manu Sporny is scribing.
Manu Sporny: Gregg, Francois not here today. Dave Lehn will be here shortly. Let's discuss the approach for solving these issues.
Manu Sporny: Any additions/changes to the Agenda?
No changes

Topic: ISSUE-113: IRI compaction algorithm

Markus Lanthaler: The problem was that we never defined how we're going to do IRI compaction. That has since been corrected, though not in the way we ideally wanted.
Markus Lanthaler: Gregg updated the spec - currently, there is an algorithm that is not understandable without implementing it. It isn't explained how the numbers were generated. If you don't implement it, you have a difficult time understanding what the algorithm is doing.
Markus Lanthaler: It's just a very difficult-to-understand algorithm. It makes it quite difficult to explain to people what compaction does. It's kind of a black box at the moment.
Manu Sporny: So, what's the plan here? Make the language simpler?
Markus Lanthaler: We should consider IRI compaction algorithm and term ranking algorithm when simplifying.
Markus Lanthaler: Pseudo-code in the issue is easier to understand.
Markus Lanthaler: Gregg disagrees, and Dave needs more time to look at it.
Manu Sporny: Dave Longley's concern is that all the algorithms, because we're focused on corner cases, are getting difficult to understand. Perhaps what we should do is simplify greatly, and ignore corner cases. One way we could do this is say that if there is ever a term conflict, that we should just throw an error and have the error callback handle the selection of the proper term. The problem with that approach is that developers may choose the wrong way to select the term (or at the very least, it's non-interoperable - or they have to publish their algorithm). To get around that, we could publish the "proper" term matching algorithm along with the JSON-LD API and that can be the default for the .compact() option for the error handler. The problem with that is that we end up having the same amount of complexity in there that we do today.
Manu Sporny: The other option is that we can explain the algorithm better, but that doesn't remove the complexity of the algorithm. [scribe assist by Niklas Lindström]
Manu Sporny: we could explain the algorithm like this: the algorithm picks the most specific term; but there are complications for this in the edge cases. [scribe assist by Niklas Lindström]
Manu Sporny: so should we simplify it, or can we settle for explaining it better? [scribe assist by Niklas Lindström]
Markus Lanthaler: What do you mean by conflict?
Manu Sporny: Two terms that have the same IRI, but one of them has a datatype - which one is picked?
Niklas Lindström: I haven't had time recently to grasp the current algorithm. I hope that we could simplify it to some extent.
Niklas Lindström: There are many edge cases, are there test cases?
Manu Sporny: yes, lots of test cases.
Niklas Lindström: Perhaps having different terms for date vs. datetime. Author name (dc:creator with a string) vs. with a URI reference. Those would be good to keep.
Niklas Lindström: Not having spent too much time on this recently, I hope that we could make some sort of binary check - either it's a perfect match, or if there is a term for it, use that. So, we don't have multiple steps for checking (to see if there is something matching).
Manu Sporny: the current algorithm is a multistep process; it ranks the terms. We do have test cases for them. [scribe assist by Niklas Lindström]
Manu Sporny: There are multiple ways of implementing it. The selection algorithm is very complex because it deals with all the corner cases. [scribe assist by Niklas Lindström]
Manu Sporny: Dave Longley proposes to deal with fewer corner cases, and raise an error if there's a corner case conflict. That has advantages and disadvantages. [scribe assist by Niklas Lindström]
Manu Sporny: The big issue is figuring out, when there is a corner case, which term gets picked.
Niklas Lindström: If I have a property 'age' and a value that is an integer, that would be straightforward to pick. Given that property and three terms, if one of them was coerced to an integer, that one would be picked. If a term was coerced to a list, it wouldn't be picked.
Manu Sporny: The issue is that the algorithm to do that is complex.
Niklas Lindström: I haven't actually implemented that algorithm yet - I'm about to.
Niklas Lindström: I'd map the property IRI to an object that itself has a type dictionary, a container dictionary, or a default property IRI mapping.
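A minimal Python sketch of the lookup structure Niklas describes here. The function name build_lookup, the keys "types", "containers" and "default", and the sample context are assumptions made for illustration; none of them come from the spec or from an implementation discussed on the call. For simplicity, a term that declares both a container and a datatype is filed under its container only.

```python
# Hypothetical helper: names and structure are illustrative, not from the spec.
DC = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_lookup(context):
    """Map each property IRI to its type-coerced, container-coerced and default terms."""
    lookup = {}
    for term, definition in context.items():
        # A plain string definition is shorthand for {"@id": <IRI>}.
        if isinstance(definition, str):
            definition = {"@id": definition}
        entry = lookup.setdefault(
            definition["@id"], {"types": {}, "containers": {}, "default": None}
        )
        if "@container" in definition:
            entry["containers"][definition["@container"]] = term
        elif "@type" in definition:
            entry["types"][definition["@type"]] = term
        elif entry["default"] is None:
            # The first uncoerced term becomes the default mapping.
            entry["default"] = term
    return lookup


# Assumed sample context with three terms for the same property IRI.
context = {
    "created": DC + "created",
    "createdAt": {"@id": DC + "created", "@type": XSD + "dateTime"},
    "createdSet": {"@id": DC + "created", "@container": "@set"},
}
lookup = build_lookup(context)
```

With this shape, picking a term for a typed value is a dictionary lookup on lookup[iri]["types"], falling back to lookup[iri]["default"] when no coerced term exists.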
Niklas Lindström: I can see there is a certain complexity involved if you are looking for something that is both coerced to a datatype and it has a certain container (ie: has multiple values)... I don't understand why you need to rank items.
Markus Lanthaler: Gregg wrote it, so he'd know best.
Markus Lanthaler: I didn't implement it as it is in the spec, I couldn't figure out how to implement it from the spec. The idea is that you have a number of terms or complex IRIs (prefix/suffixes), or even the full IRI, and you assign a number to them (to the IRI/value pair) which expresses how well it matches.
Markus Lanthaler: So, for example, if you have just one term with one IRI, it gets a 1, but if you have something that has a datatype and it matches, that gets a value of 2 and wins, etc.
0 and term is ... you don't know how the numbers were created. It's difficult to understand what's going on by looking at the numbers.
Manu Sporny: I think we should try and remove all the numbers in the term ranking algorithm as a way of simplifying the way it is explained. Perhaps we need to implement it as a map-reduce step that always results in 0 or 1 term picked as a result. So, you give the algorithm a list of potential terms that can be matched, and a value that is being considered for match against all the terms. The algorithm then whittles the list of IRIs down to 1 (if a term matched) or 0 (if none of the terms match). This way, there is no weirdness like rank = rank - 2.
Niklas Lindström: If you have this - [] dc:created "2012-01-01T00:00:00"^^xsd:dateTime
Niklas Lindström: and this term: "created": "dc:created"
Niklas Lindström: Let me see if I understand this correctly...
Niklas Lindström: and this term: "dc:created": {"@type": "xsd:dateTime"}
Niklas Lindström: What if we order the list so that you just go down and ignore each item in the list until a selection is made?
Niklas Lindström: "createdTimeSet": {"@type": "xsd:dateTime", "@container": "@set"}
Niklas Lindström: So, we could simplify by throwing out choices that we don't want to make... like given the choice between terms and CURIEs, throw out all the CURIEs from the decision before you make the decision?
Manu Sporny: The issue is that people might be surprised by this, because the more accurate term wouldn't be selected.
Niklas Lindström: Then they should only use terms, or only use CURIEs.
Niklas Lindström: If you don't want the terms to be picked, you should be able to manage your own context in that scenario, anyway.
Niklas Lindström: If we try to support that use case, I'm not really sure if we're supporting that usage of @context anyway - it's a complex usage of terms and CURIEs.
Manu Sporny: Perhaps we can do this map-reduce in 3 iterations, instead? First removes @set/@list, second matches against datatype/language, third picks by lexicographical value. That may be easier for folks to understand?
Markus Lanthaler: Maybe we pick @set/@list first, then @datatype/@language, then last step checks lexicographical/prefix value?
Markus Lanthaler: Maybe it's enough to specify how the internal inverse-context is sorted? Then we just go down the list of internal inverse-context values and pick an item or skip it?
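One way to read the sorted-list idea is sketched below in Python: candidates are ordered from most to least constrained, and the first one whose declared constraints all hold for the value is picked. The names sort_key, accepts and pick_term, the ordering itself, and the convention of describing a value by optional "@container", "@type" and "@language" keys are all assumptions for illustration, not what the spec or any implementation does.

```python
def sort_key(candidate):
    """Order candidates: container terms first, then typed/language terms,
    then shorter terms, then lexicographically smaller terms."""
    term, definition = candidate
    return (
        0 if "@container" in definition else 1,
        0 if "@type" in definition or "@language" in definition else 1,
        len(term),
        term,
    )


def accepts(definition, value):
    """True when every constraint the term declares is satisfied by the value."""
    for key in ("@container", "@type", "@language"):
        if key in definition and definition[key] != value.get(key):
            return False
    return True


def pick_term(candidates, value):
    """Walk the sorted candidates and return the first acceptable term, or None."""
    for term, definition in sorted(candidates.items(), key=sort_key):
        if accepts(definition, value):
            return term
    return None


# Candidate terms for dc:created, taken from the examples given on the call.
candidates = {
    "created": {},
    "dc:created": {"@type": "xsd:dateTime"},
    "createdTimeSet": {"@type": "xsd:dateTime", "@container": "@set"},
}
```

In this sketch an unconstrained term such as "created" accepts any value, so it acts as the fallback at the end of the list.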
Niklas Lindström: Maybe we should investigate that - we cover most of the needs - it's more direct/natural.
Manu Sporny: Okay, so loose consensus - we have a function that takes in a list of terms and a value to match... the function whittles down the list to one item by the end. The way it whittles could be performed in 3 iterations, where each iteration removes imperfect matches, leaving 1 or 0 matches at the end. The other way it could be whittled down is to sort the list of potential term matches in some way, and then search for an "exact" match.
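The three-iteration variant could be sketched in Python as follows. This is only an illustration of the idea under stated assumptions: the function name whittle, the exact-match rule in each pass, and the final tie-break (shortest term, then lexicographically least) are hypothetical choices, and a value is described by optional "@container", "@type" and "@language" keys.

```python
def whittle(candidates, value):
    """Reduce a dict of candidate terms to exactly one term, or None."""
    # Pass 1: keep only terms whose container matches the value's container.
    remaining = {
        term: definition
        for term, definition in candidates.items()
        if definition.get("@container") == value.get("@container")
    }
    # Pass 2: keep only terms whose datatype and language match exactly.
    remaining = {
        term: definition
        for term, definition in remaining.items()
        if definition.get("@type") == value.get("@type")
        and definition.get("@language") == value.get("@language")
    }
    # Pass 3: break any remaining tie by length, then lexicographical order.
    if not remaining:
        return None
    return min(remaining, key=lambda term: (len(term), term))


# Candidate terms for dc:created, taken from the examples given on the call.
terms = {
    "created": {},
    "dc:created": {"@type": "xsd:dateTime"},
    "createdTimeSet": {"@type": "xsd:dateTime", "@container": "@set"},
}
```

Because every pass demands an exact match, a value typed xsd:date finds no term here at all, rather than falling back to the plain "created" term. Whether that strictness is desirable is the trade-off under discussion.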
Markus Lanthaler: termA: @list, typeA | termB: @list, typeB --> list: val1/typeA, val2/typeB, val3/typeC
Markus Lanthaler: I would say this should choose typeA (lexicographically least)
Markus Lanthaler: for list: val1/typeA, val2/typeB, val3/type
Manu Sporny: So, the approach could be less cognitively complex and more algorithmically complex?
Niklas Lindström: Yeah, but only because we need to be more accurate than we are now.
Manu Sporny: Dave Longley is concerned that when we chose the word 'compact', it was the wrong decision. The reason is that people think it's supposed to end up with the least number of bytes for the document. In reality, it's supposed to give back an easy-to-use data structure for developers to use. So, when compacting, we should ensure that we don't compact something that shouldn't really be compacted. For example, compacting a list with mixed values to a list whose @datatype is xsd:integer would be the wrong thing to do.
Niklas Lindström: Yes, for lists, it either matches exactly (every item in the list), or there is no match.
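The all-or-nothing rule for lists can be illustrated with a short Python sketch. The function name list_matches and the representation of list items as dicts with an optional "@type" key are assumptions made for this example only.

```python
def list_matches(definition, items):
    """A @list term matches only if every item carries the term's coerced type."""
    if definition.get("@container") != "@list":
        return False
    wanted = definition.get("@type")
    # all() is True for an empty list, so an empty list matches any @list term.
    return all(item.get("@type") == wanted for item in items)


# Hypothetical term and lists, modelled on the typeA/typeB example from the call.
term_a = {"@container": "@list", "@type": "typeA"}
uniform = [{"@value": "val1", "@type": "typeA"}, {"@value": "val2", "@type": "typeA"}]
mixed = [
    {"@value": "val1", "@type": "typeA"},
    {"@value": "val2", "@type": "typeB"},
    {"@value": "val3", "@type": "typeC"},
]
```

Under this rule the mixed list matches neither a typeA nor a typeB list term, so it would stay uncompacted instead of being coerced to the wrong type.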
Niklas Lindström: It should always be crystal clear when something applies...
Manu Sporny: The issue with corner cases is that they make it too complex. The choice is - don't deal with the corner cases, or deal with them. Dealing with the corner cases leads to very complex algorithms. Not dealing with the corner cases has two possible outcomes: 1) Interoperability problems for data that falls into the corner cases - people might think JSON-LD sucks because it gives back bad data when you .compact(), 2) Forcing people to mark their data up in a specific way, which removes corner cases from JSON-LD data because that data doesn't work well with the API. The first is bad, the second is good. No idea which one will happen if we choose to ignore corner cases.
Niklas Lindström: Irregular data, where you have mixed types with the same terms, is not compact-able unless you have different terms for the different types used. It's obvious from looking at the context that the context is written for irregular data.
Manu Sporny: Okay - maybe Markus and I need to write the pseudocode for what we've discussed today, then we look at it as a group, then decide what we want to go with and include it in the spec.

Topic: ISSUE-140: Consider objectify/link API method

Manu Sporny: This issue is about whether or not we should add a .link() / .graphify() method to the API.
Manu Sporny: I'm concerned that we don't have an algorithm to do this yet... time issue for 1.0
Niklas Lindström: I'm concerned about timing - need to write something in the wiki about this - perhaps I should collaborate with Gregg and write this in a sibling specification.
Manu Sporny: I agree, I don't think we have the time to put this in 1.0, but we should start working on it immediately.
Niklas Lindström: I took your jsonld.js implementation and took out the framing part - needed a smaller code size - and I don't think we need to do anything in the spec. It should be possible to add things later on in a simple way. I don't think we have to add anything in the API document for that.
Niklas Lindström: The .link() / .graphify() mechanism could be extended in the same way browsers are extended - you just extend as needed via an 'add-on' API.
Niklas Lindström: We have had a bunch of different names for this - I've been using .connect() recently. I think we all agree that .objectify() wasn't working... .graphify() might be a little too odd.
Manu Sporny: I don't think we need to pick the name now... we can wait until the spec goes to LC, even.
Niklas Lindström: We might want to add some sort of "indexing" mechanism - something that allows you to index JSON-LD documents.
Manu Sporny: Something like a .view() call that is dynamically updated.
Manu Sporny: There is a lot of potential for .graphify() / .connect() and .index() / .view() - but the ideas are floating out there right now... not finalized.
Niklas Lindström: There are a bunch of these sorts of libraries for RDF - they all use the Class mechanism to define short names bound to IRIs/coercions, which is exactly what the JSON-LD context does in a language-agnostic way.
Niklas Lindström: To use a @context as a "lens" to access a live RDF graph to act as if it is something live in memory (it could come from a database backend over the Web/WebSockets)
Niklas Lindström: It makes it much easier to throw RDF into arbitrary templating systems.
Manu Sporny: I think we're saying that all of these things are important, but we can't do it by JSON-LD 1.0.
Markus Lanthaler: I'm concerned that if we don't have .frame() / .objectify() that people can't process these documents in an arbitrary way.
Manu Sporny: Well they can, it just won't be 'standardized' - jsonld.js still has .frame(), so does the Ruby implementation.
Niklas Lindström: Can we include a separate .graphify() in 1.0 that could evolve in 1.1?
Manu Sporny: I'm concerned that we don't have any idea how these APIs are going to evolve.
Niklas Lindström: We could always implement the core - then we could add more indexes in the future? Maybe have a callback to do your own indexes.
Manu Sporny: I think somebody needs to volunteer to write the .graphify() / .index() spec - that will ensure that we know what we're getting into if we have a stripped down version of the call in the JSON-LD 1.0 API spec.

Topic: Timeframe?

Markus Lanthaler: Is there a timeframe for publication?
Manu Sporny: Technically, we have to publish every 3-6 months. RDF WG charter ends in January 2013 - so, ideally, we'd be at REC in that time frame.
David I. Lehn: That is going to be very difficult to do.
Manu Sporny: I'll talk to the chairs about it.