JSON-LD Community Group Telecon

Minutes for 2012-09-11

  1. ISSUE-159: Add specifying @language to expanded form
Chair: Manu Sporny
Scribe: François Daoust
Present: François Daoust, Manu Sporny, Markus Lanthaler, Niklas Lindström, Stéphane Corlosquet, Lin Clark
Audio: Audio Log
François Daoust: [Manu going through the agenda. A couple of issues may not be resolved today as there are too many proposals on the table]
François Daoust is scribing.

Topic: ISSUE-159: Add specifying @language to expanded form

Manu Sporny: Issue has to do with round-tripping language-map stuff.
... We added support for the Drupal community and the Wikidata community.
... No context in expanded form, otherwise we'd have to interpret this in very weird ways.
... Question I asked the Wikidata community was "Why not work in compact form?"
... Having languages as keys gives direct access to data
... The problem is now to define how the expanded form is generated from the compact form so that we can get back to the compact form afterwards.
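Illustration (not part of the call): a minimal Python sketch of the round-tripping problem described above, assuming that expansion turns each language-map entry into a value object carrying "@value" and "@language". The function name and the sample values are hypothetical. It suggests that, once expanded, values that came from a language map look the same as values written out by hand, so nothing tells compaction to rebuild the map.

```python
def expand_language_map(language_map):
    """Turn {"en": "Foo", "de": "Bar"} into a list of value objects.

    Assumption: each entry becomes {"@value": ..., "@language": ...}.
    """
    expanded = []
    for language in sorted(language_map):
        values = language_map[language]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            expanded.append({"@value": value, "@language": language})
    return expanded


# Values that started out as a language map in compact form.
from_map = expand_language_map({"en": "Foo", "de": "Bar"})

# The same values written directly as value objects, with no language map.
by_hand = [
    {"@value": "Bar", "@language": "de"},
    {"@value": "Foo", "@language": "en"},
]

# Both inputs produce the same expanded data, so the origin is lost.
print(from_map == by_hand)
```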
Markus Lanthaler: If you have @language in expanded form, there might be collisions with @language that are already there or with properties that are of other types and do not accept @language.
... See comment in GitHub issue
... One option to solve this would be to keep a @context in expanded form, but not what we'd like to have.
Niklas Lindström: Precedence is good in any case. Even in compact form.
Manu Sporny: Yes. If we have precedence, does it address your concern Markus?
Stéphane Corlosquet: are you guys saying that in any case, any typed value could not have a language?
Markus Lanthaler: for a plain literal, it wouldn't because you cannot add @language to a plain literal.
Niklas Lindström: we understand we're diverging from RDF here [scribe assist by Stéphane Corlosquet]
... It's strange to have language information in expanded form. The only way to describe this in RDF is to have a named graph.
... (scribe missed details)
Manu Sporny: "term": { "@language": {"en": ..., "de": ...}}
Manu Sporny: "http://foo.bar/vocab#term": { "@language": {"en": ..., "de": ...}}
Manu Sporny: wondering if we could do something like the snippet I just pasted
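Illustration (not part of the call): a rough Python sketch of the snippet pasted above, assuming the language map is kept as-is under an "@language" key in expanded form. The context, the set of language-map terms, the function names and the "Foo"/"Bar" values are all hypothetical; this is not the JSON-LD API algorithm. Under those assumptions the mapping is 1-to-1 and compaction can rebuild the original structure.

```python
# Hypothetical context: one term mapped to the IRI used in the pasted snippet.
CONTEXT = {"term": "http://foo.bar/vocab#term"}
# Hypothetical flag: which terms hold language maps in compact form.
LANGUAGE_MAP_TERMS = {"term"}


def expand(doc, context, language_map_terms):
    """Expand terms to IRIs and wrap language maps under "@language"."""
    out = {}
    for key, value in doc.items():
        iri = context.get(key, key)
        if key in language_map_terms:
            value = {"@language": dict(value)}
        out[iri] = value
    return out


def compact(doc, context):
    """Reverse of expand(): unwrap "@language" maps and shorten IRIs."""
    iri_to_term = {iri: term for term, iri in context.items()}
    out = {}
    for iri, value in doc.items():
        term = iri_to_term.get(iri, iri)
        is_wrapped_map = (
            isinstance(value, dict)
            and set(value) == {"@language"}
            and isinstance(value["@language"], dict)
        )
        if is_wrapped_map:
            value = dict(value["@language"])
        out[term] = value
    return out


compact_doc = {"term": {"en": "Foo", "de": "Bar"}}
expanded = expand(compact_doc, CONTEXT, LANGUAGE_MAP_TERMS)
print(expanded)
print(compact(expanded, CONTEXT) == compact_doc)
```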
Markus Lanthaler: The problem is that we're trying to express data that is not there. It's metadata.
Niklas Lindström: The expanded form is an abstract triple representation and what we do with language maps (and id-maps for that matter) is just reify indexing.
... Only if we stay within JSON-LD and expand/compact would you get round-tripping.
Manu Sporny: The concern in the Drupal community is that you could get something different out.
Niklas Lindström: The only thing expanded are terms. That's the only expansion we've talked about. Perhaps that's a good concept.
Manu Sporny: I don't know if it ends up becoming a different type of form for JSON-LD.
Stéphane Corlosquet: Niklas, you were talking about round-tripping in RDF.
... It wouldn't be a concern in Drupal because it's never used internally.
... Our goal is not necessarily to output RDF in the end.
... What we'd like to do is use the compact form, expand it and process it.
... We just want to have the language in the expanded form.
... Getting the same data from compaction is not exactly our use case.
... You guys may want to recompact it again and get the same data, but not exactly what we need in practice.
Niklas Lindström: I can understand your use case. I touched upon it during an RDFa to JSON-LD workshop.
... If we want to support it, we should do it via the notion of term expansion, not full expansion.
Manu Sporny: Just a quick explanation about the Drupal use case. Every Drupal site has a slightly different context.
... Tags can have different information associated with them across Drupal sites.
Stéphane Corlosquet: can be anything, 'tags' is just an example
Manu Sporny: Those tags are kind of b-nodes.
... When two Drupal sites share data, one of them is going to export data as JSON-LD, using its context, probably expanding it.
... The targeted Drupal site will process the received data, using the expanded form as input and compacting using the target context.
... The idea that we need to reconstruct the language map is a pretty strong requirement.
... I also think that both Niklas and Markus have very strong points.
Manu Sporny: "http://foo.bar/vocab#term": { "@language": {"en": ..., "de": ...}}
... The only solution that I can see working that doesn't have the issue Markus raised in the beginning is the idea I shared on IRC
... I don't see any issue with this, but I may miss something.
Markus Lanthaler: alternative: { "@context": { "langmap": "example.com/vocab/term#" }, "langmap:de": ..., "langmap:en": ... }
Markus Lanthaler: perhaps additionally define "langmap:de": { "@language": "de" } in context or add context inline
Markus Lanthaler: I don't see an issue with that but proposing another alternative on IRC for Stéphane.
Lin Clark: hey all, I'm on call now as well
Markus Lanthaler: langmap:de - example.com/vocab/term#de
Markus Lanthaler: langmap:it - example.com/vocab/term#it
Markus Lanthaler: example.com/vocab/term#it
Markus Lanthaler: basically, you'd have different terms for different properties.
Stéphane Corlosquet: How would you re-compact this in the end?
Markus Lanthaler: { "@context": { "langmap": "example.com/vocab/term#" }
Markus Lanthaler: langmap:LANGUAGE
Markus Lanthaler: with the context just pasted on IRC, you would just re-generate the initial data
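Illustration (not part of the call): a small Python sketch of the "langmap:LANGUAGE" alternative, assuming "langmap" is an ordinary prefix so that each language becomes its own property IRI. The helper names and the "Foo"/"Bar" values are hypothetical; the prefix IRI is the one pasted on IRC. It is only meant to show why re-compacting with the same context would regenerate the initial keys.

```python
# Prefix definition taken from the context pasted on IRC.
PREFIXES = {"langmap": "example.com/vocab/term#"}


def expand_key(key, prefixes):
    """Expand "prefix:suffix" to a full IRI when the prefix is known."""
    prefix, separator, suffix = key.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + suffix
    return key


def compact_key(iri, prefixes):
    """Shorten an IRI back to "prefix:suffix" when a prefix matches."""
    for prefix, base in prefixes.items():
        if iri.startswith(base):
            return prefix + ":" + iri[len(base):]
    return iri


compact_doc = {"langmap:de": "Bar", "langmap:en": "Foo"}
expanded = {expand_key(k, PREFIXES): v for k, v in compact_doc.items()}
recompacted = {compact_key(k, PREFIXES): v for k, v in expanded.items()}

print(expanded)
print(recompacted == compact_doc)
```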
Lin Clark: That sounds a lot like the proposal Manu had made initially
Manu Sporny: There's a downside to that (missed by scribe) that explains why we abandoned the idea in the end.
... The only reason why we want it in expanded form is to be able to recompact it in lossless form.
... This idea of being able to tell whether something came from a language map is to reconstruct the same structure in the end.
... There may be times that you express values in expanded form where you didn't want them to be necessarily put back in language maps.
Niklas Lindström: The question is whether data coming from language-based data can be reconstructed. Any deviation from that should not use language maps to compact because that would always give weird results.
... If you start mixing from various sources, you may have titles in English but description in Italian, then properties would fall in different buckets if you use language maps.
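Illustration (not part of the call): a Python sketch of the mixing concern, assuming a naive compaction step that groups the value objects of each property by "@language". The function name and the sample node are hypothetical, and this is not the ranking algorithm mentioned on the call. With a title only in English and a description only in Italian, each property ends up under a different language key.

```python
def to_language_map(values):
    """Group value objects by language; return (buckets, leftovers).

    Values without "@language" cannot be placed in any bucket.
    """
    buckets = {}
    leftovers = []
    for value in values:
        language = value.get("@language")
        if language is None:
            leftovers.append(value)
        else:
            buckets.setdefault(language, []).append(value["@value"])
    return buckets, leftovers


# Hypothetical node merged from two sources.
node = {
    "title": [{"@value": "Hello", "@language": "en"}],
    "description": [
        {"@value": "Ciao a tutti", "@language": "it"},
        {"@value": "untagged text"},
    ],
}

compacted = {prop: to_language_map(values) for prop, values in node.items()}
for prop, (buckets, leftovers) in compacted.items():
    print(prop, buckets, leftovers)
```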
Manu Sporny: I'm proposing this: "http://foo.bar/vocab#term": { "@language": {"en": ..., "de": ...}} because... 1-to-1 mapping
[Markus and Manu discussing examples of expansion/compaction]
Niklas Lindström: I wonder if the expanded form you're proposing here would solve the problem of combining two sources.
... It seems to require things from the compaction algorithm.
Manu Sporny: Let's say you have two documents that use the same IRI term and you expand.
... Without a flag and with the rank algorithm that we have, there wouldn't be any problem.
... The term with the language map would be separated from the term without the language map.
... That's for when we don't flatten.
... If we do flatten, (scribe missed that), that would address the issue.
Niklas Lindström: I'd rather we put information in the different buckets in expanded form so that compaction can be done deterministically
Manu Sporny: and it's a fairly expensive operation when the data gets bigger. I agree with you Niklas. If we could simplify, we should.
... It turns out that, each time we need to look into details, we end up with things that are fairly complex. The ranking algorithm is a good example of this. It becomes impossible to know what will happen without understanding the algorithm itself.
... All that to say that I agree in principle, but I'm worried that the algorithm will become more complex than expressing a 1-to-1 mapping with language maps.
Niklas Lindström: The problem is that we're trying to express something that we cannot even express in our data model.
Lin Clark: Are there differences between RDF data model and JSON-LD data model?
... I saw discussions from Gregg
Manu Sporny: This is kind of a corner case. We don't make use of the differences for the time being, although there is a tiny difference, indeed.
... We just have to be very careful if we say JSON-LD uses RDF data model since that's not entirely true.
Niklas Lindström: {'@language': 'en', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id': 'http://example.com/tags/baz', 'label': ' Baz'}
Niklas Lindström: Example on IRC. Different resources because different IDs.
... The nodes themselves do not have, in RDF terms, any language expressed.
Niklas Lindström: {'@language': 'en', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
... You can infer that "Foo" seems to be in English.
... but that's all.
... Now consider the second example, where IDs are the same.
Niklas Lindström: {'dc:language': 'en', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
... We have a problem here because it's not clear whether we want to reify the language. If we want to say that the node is somehow intrinsically associated with English, then you should use 'dc:language'.
Manu Sporny: "term": {"en": "Foo", "de": "Bar"}
... That is quite different from saying that there is an English label for this.
Manu Sporny: On the other hand, we need to account for very simple examples such as the one I just pasted.
Niklas Lindström: {'dc:language': 'en', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'en', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id': 'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: actually, that's simple and straightforward.
Stéphane Corlosquet: I just wanted to jump on Niklas's comments.
... When you use 'dc:language', you say that the resource is in English
... (scribe missed description because of noise)
Niklas Lindström: you have two different resources, one being a translation of the other.
Lin Clark: No, they don't want to have separate graphs.
... Different properties in different languages.
... You would have an author field on the node. That field could point to Stéphane for the French version and to myself in the English version.
... I understand that in the RDF model, it would be understood as two different graphs.
... If we start to introduce complex syntax, people will get lost, and it's just not worth it for the 2-3 people that understand this.
Manu Sporny: We understand the need to have simple ways of accessing the data.
Niklas Lindström: I object to this. This has nothing to do with simplicity of accessing the data, but with simplicity of modeling the data.
Manu Sporny: I don't think it applies to the Drupal use case. I don't think they should have to change data modeling for this.
Niklas Lindström: I am not a fundamentalist here, we have to find a pragmatic solution to the issue.
Lin Clark: The translation to us is not a different resource.
Niklas Lindström: but there are two different translations.
Manu Sporny: aside - "http://foo.bar/vocab#term": { "@language": {"en": ..., "de": ...}} also allows the Drupal folks to work w/ expanded form, if they need to.
Niklas Lindström: ... {"@id": "/resource", "translation": {"en": {"author": {"@id": "/lin"}}, "de": {"author": {"@id": "/stephane"}}}}
Niklas Lindström: you don't have to describe the translation in any more detail than in the code I just pasted.
Markus Lanthaler: alternative {"@id": "/resource", "en": {"author": {"@id": "/lin"}}, "de": {"author": {"@id": "/stephane"}}}
Markus Lanthaler: where "en" is a property like example.com/vocab/languages/en
... You can have a property that combines translations
Markus Lanthaler: along the same lines as Niklas
Markus Lanthaler: alternative {"@id": "/resource", "en": {"author": {"@id": "/lin"}}, "de": {"author": {"@id": "/stephane"}}}
Lin Clark: I actually suggested that to our multilingual initiative, but they put so much work in it and it's already almost done that I don't think that we can or we should change our data model at this point.
Niklas Lindström: From an implementation perspective, it's more or less the same.
Lin Clark: They're doing a lot of stuff in the multilingual initiative that I'm not involved with, so I can't speak particularly to all the details.
... I don't think we can convince everyone that it's worth it because of JSON-LD.
Markus Lanthaler: how would it help to turn the structure around? (assuming we could) [scribe assist by Stéphane Corlosquet]
Markus Lanthaler: but you wouldn't mind if "en" would not expand to full IRIs in expanded form.
Markus Lanthaler: {"@id": "/resource", "author": {"en": {"@id": "/lin"}, "de": {"@id": "/stephane"}}}
Markus Lanthaler: {"@id": "/resource", "author": {"http://example.com/en": {"@id": "/lin"}, "http://example.com/de": {"@id": "/stephane"}}}
Markus Lanthaler: Something like this would work for you, right?
... No big deal if it becomes something like this in expanded form, right?
Lin Clark: Then can it compact back to the other form?
Markus Lanthaler: yes, you wouldn't even need a language map for that.
Manu Sporny: The one concern is that we're going to have terms for each language.
Markus Lanthaler: is that really an issue?
Manu Sporny: If you're expressing languages as predicates, the data is jammed.
Markus Lanthaler: right, but that's what you have. It's a predicate, not a language.
Manu Sporny: My only concern is that if Drupal wants to move to RDF in the future, then that direction might be problematic longer term.
Stéphane Corlosquet: probably not a real concern for the time being.
Markus Lanthaler: Here's how the example could work today - completely round-trippable: http://bit.ly/P8i7h7
Lin Clark: When we come to that, we could update what's needed to move things to the RDF data modeling
Lin Clark: I got bumped
Manu Sporny: OK, it definitely works. I don't know if it's good to model data in that way. I feel uneasy about it.
... The other concern I have is that if it works for Drupal folks, and if that works as well for Wikidata folks, then there's a question about supporting language maps in the end.
Manu Sporny: it didn't pick up first try [scribe assist by Lin Clark]
Markus Lanthaler: I wonder if language maps couldn't be restricted to simple values, such that "title.en" resolves to the English title
Lin Clark: now it is busy
Markus Lanthaler: Wikidata apparently just uses simple language maps: http://meta.wikimedia.org/wiki/Wikidata/Data_model_in_JSON
Manu Sporny: Ok, we spent an hour on this. We should step back and think a bit more about it.
Lin Clark: dialing in
... We have two proposals on the table.
... 1) Languages become IRIs, 2) 1-to-1 between compact/expanded form for language maps.
Niklas Lindström: I wonder where the title would end up in the example Markus wrote up. Would there be a similar map for each thing or would we want to group them in language buckets?
Markus Lanthaler: that's what I suggested initially but Lin suggested they would rather have properties before languages.
Manu Sporny: any objection to moving on to the next issue and following this up in GitHub comments?
Markus Lanthaler: Do we want to support complex language maps?
Manu Sporny: My gut feeling is that, if we're going to support language maps, we need to support all of Drupal's needs. I don't know if it's worth the complexity to add language maps for literal values only.
... We could associate the language with a term in the context. If we go with the approach Markus proposed, I don't think we need language maps in the end. In the context, you would have term definitions for languages.
Manu Sporny: "en": {"@id": "http://purl.org/bcp47#en", "@language": "en"}
... That would expand to:
Manu Sporny: "en": "Foo" - "http://purl.org/bcp47#en": {"@value": "Foo", "@language": "en"}
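Illustration (not part of the call): a minimal Python sketch of the expansion just described, assuming each language is a term whose definition carries both an "@id" and an "@language". The "en" definition and the bcp47 IRI come from the snippet above; the "de" entry and the function name are hypothetical additions by analogy. It is a sketch of the idea, not the JSON-LD expansion algorithm.

```python
# Term definitions for languages. "en" is from the pasted snippet;
# "de" is an assumed entry following the same pattern.
CONTEXT = {
    "en": {"@id": "http://purl.org/bcp47#en", "@language": "en"},
    "de": {"@id": "http://purl.org/bcp47#de", "@language": "de"},
}


def expand_with_language_terms(doc, context):
    """Expand string values of language terms into tagged value objects."""
    out = {}
    for key, value in doc.items():
        definition = context.get(key)
        if isinstance(definition, dict) and isinstance(value, str):
            out[definition["@id"]] = {
                "@value": value,
                "@language": definition["@language"],
            }
        else:
            # Unknown keys and non-string values pass through unchanged.
            out[key] = value
    return out


expanded = expand_with_language_terms({"en": "Foo"}, CONTEXT)
print(expanded)
```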
Manu Sporny: The way we're modeling this does not really map to RDF, that's what I'm concerned about.
Niklas Lindström: I do think that things such as freebase may benefit from data exported by Drupal sites
Stéphane Corlosquet: I don't think we should be blocking things here. We could create IRIs for each translation and so on if we really need to.
Lin Clark: hmm, I can't hear what was said but Crell specifically requested we not create IRIs for each translation
Lin Clark: was talking about how to handle things in RDF [scribe assist by Stéphane Corlosquet]
Stéphane Corlosquet: not in JSON-LD
Lin Clark: what we've discussed before is that you lose the language handling for objects that are resources
Lin Clark: I don't think we want to have different subject IRIs between JSON-LD and other RDF formats
Manu Sporny: I'm not convinced that we need to model the data in the way Markus and Niklas are proposing. It works for Drupal folks but I don't think it's the right way to model it as RDF.
Lin Clark: yes - we said we could discuss this outside the call [scribe assist by Stéphane Corlosquet]
Stéphane Corlosquet: we haven't decided or changed anything since you dropped
... The other concern that I have is that JSON-LD should be able to cope with data as modeled, especially in cases such as Drupal when it's difficult to identify a right/wrong way of modeling data.
Lin Clark: yeah, I was able to get back in now
Manu Sporny: I think these are the options available to us right now:
Manu Sporny: 1) Ask Drupal to change the data model (non-starter),
Manu Sporny: 2) Adopt a 1-to-1 mapping between compact/expanded form for language maps, (adds complexity to syntax)
Manu Sporny: 3) Adopt a complex algorithm to reconstruct language maps from expanded form, (adds complexity to API, and may be non-deterministic)
Manu Sporny: 4) Model the data using BCP47 language code IRIs. (problematic from an RDF data model standpoint)
Manu Sporny: each has annoying down-sides.