JSON-LD Community Group Telecon

Minutes for 2013-06-18

Agenda
http://lists.w3.org/Archives/Public/public-linked-json/2013Jun/0034.html
Topics
  1. Linked Data introductory text
  2. Skolemization in toRDF() algorithm
  3. ISSUE-265: Media Type Registration
  4. Support for xsd:short and other integer types.
  5. fromRDF() creates nodes for things like rdf:type.
Resolutions
  1. Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.
  2. In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.
  3. Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be controlled through effective use of the API.
Chair
Manu Sporny
Scribe
Dave Longley
Present
Dave Longley, Markus Lanthaler, Manu Sporny, David Booth, Gregg Kellogg, Niklas Lindström, Clay Wells
Audio Log
audio.ogg
Dave Longley is scribing.
Markus Lanthaler: if we have time, i'd like to discuss the blank nodes as datatypes issue

Topic: Linked Data introductory text

Manu Sporny: david booth just sent out a proposal to the mailing list that i thought was pretty good
Manu Sporny: would you mind going over the proposals, david?
David Booth: proposal 1 is separate that i think we already agreed to ... to include TimBL doc in the references
Manu Sporny: i'm not sure we agreed
David Booth: we can skip over that for the moment
David Booth: i tried to give a range of possibilities, 2a would quote TimBL from his doc w/typos, 2b would do the same but clean up typos, 2c would make that text not be a definition of Linked Data
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: inter-connected machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Gregg Kellogg: i thought it was interesting, kingsley chimed in comments yesterday and actually supported the introduction changing to TimBL's version as being consistent with something the [w3c] group might want to do
Gregg Kellogg: we actually started off deviating from that because of Kingsley's vehement objections, now that he seems like he has changed his position that's interesting
Manu Sporny: i think that's a misread, his position is nuanced, i think if the group wants to make a formal declaration of what Linked Data is but he thinks things are further conflated, it would just make the RDF WG make it clear what their position is
Gregg Kellogg: i think this issue about "what is Linked Data" is larger than JSON-LD, and having another linked data spec, for the LDP, with a different definition is a problem
Gregg Kellogg: in my mind, i would go with 2b or 2c from david
Manu Sporny: i really like proposal 2c
Manu Sporny: i think it accomplishes what david booth wants to see and it doesn't overly complicate the intro
Dave Longley: I like proposal 2c. In order to avoid statements of being unfair, we should link to the Linked Data "definition". [scribe assist by Manu Sporny]
Dave Longley: I'd like to avoid the problem we had before - people accusing us of not being straightforward. [scribe assist by Manu Sporny]
David Booth: do you think there would be big objections to 2b
Manu Sporny: i would object to it
Manu Sporny: The problem I have is with the last part of this statement - "When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)" [scribe assist by Manu Sporny]
Manu Sporny: i'll just quickly outline the objection: it is the line with RDF, SPARQL, if it just said "using standards"
David Booth: ok, let's not get into that then
Manu Sporny: are there any objections to proposal 2c
David Booth: to be clearer it should say something about standards
Manu Sporny: as long as we're vague about which standards, we can do that
Niklas Lindström: specifically, open standards? [scribe assist by Niklas Lindström]
David Booth: proposal 2d
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: inter-connected, standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Gregg Kellogg: when we talk about embedding links there are mime types involved
Clay Wells: +1
Manu Sporny: that looks good to me
Manu Sporny: but it would be good to make it less of a mouthful
Gregg Kellogg: maybe move standards-based later on
Manu Sporny: let's wordsmith later
David Booth: proposal 2e
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
David Booth: proposal 2f
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Manu Sporny: any problems with this other than switching out "technique" with "practice" ?
Markus Lanthaler: What we have in the spec today: [scribe assist by Markus Lanthaler]
Markus Lanthaler: These properties allow data published on the Web to work much like Web pages do today. One can start at one piece of Linked Data, and follow the links to other pieces of data that are hosted on different sites across the Web.
Niklas Lindström: .. I'd like s/technique/practise/ (and perhaps "open" or "royalty-free" somewhere; unless that's implied by this spec being from W3C?)
Markus Lanthaler: "These properties allow data published on the Web to work much like Web pages do today."
Clay Wells: method
David Booth: how about s/technique for creating/way to/ ?
David Booth: proposal 2g
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a way to create a network of
David Booth: standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
PROPOSAL: Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.
Gregg Kellogg: +1
Dave Longley: +1
Manu Sporny: +1
Niklas Lindström: +1
Clay Wells: +1
David Booth: +1
Markus Lanthaler: +1 presumed the rest of the RDF WG agrees
RESOLUTION: Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.

Topic: Skolemization in toRDF() algorithm

David Booth: so at present, there is some misalignment between the JSON-LD data model and the RDF data model, and that means that when JSON-LD is interpreted as RDF, the result of that interpretation is unpredictable; some implementations may throw away triples that contain blank nodes in "illegal" positions (illegal for the RDF data model)
David Booth: the proposal is to avoid that mismatch when interpreting JSON-LD as RDF, blank nodes that would be illegal MUST be skolemized according to RDF skolemization
David Booth: so the question is about requiring skolemization when interpreting JSON-LD as RDF, the only push back i've seen is requiring clients to do skolemization when the client has no way of minting globally unique IRIs
Markus Lanthaler: yes, i brought that up and there were also concerns from Andy
Markus Lanthaler: and he said that consumers should really figure out how to handle that use case
David Booth: i think there was some confusion on the list about which skolemization topic we were talking about
David Booth: one topic was about skolemization within the RDF spec and the other was about JSON-LD requiring it
Markus Lanthaler: andy replied to the minutes where we were discussing that
Manu Sporny: if your graph store actually supports graph labels (bnodes as graph names or properties) then there could be an issue with forcing a processor to go through some hoops that it would not normally need to go through
Manu Sporny: in other words, there's no need to skolemize
Manu Sporny: i think we should use a SHOULD, not a MUST, because that gives us interop but still allows application developers to make the decision for what is appropriate for their stack, so the decision can be made closer to the people that it affects
David Booth: if it's just an internal decision that an application is making, then i see that as irrelevant, any application can do whatever it wants
David Booth: the spec is about what JSON-LD means, the point is, if two independent parties process the same document, they should come up with the same set of triples
David Booth: except for the naming of the skolemized IRIs
Manu Sporny: how does that address markus' point that you could generate data that's wrong, that there's a clash
David Booth: if you have a clash then you haven't done proper skolemization
Manu Sporny: the point is that it's impossible to know if you've done proper skolemization as a client
David Booth: i don't know if that's true, there are lots of ways to ensure uniqueness
Markus Lanthaler: they are very simple if you don't have a distributed system
Markus Lanthaler: if we take Dave Longley's JavaScript implementation, which email address should he use? his own?
David Booth: is the data going to be republished?
Markus Lanthaler: the point is that you don't know what the user is going to do with the data, the user knows that, and they can make the best decision
Gregg Kellogg: if i sent JSON-LD to a list in an email, then it's effectively republished
Gregg Kellogg: although, at a point where it's not reasonable to do such a skolemization
Gregg Kellogg: one of the issues i have with forcing skolemization is that it moves a closed world to an open world
David Booth: notes that Gregg is using the term "closed world" in a non-standard way
Gregg Kellogg: once you've skolemized a bnode you've gone from a closed model of data with some security benefits, to an open world and you can amend facts that were intended in a closed fashion
Gregg Kellogg: intended to be stated in a closed fashion
Gregg Kellogg: that's what the payswarm work is about so you can't state things about the graphs later on
Manu Sporny: if you're dealing with a financial system and people can later make statements about your data
Manu Sporny: that they shouldn't be able to do, and if they figured out your skolemization algorithm they could inject data into your graph
Gregg Kellogg: they don't even have to do that, they just have to take a skolemized identifier and make statements about it later
Manu Sporny: i think it would complicate implementations greatly
Manu Sporny: since we say you have to support bnodes in graph and property positions you don't have to deal with skolemization
David Booth: i would categorically object to encouraging features that make it impossible to make statements about things
Manu Sporny: no one is proposing that, i don't think that's on the table
David Booth: i'm talking about making it hard for someone to make statements about a graph
Manu Sporny: i don't follow, how are we going to do that?
David Booth: the comment was that if something is published with a bnode that you can't make statements about it elsewhere
Gregg Kellogg: that's true today about RDF
David Booth: i think that's very anti-web and a negative property
David Booth: there is a whole discussion about draconian gov'ts linking to or making reference to certain statements
Manu Sporny: i think we're having a philosophical discussion at this point, we should focus on a solid proposal
Manu Sporny: there are a number of people that agree with you but let's stick to solid proposals
David Booth: i don't think we have enough convergence yet, i wasn't aware of andy pushing back here, i need to analyze what he means, i need to better understand the use case that would be injured by requiring skolemization
David Booth: there is another solution entirely, which would be for the RDF WG to allow bnodes in those positions
Manu Sporny: peter raised just that this morning
David Booth: my overall objective here is that i think it's critical if two parties deserialize RDF they get the same result, or as close as possible
David Booth: with a minimum loss of information
Markus Lanthaler: but they [aren't] the same result, [are] they?
David Booth: the problem is two different clients taking the same JSON-LD document would produce data that is interpreted in different ways
Manu Sporny: i think that's where we disagree, i think they are entirely different documents, one uses bnodes the other skolemization
David Booth: i'm saying that when a JSON-LD document is interpreted as RDF, skolemized IDs should be generated
Manu Sporny: that would mean that your graph store, if it supports bnodes in these positions, is broken
Manu Sporny: your graph store could interpret things in the correct way, but it will be forced to interpret in a different way
Markus Lanthaler: you are always getting different data since every skolemization will produce different IDs
David Booth: the data is only different in non-important ways
Gregg Kellogg: RDF Datasets allow bnode graph names: https://www.w3.org/2013/meeting/rdf-wg/2013-06-12#resolution_1
David Booth: by unimportant i mean that the data is the same graph and they use all the same IRIs with the exception of the skolemized IRIs
Gregg Kellogg: i think this issue with regards to bnode graph names is moot since RDF datasets can have graph labels that are bnodes
Manu Sporny: We did have a use case - JSON
Gregg Kellogg: i cannot recall if we had a driving use case for allowing bnodes as predicates
Gregg Kellogg: can someone remind me
Dave Longley: RDF Concepts doesn't allow blank nodes in predicate position - perhaps we should say that. [scribe assist by Manu Sporny]
Manu Sporny: I think RDF Concepts basically does that. [scribe assist by Manu Sporny]
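
Illustration (not from the call): a minimal sketch of the skolemization being debated in this topic, assuming quads are plain tuples of (subject, predicate, object, graph), blank nodes are strings starting with "_:", and literals are (lexical, datatype) tuples. The function name, the example.org authority and the choice to treat only the predicate position as "illegal" are assumptions for illustration; the ".well-known/genid" path follows the convention suggested by RDF Concepts.

```python
import uuid

# Assumption: the party doing the skolemization controls this authority.
GENID_BASE = "https://example.org/.well-known/genid/"


def is_blank(term):
    """True for blank node labels such as '_:b0' (literals are tuples here)."""
    return isinstance(term, str) and term.startswith("_:")


def skolemize_illegal_bnodes(quads, base=GENID_BASE, illegal_positions=(1,)):
    """Replace blank nodes used in disallowed positions with fresh IRIs.

    illegal_positions holds tuple indexes; 1 is the predicate position.
    """
    # First pass: collect blank node labels found in a disallowed position
    # and mint one fresh IRI per label.
    replacements = {}
    for quad in quads:
        for pos in illegal_positions:
            term = quad[pos]
            if is_blank(term) and term not in replacements:
                replacements[term] = base + uuid.uuid4().hex
    # Second pass: rewrite every occurrence of those labels, in any position,
    # so statements about the same node stay connected.
    return [tuple(replacements.get(t, t) for t in quad) for quad in quads]


quads = [
    ("_:b0", "_:b1", "_:b2", None),
    ("_:b1", "http://example.org/label", ("a property", "xsd:string"), None),
]
first = skolemize_illegal_bnodes(quads)
second = skolemize_illegal_bnodes(quads)
```

In this sketch `first` and `second` have the same shape but different minted IRIs, which is the difference David Booth calls unimportant and Markus Lanthaler calls different data. Nothing here addresses whether a client can choose an authority that guarantees global uniqueness, which was the open objection.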

Topic: ISSUE-265: Media Type Registration

Dave Longley: I'm with you on this issue [scribe assist by Clay Wells]
David Booth: i think there's still some technical misunderstanding of skolemization which i will try to better explain on the mailing list
Markus Lanthaler: so the request is for us to be more explicit
Markus Lanthaler: the feedback is that we should be more explicit about why using JavaScript's eval() method is a bad idea, etc.
Manu Sporny: isn't there some security consideration we could link to
Clay Wells: sorry, dbooth, I'm with you on that topic. Thanks!
Markus Lanthaler: we need to reference the JSON security section
Markus Lanthaler: we just need to add a sentence, i think, explaining why eval() is a problem
Markus Lanthaler: we don't have to change much but it would address their concerns
PROPOSAL: In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.
Markus Lanthaler: we also need to address fetching remote contexts automatically, which might leak user information, and we should say that processing a JSON-LD document might result in HTTP requests without an explicit request by the user, etc.
Niklas Lindström: +1
Gregg Kellogg: +1
Dave Longley: +1
Markus Lanthaler: +1
Clay Wells: +1
RESOLUTION: In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.
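
Illustration (not from the call): the discussion concerns JavaScript's eval(), but the same hazard can be sketched in Python. A minimal example, assuming hypothetical helper names, of parsing a document strictly as data so that a payload which would run if evaluated as code is simply rejected:

```python
import json


def parse_untrusted(text):
    """Parse text strictly as JSON data; nothing in it is ever executed."""
    return json.loads(text)


def is_rejected(text):
    """Return True if the text is not valid JSON."""
    try:
        parse_untrusted(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return True
    return False


# A string that would run code if handed to eval(), but is not JSON.
HOSTILE = '__import__("os").getcwd()'

document = parse_untrusted('{"@id": "http://example.org/a"}')
hostile_rejected = is_rejected(HOSTILE)
```

The sketch never calls eval(); the point is only that a JSON parser treats the hostile string as a syntax error, whereas an evaluator would execute it.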
David Booth: needs to drop off for another call. Thanks all!
PROPOSAL: In the security considerations section, warn that remote contexts are dereferenced automatically and that usage patterns could be tracked based on the requests.
Niklas Lindström: do we need more specific text, maybe there is something akin to this... somewhere in the depths in the XML specs where you can include external entities, it's much more like CSS, it's a good analogy i think
PROPOSAL: In the security considerations section, warn that remote contexts are dereferenced automatically and that usage patterns could be tracked based on the requests leading to privacy concerns.
Markus Lanthaler: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information
Gregg Kellogg: caching can help mitigate this problem
Manu Sporny: the problem is that the third party is tracking you, and if they wanted to, they could say "don't cache this"
Gregg Kellogg: the publisher would be the one controlling the cache-control headers
Gregg Kellogg: if they made stable contexts that could be cached for long periods of time that would help
Markus Lanthaler: the problem is that the user is not in control of the publisher or the third party
Markus Lanthaler: if you put schema.org in your context, schema.org would be the third party
Manu Sporny: if you have a proxy in between it can poison the cache, etc.
Niklas Lindström: this all reminds me of XML catalogs, that is, some mechanism for the processor to control cached versions of contexts so that, for instance, you could pass in a reference to a dictionary of contexts and tell the processor it can only use that
Dave Longley: yeah, we can do that with the API and the remote context callback loading option
Gregg Kellogg: we could mention that you can use that to mitigate this problem
Gregg Kellogg: if we don't have a way to mitigate it could result in further debate or denial, so we should mention this
PROPOSAL: Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be mitigated through effective use of the API.
Markus Lanthaler: i don't think this is about implementation guidance
Markus Lanthaler: +1
Dave Longley: +1
Clay Wells: +1
Gregg Kellogg: +1
Niklas Lindström: +1 and rewording "mitigate" to "control"
Manu Sporny: +1
RESOLUTION: Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be controlled through effective use of the API.
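
Illustration (not from the call): a minimal sketch of the "dictionary of contexts" idea Niklas Lindström describes, assuming a hypothetical loader factory. It is not the API of any particular JSON-LD processor; wiring it into a real processor's remote context loading hook is left out because those hooks differ between implementations.

```python
class ContextNotAllowed(Exception):
    """Raised when a context URL is not in the local catalog."""


def make_catalog_loader(catalog):
    """Return a loader that serves contexts only from a local dictionary.

    catalog maps context URLs to already-parsed context documents, so
    resolving a context never triggers a network request.
    """
    def load(url):
        if url not in catalog:
            raise ContextNotAllowed(url)
        return catalog[url]

    return load


# Hypothetical catalog with one locally stored context.
catalog = {
    "http://example.org/contexts/person": {
        "@context": {"name": "http://xmlns.com/foaf/0.1/name"}
    }
}
load_context = make_catalog_loader(catalog)
person_context = load_context("http://example.org/contexts/person")
```

Because unknown URLs raise an error instead of being fetched, a third party serving a context never sees a request, which is the kind of control the resolution asks the specification to mention.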

Topic: Support for xsd:short and other integer types.

Dave Longley: We don't want to eliminate the use case where people want to stay within the range of the system's limitations. [scribe assist by Manu Sporny]
Manu Sporny: We had discussed this before, we don't want to introduce round-tripping issues. [scribe assist by Manu Sporny]
Markus Lanthaler: [scribe missed] [scribe assist by Manu Sporny]
Dave Longley: At some point, the spec said "if there is a fractional part, turn it into a double" - that might be difficult to do in certain languages. You do double-based math and get the wrong result. [scribe assist by Manu Sporny]
Markus Lanthaler: Yes, you run into rounding errors. [scribe assist by Manu Sporny]
Dave Longley: I'm not sure, there are unfortunate cases w/ doubles. I'm less concerned about those than the common use case - if something gets turned into an integer (when it was a double) then that's bad. [scribe assist by Manu Sporny]
Markus Lanthaler: a double like 5.0 will become a 5 xsd:integer when going through RDF round-tripping [scribe assist by Markus Lanthaler]
Dave Longley: Out of all the possible choices we have, we have picked the least sucky approach. We're trying to be pragmatic. [scribe assist by Manu Sporny]
Markus Lanthaler: We keep what we have, and we ask Sandro about his opinion. If he's fine with this, then we can close the issue. [scribe assist by Manu Sporny]
Dave Longley: I think people are going to want to control this via the context (convert to native types) [scribe assist by Manu Sporny]
Dave Longley: And in that way, it's not tied to XSD. [scribe assist by Manu Sporny]
Gregg Kellogg: We could get rid of the runtime flag, not use the context? [scribe assist by Manu Sporny]
Dave Longley: The only issue is that everything would happen in compaction/expansion. [scribe assist by Manu Sporny]
Dave Longley: In fromRDF/toRDF we can have an option to make it more specific... it would only convert the things in the context that are specified. [scribe assist by Manu Sporny]
Niklas Lindström: When you use Turtle, and use the native thing there - most things are turned to decimals. More fine-grained control is needed. [scribe assist by Manu Sporny]
Niklas Lindström: How would we do this? [scribe assist by Manu Sporny]
Dave Longley: We'd have to add something if we wanted to be fine-grained. [scribe assist by Manu Sporny]
Dave Longley: If we wanted something more than we have right now, if we find something that is typed coerced to this type, convert it to an integer or a double - that's the simpler solution w/o adding keywords to JSON-LD. [scribe assist by Manu Sporny]
Niklas Lindström: … {"rating": "xsd:decimal"}, values of rating being JSON Numbers, and always xsd:decimals in RDF
Gregg Kellogg: The useNativeTypes flag is false by default; that mitigates this to a large degree. [scribe assist by Manu Sporny]
Gregg Kellogg: Developers can specify a wrapper to make this easier. [scribe assist by Manu Sporny]
Markus Lanthaler: When a pattern emerges, we can standardize it in 1.1 [scribe assist by Manu Sporny]
Dave Longley: Yes, let implementations sort it out and we can standardize it later. [scribe assist by Manu Sporny]
Niklas Lindström: RDFLib in Python always exposes a literal as a native number... but using the added datatype tag to determine the exact datatype is very useful. You never really care about the lexical representation in that case. [scribe assist by Manu Sporny]
Niklas Lindström: You can always use the number directly in processing. It would be nice to do that in the default case. [scribe assist by Manu Sporny]
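
Illustration (not from the call): a simplified sketch of the round-tripping problem Markus Lanthaler mentions, where a double such as 5.0 comes back as an integer. The function names are hypothetical, the rule is a simplification of the behaviour under discussion rather than the specification's algorithm text, and the xsd:double lexical form produced here is not the canonical one.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def to_rdf_literal(number):
    """Map a native number to a (lexical form, datatype) pair.

    Simplified rule: no fractional part means xsd:integer, otherwise
    xsd:double.
    """
    if float(number).is_integer():
        return (str(int(number)), XSD_INTEGER)
    return (repr(float(number)), XSD_DOUBLE)


def from_rdf_native(lexical, datatype):
    """Convert a typed literal back to a native number where possible."""
    if datatype == XSD_INTEGER:
        return int(lexical)
    if datatype == XSD_DOUBLE:
        return float(lexical)
    return lexical


def round_trip(number):
    """Native number to RDF literal and back again."""
    return from_rdf_native(*to_rdf_literal(number))


five = round_trip(5.0)
five_and_a_half = round_trip(5.5)
```

Here `five` comes back as the integer 5 even though the input was the double 5.0, while `five_and_a_half` survives as a double. Coercing the term to a specific datatype in the context, as Niklas Lindström suggests with {"rating": "xsd:decimal"}, would be one way to avoid guessing from the value alone.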

Topic: fromRDF() creates nodes for things like rdf:type.

Niklas Lindström: i brought up this issue a couple of weeks ago, the resulting JSON-LD, flattened JSON-LD from RDF conversion creates nodes for things that haven't been linked to, the most glaring examples being every rdf:type
Manu Sporny: scribe-nods.
Niklas Lindström: except rdf:nil ... those don't turn up
Niklas Lindström: another example would be if you gave someone a homepage, it shows up as a node
Niklas Lindström: this is annoying too because choosing JSON-LD to transmit RDF would increase the size of the data
Niklas Lindström: with unusable nodes
Gregg Kellogg: markus made the point that by doing this you can effectively find all the links in the document
Gregg Kellogg: my own position is that it does make it a little bit more ugly, but i don't know if that effectively matters in the long run
Markus Lanthaler: that's the result of the node map generation algorithm which basically just collects all the nodes in the doc
Niklas Lindström: you don't get all the bnodes, you only get the things that are IRIs
Markus Lanthaler: you get bnodes as well, they are all labeled with blank node identifiers
Markus Lanthaler: every node that appears in the graph appears in the flattened output
Manu Sporny: you could just post-process the output, right?
Niklas Lindström: you could also do the same in the other direction
Markus Lanthaler: i think it's more useful to be able to find all the nodes in a graph
Markus Lanthaler: if you wanted to create a graph you'd just loop
Niklas Lindström: when i connect them i have to create all the links anyway
Markus Lanthaler: you can just enumerate them in a simple way
Manu Sporny: does anyone care enough about this to modify the flattening algorithm to remove these?
Niklas Lindström: when i implemented this, it looked much more verbose, i had to do additional work to make the result look uglier
Manu Sporny: is it that much of an issue to create a function to remove the nodes you don't want
Manu Sporny: it seems to me like something someone could do in 10 minutes at most
Niklas Lindström: i don't see the point of making flattening look uglier than it has to
Niklas Lindström: it looks like an artifact of an algorithm that could be changed
Niklas Lindström: it seems like preparing for something that someone might want to use for something else
Niklas Lindström: i used this with connect and i didn't need this structure
Dave Longley: It's already difficult for people to find stuff by subject. [scribe assist by Manu Sporny]
Dave Longley: I want to make sure we're not making things more difficult than they already are, since we don't expose a node map. [scribe assist by Manu Sporny]
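
Illustration (not from the call): a minimal sketch of the post-processing step Manu Sporny suggests, assuming the flattened output is a list of node objects and that an "unusable" node is one whose only key is "@id". The function name and sample data are hypothetical.

```python
def prune_reference_only_nodes(flattened):
    """Drop node objects that carry nothing except an @id.

    Such nodes appear because every node mentioned in the graph gets an
    entry in the flattened output, even when it is only ever referenced.
    """
    return [node for node in flattened if set(node.keys()) != {"@id"}]


flattened = [
    {
        "@id": "http://example.org/people/alice",
        "@type": ["http://xmlns.com/foaf/0.1/Person"],
        "http://xmlns.com/foaf/0.1/homepage": [
            {"@id": "http://example.org/alice/home"}
        ],
    },
    # Entries created only because they were referenced above.
    {"@id": "http://xmlns.com/foaf/0.1/Person"},
    {"@id": "http://example.org/alice/home"},
]
pruned = prune_reference_only_nodes(flattened)
```

After pruning, only the node for alice remains; the reference to her homepage inside that node is untouched. The cost, as Markus Lanthaler points out, is that the output no longer enumerates every node in the graph.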
Manu Sporny: notes we're out of time.
Discussion about rdf:nil and lists and whether they show up in flattening as well.
Markus Lanthaler: We need to cover this next time: https://github.com/json-ld/json-ld.org/issues/257
Dave Longley: We also need some input on this in the issue tracker: https://github.com/json-ld/json-ld.org/issues/264