JSON-LD Community Group Telecon

Minutes for 2012-12-18

Agenda
http://lists.w3.org/Archives/Public/public-linked-json/2012Dec/0030.html
Topics
  1. Schedule for telecons and publication
  2. JSON-LD Test Suite
  3. Renaming of blank nodes
  4. ISSUE-203: Validate IRIs and language tags
  5. ISSUE-109: Add flatten() method to JSON-LD API
  6. ISSUE-206: Clarify that the algorithms operate on a copy of the input
Resolutions
  1. Rename all blank node identifiers when doing expansion.
  2. JSON-LD Processors MAY issue validation warnings for malformed IRIs and BCP47 language strings, but they MUST NOT attempt to correct validation errors.
  3. Add a .flatten() method to the JSON-LD API, which returns all data in flattened, compact form. Remove the flatten flag from the .expand() and .compact() methods. Ensure that the .flatten() method preserves data in named graphs.
  4. Any input to JSON-LD API methods MUST NOT be modified.
Chair
Manu Sporny
Scribe
Niklas Lindström
Present
Niklas Lindström, Manu Sporny, Gregg Kellogg, Markus Lanthaler, Dave Longley, David I. Lehn
Audio Log
audio.ogg
Niklas Lindström is scribing.
Two more Agenda items suggested by Gregg: the test suite and pervasive renaming of bnodes

Topic: Schedule for telecons and publication

Manu Sporny: The next two telecons are cancelled due to the holidays; there will be at least one telecon before Last Call at the end of January.
… when we go to Last Call, we must include text specifying that the algorithms may need to change to address bugs, and that those changes may be significant depending on the severity of the issue. We want to do this to ensure that an annoying corner-case bug won't force us to go through another Last Call. We're fairly certain what these algorithms should be doing, but no matter how many times we've reviewed them, we'll find issues with the algorithms that we have to fix through LC and CR.

Topic: JSON-LD Test Suite

Gregg Kellogg: a couple of things to do: how to deal with options and callback behavior
… e.g. option for RDF to use native types
… options for context given in option for use in expansion, etc.
… more concerning: the granularity of tests
… each test tests some particular aspect of an algorithm, but does so in many parallel ways
… add as small a test as possible, to make it easy to detect what causes an error
Manu Sporny: I agree
… same problems in the early days of the RDFa test suite
… we may want two different suites, one for the syntax, one for the API
… the latter may benefit from a real JS test runner
... or else we may end up with a meta language to control flags etc.
... so we should simplify the tests to make them more atomic
Gregg Kellogg: for the RDFa tests we used (e.g.) query parameters to set options/flags
... we might be able to use that
... problem with js test framework is that it only works for js
Markus Lanthaler: ok.. just a sec
Markus Lanthaler: I agree that we should define the tests to be independent of the implementation language
... we could use JSON to set options
… we should have minimal tests, but we also need some complex input data to test corner cases
… sometimes things work in separation, but certain things only happen when combined
Gregg Kellogg: yes, there is a need for those complex things as well. We might be able to separate them within the numbering of tests
… if someone passes all the simple tests, we should attempt to find the smallest example which triggers a problem with combinations
… we could put all the complex tests starting with 1000
Manu Sporny: so tests covering more than one feature are integration tests, starting at 1000
Gregg Kellogg: even one feature, like IRI expansion, needs to test many variants
… but we need to find the simplest possible input data for those as well
Markus Lanthaler: what are the requirements from W3C regarding tests?
Manu Sporny: what's needed is an implementation report showing at least two independent interoperable implementations
… but test suites make that much simpler to measure
Gregg Kellogg: not always though; automated test runners aren't always the best; it's useful to have independent test runners generating EARL reports, which we can collate and put into the report
… I'm not against it, but it was a complicated setup for RDFa
Discussion about the balance between test suite runner implementation difficulties vs. getting reports from implementations in general.
Manu Sporny: the main reason for an automated test runner is so we aren't blocked when implementations develop and need to be verified
Manu Sporny: It would be very good to have it running completely in the browser
… I'll make an attempt in the coming month
Gregg Kellogg: leveraging the rdfa runner may be feasible
Manu Sporny: so, we want to make the test suite more atomic, and separate unit and integration tests (the latter starting at 1000)
… and attempt to make an online test runner
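As a rough sketch of what a language-independent test description could look like, here is one possible shape where processor options are carried as plain JSON rather than as runner-specific flags. All property names and values below are illustrative assumptions for this discussion, not an agreed manifest format:

```typescript
// Illustrative only: a language-neutral test entry whose processor options
// are plain data, so any implementation (not just a JS runner) can read them.
// All property names here are assumptions, not the group's agreed vocabulary.
interface TestEntry {
  id: string;          // unit tests numbered low, integration tests from 1000 up
  name: string;
  input: string;       // path to the input JSON-LD document
  expect: string;      // path to the expected output document
  option?: {
    expandContext?: string;   // context supplied out of band for expansion
    useNativeTypes?: boolean; // the RDF "native types" option mentioned above
  };
}

const example: TestEntry = {
  id: "expand-0001",
  name: "Drop free-floating nodes",
  input: "expand-0001-in.jsonld",
  expect: "expand-0001-out.jsonld",
  option: { expandContext: "expand-0001-context.jsonld" }
};
```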

Topic: Renaming of blank nodes

Gregg Kellogg: Markus wrote a test to verify that blank nodes are renamed.
… bnode identifiers are assigned during expansion, e.g. for property generators and node definitions not containing an id, so that duplication doesn't create a new node.
Markus Lanthaler: Discussion about this was here: https://github.com/json-ld/json-ld.org/issues/160#issuecomment-11046185
… problem is that if you pick a bnode identifier it mustn't collide with an existing one. one solution is to rename all of them.
… but that may create problems for implementations, e.g. it happens right now for the wikimedia stuff
Manu Sporny: so you propose that we use a very unique prefix, which hopefully doesn't collide with an existing bnode id?
Gregg Kellogg: that, or scan through existing use and then pick something unique
Manu Sporny: the scan through prevents stream-based processing, although that may already be out
Niklas Lindström: reserving bnode id prefixes causes problems when expansion has been run; the input would use those at that point
Markus Lanthaler: not sure what the problem is here; bnode ids are local/internal, so we should be able to change them if we want to.
Gregg Kellogg: so far, we try to keep the JSON form consistent with what is written, so that bnode ids use some internal pattern
… while it's formally very bad (especially from an RDF perspective), this can be useful for handling JSON
… I wouldn't vote against renaming if it's necessary; but do we always need to do it?
… it's a big change fairly late in the process
Markus Lanthaler: could you use other identifiers?
Gregg Kellogg: a bit tricky with deployed code right now.
… previously, we didn't change bnode ids on expansion/compaction
Manu Sporny: why can't we instead track already-used bnode ids, and ensure that generated ones don't collide with them?
... of course, subsequently encountered ones are problematic
Manu Sporny: keep track of both generated and encountered bnodes, and if an overlap occurs, start renaming only those that are already encountered/generated
Markus Lanthaler: is this the final code for wikia?
Gregg Kellogg: the plan is to use URIs, but the scrum process hasn't gotten there; we currently use article ids locally
Markus Lanthaler: the flag for property generators could also disable bnode renaming
Manu Sporny: if we can ensure that renaming only occurs when property generators are used..
Markus Lanthaler: property generators could be used for DoS attacks, so I will support such a flag
Gregg Kellogg: I'd prefer to avoid renaming if property generators aren't used
Markus Lanthaler: I still think bnodes are dangerous to preserve, since they should not be used
Manu Sporny: it's a good point. But some users don't want to change the raw data.
… it is a large change, but it's still before LC, and makes a good point
PROPOSAL: Rename all blank node identifiers when doing expansion.
Markus Lanthaler: +1
Gregg Kellogg: +0.1
Manu Sporny: +1
Niklas Lindström: +0.5
Dave Longley: +0.3 [scribe assist by Manu Sporny]
David I. Lehn: +0
RESOLUTION: Rename all blank node identifiers when doing expansion.
Markus Lanthaler: filed the resolution under ISSUE-160
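As a minimal sketch of what pervasive renaming could look like in an implementation, assuming a simple counter-based identifier generator (function and identifier names below are illustrative, not part of the spec):

```typescript
// Sketch: relabel every blank node identifier encountered during expansion.
// A map keeps the relabeling consistent across the document, and a counter
// guarantees that newly minted identifiers cannot collide with input ones.
function makeBlankNodeRelabeler() {
  const seen = new Map<string, string>();
  let counter = 0;
  return (oldId?: string): string => {
    // No identifier given (e.g. a node object without @id): mint a fresh one.
    if (oldId === undefined) return `_:b${counter++}`;
    // Identifier seen before: reuse the same replacement.
    if (!seen.has(oldId)) seen.set(oldId, `_:b${counter++}`);
    return seen.get(oldId)!;
  };
}

// The same input identifier always maps to the same new identifier.
const relabel = makeBlankNodeRelabeler();
relabel("_:a"); // "_:b0"
relabel("_:a"); // "_:b0"
relabel();      // "_:b1", freshly minted and guaranteed not to clash
```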

Topic: ISSUE-203: Validate IRIs and language tags

Markus Lanthaler: the question is whether processors should validate IRIs and language tags fully, or just assume they work
… Richard made the point that language tags have to be normalized, and validated(?)
Gregg Kellogg: compare to Turtle: it includes only a simplified form of BCP 47. Full validation needs much more logic.
… it's complicated to get it exactly correct.
… And normalization doesn't require full validation. Same thing with URIs. Most libraries detect simple problems, but full checking requires much more complexity.
… it's better not to include it in the core algorithm. As François said, there's a difference between a processor and a validator
Manu Sporny: I agree with all of that.
Markus Lanthaler: so, what do we say specifically?
Manu Sporny: we don't say should/must not; all we say is that it's not required to do full validation
Manu Sporny: it's strange to have a discussion about this but not say anything in the spec
Niklas Lindström: Could we say something to the effect of "users of processors might not expect that all processors are fully validating processors"? That is, invalid input data might lead to different results depending on the level of validation for the processor. [scribe assist by Manu Sporny]
Niklas Lindström: So, basically - the output may not be the same for corner cases. [scribe assist by Manu Sporny]
Manu Sporny: or we could say that processors may issue warnings about data which is not valid, but processors must not modify data to correct it
Niklas Lindström: I think that might work. [scribe assist by Manu Sporny]
Markus Lanthaler: I agree, no validation. And we shouldn't include any language about it in the spec.
… we already say that algorithms are only specified for well-formed input
Gregg Kellogg: we do say that to be valid, these must be valid BCP 47 tags / IRIs
Manu Sporny: What about this for a proposal? JSON-LD Processors MAY issue validation warnings for malformed IRIs and BCP47 language strings, but they MUST NOT attempt to correct validation errors and MUST only perform normalization on IRIs and BCP47 language strings.
… we shouldn't say whether processors should tolerate invalid values for that… We need to compare with e.g. the Turtle spec.
PROPOSAL: JSON-LD Processors MAY issue validation warnings for malformed IRIs and BCP47 language strings, but they MUST NOT attempt to correct validation errors.
Manu Sporny: +1
Gregg Kellogg: +1
Markus Lanthaler: +0.5 (would also be fine with being silent about it)
Niklas Lindström: +0.9 (unless something much different is done in e.g. the turtle spec)
RESOLUTION: JSON-LD Processors MAY issue validation warnings for malformed IRIs and BCP47 language strings, but they MUST NOT attempt to correct validation errors.
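To illustrate the intent of this resolution, here is a rough sketch of a processor-side check, assuming a deliberately shallow well-formedness test and a warning callback that is not part of the actual API:

```typescript
// Sketch: a processor MAY warn about an obviously malformed language tag,
// but MUST NOT alter the value. The regex is a rough well-formedness check
// of the general BCP47 shape, not full BCP47 validation.
const LANGTAG = /^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$/;

function checkLanguage(tag: string, warn: (msg: string) => void): string {
  if (!LANGTAG.test(tag)) {
    warn(`possibly malformed language tag: "${tag}"`);
  }
  return tag; // always returned unchanged: no correction is attempted
}

// Usage: the warning is advisory; the data flows through untouched.
const value = checkLanguage("en-!!", (msg) => console.warn(msg)); // warns, value === "en-!!"
```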

Topic: ISSUE-109: Add flatten() method to JSON-LD API

Manu Sporny: we're signaling that there's an easy way of flattening (apart from the flag)
Markus Lanthaler: I'd suggest dropping the flags then
… and that it should also return all the graphs (currently just the default graph?)
… i.e. drop the 'merged'/'default' and return all graphs
Manu Sporny: yes, we don't want lossy algorithms
Markus Lanthaler: so the signature would be flatten(input, context, callback, options)
PROPOSAL: Add a .flatten() method to the JSON-LD API, which returns all data in flattened, compact form. Remove the flatten flag from the .expand() and .compact() methods. Ensure that the .flatten() method preserves data in named graphs.
Manu Sporny: +1
Markus Lanthaler: +1
Gregg Kellogg: +1
Niklas Lindström: +0.75 (not entirely sure how people who don't know this stuff in detail will get the meaning of "flatten")
RESOLUTION: Add a .flatten() method to the JSON-LD API, which returns all data in flattened, compact form. Remove the flatten flag from the .expand() and .compact() methods. Ensure that the .flatten() method preserves data in named graphs.
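A rough sketch of the agreed method, using the callback-style signature Markus mentioned above, flatten(input, context, callback, options). The body below is only a placeholder to show the calling convention; argument types and option names are assumptions:

```typescript
// Sketch of the agreed API addition. The placeholder body only illustrates
// the callback contract; it does not implement the flattening algorithm.
type Callback = (err: Error | null, flattened?: object) => void;

function flatten(
  input: object,
  context: object | string | null,  // context used to compact the flattened output
  callback: Callback,
  options?: { base?: string }       // option names are assumptions
): void {
  // A real implementation would expand the input, pull every node (including
  // nodes inside named graphs) up to the top level, and compact with `context`.
  callback(null, { "@context": context ?? {}, "@graph": [input] });
}

// Usage: callers get the whole dataset back as one flattened, compacted tree.
flatten({ "http://xmlns.com/foaf/0.1/name": "Anna" }, null, (err, out) => {
  if (err) throw err;
  console.log(JSON.stringify(out, null, 2));
});
```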

Topic: ISSUE-206: Clarify that the algorithms operate on a copy of the input

Manu Sporny: we want to clarify that implementors mustn't modify the input data in-place
Gregg Kellogg: the fact that the algorithms speak of serializations does imply that there is no modification. It may be good to say that the algorithms operate on a live data structure, and hence need to create copies.
Gregg Kellogg: implementations MAY operate on native data structures, and if so, they must generate new data structures
PROPOSAL: Any input to JSON-LD API methods MUST NOT be modified.
Markus Lanthaler: +1
Manu Sporny: +1
Niklas Lindström: +1
Gregg Kellogg: +1
RESOLUTION: Any input to JSON-LD API methods MUST NOT be modified.
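A minimal sketch of one way an implementation could honor this when it operates on a live, in-memory structure: deep-copy the input before running any algorithm. The copy strategy itself is an implementation choice, not something mandated by the spec:

```typescript
// Sketch: work on a deep copy so the caller's input object is never mutated.
// JSON round-tripping is enough here because JSON-LD input is plain JSON data.
function expandWithoutMutating(input: object): object {
  const copy = JSON.parse(JSON.stringify(input));
  // ... run the expansion algorithm against `copy`, never against `input` ...
  return copy;
}

const original = { "@context": { "name": "http://xmlns.com/foaf/0.1/name" }, "name": "Anna" };
const result = expandWithoutMutating(original);
console.log(original !== result); // true: the caller's structure is untouched
```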