Lexical issues #5
Comments
This escaping issue has already been covered briefly in the blog post as follows:
To reiterate: neither ORS nor CDXJ supports objects as keys or multiple objects as values. The prefix key is optional; if present, it can be one or more string tokens, quoted or unquoted. The value portion is exactly one JSON block per line. The value block can be JSON in object or array format, with arbitrary nesting. It can be empty JSON, but cannot be blank/nil. I hope this resolves all the concerns raised here. |
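The value rules above can be checked mechanically. Here is a minimal sketch in Python using only the standard `json` module; the function name `is_valid_value` is hypothetical and not part of any ORS or CDXJ tooling.

```python
import json


def is_valid_value(text):
    """Return True if text is exactly one JSON object or array (hypothetical helper)."""
    if not text.strip():
        return False  # blank/nil value is not allowed
    try:
        parsed = json.loads(text)
    except ValueError:
        return False  # not JSON at all, or more than one block on the line
    # Only object or array format is accepted; bare strings or numbers are not.
    return isinstance(parsed, (dict, list))
```

Under these assumptions an empty block such as `{}` or `[]` is accepted, while an empty string or two objects side by side are rejected.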
Yes, I think so. I bring this up with the default MRJob tab-delimited At first glance, it would appear that it could be a compatible (subset) of ORS, with the key also being a JSON dict, but if escaping |
Additionally, if a data key begins with |
I am not sure why MRJob uses an object for the key instead of a basic data type in the tuple, but in its current format it is not compatible with ORS. I have expressed my thoughts on this in the email. |
For parsing CDXJ/ORS, we need to ensure there is no ambiguity about where the key ends.
Ambiguities can occur if there is a { anywhere in the key. For CDXJ, this is usually avoided, as keys are usually URL-encoded and there are no spaces in URLs.
But should this be a requirement? Or should spaces and { be escaped?
For ORS, there is of course the general case of multiple JSON dicts, with other nested JSON dicts.
{"foo": "bar"} {"boo": "baz", "foo2": {"a": {"c": "d"}} {"key": "value", "key2": {"a": "b"}}
Since the value must be a valid JSON dict, it would have to be:
value -
{"key": "value", "key2": {"a": "b"}}
key -
{"foo": "bar"} {"boo": "baz", "foo2": {"a": {"c": "d"}}
Could get tricky if this is to be supported with a more generic key, though I guess escaping enforcement should help...
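To make the split above concrete, here is a rough sketch in Python of one way a parser could find where the key ends without any escaping: try each { or [ from the left and accept the first position from which the rest of the line parses as a single JSON value. The function name `split_key_value` is hypothetical, and this is only an illustration of the ambiguity, not how any existing CDXJ/ORS parser is known to work.

```python
import json


def split_key_value(line):
    """Split a line into (key, value) at the leftmost suffix that is valid JSON.

    Hypothetical helper: it assumes the value is the final JSON block on the line.
    """
    line = line.strip()
    for i, ch in enumerate(line):
        if ch not in "{[":
            continue  # a value block can only start with { or [
        try:
            value = json.loads(line[i:])
        except ValueError:
            continue  # the remainder is not a single JSON block; keep scanning
        return line[:i].rstrip(), value
    raise ValueError("no JSON value block found at end of line")
```

On the example line above, this yields the same key and value as listed. It re-parses the tail of the line for every candidate brace, so it is quadratic in the worst case, which is one reason a requirement to escape spaces and { in keys may be preferable.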