How to handle common, non-primitive data types #70

Open
jtc42 opened this issue May 2, 2020 · 2 comments

@jtc42
Member

jtc42 commented May 2, 2020

This is likely going to be an open question for a while, but these are my current thoughts. All input is welcome.

I feel that, by and large, data collected from lab instruments can sensibly be converted to primitive data types. The most common types I have in mind are NumPy arrays and pandas data frames, both of which can be represented easily with primitive types (nested lists of numbers, or dictionaries of columns).

There are however cases where data will be collected that cannot be converted to a primitive type.

In the new cbor branch, I've added a section to the JSON encoder that base64-encodes Python bytes objects. I've correspondingly included a Marshmallow Bytes field to handle validating binary data in this format. It populates the documentation with a note that the string values are base64-encoded blocks of binary data. Everything is fine on that front.
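For anyone following along, here is a minimal sketch of the JSON side of this using only the standard library. It is not the code from the cbor branch; `BytesJSONEncoder`, `encode_response` and `decode_bytes_field` are hypothetical names made up for illustration.

```python
import base64
import json


class BytesJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialises bytes objects as base64 text."""

    def default(self, o):
        # json.JSONEncoder calls default() only for types it cannot handle itself.
        if isinstance(o, (bytes, bytearray)):
            return base64.b64encode(bytes(o)).decode("ascii")
        return super().default(o)


def encode_response(payload):
    """Dump a payload to JSON text, base64-encoding any bytes values."""
    return json.dumps(payload, cls=BytesJSONEncoder)


def decode_bytes_field(text):
    """Client side: turn a base64 string field back into bytes."""
    return base64.b64decode(text, validate=True)
```

A client that knows from the documentation that a field is base64 can call `decode_bytes_field` on it to recover the original bytes.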

However, as @rwb27 has mentioned in the past, sometimes the binary data collected will be big enough that the base64 encoding overhead (roughly a third more bytes than the raw data) could become problematic. To handle these cases, I've added support for clients to request application/cbor responses instead of application/json.
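As a rough illustration of the negotiation, a server could inspect the Accept header along these lines. This is a simplified sketch, not the branch's implementation: `choose_mimetype` is a hypothetical helper, and it ignores `q` weights and wildcards.

```python
def choose_mimetype(accept_header, default="application/json"):
    """Pick a response type from a raw Accept header value.

    Simplification: the first supported type listed wins; quality
    values (";q=...") and wildcards such as "*/*" are not weighed.
    """
    supported = ("application/cbor", "application/json")
    # Drop any parameters after ";" and normalise case and whitespace.
    offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    for mimetype in offered:
        if mimetype in supported:
            return mimetype
    # Nothing recognised: fall back to JSON so existing clients are unaffected.
    return default
```

Falling back to JSON means a client has to ask for CBOR explicitly.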

CBOR has built-in support for binary data, so if a client requests a CBOR response, no base64 overhead is introduced. The data gets passed directly into the CBOR response, which is otherwise identical to the JSON response but with the binary section left unencoded.
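To put numbers on the difference, the sketch below compares the size of a binary blob as base64 text and as a CBOR byte string. The byte-string encoder is hand-rolled from the CBOR specification (major type 2) purely for illustration; a real server would use a CBOR library, and `cbor_byte_string` and `encoded_sizes` are hypothetical names.

```python
import base64
import struct


def cbor_byte_string(data):
    """Encode bytes as a definite-length CBOR byte string (major type 2)."""
    n = len(data)
    if n < 24:
        head = bytes([0x40 | n])  # length fits in the initial byte
    elif n < 2**8:
        head = bytes([0x58, n])  # 1-byte length follows
    elif n < 2**16:
        head = b"\x59" + struct.pack(">H", n)  # 2-byte length follows
    elif n < 2**32:
        head = b"\x5a" + struct.pack(">I", n)  # 4-byte length follows
    else:
        head = b"\x5b" + struct.pack(">Q", n)  # 8-byte length follows
    return head + data


def encoded_sizes(data):
    """Return the size in bytes of each representation of `data`."""
    return {
        "raw": len(data),
        "base64": len(base64.b64encode(data)),
        "cbor": len(cbor_byte_string(data)),
    }
```

For a 300,000-byte blob this gives 400,000 bytes as base64 against 300,005 bytes as a CBOR byte string, so the CBOR cost is a fixed header of a few bytes rather than a proportional increase.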

This solution isn't perfect though. The Thing Description is required to be JSON. This is fine in most cases, as it accurately describes the base64-encoded binary blobs. However, it means that a CBOR response will deviate from the Thing Description: the client receives a bytes value where the Description says a string will be returned.
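One way a client could paper over this mismatch is to accept either form for a binary field. This is a sketch under the assumption that the client already knows which fields are binary; `as_bytes` is a hypothetical helper, not part of LabThings.

```python
import base64


def as_bytes(value):
    """Normalise a binary field from either a JSON or a CBOR response.

    JSON responses carry the field as a base64 string (as the Thing
    Description says); CBOR responses carry it as raw bytes.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)  # CBOR response: already binary
    if isinstance(value, str):
        return base64.b64decode(value, validate=True)  # JSON response
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")
```

With something like this, the rest of the client code does not need to care which content type it asked for.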

However, I currently feel that cases where large, non-primitive data files are collected so frequently that CBOR encoding is required are rare enough that, given proper documentation, this solution could still be fine.

Again, thoughts are welcome.

Note: The CBOR branch is useful even aside from this. CBOR is a much more compact data format than JSON, so in many cases it may be beneficial to communicate over CBOR even without needing to transfer bytes objects. It was easy to add support, and it doesn't affect the JSON functionality at all.

@ChasNelson1990

I had to look up CBOR but this seems like a good solution.

Are you saying that the only negative (or most significant negative) is the divergence from the W3C Web of Things standard?

If so, have you brought this problem/solution to their forum? Somebody might provide some insight into any thoughts the working group(s?) have had. Also, a quick search says that they're currently rechartering the working group, so now might be a good time to introduce new ideas for their consideration.

@jtc42
Member Author

jtc42 commented May 11, 2020

Yeah, pretty much. Interestingly though, the Mozilla implementation already specifically describes both CBOR representations and WebSocket protocol bindings, so the newest versions of LabThings are based more heavily on the Mozilla implementation of the W3C standard.

I imagine that if the W3C add new information around these, Mozilla will update their implementation correspondingly. Our spec repo is forked from the Mozilla spec so we can easily make sure we’re synchronised with upstream.

Mozilla have made this much simpler than it would otherwise have been. Very happy!

@jtc42 jtc42 pinned this issue May 25, 2020