cstring null terminated but with a fixed length? #23

Open
LiamKarlMitchell opened this issue Jun 16, 2017 · 6 comments
Comments

@LiamKarlMitchell

How would I make a cstring with a fixed length?

I mean something like char name[13]; where the application guarantees that at most 12 characters are stored or copied and the last byte is always NULL.

The data I'm reading/writing has a fixed size.

Do I need to add my own custom type for this?
Using count and countType causes validation errors.

Example:

Buffer data
31 33 33 37 00 00 00 00

I would expect the read to stop at the first NULL terminator, so the string would be read as:
'1337'

But the max length of the cstring would be 7 characters, with a total byte length of 8.
The final byte is reserved for the NULL terminator in case all count-1 characters are set.

31 00 31 00 00 00 00 00
This would be read as the string '1'; the remaining data is ignored because the string is null-terminated.
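
As an illustration of the behaviour described above, here is a minimal sketch of a fixed-length, null-terminated string codec in Node.js. The function names and the `{ length }` argument are hypothetical, and the read/write/sizeOf shape (read returning `{ value, size }`, write returning the new offset) is only assumed to resemble what a ProtoDef custom type looks like; check the ProtoDef documentation before wiring it in.

```javascript
// Hypothetical helpers, not part of ProtoDef.

// Read a null-terminated string stored in a fixed-size field of `length` bytes.
function readFixedCString (buffer, offset, { length }) {
  if (offset + length > buffer.length) {
    throw new Error('Not enough bytes for a fixed-length cstring')
  }
  const end = offset + length
  // Stop at the first NUL inside the field; without one, use the whole field.
  let nul = buffer.indexOf(0, offset)
  if (nul === -1 || nul > end) nul = end
  // The field always consumes `length` bytes, whatever the string length is.
  return { value: buffer.toString('utf8', offset, nul), size: length }
}

// Write the string, then pad the rest of the field with NUL bytes.
function writeFixedCString (value, buffer, offset, { length }) {
  const byteLength = Buffer.byteLength(value, 'utf8')
  if (byteLength > length - 1) {
    throw new Error('String does not fit: the last byte is reserved for NUL')
  }
  if (value.includes('\0')) {
    throw new Error('String must not contain NUL characters')
  }
  buffer.fill(0, offset, offset + length)
  buffer.write(value, offset, byteLength, 'utf8')
  return offset + length
}

function sizeOfFixedCString (value, { length }) {
  return length
}

// The two buffers from the example above.
const first = Buffer.from('3133333700000000', 'hex')
console.log(readFixedCString(first, 0, { length: 8 })) // { value: '1337', size: 8 }

const second = Buffer.from('3100310000000000', 'hex')
console.log(readFixedCString(second, 0, { length: 8 })) // { value: '1', size: 8 }
```

Reporting `size` as the full field length is what keeps the following fields aligned, even when the string itself is shorter.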

I'll code my own for now, as I already did wide strings in another project.

Thanks

@roblabla
Contributor

I've been meaning to add a generic "fixed-length" type that would look like

["fixed", {
  length: 8,
  type: "cstring"
}]

But I never got around to actually writing it. The idea is that it would limit what the underlying type "sees" to the given length. If the underlying type throws a PartialReadError, then there is a protocol error, and we give up.

This would also be useful for things like restBuffer.
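
To make the "limit what the underlying type sees" idea concrete, here is a rough sketch in plain Node.js. Nothing in it is existing ProtoDef API: `readFixed`, `readCString`, and the `{ length, read }` options are hypothetical names chosen for illustration, and the error is a plain `Error` standing in for a PartialReadError.

```javascript
// Hypothetical sketch of a generic "fixed" wrapper.
function readFixed (buffer, offset, { length, read }) {
  if (offset + length > buffer.length) {
    throw new Error('Protocol error: field extends past the end of the buffer')
  }
  // The inner reader only gets a view of the `length` bytes of the field.
  const view = buffer.subarray(offset, offset + length)
  const { value } = read(view, 0)
  // The wrapper always consumes the full field, whatever the inner type used.
  return { value, size: length }
}

// Minimal cstring reader used as the inner type in this sketch.
function readCString (buffer, offset) {
  const nul = buffer.indexOf(0, offset)
  if (nul === -1) {
    throw new Error('PartialReadError: missing NUL terminator')
  }
  return { value: buffer.toString('utf8', offset, nul), size: nul - offset + 1 }
}

// 8-byte field followed by one unrelated byte.
const ok = Buffer.from('3133333700000000ff', 'hex')
console.log(readFixed(ok, 0, { length: 8, read: readCString })) // { value: '1337', size: 8 }

// No NUL inside the 8-byte field: the NUL at byte 9 is outside the view,
// so the inner reader fails and the whole read is treated as a protocol error.
const bad = Buffer.from('414141414141414100', 'hex')
try {
  readFixed(bad, 0, { length: 8, read: readCString })
} catch (err) {
  console.log(err.message) // PartialReadError: missing NUL terminator
}
```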

@Saiv46
Contributor

Saiv46 commented Dec 30, 2020

@LiamKarlMitchell I'll ask you one thing: why? Protocols are used to serialize data with as few bytes as possible.

@roblabla
Contributor

roblabla commented Dec 30, 2020

A null-terminated string takes fewer bytes to encode than a length-prefixed one for long strings (e.g. anything over 128/256 bytes would require at least a 2-byte prefix, but only a single-byte null terminator).
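
The overhead comparison can be checked with a few lines of arithmetic. This sketch assumes the length prefix is a varint carrying 7 payload bits per byte, which is one common choice; `varintSize` is a hypothetical helper, not a ProtoDef function.

```javascript
// Number of bytes a varint needs to encode the non-negative integer n
// (assumption: 7 payload bits per byte).
function varintSize (n) {
  let size = 1
  while (n >= 0x80) {
    n >>>= 7
    size++
  }
  return size
}

// A NUL terminator always costs exactly one byte.
const TERMINATOR_SIZE = 1

for (const length of [100, 128, 300, 20000]) {
  console.log(length, 'prefix:', varintSize(length), 'terminator:', TERMINATOR_SIZE)
}
// 100 prefix: 1 terminator: 1
// 128 prefix: 2 terminator: 1
// 300 prefix: 2 terminator: 1
// 20000 prefix: 3 terminator: 1
```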

Also, ProtoDef is meant to describe existing protocols, and I know of a handful that use character-termination instead of length prefixes. As a general rule, having a very generic "field-delimited array" type on top of which we could build cstring would be great.


Somewhat related to this is the notion of substreams that I had for a long long while when working on this. The idea being that we could limit parsers to only take values until a certain predicate is hit. Then we could specify cstring as:

["limited", {
  "endByte": 0,
  "subtype": "string"
}]

Where "string" is a byte sequence that exhausts the current stream (on that topic, string should probably take an encoding parameter). limited would create a new stream that lasts until the endByte byte is seen. Of course, this is not very well thought-out, but it's just to provide an idea of what it could look like. Length-prefixed streams could also be provided for a similar effect, and could be used to recreate the pstring type.

This would provide a new fundamental type that could be used to generate new classes of complex types to parse protocols (\n-delimited strings for text-oriented protocols come to mind).
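
A rough sketch of how reading through such a delimiter-bounded substream might work, in plain Node.js. The names `readLimited` and `readRestString` and the `{ endByte, read }` options are hypothetical; this is only one possible reading of the idea, not an implementation from ProtoDef.

```javascript
// Hypothetical "limited" reader: the inner type only sees the bytes that come
// before the first occurrence of endByte.
function readLimited (buffer, offset, { endByte, read }) {
  const end = buffer.indexOf(endByte, offset)
  if (end === -1) {
    throw new Error('PartialReadError: end byte not found')
  }
  const view = buffer.subarray(offset, end)
  const { value } = read(view, 0)
  // Consume the substream plus the end byte itself.
  return { value, size: end - offset + 1 }
}

// A "string" that exhausts whatever stream it is given (UTF-8 assumed here).
function readRestString (buffer, offset) {
  return { value: buffer.toString('utf8', offset), size: buffer.length - offset }
}

// cstring expressed as limited + string, with endByte 0.
const cstr = Buffer.from('3133333700000000', 'hex')
console.log(readLimited(cstr, 0, { endByte: 0, read: readRestString }))
// { value: '1337', size: 5 }

// Newline-delimited strings for a text-oriented protocol, with endByte 0x0a.
const lines = Buffer.from('hello\nworld\n', 'utf8')
const line1 = readLimited(lines, 0, { endByte: 0x0a, read: readRestString })
const line2 = readLimited(lines, line1.size, { endByte: 0x0a, read: readRestString })
console.log(line1.value, line2.value) // hello world
```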

@LiamKarlMitchell
Author

LiamKarlMitchell commented Dec 30, 2020

@Saiv46 I'm implementing communication using an existing protocol that is not mindful of saving space. It is also a nicety: rather than reading the bytes and then trimming the zero bytes from the string in my own code, it would be nice if this could be implemented as part of the definition.

@Saiv46
Contributor

Saiv46 commented Dec 30, 2020

> The idea being that we could limit parsers to only take values until a certain predicate is hit.

@roblabla Good idea, but it doesn't work.

ProtoDef is usually used to serialize user data, so this kind of datatype would lead to unexpected behaviour.

For example, how will this

["limited", {
  "endByte": 0,
  "subtype": ["container", [
    { "name": "index", "type": "u8" },
    { "name": "value", "type": "string" }
  ]]
}]

work with { index: 0, value: "\00\00" }? The result will have 4 bytes, but only 1 will be read.

@roblabla
Contributor

roblabla commented Dec 30, 2020

I expect serialization to throw on such inputs. The idea being that when the subtype writes to the current stream, limited would see the 0 and raise an error. But yes, I concede it is not straightforward to implement.

Also, it could be decided that invalid inputs are allowed to produce gibberish. Obviously not my favorite choice, but nothing says all inputs to a given type have to produce a valid bytestream. In such a world, we'd simply warn users not to put 0s in their limited inputs.
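
The "throw on serialization" option could look roughly like the sketch below. It assumes the simplest possible design, where the subtype serializes into a temporary buffer that is then scanned for the end byte; `writeLimited` and `writeIndexedString` are hypothetical names, and a streaming implementation would detect the byte at write time instead.

```javascript
// Hypothetical "limited" writer: reject any inner output that contains endByte.
function writeLimited (value, { endByte, write }) {
  const inner = write(value)
  if (inner.includes(endByte)) {
    throw new Error('Serialized value contains the end byte')
  }
  return Buffer.concat([inner, Buffer.from([endByte])])
}

// Stand-in for the container above: a u8 index followed by a string that
// fills the rest of the stream.
function writeIndexedString ({ index, value }) {
  return Buffer.concat([Buffer.from([index]), Buffer.from(value, 'utf8')])
}

const options = { endByte: 0, write: writeIndexedString }

console.log(writeLimited({ index: 1, value: 'hi' }, options).toString('hex'))
// 01686900

try {
  writeLimited({ index: 0, value: '\0\0' }, options)
} catch (err) {
  console.log(err.message) // Serialized value contains the end byte
}
```

The trade-off is that some values which are valid for the subtype alone (such as an index of 0) become unrepresentable once wrapped.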
