Buffers & Bytes & Zerocopy #1040
-
Hi @somethingelseentirely, thanks for dropping by. We'll definitely look into anybytes.
As you might have seen, for now we have a vortex-buffer crate (https://github.com/spiraldb/vortex/tree/develop/vortex-buffer) that wraps an enum of Tokio Bytes and Arrow Bytes, which isn't ideal, but at least gives us a place to swap these things out.
A lot of the problems we faced were around requesting / guaranteeing alignment. For now we just lean on our existing primitive types, including a u128, to force the alignment at allocation time. But it's a little bit more fragile than I'd like.
The other feature we use quite often is the ability to optimistically convert immutable bytes back into mutable bytes, e.g. if we hold the only strong reference.
In terms of serde formats, while we hope to never have to implement or maintain a non-Rust Vortex, we decided to go with flatbuffers as it was more portable than some of the alternatives, and seemed to fit our use case more closely than e.g. Cap'n Proto.
But @robert3005 - we could steal the lifetime erasure trick to provide typed accessors to flatbuffers, instead of passing around byte slices and positions? https://github.com/triblespace/anybytes/blob/0d384ca2a17b7fb13ee24ec03e513ed0de974e33/src/bytes.rs#L158
Thank you very much for the links though, they've given us some things to think about.
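For readers following along, the "optimistically convert immutable bytes back into mutable bytes" idea can be sketched with nothing but the standard library. This is a minimal illustration, not the vortex-buffer implementation: `SharedBytes` and `try_into_mut` are hypothetical names, and the buffer is assumed to be a plain `Arc<Vec<u8>>`.

```rust
use std::sync::Arc;

/// Hypothetical shared, immutable byte buffer (an assumption for this
/// sketch, not a type from vortex-buffer, bytes, or Arrow).
#[derive(Clone, Debug)]
struct SharedBytes(Arc<Vec<u8>>);

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        SharedBytes(Arc::new(data))
    }

    /// Returns the underlying `Vec<u8>` for mutation if this handle is the
    /// only strong reference; otherwise hands the shared handle back
    /// unchanged so the caller can fall back to copying.
    fn try_into_mut(self) -> Result<Vec<u8>, SharedBytes> {
        Arc::try_unwrap(self.0).map_err(SharedBytes)
    }
}

fn main() {
    // Unique handle: the conversion succeeds without copying the data.
    let unique = SharedBytes::new(vec![1, 2, 3]);
    let mut owned = unique.try_into_mut().expect("only one strong reference");
    owned.push(4);
    assert_eq!(owned, vec![1, 2, 3, 4]);

    // Shared handle: the conversion is refused while a clone is alive.
    let shared = SharedBytes::new(vec![9]);
    let other = shared.clone();
    assert!(shared.try_into_mut().is_err());
    drop(other);
}
```

The conversion is "optimistic" in the sense that the failure case returns the original handle, so nothing is lost when another reference still exists.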
-
Hey,
I saw vortex on HN and figured I'd drop by for a chat.
I'm operating in a similar space (building triple stores), and similarly hit the lack of `Bytes` extensibility when implementing zero-copy mmap-ed reads.
After surveying the space I decided to resurrect `minibytes` as `anybytes`.
It uses the same `trait`-based approach that `minibytes` and `ownedbytes` (a crate written and mainly used by the Quickwit full-text search project) use, but has better `Zerocopy` compatibility. (It's usable for experimentation, but I want to do a full model check with Kani before it hits 1.0.)
I figured it might be of interest to you guys, as you too seem to operate in a mostly read-only regime, and I discovered that it can be surprisingly easy to get safe zerocopy bytes <-> types transmutes in that space.
Epsilon-Serde might also be of interest to you, and it's somewhat hard to find if you're not operating in the succinct data structure space. The basic idea is that a lot of read-only data structures can be conveniently represented as arrays (cough, columns) that are mmap-ed, with only a small allocation and dynamic deserialization of header information.
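To make the bytes <-> types idea concrete, here is a rough standard-library-only sketch of viewing a byte region as a typed array without copying. It is not the anybytes, zerocopy, or Epsilon-Serde API: `view_as_u32s` and `as_bytes` are hypothetical helpers, and a `Vec<u32>` stands in for an mmap-ed region. Crates like zerocopy wrap the same alignment and length checks behind a safe interface.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical helper: reinterprets `bytes` as `&[u32]` without copying.
/// Returns `None` if the slice is misaligned for `u32` or its length is
/// not a whole number of `u32`s. Values use the host's native endianness.
fn view_as_u32s(bytes: &[u8]) -> Option<&[u32]> {
    let aligned = bytes.as_ptr() as usize % align_of::<u32>() == 0;
    let whole = bytes.len() % size_of::<u32>() == 0;
    if !(aligned && whole) {
        return None;
    }
    // SAFETY: the pointer is aligned for u32, the length covers exactly the
    // bytes of the input slice, every bit pattern is a valid u32, and the
    // returned slice borrows from `bytes`, so it cannot outlive the data.
    Some(unsafe {
        slice::from_raw_parts(bytes.as_ptr().cast::<u32>(), bytes.len() / size_of::<u32>())
    })
}

/// Hypothetical helper for the demo: views a `u32` slice as raw bytes,
/// standing in for a suitably aligned mmap-ed region.
fn as_bytes(words: &[u32]) -> &[u8] {
    // SAFETY: u32 has no padding and u8 has alignment 1, so any u32 slice
    // is also a valid byte slice of four times the length.
    unsafe { slice::from_raw_parts(words.as_ptr().cast::<u8>(), words.len() * size_of::<u32>()) }
}

fn main() {
    let column: Vec<u32> = vec![10, 20, 30];
    let bytes = as_bytes(&column);

    // Aligned, whole-length input: the typed view succeeds.
    assert_eq!(view_as_u32s(bytes), Some(&column[..]));

    // Misaligned start or truncated length: the view is refused.
    assert!(view_as_u32s(&bytes[1..]).is_none());
    assert!(view_as_u32s(&bytes[..6]).is_none());
}
```

In a read-only regime the checks run once when the view is created, after which the typed slice can be used like any other borrowed array.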
Hope this is of some relevance.