Add support for chunking of blobs, using a variant of BLAKE3
Buildbarn has invested heavily in using virtual file systems. Both on
the worker and client side it's possible to lazily fault in data from
the CAS. As Buildbarn implements checksum verification where needed,
randomly accessing large files may be slow. To address this, this change
adds support for composing and decomposing CAS objects, using newly
added ConcatenateBlobs() and SplitBlobs() operations.

If implemented naively (e.g., using SHA-256), these operations would not
be verifiable. To rephrase: given only the checksums of smaller
objects, there is no way to derive the checksum of their concatenation.
This is why at the same time, this change adds a new digest function
that closely resembles BLAKE3. BLAKE3 is based on a binary Merkle tree,
meaning that it's possible to efficiently concatenate and split objects
at the 2^k boundary (where k > 10).
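
To make the Merkle tree argument concrete, here is a minimal sketch in
Python. It is NOT BLAKE3 and not the digest function added by this
change: it is a toy tree that uses SHA-256 as the node function, with
1 KiB leaves and a left subtree that always covers the largest power
of two number of bytes that still leaves data for the right subtree.
The names leaf(), parent() and tree_hash() are made up for
illustration.

```python
import hashlib

CHUNK = 1024  # toy leaf size in bytes (assumption for this sketch)


def leaf(data: bytes) -> bytes:
    """Hash one leaf; the 0x00 prefix separates leaves from parents."""
    return hashlib.sha256(b"\x00" + data).digest()


def parent(left: bytes, right: bytes) -> bytes:
    """Combine two subtree hashes into the hash of their parent node."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_hash(data: bytes) -> bytes:
    """Root of a binary Merkle tree over CHUNK-sized leaves."""
    if len(data) <= CHUNK:
        return leaf(data)
    # The left subtree covers the largest power-of-two multiple of
    # CHUNK that is strictly smaller than the input.
    split = CHUNK
    while split * 2 < len(data):
        split *= 2
    return parent(tree_hash(data[:split]), tree_hash(data[split:]))


# Two objects whose boundary falls on a power of two (4096 = 2^12).
a = bytes(range(256)) * 16
b = b"x" * 4096

# The hash of the concatenation follows from the two hashes alone.
assert tree_hash(a + b) == parent(tree_hash(a), tree_hash(b))

# With a misaligned boundary the shortcut no longer holds.
c, d = (a + b)[:3000], (a + b)[3000:]
assert tree_hash(c + d) != parent(tree_hash(c), tree_hash(d))
```

The same identity read in the other direction is what makes a split
verifiable: a caller that receives two child hashes can recompute the
parent and compare it against the hash it asked for.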

With these new operations present, there is no longer a strict need to
use the ByteStream protocol. Writes can be performed by uploading
smaller parts through BatchUpdateBlobs(), followed by calling
ConcatenateBlobs(). Conversely, reads of large objects can be performed
by calling SplitBlobs() and downloading individual parts through
BatchReadBlobs(). At no point is integrity compromised, as callers of
SplitBlobs() can validate the resulting tree nodes against the original
digests.
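
The write and read flows described above can be sketched against an
in-memory stand-in for the CAS. Everything here is hypothetical: the
ToyCAS class, its method signatures, the 1 KiB part size and the
SHA-256 based node function are assumptions made for the sketch, not
the actual protocol messages or the digest function from this change.
The part count is assumed to be a power of two.

```python
import hashlib

PART = 1024  # assumed part size; one Merkle leaf per part in this toy


def leaf(data):
    return hashlib.sha256(b"\x00" + data).digest()


def parent(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_of(digests):
    """Fold a power-of-two number of subtree digests into one root."""
    level = list(digests)
    while len(level) > 1:
        level = [parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


class ToyCAS:
    """In-memory stand-in; method names only mirror the RPC names."""

    def __init__(self):
        self.blobs = {}     # part digest -> bytes
        self.children = {}  # composed digest -> list of part digests

    def batch_update_blobs(self, parts):
        for part in parts:
            self.blobs[leaf(part)] = part

    def concatenate_blobs(self, digests):
        root = root_of(digests)
        self.children[root] = list(digests)
        return root

    def split_blobs(self, root):
        return list(self.children[root])

    def batch_read_blobs(self, digests):
        return [self.blobs[d] for d in digests]


def upload(cas, data):
    """Write path: upload small parts, then compose them."""
    parts = [data[i:i + PART] for i in range(0, len(data), PART)]
    cas.batch_update_blobs(parts)
    return cas.concatenate_blobs([leaf(p) for p in parts])


def download(cas, root):
    """Read path: split, validate the split, then fetch the parts."""
    digests = cas.split_blobs(root)
    if root_of(digests) != root:
        raise ValueError("split result does not match requested digest")
    parts = cas.batch_read_blobs(digests)
    for digest, part in zip(digests, parts):
        if leaf(part) != digest:
            raise ValueError("part does not match its digest")
    return b"".join(parts)


cas = ToyCAS()
data = b"".join(bytes([i]) * PART for i in range(4))  # four distinct parts
root = upload(cas, data)
assert download(cas, root) == data
```

The client never has to trust the server: a split that returns the
wrong children fails the root_of() comparison, and a part with the
wrong contents fails the per-part check.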

One feature of BLAKE3 is that its hashes are variable length. Though
this is nice to have (allowing users to make size/security tradeoffs),
it does mean that the length of the hash cannot be used to infer the
digest function used. This has already become an issue with MD5 vs
MURMUR3, which both yield 128-bit hashes. To solve that, we extend all
operations that work with digests
to take a digest function explicitly. For compatibility, we allow this
to be UNKNOWN for all existing digest functions.
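
As an illustration of why hash length alone is not enough, the sketch
below infers a digest function from the length of a hex encoded hash
and falls back to an explicitly declared one. The DigestFunction enum
here is a simplified stand-in with illustrative values, and
candidates() and resolve() are hypothetical helpers; how a real server
treats UNKNOWN is not specified by this message, so rejecting the
ambiguous case is an assumption.

```python
import hashlib
from enum import Enum


class DigestFunction(Enum):
    # Simplified stand-in; values are illustrative only.
    UNKNOWN = "UNKNOWN"
    SHA256 = "SHA256"
    SHA1 = "SHA1"
    MD5 = "MD5"
    MURMUR3 = "MURMUR3"


# Output sizes in bytes. MD5 and 128-bit MurmurHash3 collide here.
HASH_SIZE_BYTES = {
    DigestFunction.SHA256: 32,
    DigestFunction.SHA1: 20,
    DigestFunction.MD5: 16,
    DigestFunction.MURMUR3: 16,
}


def candidates(hex_hash):
    """All digest functions whose output length matches hex_hash."""
    return [f for f, size in HASH_SIZE_BYTES.items()
            if 2 * size == len(hex_hash)]


def resolve(hex_hash, declared=DigestFunction.UNKNOWN):
    """Prefer the declared function; otherwise infer it from length."""
    if declared is not DigestFunction.UNKNOWN:
        return declared
    found = candidates(hex_hash)
    if len(found) != 1:
        raise ValueError("cannot infer digest function from hash length")
    return found[0]


sha256_hex = hashlib.sha256(b"hello").hexdigest()
md5_hex = hashlib.md5(b"hello").hexdigest()

# SHA-256 is still recognizable by length alone.
assert resolve(sha256_hex) is DigestFunction.SHA256

# A 128-bit hash is only usable when the function is stated explicitly.
assert resolve(md5_hex, DigestFunction.MD5) is DigestFunction.MD5
```

A variable-length hash makes the length table meaningless altogether,
which is why an explicit field is the only robust option for the new
digest function.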
EdSchouten committed Nov 7, 2022
1 parent 7d1354e commit 677b4e8
Showing 2 changed files with 1,602 additions and 564 deletions.