This repository has been archived by the owner on Apr 29, 2020. It is now read-only.

What does this format need to do? #7

Open
wking opened this issue May 6, 2015 · 3 comments

Comments

@wking

wking commented May 6, 2015

On Tue, May 05, 2015 at 08:52:35PM -0700, Jeromy Johnson wrote:

Hrm, i guess we should lay out what properties we are interested in having before attempting to design anything.

This is a good idea ;). Possibilities include:

  1. Store one Merkle-DAG object and some set of its descendants (Multiple roots in the archive #3).
  2. Potentially mark descendants as not included (Ignoring shared subtrees #5).
  3. Be reasonably quick to write. Ideally by streaming (i.e. not holding all serialized objects in memory), but probably not a hard constraint.
  4. Be very quick to list the object's multihash and not-included object multihashes (Ignoring shared subtrees #5) so we know what the file contains.
  5. Be very quick to list multihashes for all included objects.
  6. Be very quick to iterate over all contained objects (e.g. if you're loading them into a node's object store).
  7. Be very quick to look up a particular object by path (e.g. finding Key3 knowing Key1/Key2/Key3, discussion in Using a protobuf or other IPFS-ish format for the wrapping archive #6).
  8. Be very quick to look up a particular object directly (e.g. a deep Key3 without other information, discussion in Using a protobuf or other IPFS-ish format for the wrapping archive #6, sketches of doing this via Git-style fanout and indexing here).
  9. Detect and adjust to truncation and/or corruption (Ensuring completeness and catching corruption #4).
  10. Store delta representations (Delta representations #2).
  11. Be forward and backward compatible across archive-format evolution.
  12. Be forward and backward compatible across IPFS-object-format evolution.

For stuff with issues, please discuss whether or not we need the feature or how to go about implementing it in that issue. For stuff without issues, please create a new issue for that feature and discuss there (we'll link to it from here).

This issue is for potential requirements that I haven't listed. Thoughts?
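
To make items 5, 8, and 9 a bit more concrete, here is a minimal Python sketch of one way they could fit together: objects are streamed out first, then a sorted index and a small fixed-size footer are appended. Everything here is an assumption for illustration only: the `MAGIC` marker, the byte layout, the function names, and the use of a raw SHA-256 digest as a stand-in for a multihash. It is not a proposal for the actual format.

```python
import hashlib
import io
import struct

MAGIC = b"SKCH"   # hypothetical footer marker, not part of any real format
ENTRY_SIZE = 48   # 32-byte digest + 8-byte offset + 8-byte length
FOOTER_SIZE = 20  # 8-byte index offset + 8-byte entry count + 4-byte marker


def write_archive(out, objects):
    """Stream objects to `out`, then append a sorted index and a footer."""
    index = []  # (digest, offset, length) per object; the object bytes are not kept
    offset = 0
    for data in objects:
        digest = hashlib.sha256(data).digest()  # stand-in for a multihash
        out.write(data)
        index.append((digest, offset, len(data)))
        offset += len(data)
    index.sort()  # sorted digests would allow binary search on disk
    for digest, obj_offset, length in index:
        out.write(digest + struct.pack(">QQ", obj_offset, length))
    # Footer: where the index starts, how many entries it has, and the marker.
    out.write(struct.pack(">QQ", offset, len(index)) + MAGIC)


def read_index(f):
    """Return {digest: (offset, length)} by reading only the footer and index."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < FOOTER_SIZE:
        raise ValueError("archive truncated: no footer")
    f.seek(size - FOOTER_SIZE)
    footer = f.read(FOOTER_SIZE)
    index_start, count = struct.unpack(">QQ", footer[:16])
    if footer[16:] != MAGIC or index_start + count * ENTRY_SIZE + FOOTER_SIZE != size:
        raise ValueError("archive truncated or corrupt")
    f.seek(index_start)
    entries = {}
    for _ in range(count):
        raw = f.read(ENTRY_SIZE)
        entries[raw[:32]] = struct.unpack(">QQ", raw[32:])
    return entries


def get_object(f, entries, digest):
    """Fetch one object directly by digest and verify it against that digest."""
    offset, length = entries[digest]
    f.seek(offset)
    data = f.read(length)
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("object does not match its hash")
    return data
```

With this layout, listing every included hash only touches the tail of the file, and a truncated file is rejected because the footer no longer lines up with the file size. The trade-off is that the writer has to keep one small index entry per object in memory until the end.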

@wking
Author

wking commented May 6, 2015

Related to quick reads and writes (3, 4, 5, 6, 7, 8), I'm pretty sure we want to pick a format for which:

13. Writing does not require holding many written objects in memory at once (possibly related to #6).
14. Reading does not require holding many previously-read objects in memory at once (possibly related to #6).
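
As a rough illustration of items 13 and 14, the sketch below writes and reads length-prefixed records so that only the current object is ever held in memory. The 4-byte length prefix, the function names, and the use of a raw SHA-256 digest in place of a multihash are all assumptions made for the example, not anything agreed in this thread.

```python
import hashlib
import io
import struct


def write_records(out, objects):
    """Write each object as a 4-byte big-endian length followed by its bytes."""
    for data in objects:  # `objects` may be a generator; nothing is accumulated
        out.write(struct.pack(">I", len(data)))
        out.write(data)


def iter_records(f):
    """Yield (digest, data) one object at a time, holding only the current one."""
    while True:
        header = f.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        data = f.read(length)
        if len(data) < length:
            raise ValueError("truncated object")
        # SHA-256 stands in for whatever multihash the real format would use.
        yield hashlib.sha256(data).digest(), data
```

A consumer such as a node's object store could loop over `iter_records(f)` and insert each object as it arrives, which also covers item 6. The cost of this simple layout is that a direct lookup (item 8) needs a full scan unless a separate index is kept.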

@okdistribute

Dat would use a merkle-dag storage format. It'd be nice if we could find something that exists to build from so it'd be compatible with already-existing archive readers in a variety of languages.

@jbenet
Contributor

jbenet commented Sep 13, 2015

@Karissa the core format is going to use IPLD -- see https://github.com/ipfs/go-ipld and https://github.com/diasdavid/node-ipld -- stored as cbor (http://cbor.io).


3 participants