What does this format need to do? #7

wking · 2015-05-06T04:09:21Z

On Tue, May 05, 2015 at 08:52:35PM -0700, Jeromy Johnson wrote:

> Hrm, i guess we should lay out what properties we are interested in having before attempting to design anything.

This is a good idea ;). Possibilities include:

1. Store one Merkle-DAG object and some set of its descendants (Multiple roots in the archive #3).
2. Potentially mark descendants as not included (Ignoring shared subtrees #5).
3. Be reasonably quick to write, ideally by streaming (i.e. not holding all serialized objects in memory), though this is probably not a hard constraint.
4. Be very quick to list the object's multihash and not-included object multihashes (Ignoring shared subtrees #5) so we know what the file contains.
5. Be very quick to list multihashes for all included objects.
6. Be very quick to iterate over all contained objects (e.g. if you're loading them into a node's object store).
7. Be very quick to look up a particular object by path (e.g. finding Key3 knowing Key1/Key2/Key3, discussion in Using a protobuf or other IPFS-ish format for the wrapping archive #6).
8. Be very quick to look up a particular object directly (e.g. a deep Key3 without other information, discussion in Using a protobuf or other IPFS-ish format for the wrapping archive #6, sketches of doing this via Git-style fanout and indexing here).
9. Detect and adjust to truncation and/or corruption (Ensuring completeness and catching corruption #4).
10. Store delta representations (Delta representations #2).
11. Be forward and backward compatible across archive-format evolution.
12. Be forward and backward compatible across IPFS-object-format evolution.
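
To make the "Git-style fanout and indexing" idea for direct lookups a bit more concrete, here is a minimal Python sketch. It is an illustration only, not a proposed format: the names `build_index` and `lookup` are hypothetical, and it indexes raw SHA-256 digests rather than real multihashes (a real multihash begins with function-code and length bytes, so the fanout would have to key on the digest portion instead).

```python
import bisect
import hashlib


def build_index(digests):
    """Return (fanout, ordered) for a collection of raw digests.

    fanout[b] is the number of digests whose first byte is <= b,
    which is the same idea Git's pack index uses.
    """
    ordered = sorted(set(digests))
    fanout = [0] * 256
    for digest in ordered:
        fanout[digest[0]] += 1
    # Turn per-byte counts into cumulative counts.
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout, ordered


def lookup(fanout, ordered, digest):
    """Return the position of digest in ordered, or None if it is absent."""
    first = digest[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only inside the bucket selected by the first byte.
    i = bisect.bisect_left(ordered, digest, lo, hi)
    if i < hi and ordered[i] == digest:
        return i
    return None


# Hypothetical sample data standing in for serialized objects.
objects = [b"object-%d" % n for n in range(1000)]
digests = [hashlib.sha256(o).digest() for o in objects]
fanout, ordered = build_index(digests)
```

In an archive, the position returned by `lookup` would presumably map to a byte offset for the object, and the sorted digest list doubles as a quick listing of everything included. Both of those are assumptions about a layout that this thread has not settled.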

For stuff with issues, please discuss whether or not we need the feature or how to go about implementing it in that issue. For stuff without issues, please create a new issue for that feature and discuss there (we'll link to it from here).

This issue is for potential requirements that I haven't listed. Thoughts?

wking · 2015-05-06T15:56:46Z

Related to quick reads and writes (3, 4, 5, 6, 7, 8), I'm pretty sure we want to pick a format for which:

13. Writing does not require holding many written objects in memory at once (possibly related to #6).
14. Reading does not require holding many previously-read objects in memory at once (possibly related to #6).
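
As a rough illustration of the two memory constraints above, a length-prefixed record stream lets both the writer and the reader handle one object at a time. This is only a sketch under assumed conventions: the header layout (2-byte key length, 4-byte payload length, big-endian) and the names `write_records` and `read_records` are invented for the example and are not part of any agreed format.

```python
import io
import struct

# Assumed header: key length (2 bytes) then payload length (4 bytes), big-endian.
HEADER = struct.Struct(">HI")


def write_records(stream, records):
    """Write (key, payload) pairs one at a time; records may be a generator."""
    for key, payload in records:
        stream.write(HEADER.pack(len(key), len(payload)))
        stream.write(key)
        stream.write(payload)


def read_records(stream):
    """Yield (key, payload) pairs one at a time, raising on truncation."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, payload_len = HEADER.unpack(header)
        key = stream.read(key_len)
        payload = stream.read(payload_len)
        if len(key) < key_len or len(payload) < payload_len:
            raise ValueError("truncated record body")
        yield key, payload


# Round trip through an in-memory buffer; a file opened in binary mode works the same way.
buf = io.BytesIO()
write_records(buf, ((b"key-%d" % n, b"x" * n) for n in range(3)))
buf.seek(0)
for key, payload in read_records(buf):
    print(key, len(payload))
```

Because the writer consumes an iterator and the reader is a generator, neither side keeps more than the current record in memory. The cost is that finding a particular object means scanning from the start unless a separate index is stored.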

okdistribute · 2015-09-13T22:09:13Z

Dat would use a merkle-dag storage format. It'd be nice if we could find something that exists to build from so it'd be compatible with already-existing archive readers in a variety of languages.

jbenet · 2015-09-13T22:24:54Z

@Karissa the core format is going to use IPLD -- see https://github.com/ipfs/go-ipld and https://github.com/diasdavid/node-ipld -- stored as cbor (http://cbor.io).

max-mapper mentioned this issue Sep 16, 2015

lets pick an archive format dat-ecosystem-archive/datproject-discussions#27

Closed

okdistribute mentioned this issue Jun 17, 2016

sleep.dat archive format dat-ecosystem-archive/projects#2

Open
