This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

schema improvements #29

Open
michaelsembwever opened this issue Dec 14, 2016 · 1 comment


michaelsembwever commented Dec 14, 2016

The current schema won't hold up under real usage.

The partitions need a bucketing strategy so that they cannot grow without bound. Choose a time bucket resolution that keeps every partition under 100MB.
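A minimal sketch of what client-side bucketing could look like, assuming buckets are fixed-width windows aligned to the Unix epoch. The helper names (`time_bucket`, `max_bucket_width`), the steady write rate, and reading 100MB as 100 * 1024 * 1024 bytes are illustrative assumptions, not part of the plugin.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(ts: datetime, width: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket."""
    elapsed = ts - EPOCH
    # timedelta // timedelta yields the whole number of buckets elapsed.
    return EPOCH + (elapsed // width) * width


def max_bucket_width(bytes_per_row: int, rows_per_second: float,
                     limit_bytes: int = 100 * 1024 * 1024) -> timedelta:
    """Widest bucket that keeps one partition under limit_bytes,
    assuming a steady write rate per (ns, ver, host) series."""
    return timedelta(seconds=limit_bytes / (bytes_per_row * rows_per_second))


if __name__ == "__main__":
    ts = datetime(2016, 12, 14, 10, 37, 5, tzinfo=timezone.utc)
    print(time_bucket(ts, timedelta(hours=1)))
    print(max_bucket_width(bytes_per_row=1024, rows_per_second=1024))
```

The value returned by `time_bucket` is what a writer would store in the `time_bucket` column, so every row for the same series and window lands in the same partition.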

TimeWindowCompactionStrategy should also be applied, given that this is a time-series data model.

For example:

CREATE TABLE snap.metrics (
    time_bucket timestamp,
    ns text,
    ver int,
    host text,
    time timestamp,
    boolval boolean,
    doubleval double,
    strval text,
    tags map<text, text>,
    valtype text,
    PRIMARY KEY ((time_bucket, ns, ver, host), time)
) WITH CLUSTERING ORDER BY (time DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'};

CREATE TABLE snap.tags (
    time_bucket timestamp,
    key text,
    val text,
    time timestamp,
    boolval boolean,
    doubleval double,
    host text,
    ns text,
    strval text,
    tags map<text, text>,
    valtype text,
    ver int,
    PRIMARY KEY ((time_bucket, key, val), time)
) WITH CLUSTERING ORDER BY (time DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'};
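
One consequence of the bucketed primary key is that a read over a time range has to visit every bucket the range overlaps. The sketch below shows one way a reader might enumerate those buckets, assuming the same epoch-aligned, fixed-width windows used on the write path. The function name `buckets_between` and the query text are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical per-bucket query against the snap.metrics table above.
QUERY = (
    "SELECT * FROM snap.metrics "
    "WHERE time_bucket = ? AND ns = ? AND ver = ? AND host = ? "
    "AND time >= ? AND time <= ?"
)


def buckets_between(start: datetime, end: datetime,
                    width: timedelta) -> list:
    """Return the start of every bucket overlapping [start, end],
    newest first to match CLUSTERING ORDER BY (time DESC)."""
    current = EPOCH + ((start - EPOCH) // width) * width
    buckets = []
    while current <= end:
        buckets.append(current)
        current += width
    return buckets[::-1]


if __name__ == "__main__":
    start = datetime(2016, 12, 14, 10, 37, tzinfo=timezone.utc)
    end = datetime(2016, 12, 14, 12, 5, tzinfo=timezone.utc)
    for bucket in buckets_between(start, end, timedelta(hours=1)):
        # One query per bucket; each one hits exactly one partition.
        print(bucket.isoformat())
```

A narrower bucket keeps partitions small but increases the number of queries per range read, so the resolution is a trade-off between the two.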

@candysmurf (Collaborator)

@michaelsembwever, thanks for your suggestion. Agreed, this is a good way to improve scalability for time-based metrics.
