This document describes the information we keep in zookeeper, how it is generated, and how the python client uses it.

Keyspace / Shard / Tablet Data


Each keyspace is now a global zookeeper path, with sub-directories for its shards and action / actionlog. There is no data present (there used to be, but the vt tools never touched it).

$ zk ls /zk/global/vt/keyspaces/ruser

The path and sub-paths are created by 'vtctl CreateKeyspace'.

We use the action and actionlog paths for locking only; no process is actively watching these paths.


A shard is a global zookeeper path, with sub-directories for its action / actionlog, and replication graph.

$ zk ls /zk/global/vt/keyspaces/user/shards/10-20

We use the action and actionlog paths for locking only; no process is actively watching these paths.

A shard path is also a node with a Shard object:

type Shard struct {
    // There can be at most one master, but there may be none. (0)
    MasterAlias TabletAlias

    // This must match the shard name based on our other conventions, but
    // it is helpful to have it decomposed here.
    KeyRange key.KeyRange

    // ServedTypes is a list of all the tablet types this shard will
    // serve. This is usually used with overlapping shards during
    // data shuffles like shard splitting.
    ServedTypes []TabletType

    // SourceShards is the list of shards we're replicating from,
    // using filtered replication.
    SourceShards []SourceShard

    // Cells is the list of cells that have tablets for this shard.
    // It is populated at InitTablet time when a tablet is added
    // in a cell that is not in the list yet.
    Cells []string
}

type SourceShard struct {
    // Uid is the unique ID for this SourceShard object.
    // It is for instance used as a unique index in blp_checkpoint
    // when storing the position. It should be unique within a
    // destination Shard, but not globally unique.
    Uid uint32

    // the source keyspace
    Keyspace string

    // the source shard
    Shard string

    // the source shard keyrange
    KeyRange key.KeyRange

    // we could add other filtering information, like table list, ...
}

$ zk cat /zk/global/vt/keyspaces/ruser/shards/10-20
{
  "MasterAlias": {
    "Cell": "nyc",
    "Uid": 200278
  },
  "KeyRange": {
    "Start": "10",
    "End": "20"
  },
  "Cells": [
    ...
  ]
}

The shard path and sub-directories are created when the first tablet in that shard is created.

The Shard object is changed when we add tablets in unknown cells, or when we change the master.
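
As an illustration of how a tool might read the Shard node, here is a minimal Go sketch that decodes JSON of the shape shown above. The TabletAlias and KeyRange types are simplified stand-ins (the real key.KeyRange type is not shown in this document), ServedTypes and SourceShards are left out, the "Cells" value in the sample is an assumed example, and treating a zero-value MasterAlias as "no master" is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TabletAlias is a simplified stand-in for the real type.
type TabletAlias struct {
	Cell string
	Uid  uint32
}

// KeyRange is a simplified stand-in for key.KeyRange.
// Assumption: Start and End are handled as plain strings.
type KeyRange struct {
	Start string
	End   string
}

// Shard mirrors only the fields visible in the JSON example.
type Shard struct {
	MasterAlias TabletAlias
	KeyRange    KeyRange
	Cells       []string
}

// parseShard decodes the content of a shard node.
func parseShard(data []byte) (*Shard, error) {
	s := &Shard{}
	if err := json.Unmarshal(data, s); err != nil {
		return nil, err
	}
	return s, nil
}

// hasMaster reports whether the shard records a master.
// Assumption: a zero-value alias means there is no master.
func hasMaster(s *Shard) bool {
	return s.MasterAlias != (TabletAlias{})
}

func main() {
	// Sample content; the "Cells" value is an assumed example.
	data := []byte(`{
	  "MasterAlias": {"Cell": "nyc", "Uid": 200278},
	  "KeyRange": {"Start": "10", "End": "20"},
	  "Cells": ["nyc"]
	}`)
	s, err := parseShard(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.MasterAlias.Cell, s.MasterAlias.Uid, s.KeyRange.Start, s.KeyRange.End, hasMaster(s))
}
```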


A tablet has a path in zookeeper, with its action / actionlog and pid file:

$ zk ls /zk/nyc/vt/tablets/0000200308

We use the action and actionlog paths for remote execution of actions. vttablet will watch that directory and launch a vtaction for every requested action.

A tablet also has a node of type Tablet:

type Tablet struct {
    Cell        string      // the zk cell this tablet is assigned to (doesn't change)
    Uid         uint32      // the server id for this instance
    Parent      TabletAlias // the globally unique alias for our replication parent - zero if this is the global master
    Addr        string      // host:port for queryserver
    SecureAddr  string      // host:port for queryserver using encrypted connection
    MysqlAddr   string      // host:port for the mysql instance
    MysqlIpAddr string      // ip:port for the mysql instance - needed to match slaves with tablets and preferable to relying on reverse dns

    Keyspace string
    Shard    string
    Type     TabletType

    State TabletState

    // Normally the database name is implied by "vt_" + keyspace. I
    // really want to remove this but there are some databases that are
    // hard to rename.
    DbNameOverride string
    KeyRange       key.KeyRange
}

$ zk cat /zk/nyc/vt/tablets/0000200308
{
  "Cell": "nyc",
  "Uid": 200308,
  "Parent": {
    "Cell": "",
    "Uid": 0
  },
  "Addr": "",
  "SecureAddr": "",
  "MysqlAddr": "",
  "MysqlIpAddr": "",
  "Keyspace": "",
  "Shard": "",
  "Type": "idle",
  "State": "ReadOnly",
  "DbNameOverride": "",
  "KeyRange": {
    "Start": "",
    "End": ""
  }
}

The Tablet object is created by 'vtctl InitTablet'. Up-to-date information (port numbers, ...) is maintained by the vttablet process. 'vtctl ChangeSlaveType' will also change the Tablet record.
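
The example path above suggests that a tablet node lives under its cell, with the uid zero-padded to 10 digits. A minimal Go sketch of that mapping follows; the tabletPath helper is hypothetical, and the padding width is inferred from the single example (/zk/nyc/vt/tablets/0000200308 for Uid 200308), not from a stated rule.

```go
package main

import "fmt"

// tabletPath builds the zookeeper path of a tablet node.
// Assumption: the uid is zero-padded to 10 digits, as in the
// example path shown in this document.
func tabletPath(cell string, uid uint32) string {
	return fmt.Sprintf("/zk/%s/vt/tablets/%010d", cell, uid)
}

func main() {
	fmt.Println(tabletPath("nyc", 200308)) // prints /zk/nyc/vt/tablets/0000200308
}
```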

Replication Graph

The data maintained by vt tools is as follows:
- the master is stored in the Shard object
- the replication links are stored in each cell for each master/slave relationship (in the cell of the slave), in a ShardReplication object.

Whenever the replication graph changes, we rebuild the serving graph for that cell.

Serving Graph

The serving graph for a shard is maintained in every cell that contains tablets for that shard. To get all the available keyspaces in a cell, just list the top-level cell serving graph directory:

$ zk ls /zk/nyc/vt/ns

The python client lists that directory at startup to find all the keyspaces.


The keyspace data is stored under /zk//vt/ns/

type SrvShard struct {
    // Copied from Shard
    KeyRange    key.KeyRange
    ServedTypes []TabletType

    // TabletTypes represents the list of types we have serving tablets
    // for, in this cell only.
    TabletTypes []TabletType

    // For atomic updates
    version int64
}


type SrvKeyspace struct {
    // Shards to use per type, only contains complete partitions.
    Partitions map[TabletType]*KeyspacePartition

    // This list will be deprecated as soon as Partitions is used.
    // List of non-overlapping shards sorted by range.
    Shards []SrvShard

    // List of available tablet types for this keyspace in this cell.
    // May not have a server for every shard, but we have some.
    TabletTypes []TabletType

    // For atomic updates
    version int64
}


type KeyspacePartition struct {
    // List of non-overlapping continuous shards sorted by range.
    Shards []SrvShard
}

$ zk cat /zk/nyc/vt/ns/rlookup
{
  "Shards": [
    {
      "KeyRange": {
        "Start": "",
        "End": ""
      }
    }
  ],
  "TabletTypes": [
    ...
  ]
}

The only way to build this data is to run the following vtctl command:

$ vtctl RebuildKeyspaceGraph

When building a new data center, this command should be run for every keyspace.

Rebuilding a keyspace graph will:
- find all the shard names in the keyspace by looking at the children of /zk/global/vt/keyspaces//shards
- rebuild the graph for every shard in the keyspace (see below)
- find the list of cells to update by collecting the cells for each tablet of the first shard
- compute, sanity check and update the keyspace graph object in every cell

The python client reads the nodes to find the shard map (KeyRanges, TabletTypes, ...)
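
The sanity check is not detailed here. Since Partitions only contains complete partitions made of non-overlapping continuous shards sorted by range, one plausible check is that the key ranges chain together and cover the whole keyspace. The Go sketch below shows that idea only; the KeyRange stand-in, the isComplete helper, and the convention that an empty Start or End means "unbounded" (suggested by the unsharded example with empty Start and End) are assumptions, not the actual vtctl logic.

```go
package main

import "fmt"

// KeyRange is a simplified stand-in for key.KeyRange.
// Assumption: an empty Start or End means the range is unbounded
// on that side.
type KeyRange struct {
	Start string
	End   string
}

// isComplete reports whether the ranges, taken in order, cover the
// whole keyspace with no gap and no overlap: the first range starts
// unbounded, the last one ends unbounded, and every range begins
// exactly where the previous one ended.
func isComplete(ranges []KeyRange) bool {
	if len(ranges) == 0 {
		return false
	}
	if ranges[0].Start != "" || ranges[len(ranges)-1].End != "" {
		return false
	}
	for i := 1; i < len(ranges); i++ {
		prev := ranges[i-1]
		if prev.End == "" || prev.End != ranges[i].Start {
			return false
		}
	}
	return true
}

func main() {
	unsharded := []KeyRange{{Start: "", End: ""}}
	withGap := []KeyRange{{Start: "", End: "10"}, {Start: "20", End: ""}}
	fmt.Println(isComplete(unsharded), isComplete(withGap))
}
```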


The shard data is stored under /zk//vt/ns//

$ zk cat /zk/nyc/vt/ns/rlookup/0
{
  "KeyRange": {
    "Start": "",
    "End": ""
  }
}

We also have per serving type data under /zk//vt/ns///

$ zk cat /zk/nyc/vt/ns/rlookup/0/master
{
  "entries": [
    {
      "uid": 200274,
      "host": "",
      "port": 0,
      "named_port_map": {
        "_mysql": 3306,
        "_vtocc": 8101,
        "_vts": 8102
      }
    }
  ]
}

The shard serving graph can be rebuilt using the 'vtctl RebuildShardGraph /' command. However, a rebuild is also triggered by any 'vtctl ChangeSlaveType' command, so running it by hand is usually not needed. For instance, when vt_bns_monitor takes servers in and out of serving state, it will rebuild the shard graph.

Note this will rebuild the serving graph for all cells, not just one cell.

Rebuilding a shard serving graph will:
- compute the data to write by looking at all the tablets from the replication graph
- write all the /zk//vt/ns/// nodes everywhere
- delete any pre-existing /zk//vt/ns/// that is not in use any more
- compute and write all the /zk//vt/ns// nodes everywhere
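
The first three steps can be sketched in memory, without any zookeeper calls. The Go code below groups tablets by per-type serving path and then works out which pre-existing paths are stale. The Tablet stand-in, the helper names, the uids, and the "replica" type are hypothetical. The path layout follows the example paths in this document (such as /zk/nyc/vt/ns/rlookup/0/master). The sketch does not filter out non-serving tablet types, which a real rebuild presumably would.

```go
package main

import (
	"fmt"
	"sort"
)

// Tablet is a simplified stand-in holding only what the sketch needs.
type Tablet struct {
	Cell     string
	Keyspace string
	Shard    string
	Type     string
	Uid      uint32
}

// buildServingNodes groups tablet uids by per-type serving path.
// Assumption: the path layout matches the example paths in this
// document, e.g. /zk/nyc/vt/ns/rlookup/0/master.
func buildServingNodes(tablets []Tablet) map[string][]uint32 {
	nodes := make(map[string][]uint32)
	for _, t := range tablets {
		p := fmt.Sprintf("/zk/%s/vt/ns/%s/%s/%s", t.Cell, t.Keyspace, t.Shard, t.Type)
		nodes[p] = append(nodes[p], t.Uid)
	}
	return nodes
}

// stalePaths returns, in sorted order, the existing per-type paths
// that are absent from the freshly computed nodes and can be deleted.
func stalePaths(existing []string, nodes map[string][]uint32) []string {
	var stale []string
	for _, p := range existing {
		if _, ok := nodes[p]; !ok {
			stale = append(stale, p)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Hypothetical tablets for shard user/10-20 in cell nyc.
	tablets := []Tablet{
		{Cell: "nyc", Keyspace: "user", Shard: "10-20", Type: "master", Uid: 1},
		{Cell: "nyc", Keyspace: "user", Shard: "10-20", Type: "replica", Uid: 2},
	}
	nodes := buildServingNodes(tablets)
	existing := []string{
		"/zk/nyc/vt/ns/user/10-20/master",
		"/zk/nyc/vt/ns/user/10-20/old_type",
	}
	fmt.Println(len(nodes), stalePaths(existing, nodes))
}
```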

The python client reads the per-type data nodes to find servers to talk to. When resolving user.10-20.master, it will try to read /zk/local/vt/ns/user/10-20/master.
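
The lookup can be sketched as two steps: map the name to a node path, then decode the node and pick a port. The actual client is written in python; the sketch below uses Go to match the struct definitions in this document. The SrvEntry and SrvAddrs type names, the servingPath and endpoints helpers, and the sample host are hypothetical. The JSON field names are taken from the per-type example node.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// SrvEntry and SrvAddrs are hypothetical names for the content of a
// per-type node; the JSON field names come from the example node.
type SrvEntry struct {
	Uid          uint32         `json:"uid"`
	Host         string         `json:"host"`
	Port         int            `json:"port"`
	NamedPortMap map[string]int `json:"named_port_map"`
}

type SrvAddrs struct {
	Entries []SrvEntry `json:"entries"`
}

// servingPath maps a name such as "user.10-20.master" to the node
// path in the local cell.
func servingPath(name string) (string, error) {
	parts := strings.Split(name, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected keyspace.shard.type, got %q", name)
	}
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("empty component in %q", name)
		}
	}
	return "/zk/local/vt/ns/" + strings.Join(parts, "/"), nil
}

// endpoints decodes a per-type node and returns host:port for the
// given named port, skipping entries that do not expose it.
func endpoints(data []byte, portName string) ([]string, error) {
	var addrs SrvAddrs
	if err := json.Unmarshal(data, &addrs); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range addrs.Entries {
		port, ok := e.NamedPortMap[portName]
		if !ok {
			continue
		}
		out = append(out, net.JoinHostPort(e.Host, strconv.Itoa(port)))
	}
	return out, nil
}

func main() {
	path, err := servingPath("user.10-20.master")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(path) // prints /zk/local/vt/ns/user/10-20/master

	// Sample node content; the host value is a made-up placeholder.
	node := []byte(`{"entries": [{"uid": 200274, "host": "db.example.com",
	  "port": 0, "named_port_map": {"_mysql": 3306, "_vtocc": 8101, "_vts": 8102}}]}`)
	addrs, err := endpoints(node, "_vtocc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addrs)
}
```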