← 3.1. GIVE-Toolbox	↑ Index	4.1. Bed Format →

MySQL commands for managing data in GIVE data source

To visualize new data with GIVE, you can add it to the MySQL-compatible data source.

NOTE: If you are not very familiar with SQL commands, it is recommended that you manage data used by GIVE using GIVE-Toolbox. Detailed manual of GIVE-Toolbox is shown in 3.1. GIVE-Toolbox usages for managing data in GIVE data source.

Creating a new reference genome for GIVE

Creating a new reference genome database

To visualize a new reference genome, GIVE only needs to know:

The names of the species for this reference (Latin and common names are recommended but any name should work);
Its chromosomal information, including names, sizes and the location of centromeres.

These are stored in two locations within the data source. First you need to create a database with your_reference_database:

CREATE DATABASE `<your_reference_database>`;

Also, make sure your GIVE Database User has access to this database (if you haven't granted privilege for *.*):

GRANT SELECT, CREATE TEMPORARY TABLES ON `<your_reference_database>`.* TO `<give_data_user>`@'%';

Creating a `cytoBandIdeo` table and populate it with data

Then you need to create a cytoBandIdeo table with chromosomal information in your reference database with the following columns:

Column name	Type	Description
`chrom`	`varchar`	Chromosome name
`chromStart`	`unsigned int`	The start coordinate of the band
`chromEnd`	`unsigned int`	The end coordinate of the band
`name`	`varchar`	Name of the band
`gieStain`	`varchar`	Giemsa Stain info, to identify bands, centromeres, etc.

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`cytoBandIdeo` (
  `chrom` varchar(255) NOT NULL,
  `chromStart` int(10) unsigned NOT NULL,
  `chromEnd` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `gieStain` varchar(255) NOT NULL,
  KEY `chrom` (`chrom`(23), `chromStart`)
) ENGINE=InnoDB;

The cytoBandIdeo table also needs to be populated by actual data. The following SQL command can be used if the file cytoBandIdeo is already on the server:

LOAD DATA LOCAL INFILE "<cytoBandIdeo_file_path>" INTO TABLE `<your_reference_database>`.`cytoBandIdeo`;

NOTE: The annotation files, including the cytoBandIdeo file for all references available on GIVE can be downloaded from UCSC or the following URL: https://demo.give.genemo.org/annotations/. Currently, hg19, hg38, mm9 and mm10 are available.

Adding an entry in `ref` table of `compbrowser`

After creating the table, you also need to add one entry in table ref of database compbrowser. Notice that the browserActive field needs to be set to 1 and in the settings field, the JSON string also has its browserActive attribute set as true. (You may want to try http://www.objgen.com/json to get a JSON string with ease.)

Please see

The SQL code is shown below:

INSERT INTO `compbrowser`.`ref` (
  `dbname`,
  `name`,
  `commonname`,
  `browserActive`,
  `settings`
) VALUES (
  '<your_reference_database>',
  '<species_name>',
  '<species_common_name>',
  1,
  '{
    "browserActive": true
  }'
);

Creating track groups

Tracks in GIVE belong to track groups for better management and these groups need their place in the database. A grp table is required in the reference database to manage track groups with the following columns:

Column name	Type	Description
`name`	`varchar`	Group name
`label`	`varchar`	Description of the group
`priority`	`float`	Order for this group in the browser, less is upper
`defaultIsClosed`	`tinyint`	Whether the group will be closed by default, reserved
`singleChoice`	`tinyint`	Whether the group will only allow one track to be active at any time

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`grp` (
  `name` char(150) NOT NULL,
  `label` char(255) NOT NULL DEFAULT '',
  `priority` float NOT NULL DEFAULT '0',
  `defaultIsClosed` tinyint(2) DEFAULT NULL,
  `singleChoice` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`name`)
) ENGINE=InnoDB;

Individual track groups can be created by adding entries in the grp table, using the following SQL command:

INSERT INTO `<your_reference_database>`.`grp` VALUES (
  '<group_name>',
  '<group_description>',
  <the_priority_of_the_group>,
  <default_is_closed_value>,
  <whether_the_group_only_allows_one_choice>
);

Creating the track definition table

Tracks themselves also need a place to store their annotation and data. This is achieved by creating a table named trackDb in the reference database with the following columns:

Column name	Type	Description
`tableName`	`varchar`	Name of the track table
`type`	`varchar`	Type of the track (Important)
`priority`	`float`	Order for the track (within group)
`url`	`longblob`	URL for the track, reserved
`html`	`longtext`	HTML description for the track, reserved
`grp`	`varchar`	Group of the track, should be the same as *<group_name>*
`settings`	`longtext`	Detailed track settings, JSON format. See Database table properties documentation for details.

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`trackDb` (
  `tableName` varchar(150) NOT NULL,
  `type` varchar(255) NOT NULL,
  `priority` float NOT NULL,
  `url` longblob,
  `html` longtext,
  `grp` varchar(150) NOT NULL,
  `settings` longtext NOT NULL,
  PRIMARY KEY (`tableName`),
  FOREIGN KEY `group_id` (`grp`) REFERENCES
    `<your_reference_database>`.`grp` (`name`)
    ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;

After this step, GIVE will be able to display the new reference genome and also data tracks within it.

Adding data

Adding tracks to GIVE database typically involves two steps:

Create a table corresponding to the data type and populate it;
Add an entry in the trackDb table with the metadata of the track.

Adding genomic span data (BED)

The table for BED tracks needs to contain the following columns:

Column name	Type	Description
`name`	`varchar`	Name of the span (gene, transcript, etc.)
`chrom`	`varchar`	Chromosome name
`strand`	`char`	Strand of the span
`txStart`	`int`	Start coordinate of the span
`txEnd`	`int`	End coordinate of the span
`cdsStart`	`int`	Start coordinate of the 'thick' portion of the span
`cdsEnd`	`int`	End coordinate of the 'thick' portion of the span
`exonCount`	`int`	Number of exons in the span
`exonStarts`	`longblob`	The start coordinates of all exons, separated by comma
`exonEnds`	`longblob`	The end coordinates of all exons, separated by comma
`proteinID`	`varchar`	The ID of corresponding protein, only applies to gene tracks
`alignID`	`varchar`	The ID of alignment segments, only applies to gene tracks

Note: This is different than BED12 format: 1) field order is slightly different; 2) the 9th and 10th columns represent the start and end coordinate of all the exons, instead of the start within the gene and length of the exon in BED12.

The SQL command to create such a table is shown below:

CREATE TABLE `<your_reference_database>`.`<track_table_name>` (
  `name` varchar(255) NOT NULL DEFAULT '',
  `chrom` varchar(255) NOT NULL DEFAULT '',
  `strand` char(2) NOT NULL DEFAULT '',
  `txStart` int(10) unsigned NOT NULL DEFAULT '0',
  `txEnd` int(10) unsigned NOT NULL DEFAULT '0',
  `cdsStart` int(10) unsigned NOT NULL DEFAULT '0',
  `cdsEnd` int(10) unsigned NOT NULL DEFAULT '0',
  `exonCount` int(10) unsigned NOT NULL DEFAULT '0',
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `proteinID` varchar(40) NOT NULL DEFAULT '',
  `alignID` varchar(255) NOT NULL DEFAULT '',
  KEY `name` (`name`),
  KEY `chrom` (`chrom`(16),`txStart`),
  KEY `chrom_2` (`chrom`(16),`txEnd`),
  KEY `protein` (`proteinID`(16)),
  KEY `align` (`alignID`)
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

LOAD DATA LOCAL INFILE "<bed_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'bed',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"bed",
    "visibility":"pack",
    "adaptive":true
  }'
);

Adding linear tracks (bigWig)

The table for bigWig tracks only needs to contain the following column:

Column name	Type	Description
`fileName`	`varchar`	The path or URL of the bigWig file

The SQL command to create such a table is shown below:

CREATE TABLE `<your_reference_database>`.`<track_table_name>` (
  `fileName` varchar(255) NOT NULL
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

INSERT INTO `<your_reference_database>`.`<track_table_name>` VALUES (
  '<bigWig_file_path>'
);

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'bigWig',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"bigWig",
    "visibility":"full",
    "autoScale":false,
  }'
);

Adding interaction tracks

Adding interaction tracks (in interaction format) is similar to adding BED or GenePred tracks. The table for interaction tracks needs to contain the following columns:

Column name	Type	Description
`ID`	`int`	ID of the interaction segment (can be generated by auto-increment)
`chrom`	`varchar`	Chromosome name
`Start`	`int`	Start coordinate of the span
`End`	`int`	End coordinate of the span
`linkID`	`int`	ID of the link (segments with the same linkID are linked together)
`value`	`float`	The value of the link
`dirFlag`	`tinyint`	The direction of the link (the link should go from the segment with `dirFlag` = 0 to the one with `dirFlag` = 1) if the link has direction, '-1' if the link does not have direction

The SQL command to create such a table is shown below:

CREATE TABLE ``.`<track_table_name>` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `chrom` varchar(255) NOT NULL DEFAULT '',
  `start` int(10) unsigned NOT NULL DEFAULT '0',
  `end` int(10) unsigned NOT NULL DEFAULT '0',
  `linkID` int(10) unsigned NOT NULL DEFAULT '0',
  `value` float NOT NULL DEFAULT '0',
  `dirFlag` tinyint(4) NOT NULL DEFAULT '-1',
  PRIMARY KEY (`ID`),
  KEY `chrom` (`chrom`(16),`start`),
  KEY `chrom_2` (`chrom`(16),`end`),
  KEY `linkID` (`linkID`)
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

LOAD DATA LOCAL INFILE "<interaction_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'interaction',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"interation",
    "visibility":"full",
    "thresholdPercentile": [
      <percentile_values_for_interaction_thresholds>
    ],
  }'
);

Property references

For both references and tracks, the settings column in the corresponding tables (ref table in compbrowser database for a reference, and trackDb table in your reference database for a track.) includes some properties that are available. This section lists some of the useful properties.

Reference table properties documentation

Here are a brief description of the properties you may use for a reference in the settings column of ref in the compbrowser database:

browserActive

Type: Boolean
Default: false
Whether this reference will be enabled for the GIVE browser. If you would like to have this reference available, you will need to set this value to true.

geneCoorTable

Type: String
Default: ''
The database table for the gene name search function of <gene-coor-input> element. This table should have the columns shown below.

Column name Type Description

<gene-symbol-column> VARCHAR Gene names/symbols (this can be a column of the table itself, or a column in the linked table(s)). This value can be specified in geneSymbolColumn property, see below.

chrom VARCHAR The name of the chromosome the gene is on

chromStart INT Chromosomal start coordinates

chromEnd INT Chromosomal end coordinates

geneSymbolColumn

Type: String
Default: 'name'
The column for gene symbols in geneCoorTable.

geneDescTable

Type: String
Default: ''
When implementing gene name search function of <gene-coor-input>, this table can be used to provide the gene description in the search results. This table should have the columns shown below.

Column name type Description

<desc-symbol-column> VARCHAR Gene symbols (corresponding to <gene-symbol-column> in <gene-coordinate-table>). This value can be specified in descSymbolColumn property, see below.

description VARCHAR Gene description

descSymbolColumn

Type: String
Default: 'Symbol'
The column for gene symbols in geneDescTable.

aliasTable

Type: String
Default: ''
When implementing gene name search function of <gene-coor-input>, this table can be used to link all the different aliases to their official symbol. (For example, OCT3/4 to POU5F1.) This table should have the columns shown below.

Column name type Description

<alias-symbol-column> VARCHAR gene symbols (corresponding to <gene-symbol-column> in <gene-coordinate-table>). This value can be specified in aliasSymbolColumn property, see below.

alias VARCHAR Gene aliases, each alias shall occupy one row

isSymbol BIT Indicating whether this alias is the same as <gene-symbol-column>, 1 if the alias is the same, 0 if not

aliasSymbolColumn

Type: String
Default: 'Symbol'
The column for gene symbols in aliasTable.

Database table properties documentation

Here are a brief description of the properties you may use in the settings column of trackDb in your reference database:

shortLabel

Type: String
Default: ''
The short label shown by the track.

longLabel

Type: String
Default: ''
The long label providing more information. Reserved.

visibility

Type: enum ('full', 'pack', 'collapsed', 'dense', and 'hide')
Default: 'hide'
The visibility of the track, values on the left are more visible/prominent.

adaptive

Type: Boolean
Default: false
Whether the visibility will adapt to the height of the ending result. If the height exceeds certain threshold, the visibility will be degraded by one.

dataType, cellType, trackFeature, labName,
groupDataType, groupFeature, groupSampleType

Type: String
Default: null
These are metadata of the tracks. Reserved for future features.

thresholdPercentile

Type: Array<Number>
Default: null
A series of numbers defining the percentile of the data points in the track. This is used to color interaction graphs by their corresponding signal level. May be applied to other types of tracks in a future release.


← 3.1. GIVE-Toolbox	↑ Index	4.1. Bed Format →

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3.2-dataSource.md

3.2-dataSource.md

MySQL commands for managing data in GIVE data source

Table of Contents

Creating a new reference genome for GIVE

Creating a new reference genome database

Creating a `cytoBandIdeo` table and populate it with data

Adding an entry in `ref` table of `compbrowser`

Creating track groups

Creating the track definition table

Adding data

Adding genomic span data (BED)

Adding linear tracks (bigWig)

Adding interaction tracks

Property references

Reference table properties documentation

Database table properties documentation

Column name	Type	Description
`<gene-symbol-column>`	`VARCHAR`	Gene names/symbols (this can be a column of the table itself, or a column in the linked table(s)). This value can be specified in `geneSymbolColumn` property, see below.
`chrom`	`VARCHAR`	The name of the chromosome the gene is on
`chromStart`	`INT`	Chromosomal start coordinates
`chromEnd`	`INT`	Chromosomal end coordinates

Column name	type	Description
`<desc-symbol-column>`	`VARCHAR`	Gene symbols (corresponding to `<gene-symbol-column>` in `<gene-coordinate-table>`). This value can be specified in `descSymbolColumn` property, see below.
`description`	`VARCHAR`	Gene description

Column name	type	Description
`<alias-symbol-column>`	`VARCHAR`	gene symbols (corresponding to `<gene-symbol-column>` in `<gene-coordinate-table>`). This value can be specified in `aliasSymbolColumn` property, see below.
`alias`	`VARCHAR`	Gene aliases, each alias shall occupy one row
`isSymbol`	`BIT`	Indicating whether this alias is the same as `<gene-symbol-column>`, 1 if the alias is the same, 0 if not

Files

3.2-dataSource.md

Latest commit

History

3.2-dataSource.md

File metadata and controls

MySQL commands for managing data in GIVE data source

Table of Contents

Creating a new reference genome for GIVE

Creating a new reference genome database

Creating a cytoBandIdeo table and populate it with data

Adding an entry in ref table of compbrowser

Creating track groups

Creating the track definition table

Adding data

Adding genomic span data (BED)

Adding linear tracks (bigWig)

Adding interaction tracks

Property references

Reference table properties documentation

Database table properties documentation

Creating a `cytoBandIdeo` table and populate it with data

Adding an entry in `ref` table of `compbrowser`