Skip to content

Latest commit

 

History

History
480 lines (396 loc) · 19.5 KB

File metadata and controls

480 lines (396 loc) · 19.5 KB
← 3.1. GIVE-Toolbox ↑ Index 4.1. Bed Format →

MySQL commands for managing data in GIVE data source

To visualize new data with GIVE, you can add it to the MySQL-compatible data source.


NOTE: If you are not very familiar with SQL commands, it is recommended that you manage data used by GIVE using GIVE-Toolbox. Detailed manual of GIVE-Toolbox is shown in 3.1. GIVE-Toolbox usages for managing data in GIVE data source.


Table of Contents

Creating a new reference genome for GIVE

Creating a new reference genome database

To visualize a new reference genome, GIVE only needs to know:

  1. The names of the species for this reference (Latin and common names are recommended but any name should work);
  2. Its chromosomal information, including names, sizes and the location of centromeres.

These are stored in two locations within the data source. First you need to create a database with your_reference_database:

CREATE DATABASE `<your_reference_database>`;

Also, make sure your GIVE Database User has access to this database (if you haven't granted privilege for *.*):

GRANT SELECT, CREATE TEMPORARY TABLES ON `<your_reference_database>`.* TO `<give_data_user>`@'%';

Creating a cytoBandIdeo table and populate it with data

Then you need to create a cytoBandIdeo table with chromosomal information in your reference database with the following columns:

Column name Type Description
chrom varchar Chromosome name
chromStart unsigned int The start coordinate of the band
chromEnd unsigned int The end coordinate of the band
name varchar Name of the band
gieStain varchar Giemsa Stain info, to identify bands, centromeres, etc.

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`cytoBandIdeo` (
  `chrom` varchar(255) NOT NULL,
  `chromStart` int(10) unsigned NOT NULL,
  `chromEnd` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `gieStain` varchar(255) NOT NULL,
  KEY `chrom` (`chrom`(23), `chromStart`)
) ENGINE=InnoDB;

The cytoBandIdeo table also needs to be populated by actual data. The following SQL command can be used if the file cytoBandIdeo is already on the server:

LOAD DATA LOCAL INFILE "<cytoBandIdeo_file_path>" INTO TABLE `<your_reference_database>`.`cytoBandIdeo`;

NOTE: The annotation files, including the cytoBandIdeo file for all references available on GIVE can be downloaded from UCSC or the following URL: https://demo.give.genemo.org/annotations/. Currently, hg19, hg38, mm9 and mm10 are available.


Adding an entry in ref table of compbrowser

After creating the table, you also need to add one entry in table ref of database compbrowser. Notice that the browserActive field needs to be set to 1 and in the settings field, the JSON string also has its browserActive attribute set as true. (You may want to try http://www.objgen.com/json to get a JSON string with ease.)

Please see

The SQL code is shown below:

INSERT INTO `compbrowser`.`ref` (
  `dbname`,
  `name`,
  `commonname`,
  `browserActive`,
  `settings`
) VALUES (
  '<your_reference_database>',
  '<species_name>',
  '<species_common_name>',
  1,
  '{
    "browserActive": true
  }'
);

Creating track groups

Tracks in GIVE belong to track groups for better management and these groups need their place in the database. A grp table is required in the reference database to manage track groups with the following columns:

Column name Type Description
name varchar Group name
label varchar Description of the group
priority float Order for this group in the browser, less is upper
defaultIsClosed tinyint Whether the group will be closed by default, reserved
singleChoice tinyint Whether the group will only allow one track to be active at any time

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`grp` (
  `name` char(150) NOT NULL,
  `label` char(255) NOT NULL DEFAULT '',
  `priority` float NOT NULL DEFAULT '0',
  `defaultIsClosed` tinyint(2) DEFAULT NULL,
  `singleChoice` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`name`)
) ENGINE=InnoDB;

Individual track groups can be created by adding entries in the grp table, using the following SQL command:

INSERT INTO `<your_reference_database>`.`grp` VALUES (
  '<group_name>',
  '<group_description>',
  <the_priority_of_the_group>,
  <default_is_closed_value>,
  <whether_the_group_only_allows_one_choice>
);

Creating the track definition table

Tracks themselves also need a place to store their annotation and data. This is achieved by creating a table named trackDb in the reference database with the following columns:

Column name Type Description
tableName varchar Name of the track table
type varchar Type of the track (Important)
priority float Order for the track (within group)
url longblob URL for the track, reserved
html longtext HTML description for the track, reserved
grp varchar Group of the track, should be the same as <group_name>
settings longtext Detailed track settings, JSON format. See Database table properties documentation for details.

The SQL code to create this table is shown below:

CREATE TABLE `<your_reference_database>`.`trackDb` (
  `tableName` varchar(150) NOT NULL,
  `type` varchar(255) NOT NULL,
  `priority` float NOT NULL,
  `url` longblob,
  `html` longtext,
  `grp` varchar(150) NOT NULL,
  `settings` longtext NOT NULL,
  PRIMARY KEY (`tableName`),
  FOREIGN KEY `group_id` (`grp`) REFERENCES
    `<your_reference_database>`.`grp` (`name`)
    ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;

After this step, GIVE will be able to display the new reference genome and also data tracks within it.

Adding data

Adding tracks to GIVE database typically involves two steps:

  1. Create a table corresponding to the data type and populate it;
  2. Add an entry in the trackDb table with the metadata of the track.

Adding genomic span data (BED)

The table for BED tracks needs to contain the following columns:

Column name Type Description
name varchar Name of the span (gene, transcript, etc.)
chrom varchar Chromosome name
strand char Strand of the span
txStart int Start coordinate of the span
txEnd int End coordinate of the span
cdsStart int Start coordinate of the 'thick' portion of the span
cdsEnd int End coordinate of the 'thick' portion of the span
exonCount int Number of exons in the span
exonStarts longblob The start coordinates of all exons, separated by comma
exonEnds longblob The end coordinates of all exons, separated by comma
proteinID varchar The ID of corresponding protein, only applies to gene tracks
alignID varchar The ID of alignment segments, only applies to gene tracks

Note: This is different than BED12 format: 1) field order is slightly different; 2) the 9th and 10th columns represent the start and end coordinate of all the exons, instead of the start within the gene and length of the exon in BED12.


The SQL command to create such a table is shown below:

CREATE TABLE `<your_reference_database>`.`<track_table_name>` (
  `name` varchar(255) NOT NULL DEFAULT '',
  `chrom` varchar(255) NOT NULL DEFAULT '',
  `strand` char(2) NOT NULL DEFAULT '',
  `txStart` int(10) unsigned NOT NULL DEFAULT '0',
  `txEnd` int(10) unsigned NOT NULL DEFAULT '0',
  `cdsStart` int(10) unsigned NOT NULL DEFAULT '0',
  `cdsEnd` int(10) unsigned NOT NULL DEFAULT '0',
  `exonCount` int(10) unsigned NOT NULL DEFAULT '0',
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `proteinID` varchar(40) NOT NULL DEFAULT '',
  `alignID` varchar(255) NOT NULL DEFAULT '',
  KEY `name` (`name`),
  KEY `chrom` (`chrom`(16),`txStart`),
  KEY `chrom_2` (`chrom`(16),`txEnd`),
  KEY `protein` (`proteinID`(16)),
  KEY `align` (`alignID`)
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

LOAD DATA LOCAL INFILE "<bed_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'bed',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"bed",
    "visibility":"pack",
    "adaptive":true
  }'
);

Adding linear tracks (bigWig)

The table for bigWig tracks only needs to contain the following column:

Column name Type Description
fileName varchar The path or URL of the bigWig file

The SQL command to create such a table is shown below:

CREATE TABLE `<your_reference_database>`.`<track_table_name>` (
  `fileName` varchar(255) NOT NULL
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

INSERT INTO `<your_reference_database>`.`<track_table_name>` VALUES (
  '<bigWig_file_path>'
);

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'bigWig',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"bigWig",
    "visibility":"full",
    "autoScale":false,
  }'
);

Adding interaction tracks

Adding interaction tracks (in interaction format) is similar to adding BED or GenePred tracks. The table for interaction tracks needs to contain the following columns:

Column name Type Description
ID int ID of the interaction segment (can be generated by auto-increment)
chrom varchar Chromosome name
Start int Start coordinate of the span
End int End coordinate of the span
linkID int ID of the link (segments with the same linkID are linked together)
value float The value of the link
dirFlag tinyint The direction of the link (the link should go from the segment with dirFlag = 0 to the one with dirFlag = 1) if the link has direction, '-1' if the link does not have direction

The SQL command to create such a table is shown below:

CREATE TABLE ``.`<track_table_name>` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `chrom` varchar(255) NOT NULL DEFAULT '',
  `start` int(10) unsigned NOT NULL DEFAULT '0',
  `end` int(10) unsigned NOT NULL DEFAULT '0',
  `linkID` int(10) unsigned NOT NULL DEFAULT '0',
  `value` float NOT NULL DEFAULT '0',
  `dirFlag` tinyint(4) NOT NULL DEFAULT '-1',
  PRIMARY KEY (`ID`),
  KEY `chrom` (`chrom`(16),`start`),
  KEY `chrom_2` (`chrom`(16),`end`),
  KEY `linkID` (`linkID`)
) ENGINE=InnoDB;

After the table is created, you can populate it with the actual data:

LOAD DATA LOCAL INFILE "<interaction_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;

The entry in the trackDb table can be added via the following SQL command:

INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
  '<track_table_name>',
  'interaction',
  1,
  NULL,
  NULL,
  'genes',                      -- Group name, should be the same as grp.name
  '{
    "group":"<group_name>",
    "longLabel":"<long_label>",
    "priority":1,
    "shortLabel":"<short_label>",
    "track":"<track_table_name>",
    "type":"interation",
    "visibility":"full",
    "thresholdPercentile": [
      <percentile_values_for_interaction_thresholds>
    ],
  }'
);

Property references

For both references and tracks, the settings column in the corresponding tables (ref table in compbrowser database for a reference, and trackDb table in your reference database for a track.) includes some properties that are available. This section lists some of the useful properties.

Reference table properties documentation

Here are a brief description of the properties you may use for a reference in the settings column of ref in the compbrowser database:

browserActive

Type: Boolean
Default: false
Whether this reference will be enabled for the GIVE browser. If you would like to have this reference available, you will need to set this value to true.

geneCoorTable

Type: String
Default: ''
The database table for the gene name search function of <gene-coor-input> element. This table should have the columns shown below.

Column name Type Description
<gene-symbol-column> VARCHAR Gene names/symbols (this can be a column of the table itself, or a column in the linked table(s)). This value can be specified in geneSymbolColumn property, see below.
chrom VARCHAR The name of the chromosome the gene is on
chromStart INT Chromosomal start coordinates
chromEnd INT Chromosomal end coordinates

geneSymbolColumn

Type: String
Default: 'name'
The column for gene symbols in geneCoorTable.

geneDescTable

Type: String
Default: ''
When implementing gene name search function of <gene-coor-input>, this table can be used to provide the gene description in the search results. This table should have the columns shown below.

Column name type Description
<desc-symbol-column> VARCHAR Gene symbols (corresponding to <gene-symbol-column> in <gene-coordinate-table>). This value can be specified in descSymbolColumn property, see below.
description VARCHAR Gene description

descSymbolColumn

Type: String
Default: 'Symbol'
The column for gene symbols in geneDescTable.

aliasTable

Type: String
Default: ''
When implementing gene name search function of <gene-coor-input>, this table can be used to link all the different aliases to their official symbol. (For example, OCT3/4 to POU5F1.) This table should have the columns shown below.

Column name type Description
<alias-symbol-column> VARCHAR gene symbols (corresponding to <gene-symbol-column> in <gene-coordinate-table>). This value can be specified in aliasSymbolColumn property, see below.
alias VARCHAR Gene aliases, each alias shall occupy one row
isSymbol BIT Indicating whether this alias is the same as <gene-symbol-column>, 1 if the alias is the same, 0 if not

aliasSymbolColumn

Type: String
Default: 'Symbol'
The column for gene symbols in aliasTable.

Database table properties documentation

Here are a brief description of the properties you may use in the settings column of trackDb in your reference database:

shortLabel

Type: String
Default: ''
The short label shown by the track.

longLabel

Type: String
Default: ''
The long label providing more information. Reserved.

visibility

Type: enum ('full', 'pack', 'collapsed', 'dense', and 'hide')
Default: 'hide'
The visibility of the track, values on the left are more visible/prominent.

adaptive

Type: Boolean
Default: false
Whether the visibility will adapt to the height of the ending result. If the height exceeds certain threshold, the visibility will be degraded by one.

dataType, cellType, trackFeature, labName,
groupDataType, groupFeature, groupSampleType

Type: String
Default: null
These are metadata of the tracks. Reserved for future features.

thresholdPercentile

Type: Array<Number>
Default: null
A series of numbers defining the percentile of the data points in the track. This is used to color interaction graphs by their corresponding signal level. May be applied to other types of tracks in a future release.

← 3.1. GIVE-Toolbox ↑ Index 4.1. Bed Format →