← 3.1. GIVE-Toolbox | ↑ Index | 4.1. Bed Format → |
To visualize new data with GIVE, you can add it to the MySQL-compatible data source.
NOTE: If you are not very familiar with SQL commands, it is recommended that you manage data used by GIVE using GIVE-Toolbox. Detailed manual of GIVE-Toolbox is shown in 3.1. GIVE-Toolbox usages for managing data in GIVE data source.
To visualize a new reference genome, GIVE only needs to know:
- The names of the species for this reference (Latin and common names are recommended but any name should work);
- Its chromosomal information, including names, sizes and the location of centromeres.
These are stored in two locations within the data source. First you need to create a database with your_reference_database
:
CREATE DATABASE `<your_reference_database>`;
Also, make sure your GIVE Database User has access to this database (if you haven't granted privilege for *.*
):
GRANT SELECT, CREATE TEMPORARY TABLES ON `<your_reference_database>`.* TO `<give_data_user>`@'%';
Then you need to create a cytoBandIdeo
table with chromosomal information in your reference database with the following columns:
Column name | Type | Description |
---|---|---|
chrom |
varchar |
Chromosome name |
chromStart |
unsigned int |
The start coordinate of the band |
chromEnd |
unsigned int |
The end coordinate of the band |
name |
varchar |
Name of the band |
gieStain |
varchar |
Giemsa Stain info, to identify bands, centromeres, etc. |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`cytoBandIdeo` ( `chrom` varchar(255) NOT NULL, `chromStart` int(10) unsigned NOT NULL, `chromEnd` int(10) unsigned NOT NULL, `name` varchar(255) NOT NULL, `gieStain` varchar(255) NOT NULL, KEY `chrom` (`chrom`(23), `chromStart`) ) ENGINE=InnoDB;
The cytoBandIdeo
table also needs to be populated by actual data. The following SQL command can be used if the file cytoBandIdeo is already on the server:
LOAD DATA LOCAL INFILE "<cytoBandIdeo_file_path>" INTO TABLE `<your_reference_database>`.`cytoBandIdeo`;
NOTE: The annotation files, including the cytoBandIdeo
file for all references available on GIVE can be downloaded from UCSC or the following URL: https://demo.give.genemo.org/annotations/. Currently, hg19
, hg38
, mm9
and mm10
are available.
After creating the table, you also need to add one entry in table ref
of database compbrowser
. Notice that the browserActive
field needs to be set to 1
and in the settings
field, the JSON string also has its browserActive
attribute set as true
. (You may want to try http://www.objgen.com/json to get a JSON string with ease.)
Please see
The SQL code is shown below:
INSERT INTO `compbrowser`.`ref` ( `dbname`, `name`, `commonname`, `browserActive`, `settings` ) VALUES ( '<your_reference_database>', '<species_name>', '<species_common_name>', 1, '{ "browserActive": true }' );
Tracks in GIVE belong to track groups for better management and these groups need their place in the database. A grp
table is required in the reference database to manage track groups with the following columns:
Column name | Type | Description |
---|---|---|
name |
varchar |
Group name |
label |
varchar |
Description of the group |
priority |
float |
Order for this group in the browser, less is upper |
defaultIsClosed |
tinyint |
Whether the group will be closed by default, reserved |
singleChoice |
tinyint |
Whether the group will only allow one track to be active at any time |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`grp` ( `name` char(150) NOT NULL, `label` char(255) NOT NULL DEFAULT '', `priority` float NOT NULL DEFAULT '0', `defaultIsClosed` tinyint(2) DEFAULT NULL, `singleChoice` tinyint(1) NOT NULL DEFAULT '0', PRIMARY KEY (`name`) ) ENGINE=InnoDB;
Individual track groups can be created by adding entries in the grp
table, using the following SQL command:
INSERT INTO `<your_reference_database>`.`grp` VALUES ( '<group_name>', '<group_description>', <the_priority_of_the_group>, <default_is_closed_value>, <whether_the_group_only_allows_one_choice> );
Tracks themselves also need a place to store their annotation and data. This is achieved by creating a table named trackDb
in the reference database with the following columns:
Column name | Type | Description |
---|---|---|
tableName |
varchar |
Name of the track table |
type |
varchar |
Type of the track (Important) |
priority |
float |
Order for the track (within group) |
url |
longblob |
URL for the track, reserved |
html |
longtext |
HTML description for the track, reserved |
grp |
varchar |
Group of the track, should be the same as <group_name> |
settings |
longtext |
Detailed track settings, JSON format. See Database table properties documentation for details. |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`trackDb` ( `tableName` varchar(150) NOT NULL, `type` varchar(255) NOT NULL, `priority` float NOT NULL, `url` longblob, `html` longtext, `grp` varchar(150) NOT NULL, `settings` longtext NOT NULL, PRIMARY KEY (`tableName`), FOREIGN KEY `group_id` (`grp`) REFERENCES `<your_reference_database>`.`grp` (`name`) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=InnoDB;
After this step, GIVE will be able to display the new reference genome and also data tracks within it.
Adding tracks to GIVE database typically involves two steps:
- Create a table corresponding to the data type and populate it;
- Add an entry in the
trackDb
table with the metadata of the track.
The table for BED tracks needs to contain the following columns:
Column name | Type | Description |
---|---|---|
name |
varchar |
Name of the span (gene, transcript, etc.) |
chrom |
varchar |
Chromosome name |
strand |
char |
Strand of the span |
txStart |
int |
Start coordinate of the span |
txEnd |
int |
End coordinate of the span |
cdsStart |
int |
Start coordinate of the 'thick' portion of the span |
cdsEnd |
int |
End coordinate of the 'thick' portion of the span |
exonCount |
int |
Number of exons in the span |
exonStarts |
longblob |
The start coordinates of all exons, separated by comma |
exonEnds |
longblob |
The end coordinates of all exons, separated by comma |
proteinID |
varchar |
The ID of corresponding protein, only applies to gene tracks |
alignID |
varchar |
The ID of alignment segments, only applies to gene tracks |
Note: This is different than BED12 format: 1) field order is slightly different; 2) the 9th and 10th columns represent the start and end coordinate of all the exons, instead of the start within the gene and length of the exon in BED12.
The SQL command to create such a table is shown below:
CREATE TABLE `<your_reference_database>`.`<track_table_name>` ( `name` varchar(255) NOT NULL DEFAULT '', `chrom` varchar(255) NOT NULL DEFAULT '', `strand` char(2) NOT NULL DEFAULT '', `txStart` int(10) unsigned NOT NULL DEFAULT '0', `txEnd` int(10) unsigned NOT NULL DEFAULT '0', `cdsStart` int(10) unsigned NOT NULL DEFAULT '0', `cdsEnd` int(10) unsigned NOT NULL DEFAULT '0', `exonCount` int(10) unsigned NOT NULL DEFAULT '0', `exonStarts` longblob NOT NULL, `exonEnds` longblob NOT NULL, `proteinID` varchar(40) NOT NULL DEFAULT '', `alignID` varchar(255) NOT NULL DEFAULT '', KEY `name` (`name`), KEY `chrom` (`chrom`(16),`txStart`), KEY `chrom_2` (`chrom`(16),`txEnd`), KEY `protein` (`proteinID`(16)), KEY `align` (`alignID`) ) ENGINE=InnoDB;
After the table is created, you can populate it with the actual data:
LOAD DATA LOCAL INFILE "<bed_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;
The entry in the trackDb
table can be added via the following SQL command:
INSERT INTO `<your_reference_database>`.`trackDb` VALUES ( '<track_table_name>', 'bed', 1, NULL, NULL, 'genes', -- Group name, should be the same as grp.name '{ "group":"<group_name>", "longLabel":"<long_label>", "priority":1, "shortLabel":"<short_label>", "track":"<track_table_name>", "type":"bed", "visibility":"pack", "adaptive":true }' );
The table for bigWig tracks only needs to contain the following column:
Column name | Type | Description |
---|---|---|
fileName |
varchar |
The path or URL of the bigWig file |
The SQL command to create such a table is shown below:
CREATE TABLE `<your_reference_database>`.`<track_table_name>` ( `fileName` varchar(255) NOT NULL ) ENGINE=InnoDB;
After the table is created, you can populate it with the actual data:
INSERT INTO `<your_reference_database>`.`<track_table_name>` VALUES ( '<bigWig_file_path>' );
The entry in the trackDb
table can be added via the following SQL command:
INSERT INTO `<your_reference_database>`.`trackDb` VALUES ( '<track_table_name>', 'bigWig', 1, NULL, NULL, 'genes', -- Group name, should be the same as grp.name '{ "group":"<group_name>", "longLabel":"<long_label>", "priority":1, "shortLabel":"<short_label>", "track":"<track_table_name>", "type":"bigWig", "visibility":"full", "autoScale":false, }' );
Adding interaction tracks (in interaction
format) is similar to adding BED
or GenePred
tracks. The table for interaction tracks needs to contain the following columns:
Column name | Type | Description |
---|---|---|
ID |
int |
ID of the interaction segment (can be generated by auto-increment) |
chrom |
varchar |
Chromosome name |
Start |
int |
Start coordinate of the span |
End |
int |
End coordinate of the span |
linkID |
int |
ID of the link (segments with the same linkID are linked together) |
value |
float |
The value of the link |
dirFlag |
tinyint |
The direction of the link (the link should go from the segment with dirFlag = 0 to the one with dirFlag = 1) if the link has direction, '-1' if the link does not have direction |
The SQL command to create such a table is shown below:
CREATE TABLE ``.`<track_table_name>` ( `ID` int(10) unsigned NOT NULL AUTO_INCREMENT, `chrom` varchar(255) NOT NULL DEFAULT '', `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', `linkID` int(10) unsigned NOT NULL DEFAULT '0', `value` float NOT NULL DEFAULT '0', `dirFlag` tinyint(4) NOT NULL DEFAULT '-1', PRIMARY KEY (`ID`), KEY `chrom` (`chrom`(16),`start`), KEY `chrom_2` (`chrom`(16),`end`), KEY `linkID` (`linkID`) ) ENGINE=InnoDB;
After the table is created, you can populate it with the actual data:
LOAD DATA LOCAL INFILE "<interaction_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;
The entry in the trackDb
table can be added via the following SQL command:
INSERT INTO `<your_reference_database>`.`trackDb` VALUES ( '<track_table_name>', 'interaction', 1, NULL, NULL, 'genes', -- Group name, should be the same as grp.name '{ "group":"<group_name>", "longLabel":"<long_label>", "priority":1, "shortLabel":"<short_label>", "track":"<track_table_name>", "type":"interation", "visibility":"full", "thresholdPercentile": [ <percentile_values_for_interaction_thresholds> ], }' );
For both references and tracks, the settings
column in the corresponding
tables (ref
table in compbrowser
database for a reference, and trackDb
table in your reference database for a track.) includes some properties that
are available. This section lists some of the useful properties.
Here are a brief description of the properties you may use for a reference in
the settings
column of ref
in the compbrowser
database:
browserActive
Type:
Boolean
Default:false
Whether this reference will be enabled for the GIVE browser. If you would like to have this reference available, you will need to set this value totrue
.
geneCoorTable
Type:
String
Default:''
The database table for the gene name search function of<gene-coor-input>
element. This table should have the columns shown below.
Column name Type Description <gene-symbol-column>
VARCHAR
Gene names/symbols (this can be a column of the table itself, or a column in the linked table(s)). This value can be specified in geneSymbolColumn
property, see below.chrom
VARCHAR
The name of the chromosome the gene is on chromStart
INT
Chromosomal start coordinates chromEnd
INT
Chromosomal end coordinates
geneSymbolColumn
Type:
String
Default:'name'
The column for gene symbols ingeneCoorTable
.
geneDescTable
Type:
String
Default:''
When implementing gene name search function of<gene-coor-input>
, this table can be used to provide the gene description in the search results. This table should have the columns shown below.
Column name type Description <desc-symbol-column>
VARCHAR
Gene symbols (corresponding to <gene-symbol-column>
in<gene-coordinate-table>
). This value can be specified indescSymbolColumn
property, see below.description
VARCHAR
Gene description
descSymbolColumn
Type:
String
Default:'Symbol'
The column for gene symbols ingeneDescTable
.
aliasTable
Type:
String
Default:''
When implementing gene name search function of<gene-coor-input>
, this table can be used to link all the different aliases to their official symbol. (For example,OCT3/4
toPOU5F1
.) This table should have the columns shown below.
Column name type Description <alias-symbol-column>
VARCHAR
gene symbols (corresponding to <gene-symbol-column>
in<gene-coordinate-table>
). This value can be specified inaliasSymbolColumn
property, see below.alias
VARCHAR
Gene aliases, each alias shall occupy one row isSymbol
BIT
Indicating whether this alias is the same as <gene-symbol-column>
, 1 if the alias is the same, 0 if not
aliasSymbolColumn
Type:
String
Default:'Symbol'
The column for gene symbols inaliasTable
.
Here are a brief description of the properties you may use in the settings
column of trackDb
in your reference database:
shortLabel
Type:
String
Default:''
The short label shown by the track.
longLabel
Type:
String
Default:''
The long label providing more information. Reserved.
visibility
Type:
enum
('full'
,'pack'
,'collapsed'
,'dense'
, and'hide'
)
Default:'hide'
The visibility of the track, values on the left are more visible/prominent.
adaptive
Type:
Boolean
Default:false
Whether the visibility will adapt to the height of the ending result. If the height exceeds certain threshold, the visibility will be degraded by one.
dataType, cellType, trackFeature, labName,
groupDataType, groupFeature, groupSampleType
Type:
String
Default:null
These are metadata of the tracks. Reserved for future features.
thresholdPercentile
Type:
Array<Number>
Default:null
A series of numbers defining the percentile of the data points in the track. This is used to color interaction graphs by their corresponding signal level. May be applied to other types of tracks in a future release.
← 3.1. GIVE-Toolbox | ↑ Index | 4.1. Bed Format → |