- Introduction
- Walkthrough Example
- Step 1: Deployment of GIVE
- Step 2: Initialization and Create Reference Genome
- Step 3: Create Track Groups
- Step 4: Create Gene Annotation Track
- Step 5: Create Data Track from bed File
- Step 6: Create Data Track from bigWig File
- Step 7: Create Data Track from interaction File
- List and Remove Data Tracks
- Using The Customized GIVE Genome Browser
GIVE-Toolbox is a set of shell(bash) scripts that relieves users from writing MySQL commands. GIVE uses MySQL in the back end for data management. Users who do not want to directly work with MySQL can choose to use GIVE-Toolbox. GIVE-Toolbox offers all the equivalent functions and capabilities as directly writing MySQL commands. GIVE-Toolbox simplifies the operations for configuring and initializing data sources, as well as building and managing data tracks. In the following walk-through example, we will build a customized genome browser in 7 steps.
If you would like to know the actual operations in MySQL, please refer to GIVE manual 3.2: MySQL commands for managing data in GIVE data source.
GIVE support gene annotation
, bed
, bigWig
and interaction
data tracks. Here are some documents regarding the format definitions and related visualizing options in GIVE:
- gene annotation: Gene annotation files in UCSC gene table format can be download from UCSC Table Browser.
bed
TracksbigWig
Tracksinteraction
Tracks
The current version of GIVE-Toolbox includes 9 script tools. These tools can meet all the needs of configuring and managing GIVE data source. Please also read manual 3.1 GIVE-Toolbox usages for managing data in GIVE data source for the detail of usage and arguments of each script tool.
config_host.sh
: set configurations for GIVE serviceinitial_ref.sh
: initialize MySQL database structure and build reference genome databaseadd_track_geneAnnot.sh
: add gene annotation track from UCSC gene table fileadd_trackGroup.sh
: add track groupadd_track_bed.sh
: add data track frombed
format data fileadd_track_bigWig.sh
: add data track frombigWig
format data fileadd_track_interaction.sh
: add data track frominteraction
format data filelist_tracks.sh
: list the tree structure of whole GIVE data source or a specified track group, or detail settings of a specified data trackremove_data.sh
: remove a track, a track group (all the tracks in the track group will be removed) or a whole reference genome (all the track groups and tracks in the reference genome will be removed)
This walk-through example will show you how to use GIVE-Toolbox to easily build a customized GIVE genome browser. It includes 7 steps. In each step, you just need to run one or two bash command lines. All the data in the example can be found in the example_data
folder in GIVE-Docker repo.
For utilizing GIVE-Toolbox with different GIVE deployment approaches:
- GIVE-Docker: GIVE-Toolbox is integrated in the GIVE-Docker. When you run a GIVE container and log into the container system (bash terminal), all the script tools can be directly executed with their names, which are located in system path
/usr/local/bin
(the container system). Theexample_data
is already in the/tmp
path (the container system). - Custom installation: You need to clone the GIVE GitHub repository and install it correctly .
GIVE-Toolbox
folder is included in the cloned GIVE repo. You can use these tools with their directory likebash ~/GIVE/GIVE-Toolbox/add_track_bed.sh -u ...
. For convenience, you can set all these tools executable (usingchmod +x
command to them) and copy them to a system path folder (such as/usr/local/bin
or declare some path in your.bashrc
file). Then you can directly run these tools.
First of all, you need to deploy GIVE on your local machine. You can use GIVE-Docker (recommended) or custom install GIVE.
TO ensure the consistency of system environment, this walk-through example is based on GIVE-Docker. We can use only two command lines to deploy GIVE on your local machine using GIVE-Docker. Please learn how to use GIVE-Docker from GIVE Tutorial 2.1: Easy local deployment of GIVE with GIVE-Docker.
# pull GIVE-Docker image from Docker-Hub
docker pull zhonglab/give
# run a GIVE container named as "give"
docker run -d -it -p 40080:80 -p 40443:443 --name give zhonglab/give
The following command lines will lead you to the internal of "give" container with a bash terminal. Then you can execute the commands of the step 2-7 in it.
# log into "give" container
docker exec -it give /bin/bash
Only if your host server is web accessible (Apache2 or other web server installed), then you can set the host domain name using config_host.sh
tool to use GIVE genome browser remotely through internet. In this example, the domain of our host server is "http://give.genemo.org". As we are using GIVE-Docker container with HTTP port 40080, so we need to set the host domain name as "http://give.genemo.org:40080".
# set host domain name, only if your local machine is web accessible
bash config_host.sh -r /var/www/give -d "http://give.genemo.org:40080"
There is already a built-in reference genome hg19
. Hence, here we initialize a new genome reference hg38
. Just one command is enough. It uses the cytoBandIdeo file of hg38 in the /tmp/example_data
folder, which was downloaded from UCSC genome annotation database.
bash initial_ref.sh -u root -p Admin2015 -r hg38 -s "Homo sapiens" -c human -f /tmp/example_data/cytoBandIdeo.txt -w '["chr10:30000000-50000000", "chr10:35000000-60000000"]'
Now you have a hg38
reference genome. You can add data tracks on this reference genome.
Every data track in GIVE needs to be assigned to a unique track group. The add_trackGroup.sh
is designed for creating track group. The following command lines will create three track groups genes
, RNA_seq
, TAD
and genomic_interactions
.
bash add_trackGroup.sh -u root -p Admin2015 -r hg38 -g "genes" -l "Known Genes" -o 1 -s 0
bash add_trackGroup.sh -u root -p Admin2015 -r hg38 -g "RNA_seq" -l "Gene Expression" -o 2 -s 0
bash add_trackGroup.sh -u root -p Admin2015 -r hg38 -g "TAD" -l "Topologically Associating Domain" -o 3 -s 0
bash add_trackGroup.sh -u root -p Admin2015 -r hg38 -g "genomic_interactions" -l "Genomic interactions from ChIA-PET" -o 3 -s 0
Now you can add data tracks to these track groups.
This command will create a data track named as knownGene
from knownGenes.txt
file.
bash add_geneAnnot.sh -u root -p Admin2015 -r hg38 -t "knownGene" -g "genes" -l "UCSC known genes annotation" -s "UCSC Genes" -o 1 -v full -f /tmp/example_data/knownGene.txt
This command will create a data track named as exampleBed
in the peak_region
track group from example.bed
file.
bash add_track_bed.sh -u root -p Admin2015 -r hg38 -t exampleBed -g "TAD" -l "An example bed track of TAD from HiC" -s "TAD" -o 2 -v pack -f /tmp/example_data/example.bed
This command will create a data track named as exampleBW
in the RNA_seq
track group from example.bigWig
file.
bash add_track_bigWig.sh -u root -p Admin2015 -r hg38 -t exampleBW -g "RNA_seq" -l "An example bigWig from single cell RNAseq" -s "Gene Expression" -o 3 -v full -a true -f /tmp/example_data/example.bigWig
This command will create a data track named as exampleInteractions
in the genomic_interactions
track group from example.interacion
file.
bash add_track_interaction.sh -u root -p Admin2015 -r hg38 -t "exampleInteractions" -g "genomic_interactions" -l "An example genomic interactions from ChIA-PET data" -s "ChIA-PET Interactions" -o 1 -v full -q "0.37,1.32,1.78,2.19,2.60,2.97,3.43,3.85,4.34,4.90,5.48,6.16,6.94,8.01,9.05,10.41,12.37,14.88,19.84,31.77,290.17" -f /tmp/example_data/example.interaction
We also provide tools for listing and removing data tracks, which are list_tracks.sh
and remove_data.sh
.
With the following command, we can see the tree structure of all the data tracks in reference genome hg38
.
bash list_tracks.sh -u root -p Admin2015 -r hg38
Using remove_data.sh
, you can remove data track, group or the whole reference genome. The following command will remove the TAD
track group. After removing, you can use list_tracks.sh
to view the result.
bash remove_data.sh -u root -p Admin2015 -r hg38 -g TAD
Finally, in only 7 steps, you have built a full customized genome browser with 3 data tracks built from 3 kinds of supported data formats. You can use the genome browser with several lines of HTML code as below.
<script src="http://localhost:40080/bower_components/webcomponentsjs/webcomponents-loader.js"></script>
<link rel="import" href="http://localhost:40080/components/chart-controller/chart-controller.html">
<chart-controller ref="hg38" num-of-subs="2"
group-id-list='["genes", "TAD", "RNA_seq", "genomic_interactions"]'
default-track-id-list='["knownGene", "exampleBed", "exampleBW", "exampleInteractions"]'>
</chart-controller>
Copy, paste and save the code into a HTML file, then open it with your web browser to view the genome browser you just built.
If your local machine is web accessible and the you have correctly set the host domain name in step 1, you can use the genome browser remotely. Just modify http://localhost:40080/
with your host server domain name in the HTML file, such as http://give.genemo.org:40080
for our host server. Anyone can view your customized genome browser using the HTML file through internet.
With GIVE-HUG, you can generate HTML file of your customized genome browser much easier. Open the URL http://localhost:40080/data-hub.html
on local machine or http://<host domain name>:40080/data-hub.html
in web access mode, you will find your data hub, which lists all the existing data tracks. In HTML generator mode, it can generate a HTML file according to your customization. Please check the tutorial of GIVE Data Hub and GIVE-Docker to learn more.