A comprehensive eQTL study requires first obtaining genetic markers as well as expression profiles for each individual in the population, then taking the expression of each target gene as a trait (termed an expression trait, eTrait) and determining whether some markers are statistically associated with the eTrait by association analysis, and finally identifying candidate genes or regulatory sequences around the associated markers through various additional evidence. Usually, eQTL can be classified into two types: cis and trans. A cis-acting eQTL is an eQTL for a gene that is localized around that gene, indicating that sequence differences around that gene result in changes in expression levels. A trans-acting eQTL is an eQTL that is positioned distantly from the target gene it regulates, indicating that the expression level of the target gene is genetically regulated by distal factors (e.g., upstream transcription factors).
In the following protocol, we explain how to use QTLtools to identify cis- and trans- eQTL and use qqman to visualize the results.To guide eBook authors having a better sense of the workflow layout, here we briefly introduce the specific purposes of the dir system.
- cache: Here, it stores intermediate datasets or results that are generated during the preprocessing steps.
- graphs: The graphs/figures produced during the analysis.
- input: Here, we store the raw input data.
- lib: The source code, functions, or algorithms used within the workflow.
- output: The final output results of the workflow.
- workflow: Step by step pipeline.
Installing Anaconda
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-4.1.0-Linux-x86.sh
bash Anaconda3-4.1.0-Linux-x86.sh
echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda create -n eQTL_Analysis_for_Rice_RIL_population
conda activate eQTL_Analysis_for_Rice_RIL_population
Installing Htslib
conda install -c bioconda htslib
Installing Bcftools
conda install -c bioconda bcftools
Installing Samtools
conda install -c bioconda samtools
Installing R 3.6.1
conda install r-base=3.6.1
Installing QTLtools(download and unzip to use)
wget https://qtltools.github.io/qtltools/binaries/QTLtools_1.2_CentOS7.8_x86_64.tar.gz
tar xzvf QTLtools_1.2_CentOS7.8_x86_64.tar.gz
cd QTLtools_1.2_CentOS7.8_x86_64
ln -s QTLtools_1.2_CentOS7.8_x86_64 QTLtools
echo 'export PATH="~/QTLtools_1.2_CentOS7.8_x86_64:$PATH"' >> ~/.bashrc
source ~/.bashrc
The raw data are available from the National Center for Biotechnology Information Gene Expression Omnibus database under the accession number GSE49020.
- a. Genotype data(VCF/BCF format):eQTL_genotype.vcf
- b. eTrait/phenotype data(BED format):flag_leaf_eTrait.bed
sh Compression.sh
- Script content of Compression.sh
bgzip eQTL_genotype.vcf
tabix vcf eQTL_genotype.vcf.gz
bgzip flag_leaf_eTrait.bed
tabix flag_leaf_eTrait.bed.gz
sh QTLtools_cis.sh
- Script content of QTLtools_cis.sh
QTLtools cis --vcf eQTL_genotype.vcf.gz --bed flag_leaf_eTrait.bed.gz --permute 1000 --out flag_leaf_eTrait_cis_permutation.txt > running.log
QTLtools cis --vcf eQTL_genotype.vcf.gz --bed flag_leaf_eTrait.bed.gz --permute 1000 --out flag_leaf_eTrait_cis_permutation.txt --normal> running.log
The command 1 will generate three output files, one named “.best.txt.gz” containing the top eQTL for each eTrait, one named “.bins.txt.gz” containing all eQTLs with P-values below the specified threshold, and the last named “.hits.txt.gz”, containing the details of all eQTLs with P-values above the specified threshold. The command 2 permutes all eTraits and generate three files like the command 1. The command 3 will generate the file “flag_leaf_trans_005_permutations_all.txt” which contains the data in “.hits.txt.gz” and with an additional column that gives the estimated false discovery rate (FDR) for each eTrait by 100 permutations.
sh QTLtools_trans.sh
- Script content of QTLtools_trans.sh
##command 1
QTLtools trans --vcf eQTL_genotype.vcf.gz --bed flag_leaf_eTrait.bed.gz --nominal --threshold 0.05 --out flag_leaf005.trans.nominal.hits.txt.gz > running.log
##command 2
for i in {1..100};do
QTLtools trans --vcf eQTL_genotype.vcf.gz --bed flag_leaf_eTrait.bed.gz --threshold 0.05 --permute --out flag_leaf005_trans_perm_${i} --seed ${i} > running.log
done
##command 3
zcat flag_leaf005_trans_perm_*.hits.txt.gz | gzip -c > flag_leaf005_permutations_all.txt.gz
Rscript runFDR_ftrans.R flag_leaf005.trans.nominal.hits.txt.gz flag_leaf005_permutations_all.txt.gz flag_leaf_trans_005_permutations_all.txt
Note:We have made changes to qqman's Manhattan function,The modified document is Manhattan_function.R.
sh get_Ehd1_eQTL_result.sh
- Script content of get_Ehd1_eQTL_result.sh
zcat flag_leaf005.trans.nominal.hits.txt.gz | grep OsAffx.30643.1.S1_at > Ehd1_eQTL_result.txt
Rscript Manhattan.R
- Script content of Manhattan.R
source('Manhattan_function.R')
data <- read.table(" Ehd1_eQTL_result.txt ")
data <-data[,c(4,5,6,7)]
colnames(data)<-c('SNP','CHR','BP','P')
manhattan(data,main = "Manhattan Plot",suggestiveline =-log10(6.17e-04),genomewideline = -log10(6.12e-04),cex.lab=1.2, cex.axis=1.5,cex.main=1.5,annotatePval = 5e-40,annotateTop = FALSE,xlim=c(32284500,155000000),ylim=c(0,50),cex = 1.5)
- flag_leaf_eTrait_cis_permutation.txt
- flag_leaf005.trans.nominal.hits.txt.gz
- flag_leaf_trans_005_permutations_all.txt( Compare to the “flag_leaf005.trans.nominal.hits.txt.gz”,this file has an additional column that gives the estimated false discovery rate (FDR) for each eTrait by 100 permutations.)
It is a free and open source software, licensed under (choose a license from the suggested list: GPLv3, MIT, or CC BY 4.0).