-
Notifications
You must be signed in to change notification settings - Fork 14
/
vep.html
54 lines (52 loc) · 4.18 KB
/
vep.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Variant annotation" />
<title>NGS data analysis course</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="../../../Commons/css_template_for_examples.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title"><a href="http://ngscourse.github.io/">NGS data analysis course</a></h1>
<h2 class="author"><strong>Variant annotation</strong></h2>
<h3 class="date"><em>(updated 21-10-2015)</em></h3>
</div>
<!-- COMMON LINKS HERE -->
<h1 id="variant-annotation-with-vep">Variant annotation with VEP</h1>
<p>In this part of the course we will focus on filtering, annotating and prioritisation of the variants we have called with GATK.</p>
<p>We are going to work in the annotation folder:</p>
<pre><code>cd /home/training/ngs_course/annotation</code></pre>
<h2 id="compressing-and-indexing-vcf-files">1. Compressing and indexing VCF files</h2>
<p>In order to further process the variants in the VCF file we created with GATK we first need to compress and index it.</p>
<pre><code>bgzip -c ~/ngs_course/calling/example_0/006-dna_chr21_100_he_pe_filtered.vcf > dna_chr21_100_he_pe_filtered.vcf.gz
tabix -p vcf dna_chr21_100_he_pe_filtered.vcf.gz</code></pre>
<p>The with bgzip compressed files end with <code>.vcf.gz</code> and the corresponding tabix index files have the extension <code>.vcf.gz.tbi</code>.</p>
<h2 id="quality-control-of-variants">2. Quality control of variants</h2>
<p>Summary statistics can be generated with bcftools stats. Run the command below to generate the summary statistics and the corresponding report and take a look the results.</p>
<pre><code>bcftools stats dna_chr21_100_he_pe_filtered.vcf.gz > vcfstats.txt</code></pre>
<h2 id="variant-effect-prediction-and-filtering">3. Variant effect prediction and filtering</h2>
<p>In order to predict the functional consequences of the variants on genes and to gain additional useful annotation/information about their deleteriousness we can run Ensembl’s Variant Effect Predictor (VEP). The consequence types that are predicted by VEP are illustrated here. VEP can be run on the command line (see below) or directly via <a href="http://www.ensembl.org/info/docs/tools/vep/index.html">Ensembl’s web page</a>.</p>
<h2 id="vep-command-line-tool">VEP command line tool</h2>
<p>There is a limit to the size of the files that can be uploaded to the web page. If the files are getting very big it is advised to use VEP Perl scrip variant_effect_predictor.pl from the command line.</p>
<pre><code>~/soft/ensembl-vep-82/scripts/variant_effect_predictor/variant_effect_predictor.pl</code></pre>
<p>Take a look at the documentation and make yourself familiar with the command line parameters we are using here. You can also check the documentation at their website: http://grch37.ensembl.org/info/docs/tools/vep/script/vep_options.html</p>
<p>Let’s predict the effect of the variants detected in our VCF.</p>
<pre><code>~/soft/ensembl-vep-82/scripts/variant_effect_predictor/variant_effect_predictor.pl \
--offline \
--input_file dna_chr21_100_he_pe_filtered.vcf.gz \
--output_file test \
--vcf \
--stats_file test_summary.html \
--assembly GRCh37</code></pre>
<p>When it finishes, take a look at the output files.</p>
<p>The overall results are summarised in a neat HTML file ending with <code>_summary.html</code> that can be viewed in your browser.</p>
<h3 id="exercise">Exercise:</h3>
<p>Work out what command line options are needed to filter out rare coding variants with a MAF<0.01 in central european (CEU) population and run the command. Show how to filter with <code>filter_vep.pl</code> and <code>bcftools</code>.</p>
<h2 id="vep-web-application">VEP web application</h2>
<p>You can try to run the same from their web tool: <a href="http://grch37.ensembl.org/Tools/VEP">http://grch37.ensembl.org/Tools/VEP</a>.</p>
</body>
</html>