---
title: "annotables"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  cache = FALSE,
  echo = TRUE,
  message = FALSE,
  warning = FALSE)
library(annotables)
```

[airway]: https://bioconductor.org/packages/release/data/experiment/html/airway.html
[biobroom]: http://www.bioconductor.org/packages/devel/bioc/html/biobroom.html
[Bioconductor]: https://bioconductor.org
[biomaRt]: https://bioconductor.org/packages/release/bioc/html/biomaRt.html
[DESeq2]: https://bioconductor.org/packages/release/bioc/html/DESeq2.html
[devtools]: https://cran.r-project.org/package=devtools
[dplyr]: http://dplyr.tidyverse.org
[R]: https://www.r-project.org
[tibble]: http://tibble.tidyverse.org

[![DOI](https://zenodo.org/badge/3882/stephenturner/annotables.svg)](https://zenodo.org/badge/latestdoi/3882/stephenturner/annotables)

Provides tables for converting and annotating Ensembl gene IDs.

## Installation

This is an [R][] package.

### [Bioconductor][] method

```{r, eval=FALSE}
# biocLite() is no longer supported; BiocManager replaces it and installs
# "user/repo" GitHub packages through the remotes package.
install.packages(c("BiocManager", "remotes"))
BiocManager::install("stephenturner/annotables")
```

### [devtools][] method

```{r, eval=FALSE}
install.packages("devtools")
devtools::install_github("stephenturner/annotables")
```

## Rationale

Many bioinformatics tasks require converting gene identifiers from one convention to another, or annotating gene identifiers with gene symbols, descriptions, positions, etc. Sure, [biomaRt][] does this for you, but I got tired of remembering biomaRt syntax and hammering Ensembl's servers every time I needed to do this.

This package has basic annotation information from **`r ensembl_version`** for:

- Human build 38 (`grch38`)
- Human build 37 (`grch37`)
- Mouse (`grcm38`)
- Rat (`rnor6`)
- Chicken (`galgal5`)
- Worm (`wbcel235`)
- Fly (`bdgp6`)
- Macaque (`mmul801`)

Each table contains:

- `ensgene`: Ensembl gene ID
- `entrez`: Entrez gene ID
- `symbol`: Gene symbol
- `chr`: Chromosome
- `start`: Start
- `end`: End
- `strand`: Strand
- `biotype`: Protein coding, pseudogene, mitochondrial tRNA, etc.
- `description`: Full gene name/description

Additionally, there are `tx2gene` tables (for example, `grch38_tx2gene`) that link Ensembl gene IDs to Ensembl transcript IDs.
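
The gene tables are ordinary data frames, so converting a vector of Ensembl gene IDs needs nothing beyond base R. Here is a minimal sketch using `match()`. `genes_demo` is a hypothetical three-row stand-in holding a few of the columns listed above (with annotables loaded you would use `grch38` in its place), and `ids` is an illustrative input vector.

```{r}
# Hypothetical stand-in shaped like the gene tables; use grch38 in real code.
genes_demo <- data.frame(
  ensgene = c("ENSG00000141510", "ENSG00000012048", "ENSG00000146648"),
  entrez  = c(7157L, 672L, 1956L),
  symbol  = c("TP53", "BRCA1", "EGFR"),
  stringsAsFactors = FALSE
)

# IDs to convert, e.g. row names of a count matrix;
# "not_in_table" is a made-up ID that is absent from the table.
ids <- c("ENSG00000146648", "ENSG00000141510", "not_in_table")

# match() only returns the first hit, so flag duplicated IDs up front.
if (anyDuplicated(genes_demo$ensgene) > 0) {
  warning("some Ensembl IDs occur on several rows; match() uses the first")
}

# Row index of each ID in the table (NA if absent), in the order of `ids`.
idx <- match(ids, genes_demo$ensgene)
symbols <- genes_demo$symbol[idx]
entrez  <- genes_demo$entrez[idx]

converted <- data.frame(ensgene = ids, symbol = symbols, entrez = entrez)
print(converted)
```

`match()` keeps the order of `ids` and gives `NA` for IDs that are not in the table. Because it takes only the first hit, the sketch checks for duplicated IDs before relying on it.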

## Usage

```{r, eval=FALSE}
library(annotables)
```

Look at the human genes table (note the description column gets cut off because the table becomes too wide to print nicely):

```{r}
grch38
```

Look at the human genes-to-transcripts table:

```{r}
grch38_tx2gene
```
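
A typical use for a `tx2gene` table is collapsing transcript-level values to gene level. The sketch below does this with base R's `rowsum()`. It assumes the table has a transcript column named `enstxp` and a gene column named `ensgene`; check `names(grch38_tx2gene)` before adapting it. The table and counts are hypothetical stand-ins.

```{r}
# Hypothetical stand-in shaped like grch38_tx2gene
# (the column names enstxp/ensgene are an assumption).
tx2gene <- data.frame(
  enstxp  = c("tx1", "tx2", "tx3", "tx4"),
  ensgene = c("geneA", "geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-level counts: rows are transcripts, columns samples.
tx_counts <- matrix(
  c(10, 5, 7, 0,
    20, 1, 3, 4),
  ncol = 2,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("sample1", "sample2"))
)

# Gene for each transcript row, in row order; transcripts missing from
# tx2gene would get NA here, so inspect or drop those first.
gene_of_tx <- tx2gene$ensgene[match(rownames(tx_counts), tx2gene$enstxp)]

# Sum the rows that share a gene; result rows are sorted by gene ID.
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
print(gene_counts)
```

`rowsum()` adds up rows that share a gene and returns them sorted by gene ID. For quantifier output, importers such as tximport take a transcript-to-gene table for the same purpose.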

Tables are saved in [tibble][] format, pipe-able with [dplyr][]:

```{r, results='asis'}
grch38 %>%
  dplyr::filter(biotype == "protein_coding" & chr == "1") %>%
  dplyr::select(ensgene, symbol, chr, start, end, description) %>%
  head %>%
  knitr::kable(.)
```
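
Since tibbles are data frames, the same kind of query also works in base R. Here is a sketch with `subset()` on a hypothetical four-row stand-in (`genes_mini`) holding a few of the `grch38` columns. As in the chunk above, `chr` is compared as a string, because chromosome names such as `X` and `MT` are not numbers.

```{r}
# Hypothetical stand-in with a few of the grch38 columns.
genes_mini <- data.frame(
  ensgene = c("g1", "g2", "g3", "g4"),
  symbol  = c("A", "B", "C", "D"),
  chr     = c("1", "1", "2", "1"),
  biotype = c("protein_coding", "lncRNA", "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Same filter and column selection as the dplyr chain, in base R.
pc_chr1 <- subset(genes_mini,
                  biotype == "protein_coding" & chr == "1",
                  select = c(ensgene, symbol, chr))
print(head(pc_chr1))
```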

Example with [DESeq2][] results from the [airway][] package, made tidy with [biobroom][]:

```{r}
library(DESeq2)
library(airway)
data(airway)
# model counts by cell line and dexamethasone treatment, then fit
airway <- DESeqDataSet(airway, design = ~cell + dex)
airway <- DESeq(airway)
# by default, results() reports the last term of the design (dex)
res <- results(airway)

# tidy results with biobroom
library(biobroom)
res_tidy <- tidy.DESeqResults(res)
head(res_tidy)
```

```{r, results='asis'}
res_tidy %>%
  dplyr::arrange(p.adjusted) %>%
  head(20) %>%
  dplyr::inner_join(grch38, by = c("gene" = "ensgene")) %>%
  dplyr::select(gene, estimate, p.adjusted, symbol, description) %>%
  knitr::kable(.)
```
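
`inner_join()` drops any gene with no row in `grch38`. To keep every tested gene and leave missing annotation as `NA`, use a left join instead. The sketch below does this with base R's `merge()` and `all.x = TRUE` (dplyr's `left_join()` behaves the same way); `res_demo` and `annot_demo` are hypothetical stand-ins whose column names follow the tidy results above.

```{r}
# Hypothetical stand-in for a few columns of the tidy DESeq2 results.
res_demo <- data.frame(
  gene       = c("g1", "g2", "g3"),
  estimate   = c(1.5, -2.0, 0.3),
  p.adjusted = c(0.001, 0.01, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table; g2 is deliberately missing.
annot_demo <- data.frame(
  ensgene = c("g1", "g3"),
  symbol  = c("A", "C"),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes this a left join: unannotated genes stay, with NA symbol.
annotated <- merge(res_demo, annot_demo, by.x = "gene", by.y = "ensgene",
                   all.x = TRUE, sort = FALSE)

# merge() does not preserve row order, so re-sort by adjusted p-value.
annotated <- annotated[order(annotated$p.adjusted), ]
print(annotated)
```

With either kind of join, an Ensembl ID that appears on several rows of the annotation table produces several output rows, so the joined table can be longer than the list of genes you started with.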