Merge pull request #6 from Aveglia/dev
Dev
Aveglia authored Oct 21, 2021
2 parents 5d5c0bc + 5431bac commit 898e209
Showing 9 changed files with 861 additions and 108 deletions.
24 changes: 14 additions & 10 deletions README.md
@@ -8,9 +8,9 @@
[![release](https://img.shields.io/github/v/release/Aveglia/vAMPirus?label=release&logo=github)](https://github.com/Aveglia/vAMPirus/releases/latest)

# Table of contents
* [New in vAMPirus version 2.0.0](#New-in-vAMPirus-version-2.0.0)
* [New in vAMPirus version 2.0.1](#New-in-vAMPirus-version-2.0.1)
* [Quick intro](#Quick-intro)
* [Contact/support](#Contact/support)
* [Contact/support](#Contact/support)
* [Getting started](#Getting-started)
* [Order of operations](##Order-of-operations)
* [Dependencies](###Dependencies-(See-How-to-cite))
@@ -21,7 +21,7 @@
* [Running vAMPirus](#Running-vAMPirus)
* [Who to cite](#Who-to-cite)

# New in vAMPirus version 2.0.0
# New in vAMPirus version 2.0.1

1. Reduced redundancy of processes and the volume of generated result files per full run (Example - read processing only done once if running DataCheck then Analyze).

@@ -35,7 +35,9 @@

6. (EXPERIMENTAL) Added Minimum Entropy Decomposition analysis using the oligotyping program produced by the Meren Lab. This allows for sequence clustering based on sequence positions of interest (biologically meaningful) or top positions with the highest Shannon's Entropy (read more here: https://merenlab.org/software/oligotyping/ ; and below).

7. Color nodes on phylogenetic trees based on Taxonomy or Minimum Entropy Decomposition results
7. Phylogeny-based clustering of ASV or AminoType sequences with TreeCluster (https://github.com/niemasd/TreeCluster; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221068)

8. Color nodes on phylogenetic trees based on Taxonomy or Minimum Entropy Decomposition results

8. PCoA plots added to Analyze report if NMDS does not converge.

@@ -68,7 +70,7 @@ If you have a feature request or any feedback/questions, feel free to email vAMP

## Quick order of operations

1. Clone vAMPirus from github
1. Clone vAMPirus from github

2. Before launching vAMPirus.nf, run the vampirus_startup.sh script to install dependencies and/or databases (NOTE: the xz program must be installed before running the startup script if you are downloading the RVDB database)

@@ -85,7 +87,7 @@ If you have a feature request or any feedback/questions, feel free to email vAMP
8. Explore results directories and produced final reports
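
Step 2 of the list above notes that xz must be installed before the startup script downloads the RVDB database. A minimal sketch of a pre-flight check, assuming a POSIX shell; the helper name `require_tool` is hypothetical and not part of vAMPirus:

```shell
# Hypothetical helper (not part of vAMPirus): succeed only if the named
# program can be found on PATH.
require_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether xz is available before launching vampirus_startup.sh.
if require_tool xz; then
    echo "xz found"
else
    echo "xz not found: install it before downloading the RVDB database"
fi
```

The check only reports the result, so it can be run safely before deciding whether to download the database.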


### Installing dependencies (see Who to cite section)
### Installing dependencies (see Who to cite section)

If you plan on using Conda to run vAMPirus, all dependencies will be installed as a Conda environment automatically with the vampirus_startup.sh script.

@@ -213,19 +215,19 @@ Launch commands for testing (you do not need to edit anything in the config file

### DataCheck test =>

/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --DataCheck
`/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --DataCheck -resume`

OR

nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --DataCheck
`nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --DataCheck -resume`

### Analyze test =>

`/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --Analyze`
`/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --Analyze -resume`

OR

`nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --Analyze`
`nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --Analyze -resume`
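
The test launch commands differ only in the profile and the mode flag. A small sketch that assembles them, assuming `nextflow`, `vAMPirus.nf`, and `vampirus.config` are reachable from the working directory as in the short form of the commands; `vampirus_test_cmd` is a hypothetical helper name, and it only prints the command rather than running it:

```shell
# Hypothetical helper (not part of vAMPirus): print the test launch command
# for a given profile and mode.
vampirus_test_cmd() {
    profile=$1   # conda or singularity
    mode=$2      # DataCheck or Analyze
    echo "nextflow run vAMPirus.nf -c vampirus.config -profile ${profile},test --${mode} -resume"
}

vampirus_test_cmd conda DataCheck
vampirus_test_cmd singularity Analyze
```

With `-resume`, Nextflow reuses cached results from an earlier run where it can, so re-launching a test after an interruption does not have to repeat completed steps.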


# Running vAMPirus
@@ -305,3 +307,5 @@ If you do use vAMPirus for your analyses, please cite the following ->
15. UNOISE algorithm - R.C. Edgar (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257

16. Oligotyping - A. Murat Eren, Gary G. Borisy, Susan M. Huse, Jessica L. Mark Welch (2014). Oligotyping analysis of the human oral microbiome. Proceedings of the National Academy of Sciences Jul 2014, 111 (28) E2875-E2884; DOI: 10.1073/pnas.1409644111

17. Balaban M, Moshiri N, Mai U, Jia X, Mirarab S (2019). "TreeCluster: Clustering biological sequences using phylogenetic trees." PLoS ONE. 14(8):e0221068. doi:10.1371/journal.pone.0221068
98 changes: 80 additions & 18 deletions bin/vAMPirus_DC_Report.Rmd
@@ -39,11 +39,9 @@ library(scales)
library(cowplot)
library(dplyr)
library(plotly)
#library(BiocParallel)
library(knitr)
library(kableExtra) #install.packages("kableExtra")
library(kableExtra)
library(rmarkdown)
#register(MulticoreParam(4))
```

```{r colors, include=FALSE}
@@ -360,6 +358,12 @@ postlhp <- postlhp %>% layout(yaxis = list(title = "Count"))
postlhp <- postlhp %>% layout(xaxis = list(title = "Read Length"))
postlhp <- postlhp %>% config(toImageButtonOptions=list(format='svg',filename='ReadsperLen_postfilt', height= 500, width= 800, scale= 1))
postlhp
```
<br>
```{bash load_datasets_bash, include=FALSE}
mv *AminoType_PairwiseDistance.matrix ./amino_matrix.txt
mv *_ASV_PairwiseDistance.matrix ./asv_matrix.txt
```
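
The `mv` calls in the chunk above rely on each glob matching exactly one file. A hedged sketch of a stricter rename, assuming a POSIX shell and a pattern without spaces; `rename_single_match` is a hypothetical helper and not part of the report:

```shell
# Hypothetical helper (not part of vAMPirus): rename the single file that
# matches a glob pattern, and fail loudly if zero or several files match.
rename_single_match() {
    pattern=$1
    target=$2
    # Expand the glob into the positional parameters. An unmatched glob
    # stays as the literal pattern, which the -e test below rejects.
    set -- $pattern
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one file matching '$pattern'" >&2
        return 1
    fi
    mv -- "$1" "$target"
}

# Usage, mirroring the chunk above:
#   rename_single_match '*AminoType_PairwiseDistance.matrix' ./amino_matrix.txt
#   rename_single_match '*_ASV_PairwiseDistance.matrix' ./asv_matrix.txt
```

Quoting the pattern at the call site defers the expansion to the helper, which is what lets it count the matches.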

<br>
@@ -383,24 +387,35 @@ nnp
NOTE: The "1" on the x-axis represents number of ASVs identified by vsearch
<br>

### Number of pcASVs per clustering percentage

<br>

```{r prot_number, echo=FALSE}
pn=read.csv("number_per_percentage_prot.csv", header=TRUE)
#pn=read.csv("number_per_percentage_prot.csv", header=T)
pnp <- plot_ly(pn, x=pn[,1], y=pn[,2], type="scatter", mode = 'lines+markers', marker=list(color='#088da5', line=list(color = 'black',
width = .1)), hovertemplate = paste('ID%: %{x}','<br>Number of pcASVs: %{y}','<extra></extra>'))
pnp <- pnp %>% layout(yaxis = list(title = "Number of pcASVs"))
pnp <- pnp %>% layout(xaxis = list(title = "Clustering ID %"))
pnp <- pnp %>% config(toImageButtonOptions=list(format='svg',filename='Protclustresults', height= 500, width= 800, scale= 1))
pnp
<div class="rectangle"><h2 style="color:white">&nbsp;&nbsp;ASV Pairwise Distance Heatmap</h2></div>

<br>
<br>
```{r asvheatmap, echo=FALSE}
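# Read the pairwise distance matrix; the first column holds the sequence names
simmatrix<- read.csv("asv_matrix.txt", header=FALSE)
rownames(simmatrix) <- simmatrix[,1]
simmatrix <- simmatrix[,-1]
# The matrix is square, so the column names mirror the row names
colnames(simmatrix) <-rownames(simmatrix)
cols <- dim(simmatrix)[2]
simmatrix$AA <- rownames(simmatrix)
rval=nrow(simmatrix)
# Reshape to long format: one row per pair of sequences with its distance
# (gather() comes from tidyr, assumed to be loaded earlier in the report)
simmatrix2 <- simmatrix %>%
gather(1:rval, key=sequence, value=Distance)
# Order the levels of both axes by their mean distance (reorder()'s default)
x=reorder(simmatrix2$AA,simmatrix2$Distance)
y=reorder(simmatrix2$sequence,simmatrix2$Distance)
# Draw the heatmap, then convert it to an interactive plotly widget
similaritymatrix <- ggplot(simmatrix2, aes(x=x, y=y,fill=Distance))+
geom_raster()+
scale_fill_distiller(palette="Spectral")+
theme(axis.text.x = element_text(angle = 90))+
theme(axis.title.x=element_blank())+
theme(axis.title.y=element_blank())
heat <- ggplotly(similaritymatrix)
heat <- heat %>% config(toImageButtonOptions=list(format='svg',filename='heatmap', height= 500, width= 800, scale= 1))
heat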
```
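
The heatmap chunk above assigns the row names as column names, which only works when the distance matrix is square. A hedged sketch of a pre-check in shell, assuming the matrix is plain comma-separated text with the sequence name in the first field and no quoted commas (as the `read.csv(..., header=FALSE)` call suggests); `check_square_matrix` is a hypothetical helper:

```shell
# Hypothetical helper (not part of vAMPirus): succeed only if the file has
# as many data columns as rows, with every row the same width.
check_square_matrix() {
    awk -F',' '
        NR == 1        { cols = NF - 1 }   # first field is the sequence name
        NF - 1 != cols { ragged = 1 }      # a row with a different width
        END {
            if (NR == 0 || ragged || NR != cols) exit 1
        }
    ' "$1"
}

# Usage:
#   check_square_matrix asv_matrix.txt || echo "asv_matrix.txt is not square" >&2
```

Running the check before rendering turns a malformed matrix into a clear message instead of an R error inside the chunk.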
NOTE: The "1" represents the number of AminoTypes which are unique amino acid sequences in your dataset
<br>


### ASV Shannon Entropy Analysis (https://merenlab.org/2012/05/11/oligotyping-pipeline-explained/)

<br>
@@ -435,6 +450,53 @@ paged_table(med_asv_csv, options = list(rows.print = 10))
```
<br>

### Number of pcASVs per clustering percentage

<br>

```{r prot_number, echo=FALSE}
pn=read.csv("number_per_percentage_prot.csv", header=TRUE)
#pn=read.csv("number_per_percentage_prot.csv", header=T)
pnp <- plot_ly(pn, x=pn[,1], y=pn[,2], type="scatter", mode = 'lines+markers', marker=list(color='#088da5', line=list(color = 'black',
width = .1)), hovertemplate = paste('ID%: %{x}','<br>Number of pcASVs: %{y}','<extra></extra>'))
pnp <- pnp %>% layout(yaxis = list(title = "Number of pcASVs"))
pnp <- pnp %>% layout(xaxis = list(title = "Clustering ID %"))
pnp <- pnp %>% config(toImageButtonOptions=list(format='svg',filename='Protclustresults', height= 500, width= 800, scale= 1))
pnp
```
NOTE: The "1" represents the number of AminoTypes, which are the unique amino acid sequences in your dataset

<br>
<div class="rectangle"><h2 style="color:white">&nbsp;&nbsp;AminoType Pairwise Distance Heatmap</h2></div>

<br>
<br>
```{r aminoheatmap, echo=FALSE}
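# Read the pairwise distance matrix; the first column holds the sequence names
simmatrix<- read.csv("amino_matrix.txt", header=FALSE)
rownames(simmatrix) <- simmatrix[,1]
simmatrix <- simmatrix[,-1]
# The matrix is square, so the column names mirror the row names
colnames(simmatrix) <-rownames(simmatrix)
cols <- dim(simmatrix)[2]
simmatrix$AA <- rownames(simmatrix)
rval=nrow(simmatrix)
# Reshape to long format: one row per pair of sequences with its distance
# (gather() comes from tidyr, assumed to be loaded earlier in the report)
simmatrix2 <- simmatrix %>%
gather(1:rval, key=sequence, value=Distance)
# Order the levels of both axes by their mean distance (reorder()'s default)
x=reorder(simmatrix2$AA,simmatrix2$Distance)
y=reorder(simmatrix2$sequence,simmatrix2$Distance)
# Draw the heatmap, then convert it to an interactive plotly widget
similaritymatrix <- ggplot(simmatrix2, aes(x=x, y=y,fill=Distance))+
geom_raster()+
scale_fill_distiller(palette="Spectral")+
theme(axis.text.x = element_text(angle = 90))+
theme(axis.title.x=element_blank())+
theme(axis.title.y=element_blank())
heat <- ggplotly(similaritymatrix)
heat <- heat %>% config(toImageButtonOptions=list(format='svg',filename='heatmap', height= 500, width= 800, scale= 1))
heat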
```
<br>
<br>

### AminoTypes Shannon Entropy Analysis (https://merenlab.org/2012/05/11/oligotyping-pipeline-explained/)

<br>