Merge pull request #6 from Aveglia/dev

Dev
Aveglia · Oct 21, 2021 · 898e209
2 parents 5d5c0bc + 5431bac
commit 898e209
Showing 9 changed files with 861 additions and 108 deletions.
diff --git a/README.md b/README.md
@@ -8,9 +8,9 @@
 [![release](https://img.shields.io/github/v/release/Aveglia/vAMPirus?label=release&logo=github)](https://github.com/Aveglia/vAMPirus/releases/latest)
 
 # Table of contents
-* [New in vAMPirus version 2.0.0](#New-in-vAMPirus-version-2.0.0)
+* [New in vAMPirus version 2.0.1](#New-in-vAMPirus-version-2.0.1)
 * [Quick intro](#Quick-intro)
-  * [Contact/support](#Contact/support)  
+  * [Contact/support](#Contact/support)
 * [Getting started](#Getting-started)
   * [Order of operations](##Order-of-operations)
     * [Dependencies](###Dependencies-(See-How-to-cite))
@@ -21,7 +21,7 @@
 * [Running vAMPirus](#Running-vAMPirus)
 * [Who to cite](#Who-to-cite)
 
-# New in vAMPirus version 2.0.0
+# New in vAMPirus version 2.0.1
 
 1. Reduced redundancy of processes and the volume of generated result files per full run (Example - read processing only done once if running DataCheck then Analyze).
 
@@ -35,7 +35,9 @@
 
 6. (EXPERIMENTAL) Added Minimum Entropy Decomposition analysis using the oligotyping program produced by the Meren Lab. This allows for sequence clustering based on sequence positions of interest (biologically meaningful) or top positions with the highest Shannon's Entropy (read more here: https://merenlab.org/software/oligotyping/ ; and below).
 
-7. Color nodes on phylogenetic trees based on Taxonomy or Minimum Entropy Decomposition results
+7. Phylogeny-based clustering ASV or AminoType sequences with TreeCluster (https://github.com/niemasd/TreeCluster; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221068)
+
+8. Color nodes on phylogenetic trees based on Taxonomy or Minimum Entropy Decomposition results
 
 8. PCoA plots added to Analyze  report if NMDS does not converge.
 
@@ -68,7 +70,7 @@ If you have a feature request or any feedback/questions, feel free to email vAMP
 
 ## Quick order of operations
 
-1. Clone vAMPirus from github   
+1. Clone vAMPirus from github
 
 2. Before launching the vAMPirus.nf, be sure to run the vampirus_startup.sh script to install dependencies and/or databases (NOTE: You will need to have the xz program installed before running startup script when downloading the RVDB database)
 
@@ -85,7 +87,7 @@ If you have a feature request or any feedback/questions, feel free to email vAMP
 8. Explore results directories and produced final reports
 
 
-### Installing dependencies (see Who to cite section)    
+### Installing dependencies (see Who to cite section)
 
 If you plan on using Conda to run vAMPirus, all dependencies will be installed as a Conda environment automatically with the vampirus_startup.sh script.
 
@@ -213,19 +215,19 @@ Launch commands for testing (you do not need to edit anything in the config file
 
 ### DataCheck test =>
 
-      /path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --DataCheck
+      `/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --DataCheck -resume`
 
 OR
 
-      nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --DataCheck
+      `nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --DataCheck -resume`
 
 ### Analyze test =>
 
-      `/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --Analyze`
+      `/path/to/nextflow run /path/to/vAMPirus.nf -c /path/to/vampirus.config -profile conda,test --Analyze -resume`
 
 OR
 
-      `nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --Analyze`
+      `nextflow run vAMPirus.nf -c vampirus.config -profile singularity,test --Analyze -resume`
 
 
 # Running vAMPirus
@@ -305,3 +307,5 @@ If you do use vAMPirus for your analyses, please cite the following ->
 15. UNOISE algorithm - R.C. Edgar (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257
 
 16. Oligotyping - A. Murat Eren, Gary G. Borisy, Susan M. Huse, Jessica L. Mark Welch (2014). Oligotyping analysis of the human oral microbiome. Proceedings of the National Academy of Sciences Jul 2014, 111 (28) E2875-E2884; DOI: 10.1073/pnas.1409644111
+
+17. Balaban M, Moshiri N, Mai U, Jia X, Mirarab S (2019). "TreeCluster: Clustering biological sequences using phylogenetic trees." PLoS ONE. 14(8):e0221068. doi:10.1371/journal.pone.0221068
diff --git a/bin/vAMPirus_DC_Report.Rmd b/bin/vAMPirus_DC_Report.Rmd
@@ -39,11 +39,9 @@ library(scales)
 library(cowplot)
 library(dplyr)
 library(plotly)
-#library(BiocParallel)
 library(knitr)
-library(kableExtra) #install.packages("kableExtra")
+library(kableExtra)
 library(rmarkdown)
-#register(MulticoreParam(4))
 ```
 
 ```{r colors, include=FALSE}
@@ -360,6 +358,12 @@ postlhp <- postlhp %>% layout(yaxis = list(title = "Count"))
 postlhp <- postlhp %>% layout(xaxis = list(title = "Read Length"))
 postlhp <- postlhp %>% config(toImageButtonOptions=list(format='svg',filename='ReadsperLen_postfilt', height= 500, width= 800, scale= 1))
 postlhp
+```
+<br>
+```{bash load_datasets_bash, include=FALSE}
+mv *AminoType_PairwiseDistance.matrix ./amino_matrix.txt
+mv *_ASV_PairwiseDistance.matrix ./asv_matrix.txt
+
 ```
 
 <br>
@@ -383,24 +387,35 @@ nnp
 NOTE: The "1" on the x-axis represents number of ASVs identified by vsearch
 <br>
 
-### Number of pcASVs per clustering percentage
-
-<br>
-
-```{r prot_number, echo=FALSE}
-pn=read.csv("number_per_percentage_prot.csv", header=TRUE)
-#pn=read.csv("number_per_percentage_prot.csv", header=T)
-pnp <- plot_ly(pn, x=pn[,1], y=pn[,2], type="scatter", mode = 'lines+markers', marker=list(color='#088da5', line=list(color = 'black',
-               width = .1)), hovertemplate = paste('ID%: %{x}','<br>Number of pcASVs: %{y}','<extra></extra>'))
-pnp <- pnp %>% layout(yaxis = list(title = "Number of pcASVs"))
-pnp <- pnp %>% layout(xaxis = list(title = "Clustering ID %"))
-pnp <- pnp %>% config(toImageButtonOptions=list(format='svg',filename='Protclustresults', height= 500, width= 800, scale= 1))
-pnp
+<div class="rectangle"><h2 style="color:white">&nbsp;&nbsp;ASV Pairwise Distance Heatmap</h2></div>
+
+<br>
+<br>
+```{r asvheatmap, echo=FALSE}
+simmatrix<- read.csv("asv_matrix.txt", header=FALSE)
+rownames(simmatrix) <- simmatrix[,1]
+simmatrix <- simmatrix[,-1]
+colnames(simmatrix) <-rownames(simmatrix)
+cols <- dim(simmatrix)[2]
+simmatrix$AA <- rownames(simmatrix)
+rval=nrow(simmatrix)
+simmatrix2 <- simmatrix %>%
+  gather(1:rval, key=sequence, value=Distance)
+x=reorder(simmatrix2$AA,simmatrix2$Distance)
+y=reorder(simmatrix2$sequence,simmatrix2$Distance)
+similaritymatrix <- ggplot(simmatrix2, aes(x=x, y=y,fill=Distance))+
+      geom_raster()+
+      scale_fill_distiller(palette="Spectral")+
+      theme(axis.text.x = element_text(angle = 90))+
+      theme(axis.title.x=element_blank())+
+      theme(axis.title.y=element_blank())
+
+heat <- ggplotly(similaritymatrix)
+heat <- heat %>% config(toImageButtonOptions=list(format='svg',filename='heatmap', height= 500, width= 800, scale= 1))
+heat
 ```
-NOTE: The "1" represents the number of AminoTypes which are unique amino acid sequences in your dataset
 <br>
 
-
 ### ASV Shannon Entropy Analysis (https://merenlab.org/2012/05/11/oligotyping-pipeline-explained/)
 
 <br>
@@ -435,6 +450,53 @@ paged_table(med_asv_csv, options = list(rows.print = 10))
 ```
 <br>
 
+### Number of pcASVs per clustering percentage
+
+<br>
+
+```{r prot_number, echo=FALSE}
+pn=read.csv("number_per_percentage_prot.csv", header=TRUE)
+#pn=read.csv("number_per_percentage_prot.csv", header=T)
+pnp <- plot_ly(pn, x=pn[,1], y=pn[,2], type="scatter", mode = 'lines+markers', marker=list(color='#088da5', line=list(color = 'black',
+               width = .1)), hovertemplate = paste('ID%: %{x}','<br>Number of pcASVs: %{y}','<extra></extra>'))
+pnp <- pnp %>% layout(yaxis = list(title = "Number of pcASVs"))
+pnp <- pnp %>% layout(xaxis = list(title = "Clustering ID %"))
+pnp <- pnp %>% config(toImageButtonOptions=list(format='svg',filename='Protclustresults', height= 500, width= 800, scale= 1))
+pnp
+```
+NOTE: The "1" represents the number of AminoTypes which are unique amino acid sequences (AminoTypes) in your dataset
+
+<br>
+<div class="rectangle"><h2 style="color:white">&nbsp;&nbsp;AminoType Pairwise Distance Heatmap</h2></div>
+
+<br>
+<br>
+```{r aminoheatmap, echo=FALSE}
+simmatrix<- read.csv("amino_matrix.txt", header=FALSE)
+rownames(simmatrix) <- simmatrix[,1]
+simmatrix <- simmatrix[,-1]
+colnames(simmatrix) <-rownames(simmatrix)
+cols <- dim(simmatrix)[2]
+simmatrix$AA <- rownames(simmatrix)
+rval=nrow(simmatrix)
+simmatrix2 <- simmatrix %>%
+  gather(1:rval, key=sequence, value=Distance)
+x=reorder(simmatrix2$AA,simmatrix2$Distance)
+y=reorder(simmatrix2$sequence,simmatrix2$Distance)
+similaritymatrix <- ggplot(simmatrix2, aes(x=x, y=y,fill=Distance))+
+      geom_raster()+
+      scale_fill_distiller(palette="Spectral")+
+      theme(axis.text.x = element_text(angle = 90))+
+      theme(axis.title.x=element_blank())+
+      theme(axis.title.y=element_blank())
+
+heat <- ggplotly(similaritymatrix)
+heat <- heat %>% config(toImageButtonOptions=list(format='svg',filename='heatmap', height= 500, width= 800, scale= 1))
+heat
+```
+<br>
+<br>
+
 ### AminoTypes Shannon Entropy Analysis (https://merenlab.org/2012/05/11/oligotyping-pipeline-explained/)
 
 <br>