---
title: "Depeters Study 2018"
author: "Jill Hagey"
date: "Started: 10/5/18, Completed: 9/XX/2020"
output: 
  html_document:
    theme: spacelab 
    toc: true
    toc_depth: 2
    toc_float: true
    df_print: paged
    highlight: espresso
---

#Research questions
The primary goal of this study was to compare grab samples, stomach tube samples, and feces to understand how the sampling method affects the microbial communities found in the samples. The current "gold standard" for surveying the rumen microbiome is a grab sample from the rumen that contains both liquid and solid particles. On a commercial dairy, fecal sampling is easy to do, and stomach tubing could be done with a little more time. If a fecal sample is not representative of the stomach tube sample, then there is no sense in using fecal sampling to monitor rumen conditions. In reality, if the stomach tube and fecal samples do not reflect the grab sample (the gold standard), then neither would be used to monitor rumen microbial health (populations).

###We seek to answer the following questions:
* How are sample types different?
    + Alpha diversity (richness and evenness)
    + Beta diversity
    + Differentially abundant and differentially variable ASVs.
* What ASVs are shared between samples of the same type?
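
As a reminder of what richness and evenness measure, here is a minimal sketch on made-up counts for a single sample. It uses plain plug-in formulas in base R; that choice is an assumption made for illustration and is not necessarily the estimation approach used later in the analysis.

```r
# Toy ASV counts for a single sample (made-up numbers).
counts <- c(ASV1 = 50, ASV2 = 30, ASV3 = 15, ASV4 = 5, ASV5 = 0)

# Observed richness: number of ASVs with at least one read.
richness <- sum(counts > 0)

# Shannon index: H = -sum(p * log(p)) over the ASVs that are present.
p <- counts[counts > 0] / sum(counts)
shannon <- -sum(p * log(p))

# Pielou's evenness: Shannon index scaled by its maximum, log(richness).
evenness <- shannon / log(richness)

round(c(richness = richness, shannon = shannon, evenness = evenness), 3)
```

Richness only counts which ASVs are present, while evenness describes how equally the reads are spread across them, so two samples can share the same richness and still differ in evenness.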

A secondary question concerns the decomposition of the grab sample into its liquid strained and solid fractions. The grab sample was pressed through cheesecloth to separate the strained liquid from the solid particulate. We will take a closer look at which communities are found in which parts of the grab sample.

###Other Questions of interest:
* Look at the feed microbiome from the two different kits (TMR_plant_kit and TMR_fecal_kit) to see if there is a relationship between the feed and sample type.
* Typically, the liquid unstrained is what we would collect from a rumen-fistulated cow and then transfaunate, using a stomach tube, into a sick cow that is experiencing simple indigestion. How does the microbial population of the liquid unstrained compare with the grab sample, liquid strained, and solid? That is to say, when we transfaunate, what microbial populations are we transferring?
* How constant is the rumen population over time in the same animal?
    + **This can't be tested, as we only have one sample per day and thus can't estimate variation within a day**
* There is one Jersey in the study. Is her microbiome different from the Holsteins'?
    + **We can't really answer this, as we only have n=1**

```{r setup, echo=FALSE, include=FALSE, warning=FALSE}
#Setting the working directory. Pick one of the two options below.
#Use this one for the lab computer
#setwd("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
#knitr::opts_knit$set(root.dir = "C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
#Hide code and warnings in the knitted report by default
knitr::opts_chunk$set(echo = FALSE, warning=FALSE)
#Use this one for my own computer
setwd("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
#Also set the knitr root directory so later chunks read from the same folder
knitr::opts_knit$set(root.dir = "C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")

#calling in custom alpha diversity plotting script
#read_chunk('C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/plot_alpha_estimates_custom.R')

#Confirm the working directory
getwd()
```

```{r loading packages, include=FALSE, error=FALSE, warning=FALSE}
#load the packages
library(dada2); packageVersion("dada2")
library(phyloseq); packageVersion("phyloseq")
library(breakaway); packageVersion("breakaway")
library(DivNet); packageVersion("DivNet")
library(corncob); packageVersion("corncob")
library(structSSI); packageVersion("structSSI")
library(ggplot2); packageVersion("ggplot2")
library(reshape2); packageVersion("reshape2")
library(plotly); packageVersion("plotly")
library(dplyr); packageVersion("dplyr")
library(tibble); packageVersion("tibble")
library(doSNOW); packageVersion("doSNOW")
library(knitr); packageVersion("knitr")
library(tidyr); packageVersion("tidyr")
library(kableExtra); packageVersion("kableExtra")
library(Biostrings); packageVersion("Biostrings")
library(ggrepel); packageVersion("ggrepel")
library(stringr); packageVersion("stringr")
library(magrittr); packageVersion("magrittr")
library(cowplot); packageVersion("cowplot")
library(xlsx); packageVersion("xlsx")
library(RColorBrewer); packageVersion("RColorBrewer")
#library(stargazer); packageVersion("stargazer")
```

Note that prior to running DADA2, sequences were cleaned with KneadData and then demultiplexed, and primers were trimmed with cutadapt. Code for this is available on my [GitHub Page](https://github.com/Jill/Depeters_RumenSampling_2018/blob/master/Clean_Up).

<!--- Calling in custom functions first --->
<!--- function to remove taxa from phyloseq object --->
```{r include=FALSE,eval=TRUE}
#running some functions I'll need for later.
#This function will return a phyloseq object with the taxa we want to keep 
pop_taxa_keep = function(physeq, goodTaxa){
  allTaxa = taxa_names(physeq)
  myTaxa <- allTaxa[(allTaxa %in% goodTaxa)]
  return(prune_taxa(myTaxa, physeq))
}
```
<!-- custom plotting of alpha diversity output --->
```{r include=FALSE,eval=TRUE}
plot_alpha_estimates_custom <- function(x, physeq = NULL, measure = NULL, facet.y=NULL, facet.x=NULL, shrink=NULL,
                                        color = NULL, shape = NULL, title = NULL, trim_plot = FALSE, ...) {
  
  if (!is.null(shrink)) { #checks to make sure x doesn't contain breakaway estimates
    name_check <- x %>% lapply(function(x) x$name) %>% unlist %>% unique
    if (any(name_check == "breakaway")) { #any() keeps the condition length one if several estimators are present
      stop("You can't shrink a plot with breakaway estimates")
    }
  }
  
  if (is.null(measure)) {
    all_measures <- x %>% lapply(function(x) x$name) %>% unlist %>% unique
    measure <- all_measures[1]
  }
  
  df <- summary(x, physeq) 
  
  if (all(is.na(df$estimate))) {
    stop("There are no estimates in this alpha_estimates object!")
  }
  if (!is.null(facet.x)) { #new
    if (facet.x %in% phyloseq::sample_variables(physeq)) {
      df[["facet.x"]] <- phyloseq::get_variable(physeq, facet.x)
    } else if (length(facet.x) == nrow(df)) {
      df[["facet.x"]] <- facet.x
    } else {
      stop("facet.x must either match a variable or be a custom vector of correct length!")
    }
  }
  if (!is.null(facet.y)) { #new
    if (facet.y %in% phyloseq::sample_variables(physeq)) {
      df[["facet.y"]] <- phyloseq::get_variable(physeq, facet.y)
    } else if (length(facet.y) == nrow(df)) {
      df[["facet.y"]] <- facet.y
    } else {
      stop("facet.y must either match a variable or be a custom vector of correct length!")
    }
  }
  if (!is.null(color)) {
    if (color %in% phyloseq::sample_variables(physeq)) {
      df[["color"]] <- phyloseq::get_variable(physeq, color)
    } else if (length(color) == nrow(df)) {
      df[["color"]] <- color
    } else {
      stop("color must either match a variable or be a custom vector of correct length!")
    }
  } 
  if (!is.null(shape)) {
    if (shape %in% phyloseq::sample_variables(physeq)) {
      df[["shape"]] <- phyloseq::get_variable(physeq, shape)
    } else if (length(shape) == nrow(df)) {
      df[["shape"]] <- shape
    } else {
      stop("shape must either match a variable or be a custom vector of correct length!")
    }
  } 
  
  yname1 <- measure
  yname2 <- x[[1]]$estimand
  if (is.null(physeq) & !is.null(rownames(df))) {
    df$sample_names <- rownames(df)
  }
  
  if (!is.null(shrink)){ #new
    warning(paste("Warning you should not shrink your graph unless you have used a covariate in the model.\nAdditionally, only shrink on the covariate"))
    ps.data <- as.data.frame(sample_data(physeq)) #pull sample data from physeq object into dataframe
    ps.data$sample_names <- sample_names(physeq) #make a column of sample names
    df[,shrink] <- ps.data[,shrink][match(df$sample_names, ps.data$sample_names)] #look up each df row in the metadata, then add the column to the divnet df
    df <- df[!duplicated(df$estimate),] #remove duplicates
  }
  
  if (is.null(shape) & is.null(color)) {
    my_gg <- ggplot2::ggplot(df)
  } else if (is.null(shape) & !is.null(color)) {
    aes_map <- ggplot2::aes_string(color = "color")
    my_gg <- ggplot2::ggplot(df, aes_map)
  } else if (!is.null(shape) & is.null(color)) {
    aes_map <- ggplot2::aes_string(shape = "shape")
    my_gg <- ggplot2::ggplot(df, aes_map)
  } else if (!is.null(shape) & !is.null(color)) {
    aes_map <- ggplot2::aes_string(color = "color", shape = "shape")
    my_gg <- ggplot2::ggplot(df, aes_map)
  }
  
  if (!is.null(shrink)){ #new 
    my_gg <- my_gg +
      ggplot2::geom_point(ggplot2::aes_string(x = shrink, y = "estimate"))+
      ggplot2::xlab("")+
      ggplot2::ylab(paste(yname1, "estimate of", yname2)) +
      ggplot2::labs(title = title, color=color) +
      ggplot2::theme_bw() +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
    
    if (!(all(is.na(df$lower)) || all(is.na(df$upper)))) {
      my_gg <- my_gg + 
        ggplot2::geom_segment(ggplot2::aes_string(x = shrink, xend = shrink, y = "lower", yend = "upper"))
    }
  } else if (is.null(shrink)) { #new
    my_gg <- my_gg +
      ggplot2::geom_point(ggplot2::aes_string(x = "sample_names", y = "estimate")) +
      ggplot2::ylab(paste(yname1, "estimate of", yname2)) +
      ggplot2::xlab("") +
      ggplot2::labs(title = title, color=color) + #added color so it will have appropriate legend title
      ggplot2::theme_bw() +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
    
  }
  if (!is.null(facet.y)) { #new
    my_gg <- my_gg +
      ggplot2::facet_grid( facet.y ~ ., scales="free", space="free_x")
  }
  
  if (!is.null(facet.x)) { #new
    #This name changing is only for this project.
    sam_names <- c(`Grab Sample` = "Grab Sample",`Feces` = "Feces",`Stomach Tube` = "Stomach Tube",`Solid` = "Solid", `Liquid Strained` = "Liquid\nStrained",`Liquid Unstrained` = "Liquid\nUnstrained")
    my_gg <- my_gg +
      ggplot2::facet_grid( . ~ facet.x, scales="free", space="free_x", labeller = as_labeller(sam_names))
  }
  
  if (is.null(shrink)) {
    if (!(all(is.na(df$lower)) || all(is.na(df$upper)))) {
      my_gg <- my_gg + 
        ggplot2::geom_segment(ggplot2::aes_string(x = "sample_names", xend = "sample_names", y = "lower", yend = "upper"))
    }                                            
  }
  
  if (!trim_plot) {
    fiven <- stats::fivenum(df$upper, na.rm = TRUE)
    iqr <- diff(fiven[c(2, 4)])
    if (!is.na(iqr)) {
      out <- df$upper < (fiven[2L] - 1.5 * iqr) | df$upper > (fiven[4L] + 1.5 * iqr)
      ylower <- min(0, 0.95*min(df$upper[!out]), na.rm = TRUE)
      yupper <- 1.05*max(df$upper[!out], na.rm = TRUE)
      
      my_gg <- my_gg +
        ggplot2::coord_cartesian(ylim = c(ylower,yupper)) 
    } 
  } 
  
  my_gg
}
```
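
The last step of the function above sets the y-axis limits from the upper confidence limits after setting aside extreme values. A minimal sketch of that rule on made-up numbers (the vector `upper` is hypothetical):

```r
# Made-up upper confidence limits; the last one is an extreme value.
upper <- c(110, 120, 125, 130, 140, 900)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
fiven <- stats::fivenum(upper)
iqr <- diff(fiven[c(2, 4)])

# Flag values more than 1.5 * IQR beyond the hinges.
out <- upper < (fiven[2] - 1.5 * iqr) | upper > (fiven[4] + 1.5 * iqr)

# y-axis limits computed from the non-flagged values only.
ylim <- c(min(0, 0.95 * min(upper[!out])), 1.05 * max(upper[!out]))
ylim
```

Because `coord_cartesian()` only zooms the view, points outside these limits are hidden from the panel but are not dropped from the data.
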
<!--- custom plotting of corncob output --->
```{r include=FALSE,eval=TRUE}
plot.differentialTest_custom <- function(x, level = NULL, cutoff=NULL, taxa_filter=NULL, ...) {
signif_taxa <- x$significant_taxa

if ("phyloseq" %in% class(x$data)) {
  if (!is.null(x$data@tax_table)) {
    signif_taxa <- otu_to_taxonomy(signif_taxa, x$data, level = level)
    if (length(unique(signif_taxa)) != length(unique(x$significant_taxa))) {
      # Make sure if repeated taxa add unique otu identifiers
      signif_taxa <- paste0(signif_taxa, " (", x$significant_taxa, ")")
    }
  }
}
if (length(x$significant_models) != 0) {
  var_per_mod <- length(x$restrictions_DA) + length(x$restrictions_DV)
  total_var_count <- length(signif_taxa) * var_per_mod
  df <- as.data.frame(matrix(NA, nrow = total_var_count, ncol = 5))
  colnames(df) <- c("x", "xmin", "xmax", "taxa", "variable")
  qval <- stats::qnorm(.975)
  restricts_mu <- attr(x$restrictions_DA, "index")
  restricts_phi <- attr(x$restrictions_DV, "index")
  
  count <- 1
  for (i in 1:length(x$significant_models)) {
    
    # Below from print_summary_bbdml, just to get coefficient names
    tmp <- x$significant_models[[i]]
    coefs.mu <- tmp$coefficients[1:tmp$np.mu,, drop = FALSE]
    rownames(coefs.mu) <- paste0(substring(rownames(coefs.mu), 4), "\nDifferential\nAbundance")
    coefs.mu <- coefs.mu[restricts_mu,, drop = FALSE]
    
    coefs.phi <- tmp$coefficients[(tmp$np.mu + 1):nrow(tmp$coefficients),, drop = FALSE]
    rownames(coefs.phi) <- paste0(substring(rownames(coefs.phi), 5), "\nDifferential Variability")
    coefs.phi <- coefs.phi[restricts_phi - tmp$np.mu,, drop = FALSE]
    
    coefs <- rbind(coefs.mu, coefs.phi)
    for (j in 1:var_per_mod) {
      df[count, 1:3] <- c(coefs[j, 1], coefs[j, 1] - qval * coefs[j, 2],
                          coefs[j, 1] + qval * coefs[j, 2])
      df[count, 4:5] <- c(signif_taxa[i], rownames(coefs)[j])
      count <- count + 1
    }
  }
    df$Phylum <- str_extract(df$taxa, ".+?(?<=_)")
    df$Phylum <- gsub("_", "", df$Phylum)
    df$variable <- gsub("Sample_Type", "", df$variable)

  if (!is.null(taxa_filter)) {
    
    df_filtered <- df %>% filter(Phylum == taxa_filter)
    df_filtered$taxa <- gsub(paste(taxa_filter,"_", sep=""), "", df_filtered$taxa)
    
    #need to check if all taxa have all the sample types with them? 
    
    #global variables warning suppression
    taxa <- xmin <- xmax <- NULL
  
    ggplot2::ggplot(df_filtered, ggplot2::aes(x = x, y = taxa)) +
    ggplot2::geom_vline(xintercept = 0, color = "gray50", lty = "dashed", alpha = 0.75, lwd = 1) +
    ggplot2::geom_point() +
    ggplot2::geom_errorbarh(ggplot2::aes(xmin = xmin, xmax = xmax), height = .3) +
    ggplot2::theme_bw() +
    ggplot2::facet_wrap(~variable, scales = "free_x", nrow = 1) +
    ggplot2::labs(title = "", x = "", y = "Taxa") +
    ggplot2::scale_y_discrete(limits = rev(sort(unique(df_filtered$taxa)))) +
    ggplot2::scale_x_continuous(breaks = scales::pretty_breaks(n = 5)) +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
    #ggplot2::geom_tile(aes(fill=Phylum))
     } 
  else if(is.null(taxa_filter)) {
    
    # global variables warning suppression
    taxa <- xmin <- xmax <- NULL
    
    ggplot2::ggplot(df, ggplot2::aes(x = x, y = taxa)) +
      ggplot2::geom_vline(xintercept = 0, color = "gray50", lty = "dashed", alpha = 0.75, lwd = 1) +
      ggplot2::geom_point() +
      ggplot2::geom_errorbarh(ggplot2::aes(xmin = xmin, xmax = xmax), height = .3) +
      ggplot2::theme_bw() +
      ggplot2::facet_wrap(~variable, scales = "free_x", nrow = 1) +
      ggplot2::labs(title = "", x = "", y = "Taxa") +
      ggplot2::scale_y_discrete(limits = rev(df$taxa)) +
      ggplot2::scale_x_continuous(breaks = scales::pretty_breaks(n = 5)) +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
    #ggplot2::geom_tile(aes(x=,fill=Phylum))
   }
  }
}
```

```{r include=FALSE,eval=TRUE}
plot.differentialTest_custom_color <- function(x, level = NULL, cutoff=NULL, taxa_filter=NULL, color=NULL, ...) {
  signif_taxa <- x$significant_taxa
  
if ("phyloseq" %in% class(x$data)) {
  if (!is.null(x$data@tax_table)) {
    signif_taxa <- otu_to_taxonomy(signif_taxa, x$data, level = level)
    if (length(unique(signif_taxa)) != length(unique(x$significant_taxa))) {
      # Make sure if repeated taxa add unique otu identifiers
      signif_taxa <- paste0(signif_taxa, " (", x$significant_taxa, ")")
    }
  }
}
if (length(x$significant_models) != 0) {
  var_per_mod <- length(x$restrictions_DA) + length(x$restrictions_DV)
  total_var_count <- length(signif_taxa) * var_per_mod
  df <- as.data.frame(matrix(NA, nrow = total_var_count, ncol = 5))
  colnames(df) <- c("x", "xmin", "xmax", "taxa", "variable")
  qval <- stats::qnorm(.975)
  restricts_mu <- attr(x$restrictions_DA, "index")
  restricts_phi <- attr(x$restrictions_DV, "index")
    
  count <- 1
  for (i in 1:length(x$significant_models)) {
      
    # Below from print_summary_bbdml, just to get coefficient names
    tmp <- x$significant_models[[i]]
    coefs.mu <- tmp$coefficients[1:tmp$np.mu,, drop = FALSE]
    rownames(coefs.mu) <- paste0(substring(rownames(coefs.mu), 4), "\nDifferential\nAbundance")
    coefs.mu <- coefs.mu[restricts_mu,, drop = FALSE]
      
    coefs.phi <- tmp$coefficients[(tmp$np.mu + 1):nrow(tmp$coefficients),, drop = FALSE]
    rownames(coefs.phi) <- paste0(substring(rownames(coefs.phi), 5), "\nDifferential Variability")
    coefs.phi <- coefs.phi[restricts_phi - tmp$np.mu,, drop = FALSE]
      
    coefs <- rbind(coefs.mu, coefs.phi)
    for (j in 1:var_per_mod) {
      df[count, 1:3] <- c(coefs[j, 1], coefs[j, 1] - qval * coefs[j, 2],
                          coefs[j, 1] + qval * coefs[j, 2])
      df[count, 4:5] <- c(signif_taxa[i], rownames(coefs)[j])
      count <- count + 1
    }
  }
  df$Phylum <- str_extract(df$taxa, ".+?(?<=_)")
  df$Phylum <- gsub("_", "", df$Phylum)
  df$variable <- gsub("Sample_Type", "", df$variable)
if (is.null(color)) {
    df$taxa <- str_replace(df$taxa, paste0(df$Phylum, "_", sep=""),"")
}
if (!is.null(taxa_filter)) {
    
    df_filtered <- df %>% filter(Phylum == taxa_filter)
    df_filtered$taxa <- gsub(paste(taxa_filter,"_", sep=""), "", df_filtered$taxa)
    print(head(df_filtered))
    
      #need to check if all taxa have all the sample types with them? 
      
      #global variables warning suppression
    taxa <- xmin <- xmax <- NULL
      
    ggplot2::ggplot(df_filtered, ggplot2::aes(x = x, y = taxa, color=color)) +
      ggplot2::geom_vline(xintercept = 0, color = "gray50", lty = "dashed", alpha = 0.75, lwd = 1) +
      ggplot2::geom_point() +
      ggplot2::geom_errorbarh(ggplot2::aes(xmin = xmin, xmax = xmax), height = .3) +
      ggplot2::theme_bw() +
      ggplot2::facet_wrap(~variable, scales = "free_x", nrow = 1) +
      ggplot2::labs(title = "", x = "", y = "Taxa") +
      ggplot2::scale_y_discrete(limits = rev(sort(unique(df_filtered$taxa)))) +
      ggplot2::scale_x_continuous(breaks = scales::pretty_breaks(n = 5)) +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
    #ggplot2::geom_tile(aes(fill=Phylum))
  } 
  else if(is.null(taxa_filter)) {
    print(head(df))
      # global variables warning suppression
    taxa <- xmin <- xmax <- NULL
    
    ggplot2::ggplot(df, ggplot2::aes(x = x, y = taxa, color=Phylum)) +
      ggplot2::geom_vline(xintercept = 0, color = "gray50", lty = "dashed", alpha = 0.75, lwd = 1) +
      ggplot2::geom_point() +
      ggplot2::geom_errorbarh(ggplot2::aes(xmin = xmin, xmax = xmax), height = .3) +
      ggplot2::theme_bw() +
      ggplot2::facet_wrap(~variable, scales = "free_x", nrow = 1) +
      ggplot2::labs(title = "", x = "", y = "Taxa") +
      ggplot2::scale_y_discrete(limits = rev(df$taxa)) +
      ggplot2::scale_x_continuous(breaks = scales::pretty_breaks(n = 5)) +
      ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
      #ggplot2::geom_tile(aes(x=,fill=Phylum))
    }
  }
}
```

<!--- getting df of corncob output --->
```{r include=FALSE,eval=TRUE}
get_data_CC <- function(x, taxa_filter=NULL,  ...) {
  signif_taxa <- x$significant_taxa
  
  if ("phyloseq" %in% class(x$data)) {
    if (!is.null(x$data@tax_table)) {
      signif_taxa <- otu_to_taxonomy(signif_taxa, x$data)
      signif_taxa <- paste0(signif_taxa, " (", x$significant_taxa, ")")
    }
  }
  if (length(x$significant_models) != 0) {
    var_per_mod <- length(x$restrictions_DA) + length(x$restrictions_DV)
    total_var_count <- length(signif_taxa) * var_per_mod
    df <- as.data.frame(matrix(NA, nrow = total_var_count, ncol = 5))
    colnames(df) <- c("x", "xmin", "xmax", "taxa", "variable")
    qval <- stats::qnorm(.975)
    restricts_mu <- attr(x$restrictions_DA, "index")
    restricts_phi <- attr(x$restrictions_DV, "index")
    
    count <- 1
    for (i in 1:length(x$significant_models)) {
      
      # Below from print_summary_bbdml, just to get coefficient names
      tmp <- x$significant_models[[i]]
      coefs.mu <- tmp$coefficients[1:tmp$np.mu,, drop = FALSE]
      rownames(coefs.mu) <- paste0(substring(rownames(coefs.mu), 4), "\nDifferential Abundance")
      coefs.mu <- coefs.mu[restricts_mu,, drop = FALSE]
      
      coefs.phi <- tmp$coefficients[(tmp$np.mu + 1):nrow(tmp$coefficients),, drop = FALSE]
      rownames(coefs.phi) <- paste0(substring(rownames(coefs.phi), 5), "\nDifferential Variability")
      coefs.phi <- coefs.phi[restricts_phi - tmp$np.mu,, drop = FALSE]
      
      coefs <- rbind(coefs.mu, coefs.phi)
      for (j in 1:var_per_mod) {
        df[count, 1:3] <- c(coefs[j, 1], coefs[j, 1] - qval * coefs[j, 2],
                            coefs[j, 1] + qval * coefs[j, 2])
        df[count, 4:5] <- c(signif_taxa[i], rownames(coefs)[j])
        count <- count + 1
      }
    }
    #df$Phylum <- str_extract(df$taxa, "_.+?(?<=_)")
    #df$Phylum <- gsub("_", "", df$Phylum)
    df$variable <- gsub("Sample_Type", "", df$variable)
    df$ASV <- gsub(".*\\((.*)\\).*", "\\1", df$taxa)
    #print(head(df))
    #df$Family <- gsub("(?<=_)(.*?)(?=_)", "\\4", df$taxa)
    #df$Family <- gsub("_", "", df$Family)
    #print(head(df))
    df$Phylum <- otu_to_taxonomy(df$ASV, x$data, level = "Phylum")
    df$Family <- otu_to_taxonomy(df$ASV, x$data, level = "Family")
    df$Genus <- otu_to_taxonomy(df$ASV, x$data, level = "Genus")
    return(df)
  }
}
```
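
The `x`, `xmin`, and `xmax` columns built in the functions above are a point estimate with a normal-approximation 95% interval (estimate plus or minus `qnorm(.975)` times the standard error). A small sketch with a made-up coefficient table; the row names and values are hypothetical:

```r
# Made-up coefficient table in the layout the functions above index into:
# column 1 = estimate, column 2 = standard error.
coefs <- matrix(c( 0.80, 0.20,
                  -0.30, 0.25),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("mu.Sample_TypeFeces", "phi.Sample_TypeFeces"),
                                c("Estimate", "Std. Error")))

qval <- stats::qnorm(0.975)  # about 1.96 for a two-sided 95% interval

ci <- data.frame(
  x    = coefs[, 1],
  xmin = coefs[, 1] - qval * coefs[, 2],
  xmax = coefs[, 1] + qval * coefs[, 2]
)

# An interval that excludes zero suggests a nonzero effect at the 5% level.
ci$excludes_zero <- ci$xmin > 0 | ci$xmax < 0
ci
```

This is why the plots draw a dashed vertical line at zero: error bars that cross it correspond to intervals that include no effect.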

<!--- getting model out of corncob data --->
```{r}
get_model_CC <- function(x, ASV,  ...) {
  models <- x$significant_models
  models[[grep(ASV, x$significant_taxa)]]
}
```
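
One thing to keep in mind when calling `get_model_CC()`: `grep()` does pattern matching, so a short ASV name can hit several entries, and `[[` will then not select a single model. A quick illustration with made-up ASV names:

```r
# Made-up ASV names for illustration.
significant_taxa <- c("ASV1", "ASV10", "ASV100", "ASV2")

# grep() does pattern matching, so "ASV1" also hits "ASV10" and "ASV100".
partial_hits <- grep("ASV1", significant_taxa)

# match() compares whole strings and returns a single position (or NA).
exact_hit <- match("ASV1", significant_taxa)

# Anchoring the pattern is another way to get a whole-string match.
anchored_hit <- grep("^ASV1$", significant_taxa)
```

Passing an anchored pattern such as `"^ASV1$"` to the function is one way to stay safe without changing it.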

<!--- function to compare two phyloseq objects --->
```{r}
compare_phyloseq_taxa = function(physeq1, physeq2, taxa_level){
  #unique taxa at the requested rank in each phyloseq object
  taxa1 <- get_taxa_unique(physeq1, taxa_level)
  taxa2 <- get_taxa_unique(physeq2, taxa_level)
  if (identical(taxa1, taxa2)) {
    print("There are no taxa differences at this level")
  } else {
    print("These taxa are found in both phyloseq objects")
    print(taxa1[taxa1 %in% taxa2])
    print("These taxa are found in the first phyloseq object but not the second")
    print(taxa1[!(taxa1 %in% taxa2)])
  }
}
```
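
The comparison above boils down to set operations on two vectors of taxon names. A base R sketch with made-up membership (the two vectors are hypothetical and do not come from the data):

```r
# Made-up phylum lists for two sample types.
rumen_taxa <- c("Firmicutes", "Bacteroidetes", "Fibrobacteres")
feces_taxa <- c("Firmicutes", "Bacteroidetes", "Proteobacteria")

shared     <- intersect(rumen_taxa, feces_taxa)  # found in both
only_rumen <- setdiff(rumen_taxa, feces_taxa)    # in the first, not the second
only_feces <- setdiff(feces_taxa, rumen_taxa)    # in the second, not the first
```

Note that `setdiff()` is directional, so both directions are needed to see everything that differs.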

<!--- Get abundance and SEM --->
```{r}
#only works on phylum
ps_ave_abu_phy = function(physeq){
#calculating error bars to graph mean transformed abundance of major phyla
melted <- psmelt(physeq)
grouped <- dplyr::group_by(melted[!is.na(melted$Phylum),], Sample_Type, Phylum)
phyla_five <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))
}
#only works on family
ps_ave_abu_fam = function(physeq){
#calculating error bars to graph mean transformed abundance of major families
melted <- psmelt(physeq)
grouped <- dplyr::group_by(melted[!is.na(melted$Family),], Sample_Type, Family)
fam_five <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))
}
#only works on genus
ps_ave_abu_gen = function(physeq){
#calculating error bars to graph mean transformed abundance of major genera
melted <- psmelt(physeq)
grouped <- dplyr::group_by(melted[!is.na(melted$Genus),], Sample_Type, Genus)
gen_five <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))
}
```
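
For reference, the standard error of the mean (SEM) used for the error bars is the sample standard deviation divided by the square root of the number of observations. Here is a base R sketch on made-up abundances; the data frame only imitates the shape of `psmelt()` output, and `aggregate()` is swapped in for the dplyr calls above:

```r
# Made-up long-format abundances, similar in shape to psmelt() output.
melted <- data.frame(
  Sample_Type = rep(c("Feces", "Solid"), each = 4),
  Phylum      = "Firmicutes",
  Abundance   = c(0.40, 0.44, 0.36, 0.40, 0.50, 0.58, 0.54, 0.46)
)

# Standard error of the mean: sample standard deviation over sqrt(n).
sem <- function(x) sd(x) / sqrt(length(x))

summary_df <- aggregate(
  Abundance ~ Sample_Type + Phylum,
  data = melted,
  FUN = function(x) c(mean = mean(x), sd = sd(x), sem = sem(x))
)
# aggregate() returns the three statistics as a matrix column; flatten it.
summary_df <- do.call(data.frame, summary_df)
summary_df
```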

<!--- setting custom colors for plotting --->
```{r include=FALSE,eval=TRUE}
myColors <- brewer.pal(6, "Dark2")
names(myColors) <- c("Stomach Tube","Grab Sample","Liquid Strained","Feces","Liquid Unstrained","Solid")

myColors_DPCoA <- c("#666666", "#1B9E77","#D95F02", "#7570B3", "#E7298A", "#66A61E","#E6AB02")
names(myColors_DPCoA) <- c("Taxa","Stomach Tube","Grab Sample","Liquid Strained","Feces","Liquid Unstrained","Solid")
```

#Running DADA2 to get ASVs and assign taxonomy.

DADA2 infers exact amplicon sequence variants (ASVs) from amplicon data, resolving biological differences of even 1 or 2 nucleotides. This algorithm is preferred because DADA2 reports fewer false-positive sequence variants than other methods report false OTUs. Note that this step is computationally expensive, so it was run on a cluster and the resulting R objects are read in here.

First we will read in the data and trim ends where there is poor quality.

```{r Running DADA2, eval=FALSE}
# CHANGE ME to the directory containing the fastq files after unzipping.
path <- "C:/Users/Jill/Desktop/Depeters/" 
list.files(path)
# Forward and reverse fastq filenames end in _Trim_R1.fastq.gz and _Trim_R2.fastq.gz
fnFs <- sort(list.files(path, pattern="_Trim_R1.fastq.gz", full.names = TRUE))
fnRs <- sort(list.files(path, pattern="_Trim_R2.fastq.gz", full.names = TRUE))
# Extract sample names, assuming filenames have format: SAMPLENAME_XXX.fastq
sample.names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)
plotQualityProfile(fnFs[1:10])
plotQualityProfile(fnRs[1:10])

#Place filtered files in filtered/subdirectory
filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(240,220),trimLeft=c(10,0),
                     maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, minLen=150,
                     compress=TRUE, multithread=FALSE, verbose=TRUE) 
head(out)
#check quality again after trimming
plotQualityProfile(filtFs[10:20])
plotQualityProfile(filtRs[10:20])
```
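
The `maxEE=c(2,2)` setting above discards reads with more than 2 "expected errors" after truncation. Expected errors follow from the definition of the quality score, EE = sum(10^(-Q/10)). A small sketch with made-up quality scores for one read:

```r
# Made-up Phred quality scores for one short read.
q <- c(38, 38, 37, 35, 30, 30, 20, 20, 10, 10)

# Per-base error probability from the Phred definition: p = 10^(-Q/10).
p_err <- 10^(-q / 10)

# Expected number of errors in the read is the sum of those probabilities.
ee <- sum(p_err)

# The read would pass a maxEE threshold of 2 if ee <= 2.
passes <- ee <= 2
```

A few low-quality bases dominate the sum, which is why truncating the poor-quality tail with `truncLen` lets more reads pass the filter.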

The next steps learn the error rates of the data and identify unique sequences. These data are fed into the main DADA2 algorithm, which produces a table of ASVs. Reads are merged and chimeras removed prior to making the final ASV table. Taxonomy was assigned using the Silva database.

```{r eval=FALSE}
#learn errors for the DADA2 algorithm
errF <- learnErrors(filtFs, multithread=FALSE)
errR <- learnErrors(filtRs, multithread=FALSE)

plotErrors(errF, nominalQ=TRUE)

derepFs <- derepFastq(filtFs, verbose=TRUE)
derepRs <- derepFastq(filtRs, verbose=TRUE)

# Name the derep-class objects by the sample names
names(derepFs) <- sample.names
names(derepRs) <- sample.names
#run the DADA2 algorithm
dadaFs <- dada(derepFs, err=errF, multithread=FALSE, pool=TRUE)
dadaRs <- dada(derepRs, err=errR, multithread=FALSE, pool=TRUE)
#checking output
dadaFs[[1]]
#Merging forward and Reverse Reads
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
# Inspect the merger data.frame from the first sample
head(mergers[[1]])
#Construct Sequence Table
seqtab <- makeSequenceTable(mergers)
dim(seqtab)
#Removing chimeras
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=FALSE, verbose=TRUE)
dim(seqtab.nochim)
#Assign taxonomy with the RDP training set, then add species-level assignments
taxa_rdp <- assignTaxonomy(seqtab.nochim, "/share/tearlab/Maga/Jill/rdp_train_set_16.fa.gz", multithread=TRUE)
saveRDS(taxa_rdp, "/share/tearlab/Maga/Jill/16s_Milk_2016/DADA2/taxa_rdp.rds")
taxa.sp_rdp <- addSpecies(taxa_rdp, "/share/tearlab/Maga/Jill/rdp_species_assignment_16.fa.gz")
saveRDS(taxa.sp_rdp, "/share/tearlab/Maga/Jill/16s_Milk_2016/DADA2/taxa.sp_rdp.rds")
#Assign taxonomy with the Silva training set, then add species-level assignments
taxa_silva <- assignTaxonomy(seqtab.nochim, "/share/tearlab/Maga/Jill/silva_nr_v132_train_set.fa.gz", multithread=TRUE)
saveRDS(taxa_silva, "/share/tearlab/Maga/Jill/16s_Milk_2016/DADA2/taxa_silva.rds")
taxa.sp_silva <- addSpecies(taxa_silva, "/share/tearlab/Maga/Jill/silva_species_assignment_v132.fa.gz")
saveRDS(taxa.sp_silva, "/share/tearlab/Maga/Jill/16s_Milk_2016/DADA2/taxa.sp_silva.rds")
```

Getting information out of DADA2 Objects.

```{r Getting info out of DADA2, eval=FALSE, include=TRUE}
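#asv_seqs and asv_headers are not defined elsewhere in this document.
#The two lines below are an assumed reconstruction; the ">ASV_1" style of naming is a guess.
asv_seqs <- colnames(seqtab.nochim)
asv_headers <- paste0(">ASV_", seq_along(asv_seqs))
#making and writing out a fasta of our final ASV seqs: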
#This fasta will also be used for making a tree...
asv_fasta <- c(rbind(asv_headers, asv_seqs))
write(asv_fasta, "ASVs.fa")
#count table:
asv_tab <- t(seqtab.nochim)
row.names(asv_tab) <- sub(">", "", asv_headers)
write.table(asv_tab, "ASVs_counts.txt", sep="\t", quote=F)
#tax table:
asv_tax <- taxa.sp_silva #originally sil_taxa.sp, which is not created above; assumed to be the Silva species-level table
row.names(asv_tax) <- sub(">", "", asv_headers)
write.table(asv_tax, "ASVs_taxonomy.txt", sep="\t", quote=F)
```

Let's check the sizes of the sequences as a way to determine contamination.

```{r}
setwd("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
seqtab.nochim <- readRDS("seqtab.nochim.rds")
#Inspect distribution of sequence lengths
table(nchar(getSequences(seqtab.nochim)))
median(nchar(getSequences(seqtab.nochim))) #median over all ASVs rather than over the distinct length values
```

These sequences have a median length of `r median(nchar(getSequences(seqtab.nochim)))` bp, and most are less than 390 bp. The longer sequences may be the result of non-specific priming. We will look at this again after specific and thoughtful filtering. If long sequences remain after filtering, we will look at them more closely to make sure they are in fact of bacterial origin.

Now we check the number of chimeras in the dataset.

```{r DADA2 chimeria stats}
setwd("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
seqtab <- readRDS("seqtab.rds")
#Fraction of reads retained after chimera removal; 1 minus this value is the chimeric fraction
sum(seqtab.nochim)/sum(seqtab)
```

Here we see that 2.01% of the sequences were identified as chimeras and were removed from the dataset. Next, we will have a look at the read stats.
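
To make the arithmetic explicit: the ratio computed in the chunk is the fraction of reads kept, so the chimeric fraction is one minus that ratio. A sketch with made-up count tables (the `_toy` objects are hypothetical):

```r
# Made-up sample-by-sequence count tables (rows = samples, columns = ASVs).
seqtab_toy <- matrix(c(500, 300, 150, 50,
                       400, 350, 200, 50),
                     nrow = 2, byrow = TRUE)

# Suppose the fourth sequence was flagged as chimeric and dropped.
seqtab_nochim_toy <- seqtab_toy[, 1:3]

retained <- sum(seqtab_nochim_toy) / sum(seqtab_toy)  # fraction of reads kept
chimeric <- 1 - retained                               # fraction of reads removed

# Chimeras can be a larger share of unique sequences than of reads.
chimeric_seqs <- 1 - ncol(seqtab_nochim_toy) / ncol(seqtab_toy)
```

In this toy case only 5% of reads are chimeric even though one in four unique sequences is, because chimeric sequences tend to be low in abundance.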

```{r DADA2 stats, eval=TRUE}
##Examining read counts through the pipeline.
#I still need to add in sample names
dadaFs <- readRDS("dadaFs.rds")
dadaRs <- readRDS("dadaRs.rds")
mergers <- readRDS("mergers.rds")
out <- readRDS("out.rds")
#Tracking read count through pipeline
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
track
```

This shows the library sizes of the samples and how many reads were removed at each step. A total of `r sum(track[,"input"])` cleaned reads entered the DADA2 pipeline. We will now get read stats for the input libraries.
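
A table like `track` is often easier to read as percentages of the input. A minimal sketch on made-up counts that use the same column names (`track_toy` is hypothetical):

```r
# Made-up read counts in the same column layout as `track`.
track_toy <- cbind(
  input     = c(10000, 12000),
  filtered  = c(9000, 10800),
  denoisedF = c(8800, 10500),
  denoisedR = c(8700, 10400),
  merged    = c(8000, 9600),
  nonchim   = c(7800, 9300)
)
rownames(track_toy) <- c("sample_1", "sample_2")

# Percent of input reads remaining after each step, per sample.
# Dividing a matrix by a vector of length nrow() scales each row by its own input.
pct_retained <- round(100 * track_toy / track_toy[, "input"], 1)
pct_retained
```

A sharp drop at a single step (for example at merging) would point to where the settings may need another look.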

```{r}
#Getting total read number
sum(track[,1])
#Get info on depth of sequencing for samples
data.frame("Min" = min(track[,"input"]),"Max" = max(track[,"input"]),"Mean" = mean(track[,"input"]),
           "Range" = diff(range(track[,"input"])), "Median" = median(track[,"input"])) #diff() keeps the summary on one row
```

We compare this to the read stats for the final libraries.

```{r}
#Get info on depth of sequencing for samples
data.frame("Min" = min(track[,"nonchim"]),"Max" = max(track[,"nonchim"]),"Mean" = mean(track[,"nonchim"]),
           "Range" = diff(range(track[,"nonchim"])), "Median" = median(track[,"nonchim"])) #diff() keeps the summary on one row
```
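
Since the same summary is computed for two stages, it could be wrapped in a small helper so both land in one table. This is only a sketch: `depth_summary()` and the toy counts are hypothetical, and Range is taken here as max minus min.

```r
# Hypothetical helper (not part of the original analysis).
depth_summary <- function(x) {
  data.frame(Min = min(x), Max = max(x), Mean = mean(x),
             Range = diff(range(x)), Median = median(x))
}

# Made-up per-sample read counts at two stages.
toy <- cbind(input = c(10000, 12000, 15000), nonchim = c(7800, 9300, 12600))

# One row per stage.
depth_table <- do.call(rbind, lapply(colnames(toy), function(stage) {
  cbind(Stage = stage, depth_summary(toy[, stage]))
}))
depth_table
```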

##Making phyloseq object

```{r Making phyloseq object}
setwd("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/")
asv_tab <- readRDS("asv_tab.rds")
asv_tax <- readRDS("asv_tax.rds") 
#had the following taxa that rdp didn't Entotheonellaeota, Epsilonbacteraeota, Gemmatimonadetes, Kiritimatiellaeota, Patescibacteria, BRC1 it doesn't have SR1 or Candidatus_Saccharibacteria though
TREE <- read_tree("dada2_seqs.tre")
MAP <- import_qiime_sample_data("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/Mapping_File_MMDR.txt")
MAP$X.SampleID <- paste0("sample_", MAP$X.SampleID) #add sample_ to SampleID column
ps <- phyloseq(otu_table(asv_tab, taxa_are_rows=TRUE), sample_data(MAP), tax_table(asv_tax), phy_tree(TREE))
sample_data(ps)$Sample_Type <- gsub("_"," ",sample_data(ps)$Sample_Type)
sample_data(ps)$CowID <- paste0("Cow_", sample_data(ps)$CowID) #corncob doesn't like numbers for factors
#sample_names(ps) <- paste0("sample_", sample_names(ps)) #Divnet/DPCoA don't like numbers for samples
ps
```

#Cleaning data
Currently, we are starting with 5,607 ASVs from 70 samples.

```{r}
ps_kit <- ps #saving copy for later
ps <- subset_samples(ps, Sample_Type != c("TMR fecal kit"))
ps <- subset_samples(ps, Sample_Type != c("TMR plant kit"))
ps
```

First, we remove the kit samples to bring us down to 68 samples. We will look at these again later. We will also remove ASVs that aren't present in any samples.

```{r Removing empty ASVs, include=FALSE}
#Checking for empty samples, i.e. samples with no reads associated with them (should be "FALSE").
any(sample_sums(ps) == 0)
#Checking if there are ASVs that aren't present in any remaining sample ("TRUE" means there are ASVs to prune)
any(taxa_sums(ps) == 0)
#Determining how many ASVs there are that aren't present in any sample
sum(taxa_sums(ps) == 0)
#removing ASVs that aren't present in any samples
ps <- prune_taxa(taxa_sums(ps) > 0, ps)
ps
```

There were no empty samples, which is what we want. However, 16 ASVs were not present in any of the remaining samples, and these were removed.

#More cleaning of data

To start cleaning the data we will look at the number of ASVs assigned to each phylum.

```{r clean 1}
#Create table, number of features for each phyla
table(tax_table(ps)[, "Phylum"], exclude = NULL)
```

Next we will count the reads from ASVs that aren't assigned to a phylum in each sample type.
```{r clean 2}
#checking which sample types contain reads from ASVs with no phylum assignment
psNA <- subset_taxa(ps, is.na(Phylum))
psNA <- prune_taxa(taxa_sums(psNA) > 0, psNA)
psNA_tab <- melt(colSums(psNA@otu_table), value.name="ASVs")
psNA_tab[,"Sample_Type"] <- psNA@sam_data$Sample_Type
psNA_tab %>% group_by(Sample_Type) %>% summarise(sum(ASVs))
```

There are 94 ASVs that could not be assigned to a phylum. These unassigned taxa are found in all sample types, with most of the unassigned reads in solid samples. *NOTE that the sum column is reads, not the number of ASVs!* Next, we make a fasta file of these unknown taxa from the phyloseq object so that we can BLAST them later.

```{r clean 3, include=FALSE}
taxa_sp <- readRDS("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/sil_tax_sp_final.rds")
ps3 <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE), sample_data(MAP), tax_table(taxa_sp))
ps3 <- subset_samples(ps3, Sample_Type != c("TMR_fecal_kit"))
ps3 <- subset_samples(ps3, Sample_Type != c("TMR_plant_kit"))
#ps3 <- subset_taxa(ps3, !Order %in% "Chloroplast")
ps3 <- prune_taxa(taxa_sums(ps3) > 0, ps3)
ps3 <- subset_taxa(ps3, is.na(Phylum))
```

```{r Rechecking stats output}
#Getting our seqs out
asv_seqs <- colnames(otu_table(ps3))
#Making fasta file
#giving our seq headers more manageable names (>Seq_1, >Seq_2...)
asv_headers <- paste0(">Seq_", seq_along(asv_seqs))
#making and writing out a fasta of our final seqs:
asv_fasta <- c(rbind(asv_headers, asv_seqs))
write(asv_fasta, "ASVs_Unknowns.fa")
```

Getting back to our original phyloseq object: the 94 ASVs that weren't assigned to a phylum are removed before analysis. This leaves 5,497 ASVs.

```{r}
#Removing ambiguous phylum annotation
#This changes ASVs from 5,591 to 5,497
ps <- subset_taxa(ps, !is.na(Phylum) & !Phylum %in% c("", "uncharacterized"))
ps
```

Next we will compute the total and average prevalences of the ASVs in each phylum. We are defining prevalence as the number of samples in which a taxon appears at least once.

```{r}
#Compute prevalence of each feature, store as data.frame
#prevalence in the dataset we will define here as the number of samples in which a taxon appears at least once
prevdf = apply(X = otu_table(ps),
               MARGIN = ifelse(taxa_are_rows(ps), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})
#Add taxonomy and total read counts to this data.frame
prevdf = data.frame(Prevalence = prevdf, TotalAbundance = taxa_sums(ps), tax_table(ps))
#display table
plyr::ddply(prevdf, "Phylum", function(df1){data.frame(MeanPrevalence = mean(df1$Prevalence), TotalPrevalence = sum(df1$Prevalence))}) %>%
#making table of mean and total ASV prevalence per phylum
kable(caption="Prevalence of Phyla") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```
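As a concrete illustration of the prevalence definition used above (the number of samples in which a taxon appears at least once), here is a small sketch on a toy count matrix. The counts and phylum labels are invented, and base R `tapply()` stands in for the `plyr::ddply()` call.

```r
# Toy ASV-by-sample counts, taxa as rows (hypothetical numbers)
toy_counts <- matrix(c(5, 0, 3,
                       0, 0, 1,
                       2, 4, 6),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("asv1", "asv2", "asv3"),
                                     c("s1", "s2", "s3")))
toy_phylum <- c(asv1 = "Firmicutes", asv2 = "Firmicutes", asv3 = "Bacteroidetes")

# Prevalence: in how many samples does each ASV have at least one read?
prevalence <- apply(toy_counts, 1, function(x) sum(x > 0))

# Mean and total prevalence of the ASVs within each phylum
mean_prev  <- tapply(prevalence, toy_phylum, mean)
total_prev <- tapply(prevalence, toy_phylum, sum)
rbind(mean_prev, total_prev)
```

A phylum with a low total prevalence is represented by few ASVs in few samples, which is why those phyla get a closer look before any filtering decision.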

Here we see that Deferribacteres and Gemmatimonadetes each have only one ASV, so we will take a quick look at them.

```{r Explore phyla, warning=FALSE}
#Making phyloseq object with Gemmatimonadetes
ps_explore <- subset_taxa(ps, Phylum == c("Gemmatimonadetes"))
ps_explore <- prune_samples(sample_sums(ps_explore) > 0, ps_explore)
ps_explore@sam_data$Sample_Type
#which sample is it found in
otu_table(ps_explore)

#Making phyloseq object with Deferribacteres
ps_explore <- subset_taxa(ps, Phylum == c("Deferribacteres"))
ps_explore <- prune_samples(sample_sums(ps_explore) > 0, ps_explore)
ps_explore@sam_data$Sample_Type
#which sample is it found in
otu_table(ps_explore)
```

The phylum Deferribacteres is found only in fecal samples (2 reads) and Gemmatimonadetes only in stomach tube samples (3 reads). This suggests these groups might be important for comparing sample types, so we will leave reads assigned to these phyla in the dataset despite their low prevalence.

Lastly, we'll check to see if chloroplasts and mitochondria are in the dataset and remove them.

```{r Remove chloroplast, include=FALSE}
#removing ASVs that are assigned to chloroplasts and mitochondria
tax_table(subset_taxa(ps, Order == "Chloroplast"))
ps <- subset_taxa(ps, !Order %in% "Chloroplast")
tax_table(subset_taxa(ps, Family == "Mitochondria"))
ps <- subset_taxa(ps, !Family %in% "Mitochondria")
ps
```

After removing chloroplasts and mitochondria there are 5,485 ASVs left.
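The filter is written as `!Order %in% "Chloroplast"` rather than `Order != "Chloroplast"` for a reason: many ASVs have no Order assigned (NA), and the two forms treat NA differently. The sketch below uses a toy vector of Order labels (made up for illustration) to show the difference.

```r
# Toy Order column with one chloroplast and one unassigned (NA) entry
toy_order <- c("Clostridiales", "Chloroplast", NA, "Bacteroidales")

# %in% never returns NA, so the unassigned ASV is kept
keep_in <- !toy_order %in% "Chloroplast"

# != returns NA for the unassigned ASV, and subset() drops NA rows
keep_ne <- toy_order != "Chloroplast"

toy_tax <- data.frame(Order = toy_order, stringsAsFactors = FALSE)
n_kept_in <- nrow(subset(toy_tax, !Order %in% "Chloroplast"))  # keeps the NA row
n_kept_ne <- nrow(subset(toy_tax, Order != "Chloroplast"))     # loses the NA row
c(n_kept_in = n_kept_in, n_kept_ne = n_kept_ne)
```

With `!=`, every ASV lacking an Order would be silently removed along with the chloroplasts.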

#Looking at metrics after filtering

```{r eval=FALSE}
#number of taxa present
ntaxa(ps)
#checking names of taxa present at specific rank
length(get_taxa_unique(ps, "Phylum"))
length(get_taxa_unique(ps, "Order"))
length(get_taxa_unique(ps, "Family"))
length(get_taxa_unique(ps, "Genus"))
length(get_taxa_unique(ps, "Species"))
#how many taxa did not have species assigned
length(which(is.na(tax_table(ps)[,"Species"])))
#what percentage of taxa had species assigned
as.numeric(format((length(which(!is.na(tax_table(ps)[,"Species"])))/length(row.names(tax_table(ps))))*100, digits = 3))
```

As we have seen previously, there are `r ntaxa(ps)` ASVs in the dataset. This is composed of `r length(get_taxa_unique(ps, "Phylum"))` phyla, `r length(get_taxa_unique(ps, "Order"))` Orders, `r length(get_taxa_unique(ps, "Family"))` Families and `r length(get_taxa_unique(ps, "Genus"))` Genera.

`r length(which(is.na(tax_table(ps)[,"Species"])))` ASVs didn't have a species assigned. Only `r as.numeric(format((length(which(!is.na(tax_table(ps)[,"Species"])))/length(row.names(tax_table(ps))))*100, digits = 3))`% of taxa had a species assigned. For genera, `r length(which(is.na(tax_table(ps)[,"Genus"])))` ASVs didn't have a genus assigned. Only `r as.numeric(format((length(which(!is.na(tax_table(ps)[,"Genus"])))/length(row.names(tax_table(ps))))*100, digits = 3))`% of taxa had a genus assigned. 

```{r}
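keep <- as.data.frame(table(tax_table(ps)[which(is.na(tax_table(ps)[,"Species"])),][,"Phylum"]))
#merge by phylum name so phyla with no unassigned ASVs still line up with their totals
keep <- merge(keep,as.data.frame(table(tax_table(ps)[,"Phylum"])),by="Var1",all=TRUE)
keep[is.na(keep)] <- 0  #change NAs to 0
keep$percent <- (keep$Freq.x/keep$Freq.y)*100
colnames(keep) <- c("Phylum","#ASVs with no species assignment", "Total ASVs","Percent Unassigned")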
kable(keep, caption="Unassigned Species") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

This table gives the frequency and percent of ASVs not assigned to a species, by phylum. This really speaks to the limitations of the methods used here for making species-level assignments. 
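One detail worth spelling out when building this kind of table: a phylum in which every ASV has an assignment does not appear in the count of unassigned ASVs, so the unassigned counts and the totals have to be joined by phylum name, not bound by position. A minimal sketch with a toy taxonomy table (all values invented) follows.

```r
# Toy taxonomy table: Fibrobacteres has no unassigned species
toy_tax <- data.frame(
  Phylum  = c("Firmicutes", "Firmicutes", "Bacteroidetes", "Fibrobacteres"),
  Species = c(NA, "bromii", NA, "succinogenes"),
  stringsAsFactors = FALSE)

unassigned <- as.data.frame(table(toy_tax$Phylum[is.na(toy_tax$Species)]))
total      <- as.data.frame(table(toy_tax$Phylum))

# Join on the phylum name; all = TRUE keeps phyla missing from `unassigned`
tab <- merge(unassigned, total, by = "Var1", all = TRUE)
tab[is.na(tab)] <- 0                          # no unassigned ASVs means zero
tab$percent <- 100 * tab$Freq.x / tab$Freq.y  # percent unassigned per phylum
tab
```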

```{r eval=FALSE}
keep <- as.data.frame(table(tax_table(ps)[which(is.na(tax_table(ps)[,"Genus"])),][,"Phylum"]))
keep <- merge(keep,as.data.frame(table(tax_table(ps)[,"Phylum"])),by="Var1",all=TRUE)
keep[is.na(keep)] <- 0  #change NAs to 0
keep$percent <- (keep$Freq.x/keep$Freq.y)*100
colnames(keep) <- c("Phylum","#ASVs with no genera assignment", "Total ASVs","Percent Unassigned")
kable(keep, caption="Unassigned Genera") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

This table gives the frequency and percent of ASVs not assigned to a genus, by phylum. 
```{r Counting Singletons, eval=TRUE, include=FALSE}
#How many singletons, doubletons and tripletons are there (ASVs with 1, 2 or 3 total reads)?
singletons <- sum(taxa_sums(ps)==1) #number of singletons
doubletons <- sum(taxa_sums(ps)==2) #number of doubletons
tripletons <- sum(taxa_sums(ps)==3) #number of tripletons
sum(singletons,doubletons,tripletons)
```

There are `r sum(singletons,doubletons,tripletons)` ASVs that are singletons (`r sum(singletons)`), doubletons (`r sum(doubletons)`) or tripletons (`r sum(tripletons)`). This looks pretty good: these rare ASVs were not filtered out excessively, and they do not make up a large enough part of the data to be suspicious. We will need them for the diversity metrics. 
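Note that singletons, doubletons and tripletons are defined here by an ASV's *total* read count across all samples, not by how many samples it appears in. The toy sketch below (made-up counts) shows that a tripleton can be one read in each of three samples.

```r
# Toy ASV-by-sample counts, taxa as rows (hypothetical numbers)
toy_counts <- matrix(c( 1, 0, 0,
                        0, 2, 0,
                        1, 1, 1,
                       10, 5, 0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("asv", 1:4), c("s1", "s2", "s3")))

totals     <- rowSums(toy_counts)       # total reads per ASV
prevalence <- rowSums(toy_counts > 0)   # samples in which each ASV appears

rare   <- data.frame(total_reads = totals, prevalence = prevalence)
n_rare <- sum(totals <= 3)              # singletons + doubletons + tripletons
rare
```

Here `asv3` counts as a tripleton even though it is present in every sample, which is one reason these low-count ASVs are kept for the diversity estimates.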

For the last part of our cleaning process we will graph out the prevalence of ASVs assigned to each phylum.

```{r Graph phyla , fig.width=6}
#Subset to the remaining phyla
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(ps, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(ps),color=Phylum)) +
#Dashed line marks 5% prevalence, a guess for a filtering threshold
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) +  geom_point(size = 2, alpha = 0.7) +
  scale_x_log10() +  xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Phylum) + theme(legend.position="none")
```

#Rechecking read stats

Before moving on we will look again at the read stats to check that we still don't have reads that are too long in the dataset.

```{r Rechecking stats, echo=FALSE}
taxa_sp <- readRDS("C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/DADA2_Out/Demultiplex_Redo/sil_tax_sp_final.rds")
ps2 <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE), sample_data(MAP), tax_table(taxa_sp))
ps2 <- subset_samples(ps2, Sample_Type != c("TMR_fecal_kit"))
ps2 <- subset_samples(ps2, Sample_Type != c("TMR_plant_kit"))
ps2 <- subset_taxa(ps2, !Order %in% "Chloroplast")
ps2 <- subset_taxa(ps2, !Family %in% "Mitochondria")
ps2 <- subset_taxa(ps2, !is.na(Phylum) & !Phylum %in% c("", "uncharacterized"))
ps2 <- prune_taxa(taxa_sums(ps2) > 0, ps2)
#ps2 <- subset_taxa(ps2, !Phylum %in% filterPhyla)
ps2
```

```{r Rechecking stats output B}
#Getting our seqs out
asv_seqs2 <- colnames(otu_table(ps2))
#Inspect distribution of sequence lengths
table(nchar(getSequences(asv_seqs2)))
```

It looks like we now have only one sequence that is longer than 300 bp, so let's see what it is.

```{r}
#find the sequence(s) longer than 300bp
large_taxa <- asv_seqs2[which(nchar(getSequences(asv_seqs2)) > 300)]
#find the taxa these sequences were assigned to (taxa names in ps2 are the sequences themselves)
tax_table(ps2)[large_taxa,]
```

As this long sequence is a **Methanobrevibacter**, a common rumen archaeon, it is expected to be here and will be left in the dataset.

#Abundance of Phyla

As the first part of the exploratory analysis we will look at the general relative abundances of phyla across sample types. 

```{r warning=FALSE}
#combining by phylum and then converting to relative abundance
ps_phyla <- tax_glom(ps, "Phylum")
#Making relative
ps_phyla_rel <- transform_sample_counts(ps_phyla, function(x) 100*(x/sum(x)))

#calculating error bars to graph mean transformed abundance of major phyla
phyla <- ps_ave_abu_phy(ps_phyla_rel)

#Ordering
phyla <- phyla[order(-phyla$mean),] #ordering by mean
#phyla <- phyla[order(-phyla$Phylum),] #change to order by name*
phyla[,3:5] <- format(phyla[,3:5], digits = 3, scientific=F)
kable(phyla, caption="Statistics for Abundance of Phyla") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```
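The helper `ps_ave_abu_phy()` is defined elsewhere in this document. To show the kind of summary it is used for here, the sketch below converts toy counts to percent relative abundance per sample and then takes the mean and standard error of the mean (SEM) per sample type. The counts, the `sem()` helper and the sample labels are assumptions for illustration, not the actual implementation.

```r
# Toy phylum-by-sample counts, filled column by column (hypothetical numbers)
toy_counts <- matrix(c(60, 30, 10,
                       50, 40, 10,
                       20, 70, 10,
                       10, 80, 10),
                     nrow = 3,
                     dimnames = list(c("Firmicutes", "Bacteroidetes", "Fibrobacteres"),
                                     c("s1", "s2", "s3", "s4")))
toy_type <- c(s1 = "Grab Sample", s2 = "Grab Sample", s3 = "Feces", s4 = "Feces")

# Same transformation as in transform_sample_counts(): percent within each sample
rel <- apply(toy_counts, 2, function(x) 100 * (x / sum(x)))

# Hypothetical helper: standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# Mean and SEM of one phylum within each sample type
firm      <- rel["Firmicutes", ]
firm_mean <- tapply(firm, toy_type, mean)
firm_sem  <- tapply(firm, toy_type, sem)
rbind(firm_mean, firm_sem)
```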

We will look at relative abundance of Phyla in certain sample types. First, grab samples.

```{r}
#getting the average grab sample phyla abundance
phyla[grep("Grab Sample",phyla$Sample_Type),]
```

Next, we examine the relative abundance of Phyla in only fecal samples.

```{r}
#getting the average feces phyla abundance
phyla[grep("Feces",phyla$Sample_Type),]
```

These are the phyla found in all samples sorted by descending order of mean relative abundance. 

Next, we check whether any phyla are present only in feces or only in stomach tube samples.

```{r include=FALSE}
#Stomach tube samples only
ps_sub <- subset_samples(ps, Sample_Type == c("Stomach Tube"))
ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)
Feces <- subset_samples(ps, Sample_Type == c("Feces"))
Feces <- prune_taxa(taxa_sums(Feces) > 0, Feces)
setdiff(get_taxa_unique(Feces, "Phylum"), get_taxa_unique(ps_sub, "Phylum"))
setdiff(get_taxa_unique(ps_sub, "Phylum"), get_taxa_unique(Feces, "Phylum"))
```

`r setdiff(get_taxa_unique(Feces, "Phylum"), get_taxa_unique(ps_sub, "Phylum"))` was only found in fecal samples and `r setdiff(get_taxa_unique(ps_sub, "Phylum"), get_taxa_unique(Feces, "Phylum"))` was only found in stomach tube samples.

This confirms what we saw earlier, and we didn't identify any other phyla that are present in only one of these sample types.

Next we will graph out some of the different phyla based on their abundance ranges.

```{r warning=FALSE, fig.height=6, fig.width=4}
phyla$mean <- as.numeric(phyla$mean)
#Which phyla are present at greater than 3% mean relative abundance in any sample type
lfive <- as.character(unique(phyla[which(phyla$mean > 3),]$Phylum))
five <- subset_taxa(ps_phyla_rel, Phylum %in% lfive)
#calculating error bars to graph mean transformed abundance of major phyla
phyla_five <- ps_ave_abu_phy(five)

#Plotting relative abundance
ggplot(phyla_five, aes(x=Sample_Type, y=mean, fill= Phylum))+
  geom_bar(aes(color=Phylum, fill=Phylum), stat="identity", position=position_dodge(), width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge())+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(Phylum ~ .,labeller = label_parsed, scales="free", space="free_x")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold",color="black"),strip.text.y=element_text(angle=0,face = "bold", color="black", size=11),strip.text.x=element_text(angle=0,face = "bold",color="black",size=11), axis.text= element_text(face = "bold",color="black", size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),aspect.ratio = 2/1.5)+
  labs(x="", y="Average Relative Abundance")
```

This is a graph of the major phyla in rumen samples, defined as those present at greater than 3% relative abundance. Next we will graph the phyla at less than 3% relative abundance.
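Because the summary table has one row per phylum per sample type, the 3% cutoff is effectively "above 3% mean relative abundance in at least one sample type". The toy sketch below (invented means) shows how the major and minor sets fall out of that rule. A phylum that is minor in feces but above the cutoff in grab samples still lands in the major set.

```r
# Toy summary table: one row per phylum per sample type (hypothetical means)
toy_means <- data.frame(
  Sample_Type = rep(c("Feces", "Grab Sample"), each = 3),
  Phylum      = rep(c("Firmicutes", "Spirochaetes", "Tenericutes"), times = 2),
  mean        = c(50, 1.0, 0.5,
                  45, 3.5, 0.4),
  stringsAsFactors = FALSE)

# Major: above the cutoff in any sample type; minor: everything else
major <- unique(toy_means$Phylum[toy_means$mean > 3])
minor <- setdiff(unique(toy_means$Phylum), major)

list(major = major, minor = minor)
```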

```{r warning=FALSE, fig.width=5,  fig.height=5}
#Which phyla are present at less than 3% relative abundance
p1 <- c(get_taxa_unique(five, "Phylum"))
p2 <- as.character(unique(phyla$Phylum))
low_phyla <- setdiff(p2,p1)
low <- subset_taxa(ps_phyla_rel, Phylum %in% low_phyla)
#reformating data
phyla_low <- ps_ave_abu_phy(low)

#bubble graph
ggplot(phyla_low, aes(x=Sample_Type , y = Phylum , size = mean)) +
  geom_point(aes(color=Sample_Type)) +
  scale_color_manual(values=myColors)+
  guides(color="none") +
  scale_size_continuous(trans="exp", range=c(0, 7), breaks=c(0,0.1,0.2,0.5,1,1.5,2))+
  theme(axis.text.x=element_text(angle = 45, vjust = 1, hjust=1),legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))+
  labs(x="", color = "Sample Type", size = "Mean Percent \n Abundance")
```

This is a graph of the "minor" phyla, defined as present at an abundance below 3%, in rumen samples. I also made an interactive version of this graph that can be linked from the manuscript.

```{r warning=FALSE, fig.width=6, fig.height=6}
#Bubble graph interactive
p <- ggplot(phyla_low, aes(x=Sample_Type, y = Phylum, size = mean, text = paste("Phylum: ", Phylum,
                         "<br>Mean: ", format(mean, digits = 3, scientific=T),
                         "<br>Standard Deviation: ", format(sd, digits = 3, scientific=T)))) +
      geom_point(shape = 21, colour = "#000000", fill = "#40b8d0") +
      #geom_text(aes(label = format(phyla_low$mean, digits = 1, scientific=F)),size = 3,nudge_y=-0.5) +
      guides(color="none") +
      scale_size_continuous(trans="exp", range=c(0, 7), breaks=c(0,0.1,0.2,0.5,1,1.5,2))+
      theme(axis.text.x=element_text(angle = 45, vjust = 1, hjust=1),strip.text=element_text(face = "bold"), axis.text= element_text(color="black",face = "bold",size=12), axis.title=element_text(face = "bold", size=12), legend.text=element_text(face = "bold"), legend.title=element_text(face = "bold")) +
  labs(x="", color = "Sample Type", size = "Mean Percent \n Abundance")
ggplotly(p, tooltip = "text")
#htmlwidgets::saveWidget(as.widget(ggplotly(p, tooltip = "text")), "Minor_phyla_plotly.html")
#htmlwidgets::saveWidget(as.widget(ggplotly(p, tooltip = "text")), "Minor_phyla.html")
```

You can access the interactive figure [here](https://Jill.github.io/Depeters_RumenSampling_2018/Minor_phyla_plotly.html). Below is Figure 1 A & B. 

```{r Fig 1A and B, fig.width=12,fig.height=6}
fig1a <- ggplot(phyla_five, aes(x=Sample_Type, y=mean, fill= Phylum))+
  geom_bar(aes(color=Phylum, fill=Phylum), stat="identity", position=position_dodge(), width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge())+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(Phylum ~ .,labeller = label_parsed, scales="free", space="free_x")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold",color="black"),strip.text.y=element_text(angle=0,face = "bold", color="black", size=12),strip.text.x=element_text(angle=0,face = "bold",color="black",size=12), axis.text= element_text(face = "bold",color="black", size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),aspect.ratio = 2/1.5)+
  labs(x="", y="Average Relative Abundance")

fig1b <- ggplot(phyla_low, aes(x=Sample_Type , y = Phylum , size = mean)) +
  geom_point(aes(color=Sample_Type)) + 
  scale_color_manual(values=myColors)+
  guides(color="none") +
  scale_size_continuous(trans="exp", range=c(0, 7), breaks=c(0,0.1,0.2,0.5,1,1.5,2))+
  theme(axis.text.x=element_text(angle = 45, vjust = 1, hjust=1),legend.title=element_blank(),strip.text = element_text(face="bold",size=12, color="black"), axis.text= element_text(color="black",face = "bold",size=12), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))+
  labs(x="", color = "Sample Type", size = "Mean Percent \n Abundance")

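fig1 <- plot_grid(fig1a,fig1b, labels = c('A', 'B'), label_size = 12 )
fig1
#pass the combined plot explicitly so ggsave does not depend on last_plot()
ggsave("fig1_phyla.png", plot = fig1, device = "png", dpi=320, width = 180, units = c("mm"))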
```

We will create the same graphs at the family level as well.

#Abundance of families

This is a table of the relative abundances of families in all sample types.

```{r warning=FALSE}
#combining by family and then converting to relative abundance
ps_fam <- tax_glom(ps, "Family")
#Making relative
ps_fam_rel <- transform_sample_counts(ps_fam, function(x) 100*(x/sum(x)))

#calculating error bars to graph mean transformed abundance of major fam
fam <- ps_ave_abu_fam(ps_fam_rel)

#Ordering
fam <- fam[order(-fam$mean),] #ordering by mean
#fam <- fam[order(-fam$Phylum),] #change to order by name*
fam[,3:5] <- format(fam[,3:5], digits = 3, scientific=F)
kable(fam, caption="Statistics for Abundance of Families") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the families found in all samples sorted by descending order of mean relative abundance. Next we will graph some of the families based on their abundance ranges.

First, we will have a closer look at the families present in fecal samples. 

```{r}
fam[grep("Feces", fam$Sample_Type), ]
```

These are the relative abundance of families found in fecal samples. Next we will graph out these families.

```{r warning=FALSE, fig.height=8, fig.width=4.5}
fam$mean <- as.numeric(fam$mean)
#grabbing the families with a mean relative abundance above 2% in any sample type (13 families here)
lfive <- as.character(unique(fam[which(fam$mean > 2),]$Family))
five <- subset_taxa(ps_fam_rel, Family %in% lfive)
#calculating error bars to graph mean transformed abundance of major fam
fam_five <- ps_ave_abu_fam(five)

#Plotting relative abundance
ggplot(fam_five, aes(x=Sample_Type, y=mean, fill= Family))+
  geom_bar(aes(color=Family, fill=Family), stat="identity", position=position_dodge(), width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge())+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(Family ~ .,labeller = label_parsed, scales="free", space="free_x")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold"),strip.text.y=element_text(angle=0,face = "bold"),strip.text.x=element_text(angle=0,face = "bold"), axis.text= element_text(face = "bold"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),aspect.ratio = 2/1.5)+
  labs(x="", y="Average Relative Abundance")
```

Initially, we can see that there is more Bacteroidaceae and Peptostreptococcaceae in fecal samples compared to rumen samples. 

```{r}
kable(fam_five[c(grep("Bacteroidaceae", fam_five$Family)),])%>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10)#use kable to make it pretty :)
kable(fam_five[c(grep("Peptostreptococcaceae", fam_five$Family)),])%>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10)
```

Conversely, there is more Veillonellaceae and Fibrobacteraceae in rumen samples compared to feces.  

```{r}
kable(fam_five[c(grep("Veillonellaceae", fam_five$Family)),])%>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10)
kable(fam_five[c(grep("Fibrobacteraceae", fam_five$Family)),])%>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10)
```

I will make an interactive bubble graph for the lower-abundance families.

```{r warning=FALSE, fig.width=6, fig.height=16}
#Which families are present at less than 2% relative abundance
p1 <- c(get_taxa_unique(five, "Family"))
p2 <- as.character(unique(fam$Family))
low <-  setdiff(p2,p1)
#since there are 103 families, the minor families are plotted in two batches (1:50 and 51:102)
low_fams <- low[51:102]
low <- subset_taxa(ps_fam_rel, Family %in% low_fams)
#reformating data
phyla_low_fam <- ps_ave_abu_fam(low)

#Bubble graph interactive
p <- ggplot(phyla_low_fam, aes(x=Sample_Type, y = Family, size = mean, text = paste("Family: ", Family,
                         "<br>Mean: ", format(mean, digits = 3, scientific=T),
                         "<br>Standard Deviation: ", format(sd, digits = 3, scientific=T)))) +
      geom_point(shape = 21, colour = "#000000", fill = "#40b8d0") +
      #geom_text(aes(label = format(phyla_low$mean, digits = 1, scientific=F)),size = 3,nudge_y=-0.5) +
      guides(color="none") +
      scale_size_continuous(trans="exp", range=c(0, 7), breaks=c(0,0.1,0.2,0.5,1,1.5,2))+
      theme(axis.text.x=element_text(angle = 45, vjust = 1, hjust=1),strip.text=element_text(face = "bold"), axis.text= element_text(color="black",face = "bold",size=12), axis.title=element_text(face = "bold", size=12), legend.text=element_text(face = "bold"), legend.title=element_text(face = "bold")) +
  labs(x="", color = "Sample Type", size = "Mean Percent \n Abundance")
ggplotly(p, tooltip = "text")
#htmlwidgets::saveWidget(as.widget(ggplotly(p, tooltip = "text")), "Minor_fams_plotly_1to50.html")
#htmlwidgets::saveWidget(as.widget(ggplotly(p, tooltip = "text")), "Minor_fams_plotly_51to102.html")
```

Next we are going to do some exploratory analysis of all sample types.

#Archaea Populations

Now we will take a closer look at the archaeal populations.

```{r archaea 1}
#looking at Archaea
A <- subset_taxa(ps, Kingdom=="Archaea")
print("These are the Classes in the Kingdom Archaea found in all sample types")
get_taxa_unique(A, "Class")
print("These are the Orders in the Kingdom Archaea found in all sample types")
get_taxa_unique(A, "Order")
print("These are the Families in the Kingdom Archaea found in all sample types")
get_taxa_unique(A, "Family")
print("These are the Genera in the Kingdom Archaea found in all sample types")
get_taxa_unique(A, "Genus")
print("These are the Species in the Kingdom Archaea found in all sample types")
get_taxa_unique(A, "Species")
```

Looking at the relative abundances of archaeal genera in all samples.

```{r archea 2}
#checking minor abundances of archaea
A_gen <- tax_glom(A, "Genus")
rumen_A_rel <- transform_sample_counts(A_gen, function(x) 100*(x/sum(x)))

#calculating error bars to graph mean transformed abundance of major phyla
rumen_ave_abu <- ps_ave_abu_gen(rumen_A_rel)

#Ordering
phyla <- rumen_ave_abu[order(-rumen_ave_abu$mean),] #ordering by mean
phyla[,3:5] <- format(phyla[,3:5], digits = 3, scientific=F)
kable(phyla, caption="Statistics for Abundance of Archaea Genera Across all Sample Types") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

Next we will check if there are certain archaeal genera that are found in common between fecal and rumen samples. We will also look to see if there are differences between these two sample types.

```{r archea 3}
#investigating more archaea things
feces_A <- subset_samples(A, Sample_Type == c("Feces"))
#removing ASVs that aren't present in any samples
feces_A <- prune_taxa(taxa_sums(feces_A) > 0, feces_A)
feces_A <- tax_glom(feces_A, "Genus")

#rumen
rumen_A <- subset_samples(A, Sample_Type != c("Feces"))
#removing ASVs that aren't present in any samples
rumen_A <- prune_taxa(taxa_sums(rumen_A) > 0, rumen_A)
rumen_A <- tax_glom(rumen_A, "Genus")

compare_phyloseq_taxa(feces_A, rumen_A, "Genus")
```

Next, we look at the relative abundances of genera in rumen samples.

```{r}
#converting to relative abundance
feces_A_rel <- transform_sample_counts(feces_A, function(x) 100*(x/sum(x)))
rumen_A_rel <- transform_sample_counts(rumen_A, function(x) 100*(x/sum(x)))

#calculating error bars to graph mean transformed abundance of major phyla
rumen_ave_abu <- ps_ave_abu_gen(rumen_A_rel)
feces_ave_abu <- ps_ave_abu_gen(feces_A_rel)

#Ordering
phyla <- rumen_ave_abu[order(-rumen_ave_abu$mean),] #ordering by mean
phyla[,3:5] <- format(phyla[,3:5], digits = 3, scientific=F)
kable(phyla, caption="Statistics for Abundance of Archaea Genera in Rumen Samples") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

We will do the same for fecal samples.

```{r}
#calculating error bars to graph mean transformed abundance of major phyla
feces_ave_abu <- ps_ave_abu_gen(feces_A_rel)

#Ordering
phyla <- feces_ave_abu[order(-feces_ave_abu$mean),] #ordering by mean
phyla[,3:5] <- format(phyla[,3:5], digits = 3, scientific=F)
kable(phyla, caption="Statistics for Abundance of Archaea Genera in Fecal Samples") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

#Unsupervised exploratory analysis

```{r qplot}
qplot(log10(sample_sums(ps)),binwidth=0.2) + #taxa are rows in ps, so use sample_sums() for per-sample totals
  xlab("Logged counts-per-sample")
```

We are going to use log transformations to normalize for library size during exploratory analysis, as this looks appropriate for the "tailed" data. For additional confirmation we could do the same analysis on ranked abundance values.
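To see what the `log(1 + x)` transformation does to a distance, here is a small sketch that computes the Bray-Curtis dissimilarity by hand, as the sum of absolute differences divided by the sum of all counts, on two toy samples. The `bray_curtis()` helper and the counts are hypothetical. The analysis itself gets the distance through `ordinate()`.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Two toy samples dominated by different ASVs (made-up counts)
s1 <- c(asv1 = 100, asv2 = 0, asv3 = 10)
s2 <- c(asv1 = 10,  asv2 = 0, asv3 = 100)

raw_bc <- bray_curtis(s1, s2)                    # on raw counts
log_bc <- bray_curtis(log(1 + s1), log(1 + s2))  # after log(1 + x)

c(raw = raw_bc, logged = log_bc)
```

The logged dissimilarity is smaller because the transformation shrinks the influence of the most abundant ASVs, so a handful of dominant taxa no longer drives the whole ordination.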

```{r Bray-Curtis distance}
set.seed(1850)
#bray curtis distance
pslog <- transform_sample_counts(ps, function(x) log(1 + x))
out.pcoa.log <- ordinate(pslog,  method = "MDS", distance = "bray")
evals <- out.pcoa.log$values[,1]
plot_ordination(pslog, out.pcoa.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Bray-Curtis") +
  coord_fixed(sqrt(evals[2] / evals[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
ggsave("braycurtis_beta.png")
```

The fecal samples pull away from the other samples on the first axis. Liquid strained and unstrained samples move higher on the 2nd axis, but this difference is about 1/5th of the difference between fecal samples and all other samples. Overall, it appears that there are 2-3 "clusters". Next we will ordinate UniFrac distances, which take phylogenetic differences between samples into account.

```{r Weighted Eigenvalues, warning=FALSE, fig.height=3, fig.width=3}
set.seed(1850)
#weighted unifrac
out.wuf.log <- ordinate(pslog, method = "MDS", distance = "wunifrac")
evals <- out.wuf.log$values$Eigenvalues
eval_per_wuf <- (out.wuf.log$values$Eigenvalues/(sum(out.wuf.log$values$Eigenvalues)))*100
#Plotting eigenvalues to determine how many axis should be shown in graph
barplot(eval_per_wuf[1:10],names.arg=paste0('Eigenvalue',1:10), ylab="Percent of explained variances", col="blue")
```

From the eigenvalues we can see that two axes are appropriate for graphing, together explaining `r round(sum(eval_per_wuf[1:2]), 1)`% of the variance between the samples.
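
The percentage calculation can be sketched with base R alone. Here `cmdscale()` stands in for the PCoA that `ordinate()` runs, and the data are simulated, so the numbers are illustrative only.

```r
set.seed(1)
# Hypothetical data: 10 samples described by 4 features
x <- matrix(rnorm(40), nrow = 10)
d <- dist(x)  # Euclidean distances

# Classical MDS (PCoA); eig = TRUE returns the eigenvalues
pcoa <- cmdscale(d, k = 2, eig = TRUE)
eig  <- pcoa$eig

# Percent of total variation captured by each axis
pct <- 100 * eig / sum(eig)
round(pct[1:2], 1)
sum(pct[1:2])  # share explained by the first two axes together
```

With non-Euclidean distances some eigenvalues can be negative, in which case percentages computed against the plain sum should be read as approximate.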

```{r Fig 3 Plot weighted}
plot_ordination(pslog, out.wuf.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Weighted Unifrac") +
  coord_fixed(sqrt(evals[2] / evals[1]))+
  scale_color_manual(values=myColors)+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=12, color="black"),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold", size=11, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=11, color="black"))
#ggsave("weighted_all.png", device = "tiff", dpi=320, width = 180, units = c("mm"))
ggsave("weighted_all.tiff", device = "tiff", dpi=320, width = 180, height = 70, units = c("mm")) #Fig 3
ggsave("Figure 3.tiff", device = "tiff", dpi=320, width = 180, height = 65, units = c("mm")) #Fig 3
```

This is Figure 3. We will calculate the gap statistic to determine how many clusters are here.

```{r}
gapStatOut <- gapstat_ord(out.wuf.log, axes=1:3)
plot_clusgap(gapStatOut)
```

The gap statistic strongly suggests at least three clusters, but makes another big jump at K=5 before the slope levels off. So, K=5 it is. We had 6 sample types, so this suggests that 2 of the sample types are basically the same.
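
The definition of `gapstat_ord()` is not shown in this section. Assuming it wraps `cluster::clusGap()` on the ordination scores, as the phyloseq gap statistic tutorial does, the underlying calculation looks roughly like the sketch below. The scores are simulated and the `pam1` wrapper is an illustrative helper.

```r
library(cluster)

set.seed(1850)
# Hypothetical ordination scores: three well-separated groups on two axes
scores <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# clusGap() expects a function returning a list with a `cluster` component
pam1 <- function(x, k) list(cluster = pam(x, k, cluster.only = TRUE))

gap <- clusGap(scores, FUNcluster = pam1, K.max = 6, B = 50, verbose = FALSE)

# One common decision rule (Tibshirani et al. 2001); other rules can pick a different K
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k_hat
```

Reading K off the plot by eye, as done above, and applying a formal rule such as `maxSE()` will not always agree, so it is worth reporting which one was used.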

Now that we take phylogenetic information into account in the distance metric, we see a similar clustering pattern as with Bray-Curtis. However, the difference between fecal and other samples on the 1st axis now explains 66.5% of the variation. Also, although not quite as clean, there still seem to be 3 "clusters". Without the stomach tube samples this would be clearer. Grab samples and solid samples aren't very different from each other.

For good measure we will also look at the unweighted UniFrac, which puts more weight on rare species.

First, let's check the eigenvalues.

```{r Unweighted Eigenvalues, fig.height=3, fig.width=3}
out.unwuf.log <- ordinate(pslog, method = "MDS", distance = "unifrac")
eval_per_unwuf <- (out.unwuf.log$values$Eigenvalues/(sum(out.unwuf.log$values$Eigenvalues)))*100
#Plotting eigenvalues to determine how many axis should be shown in graph
barplot(eval_per_unwuf[1:10],names.arg=paste0('Eigenvalue',1:10), ylab="Percent of explained variances", col="blue")
```

The eigenvalues here show that two axes are sufficient to capture most of the total variation.

```{r Unweighted plot}
#UnWeighted unifrac
evals_un <- out.unwuf.log$values$Eigenvalues
plot_ordination(pslog, out.unwuf.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Unweighted unifrac") +
  coord_fixed(sqrt(evals_un[2] / evals_un[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),title=element_text(face = "bold"), strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
```

This gives a similar pattern to the Bray-Curtis and weighted UniFrac ordinations. Notice that less of the variation is explained by each axis with the unweighted (total 52.7%) than with the weighted UniFrac. The fecal samples are clustered closer together than with the weighted UniFrac.
Again, we will look at the gap statistic.

```{r}
gapStatOut <- gapstat_ord(out.unwuf.log, axes=1:3)
plot_clusgap(gapStatOut)
```

Just as before, the gap statistic strongly suggests at least three clusters, but makes another big jump at K=5 before the slope levels. So, K=5 it is. We had 6 sample types so this suggests 2 of the sample types are basically the same.

```{r eval=FALSE, include=FALSE}
library("ape"); packageVersion("ape")
library("cluster"); packageVersion("cluster")
library("vegan")

ent = prune_taxa(!(taxa_names(ps) %in% "-1"),ps)
ent = prune_taxa(taxa_sums(ent)>0.0,ent)
ent <-transform_sample_counts(ent,function(x)x/sum(x))
ent <- subset_samples(ent,!is.na(Sample_Type))
df= data.frame(sample_data(ent))
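BC <- phyloseq::distance(ent, method = "bray")
DP <- phyloseq::distance(ent, method = "dpcoa")
#adonis() is deprecated in recent vegan releases; adonis2() accepts the same formula.
#Using the precomputed Bray-Curtis distance avoids depending on the orientation of the OTU table.
adonis2(BC ~ Sample_Type, data = df)
adonis2(DP ~ Sample_Type + CowID + Day, data = df, by = "terms")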
```


```{r Fig 4, eval=FALSE}
evals <- out.wuf.log$values$Eigenvalues
beta1A <-plot_ordination(pslog, out.wuf.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Weighted Unifrac") +
  coord_fixed(sqrt(evals[2] / evals[1]))+
  scale_color_manual(values=myColors)+
    theme(legend.title=element_blank(),title=element_text(face = "bold"),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

evals_un <- out.unwuf.log$values$Eigenvalues
beta1B <- plot_ordination(pslog, out.unwuf.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Unweighted unifrac") +
  coord_fixed(sqrt(evals_un[2] / evals_un[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),title=element_text(face = "bold"),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

plot_grid(beta1A, beta1B, labels = "AUTO", ncol = 1)
ggsave("beta.tiff",plot_grid(beta1A, beta1B, labels = "AUTO", align = "h", ncol = 1), device = "tiff", dpi=320, width = 180, units = c("mm"))
```

Another way to compare phylogenetic differences is double principal coordinates analysis (DPCoA), a phylogenetic ordination method that provides a biplot representation of both samples and taxonomic categories. The computational time for this is much longer than for UniFrac (i.e., it has to be run on a server).

```{r fig.height=3, fig.width=8}
ps_nonum <- ps
sample_names(ps_nonum) <- paste0("sample_", sample_names(ps_nonum))
#Divnet/DPCoA don't like numbers for samples
#ps_nonum <- tax_glom(ps_nonum, "Genus")
pslog <- transform_sample_counts(ps_nonum, function(x) log(1 + x))
#set.seed(1)
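#out.DP.log <- ordinate(pslog, method = "DPCoA") #note: DPCoA works from the phylogenetic tree; the distance argument of ordinate() (default "bray") is not expected to be used here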
#saveRDS(out.DP.log, "out.DP.log_all.RDS")
out.DP.log <- readRDS("out.DP.log_all.RDS")
plot_ordination(pslog,out.DP.log , type="scree")
```

The eigenvalues here show that two axes are sufficient to capture most of the total variation.

```{r}
#getting eigenvalues for coord_fixed
evals_DP <- out.DP.log$eig

plot_ordination(pslog, out.DP.log, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

plot_ordination(pslog, out.DP.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

#plot_ordination(pslog, out.DP.log, type = "Species", color = "Phylum") + #not readable in current form
#  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  #geom_text_repel(aes(label=Species))+
#  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
```

We see again that the 1st axis separates rumen from fecal samples, while the 2nd axis distinguishes liquid preparations from those that contain both liquid and solid fractions. The biplot suggests that samples with larger scores on the 1st axis carry a subset of Bacteroidetes and a subset of Firmicutes that differs from those in rumen samples. Additionally, liquid samples have more Bacteroidetes and fewer Firmicutes than other rumen sample types. Liquid strained samples are pulled down on the 2nd axis by Kiritimatiellaeota and a subset of Bacteroidetes.

```{r eval=FALSE, fig.width=11, fig.height=9}
fig4a <-plot_ordination(pslog, out.DP.log, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=10, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"))

pslog_new <- pslog #make copy of phyloseq object
sample_names(pslog_new) <- paste0("sample_", sample_names(pslog_new)) #Divnet/DPCoA don't like numbers for samples
new_taxa <- as.data.frame(pslog_new@tax_table@.Data) #make new data frame of species
new_taxa$Select_Family <- new_taxa[,"Family"] #add new column
pslog_new@tax_table@.Data[,"Family"] <- ifelse(grepl("Lachnospiraceae", new_taxa$Select_Family), "Lachnospiraceae",
         ifelse(grepl("Ruminococcaceae", new_taxa$Select_Family), "Ruminococcaceae",
         ifelse(grepl("Prevotellaceae", new_taxa$Select_Family), "Prevotellaceae", "All Other Families")))
         #ifelse(grepl("Veillonellaceae", new_taxa$Select_Family), "Veillonellaceae",
         #ifelse(grepl("Rikenellaceae", new_taxa$Select_Family), "Rikenellaceae","All Other Families")))))

fig4b <-plot_ordination(pslog_new, out.DP.log, type = "Species", color = "Phylum", shape="Family") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_shape_manual(values=c(16,15,17,3,5,4))+
  #geom_text_repel(aes(label=Species))+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=10, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"))

plot_grid(fig4a, fig4b, labels = "AUTO", align = "v",  ncol = 1)
ggsave("fig4_dpcoa.tiff",plot_grid(fig4a, fig4b, labels = "AUTO", ncol = 1 , align = "v"), device = "tiff", dpi=300, width = 180, height = 180, units = c("mm"))
ggsave("Figure 4.tiff",plot_grid(fig4a, fig4b, labels = "AUTO", ncol = 1 , align = "v"), device = "tiff", dpi=300, width = 180, height = 180, units = c("mm"))
```

Again, I have made an interactive version of this plot that is available [here](https://Jill.github.io/Depeters_RumenSampling_2018/DPCoA.html).

```{r}
AP <- plot_ordination(pslog, out.DP.log, type = "Species", color = "Phylum")+
  geom_point(aes(Species=Species,Genus=Genus,Family=Family,Order=Order,Class=Class))+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))
ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()
htmlwidgets::saveWidget(as_widget(ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()), "DPCoA.html", selfcontained =FALSE)
```

Since Firmicutes and Bacteroidetes take up so much of the graph we want to know what other phyla can separate the sample types.

```{r fig.height=3, fig.width=8}
ps_nonum <- ps
sample_names(ps_nonum) <- paste0("sample_", sample_names(ps_nonum))
#Divnet/DPCoA don't like numbers for samples
#ps_nonum <- tax_glom(ps_nonum, "Genus")
pslog <- transform_sample_counts(ps_nonum, function(x) log(1 + x))
pslog <- pslog %>% subset_taxa(!(Phylum %in% c("Firmicutes","Bacteroidetes"))) #remove two phyla
#set.seed(1)
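#out.DP.log_new <- ordinate(pslog, method = "DPCoA") #note: DPCoA works from the phylogenetic tree; the distance argument of ordinate() (default "bray") is not expected to be used here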
#saveRDS(out.DP.log_new, "out.DP.log_new.RDS")
out.DP.log_new <- readRDS("out.DP.log_new.RDS")
plot_ordination(pslog,out.DP.log_new, type="scree")
```

```{r}
#getting eigenvalues for coord_fixed
evals_DP <- out.DP.log_new$eig

plot_ordination(pslog, out.DP.log_new, color = "Sample_Type") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
```

```{r fig.width=11, fig.height=12}
supfig1a <- plot_ordination(pslog, out.DP.log_new, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=10, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"))

pslog_new <- pslog #make copy of phyloseq object
sample_names(pslog_new) <- paste0("sample_", sample_names(pslog_new)) #Divnet/DPCoA don't like numbers for samples
new_taxa <- as.data.frame(pslog_new@tax_table@.Data) #make new data frame of species
new_taxa$Select_Family <- new_taxa[,"Family"] #add new column
pslog_new@tax_table@.Data[,"Family"] <- ifelse(grepl("Spirochaetaceae", new_taxa$Select_Family), "Spirochaetaceae",
         ifelse(grepl("Fibrobacteraceae", new_taxa$Select_Family), "Fibrobacteraceae",
         ifelse(grepl("Akkermansiaceae", new_taxa$Select_Family), "Akkermansiaceae", "All Other Families")))

supfig1b <- plot_ordination(pslog_new, out.DP.log_new, type = "Species", color = "Phylum", shape="Family") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  #geom_text_repel(aes(label=Species))+
  scale_shape_manual(values=c(19,12,23,8),limits=c("All Other Families", "Akkermansiaceae", "Fibrobacteraceae", "Spirochaetaceae"))+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=10, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"))

plot_grid(supfig1a, supfig1b, labels = "AUTO", align = "v",  ncol = 1)
ggsave("supfig1_dpcoa.tiff",plot_grid(supfig1a, supfig1b, labels = "AUTO", ncol = 1 , align = "v"), device = "tiff", dpi=300, width = 180, height = 240, units = c("mm"))
ggsave("Sup Figure 1.tiff",plot_grid(supfig1a,supfig1b, labels = "AUTO", ncol = 1 , align = "v"), device = "tiff", dpi=300, width = 180, height = 240, units = c("mm"))
```

Again, I have made an interactive version of this plot that is available [here](https://Jill.github.io/Depeters_RumenSampling_2018/DPCoA_NoFirmBact.html).

```{r fig.height=8, fig.width=10}
AP <- plot_ordination(pslog, out.DP.log_new, type = "Species", color = "Phylum")+
  geom_point(aes(Species=Species,Genus=Genus,Family=Family,Order=Order,Class=Class))+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))
ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()
htmlwidgets::saveWidget(as_widget(ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()), "DPCoA_NoFirmBact.html",selfcontained =FALSE)
```

With Firmicutes and Bacteroidetes removed, we can see that feces are associated with *Akkermansiaceae* and are not associated with Kiritimatiellaeota, Chloroflexi, *Fibrobacteraceae* and *Spirochaetaceae*.

#Shared ASVs between sample types.

```{r Core, eval=TRUE, include=FALSE}
#making subsets of sample types to determine core
LUS <- subset_samples(ps, Sample_Type == c("Liquid Unstrained"))
LS<- subset_samples(ps, Sample_Type == c("Liquid Strained")) 
Feces <- subset_samples(ps, Sample_Type == c("Feces")) 
Solid <- subset_samples(ps, Sample_Type == c("Solid")) 
ST <- subset_samples(ps, Sample_Type == c("Stomach Tube")) 
GS <- subset_samples(ps, Sample_Type == c("Grab Sample")) 
#making a list to determine core of each subset
Subsets <- list(GS, Solid, Feces, ST, LUS, LS)
names(Subsets) <-c("Grab Sample", "Solid", "Feces", "Stomach Tube", "Liquid Unstrained", "Liquid Strained")
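#drop ASVs that never occur in a given sample type
Subsets <- lapply(Subsets, function(s) prune_taxa(taxa_sums(s) > 0, s))
#core = ASVs detected (count >= 1) in more than 99.9% of the samples of that type, i.e. effectively all of them
Cores <- lapply(Subsets, function(s) filter_taxa(s, function(x) sum(x >= 1) > (0.999*length(x)), TRUE))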
Cores
```

Here an ASV counts as "in common" when it is detected in every sample of a given type. Liquid unstrained samples have `r ntaxa(Cores[[5]])` ASVs in common. Liquid strained samples have `r ntaxa(Cores[[6]])` ASVs in common. Samples from feces have `r ntaxa(Cores[[3]])` ASVs in common with one another. Solid samples have `r ntaxa(Cores[[2]])` ASVs in common. Samples from a stomach tube have `r ntaxa(Cores[[4]])` ASVs in common. Grab samples have `r ntaxa(Cores[[1]])` ASVs in common.
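
The prevalence filter behind these counts can be sketched on a plain matrix. The counts below are hypothetical; the threshold mirrors the `0.999*length(x)` rule used in the chunk above.

```r
# Hypothetical counts for one sample type: 4 samples (rows) x 5 ASVs (columns)
counts <- matrix(c(3, 0, 7, 1, 0,
                   5, 2, 1, 4, 0,
                   1, 0, 9, 2, 0,
                   8, 6, 2, 3, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("sample_", 1:4), paste0("ASV", 1:5)))

# Drop ASVs never observed in this sample type
observed <- counts[, colSums(counts) > 0, drop = FALSE]

# Keep ASVs detected (count >= 1) in more than 99.9% of the samples
prevalence <- colSums(observed >= 1)
core <- names(prevalence)[prevalence > 0.999 * nrow(observed)]
core
```

With only a handful of samples per type, a 99.9% threshold is the same as requiring presence in every sample, so a single dropout removes an ASV from the core.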

#Differential Abundance Testing

##Corncob: Grab sample vs all other sample types

First we will look at how the "gold standard" grab sample compares to other sample types. We test all the taxa in our data to see if they are differentially abundant. The differentialTest function runs these tests on all taxa while controlling the false discovery rate to account for multiple comparisons. Additionally, it controls for differences in library sizes.
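
To make the false discovery rate step concrete, here is a small sketch using base R's `p.adjust()` with the Benjamini-Hochberg method. The p-values are hypothetical, and treating Benjamini-Hochberg as the adjustment used by `differentialTest` is an assumption about its default.

```r
# Hypothetical raw p-values for four taxa
p <- c(ASV1 = 0.01, ASV2 = 0.04, ASV3 = 0.03, ASV4 = 0.20)

# Benjamini-Hochberg adjustment
p_fdr <- p.adjust(p, method = "BH")
p_fdr

# Taxa that pass a 0.05 FDR cutoff
significant <- names(p_fdr)[p_fdr < 0.05]
significant
```

Three of the raw p-values are below 0.05, but only one taxon survives the adjustment, which is why the FDR cutoff is reported rather than the raw p-value.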

```{r Running corncob, eval=FALSE, warning=FALSE}
#we do not include the response term because we are testing multiple taxa.

#We specify the covariates of our model using formula and phi.formula 
#We also specify which covariates we want to test for by removing them in the formula_null and phi.formula_null arguments.

# The difference between the formulas and the null version of the formulas
# will be the variables that are tested. In this case, as when we examined
# the single taxon, we will be testing the coefficients of Sample Type for
# both the expected relative abundance and the overdispersion.

#changing the order of factor levels
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))

# We set fdr_cutoff to be our controlled false discovery rate.
set.seed(1)
fullAnalysis <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=FALSE, data = ps, fdr_cutoff = 0.05)
#Genus
ps_gen <- tax_glom(ps, "Genus")
set.seed(1)
fullAnalysis_all_Gen <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=FALSE, data = ps_gen, fdr_cutoff = 0.05)

fullAnalysis_all_Gen_01 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=FALSE, data = ps_gen, fdr_cutoff = 0.01)

#Family
ps_Fam <- tax_glom(ps, "Family")
fullAnalysis_all_Fam <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=FALSE, data = ps_Fam, fdr_cutoff = 0.05)

#Phylum
ps_Phy <- tax_glom(ps, "Phylum")
fullAnalysis_all_Phy <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_Phy, fdr_cutoff = 0.05)
```

We will take a broad view and look at phyla that are differentially abundant.

```{r fig.height=5,fig.width=12}
fullAnalysis_all_Phy <- readRDS("fullAnalysis_all_Phy.rds")
plot.differentialTest_custom(fullAnalysis_all_Phy,level=c("Phylum"))+
theme(strip.text.x=element_text(size=11,face = "bold", color="black"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color="black"))
ggsave("GSvsAll_Phylum.tiff",device = "tiff", dpi=320, width = 12, height = 5)
```

```{r fig.height=8}
fig1a <- ggplot(phyla_five, aes(x=Sample_Type, y=mean, fill=Phylum))+
  geom_bar(aes(color=Phylum, fill=Phylum), stat="identity", position=position_dodge(), width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge())+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(Phylum ~ .,labeller = label_parsed, scales="free", space="free_x")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold",color="black", size=9),strip.text.y=element_text(angle=0,face = "bold", color="black", size=9),strip.text.x=element_text(angle=0,face = "bold",color="black",size=9), axis.text= element_text(face = "bold",color="black", size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),aspect.ratio = 2/1.5)+
  labs(x="", y="Average Relative Abundance")

fig1b <- ggplot(phyla_low, aes(x=Sample_Type , y=Phylum , size=mean)) +
  geom_point(aes(color=Sample_Type)) + 
  scale_color_manual(values=myColors)+
  guides(color="none") + #FALSE is deprecated for guides in recent ggplot2 releases
  scale_size_continuous(trans="exp", range=c(0, 7), breaks=c(0,0.1,0.2,0.5,1,1.5,2))+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold",color="black", size=9))+
  labs(x="", color = "Sample Type", size = "Mean Percent \n Abundance")

fig1c <- plot.differentialTest_custom(fullAnalysis_all_Phy,level=c("Phylum"))+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8.5, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=8.5, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8.5))

fig1a
fig1b
top_row <- plot_grid(fig1a,fig1b, labels = c('A', 'B'), label_size = 11)

plot_grid(top_row, fig1c, labels = c('', 'C'), label_size = 11, ncol = 1)
ggsave("fig1_phyla.tiff",device = "tiff", dpi=300, width = 180, height = 210, units = c("mm"))
ggsave("Figure 1.tiff",device = "tiff", dpi=300, width = 180, height = 210, units = c("mm"))
```


Looking at the models from corncob.

```{r warning=FALSE}
ps_Phy <- tax_glom(ps, "Phylum")
sigtaxa <- otu_to_taxonomy(fullAnalysis_all_Phy$significant_taxa, ps_Phy)
sigmodels <- fullAnalysis_all_Phy$significant_models
names(sigmodels) <- sigtaxa
sigmodels
```

Next we collapse ASVs into families and determine which families are differentially abundant.

```{r fig.height=9, fig.width=8}
#call in data
fullAnalysis_all_Fam<- readRDS("fullAnalysis_072319_all_Fam_GS1st.rds")
#changing the order of factor levels
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
#Family
ps_Fam <- tax_glom(ps, "Family")

#Plotting
plot.differentialTest_custom(fullAnalysis_all_Fam,level=c("Family","Genus","Species"))+
#theme(strip.text.x=element_text(size=11,face = "bold", color="black"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color="black"))
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8, color="black"), axis.text= element_text(color="black",face = "bold",size=8), axis.title=element_text(face = "bold", size=11, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8), legend.position="bottom")

ggsave("GSvsAll_Fam.tiff",device = "tiff", dpi=300, width = 180, height = 200, units = c("mm"))
ggsave("Figure 6.tiff",device = "tiff", dpi=300, width = 180, height = 200, units = c("mm"))
```

This is a broad overview of the families that are significantly differentially abundant between sample types. We will also dig down further and look at the genus and ASV level.
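
Collapsing ASVs to a higher rank, as `tax_glom()` does, amounts to summing counts over all ASVs that share a label at that rank. The sketch below shows the idea on a hypothetical count matrix with base R's `rowsum()`; dropping ASVs with no family assignment is meant to mirror what I understand to be the default behaviour of `tax_glom()`.

```r
# Hypothetical counts: 4 ASVs (rows) x 2 samples (columns)
counts <- matrix(c(10, 2,
                    5, 0,
                    1, 7,
                    4, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("ASV", 1:4), c("sample_1", "sample_2")))

# Hypothetical family assignments; ASV4 is unassigned
family <- c(ASV1 = "Lachnospiraceae", ASV2 = "Lachnospiraceae",
            ASV3 = "Prevotellaceae",  ASV4 = NA)

# Drop unassigned ASVs, then sum counts within each family
keep <- !is.na(family)
family_counts <- rowsum(counts[keep, , drop = FALSE], group = family[keep])
family_counts
```

Note that the counts of unassigned ASVs disappear from the table at this step, so totals at the family level can be lower than at the ASV level.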

```{r}
#changing the order of factor levels
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
#call in data
fullAnalysis_all<- readRDS("fullAnalysis_072319_all_GS1st.rds")
#getting number of significant taxa assigned to each phylum
df_new <- as.data.frame(fullAnalysis_all$significant_taxa)
colnames(df_new) <- c("taxa")
ltax <- as.list(fullAnalysis_all$significant_taxa)
df_new$Phylum <- unlist(lapply(ltax, function(ltax) otu_to_taxonomy(ltax, fullAnalysis_all$data, level = "Phylum")))
keep <- as.data.frame(table(df_new$Phylum))
keep <- merge(keep,as.data.frame(table(tax_table(ps)[,"Phylum"])),by="Var1",all=TRUE)
keep[is.na(keep)] <- 0  #change NAs to 0
keep$percent <- (keep$Freq.x/keep$Freq.y)*100
colnames(keep) <- c("Phylum", "#Significant ASVs", "Total ASVs", "Percent Significant ASVs")
kable(keep, caption="Phyla with Significant ASVs") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

There are `r length(fullAnalysis_all$significant_taxa)` significantly differentially abundant ASVs at an FDR cutoff of 0.05. Most of these ASVs were from the phyla Firmicutes and Bacteroidetes, but that is in part because these are the dominant phyla in the data set. As a percentage of ASVs, Chloroflexi and Euryarchaeota played a large role in distinguishing different sample types.
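
The per-phylum percentage in the table is a count of significant ASVs divided by the total ASVs in that phylum. A minimal sketch with hypothetical phylum labels (the column names `n_sig` and `n_total` are illustrative):

```r
# Hypothetical phylum labels for all ASVs and for the significant subset
all_phyla <- c("Firmicutes", "Firmicutes", "Firmicutes", "Firmicutes",
               "Bacteroidetes", "Bacteroidetes", "Chloroflexi", "Tenericutes")
sig_phyla <- c("Firmicutes", "Bacteroidetes", "Chloroflexi")

sig <- as.data.frame(table(Phylum = sig_phyla), responseName = "n_sig")
tot <- as.data.frame(table(Phylum = all_phyla), responseName = "n_total")

# all = TRUE keeps phyla with no significant ASVs; their NA count becomes 0
tab <- merge(sig, tot, by = "Phylum", all = TRUE)
tab$n_sig[is.na(tab$n_sig)] <- 0
tab$percent <- 100 * tab$n_sig / tab$n_total
tab
```

A phylum with very few ASVs can reach a high percentage from a single significant ASV, so the percentage is best read alongside the raw counts.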

We will graph the significantly different ASVs from these phyla.
First, we look at significantly differentially abundant ASVs in Chloroflexi and Euryarchaeota.

```{r fig.height=5, fig.width=13}
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Chloroflexi")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))

plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Here we can see that the Euryarchaeota that are important for telling sample types apart are all methanogens. The fecal sample type has a strong negative coefficient for most of these methanogens (i.e., methanogens are less abundant in feces than in grab samples). Interestingly, fecal samples have lower *Flexilinea* abundance.

In addition, based on the DPCoA without Bacteroidetes and Firmicutes, we can see that Actinobacteria and Spirochaetes also play an important role in distinguishing liquid strained and fecal samples, respectively, from grab samples.

```{r fig.height=5, fig.width=15}
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Family","Genus","Species"), taxa_filter="Actinobacteria")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.height=3, fig.width=13}
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Spirochaetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Next we examine the differentially abundant ASVs in the phyla Bacteroidetes and Firmicutes.

```{r fig.width=13, fig.height=9}
#plotting
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Bacteroidetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

This is a graph of the significantly differentially abundant ASVs in the phylum Bacteroidetes.

```{r fig.width=13, fig.height=50}
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Firmicutes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

This is a graph of the significantly differentially abundant ASVs in the phylum Firmicutes.

Now we will look further up the phylogenetic tree, collapse ASVs into genera, and look for differentially abundant genera.

```{r include=FALSE}
#changing the order of factor levels
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
ps_gen <- tax_glom(ps, "Genus")
#calling in data
fullAnalysis_all_Gen <- readRDS("fullAnalysis_072319_all_Gen_GS1st.rds")
fullAnalysis_all_Gen_p01 <- readRDS("fullAnalysis_072319_all_Gen_GS1st_p01.rds")
length(fullAnalysis_all_Gen$significant_taxa)
length(fullAnalysis_all_Gen_p01$significant_taxa)
```

There are `r length(fullAnalysis_all_Gen$significant_taxa)` significantly differentially abundant genera at an FDR cutoff of 0.05 and `r length(fullAnalysis_all_Gen_p01$significant_taxa)` at a cutoff of 0.01. After running corncob, `r sum(is.na(fullAnalysis_all_Gen$p))` genera could not be fit with the model, while `r sum(!is.na(fullAnalysis_all_Gen$p))` were fit.

Let's extract these genera and their p-values.

```{r Taxa for DA}
#getting taxonomy
DA_taxa <- otu_to_taxonomy(OTU=fullAnalysis_all_Gen$significant_taxa, data=ps_gen)
#Getting FDR-corrected p-values of the differentially abundant genera
ASVs <- c(row.names(as.data.frame(DA_taxa)))
df_taxa_sig <- as.data.frame(fullAnalysis_all_Gen$p_fdr)
df_taxa_sig$ASV <- row.names(df_taxa_sig)
df_taxa_sig <- df_taxa_sig[row.names(df_taxa_sig) %in% ASVs, ]
df_taxa_sig <- merge(df_taxa_sig,as.data.frame(DA_taxa),by=0, all=TRUE) #by=0 means merge by row names
#df <- subset(df, select = c(DA_taxa,fullAnalysis$p_fdr,Row.names))
#df$ASV <- df$Row.names
df_taxa_sig$Row.names <- NULL
colnames(df_taxa_sig) <- c("p_value", "ASV", "Taxa")
#df$p_value <- format(df$p_value, digits = 3)
options(scipen = 999) #take out of scientific notation
df_taxa_sig <- df_taxa_sig[order(df_taxa_sig$p_value),] #order column p_value
df_taxa_sig$p_value <- format(df_taxa_sig$p_value, digits = 3, scientific = TRUE) #put back into scientific notation
kable(df_taxa_sig, caption="Differentially Abundant Taxa") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the genera that are differentially **abundant** between sample types, along with their false discovery rate corrected p-values. Genera are ordered by significance (all p < 0.05).
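For readers unfamiliar with false discovery rate correction, the sketch below applies base R's `p.adjust()` to a handful of made-up p-values. We assume a Benjamini-Hochberg style adjustment here; this may differ from the exact option used in the saved corncob run, and the genus names and p-values are hypothetical.

```r
# Made-up raw p-values for five hypothetical genera.
p_raw <- c(genus_1 = 0.001, genus_2 = 0.008, genus_3 = 0.039,
           genus_4 = 0.041, genus_5 = 0.6)

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_fdr <- p.adjust(p_raw, method = "BH")

# Order by adjusted p-value and flag those under a 0.05 cutoff.
result <- data.frame(p_raw = p_raw, p_fdr = p_fdr)
result <- result[order(result$p_fdr), ]
result$significant <- result$p_fdr < 0.05
result
```

In this toy example, two genera have raw p-values just under 0.05 but no longer pass the cutoff after adjustment, which is why the table above reports corrected values.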

We will now plot out all these taxa in comparison to grab samples.

```{r fig.height=10, fig.width=15}
plot.differentialTest_custom_color(fullAnalysis_all_Gen,level=c("Phylum","Family","Genus"))+
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8, color="black"), axis.text= element_text(color="black",face = "bold",size=8), axis.title=element_text(face = "bold", size=11, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8), legend.position = "bottom")

ggsave("GSvsAll.tiff",device = "tiff", dpi=300, width = 280, height =350, units = c("mm"))
ggsave("Sup Figure 2.tiff",device = "tiff", dpi=300, width = 180, height = 250, units = c("mm"))


plot.differentialTest_custom(fullAnalysis_all_Gen,level=c("Phylum", "Family","Genus"), taxa_filter="Bacteroidetes")+
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8, color="black"), axis.text= element_text(color="black",face = "bold",size=8), axis.title=element_text(face = "bold", size=11, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8))
```

This is a graph of the genera that are significantly differentially abundant across sample types. It shows the model coefficient for each sample type with a 95% confidence interval, relative to grab samples. A negative coefficient suggests that samples of that type are expected to have a lower relative abundance of the taxon than grab samples. Conversely, a positive coefficient suggests that samples of that type are expected to have a higher relative abundance.
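As a rough guide to reading these coefficients, the sketch below converts a coefficient into an expected relative abundance. It assumes the abundance model uses a logit link (which we believe is corncob's default); the intercept, coefficient, and standard error are made-up numbers, not values from the fitted models.

```r
# Made-up values on the logit scale (not taken from the fitted models).
intercept  <- -4.0   # hypothetical baseline for the reference type (grab sample)
coef_feces <- -1.5   # hypothetical coefficient for feces
se_feces   <- 0.4    # hypothetical standard error of that coefficient

# Approximate 95% confidence interval for the coefficient.
ci <- coef_feces + c(-1, 1) * qnorm(0.975) * se_feces

# Back-transform to expected relative abundance with the inverse logit.
ra_grab  <- plogis(intercept)
ra_feces <- plogis(intercept + coef_feces)

round(c(grab = ra_grab, feces = ra_feces), 4)
ci
```

A confidence interval that lies entirely below zero, as in this toy case, corresponds to a taxon plotted to the left of the zero line.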

Let's take a deeper dive into how these genera separate by phylum.

```{r}
#getting number of significant genera assigned to each phylum
df_new <- as.data.frame(fullAnalysis_all_Gen$significant_taxa)
colnames(df_new) <- c("Genus")
ltax <- as.list(fullAnalysis_all_Gen$significant_taxa)
df_new$Phylum <- unlist(lapply(ltax, function(ltax) otu_to_taxonomy(ltax, fullAnalysis_all_Gen$data, level = "Phylum")))
table(df_new$Phylum)
```

80 of the 121 significantly different taxa are Firmicutes. If we lower the cutoff to p < 0.01, 75 of the significantly different taxa are Firmicutes. We will graph the Firmicutes genera that are significant at the 0.01 cutoff.

```{r fig.height=16, fig.width=12}
plot.differentialTest_custom(fullAnalysis_all_Gen_p01, level=c("Phylum","Genus"), taxa_filter="Firmicutes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text.y=element_text(size=11,color="black",face = "bold"), axis.text.x=element_text(size=11,face = "bold"), axis.title.x=element_text(size=11,face = "bold"))

ggsave("GSvsAll_Firmicutes.png",device = "png", dpi=320, width = 14, height = 16)
```

A majority of these genera in the phylum Firmicutes are in the families **Ruminococcaceae** and **Lachnospiraceae**.

```{r fig.height=4, fig.width=14}
plot.differentialTest_custom(fullAnalysis_all_Gen_p01, level=c("Phylum","Family","Genus"), taxa_filter="Bacteroidetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text.y=element_text(size=11,color="black", face = "bold"), axis.text.x=element_text(size=11,face = "bold"), axis.title.x=element_text(size=11,face = "bold"))
```

This is the graph of the genera that are significantly differentially abundant in the phylum Bacteroidetes.

We can look more closely at one of these taxa (ASV_622), for which the feces sample type has a strongly negative coefficient.

```{r warning=FALSE}
otu_to_taxonomy("ASV_622", data=ps_gen)
otu_table(ps_gen)["ASV_622",]
```

This is the feature table for ASV_622. It looks as though there is only one read for *Lachnospiraceae Oribacterium* in feces.

```{r warning=FALSE}
ASV_622 <- bbdml(formula = ASV_622 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data = ps_gen) 
summary(ASV_622)
```

Here we see the output of our hypothesis test. Feces is significantly different from the rumen samples. Additionally, both liquid strained and stomach tube samples have a significantly lower abundance of this taxon when compared to grab samples. There is individual cow variation, but the day doesn't make a significant difference.

Let's graph out the abundance of *Lachnospiraceae Oribacterium*. 

```{r}
plot(ASV_622, color="Sample_Type", shape="CowID")+
  scale_color_manual(values=myColors)
plot(ASV_622, AA = TRUE, color="Sample_Type", shape="CowID")+
  scale_color_manual(values=myColors)
```

These graphs show that ASV_622 *Lachnospiraceae Oribacterium* is in lower abundance in fecal samples.

```{r}
ps_fam2 <- ps_Fam
#changing the order of factor levels
sample_data(ps_fam2)$Sample_Type <- factor(sample_data(ps_Fam)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
sig_fams2 <- ps_fam2 %>% subset_taxa(Family %in% c("Lachnospiraceae","Prevotellaceae", "Ruminococcaceae"))
sig_fams <- ps_Fam %>% subset_taxa(Family %in% c("Lachnospiraceae","Prevotellaceae", "Ruminococcaceae"))
#comparing to  rumen samples
ASV_20 <-bbdml(ASV_20 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Prevotellaceae
summary(ASV_20)
#comparing to grab sample
ASV_20 <-bbdml(ASV_20 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams2) #Prevotellaceae
summary(ASV_20)
```

Hypothesis testing of relative abundance of *Prevotellaceae*.

```{r}
#comparing to  rumen samples
ASV_3 <- bbdml(ASV_3 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Ruminococcaceae
summary(ASV_3)
#comparing to grab sample
ASV_3 <- bbdml(ASV_3 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams2) #Ruminococcaceae
summary(ASV_3)
```

Hypothesis testing of relative abundance of *Ruminococcaceae*.

```{r}
#comparing to  rumen samples
ASV_2 <- bbdml(ASV_2 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Lachnospiraceae
summary(ASV_2)
#comparing to grab sample
ASV_2 <- bbdml(ASV_2 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams2) #Lachnospiraceae
summary(ASV_2)
```

Hypothesis testing of relative abundance of *Lachnospiraceae*.

```{r fig.width=5, fig.height=10}
p1 <- plot(ASV_20, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Prevotellaceae", x="Samples") +
  theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8.5, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=8.5, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8.5), axis.text.x=element_blank(),axis.title.x=element_blank())

p2 <-plot(ASV_3, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Ruminococcaceae", x="Samples") +
  theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8.5, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=8.5, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8.5), axis.text.x=element_blank(), axis.title.x=element_blank())

p3 <- plot(ASV_2, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Lachnospiraceae", x="Samples") +
  theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=8.5, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=8.5, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=8.5), axis.text.x=element_blank())

# extract the legend from one of the plots
legend_b <- get_legend(p1 + 
    guides(color = guide_legend(nrow = 3)) +
    theme(legend.position = "bottom", legend.box="vertical", legend.direction ="vertical",legend.margin=margin()))

prow1 <- plot_grid(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"),labels = "AUTO", ncol = 1)
prow2 <- plot_grid(p3 + theme(legend.position="none"), legend_b,labels = c("C",""), ncol = 1)
plot_grid(prow1, prow2,labels = c("",""), ncol = 1)

# add the legend to the row we made earlier. Give it one-third of 
# the width of one plot (via rel_widths).
#plot_grid(prow, legend_b, ncol = 1, rel_heights = c(1, .1))
ggsave("GSvsall_Fams2.tiff",device = "tiff", dpi=300, width = 85, height = 200, units = c("mm"))
ggsave("Figure 5.tiff",device = "tiff", dpi=300, width = 85, height = 200, units = c("mm"))
#plot_grid(prow, legend, rel_widths = c(3, .4))
#ggsave("GSvsall_Fams2.png",device = "png", dpi=320, width = 16, height = 3)
```

#Alpha Diversity {.tabset}

##Richness

Richness is an estimate of the number of ASVs in a sample. We will use breakaway to estimate the number of missing species based on the sequencing depth and the number of rare taxa in the data. These estimates account for different sequencing depths.
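To see why observed richness alone can mislead when sequencing depth differs, here is a small simulation in base R. It is only an illustration of the problem, not of breakaway's estimator; the community size, abundance curve, and read depths are all made up.

```r
set.seed(1)

# Hypothetical community: 200 taxa with strongly skewed abundances.
n_taxa <- 200
abundance <- 1 / seq_len(n_taxa)^1.5
abundance <- abundance / sum(abundance)

# Draw 10,000 reads, then treat the first 500 as a shallower run of the same sample.
deep_reads    <- sample(n_taxa, size = 10000, replace = TRUE, prob = abundance)
shallow_reads <- deep_reads[1:500]

# Observed richness is the number of distinct taxa seen at each depth.
observed <- c(shallow = length(unique(shallow_reads)),
              deep    = length(unique(deep_reads)))
observed
```

The shallow run can never detect more taxa than the deep run of the same sample, and either run can miss rare taxa, which is the gap that a richness estimator tries to fill.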

```{r fig.width=14, fig.height=4}
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Feces","Grab Sample","Liquid Strained","Liquid Unstrained", "Stomach Tube","Solid"))

#This is an alpha diversity estimate -- a special class for alpha diversity estimates
ba <- breakaway(ps)

#checking model
plot(ba)
```

```{r}
#plotting 
plot_alpha_estimates_custom(ba, ps, facet.x = "Sample_Type", color = "Sample_Type", trim_plot = TRUE)+
  labs(x="Samples")+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=12, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(size=12,face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12), legend.position = "none", axis.text.x = element_blank())
```

```{r eval=FALSE}
richness <- plot_alpha_estimates_custom(ba, ps, facet.x = "Sample_Type", color = "Sample_Type")+
  labs(x="Samples")+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=14, color="black"), axis.text= element_text(color="black",face = "bold",size=14), axis.title=element_text(size=14,face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=14), legend.position = "none")

ggsave("richness.png", device = "png", dpi=320, width=16, height=4)

#getting data for supplemental file
All <- summary(ba) %>% 
  add_column("SampleNames" = ps %>% otu_table %>% sample_names) %>% 
  add_column("Sample_Type" = ps %>% sample_data %>% get_variable("Sample_Type")) %>%
  add_column("CowID" = ps %>% sample_data %>% get_variable("CowID")) %>%
  add_column("Day" = ps %>% sample_data %>% get_variable("Day"))
#write to excel file
write.xlsx(All, "Richness Estimates.xlsx", sheetName="Richness Estimates")
```

The error bars here are quite large, but this is to be expected as there is a lot of uncertainty in estimating alpha diversity.
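Because each richness estimate comes with its own standard error, comparisons should give less weight to the noisier estimates. The sketch below shows that idea with a simple inverse-variance weighted mean. This is a simplified stand-in: `betta()` fits a fuller regression model, and the estimates and standard errors here are made up.

```r
# Made-up richness estimates and standard errors for three hypothetical samples.
estimate <- c(1000, 1200, 900)
std_err  <- c(50, 300, 100)

# Weight each estimate by the inverse of its variance.
w <- 1 / std_err^2
weighted_mean <- sum(w * estimate) / sum(w)
pooled_se     <- sqrt(1 / sum(w))

c(unweighted = mean(estimate), weighted = weighted_mean, pooled_se = pooled_se)
```

The weighted mean sits closer to the most precisely estimated sample, while the unweighted mean is pulled toward the estimate with the largest error bar.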

Next we will test the hypothesis that different sample types have the same microbial diversity.

```{r}
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Feces","Grab Sample","Liquid Strained","Liquid Unstrained", "Stomach Tube","Solid"))

#making design matrix
predictors <- ps %>% sample_data %>% get_variable(c("Sample_Type","CowID","Day"))
design_matrix <-  model.matrix( ~ Sample_Type + CowID + Day, data = predictors %>% as.data.frame)

#comparing everything to feces
#Testing differences between Sample Types
bt_ST <- betta(summary(ba)$estimate,
            summary(ba)$error,
            design_matrix)
            #make_design_matrix(ps, "Sample_Type"))
bt_ST$table

#making design matrix
design_matrix <-  model.matrix( ~ 0+ Sample_Type + CowID + Day, data = predictors %>% as.data.frame)

#getting richness estimates for each group
#Testing differences between Sample Types
bt_ST <- betta(summary(ba)$estimate,
            summary(ba)$error,
            design_matrix)
            #make_design_matrix(ps, "Sample_Type"))
bt_ST$table

#Comparing everything to Grab Sample
ps_GBTop <- ps
sample_data(ps_GBTop)$Sample_Type <- factor(sample_data(ps_GBTop)$Sample_Type, levels = c("Grab Sample","Feces","Liquid Strained","Liquid Unstrained", "Stomach Tube","Solid"))
#making design matrix
predictors <- ps_GBTop %>% sample_data %>% get_variable(c("Sample_Type","CowID","Day"))
design_matrix <-  model.matrix( ~ Sample_Type + CowID + Day, data = predictors %>% as.data.frame)

#Testing differences between Sample Types
bt_ST <- betta(summary(ba)$estimate,
            summary(ba)$error,
            design_matrix)
            #make_design_matrix(ps, "Sample_Type"))
bt_ST$table
```

When the rumen samples are broken up into different sample types, betta() estimates that their mean species-level diversity is significantly different from that of fecal samples. Neither cow nor day caused a significant shift in species-level diversity. Compared to the grab sample, the stomach tube and solid samples have significantly lower mean species-level diversity.
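The three design matrices in the chunk above differ only in which sample type is listed first and in whether an intercept is included. The toy sketch below shows how those two choices change the columns of the design matrix, and therefore what each coefficient means. The data frame is made up and contains only three sample types.

```r
# Toy metadata with three of the sample types (values are illustrative only).
meta <- data.frame(
  Sample_Type = c("Feces", "Grab Sample", "Solid", "Feces", "Grab Sample", "Solid")
)

# The first factor level becomes the reference that other levels are compared to.
meta$Sample_Type <- factor(meta$Sample_Type,
                           levels = c("Feces", "Grab Sample", "Solid"))
cols_feces_ref <- colnames(model.matrix(~ Sample_Type, data = meta))

# Listing "Grab Sample" first makes it the reference instead.
meta$Sample_Type <- factor(meta$Sample_Type,
                           levels = c("Grab Sample", "Feces", "Solid"))
cols_grab_ref <- colnames(model.matrix(~ Sample_Type, data = meta))

# Dropping the intercept gives one column (one group mean) per sample type.
cols_no_intercept <- colnames(model.matrix(~ 0 + Sample_Type, data = meta))

cols_feces_ref
cols_grab_ref
cols_no_intercept
```

With an intercept, the reference level has no column of its own and every other coefficient is a difference from it; without an intercept, each coefficient is a group-level estimate.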

##Evenness

Evenness describes how balanced the ASVs are; in other words, whether they exist in approximately the same relative abundance (1 = very even). DivNet estimates Shannon diversity in the presence of an ecological/microbial network, and it also adjusts for different sequencing depths.

Here we will first look to see if sample types differ.
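As a point of reference, the sketch below computes a plain plug-in Shannon index and Pielou's evenness from raw counts. This is not DivNet's estimator, which models the network structure and sequencing depth; choosing Pielou's index as the evenness measure that reaches 1 for a perfectly even community is our assumption, and the counts are made up.

```r
# Plug-in Shannon index from a vector of counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Pielou's evenness: Shannon index divided by its maximum, log(number of taxa).
pielou <- function(counts) {
  shannon(counts) / log(sum(counts > 0))
}

even_community   <- c(25, 25, 25, 25)  # made-up counts
skewed_community <- c(97, 1, 1, 1)     # made-up counts

c(even = pielou(even_community), skewed = pielou(skewed_community))
```

Both toy communities have the same richness (four taxa), so the difference in the output comes entirely from how evenly the reads are distributed.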

```{r eval=FALSE}
#DivNet needs multiple cores to run, at least 10-15 with this dataset so this was saved and run on a cluster
#saveRDS(ps, "C:/Users/Jill/OneDrive - UC Davis/Documents/collaboration/Depeters/ps.rds")
#The following command was used to run DivNet
predictors <- ps %>% sample_data %>% get_variable(c("Sample_Type","CowID","Day"))
design_matrix <-  model.matrix( ~ Sample_Type + CowID + Day, data = predictors %>% as.data.frame)

dv_ps_all_gen <- ps_gen %>% divnet(design_matrix, ncores = 15)
```

```{r eval=TRUE}
ps_nonum <- ps
sample_names(ps_nonum) <- paste0("sample_", sample_names(ps_nonum))
#calling in DivNet object
dv_ps_all_ExMod_GS1st <- readRDS("dv_ps_all_ExMod_GS1st.rds")
dv_ps_all_ExMod <- readRDS("dv_ps_all_ExMod.rds")
dv_ps_all <- readRDS("dv_ps_all.rds")
dv_ps_all_gen <- readRDS("dv_ps_all_gen.rds")
```

First, we will graph DivNet's estimates of Shannon diversity.

```{r fig.height=4, fig.width=13}
#Plotting DivNet estimates of Shannon diversity
plot_alpha_estimates_custom(dv_ps_all_ExMod_GS1st$shannon, ps_nonum, col = "Sample_Type", facet.x = "Sample_Type",trim_plot = TRUE) +
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.position = "none", axis.text.x = element_blank())
```

Let's look at hypothesis testing for the DivNet estimates of Shannon diversity.

```{r eval=TRUE, include=TRUE}
#this is just a wrapper for the betta() function
print("hypothesis test for Shannon diversity")
testDiversity(dv_ps_all_ExMod, "shannon")
testDiversity(dv_ps_all_ExMod_GS1st, "shannon")
```

Both the cow and day had a significant effect on evenness. Fecal samples had significantly lower evenness than samples from the rumen. 

We will combine the richness and evenness graphs into a single figure for publication.

```{r fig.height=7, fig.width=9}
richness <- plot_alpha_estimates_custom(ba, ps, facet.x = "Sample_Type", color = "Sample_Type", trim_plot = TRUE)+
  labs(x="Samples")+
  scale_color_manual(values=myColors)+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x = element_blank(), legend.position="none")

shannon <-plot_alpha_estimates_custom(dv_ps_all_ExMod_GS1st$shannon, ps_nonum, col = "Sample_Type", facet.x = "Sample_Type",trim_plot = TRUE) +
  scale_color_manual(values=myColors)+
    theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=9, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x = element_blank(), legend.position="none")

#  theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.position = "none", axis.text.x = element_blank())

plots <- plot_grid(shannon, richness,labels = "AUTO", ncol = 1, align = "v")

# extract a legend that is laid out horizontally
legend_b <- get_legend(
  richness +
    guides(color = guide_legend(nrow = 1)) +
    theme(legend.position = "bottom"))

# add the legend underneath the row we made earlier. Give it 10%
# of the height of one plot (via rel_heights).
plot_grid(plots, legend_b, ncol = 1, rel_heights = c(1, .1))
ggsave("alpha_Diversity.tiff",device = "tiff", dpi=300, width = 180, height = 140, units = c("mm"))
ggsave("Figure 2.tiff",device = "tiff", dpi=300, width = 180, height = 140, units = c("mm"))
```

#Beta Diversity

We will plot the Bray-Curtis distances from DivNet. DivNet uses covariate information to share strength across samples and obtain an estimate of the beta diversity of the *ecosystem*, not of the individual samples.
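For reference, the Bray-Curtis dissimilarity between two samples can be computed by hand as below. This sketch works on observed relative abundances of two made-up samples; it does not reproduce DivNet's model-based estimates, and `bray_curtis_dist()` is a helper name introduced here for illustration.

```r
# Bray-Curtis dissimilarity between two samples, computed on relative abundances.
# (bray_curtis_dist is a hypothetical helper, not part of any package used here.)
bray_curtis_dist <- function(x, y) {
  px <- x / sum(x)
  py <- y / sum(y)
  sum(abs(px - py)) / sum(px + py)
}

sample_a <- c(10, 0, 5)   # made-up counts for three taxa
sample_b <- c(0, 10, 5)

bray_curtis_dist(sample_a, sample_b)  # partly overlapping communities
bray_curtis_dist(sample_a, sample_a)  # identical communities give 0
bray_curtis_dist(c(1, 0), c(0, 1))    # communities sharing no taxa give 1
```

Values run from 0 (identical composition) to 1 (no taxa in common), which is the scale used on the y-axis of the graphs in this section.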

```{r , fig.height=6, fig.width=21, eval=FALSE}
simplifyBeta(dv_ps_all_ExMod_GS1st, ps_nonum, "bray-curtis", "Sample_Type") %>%
  ggplot(aes(x=interaction(Covar1, Covar2), y = beta_est, col=interaction(Covar1, Covar2))) +
  #geom_point()+
  geom_boxplot(size=1) +
  facet_grid(. ~ Covar2, scales="free")+
  geom_linerange(aes(ymin = lower, ymax = upper)) + 
  theme(axis.text.x = element_text(angle = 45, hjust=1,face = "bold", color="black" , size=11), legend.title=element_blank(),axis.text.y= element_text(face = "bold", size=11, color="black"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=11), legend.position = "none") +
  xlab("") + ylab("DivNet Estimates of Bray-Curtis distance")
ggsave("divnet_beta_Exmod.png")
```

Graph of estimates of Bray-Curtis distance from the model with day and cowID.

```{r , fig.height=13, fig.width=8.5, eval=FALSE}
beta <- simplifyBeta(dv_ps_all, ps_nonum, "bray-curtis", "Sample_Type") %>%
  ggplot(aes(x=interaction(Covar1, Covar2), y = beta_est, col=interaction(Covar1, Covar2))) +
  geom_point(col="blue")+
  geom_linerange(aes(ymin = lower, ymax = upper), color="blue") + 
  theme(axis.text.x = element_text(angle = 45, hjust=1,face = "bold", color="black" , size=11), legend.title=element_blank(),axis.text.y= element_text(face = "bold", size=11, color="black"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),
        legend.text = element_text(face="bold", size=11), legend.position = "none") +
  xlab("") + ylab("DivNet Estimates of Bray-Curtis distance")
ggsave("divnet_beta.png")

pslog <- transform_sample_counts(ps, function(x) log(1 + x))
out.pcoa.log <- ordinate(pslog,  method = "MDS", distance = "bray")
evals <- out.pcoa.log$values[,1]
bray_curtis <- plot_ordination(pslog, out.pcoa.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Bray-Curtis")+
  scale_color_manual(values=myColors)+
  coord_fixed(sqrt(evals[2] / evals[1])) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", color= "black", size=11), legend.title=element_blank(),axis.text.y= element_text(face = "bold"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=11))

ggsave("bray-curtis1.png", device = "png", dpi=320, width = 7, height = 10)

plot_grid(beta, bray_curtis, labels = "AUTO", align = "v",  ncol = 1)
ggsave("bray_curtis_both.png",plot_grid(beta, bray_curtis, labels = "AUTO", ncol = 1 , align = "v"), device = "png", dpi=320, height = 13, width=8.5)
```

Graph of estimates of Bray-Curtis distance from the model without day and cowID.

```{r eval=FALSE}
beta <- simplifyBeta(dv_ps_all_gen, ps_nonum, "bray-curtis", "Sample_Type") %>%
  ggplot(aes(x = interaction(Covar1, Covar2), 
             y = beta_est,
            col = interaction(Covar1, Covar2))) +
  geom_point() +
  theme_bw()+
  geom_linerange(aes(ymin = lower, ymax = upper)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", color= "black", size=11), legend.title=element_blank(),axis.text.y= element_text(face = "bold"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.position = "none") +
  xlab("") + ylab("DivNet Estimates of Bray-Curtis distance")

ggsave("bray-curtis.png", device = "png", dpi=320, width = 7, height = 5.5)
```

```{r eval=FALSE}
estimates <- dv_ps_all_gen$`bray-curtis` %>% summary %$% estimate
ses <- sqrt(dv_ps_all_gen$`bray-curtis-variance`)
X <- breakaway::make_design_matrix(ps_nonum, "Sample_Type")
betta(estimates, ses, X)$table
```

```{r eval=FALSE}
betta(estimates, ses, X)$global[2]
```

#Grab vs fecal samples

We will remove the other sample types to reduce the data and look only at the grab samples and feces.

```{r subsetting sample types}
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Solid","Liquid Unstrained"))
ps_sub <- subset_samples(ps, Sample_Type == c("Grab Sample") | Sample_Type == c("Feces"))
ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)
ps_sub
```

After subsetting the data we have `r ntaxa(ps_sub)` ASVs in 24 samples.

Since we saw in the DPCoA that there were two populations of Firmicutes and Bacteroidetes that separated rumen and fecal samples, we can investigate whether different genera or different species make up this difference.

```{r}
GS <- subset_samples(ps_sub, Sample_Type == c("Grab Sample"))
Feces <- subset_samples(ps_sub, Sample_Type == c("Feces"))
GS <- subset_taxa(GS, Phylum=="Firmicutes" | Phylum=="Bacteroidetes")
GS <- prune_taxa(taxa_sums(GS) > 0, GS)
Feces <- subset_taxa(Feces, Phylum=="Firmicutes" | Phylum=="Bacteroidetes")
Feces <- prune_taxa(taxa_sums(Feces) > 0, Feces)
#checking Family
print("Found in feces not in GS")
setdiff(get_taxa_unique(Feces, "Family"), get_taxa_unique(GS, "Family"))
print("Found in GS not in feces")
setdiff(get_taxa_unique(GS, "Family"), get_taxa_unique(Feces, "Family"))
```

These are the families that are found in one and not the other sample type. 

```{r}
#checking Genus
print("Found in feces not in GS")
setdiff(get_taxa_unique(Feces, "Genus"), get_taxa_unique(GS, "Genus"))
print("Found in GS not in feces")
setdiff(get_taxa_unique(GS, "Genus"), get_taxa_unique(Feces, "Genus"))
```

These are the genera that are found in one and not the other sample type. 

```{r}
#checking species
print("Found in feces not in GS")
setdiff(get_taxa_unique(Feces, "Species"), get_taxa_unique(GS, "Species"))
print("Found in GS not in feces")
setdiff(get_taxa_unique(GS, "Species"), get_taxa_unique(Feces, "Species"))
```

These are the species that are found in one and not the other sample type. 

There are `r length(setdiff(rownames(otu_table(Feces)), rownames(otu_table(GS))))` ASVs found in feces but not in grab samples, and `r length(setdiff(rownames(otu_table(GS)), rownames(otu_table(Feces))))` ASVs found in grab samples but not in feces.
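The comparisons in this section all rely on `setdiff()`, which is asymmetric: the order of its arguments decides which side's unique members are returned. A minimal sketch with made-up ASV identifiers:

```r
# Made-up ASV identifiers detected in each sample type.
asvs_feces <- c("ASV_1", "ASV_2", "ASV_7", "ASV_9")
asvs_grab  <- c("ASV_1", "ASV_2", "ASV_3", "ASV_4", "ASV_5")

only_feces <- setdiff(asvs_feces, asvs_grab)    # in feces, not in grab samples
only_grab  <- setdiff(asvs_grab, asvs_feces)    # in grab samples, not in feces
shared     <- intersect(asvs_feces, asvs_grab)  # found in both

lengths(list(only_feces = only_feces, only_grab = only_grab, shared = shared))
```

Adding `intersect()` gives the shared set, so the three counts together account for every identifier seen in either sample type.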

##Corncob

We test all the taxa in our data to see if they are differentially abundant or differentially variable. The differentialTest function will perform these tests on all taxa, while controlling the false discovery rate to account for multiple comparisons.

Although there are species-to-species differences between samples, we might expect this to be the normal variation in sample collection. Thus, we are most concerned with particular families or genera that might be excluded when sampling via different methods.

```{r include=FALSE, eval=TRUE}
set.seed(1)
ps_sub_gen <- tax_glom(ps_sub, "Genus")
ps_sub_gen
ps_sub_fam <- tax_glom(ps_sub, "Family")
ps_sub_fam
```

```{r Running corncob sub, eval=FALSE, warning=FALSE}
#we do not include the response term because we are testing multiple taxa.

#We specify the covariates of our model using formula and phi.formula 
#We also specify which covariates we want to test for by removing them in the formula_null and phi.formula_null arguments.

# The difference between the formulas and the null version of the formulas
# will be the variables that are tested. In this case, we will be testing the coefficients of Sample Type for
# both the expected relative abundance and the overdispersion.

#We set fdr_cutoff to be our controlled false discovery rate.
#testing @genus level with fdr_cutoff 0.05
set.seed(1)
fullAnalysis_072319_Gen_GSvsF <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald", boot=TRUE, data = ps_sub_gen, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319_Gen_GSvsF, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsF.rds")
#testing @genus level with fdr_cutoff 0.01
set.seed(1)
fullAnalysis_072319_Gen_GSvsF_p01 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald", boot=TRUE, data = ps_sub_gen, fdr_cutoff = 0.01)
saveRDS(fullAnalysis_072319_Gen_GSvsF_p01, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsF_p01.rds")
#testing @species level with fdr_cutoff 0.05
set.seed(1)
fullAnalysis_072319_GSvsF <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day,phi.formula_null = ~ 1, test="Wald", boot=TRUE, data = ps_sub, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319_GSvsF, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_GSvsF.rds")
```
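The relationship between a full formula and its null version, as described in the chunk comments above, can be sketched with a model from base R. Here an ordinary binomial GLM stands in for corncob's beta-binomial model, so overdispersion is ignored, and the CowID and Day covariates are left out to keep the toy data small. All counts are made up.

```r
# Made-up counts of one taxon (W) out of a fixed library size (M) per sample.
toy <- data.frame(
  Sample_Type = factor(rep(c("Grab Sample", "Feces"), each = 4),
                       levels = c("Grab Sample", "Feces")),
  W = c(50, 60, 55, 52, 5, 8, 6, 4),
  M = 1000
)

# Full model includes the covariate being tested; the null model leaves it out.
full <- glm(cbind(W, M - W) ~ Sample_Type, family = binomial, data = toy)
null <- glm(cbind(W, M - W) ~ 1, family = binomial, data = toy)

# Wald test for the Sample_Type coefficient, read from the full model's summary.
wald <- summary(full)$coefficients["Sample_TypeFeces", ]
wald

# Likelihood-ratio comparison of the same two models, for reference.
anova(null, full, test = "Chisq")
```

The term that appears in the full model but not in the null model (here `Sample_Type`) is the one being tested; everything shared by both models is treated as an adjustment.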

When we graphed the relative abundance of families, we saw differences between sample types. We will now check whether these differences are significant after taking into account differences in library size. Then we will look at lower taxonomic levels.

```{r}
fullAnalysis_072319_Fam_GSvsF <- readRDS("fullAnalysis_072319_Fam_GSvsF.rds")
otu_to_taxonomy(OTU = fullAnalysis_072319_Fam_GSvsF$significant_taxa, data = ps_sub_fam, level=c("Phylum", "Family","Genus","Species"))
```

There are `r length(fullAnalysis_072319_Fam_GSvsF$significant_taxa)` families that are significantly different between fecal and grab samples.  

```{r fig.height=9, fig.width=6}
#plotting
plot.differentialTest_custom(fullAnalysis_072319_Fam_GSvsF, level=c("Phylum","Family"))+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

These are the families that are significantly lower and higher in abundance. 

```{r}
df_fam <- get_data_CC(fullAnalysis_072319_Fam_GSvsF)
table(sign(df_fam$x)) #get numbers of positive and negative coefficients
```

There are 18 families with significantly increased and 30 with significantly decreased relative abundance in feces compared to grab samples.

```{r}
df_fam[order(df_fam$x),c(1,4,8)] %>%
  kable(caption="") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

```{r}
sig_fams <- ps_sub_fam %>% subset_taxa(Family %in% c("Akkermansiaceae","Bacteroidaceae", "Peptostreptococcaceae","Veillonellaceae","Endomicrobiaceae", "Marinifilaceae","Bacteroidales_BS11_gut_group", "Fibrobacteraceae", "Spirochaetaceae","Christensenellaceae","Rikenellaceae"))
print("Model for Peptostreptococcaceae")
ASV_48 <-bbdml(ASV_48 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Peptostreptococcaceae
summary(ASV_48)
print("Model for Akkermansiaceae")
ASV_330 <-bbdml(ASV_330 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Akkermansiaceae
summary(ASV_330)
print("Model for Bacteroidaceae")
ASV_36 <- bbdml(ASV_36 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Bacteroidaceae
summary(ASV_36)
print("Model for Veillonellaceae")
ASV_23 <- bbdml(ASV_23 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Veillonellaceae
summary(ASV_23)
print("Model for Marinifilaceae")
ASV_2770 <- bbdml(ASV_2770 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Marinifilaceae
summary(ASV_2770)
print("Model for Bacteroidales_BS11_gut_group")
ASV_197 <- bbdml(ASV_197 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Bacteroidales_BS11_gut_group
summary(ASV_197)
print("Model for Fibrobacteraceae")
ASV_29 <- bbdml(ASV_29 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Fibrobacteraceae
summary(ASV_29)
print("Model for Spirochaetaceae")
ASV_68 <- bbdml(ASV_68 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Spirochaetaceae
summary(ASV_68)
print("Model for Christensenellaceae")
ASV_1 <- bbdml(ASV_1 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Christensenellaceae
summary(ASV_1)
print("Model for Rikenellaceae")
ASV_60 <- bbdml(ASV_60 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_fams) #Rikenellaceae
summary(ASV_60)
```

We will look at a couple of genera. 

```{r}
sig_gen <- ps_sub_gen %>% subset_taxa(Genus %in% c("Fibrobacter","Treponema_2"))
print("Model for Treponema_2")
ASV_68_g <-bbdml(ASV_68 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_gen) #Treponema_2
summary(ASV_68_g)
print("Model for Fibrobacter")
ASV_29_g <-bbdml(ASV_29 ~ Sample_Type + CowID + Day, phi.formula = ~ 1, data=sig_gen) #Fibrobacter
summary(ASV_29_g)
```


```{r Figure 6, fig.width=13, fig.height=6}
p1 <- plot(ASV_48, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Peptostreptococcaceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())

p2 <-plot(ASV_330, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Akkermansiaceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())
p3 <- plot(ASV_36, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Bacteroidaceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())
p4 <- plot(ASV_23, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Veillonellaceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())
p5 <- plot(ASV_68, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Spirochaetaceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())

p6 <- plot(ASV_29, AA = FALSE, color="Sample_Type", shape="CowID") +
  scale_color_manual(values=myColors)+
  labs(title="Fibrobacteraceae", x="Samples") +
theme(legend.title=element_blank(),title=element_text(face = "bold", size=9, color="black"),strip.text = element_text(face="bold",size=9, color="black"), axis.text= element_text(color="black",face = "bold",size=9), axis.title=element_text(face = "bold", size=10, color="black"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold", size=9, color="black"), strip.text.x=element_text(angle=0,face = "bold", color="black", size=9), axis.text.x=element_blank())


prow <- plot_grid(
  p1 + theme(legend.position="none"),
  p2 + theme(legend.position="none"),
  p3 + theme(legend.position="none"),
  p4 + theme(legend.position="none"),
  p6 + theme(legend.position="none"),
  p5 + theme(legend.position="none"),
  align = 'vh', labels = c("A", "B", "C", "D", "E", "F"),
  hjust = -1, nrow = 2)

# extract the legend from one of the plots
legend <- get_legend(
  # extract a legend that is laid out horizontally
  p1 + 
    theme(legend.position="bottom", legend.text = element_text(face="bold", size=10, color="black"))) 

# add the legend underneath the row we made earlier. Give it 10%
# of the height of one plot (via rel_heights).
plot_grid(prow, legend, ncol=1, labels = "", rel_heights = c(1, .1))
ggsave("GSvsF_FamB.tiff",device = "tiff", dpi=300, width = 180, height = 120, units = c("mm"))
ggsave("Figure 7.tiff",device = "tiff", dpi=300, width = 180, height = 120, units = c("mm"))
```

These are graphs of the relative abundance of families that are significantly different between fecal and grab samples. This is figure 6.

```{r}
#call in data
fullAnalysis_072319_GSvsF <- readRDS("fullAnalysis_072319_GSvsF.rds")
fullAnalysis_072319_Gen_GSvsF <- readRDS("fullAnalysis_072319_Gen_GSvsF.rds")
fullAnalysis_072319_Gen_GSvsF_p01 <- readRDS("fullAnalysis_072319_Gen_GSvsF_p01.rds")
```

After running corncob, `r length(which(is.na(fullAnalysis_072319_Gen_GSvsF$p)) %>% names)` genera could not be fit with the model, but `r length(which(!is.na(fullAnalysis_072319_Gen_GSvsF$p)) %>% names)` were fit. Of these, `r length(fullAnalysis_072319_Gen_GSvsF$significant_taxa)` genera were significantly differentially abundant. At the ASV level, `r length(fullAnalysis_072319_GSvsF$significant_taxa)` ASVs were significantly differentially abundant.

We will look into the taxa that could not be fit to the model and see if we can determine why. 

```{r}
#Calculating how many taxa didn't fit model
length(which(is.na(fullAnalysis_072319_Gen_GSvsF$p)) %>% names)
#extracting out unique names
goodTaxa <- which(is.na(fullAnalysis_072319_Gen_GSvsF$p)) %>% names #pulls out all ASVs
ps_check <- pop_taxa_keep(ps_sub_gen, goodTaxa)
otu_table(ps_check)
```

Although `r length(which(is.na(fullAnalysis_072319_Gen_GSvsF$p)) %>% names)` taxa could not be fit with the model, this looks to be due to cases where there are very few reads for a particular ASV (so a model can't be fit) or where there are reads in only one sample type. See the feature table above. 

```{r}
feces <- otu_table(ps_check) %>% as.data.frame() %>% select(13:24)
grab <- otu_table(ps_check) %>% as.data.frame() %>% select(1:12)
nrow(grab[which(rowSums(grab) == 0),]) #number of ASVs with zero reads in grab samples.
nrow(feces[which(rowSums(feces) == 0),]) #number of ASVs with zero reads in fecal samples.
```

There are `r nrow(feces[which(rowSums(feces) == 0),]) + nrow(grab[which(rowSums(grab) == 0),])` ASVs that have no reads in one sample type, which is why they aren't being fit to the model. Another `r nrow(feces[which(rowSums(feces) == 1),]) + nrow(grab[which(rowSums(grab) == 1),])` ASVs have only one read in a sample type, which will not allow them to be fit to the model. 
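The check for taxa that are absent from one sample type can be sketched on a small made-up count table. The matrix, its `toy_` row names, and the two-columns-per-sample-type layout are all hypothetical. Like the chunk above, the sketch assumes the grab sample columns come first and the fecal columns second.

```{r eval=FALSE}
# Hypothetical counts: rows are taxa, columns are samples (2 grab, 2 fecal)
toy_counts <- matrix(
  c(0, 0, 12, 30,   # only seen in feces
    5, 9,  0,  0,   # only seen in grab samples
    1, 0,  0,  0,   # a single read overall
    7, 3,  8,  2),  # seen in both sample types
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("toy_ASV_", 1:4), c("GS_1", "GS_2", "F_1", "F_2"))
)

# Split the columns by sample type
toy_grab  <- toy_counts[, 1:2, drop = FALSE]
toy_feces <- toy_counts[, 3:4, drop = FALSE]

# Taxa whose reads sum to zero within a sample type
absent_in_grab  <- rownames(toy_counts)[rowSums(toy_grab) == 0]
absent_in_feces <- rownames(toy_counts)[rowSums(toy_feces) == 0]
absent_in_grab
absent_in_feces
```

Taxa like the first three rows are the kind that would be expected to fail model fitting here, because one sample type contributes no counts at all.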

```{r}
print("In feces, but not in grab samples")
#finding ASVs with no reads in grab samples, but a lot in feces.
df_fec <- as.data.frame(otu_table(ps_check))[which(rowSums(as.data.frame(otu_table(ps_check))[1:12]) == 0 & rowSums(as.data.frame(otu_table(ps_check))[13:24]) >= 50 ),]
otu_to_taxonomy(OTU = rownames(df_fec), data = ps_check, level=c("Phylum", "Family","Genus","Species"))
otu_table(ps_check)["ASV_1393",]
otu_table(ps_check)["ASV_113",]
otu_table(ps_check)["ASV_1448",]
print("In grab samples, but not in feces")
#finding ASVs with no reads in feces, but a lot in grab samples.
df_gs <- as.data.frame(otu_table(ps_check))[which(rowSums(as.data.frame(otu_table(ps_check))[13:24]) == 0 & rowSums(as.data.frame(otu_table(ps_check))[1:12]) >= 50 ),]
otu_to_taxonomy(OTU = rownames(df_gs), data = ps_check, level=c("Phylum", "Family","Genus","Species"))
```

These ASVs couldn't be fit to the model since there were zero reads in one sample type, while the other sample type had at least 50 reads.

Let's graph the one with the most reads, ASV_113 (*Clostridioides*). 

```{r eval=FALSE}
unfit_taxa <- rownames(df_gs)
#Making relative
ps_sub_gen_rel <- transform_sample_counts(ps_sub_gen, function(x) 100*(x/sum(x)))
ps_unfit_rel <- pop_taxa_keep(ps_sub_gen_rel, unfit_taxa)

#calculating error bars to graph mean transformed abundance of major phyla
melted <- psmelt(ps_unfit_rel)
grouped <- dplyr::group_by(melted, Sample_Type, Genus)
unfit <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))

#Plotting relative abundance
ggplot(unfit, aes(x=Genus, y=mean, fill= Genus))+
  geom_bar(aes(color=Genus, fill=Genus), stat="identity", position=position_dodge(), width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge())+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(. ~ Sample_Type,scales="free", space="free_x")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold"),strip.text.y=element_text(angle=0,face = "bold"),strip.text.x=element_text(angle=0,face = "bold"),axis.text= element_text(face = "bold"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),aspect.ratio = 2/1.5)+
  labs(x="", y="Relative Abundance")
```
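The error bars in these bar plots are the mean plus or minus the standard error of the mean (SEM), computed as `sd(Abundance)/sqrt(length(Abundance))`. Here is a minimal base R sketch of the same grouped summary on made-up values. The data frame and the `sem()` helper are hypothetical and stand in for the `dplyr::summarise()` call.

```{r eval=FALSE}
# Hypothetical relative abundances for two sample types
toy_abund <- data.frame(
  Sample_Type = rep(c("Feces", "Grab"), each = 4),
  Abundance   = c(2, 4, 4, 6, 10, 10, 10, 10)
)

# Standard error of the mean: sample sd divided by sqrt(n)
sem <- function(x) sd(x) / sqrt(length(x))

# One row per sample type with its mean and SEM
toy_groups  <- split(toy_abund$Abundance, toy_abund$Sample_Type)
toy_summary <- data.frame(
  mean = sapply(toy_groups, mean),
  sem  = sapply(toy_groups, sem)
)
toy_summary
```

A group with identical values has an SEM of zero, so its error bar collapses onto the top of the bar.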

Now, we will return to the corncob output. We can see a list of differentially-abundant taxa using: 

```{r eval=FALSE}
unique(otu_to_taxonomy(OTU = fullAnalysis_072319_Gen_GSvsF$significant_taxa, data = ps_sub, level=c("Phylum", "Family","Genus", "Species")))
```

There are `r length(fullAnalysis_072319_Gen_GSvsF$significant_taxa)` genera differentially abundant. We will look at the unique families they represent. 

```{r fig.height=12, fig.width=7}
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsF, level=c("Phylum","Family","Genus","Species"), taxa_filter="Firmicutes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r}
table(otu_to_taxonomy(OTU = fullAnalysis_072319_GSvsF$significant_taxa, data = ps_sub, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
#making table of significant ASVs per family
kable(caption="# of significant ASVs in each family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

```{r fig.height=10, fig.width=8}
plot.differentialTest_custom(fullAnalysis_072319_GSvsF, level=c("Phylum","Family","Genus","Species"), taxa_filter="Bacteroidetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

This graphs the differentially abundant ASVs in Bacteroidetes. We see that *Rikenellaceae* is lower in abundance in feces compared to grab samples, but we saw earlier that this family was higher in relative abundance, so we will double check that. 

```{r fig.height=6, fig.width=6}
ps_rel <- transform_sample_counts(ps, function(x) 100*(x/sum(x)))
Feces <- subset_samples(ps_rel, Sample_Type == c("Feces"))

Akk <- subset_taxa(Feces, Family == "Akkermansiaceae")
Akk <- prune_taxa(taxa_sums(Akk) > 0, Akk)
#calculating error bars to graph mean relative abundance of Akkermansiaceae
melted <- psmelt(Akk)
grouped <- dplyr::group_by(melted, Sample_Type, Genus)
AKK <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))

#Ordering
AKK <- AKK[order(-AKK$mean),] #ordering by mean
kable(AKK, caption="Statistics for Abundance of Akkermansiaceae") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

```{r}
#Making relative
ps_rel <- transform_sample_counts(ps, function(x) 100*(x/sum(x)))
Rik <- subset_taxa(ps_rel, Family=="Rikenellaceae")

#calculating error bars to graph mean transformed abundance of major fam
melted <- psmelt(Rik)
grouped <- dplyr::group_by(melted, Sample_Type, Genus)
RIK <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))

#Ordering
RIK <- RIK[order(-RIK$mean),] #ordering by mean
#fam <- fam[order(-fam$Phylum),] #change to order by name*
#fam[,3:5] <- format(RIK[,3:5], digits = 3, scientific=F)
kable(RIK, caption="Statistics for Abundance of Rikenellaceae") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
#replacing - sign in genus names
RIK$Genus <- gsub("-", "_", RIK$Genus)

#Plotting relative abundance
ggplot(RIK, aes(x=Sample_Type, y=mean, fill= Genus))+
  geom_bar(aes(color=Genus, fill=Genus), stat="identity", position=position_dodge(1), size=0.5, width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge(1))+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(Genus ~ ., labeller = label_parsed, scales="free")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold"),strip.text.y=element_text(angle=0,face ="bold"),
        strip.text.x=element_text(angle=0,face = "bold"), axis.text= element_text(face ="bold"),
        axis.title=element_text(face="bold"),panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank())+
  labs(x="", y="Average Relative Abundance")
```

```{r fig.height=10, fig.width=6}
Rik_gen <- subset_taxa(ps_rel, Genus=="Rikenellaceae_RC9_gut_group")
#calculating error bars to graph mean transformed abundance of major fam
melted_gen <- psmelt(Rik_gen)
grouped_gen <- dplyr::group_by(melted_gen, Sample_Type, OTU)
RIK_gen <- as.data.frame(dplyr::summarise(grouped_gen, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))
RIK_gen <- RIK_gen %>% filter(grepl('ASV_16$|ASV_111$|ASV_98$|ASV_977$|ASV_804$|ASV_78$|ASV_609$|ASV_604$|ASV_56$|ASV_391$|ASV_30$|ASV_265$|ASV_226$|ASV_19$|ASV_180$|ASV_1686$|ASV_132$|ASV_125$|ASV_1240$|ASV_115$', OTU))

ggplot(RIK_gen, aes(x=Sample_Type, y=mean, fill= OTU))+
  geom_bar(aes(color=OTU, fill=OTU), stat="identity", position=position_dodge(1), size=0.5, width=0.5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.2, position=position_dodge(1))+
  geom_abline(intercept = 0, slope = 0)+
  theme_bw()+
  facet_grid(OTU ~ ., labeller = label_parsed, scales="free")+
  theme(legend.position="none",axis.text.x=element_text(angle = 45, vjust = 1, hjust=1,face = "bold"),strip.text.y=element_text(angle=0,face ="bold"),
        strip.text.x=element_text(angle=0,face = "bold"), axis.text= element_text(face ="bold"),
        axis.title=element_text(face="bold"),panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank())+
  labs(x="", y="Average Relative Abundance")
```

From these graphs we can see that *Rikenellaceae_RC9_gut_group* appears to be higher in feces. There are also other genera (*Alistipes*, *dgA_11_gut_group*) in the *Rikenellaceae* family that cause the overall relative abundance of this family to be higher in feces than in grab samples. However, certain ASVs in *Rikenellaceae* are significantly lower in feces, which is consistent with the corncob results. 
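This pattern, a family that is higher overall in one sample type while one of its ASVs is lower, can be illustrated with a made-up count table. The names and counts below are hypothetical. The `apply()` call is a base R stand-in for the per-sample `100*(x/sum(x))` transformation used above.

```{r eval=FALSE}
# Hypothetical counts: two ASVs from one family plus everything else
toy_counts <- matrix(
  c(40,  5,   # toy_ASV_a: more reads in the grab sample
     5, 60,   # toy_ASV_b: more reads in feces
    55, 35),  # all other taxa combined
  nrow = 3, byrow = TRUE,
  dimnames = list(c("toy_ASV_a", "toy_ASV_b", "other"), c("Grab", "Feces"))
)

# Percent relative abundance within each sample (each column sums to 100)
toy_rel <- apply(toy_counts, 2, function(x) 100 * (x / sum(x)))

# Family-level relative abundance is the sum over its member ASVs
family_total <- colSums(toy_rel[c("toy_ASV_a", "toy_ASV_b"), ])
family_total
toy_rel["toy_ASV_a", ]
```

Here the family total is higher in feces even though `toy_ASV_a` is lower in feces, so the family-level and ASV-level results do not contradict each other.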

```{r fig.height=13, fig.width=7}
#plotting
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsF, level=c("Phylum","Family","Genus","Species"), taxa_filter="Firmicutes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Genera in Firmicutes that are differentially abundant between feces and grab samples.

```{r}
table(otu_to_taxonomy(OTU = fullAnalysis_072319_GSvsF$significant_taxa, data = ps_sub, level=c("Phylum"))) %>% as.data.frame() %>% arrange(-Freq) %>%
kable(caption="# of Significant ASVs by Phyla") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

While the most common phyla to have significantly differentially abundant taxa were Firmicutes and Bacteroidetes, this could be because they are the most dominant phyla rather than because they are really "more important" in distinguishing sample types. 

```{r}
#getting number of significant taxa assigned to each phylum
df_new <- as.data.frame(fullAnalysis_072319_GSvsF$significant_taxa)
colnames(df_new) <- c("taxa")
ltax <- as.list(fullAnalysis_072319_GSvsF$significant_taxa)
df_new$Phylum <- unlist(lapply(ltax, function(ltax) otu_to_taxonomy(ltax, fullAnalysis_072319_GSvsF$data, level = "Phylum")))
keep <- as.data.frame(table(df_new$Phylum))
keep <- merge(keep,as.data.frame(table(tax_table(ps)[,"Phylum"])),by="Var1",all=TRUE)
keep[is.na(keep)] <- 0  #change NAs to 0
keep$percent <- (keep$Freq.x/keep$Freq.y)*100
colnames(keep) <- c("Phylum", "#Significant ASVs", "Total ASVs", "Percent Significant ASVs")
#making table of phyla ASVs taxa 
kable(keep, caption="# of Significant ASVs by Phyla") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

This table shows that while Firmicutes and Bacteroidetes are the most common phyla to have differentially abundant taxa, this is in part due to the fact that they are the most prevalent phyla. As a percentage of their total ASVs, Chloroflexi and Euryarchaeota are more commonly differentially abundant.  
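The percentage calculation behind this table can be sketched with made-up phylum labels. The vectors below are hypothetical and only mirror the merge-and-divide logic, not the real counts.

```{r eval=FALSE}
# Hypothetical phylum labels: one entry per significant ASV and per ASV overall
toy_sig_phyla <- c(rep("Firmicutes", 3), rep("Chloroflexi", 2))
toy_all_phyla <- c(rep("Firmicutes", 30), rep("Chloroflexi", 4), rep("Tenericutes", 6))

# Count ASVs per phylum in each set and merge, keeping phyla with no significant ASVs
toy_keep <- merge(
  as.data.frame(table(Phylum = toy_sig_phyla)),
  as.data.frame(table(Phylum = toy_all_phyla)),
  by = "Phylum", all = TRUE
)

# Phyla missing from the significant set get a count of 0
toy_keep$Freq.x[is.na(toy_keep$Freq.x)] <- 0

# Percent of each phylum's ASVs that are significant
toy_keep$percent <- (toy_keep$Freq.x / toy_keep$Freq.y) * 100
toy_keep
```

In this toy case Firmicutes has the most significant ASVs (3) but only 10% of its ASVs are significant, while the smaller Chloroflexi reaches 50%.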

We will graph out the significantly different taxa from these phyla.
```{r fig.height=4, fig.width=10}
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsF, level=c("Phylum","Genus","Species"), taxa_filter="Chloroflexi")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))

#plotting
plot.differentialTest_custom(fullAnalysis_072319_GSvsF, level=c("Phylum","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))

#plotting
plot.differentialTest_custom(fullAnalysis_072319_GSvsF, level=c("Phylum","Family","Genus","Species"), taxa_filter="Proteobacteria")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Here we can see that the Euryarchaeota that are important for telling sample types apart are all methanogens. Feces has a strong negative effect on most of these methanogens (methanogens are lower in feces). Interestingly, fecal samples also have lower *Flexilinea*. 

```{r}
#otu_table(subset_taxa(ps_sub, Phylum=="Chloroflexi"))[c("ASV_1315","ASV_727", "ASV_612","ASV_594","ASV_409","ASV_2100","ASV_1877","ASV_1761", "ASV_1570","ASV_1003"),]
#are there any other taxa in Chloroflexi?
get_taxa_unique(subset_taxa(ps_sub, Phylum=="Chloroflexi"), "Genus")
otu_table(subset_taxa(ps_sub, Phylum=="Chloroflexi"))
feces <- otu_table(subset_taxa(ps_sub, Phylum=="Chloroflexi")) %>% as.data.frame() %>% select(13:24,) 
nrow(feces[which(rowSums(feces) == 0),]) #number of ASVs with zero reads in fecal samples.
```

*Flexilinea* is the only genus found in the phylum Chloroflexi in this data set. It also seems like there are a number of ASVs that don't have any reads in fecal samples. Let's look at the ASV level.

```{r fig.height=6, fig.width=8}
#Making relative
ps_sub_rel <- transform_sample_counts(ps_sub, function(x) 100*(x/sum(x)))
#subsetting the relative abundance data to Chloroflexi
ps_ch <- subset_taxa(ps_sub_rel, Phylum=="Chloroflexi")
ps_sub_new <- ps_ch #make copy of phyloseq object
new_taxa <- as.data.frame(ps_sub_new@tax_table@.Data) #make new data frame of species
new_taxa$ASV <- row.names(new_taxa)
ps_sub_new@tax_table@.Data[,"Species"] <- new_taxa$ASV

#calculating error bars to graph mean transformed abundance of major phyla
melted <- psmelt(ps_sub_new)
grouped <- dplyr::group_by(melted, X.SampleID, Species, Sample_Type)
chlor <- as.data.frame(dplyr::summarise(grouped, mean=mean(Abundance), sd=sd(Abundance), sem = (sd(Abundance)/sqrt(length(Abundance)))))

#Plotting relative abundance
plot_bar(ps_sub_new, fill = "Species", title = "Relative Abundance of ASVs in Chloroflexi")+
  geom_bar(aes(color=Species, fill=Species), stat="identity", position="stack")+
  geom_abline(intercept = 0, slope = 0)+
  labs(y="Percent Relative Abundance", x="")+
  theme_bw()+
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, face = "bold", color= "black", size=11), legend.title=element_blank(),axis.text.y= element_text(face = "bold"), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text=element_text(face = "bold"))
```

We will look into Euryarchaeota ASVs a bit more now.

```{r}
feces <- otu_table(subset_taxa(ps_sub, Phylum=="Euryarchaeota")) %>% as.data.frame() %>% select(13:24,) 
nrow(feces[which(rowSums(feces) == 0),]) #number of ASVs in grab sample, not in fecal samples.
otu_to_taxonomy(OTU=rownames(feces[which(rowSums(feces) == 0),]), data =ps_sub) #What are those taxa that don't have reads in feces
otu_table(subset_taxa(ps_sub, Family=="Methanomethylophilaceae")) #checking this family
#otu_to_taxonomy(OTU=rownames(otu_table(subset_taxa(ps_sub, Phylum=="Euryarchaeota"))), data =ps_sub)
otu_table(subset_taxa(ps_sub, Phylum=="Euryarchaeota"))[c("ASV_1434","ASV_4298"),] #checking Methanocorpusculum ASVs
```

One *Methanocorpusculum* ASV is increased, but the other ASV has low counts, so the association does not seem strong enough to bring up. 

Based on the DPCoA, the phyla Spirochaetes and Actinobacteria also play an important role in distinguishing feces from grab samples. 

```{r}
plot.differentialTest_custom(fullAnalysis_072319_GSvsF, level=c("Phylum","Genus","Species"), taxa_filter="Spirochaetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))

#plotting
plot.differentialTest_custom(fullAnalysis_072319_GSvsF, level=c("Phylum","Family", "Genus","Species"), taxa_filter="Actinobacteria")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```



```{r}
table(otu_to_taxonomy(OTU = fullAnalysis_072319_Gen_GSvsF$significant_taxa, data = ps_sub_gen, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
kable(caption="# of Significant Genera by Family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

From this we can see that *Lachnospiraceae*, *Ruminococcaceae*, *Prevotellaceae* and *Erysipelotrichaceae* were the most common families to be differentially abundant in feces compared to grab samples. We will take a closer look at all of the differentially abundant taxa.

```{r}
#getting taxonomy
DA_taxa <- otu_to_taxonomy(OTU=fullAnalysis_072319_Gen_GSvsF$significant_taxa, data=ps_sub_gen)
#Getting p-values of differentially abundant taxa
ASVs <- c(row.names(as.data.frame(DA_taxa)))
df <- as.data.frame(fullAnalysis_072319_Gen_GSvsF$p_fdr)
df$ASV <- row.names(df)
df <- df[row.names(df) %in% ASVs, ]
df <- merge(df,as.data.frame(DA_taxa),by=0, all=TRUE) #by=0 means merge by row names
#df <- subset(df, select = c(DA_taxa,fullAnalysis$p_fdr,Row.names))
#df$ASV <- df$Row.names
df$Row.names <- NULL
colnames(df) <- c("p_value", "ASV", "Taxa")
#df$p_value <- format(df$p_value, digits = 3)
options(scipen = 999) #take out of scientific notation
df <- df[order(df$p_value),] #order column p_value
df$p_value <- format(df$p_value, digits = 3, scientific = TRUE) #put back into scientific notation
kable(df, caption="Differentially Abundant Taxa") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the taxa that are differentially **abundant** between grab samples and feces, with their false discovery rate (FDR) corrected p-values. ASVs are listed by significance. 
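To show what the FDR correction does to a set of p-values, here is a minimal sketch using base R `p.adjust()` on made-up values. The p-values and `toy_` names are hypothetical. The sketch assumes a Benjamini-Hochberg correction (`method = "fdr"`) and a 0.05 cutoff, which may not match the exact settings used to produce the saved corncob results.

```{r eval=FALSE}
# Hypothetical raw p-values for five made-up taxa
toy_p <- c(toy_ASV_1 = 0.001, toy_ASV_2 = 0.01, toy_ASV_3 = 0.04,
           toy_ASV_4 = 0.045, toy_ASV_5 = 0.5)

# Benjamini-Hochberg adjustment; names are carried through
toy_p_fdr <- p.adjust(toy_p, method = "fdr")

# Taxa passing the cutoff before and after correction
toy_raw_hits <- names(toy_p)[toy_p < 0.05]
toy_fdr_hits <- names(toy_p_fdr)[toy_p_fdr < 0.05]
toy_p_fdr
toy_fdr_hits
```

Four of the toy taxa pass an uncorrected 0.05 threshold, but only the two smallest p-values remain below 0.05 after correction.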

```{r warning=FALSE}
otu_table(ps_sub)["ASV_1",]
ASV_1 <- bbdml(formula = ASV_1 ~ Sample_Type+CowID+Day, phi.formula = ~ 1, data=ps_sub_gen)
summary(ASV_1)
```

This is the feature table for ASV_1. Let's graph this most significantly differentially abundant ASV, *Christensenellaceae_R-7_group*. There is significantly less of this taxon in feces compared to grab samples. Day did not affect this, but there were significant differences in abundance between cows. 

```{r}
plot(ASV_1, color="Sample_Type") +
  scale_color_manual(values=myColors)+
theme(axis.text = element_text(face = "bold"), legend.title=element_blank(),axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold"))
plot(ASV_1, AA = TRUE, color="Sample_Type") +
  scale_color_manual(values=myColors)+
  theme(axis.text = element_text(face = "bold"), legend.title=element_blank(),axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold"))
```

Here we see that *Christensenellaceae_R-7_group* is more abundant in feces vs grab samples. In fact, if we look at the feature table above, we see this taxon is almost not present in the rumen grab samples. 

```{r fig.height=18, fig.width=8}
plot(fullAnalysis_072319_Gen_GSvsF,level=c("Family","Genus","Species"))+
  theme(strip.text.x=element_text(size=11,face = "bold", color="black"),axis.text=element_text(face = "bold", size = 11, color = "black"), axis.title.y = element_text(color="black", size=11, face="bold"))
```

```{r fig.height=20, fig.width=8, eval=FALSE}
#making figure for paper
gs_F <- plot(fullAnalysis_072319_Gen_GSvsF,level=c("Family","Genus","Species"))+
  theme(strip.text.x=element_text(size=12,face = "bold", color="black"),axis.text=element_text(face = "bold", size = 12, color = "black"), axis.title.y = element_text(color="black", size=12, face="bold"))

ggsave("GSvsF.png",device = "png", dpi=320, width = 8, height = 20)
```

#Grab Samples vs other rumen samples

We will remove the fecal samples to compress the data down and look just at the grab sample and the other rumen sample types.

```{r subsetting rumen sample types}
ps_sub <- subset_samples(ps, Sample_Type != c("Feces"))
ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)
ps_sub
```

After subsetting the data we have 4,690 ASVs in 56 samples.

###Metrics after filtering

```{r}
#number of taxa present
ntaxa(ps_sub)
#checking names of taxa present at specific rank
length(get_taxa_unique(ps_sub, "Phylum"))
length(get_taxa_unique(ps_sub, "Order"))
length(get_taxa_unique(ps_sub, "Family"))
length(get_taxa_unique(ps_sub, "Genus"))
```

Previously, there were `r ntaxa(ps)` ASVs in the dataset. This was composed of `r length(get_taxa_unique(ps, "Phylum"))` phyla, `r length(get_taxa_unique(ps, "Order"))` Orders, `r length(get_taxa_unique(ps, "Family"))` Families and `r length(get_taxa_unique(ps, "Genus"))` Genera. In the new subset we have `r ntaxa(ps_sub)` ASVs in the dataset. This is composed of `r length(get_taxa_unique(ps_sub, "Phylum"))` phyla, `r length(get_taxa_unique(ps_sub, "Order"))` Orders, `r length(get_taxa_unique(ps_sub, "Family"))` Families and `r length(get_taxa_unique(ps_sub, "Genus"))` Genera.

##Exploratory Analysis

```{r qplot sub}
qplot(log10(rowSums(otu_table(ps_sub))),binwidth=0.2) +
  xlab("Logged counts-per-sample")
```

Again we will transform the data for some exploratory analysis.
 
```{r Bray-Curtis distance sub}
set.seed(1850)
#bray curtis distance
pslog <- transform_sample_counts(ps_sub, function(x) log(1 + x))
out.pcoa.log <- ordinate(pslog,  method = "MDS", distance = "bray")
evals <- out.pcoa.log$values[,1]
plot_ordination(pslog, out.pcoa.log, color = "Sample_Type") +
  labs(col = "Sample Type")+
  scale_color_manual(values=myColors)+
  coord_fixed(sqrt(evals[2] / evals[1]))+
  theme(legend.title=element_blank(),title=element_text(face="bold"),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
```

Here we begin to see that stomach tube samples are more variable than the grab samples. Liquid strained samples seem to be the most variable (maybe comparable to stomach tube samples). Additionally, there seem to be two clusters of stomach tube samples, although this is probably not significant. This could be due to the presence or absence of fiber in the sample.  
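For reference, the Bray-Curtis dissimilarity behind this ordination can be written out by hand for a pair of samples. The `bray_curtis()` helper and the count vectors are hypothetical. They apply the same `log(1 + x)` transformation as `pslog` before comparing the samples.

```{r eval=FALSE}
# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the sum of all abundances
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Hypothetical counts for three taxa in two samples
toy_a <- c(10, 0, 5)
toy_b <- c(0, 10, 5)

# Dissimilarity on raw counts and on log(1 + x) transformed counts
toy_bc_raw <- bray_curtis(toy_a, toy_b)
toy_bc_log <- bray_curtis(log1p(toy_a), log1p(toy_b))
c(raw = toy_bc_raw, log = toy_bc_log)
```

The value is 0 for identical samples and 1 for samples with no taxa in common. The log transformation shrinks the influence of the most abundant taxa, so the two versions generally differ.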

```{r Weighted Eigenvalues sub, warning=FALSE, fig.width=3, fig.height=3}
set.seed(1850)
#weighted unifrac
out.wuf.log <- ordinate(pslog, method = "MDS", distance = "wunifrac")
evals <- out.wuf.log$values$Eigenvalues
eval_per_wuf <- (out.wuf.log$values$Eigenvalues/(sum(out.wuf.log$values$Eigenvalues)))*100
#Plotting eigenvalues to determine how many axes should be shown in graph
barplot(eval_per_wuf[1:10],names.arg=paste0('Eigenvalue',1:10), ylab="Percent of explained variances", col="blue")
```

From the eigenvalues we can see that two axes are appropriate for graphing, together explaining almost 90% of the variance between the samples. 
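The percent of variance per axis is each eigenvalue divided by the sum of all eigenvalues. Here is a minimal sketch of that calculation on simulated points, using base R `cmdscale()` as a stand-in for the ordination above. The simulated data and `toy_` names are hypothetical and unrelated to the study samples.

```{r eval=FALSE}
set.seed(1850)
# Hypothetical data: 10 points in 4 dimensions
toy_points <- matrix(rnorm(40), nrow = 10)

# Classical MDS (PCoA) on Euclidean distances, keeping the eigenvalues
toy_mds <- cmdscale(dist(toy_points), k = 2, eig = TRUE)

# Percent of variance explained by each axis
toy_percent <- 100 * toy_mds$eig / sum(toy_mds$eig)
round(toy_percent[1:4], 1)

# Cumulative percent explained by the first two axes
sum(toy_percent[1:2])
```

If the first two axes account for most of the total, a two-dimensional plot is a reasonable summary of the distances.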

```{r Plot weighted sub}
plot_ordination(pslog, out.wuf.log, color = "Sample_Type") +
  labs(col = "Sample Type", title="Weighted Unifrac") +
  coord_fixed(sqrt(evals[2] / evals[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title=element_blank(),title=element_text(face="bold"),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
#saving
ggsave("rumen_weighted.png", device = "png", dpi=320, height = 8, width=8)
```

Now that we take phylogenetic information into account in the distance metric, a lot more variation is explained. The first axis contains more variation than the second and mostly separates liquid (strained and unstrained) and some stomach tube samples from solid and grab samples.

Let's make a figure with weighted UniFrac with and without the fecal samples.

```{r fig.width=8.5}
set.seed(1850)
pslog_or <- transform_sample_counts(ps, function(x) log(1 + x))
out.wuf.log_or <- ordinate(pslog_or, method = "MDS", distance = "wunifrac")
evals_or <- out.wuf.log_or$values$Eigenvalues

fecal_rumen <- plot_ordination(pslog_or, out.wuf.log_or, color = "Sample_Type") +
  coord_fixed(sqrt(evals_or[2] / evals_or[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title = element_blank(), strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
  
set.seed(1850)
rumen <- plot_ordination(pslog, out.wuf.log, color = "Sample_Type") +
  coord_fixed(sqrt(evals[2] / evals[1]))+
  scale_color_manual(values=myColors)+
  theme(legend.title = element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

#saving
prow <- plot_grid(
  fecal_rumen + theme(legend.position="none"),
  rumen + theme(legend.position="none"),
  align = 'v',labels = c("A", "B"),hjust = -1, ncol=1)

legend <- get_legend(
  # lay the legend out in a single row along the bottom
  fecal_rumen +  guides(color = guide_legend(nrow = 1)) +
    theme(legend.position = "bottom"))

# stack the legend underneath the two panels. Give it one-tenth of
# the height of the panel column (via rel_heights).
plot_grid(prow, legend, ncol = 1, rel_heights = c(1, .1))

ggsave("weighted.png", device = "png", dpi=320, height = 8, width=8)
```


```{r Unweighted Eigenvalues sub, fig.width=3, fig.height=3}
out.unwuf.log <- ordinate(pslog, method = "MDS", distance = "unifrac")
eval_per_unwuf <- (out.unwuf.log$values$Eigenvalues/(sum(out.unwuf.log$values$Eigenvalues)))*100
#Plotting eigenvalues to determine how many axis should be shown in graph
barplot(eval_per_unwuf[1:10],names.arg=paste0('Eigenvalue',1:10), ylab="Percent of explained variances", col="blue")
```

The eigenvalues here show the variation is spread across many axes, thus a 3D graph is best. You can find it hosted online [here](https://Jill.github.io/Depeters_RumenSampling_2018/UnWeighted_unifrac.html).

```{r Unweighted plot sub, warning=FALSE}
#UnWeighted unifrac
evals_un <- out.unwuf.log$values$Eigenvalues
#plot_ordination(pslog, out.unwuf.log, color = "Sample_Type", shape="CowID") +
#  labs(col = "Sample Type", title="Unweighted unifrac") +
#  coord_fixed(sqrt(evals_un[2] / evals_un[1]))

#getting data for 3D scatter plot
Meta_Scatter <- data.frame(sample_data(pslog))
Axis_unweighted <- as.data.frame(out.unwuf.log$vectors[,1:4])
Axis_unweighted$Sample_Type <- Meta_Scatter$Sample_Type

plot_ly(Axis_unweighted, x = ~Axis.1,y = ~Axis.2, z = ~Axis.3, color= ~Sample_Type , colors = myColors)
#htmlwidgets::saveWidget(as.widget(plot_ly(Axis_unweighted, x = ~Axis.1,y = ~Axis.2, z = ~Axis.3, color= ~Sample_Type)), "UnWeighted_unifrac.html")
```

As before, less of the variation is explained by each axis with the unweighted versus the weighted UniFrac. Thus, it seems that the differences between sample types are due more to differences in abundance than to differences in which taxa are present. We also see the two clusters of stomach tube samples appearing again. This strengthens the hypothesis that stomach tube samples are more variable than grab samples and that there are minor taxa that explain differences between stomach tube samples. Additionally, it looks like the two clusters of stomach tube samples might be forming due to individual cow differences (not a breed difference). Liquid samples remain different from grab and stomach tube samples.
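
The contrast between abundance-weighted and presence/absence metrics can be illustrated with a toy example. Note the assumption: the sketch uses Jaccard and Bray-Curtis as non-phylogenetic stand-ins for unweighted and weighted UniFrac, and both samples and helper functions are made up.

```r
# Two made-up samples with the same taxa present but different abundances.
sample_a <- c(taxon_1 = 90, taxon_2 = 5,  taxon_3 = 5)
sample_b <- c(taxon_1 = 10, taxon_2 = 45, taxon_3 = 45)

# Presence/absence (Jaccard) dissimilarity: 1 - shared taxa / all observed taxa.
jaccard <- function(x, y) {
  px <- x > 0
  py <- y > 0
  1 - sum(px & py) / sum(px | py)
}

# Abundance-weighted (Bray-Curtis) dissimilarity.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

jaccard(sample_a, sample_b)  # 0: identical membership
bray(sample_a, sample_b)     # 0.8: very different abundances
```

When samples separate clearly under the weighted metric but not the unweighted one, as in this toy case, the difference lies mostly in how abundant the shared taxa are.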

```{r fig.width=3, fig.height=3}
ps_nonum_sub <- ps_sub
sample_names(ps_nonum_sub) <- paste0("sample_", sample_names(ps_nonum_sub))
#Divnet/DPCoA don't like numbers for samples
#ps_nonum <- tax_glom(ps_nonum, "Genus")
pslog <- transform_sample_counts(ps_nonum_sub, function(x) log(1 + x))
set.seed(1)
#out.DP.log_sub <- ordinate(pslog, method = "DPCoA") #DPCoA uses the patristic (tree) distance between taxa, not Bray-Curtis
#saveRDS(out.DP.log_sub, "out.DP.log_GSvsRumen_sub.RDS")
out.DP.log_sub <- readRDS("out.DP.log_GSvsRumen_sub.RDS")
plot_ordination(pslog, out.DP.log_sub, type="scree")
```

The eigenvalues here show two axes are sufficient to capture most of the total variation.

```{r eval=FALSE, fig.width=13, fig.height=9}
evals_DP <- out.DP.log_sub$eig #eigenvalues from this DPCoA, used to set the aspect ratio
fig2a <-plot_ordination(pslog, out.DP.log_sub, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  scale_color_manual(values=myColors_DPCoA)+
      theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

fig2b <-plot_ordination(pslog, out.DP.log_sub, type = "Species", color = "Phylum") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  #geom_text_repel(aes(label=Species))+
      theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))

plot_grid(fig2a, fig2b, labels = "AUTO", align = "v", ncol = 1)
ggsave("rumen_dpcoa.png",plot_grid(fig2a, fig2b, labels = "AUTO", align = "v", ncol = 1), device = "png", dpi=320, height = 10, width=10)
```

We see again that the first axis separates liquid strained and unstrained samples from other rumen samples. This plot suggests there are more Bacteroidetes and Kiritimatiellaeota in the liquid samples, while other rumen samples have more Firmicutes. This can also be seen in the first DPCoA we made, where we said that liquid samples have more Bacteroidetes and less Firmicutes than other rumen sample types. Thus, we will probably only need to have one DPCoA graph in the paper.

We should see what taxa are differentially more or less abundant in grab sample vs other rumen sample types. 

```{r Running corncob rumen all, eval=FALSE, warning=FALSE}
#we do not include the response term because we are testing multiple taxa.

#We specify the covariates of our model using formula and phi.formula 
#We also specify which covariates we want to test for by removing them in the formula_null and phi.formula_null arguments.

# The difference between the formulas and the null version of the formulas
# will be the variables that are tested. In this case, we will be testing the coefficients of Sample Type for
# both the expected relative abundance and the overdispersion.

#changing the order of factor levels
sample_data(ps_sub)$Sample_Type <- factor(sample_data(ps_sub)$Sample_Type, levels = c("Grab Sample","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
#
#collapse by genus
ps_gen <- tax_glom(ps_sub, "Genus")
#collapse by family
ps_fam<- tax_glom(ps_sub, "Family")
#testing differential abundance of genera with p<0.05
set.seed(1)
fullAnalysis_072319 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_gen, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsAllRumen.rds")
#testing differential abundance of genera with p<0.01
set.seed(1)
fullAnalysis_072319_p01 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_gen, fdr_cutoff = 0.01)
saveRDS(fullAnalysis_072319_p01, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsAllRumen_p01.rds")
#
set.seed(1)
fullAnalysis_072319 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_sub, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_GSvsAllRumen.rds")
#
set.seed(1)
fullAnalysis_063020 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_fam, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_063020, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_063020_Fam_GSvsAllRumen.rds")
```
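
The idea of testing a covariate by dropping it from the null formula can be sketched with a plain binomial GLM. This is a stand-in only: corncob fits a beta-binomial model and the chunk above uses a bootstrapped Wald test, whereas this sketch uses base R `glm()` with a likelihood-ratio test on made-up data.

```r
# Made-up data: reads of one taxon (W) out of total reads (M) per sample.
toy <- data.frame(
  Sample_Type = factor(rep(c("Grab Sample", "Stomach Tube"), each = 4)),
  CowID       = factor(rep(c("cow1", "cow2"), times = 4)),
  W           = c(50, 60, 55, 45, 200, 210, 190, 220),
  M           = rep(1000, 8)
)

# Full model includes Sample_Type; the null model drops it.
full <- glm(cbind(W, M - W) ~ Sample_Type + CowID, family = binomial, data = toy)
null <- glm(cbind(W, M - W) ~ CowID,               family = binomial, data = toy)

# Likelihood-ratio comparison of the nested models: the only term that
# differs between them (Sample_Type) is the one being tested.
lrt <- anova(null, full, test = "Chisq")
lrt
```

Because `CowID` appears in both models, it is adjusted for but not tested, which mirrors how `CowID` and `Day` are kept in `formula_null` above.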

```{r}
#collapse by genus
ps_gen <- tax_glom(ps_sub, "Genus")
ps_gen
#call in data
fullAnalysis_072319_GSvsAllRumen <- readRDS("fullAnalysis_072319_GSvsAllRumen.rds")
fullAnalysis_072319_Gen_GSvsAllRumen <- readRDS("fullAnalysis_072319_Gen_GSvsAllRumen.rds")
fullAnalysis_063020_Fam_GSvsAllRumen <- readRDS("fullAnalysis_063020_Fam_GSvsAllRumen.rds")
fullAnalysis_072319_Gen_GSvsAllRumen_p01 <- readRDS("fullAnalysis_072319_Gen_GSvsAllRumen_p01.rds")
```

There are 278 genera in the rumen sample types.

`r length(which(is.na(fullAnalysis_072319_Gen_GSvsAllRumen$p)) %>% names)` genera could not be fit with the model, but `r length(which(!is.na(fullAnalysis_072319_Gen_GSvsAllRumen$p)) %>% names)` were fit to the model and `r length(fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa)` genera were significantly differentially abundant at an FDR cutoff of 0.05. `r length(fullAnalysis_072319_Gen_GSvsAllRumen_p01$significant_taxa)` genera were significantly differentially abundant at an FDR cutoff of 0.01. Lastly, `r length(fullAnalysis_072319_GSvsAllRumen$significant_taxa)` ASVs were significantly differentially abundant at an FDR cutoff of 0.05.
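
How the counts above change with the cutoff can be sketched with base R's `p.adjust()`. The p-values and taxon names are hypothetical, and the Benjamini-Hochberg method is assumed here to be the adjustment behind `fdr_cutoff`; check the corncob documentation before relying on that.

```r
# Hypothetical raw p-values; NA marks a taxon whose model could not be fit.
p_raw <- c(ASV_a = 0.001, ASV_b = 0.008, ASV_c = 0.02,
           ASV_d = 0.045, ASV_e = 0.3,   ASV_f = NA)

# Benjamini-Hochberg adjustment (NA values stay NA).
p_fdr <- p.adjust(p_raw, method = "BH")

not_fit <- names(which(is.na(p_raw)))    # taxa without a fitted model
sig_05  <- names(which(p_fdr < 0.05))    # significant at the 0.05 cutoff
sig_01  <- names(which(p_fdr < 0.01))    # significant at the stricter 0.01 cutoff

length(not_fit); length(sig_05); length(sig_01)
```

A taxon with a raw p-value just under 0.05, such as `ASV_d` here, can fall out of the significant set once the adjustment is applied.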

First, we will look into the genera that could not be fit to the model and see if we can determine why.

```{r}
#extracting out unique names
goodTaxa <- which(is.na(fullAnalysis_072319_Gen_GSvsAllRumen$p)) %>% names #names of the genera that could not be fit
ps_check <- pop_taxa_keep(ps_gen, goodTaxa)
otu_table(ps_check)
```

Yikes, `r length(which(is.na(fullAnalysis_072319_Gen_GSvsAllRumen$p)) %>% names)` taxa could not be fit with the model, but this looks to be due to circumstances where there are very few reads for a particular taxon (thus, a model can't be fit) or instances where there are only reads in one sample type.
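
The two failure patterns described above can be flagged programmatically. The sketch below works on a made-up count matrix in base R; the read threshold of 10 is an arbitrary assumption, not a value used in this analysis.

```r
# Made-up counts: rows are taxa, columns are samples.
counts <- matrix(
  c(120, 98, 110, 130,   # abundant everywhere
      0,  1,   0,   0,   # very few reads overall
     40, 35,   0,   0),  # reads in only one sample type
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"), paste0("s", 1:4))
)
sample_type <- c("Grab Sample", "Grab Sample", "Stomach Tube", "Stomach Tube")

total_reads <- rowSums(counts)
# Number of sample types in which each taxon has at least one read.
types_present <- apply(counts, 1, function(x) length(unique(sample_type[x > 0])))

flagged <- data.frame(
  total_reads   = total_reads,
  types_present = types_present,
  low_reads     = total_reads < 10,     # threshold of 10 is arbitrary
  single_type   = types_present == 1,
  row.names     = rownames(counts)
)
flagged
```

Here `taxon_B` is flagged for low reads and `taxon_C` for appearing in a single sample type, while `taxon_A` passes both checks.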

Now, we will return to the corncob output. We can see a list of differentially-abundant taxa using: 

```{r}
unique(otu_to_taxonomy(OTU = fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, data = ps_gen, level=c("Phylum", "Family","Genus", "Species")))
```

There are `r length(fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa)` genera differentially abundant. We will look at the unique families they represent. 

```{r}
table(otu_to_taxonomy(OTU = fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, data = ps_gen, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
#making table of phyla ASVs taxa 
kable(caption="# of significant genera in each family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

From this we can see that *Lachnospiraceae*, *Ruminococcaceae*, and *Prevotellaceae* were the most common families to be differentially abundant in grab samples vs other rumen sample types. We will take a closer look at all differentially abundant genera.

```{r}
#getting taxonomy
DA_taxa <- otu_to_taxonomy(OTU=fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, data=ps_gen)
#Getting FDR-adjusted p-values of the differentially abundant genera
ASVs <- c(row.names(as.data.frame(DA_taxa)))
df <- as.data.frame(fullAnalysis_072319_Gen_GSvsAllRumen$p_fdr)
df$ASV <- row.names(df)
df <- df[row.names(df) %in% ASVs, ]
df <- merge(df,as.data.frame(DA_taxa),by=0, all=TRUE) #by=0 means merge by row names
#df <- subset(df, select = c(DA_taxa,fullAnalysis$p_fdr,Row.names))
#df$ASV <- df$Row.names
df$Row.names <- NULL
colnames(df) <- c("p_value", "ASV", "Taxa")
#df$p_value <- format(df$p_value, digits = 3)
options(scipen = 999) #take out of scientific notation
df <- df[order(df$p_value),] #order column p_value
df$p_value <- format(df$p_value, digits = 3, scientific = TRUE) #put back into scientific notation
kable(df, caption="Differentially Abundant Taxa") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

Taking a closer look at the breakdown of *Fibrobacteraceae*.

```{r}
tax_table(ps_fam)[grep("Fibrobacteraceae",tax_table(ps_fam)[,"Family"]),]
get_model_CC(fullAnalysis_063020_Fam_GSvsAllRumen, "ASV_68")
get_model_CC(fullAnalysis_072319_Gen_GSvsAllRumen, "ASV_68")

tax_table(ps_gen)[grep("Ruminococcus_1",tax_table(ps_gen)[,"Genus"]),]
get_model_CC(fullAnalysis_072319_Gen_GSvsAllRumen, "ASV_103")
```

Pulling out the model for *Fibrobacteraceae*

```{r}
tax_table(ps_gen)[grep("Fibrobact",tax_table(ps_gen)[,"Genus"]),]
sigtaxa <- otu_to_taxonomy(fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, ps_gen)
sigmodels <- fullAnalysis_072319_Gen_GSvsAllRumen$significant_models
names(sigmodels) <- sigtaxa
print("Model for Fibrobacter")
sigmodels$Bacteria_Fibrobacteres_Fibrobacteria_Fibrobacterales_Fibrobacteraceae_Fibrobacter
```

Graphing out all the families that are significantly different between grab samples and other rumen sample types.

```{r fig.height=10, fig.width=13}
plot.differentialTest_custom(fullAnalysis_063020_Fam_GSvsAllRumen,level=c("Phylum","Family"))+
  theme(strip.text.x=element_text(size=11,face = "bold", color="black"),axis.text=element_text(face = "bold", size = 11, color = "black"), axis.title.y = element_text(color="black", size=11, face="bold"))
```

```{r fig.height=20, fig.width=13}
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsAllRumen,level=c("Family","Genus"))+
  theme(strip.text.x=element_text(size=11,face = "bold", color="black"),axis.text=element_text(face = "bold", size = 11, color = "black"), axis.title.y = element_text(color="black", size=11, face="bold"))
```

Pulling out the model for genera in *Prevotellaceae* here. 

```{r}
tax_table(ps_gen)[grep("Prevotella",tax_table(ps_gen)[,"Genus"]),]
sigtaxa <- otu_to_taxonomy(fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, ps_gen)
sigmodels <- fullAnalysis_072319_Gen_GSvsAllRumen$significant_models
names(sigmodels) <- sigtaxa
print("Model for Prevotella_1")
sigmodels$Bacteria_Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Prevotella_1

print("Model for Prevotellaceae_UCG-003")
sigmodels$`Bacteria_Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Prevotellaceae_UCG-003`

print("Model for Prevotellaceae_UCG-004")
sigmodels$`Bacteria_Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Prevotellaceae_UCG-004`

print("Model for Prevotellaceae_UCG-001")
sigmodels$`Bacteria_Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Prevotellaceae_UCG-001`

print("Model for Prevotellaceae_NK3B31_group")
sigmodels$Bacteria_Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Prevotellaceae_NK3B31_group
```


Just looking at methanogens.

```{r fig.height=2, fig.width=13}
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsAllRumen,level=c("Phylum","Family","Genus"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=12,face = "bold", color="black"),axis.text=element_text(face = "bold", size = 12, color = "black"), axis.title.y = element_text(color="black", size=12, face="bold"))
```

Pulling out the model for the ASVs here. 

```{r}
tax_table(ps_gen)[grep("Methanosphaera",tax_table(ps_gen)[,"Genus"]),]
sigtaxa <- otu_to_taxonomy(fullAnalysis_072319_Gen_GSvsAllRumen$significant_taxa, ps_gen)
sigmodels <- fullAnalysis_072319_Gen_GSvsAllRumen$significant_models
names(sigmodels) <- sigtaxa
sigmodels$Archaea_Euryarchaeota_Methanobacteria_Methanobacteriales_Methanobacteriaceae_Methanosphaera
```

Checking to see what samples Methanimicrococcus is found in.

```{r}
tax_table(ps)[grep("Methanimicrococcus",tax_table(ps)[,"Genus"]),]
#Making phyloseq object with Methanimicrococcus
ps_Meth <- subset_taxa(ps, Genus == c("Methanimicrococcus"))
ps_Meth <- prune_samples(sample_sums(ps_Meth) > 0, ps_Meth)
ps_Meth@sam_data$Sample_Type
#which sample is it found in
otu_table(ps_Meth)
```

#Transfaunating what communities?

Based on the analysis above, it would seem that transfaunation using a stomach tube sample would come close to getting the full community into the sick cow. However, if you strain the sample you will bias the communities that your transfaunation delivers.

##Communities found in stomach tube samples

For this I think we can just reference the phyla graphs in the "Abundance of Phyla" section. We will also take a more specific look at the taxa that the stomach tube samples have in common.

```{r include=FALSE, eval=TRUE}
GS <- subset_samples(ps, Sample_Type == c("Grab Sample"))
GS <- prune_taxa(taxa_sums(GS) > 0, GS)
ST <- subset_samples(ps, Sample_Type == c("Stomach Tube"))
ST <- prune_taxa(taxa_sums(ST) > 0, ST)
#making a list to determine core of each subset
Core_ST <- filter_taxa(ST, function(x) sum(x >= 1) > (0.99*length(x)), TRUE)
Core_ST
```

There are `r ntaxa(ST)` ASVs in stomach tube samples, but only `r ntaxa(Core_ST)` taxa are present in all stomach tube samples. Due to the variability of stomach tube samples, I suspect that with more samples there would be fewer taxa in common. Stomach tube samples are composed of `r length(get_taxa_unique(ST, "Phylum"))` phyla, `r length(get_taxa_unique(ST, "Order"))` orders, `r length(get_taxa_unique(ST, "Family"))` families and `r length(get_taxa_unique(ST, "Genus"))` genera.
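
The prevalence rule used to define the core can be sketched on a plain matrix. The counts and names below are made up; the threshold mirrors the `filter_taxa()` call above (at least one read in more than 99% of samples).

```r
# Made-up counts: rows are taxa, columns are stomach tube samples.
counts <- matrix(
  c(5, 3, 9, 1,
    0, 4, 2, 7,
    6, 6, 6, 6,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV_", 1:4), paste0("ST_", 1:4))
)

# A taxon is "core" if it has at least one read in more than 99% of samples.
is_core   <- apply(counts, 1, function(x) sum(x >= 1) > 0.99 * length(x))
core_taxa <- names(is_core)[is_core]
core_taxa
```

With fewer than 100 samples, the 99% threshold is equivalent to requiring presence in every sample, so a single sample lacking a taxon (as for `ASV_2`) removes it from the core.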

##Grab vs stomach tube samples

```{r eval=FALSE, fig.width=5, fig.height=4}
GS_ST_nonum <- GS_ST
sample_names(GS_ST_nonum) <- paste0("sample_", sample_names(GS_ST_nonum))#Divnet/DPCoA don't like numbers for samples
pslog <- transform_sample_counts(GS_ST_nonum, function(x) log(1 + x))
set.seed(1)
#out.DP.log_GS_ST <- ordinate(pslog, method = "DPCoA") #DPCoA uses the patristic (tree) distance between taxa, not Bray-Curtis
#saveRDS(out.DP.log_GS_ST, "out.DP.log_GSvsST.RDS")
out.DP.log_GS_ST <- readRDS("out.DP.log_GSvsST.RDS")
plot_ordination(pslog, out.DP.log_GS_ST, type="scree")
```

```{r eval=FALSE, fig.width=13, fig.height=10}
set.seed(12)
evals_DP <- out.DP.log_GS_ST$eig
fig1A <- plot_ordination(pslog, out.DP.log_GS_ST, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA") +
  scale_color_manual(values=myColors_DPCoA)+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1])) +
  theme(legend.text = element_text(face="bold", size = 12), legend.title = element_text(face="bold", size = 12), axis.title = element_text(face="bold", size = 12), axis.text = element_text(face="bold", size = 12, color="black"))
plot_ordination(pslog, out.DP.log_GS_ST, color = "Sample_Type") +
  labs(col = "Sample Type", title="DPCoA") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))
fig1B <- plot_ordination(pslog, out.DP.log_GS_ST, type = "Species", color = "Phylum") + #not readable in current form
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  theme(legend.text = element_text(face="bold", size = 12), legend.title = element_text(face="bold", size = 12), axis.title = element_text(face="bold", size = 12), axis.text = element_text(face="bold", size = 12, color="black"))
  #geom_text_repel(aes(label=Species, show.legend = FALSE)
plot_grid(fig1A, fig1B, labels = "AUTO", align = "hv", ncol = 1)
ggsave("GSvsST_DPCoA.png",plot_grid(fig1A, fig1B, labels = "AUTO", align = "hv", ncol = 1), device = "png", dpi=320)
```

```{r fig.width=13, fig.height=10}
pslog_new <- pslog #make copy of phyloseq object
sample_names(pslog_new) <- paste0("sample_", sample_names(pslog_new)) #Divnet/DPCoA don't like numbers for samples
new_taxa <- as.data.frame(pslog_new@tax_table@.Data) #make new data frame of species
new_taxa$Select_Family <- new_taxa[,"Family"] #add new column
pslog_new@tax_table@.Data[,"Family"] <- ifelse(grepl("Lachnospiraceae", new_taxa$Select_Family), "Lachnospiraceae",
         ifelse(grepl("Ruminococcaceae", new_taxa$Select_Family), "Ruminococcaceae",
        ifelse(grepl("Fibrobacteraceae", new_taxa$Select_Family), "Fibrobacteraceae",
         ifelse(grepl("Spirochaetaceae", new_taxa$Select_Family), "Spirochaetaceae", "All Other Families"))))

fig2b <-plot_ordination(pslog_new, out.DP.log_GS_ST, type = "Species", color = "Phylum", shape="Family") +
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  #geom_text_repel(aes(label=Species))+
      theme(legend.title=element_blank(),strip.text = element_text(face="bold",size=11, color="black"), axis.text= element_text(color="black",face = "bold",size=11), axis.title=element_text(face = "bold"), panel.grid.major.x = element_blank(),panel.grid.minor.y=element_blank(),legend.text = element_text(face="bold",size=12))
fig2b
```

```{r}
AP <- plot_ordination(pslog, out.DP.log_GS_ST, type = "Species", color = "Phylum")+
  geom_point(aes(Species=Species,Genus=Genus,Family=Family,Order=Order,Class=Class))+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))
ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()
htmlwidgets::saveWidget(as_widget(ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()), "DPCoA_GSvsST.html", selfcontained =FALSE)
```


```{r}
print("Genera in grab samples not in stomach tube")
setdiff(get_taxa_unique(GS, "Genus"), get_taxa_unique(ST, "Genus"))
print("Genera in stomach tube samples not in grab samples")
setdiff(get_taxa_unique(ST, "Genus"), get_taxa_unique(GS, "Genus"))
```


```{r}
setdiff(get_taxa_unique(GS, "Genus"), get_taxa_unique(ST, "Genus"))
```

These are the `r length(setdiff(get_taxa_unique(GS, "Genus"), get_taxa_unique(ST, "Genus")))` genera found in the grab sample, but not the stomach tube.

```{r}
intersect(get_taxa_unique(GS, "Family"), get_taxa_unique(ST, "Family"))
intersect(get_taxa_unique(GS, "Genus"), get_taxa_unique(ST, "Genus"))
```

These are the `r length(intersect(get_taxa_unique(GS, "Genus"), get_taxa_unique(ST, "Genus")))` genera that are found in both the grab sample and stomach tube.

`r length(setdiff(rownames(otu_table(GS)), rownames(otu_table(ST))))` ASVs are found in the grab samples but not in the stomach tube samples, and `r length(setdiff(rownames(otu_table(ST)), rownames(otu_table(GS))))` are found in the stomach tube samples but not in the grab samples. There are also `r length(intersect(rownames(otu_table(GS)), rownames(otu_table(ST))))` ASVs in common between grab samples and stomach tube samples. Let's check at a higher taxonomic rank next.
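
The three counts reported above come from simple set operations on taxon names. A minimal sketch with hypothetical ASV names:

```r
# Hypothetical ASV names observed in each sample type.
gs_asvs <- c("ASV_1", "ASV_2", "ASV_3", "ASV_5")   # grab samples
st_asvs <- c("ASV_2", "ASV_3", "ASV_4")            # stomach tube samples

only_gs <- setdiff(gs_asvs, st_asvs)     # in grab samples only
only_st <- setdiff(st_asvs, gs_asvs)     # in stomach tube samples only
shared  <- intersect(gs_asvs, st_asvs)   # in both sample types

# Sanity check: every grab-sample ASV is either unique to it or shared.
length(only_gs) + length(shared) == length(gs_asvs)
```

Note that `setdiff()` is not symmetric, which is why both directions are computed separately.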

Let's compare the stomach tube samples to the "gold standard" of grab sample. 

```{r}
GS_ST <- ps %>% subset_samples(Sample_Type %in% c("Grab Sample","Stomach Tube"))
GS_ST <- prune_taxa(taxa_sums(GS_ST) > 0, GS_ST)
```

```{r eval=FALSE}
#this was run on a HPC
ps_sub <- subset_samples(ps, Sample_Type == c("Grab Sample") | Sample_Type == c("Stomach Tube"))
ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)
ps_gen <- tax_glom(ps_sub, "Genus")
ps_fam <- tax_glom(ps_sub, "Family")

set.seed(1)
fullAnalysis_072319_Fam_GSvsST <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_fam, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319_Fam_GSvsST, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Fam_GSvsST.rds")

set.seed(1)
fullAnalysis_072319_Gen_GSvsST <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day,
                                 phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_gen, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319_Gen_GSvsST, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsST.rds")

set.seed(1850)
fullAnalysis_071119 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day,
                                 phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_sub, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_071119, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_071119_GSvsST.rds")
```

```{r}
#calling data in
fullAnalysis_072319_GSvsST <- readRDS("fullAnalysis_072319_GSvsST.rds")
fullAnalysis_072319_Gen_GSvsST <- readRDS("fullAnalysis_072319_Gen_GSvsST.rds")
fullAnalysis_072319_Fam_GSvsST <- readRDS("fullAnalysis_072319_Fam_GSvsST.rds")
```

There are `r length(fullAnalysis_072319_GSvsST$significant_taxa)` ASVs, `r length(fullAnalysis_072319_Gen_GSvsST$significant_taxa)` genera and `r length(fullAnalysis_072319_Fam_GSvsST$significant_taxa)` families that are significantly differentially abundant between stomach tube and grab samples.

```{r}
#plotting
plot.differentialTest_custom(fullAnalysis_072319_Fam_GSvsST, level=c("Family"))+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```


```{r}
unique(otu_to_taxonomy(OTU = fullAnalysis_072319_Gen_GSvsST$significant_taxa, data = GS_ST, level=c("Phylum", "Family","Genus", "Species"))) %>%
kable(caption="Unique Genera that are significant differentially abundant") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the unique genera that are significantly differentially abundant between grab and stomach tube samples.

```{r}
table(otu_to_taxonomy(OTU = fullAnalysis_072319_GSvsST$significant_taxa, data = GS_ST, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
#making table of the number of significant ASVs per family
kable(caption="# of significant ASVs in each family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

From this we can see that *Lachnospiraceae*, *Ruminococcaceae*, *Prevotellaceae* and *Erysipelotrichaceae* were the most common families to have significantly differentially abundant ASVs in grab vs stomach tube samples. We will take a closer look at all differentially abundant ASVs.

```{r}
df_GSvsST <- get_data_CC(fullAnalysis_072319_GSvsST) %>% filter(Family == "Ruminococcaceae")
table(sign(df_GSvsST$x)) #get numbers of positive and negative coefficents
```

This is the number of ASVs in the family *Ruminococcaceae* that are positively and negatively associated with stomach tube samples.
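
Counting directions of effect reduces to tabulating the sign of each coefficient. The sketch below uses hypothetical coefficients; it assumes the `x` column returned by the custom `get_data_CC()` helper holds the Sample_Type coefficient and that grab samples are the reference level, so a positive value would mean higher abundance in stomach tube samples.

```r
# Hypothetical Sample_Type coefficients for significant ASVs in one family.
coefs <- c(ASV_10 = 1.2, ASV_11 = -0.4, ASV_12 = 0.7,
           ASV_13 = -2.1, ASV_14 = 0.3)

# -1 counts negative coefficients, 1 counts positive coefficients.
direction <- table(sign(coefs))
direction
```

In this toy case two ASVs have negative coefficients and three have positive ones.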

```{r}
#which genera were lower?
df_gen_GSvsST <- get_data_CC(fullAnalysis_072319_Gen_GSvsST) %>% filter(Family == "Lachnospiraceae")
table(sign(df_gen_GSvsST$x))
df_gen_GSvsST[order(df_gen_GSvsST$x),c(1,8:9)] %>%
#making table of phyla
kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```


```{r fig.height=9 ,fig.width=7}
#plotting
plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsST, level=c("Family","Genus","Species"))+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.height=3 ,fig.width=7}
#plotting
plot.differentialTest_custom(fullAnalysis_072319_GSvsST, level=c("Phylum","Family","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))

plot.differentialTest_custom(fullAnalysis_072319_Gen_GSvsST, level=c("Phylum","Family","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Going to check if there are any Euryarchaeota that are only found in one sample type.

```{r}
tax_table(ps)[grep("Methan",tax_table(ps)[,"Genus"]),]
#Making phyloseq object with Methanomethylophilaceae
ps_fam_M <- tax_glom(ps, "Family")
ps_Meth <- subset_taxa(ps_fam_M, Family == c("Methanomethylophilaceae"))
ps_Meth <- prune_samples(sample_sums(ps_Meth) > 0, ps_Meth)
ps_Meth@sam_data$Sample_Type
#which sample is it found in
otu_table(ps_Meth)
```



These are the genera that are significantly differentially abundant in stomach tube vs grab samples.

```{r}
get_taxa_unique(subset_taxa(ST, Family=="Fibrobacteraceae"), "Genus")
```

The only assigned genus in the family *Fibrobacteraceae*, *Fibrobacter*, was significantly lower in abundance in stomach tube samples compared to grab samples.

```{r}
get_data_CC(fullAnalysis_072319_Gen_GSvsST) %>% select(1,7:9) %>% arrange(x) %>%
#making table
kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

#Grab vs liquid strained samples

Let's compare the liquid strained samples to the "gold standard" of grab sample.

```{r}
LS <- subset_samples(ps, Sample_Type == c("Liquid Strained"))
LS <- prune_taxa(taxa_sums(LS) > 0, LS)
#checking for ASVs that differ between the two sample types
#checking for ASVs in common
```

`r length(setdiff(rownames(otu_table(GS)), rownames(otu_table(LS))))` ASVs are found in the grab samples but not in the liquid strained samples. There are also `r length(intersect(rownames(otu_table(GS)),rownames(otu_table(LS))))` ASVs in common between grab and liquid strained samples. Thus, stomach tube samples tend to be more like a grab sample than a liquid strained sample. Let's check at a higher taxonomic rank next.

```{r}
setdiff(get_taxa_unique(GS, "Genus"), get_taxa_unique(LS, "Genus"))
```

These genera are found in the grab sample, but not the liquid strained samples.

```{r}
intersect(get_taxa_unique(GS, "Genus"), get_taxa_unique(LS, "Genus"))
```

These are the `r length(intersect(get_taxa_unique(GS, "Genus"), get_taxa_unique(LS, "Genus")))` genera that are found in both the grab sample and liquid strained samples.

Since we saw on the DPCoA that liquid strained samples were distinguished from other rumen samples by Kiritimatiellaeota, we will investigate that further. 

```{r}
ps_K <- subset_taxa(ps, Phylum == "Kiritimatiellaeota")
ps_K <- prune_samples(sample_sums(ps_K) > 0, ps_K)
ps_K

unique(otu_to_taxonomy(row.names(otu_table(ps_K)), ps_K, level=c("Class", "Order", "Family")))
```

There are 180 ASVs assigned to the phylum Kiritimatiellaeota, and these ASVs are only assigned down to the order level. Because of this, you won't find these taxa in the corncob analysis that was run on genera. 
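
A minimal base R sketch of why ASVs without a genus assignment drop out of a genus-level analysis. It mimics, on a made-up table, what `tax_glom(..., "Genus")` does when its `NArm` argument is left at the default of `TRUE`; the data frame `tax` and its counts are hypothetical.

```r
# Hypothetical taxonomy and counts: the Kiritimatiellaeota ASVs have no genus assignment
tax <- data.frame(
  Phylum = c("Kiritimatiellaeota", "Kiritimatiellaeota", "Bacteroidetes", "Bacteroidetes"),
  Genus  = c(NA, NA, "Prevotella", "Prevotella"),
  count  = c(10, 5, 20, 7),
  stringsAsFactors = FALSE
)

# Keep only ASVs with a genus assignment, then sum counts within each genus
has_genus <- !is.na(tax$Genus)
genus_counts <- tapply(tax$count[has_genus], tax$Genus[has_genus], sum)
genus_counts

# Phyla that survive genus-level agglomeration
kept_phyla <- unique(tax$Phylum[has_genus])
kept_phyla
```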

```{r}
ps_K@sam_data$Sample_Type
```

The phylum Kiritimatiellaeota is found in all sample types.

```{r}
#run on HPC
#Comparing grab to liquid strained samples at Genus level
GS_LS <- subset_samples(ps, Sample_Type == c("Grab Sample") | Sample_Type == c("Liquid Strained"))
GS_LS <- prune_taxa(taxa_sums(GS_LS) > 0, GS_LS)
GS_LS_gen <- tax_glom(GS_LS, "Genus")
```

```{r eval=FALSE}
set.seed(1)
fullAnalysis_072319_Gen_GSvsLS <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = GS_LS_gen, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072319_Gen_GSvsLS, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072319_Gen_GSvsLS.rds")
```
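
The `fdr_cutoff = 0.05` argument controls which taxa are reported as significant after multiple-testing correction. Below is a small base R illustration of that step, assuming a Benjamini-Hochberg adjustment (that is, assuming corncob's `fdr` argument is left at its default); the p-values are invented.

```r
# Invented raw p-values, one per genus
p_raw <- c(GenusA = 0.001, GenusB = 0.012, GenusC = 0.040, GenusD = 0.200, GenusE = 0.700)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust)
p_fdr <- p.adjust(p_raw, method = "fdr")

# Taxa retained at a 5% false discovery rate
significant <- names(p_fdr)[p_fdr < 0.05]
significant
```

In this toy case GenusC has a raw p-value below 0.05 but is no longer significant after adjustment, which is the behaviour the cutoff is meant to enforce.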


```{r fig.width=13, fig.height=5}
fullAnalysis_all <- readRDS("fullAnalysis_072319_all_GS1st.rds")
#plotting
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Class","Order"), taxa_filter="Kiritimatiellaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

Of the 180 ASVs assigned to the phylum Kiritimatiellaeota, 17 were significantly differentially abundant. 

```{r}
#call in data
fullAnalysis_gen_GSvsLS <- readRDS("fullAnalysis_072319_Gen_GSvsLS.rds")
table(otu_to_taxonomy(OTU = fullAnalysis_gen_GSvsLS$significant_taxa, data = GS_LS_gen, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
#making table of phyla ASVs taxa 
kable(caption="# of significant genera in each family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

Here we see again that *Prevotellaceae*, *Lachnospiraceae* and *Ruminococcaceae* have genera that are significantly differentially abundant. 

```{r}
#changing the order of factor levels
sample_data(ps)$Sample_Type <- factor(sample_data(ps)$Sample_Type, levels = c("Grab Sample","Feces","Stomach Tube","Liquid Strained","Liquid Unstrained","Solid"))
#call in data
fullAnalysis_all<- readRDS("fullAnalysis_072319_all_GS1st.rds")
#getting number of significant taxa assigned to each phylum
df_new <- as.data.frame(fullAnalysis_all$significant_taxa)
colnames(df_new) <- c("taxa")
ltax <- as.list(fullAnalysis_all$significant_taxa)
df_new$Phylum <- unlist(lapply(ltax, function(ltax) otu_to_taxonomy(ltax, fullAnalysis_all$data, level = "Phylum")))
keep <- as.data.frame(table(df_new$Phylum))
keep <- merge(keep,as.data.frame(table(tax_table(ps)[,"Phylum"])),by="Var1",all=TRUE)
keep[is.na(keep)] <- 0  #change NAs to 0
keep$percent <- (keep$Freq.x/keep$Freq.y)*100
colnames(keep) <- c("Phylum", "#Significant ASVs", "Total ASVs", "Percent Significant ASVs")
kable(keep, caption="Phyla with Significant ASVs") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```
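
The merge above keeps phyla that have no significant ASVs by using `all = TRUE` and then replacing the resulting `NA` with 0. A toy version with invented phylum labels shows the pattern (all object and column names here are hypothetical):

```r
# Invented labels: one entry per significant ASV, and one entry per ASV in the full dataset
sig_phyla <- c("Firmicutes", "Firmicutes", "Bacteroidetes")
all_phyla <- c(rep("Firmicutes", 4), rep("Bacteroidetes", 2), rep("Spirochaetes", 2))

# Tabulate each vector and give the columns explicit names
sig <- as.data.frame(table(sig_phyla), stringsAsFactors = FALSE)
names(sig) <- c("Phylum", "n_sig")
tot <- as.data.frame(table(all_phyla), stringsAsFactors = FALSE)
names(tot) <- c("Phylum", "n_total")

# Full outer join so phyla without significant ASVs are kept
tally <- merge(sig, tot, by = "Phylum", all = TRUE)
tally$n_sig[is.na(tally$n_sig)] <- 0  # phyla absent from `sig` have zero significant ASVs
tally$percent <- 100 * tally$n_sig / tally$n_total
tally
```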


We also saw on the DPCoA that a group of Bacteroidetes (*Prevotellaceae*) was associated with the liquid strained samples. In contrast, *Lachnospiraceae*, a family in the phylum Firmicutes, wasn't associated with liquid strained samples. 

As a reminder, we can run differential abundance testing on genera and plot all of the results for the phylum Bacteroidetes.

```{r fig.width=5, fig.height=3}
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Genus","Species"), taxa_filter="Bacteroidetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.width=7, fig.height=2}
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Family","Genus","Species"), taxa_filter="Actinobacteria")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```


We will look further into these families to decipher what genera are causing this difference between grab and liquid samples. 

```{r fig.width=4, fig.height=2}
#plotting genera 
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```


```{r}
df_gen <- get_data_CC(fullAnalysis_gen_GSvsLS) %>% filter(Family == "Prevotellaceae")
table(sign(df_gen$x))
```

In the family *Prevotellaceae* there are 2 genera with significantly lower relative abundance and 4 genera with significantly higher relative abundance in liquid strained compared to grab samples.

```{r}
df_gen[order(df_gen$x),c(1:3,5,9)] %>% filter(variable== "Liquid Strained\nDifferential Abundance")%>%
#making table 
kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the *Prevotellaceae* genera that have either significantly higher (positive x) or lower (negative x) relative abundance.
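
As a small illustration of the `table(sign(...))` tally used above, reading `x` as the estimated coefficient: its sign separates genera that are lower (-1) from those that are higher (1) relative to the reference sample type. The coefficients below are invented.

```r
# Invented differential-abundance coefficients for six genera
x <- c(-1.2, -0.4, 0.3, 0.8, 1.5, 2.1)

# "-1" counts genera with lower relative abundance, "1" counts genera with higher relative abundance
direction <- table(sign(x))
direction
```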

```{r}
df_gen <- get_data_CC(fullAnalysis_gen_GSvsLS) %>% filter(Family == "Ruminococcaceae")
table(sign(df_gen$x))
```

In the family *Ruminococcaceae* there are 7 genera with significantly lower relative abundance and 8 genera with significantly higher relative abundance in liquid strained compared to grab samples.
 
```{r}
df_gen[order(df_gen$x),c(1:3,5,9)] %>% filter(variable== "Liquid Strained\nDifferential Abundance")%>%
#making table 
kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the *Ruminococcaceae* genera that have either significantly higher (positive x) or lower (negative x) relative abundance.

```{r}
df_gen <- get_data_CC(fullAnalysis_gen_GSvsLS) %>% filter(Family == "Lachnospiraceae")
table(sign(df_gen$x))
```

In the family *Lachnospiraceae* there are 22 genera with significantly lower relative abundance and 3 genera with significantly higher relative abundance in liquid strained compared to grab samples.

```{r}
df_gen[order(df_gen$x),c(1:3,5,9)] %>% filter(variable== "Liquid Strained\nDifferential Abundance")%>%
#making table 
kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

These are the *Lachnospiraceae* genera that have either significantly higher (positive x) or lower (negative x) relative abundance.

```{r fig.width=13, fig.height=4}
#plotting
plot.differentialTest_custom(fullAnalysis_all, level=c("Phylum","Genus","Species"), taxa_filter="Euryarchaeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.width=7, fig.height=4}
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Family","Genus","Species"), taxa_filter="Proteobacteria")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Family","Genus","Species"), taxa_filter="Epsilonbacteraeota")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.width=7, fig.height=11}
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Family","Genus","Species"), taxa_filter="Firmicutes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r fig.width=7, fig.height=3}
#plotting
plot.differentialTest_custom(fullAnalysis_gen_GSvsLS, level=c("Phylum","Family","Genus","Species"), taxa_filter="Bacteroidetes")+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

#Stomach Tube vs Liquid Samples

```{r}
LS <- subset_samples(ps, Sample_Type == c("Liquid Strained"))
LS <- prune_taxa(taxa_sums(LS) > 0, LS)
#the comparisons below are made at the genus level
print("Genera in liquid strained samples not in stomach tube")
setdiff(get_taxa_unique(LS, "Genus"), get_taxa_unique(ST, "Genus"))
print("Genera in stomach tube samples not in liquid strained samples")
setdiff(get_taxa_unique(ST, "Genus"), get_taxa_unique(LS, "Genus"))

ULS <- subset_samples(ps, Sample_Type == c("Liquid Unstrained"))
ULS <- prune_taxa(taxa_sums(ULS) > 0, ULS)
print("Genera in liquid unstrained samples not in stomach tube")
setdiff(get_taxa_unique(ULS, "Genus"), get_taxa_unique(ST, "Genus"))
print("Genera in stomach tube samples not in liquid unstrained samples")
setdiff(get_taxa_unique(ST, "Genus"), get_taxa_unique(ULS, "Genus"))
```

Next we look at whether stomach tube samples differ much from the liquid samples.
```{r}
ST_L <- ps %>% subset_samples(Sample_Type %in% c("Liquid Strained","Stomach Tube", "Liquid Unstrained"))
ST_L <- prune_taxa(taxa_sums(ST_L) > 0, ST_L)

ST_L_nonum <- ST_L
sample_names(ST_L_nonum) <- paste0("sample_", sample_names(ST_L_nonum))#Divnet/DPCoA don't like numbers for samples
pslog <- transform_sample_counts(ST_L_nonum, function(x) log(1 + x))
set.seed(1)
#out.DP.log_ST_L <- ordinate(pslog, method = "DPCoA") #default distance is bray
#saveRDS(out.DP.log_ST_L, "out.DP.log_GSvsST.RDS")
out.DP.log_ST_L <- readRDS("out.DP.log_GSvsST.RDS")
plot_ordination(pslog, out.DP.log_ST_L, type="scree")
```

Exploratory analysis of DPCoA. 

```{r fig.width=13, fig.height=10}
set.seed(12)
evals_DP <- out.DP.log_ST_L$eig

fig1A <- plot_ordination(pslog, out.DP.log_ST_L, color = "Sample_Type", type="biplot") +
  labs(col = "Sample Type", title="DPCoA of Bray distance") +
  scale_color_manual(values=myColors_DPCoA)+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1])) +
  theme(legend.text = element_text(face="bold", size = 12), legend.title = element_text(face="bold", size = 12), axis.title = element_text(face="bold", size = 12), axis.text = element_text(face="bold", size = 12, color="black"))

fig1B <- plot_ordination(pslog, out.DP.log_ST_L, type = "Species", color = "Phylum") + #not readable in current form
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))+
  theme(legend.text = element_text(face="bold", size = 12), legend.title = element_text(face="bold", size = 12), axis.title = element_text(face="bold", size = 12), axis.text = element_text(face="bold", size = 12, color="black"))
  #geom_text_repel(aes(label=Species, show.legend = FALSE)
plot_grid(fig1A, fig1B, labels = "AUTO", align = "hv", ncol = 1)
```
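
The `coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))` calls scale the plot so that distances along each axis reflect the amount of variation that axis captures. A minimal sketch with invented eigenvalues (`evals` is a hypothetical stand-in for `out.DP.log_ST_L$eig`):

```r
# Invented eigenvalues standing in for an ordination's $eig vector
evals <- c(4.0, 1.0, 0.5, 0.25)

# Proportion of total variation captured by each axis
prop_var <- evals / sum(evals)
round(prop_var, 3)

# y/x aspect ratio for a plot of the first two axes
aspect <- sqrt(evals[2] / evals[1])
aspect
```

With these values the second axis is drawn at half the scale of the first, so a visually short vertical gap is not over-interpreted.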

```{r}
set.seed(12)
AP <- plot_ordination(pslog, out.DP.log_ST_L, type = "Species", color = "Phylum")+
  geom_point(aes(Species=Species,Genus=Genus,Family=Family,Order=Order,Class=Class))+
  coord_fixed(sqrt(evals_DP[2] / evals_DP[1]))
ggplotly(AP, tooltip = c("Phylum","Class","Order","Family","Genus","Species")) %>% hide_legend()
```

It looks like liquid samples (mostly strained) differ from stomach tube samples due to increases in **Rikenellaceae**, **Prevotellaceae** and Kiritimatiellaeota. Stomach tube samples have an increase in **Christensenellaceae** and **Lachnospiraceae**. 


```{r eval=FALSE}
#changing the order of factor levels
sample_data(ST_L)$Sample_Type <- factor(sample_data(ST_L)$Sample_Type, levels = c("Stomach Tube","Liquid Strained","Liquid Unstrained"))

#this was run on a HPC
ps_gen_ST_L <- tax_glom(ST_L, "Genus")
ps_fam_ST_L <- tax_glom(ST_L, "Family")

set.seed(1)
fullAnalysis_072120_Fam_STvsL <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day, phi.formula_null = ~ 1, test="Wald",boot=FALSE, data = ps_fam_ST_L, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072120_Fam_STvsL, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072120_Fam_STvsL.rds")

set.seed(1)
fullAnalysis_072120_Gen_STvsL <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day,
                                 phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_gen_ST_L, fdr_cutoff = 0.05)
saveRDS(fullAnalysis_072120_Gen_STvsL, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_072120_Gen_STvsL.rds")

#set.seed(1850)
#fullAnalysis_071119 <- differentialTest(formula = ~ Sample_Type + CowID + Day, phi.formula = ~ 1,formula_null = ~ CowID + Day,
#                                phi.formula_null = ~ 1, test="Wald",boot=TRUE, data = ps_sub, fdr_cutoff = 0.05)
#saveRDS(fullAnalysis_071119, "/share/magalab/Jill/Depeters/DADA2/July_2019/fullAnalysis_071119_STvsL.rds")
```
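
Reordering the factor levels before fitting matters because, under R's default treatment contrasts, the first level becomes the reference against which the other sample types are compared. A minimal base R sketch (the vector `sample_type` is a made-up stand-in for `sample_data(ST_L)$Sample_Type`, and we assume the model's design matrix is built in the usual `model.matrix()` way):

```r
# Made-up sample type labels
sample_type <- c("Liquid Strained", "Stomach Tube", "Liquid Unstrained", "Stomach Tube")

# Default factor(): levels are sorted alphabetically, so "Liquid Strained" would be the reference
default_levels <- levels(factor(sample_type))
default_levels

# Explicit ordering puts "Stomach Tube" first, making it the reference level
releveled <- factor(sample_type,
                    levels = c("Stomach Tube", "Liquid Strained", "Liquid Unstrained"))

# The design matrix has an intercept plus one column per non-reference level
design_cols <- colnames(model.matrix(~ releveled))
design_cols
```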

```{r eval=FALSE}
#call in data
fullAnalysis_072120_Gen_STvsL <- readRDS("fullAnalysis_072120_Gen_STvsL.rds")
table(otu_to_taxonomy(OTU = fullAnalysis_072120_Gen_STvsL$significant_taxa, data = ps_gen_ST_L, level=c("Phylum","Family"))) %>% as.data.frame() %>% arrange(-Freq) %>%
#making table of phyla ASVs taxa 
kable(caption="# of significant genera in each family") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "300px")
```

```{r eval=FALSE, fig.width=8, fig.height=7}
fullAnalysis_072120_Fam_STvsL <- readRDS("fullAnalysis_072120_Fam_STvsL.rds")
#plotting
plot.differentialTest_custom(fullAnalysis_072120_Fam_STvsL, level=c("Phylum","Family","Genus","Species"))+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```

```{r eval=FALSE, fig.width=9, fig.height=13}
#plotting
plot.differentialTest_custom(fullAnalysis_072120_Gen_STvsL, level=c("Family","Genus","Species"))+
  theme(strip.text.x=element_text(size=11,face = "bold"),axis.text=element_text(size=11,face = "bold", color="black"), axis.title=element_text(size=11,face = "bold", color = "black"))
```