Differential Expression Analysis: Simple Pair, Interaction, Time Series

Identifying differentially expressed (DE) genes across conditions is vital to understanding phenotypic variation. RNA sequencing (RNA-seq) is growing rapidly and quantifies gene expression efficiently. The number of methods and tools dedicated to differential gene expression (DGE) analysis of RNA-seq data has grown just as quickly. More than 30 DE methods have been published; however, many comparison studies highlight that no single method outperforms the others in all circumstances. In this study, we test and compare the performance of three widely used R packages, edgeR, DESeq2, and limma voom, on Arabidopsis thaliana data. Although standard DE analysis has been used extensively and improved over the past years, time course RNA-seq can provide a deeper understanding of gene regulation and biological development, and can help identify biologically relevant DE genes. We therefore also conducted a time course analysis using a second Arabidopsis dataset. Each method is implemented in a separate R package; we provide detailed R code and explanations to make the workflow easier to follow.

To give eBook authors a better sense of the workflow layout, we briefly describe the purpose of each directory.

cache: R code for preprocessing the raw Arabidopsis time course data.
graphs: The graphs and figures produced during the analysis.
input: The raw input data for both the simple pair DGE analysis and the time course analysis.
output: The final results of the workflow, including all DE genes and the significant DE genes from each of the three DGE methods.
workflow: Step-by-step pipeline for the DGE and time course analyses.

Workflow

R packages required

R version 4.1.1 (2021-08-10)

Required R packages and versions:
- ggplot2 3.3.3, dplyr 1.0.7, GEOquery 2.60.0, DESeq2 1.32.0, edgeR 3.34.0, limma 3.48.0, pheatmap 1.0.12, Glimma 2.2.0, readr 1.4.0
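
Before running the workflow, it can help to check which of these packages are already installed and at which version. The following is a minimal sketch, not part of the original workflow: the `required` table, the `check_packages` helper, and the CRAN/Bioconductor split are our own annotations, and the installation commands are left commented out.

```r
# Packages and versions as listed above. The `source` column is our own
# annotation of where each package is distributed.
required <- data.frame(
  package = c("ggplot2", "dplyr", "readr", "pheatmap",
              "GEOquery", "DESeq2", "edgeR", "limma", "Glimma"),
  version = c("3.3.3", "1.0.7", "1.4.0", "1.0.12",
              "2.60.0", "1.32.0", "3.34.0", "3.48.0", "2.2.0"),
  source = c(rep("CRAN", 4), rep("Bioconductor", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report whether each package is installed and,
# if so, which version was found (NA when the package is missing).
check_packages <- function(pkgs) {
  installed <- vapply(
    pkgs$package,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  found <- vapply(
    pkgs$package,
    function(p) {
      if (installed[[p]]) as.character(utils::packageVersion(p)) else NA_character_
    },
    character(1)
  )
  data.frame(pkgs, installed = installed, found = found, row.names = NULL)
}

status <- check_packages(required)
print(status)

# To install anything missing (not run here):
# install.packages("BiocManager")
# install.packages(status$package[!status$installed & status$source == "CRAN"])
# BiocManager::install(status$package[!status$installed & status$source == "Bioconductor"])
```

Newer package versions may also work, but results could differ slightly from those produced with the versions listed above.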

Input Data

Case study 1: A comparison of three methods for DGE analysis

To demonstrate, we use the Arabidopsis thaliana RNA-seq data published by Cumbie et al. (2011). Summarized count data are available as an R dataset, and readers can download the data from the input folder (arab.rds). In this experiment, six-week-old Arabidopsis plants were inoculated with the ΔhrcC mutant of P. syringae. Control plants were inoculated with a mock pathogen. Each treatment was done in biological triplicate, with each pair of replicates done at a separate time and derived from independently grown plants and bacteria.
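
Because each mock/treated pair of replicates was processed at a separate time, the replicate number can be treated as a blocking (batch) factor alongside the treatment. The sketch below shows one way to express this design in base R. It assumes sample names of the form `mock1` ... `hrcc3`, as in the NBPSeq copy of this dataset; the helper `make_sample_table` is hypothetical, and the column names of `arab.rds` should be checked before relying on it.

```r
# Hypothetical helper: derive treatment and batch from sample names such as
# "mock1" or "hrcc2". ASSUMPTION: names follow <treatment><replicate number>.
make_sample_table <- function(sample_names, control = "mock") {
  treatment <- sub("[0-9]+$", "", sample_names)   # strip trailing digits
  replicate <- sub("^[^0-9]+", "", sample_names)  # keep trailing digits
  data.frame(
    sample = sample_names,
    treatment = relevel(factor(treatment), ref = control),
    batch = factor(replicate),
    stringsAsFactors = FALSE
  )
}

# ASSUMPTION: these are the column names of the count matrix.
sample_names <- c("mock1", "mock2", "mock3", "hrcc1", "hrcc2", "hrcc3")

# With the real data, take the names from the count matrix instead:
# counts <- readRDS(file.path("input", "arab.rds"))
# sample_names <- colnames(counts)

samples <- make_sample_table(sample_names)

# Additive model: batch effect plus treatment effect, with mock as reference.
design <- model.matrix(~ batch + treatment, data = samples)
print(design)
```

The last column of `design` (`treatmenthrcc`) is the treatment effect adjusted for batch. The same matrix can be passed to `edgeR::estimateDisp()` or `limma::voom()`, and the formula `~ batch + treatment` can be given to `DESeq2::DESeqDataSetFromMatrix()` through its `design` argument.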

Case study 2: Time course analysis

Here we demonstrate a basic time course analysis using an Arabidopsis dataset of gene counts from an RNA-seq time course. The experiment asks whether differentiated endodermal cells have a distinct transcriptional response to auxin treatment. The authors performed a time series of 10 µM NAA treatment, sampling at t = 0, 2, 4, 8, 16, and 24 h after treatment (Ursache et al., 2021). For the time series, they compared roots of the solitary root 1 (slr-1) mutant with those of the CASP1::shy2-2/slr-1 double mutant. The raw data from the NCBI database (Ursache et al., 2021) were processed and saved as a RangedSummarizedExperiment object in an RData file. The processed data can be downloaded from the input folder (arab_time.Rdata).
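
A common way to ask whether two genotypes respond differently over time is to compare a full model containing a genotype-by-time interaction against a reduced model without it. The sketch below builds both design matrices in base R for the six time points described above. It is an illustration only: the genotype labels are hypothetical, one sample per genotype and time point is assumed, and the actual sample annotation should be taken from the object stored in `arab_time.Rdata`.

```r
# Time points from the experiment description (hours after NAA treatment).
time_points <- c(0, 2, 4, 8, 16, 24)

# ASSUMPTION: hypothetical labels for the single and double mutant.
genotypes <- c("slr1", "shy2_slr1")

# ASSUMPTION: one row per genotype x time combination, for illustration only.
samples <- expand.grid(
  time = factor(time_points),
  genotype = factor(genotypes, levels = genotypes)
)

# Full model: genotype-specific time profiles.
full <- model.matrix(~ genotype + time + genotype:time, data = samples)

# Reduced model: both genotypes share one time profile.
reduced <- model.matrix(~ genotype + time, data = samples)

# The coefficients present only in the full model are the interaction terms.
interaction_terms <- setdiff(colnames(full), colnames(reduced))
print(interaction_terms)

# With the real data, the sample annotation comes from the saved object:
# loaded <- load(file.path("input", "arab_time.Rdata"))  # names of restored objects
```

Genes for which the interaction terms are jointly significant show a time response that differs between genotypes. In DESeq2, for example, this comparison can be run as a likelihood ratio test with `DESeq(dds, test = "LRT", reduced = ~ genotype + time)`, assuming `dds` was built with the full formula.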