FredHutch · cansavvy · Mar 22, 2024 · Mar 11, 2024
diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "getting-started"
+title: "Getting Started"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{getting-started}
@@ -19,24 +19,27 @@ knitr::opts_chunk$set(
 gimap performs analysis of dual-targeting CRISPR screening data, with the goal of aiding the identification of genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. gimap analyzes functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying growth effects of single and paired gene knockouts upon application of a CRISPR library. A multitude of CRISPR screen types can be used for this analysis, with helpful descriptions found in this review (https://www.nature.com/articles/s43586-021-00093-4). Use of pgPEN and GI-mapping in a paired gRNA format can be found here (https://pubmed.ncbi.nlm.nih.gov/34469736/). 
 
 ```{r}
-library(magrittr)
+library(dplyr)
 library(gimap)
 ```
 
 ## Data requirements 
 
 Let's examine this example pgPEN counts table. It's divided into columns containing: 
 
-- an ID corresponding to the names of paired guides
-- gRNA sequence 1, targeting "paralog A"
-- gRNA sequence 2, targeting "paralog B"
-- The sample, day, and replicate number for which gRNAs were sequenced
+- `id`: an ID corresponding to the names of paired guides
+- `seq_1`: gRNA sequence 1, targeting "paralog A"
+- `seq_2`: gRNA sequence 2, targeting "paralog B"
+- `Day00_RepA`: read counts from Day 00 for Replicate A
+- `Day05_RepA`: read counts from Day 05 for Replicate A
+- `Day22_RepA`: read counts from Day 22 for Replicate A
+- `Day22_RepB`, `Day22_RepC`: read counts from Day 22 for Replicates B and C
 
 ```{r}
 example_data <- gimap::example_data()
 ```
+
 The metadata you have may vary slightly from this but you'll want to make sure you have the essential variables and information regarding how you collected your data. 
-example_data
 
 ```{r}
 colnames(example_data)
@@ -50,7 +53,7 @@ The first data set contains the readcounts from each sample type. Required for a
 
 ```{r}
 example_counts <- example_data %>%
-  dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
+  select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
   as.matrix()
 ```
 
@@ -62,9 +65,8 @@ The next two datasets are metadata that describe the dimensions of the count dat
 `pg_metadata` is the information that describes the paired guide RNA targets. This information contains a table of the paired guide RNA sequences and the corresponding paralog gene that is being targeted for cutting by the gRNA-Cas9 complex.
 
 ```{r}
-
 example_pg_metadata <- example_data %>%
-  dplyr::select(c("id", "seq_1", "seq_2"))
+  select(c("id", "seq_1", "seq_2"))
 ```
 
 `sample_metadata` is the information that describes timepoint information and replicate information relating to each sample. In general, two replicates at each timepoint are carried through to analysis, where they are later collapsed. 
@@ -87,6 +89,15 @@ gimap_dataset <-  gimap::setup_data(counts = example_counts,
 
 You'll notice that this set up gives us a list of formatted data. This contains the original counts we gave `setup_data()` function but also normalized counts, and the total counts per sample. 
 
+- Raw counts: the original counts passed to `setup_data()`
+- Counts per sample: the sum of all counts for each sample
+- Transformed data: normalized counts, expressed as counts per million (CPM)
+- Log2 CPM: log2-transformed CPM
+- `pg_metadata`: paired guide metadata
+- `sample_metadata`: timepoint and replicate information for each sample
+
+
+
 ```{r}
 str(gimap_dataset)
 ```
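
Note on the transformed data listed in the last hunk: the counts-per-sample, CPM, and log2 CPM entries can be illustrated on a toy matrix in base R. This is a minimal sketch and is not taken from gimap's source, so `setup_data()` may normalize differently. The matrix values, the `toy_*` names, and the pseudocount of 1 are all assumptions made for illustration.

```r
# Toy counts matrix: rows are paired guides, columns are samples.
# All values are made up for illustration.
toy_counts <- matrix(
  c(10, 30, 60,
    200, 300, 500),
  nrow = 3,
  dimnames = list(c("pg_1", "pg_2", "pg_3"), c("Day00_RepA", "Day22_RepA"))
)

# Counts per sample: the sum of each column.
counts_per_sample <- colSums(toy_counts)

# CPM: divide each column by its total, then scale to one million.
toy_cpm <- sweep(toy_counts, 2, counts_per_sample, "/") * 1e6

# Log2 CPM: a pseudocount of 1 (an assumption) keeps zero counts finite.
toy_log2_cpm <- log2(toy_cpm + 1)
```

After this transformation every column of `toy_cpm` sums to one million, which is what makes samples with different sequencing depths comparable.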