
Feedback on Getting Started Vignette #18

Merged
merged 9 commits into from
Mar 22, 2024
Changes from 4 commits
31 changes: 21 additions & 10 deletions vignettes/getting-started.Rmd
@@ -1,5 +1,5 @@
---
title: "getting-started"
title: "Getting Started"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{getting-started}
@@ -19,24 +19,27 @@ knitr::opts_chunk$set(
gimap analyzes dual-targeting CRISPR screening data to help identify genetic interactions (e.g., cooperativity, synthetic lethality) in models of disease and other biological contexts. It works with functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying the growth effects of single and paired gene knockouts after a CRISPR library is applied. Many types of CRISPR screens can be used for this analysis; this review describes them (https://www.nature.com/articles/s43586-021-00093-4). The use of pgPEN and GI mapping in a paired gRNA format is described here (https://pubmed.ncbi.nlm.nih.gov/34469736/).

```{r}
library(magrittr)
library(dplyr)
Contributor Author

If we are going to use dplyr, we might as well load dplyr, which loads magrittr.

Collaborator

Sounds good to me! Can you add this in too?

library(gimap)
```

## Data requirements

Let's examine this example pgPEN counts table. It's divided into columns containing:

- an ID corresponding to the names of paired guides
- gRNA sequence 1, targeting "paralog A"
- gRNA sequence 2, targeting "paralog B"
- The sample, day, and replicate number for which gRNAs were sequenced
- `id`: an ID corresponding to the names of paired guides
- `seq_1`: gRNA sequence 1, targeting "paralog A"
- `seq_2`: gRNA sequence 2, targeting "paralog B"
- `Day00_RepA`: read counts from Day 00 for replicate A
- `Day05_RepA`: read counts from Day 05 for replicate A
- `Day22_RepA`: read counts from Day 22 for replicate A
- `Day22_RepB`: read counts from Day 22 for replicate B
- `Day22_RepC`: read counts from Day 22 for replicate C

```{r}
example_data <- gimap::example_data()
example_data
```

The metadata you have may vary slightly from this, but make sure you have the essential variables and information about how you collected your data.
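If you are preparing your own counts table, a quick check for the essential columns can catch problems before `setup_data()`. The sketch below is a hypothetical helper (`check_pgpen_table()` is not part of gimap), and it assumes your count columns follow the `DayNN_RepX` naming used in the example data; adjust `id_cols` and `count_pattern` to match your own layout. The toy table uses made-up values.

```{r}
# Hypothetical helper (not part of gimap): checks that a pgPEN-style counts
# table has the ID/sequence columns and at least one count column.
check_pgpen_table <- function(df,
                              id_cols = c("id", "seq_1", "seq_2"),
                              count_pattern = "^Day[0-9]+_Rep[A-Z]$") {
  missing_cols <- setdiff(id_cols, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # Assumes count columns are named like "Day00_RepA"
  count_cols <- grep(count_pattern, colnames(df), value = TRUE)
  if (length(count_cols) == 0) {
    stop("No count columns match the pattern ", count_pattern)
  }
  # Count columns should be numeric and non-negative
  counts <- as.matrix(df[, count_cols, drop = FALSE])
  if (!is.numeric(counts) || any(counts < 0, na.rm = TRUE)) {
    stop("Count columns must be numeric and non-negative")
  }
  invisible(count_cols)
}

# Toy table with made-up values, laid out like the example data
toy_table <- data.frame(
  id = c("pg1", "pg2"),
  seq_1 = c("AAAC", "GGTT"),
  seq_2 = c("CCGA", "TTAG"),
  Day00_RepA = c(10, 20),
  Day05_RepA = c(8, 25),
  Day22_RepA = c(3, 30),
  stringsAsFactors = FALSE
)

check_pgpen_table(toy_table)
```

The helper returns the matched count column names invisibly, so they can be reused to build the counts matrix.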

```{r}
colnames(example_data)
@@ -50,7 +53,7 @@ The first data set contains the readcounts from each sample type. Required for a

```{r}
example_counts <- example_data %>%
dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
as.matrix()
```

@@ -62,9 +65,8 @@ The next two datasets are metadata that describe the dimensions of the count dat
`pg_metadata` describes the paired guide RNA targets. It is a table of the paired guide RNA sequences and the corresponding paralog genes that the gRNA-Cas9 complex targets for cutting.

```{r}

example_pg_metadata <- example_data %>%
dplyr::select(c("id", "seq_1", "seq_2"))
select(c("id", "seq_1", "seq_2"))
```

`sample_metadata` describes the timepoint and replicate of each sample. In general, two replicates at each timepoint are carried through the analysis and later collapsed.
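The timepoint and replicate information can often be read straight from the count column names. The sketch below assumes the `DayNN_RepX` naming from the example data and uses base R only; the object and column names (`toy_sample_metadata`, `col_names`, `day`, `replicate`) are illustrative, not necessarily what gimap expects. It then shows one possible way to collapse replicates, a per-timepoint mean on made-up log2-scale values; gimap's actual collapsing step may work differently.

```{r}
# Illustrative sketch: parse timepoint and replicate from column names,
# assuming the "DayNN_RepX" naming used in the example data.
sample_names <- c("Day00_RepA", "Day05_RepA", "Day22_RepA",
                  "Day22_RepB", "Day22_RepC")

toy_sample_metadata <- data.frame(
  col_names = sample_names,
  day = as.integer(sub("^Day([0-9]+)_Rep[A-Z]$", "\\1", sample_names)),
  replicate = sub("^Day[0-9]+_Rep([A-Z])$", "\\1", sample_names),
  stringsAsFactors = FALSE
)
toy_sample_metadata

# Made-up log2-scale values: 2 paired guides (rows) x 5 samples (columns)
toy_log2 <- matrix(c(1, 2, 3, 4, 5,
                     2, 2, 2, 4, 6),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("pg1", "pg2"), sample_names))

# Assumption: collapse replicates by averaging within each timepoint.
# split() groups the sample columns by day; rowMeans() averages them.
cols_by_day <- split(toy_sample_metadata$col_names, toy_sample_metadata$day)
toy_collapsed <- sapply(cols_by_day, function(cols) {
  rowMeans(toy_log2[, cols, drop = FALSE])
})
toy_collapsed
```

The result has one column per timepoint (`0`, `5`, `22`), with the three Day 22 replicates averaged into a single column.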
@@ -87,6 +89,15 @@ gimap_dataset <- gimap::setup_data(counts = example_counts,

You'll notice that this setup gives us a list of formatted data. It contains the original counts we gave the `setup_data()` function, as well as normalized counts and the total counts per sample.

- Raw counts: the original count data
- Counts per sample: the total counts for each sample
- Transformed data: normalized counts, i.e., counts per million (CPM)
- Log2 CPM: log2-transformed CPM
- pg_metadata: paired guide metadata
- sample_metadata: timepoint and replicate information for each sample


Collaborator

Probably can delete these extra lines.


```{r}
str(gimap_dataset)
```
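The counts per sample, CPM, and log2 CPM listed above follow standard definitions: CPM is each count divided by the total counts in its sample, times one million. Here is a minimal base R sketch on a made-up counts matrix; it is not gimap's implementation, and the +1 pseudocount before taking log2 is an assumption to avoid `log2(0)`.

```{r}
# Minimal sketch of the normalizations listed above (not gimap's code).
# Made-up counts: rows = paired guides, columns = samples
toy_counts <- matrix(c(100,  250,
                       400,  250,
                       500, 1500),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("pg1", "pg2", "pg3"),
                                     c("Day00_RepA", "Day05_RepA")))

# Counts per sample: total reads in each column
toy_counts_per_sample <- colSums(toy_counts)

# CPM: divide each column by its total, then scale to one million
toy_cpm <- sweep(toy_counts, 2, toy_counts_per_sample, "/") * 1e6

# Log2 CPM; the +1 pseudocount (avoids log2(0)) is an assumption
toy_log2_cpm <- log2(toy_cpm + 1)

toy_counts_per_sample
toy_cpm
```

Each column of `toy_cpm` sums to one million by construction, which is what makes samples sequenced to different depths comparable.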