Feedback on Getting Started Vignette #18

Merged
merged 9 commits into from
Mar 22, 2024
Changes from 7 commits
41 changes: 29 additions & 12 deletions vignettes/getting_started.Rmd
@@ -1,5 +1,5 @@
---
title: "getting-started"
title: "getting_started"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{getting-started}
@@ -22,18 +22,26 @@ gimap performs analysis of dual-targeting CRISPR screening data, with the goal o
library(gimap)
```

```{r}
library(dplyr)
```

## Data requirements

Let's examine this example pgPEN counts table. It's divided into columns containing:

- an ID corresponding to the names of paired guides
- gRNA sequence 1, targeting "paralog A"
- gRNA sequence 2, targeting "paralog B"
- The sample, day, and replicate number for which gRNAs were sequenced
- `id`: an ID corresponding to the names of paired guides
- `seq_1`: gRNA sequence 1, targeting "paralog A"
- `seq_2`: gRNA sequence 2, targeting "paralog B"
- `Day00_RepA`: Gene Counts from Day 00 for Replicate A
- `Day05_RepA`: Gene Counts from Day 05 for Replicate A
- `Day22_RepA`: Gene Counts from Day 22 for Replicate A
- `Day22_RepB`: Gene Counts from Day 22 for Replicate B
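For orientation, here is a made-up miniature table with the same column layout, including `Day22_RepC`, which is selected from the real data later in this vignette. The IDs, sequences, and counts are invented for illustration only and do not come from `get_example_data("count")`; the real table has many more rows.

```r
# Hypothetical toy counts table mimicking the pgPEN layout described above.
# All IDs, sequences, and counts are invented for illustration.
toy_counts_table <- data.frame(
  id         = c("pg_1", "pg_2", "pg_3"),
  seq_1      = c("ACGTACGTACGTACGTACGT", "TTGCATTGCATTGCATTGCA", "GGCCAAGGCCAAGGCCAAGG"),
  seq_2      = c("TGCATGCATGCATGCATGCA", "CCATGGCCATGGCCATGGCC", "AATTCCGGAATTCCGGAATT"),
  Day00_RepA = c(120, 85, 240),
  Day05_RepA = c(110, 60, 230),
  Day22_RepA = c(95, 12, 210),
  Day22_RepB = c(101, 9, 205),
  Day22_RepC = c(99, 15, 198),
  stringsAsFactors = FALSE
)

# One row per paired guide construct; ID and sequence columns first, then one count column per sample.
str(toy_counts_table)
```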

```{r}
example_data <- get_example_data("count")
```

The metadata you have may vary slightly from this, but make sure you have the essential variables and information about how you collected your data.

```{r}
@@ -62,7 +70,7 @@ The first data set contains the readcounts from each sample type. Required for a

```{r}
example_counts <- example_data %>%
dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
as.matrix()
```

@@ -74,11 +82,9 @@ The next datasets are metadata that describe the dimensions of the count data.
One of these (`example_pg_metadata`) is required because it is necessary to know the IDs and be able to map them to pgRNA constructs.

```{r}
# pg metadata is the information that describes the paired guide RNA targets; it is loaded and explained later

# pg IDs are the unique construct IDs, sorted in the same order as the rows of the count data, so they can be used to map between the count data and the metadata
example_pg_id <- example_data %>%
dplyr::select("id")
example_pg_metadata <- example_data %>%
select(c("id", "seq_1", "seq_2"))
```
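Because the ID column is sorted in the same order as the rows of the count data, it can act as the key that links counts to metadata. The sketch below illustrates that idea with plain base R on invented data; the object names (`toy_pg_id`, `toy_counts`, `toy_pg_metadata`) are hypothetical and not part of gimap, and this is not how `setup_data()` does it internally.

```r
# Hypothetical toy data (not from gimap): three pgRNA constructs, two samples.
toy_pg_id <- data.frame(id = c("pg_1", "pg_2", "pg_3"), stringsAsFactors = FALSE)
toy_counts <- matrix(
  c(120, 85, 240,   # Day00_RepA
    95, 12, 210),   # Day22_RepA
  ncol = 2,
  dimnames = list(NULL, c("Day00_RepA", "Day22_RepA"))
)

# The IDs are in the same order as the count rows, so they can label the rows.
rownames(toy_counts) <- toy_pg_id$id

# Construct metadata for the same IDs, deliberately stored in a different order.
toy_pg_metadata <- data.frame(
  id    = c("pg_3", "pg_1", "pg_2"),
  seq_1 = c("GGCCAAGG", "ACGTACGT", "TTGCATTG"),
  stringsAsFactors = FALSE
)

# match() gives, for each count row, the position of its ID in the metadata,
# which reorders the metadata so it lines up with the counts.
aligned_metadata <- toy_pg_metadata[match(rownames(toy_counts), toy_pg_metadata$id), ]
aligned_metadata
```

Keying on the ID rather than trusting row position alone guards against metadata that was sorted differently from the counts.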

# sample metadata is the information that describes the samples and is sorted the same way as the columns in the count data
example_sample_metadata <- data.frame(
@@ -106,6 +112,17 @@ gimap_dataset <- setup_data(counts = example_counts,

You'll notice that this setup gives us a list of formatted data. It contains the original counts we gave the `setup_data()` function, along with normalized counts and the total counts per sample.

- `raw_counts`: The original read counts, which reflect how many cells carrying each paired guide construct are present in the sample. Samples are the columns and paired guide constructs are the rows.
- `counts_per_sample`: The total counts for each sample, summed over all of the paired guide designs.
- Transformed data: The various normalized and adjusted versions of the raw count data.
  - `count_norm`: For each sample, the counts are normalized as `-log10((counts + 1) / total counts for the sample over all pg designs)`
  - `cpm`: Counts per million, calculated for each sample as `(counts / total counts for the sample over all pg designs) * 1e6`
  - `log2cpm`: Log2-transformed counts per million, calculated as `log2(cpm + 1)`
- `pg_metadata`: Paired guide metadata, the information that describes the paired guide RNA designs. This may include the sequences used in the CRISPR design as well as the genes they target.
- `sample_metadata`: Metadata that describes the samples. This likely includes time point information, replicates, sample IDs, and any other information needed to describe the experimental setup.
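To make the transformations listed above concrete, here is a minimal base-R sketch that applies those formulas to a tiny invented count matrix. It only follows the formulas as written in this list; gimap's own implementation may differ in details, and the variable names simply mirror the list entries rather than accessing `gimap_dataset`.

```r
# Hypothetical toy count matrix: rows are paired guide constructs, columns are samples.
toy_counts <- matrix(
  c(10, 30, 60,    # Day00_RepA
    5, 15, 80),    # Day22_RepA
  ncol = 2,
  dimnames = list(c("pg_1", "pg_2", "pg_3"), c("Day00_RepA", "Day22_RepA"))
)

# counts_per_sample: total counts in each sample, summed over all pg designs.
counts_per_sample <- colSums(toy_counts)

# sweep(x, 2, totals, "/") divides every column of x by that sample's total.
# count_norm: -log10((counts + 1) / total counts for the sample)
count_norm <- -log10(sweep(toy_counts + 1, 2, counts_per_sample, "/"))

# cpm: (counts / total counts for the sample) * 1e6
cpm <- sweep(toy_counts, 2, counts_per_sample, "/") * 1e6

# log2cpm: log2(cpm + 1)
log2cpm <- log2(cpm + 1)

counts_per_sample
cpm
```

Both toy samples total 100 reads, so each `cpm` column sums to one million, which is a quick sanity check for this kind of normalization.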

```{r}
str(gimap_dataset)
```
@@ -117,7 +134,7 @@ Later explain other parameters and how they can be used
```{r}
run_qc(gimap_dataset,
output_file = "example_qc_report.Rmd",
overwrite = TRUE,
overwrite = TRUE,
quiet = TRUE)
```
