added todo comments for clarification #15

Merged
merged 9 commits into from
Mar 11, 2024
31 changes: 25 additions & 6 deletions vignettes/getting-started.Rmd
@@ -16,15 +16,27 @@ knitr::opts_chunk$set(

# gimap tutorial

TODO: Describe what this genetic interaction analysis actually is and why someone would want to do it.

```{r}
library(magrittr)
library(gimap)
```

## Data requirements

TODO: What kind of data would someone need to run this? How much flexibility is there in what the experimental set up might look like?
Contributor Author

@sobrien29 this might be a good question to address with Alice and the broader team; e.g. how many timepoints are required for minimum functionality, will we require the pgPEN library, etc

Collaborator

I generally agree with the idea let's develop this for the Berger lab first and then we can try to expand this package's flexibility as we move along. But totally, this is probably a bigger and ongoing discussion amongst the team!

Contributor

At minimum we need a T0 (or plasmid) and a later timepoint (actual timing is dependent mainly on cell growth kinetics). Intermediate timepoints can be included for a time-based analysis if wanted and I believe the test data includes one intermediate timepoint.

Collaborator

@sobrien29 How would you describe this to a user in this context? What is a T0? Does that just mean the first timepoint, at 0 days? Do you have a paper we can link to that gives people the basics of CRISPR screens?


Let's examine this example pgPEN counts table. It's divided into columns containing:

- an ID corresponding to the names of paired guides
- gRNA sequence 1, targeting "paralog A"
- gRNA sequence 2, targeting "paralog B"
- The sample, day, and replicate number for which gRNAs were sequenced

```{r}
-example_data <- example_data()
+example_data <- gimap::example_data()

# Let's examine this example metadata
example_data
```

@@ -34,7 +46,9 @@ colnames(example_data)

## Setting up data

-We're going to set up three datasets. The first is required, its the counts that the genetic interaction analysis will be used for.
+We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for.

The first dataset contains the read counts from each sample. The analysis requires a Day 0 (or plasmid) sample and at least one later timepoint sample. If replicates are provided, a QC step correlates them. Early and late timepoints can also be compared, but this is not required if no early timepoints were collected.
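
The timepoint requirement can be checked before any analysis is run. The following is a minimal sketch in base R, not part of the gimap API: `has_required_timepoints()` is a hypothetical helper, and the `"Day00"` baseline label is an assumption taken from the day labels used in the sample metadata later in this vignette.

```r
# Day labels as used in the example sample metadata of this vignette.
sample_days <- c("Day00", "Day05", "Day22", "Day22", "Day22")

# Hypothetical helper (not a gimap function): returns TRUE when a baseline
# sample and at least one later timepoint sample are both present.
has_required_timepoints <- function(days, baseline = "Day00") {
  days <- as.character(days)
  any(days == baseline) && any(days != baseline)
}

has_required_timepoints(sample_days)
```

A plasmid sample could serve as the baseline instead by passing its label through the `baseline` argument.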

```{r}
example_counts <- example_data %>%
@@ -47,12 +61,17 @@ The next two datasets are metadata that describe the dimensions of the count dat
- The sizes of these metadata must correspond to the dimensions of the counts data.
- The first column of the pg_metadata must be a unique id
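
These two requirements can be sketched as explicit checks. This is an illustration under stated assumptions, not gimap's own validation: it assumes counts rows are paired guides and counts columns are samples, `check_metadata_dims()` is a hypothetical helper, and the toy values are invented.

```r
# Toy stand-ins for the vignette objects; all values are invented.
counts <- matrix(1:10, nrow = 2, ncol = 5)
pg_metadata <- data.frame(
  id = c("pg_1", "pg_2"),
  seq_1 = c("ACGT", "TTGA"),
  seq_2 = c("GGCA", "CATG")
)
sample_metadata <- data.frame(
  id = 1:5,
  day = c("Day00", "Day05", "Day22", "Day22", "Day22")
)

# Hypothetical helper (not a gimap function). Assumes rows of `counts` are
# paired guides and columns are samples.
check_metadata_dims <- function(counts, pg_metadata, sample_metadata) {
  if (nrow(pg_metadata) != nrow(counts)) {
    stop("pg_metadata needs one row per row of counts")
  }
  if (nrow(sample_metadata) != ncol(counts)) {
    stop("sample_metadata needs one row per column of counts")
  }
  if (anyDuplicated(pg_metadata[[1]]) != 0) {
    stop("the first column of pg_metadata must be a unique id")
  }
  invisible(TRUE)
}

check_metadata_dims(counts, pg_metadata, sample_metadata)
```

Failing early with a specific message makes a mismatch easier to trace than an error raised later in the analysis.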

`pg_metadata` describes the paired guide RNA targets. It is a table of the paired guide RNA sequences and the corresponding paralog genes targeted for cutting by the gRNA-Cas9 complex.

```{r}
-# pg metadata is the information that describes the paired guide RNA targets

example_pg_metadata <- example_data %>%
dplyr::select(c("id", "seq_1", "seq_2"))
```

-# sample metadata is the information that describes
+`sample_metadata` describes the timepoint and replicate of each sample. In general, two replicates at each timepoint are carried through to analysis, where they are later collapsed.

```{r}
example_sample_metadata <- data.frame(
id = 1:5,
day = as.factor(c("Day00", "Day05", "Day22", "Day22", "Day22")),
@@ -63,7 +82,7 @@ example_sample_metadata <- data.frame(
Now let's set up our data using `setup_data()`. Optionally, we can also provide the metadata to this function so that it is stored with the data.

```{r}
-gimap_dataset <- setup_data(counts = example_counts,
+gimap_dataset <- gimap::setup_data(counts = example_counts,
pg_metadata = example_pg_metadata,
sample_metadata = example_sample_metadata)
```