From 1507cb1edb95750d89b55073bd5f1a97f6414e8f Mon Sep 17 00:00:00 2001 From: "Daniel J. Groso" <95438884+danieljgroso@users.noreply.github.com> Date: Mon, 12 Feb 2024 09:36:59 -0800 Subject: [PATCH 1/9] added todo comments for clarification --- vignettes/getting-started.Rmd | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index f6cfabf..17dfdea 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -25,6 +25,9 @@ library(gimap) example_data <- example_data() # Let's examine this example metadata + +# TODO: ## this may or may not be the right place to do it, but perhaps a more descriptive comment of what we're looking at would help. e.g. "Let's examine this example pgPEN counts table, with column IDs representing time points at which gDNA was extracted and the experimental (biological/technical?) replicate number" + example_data ``` @@ -34,9 +37,11 @@ colnames(example_data) ## Setting up data -We're going to set up three datasets. The first is required, its the counts that the genetic interaction analysis will be used for. +We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for. ```{r} +# TODO: ## perhaps here we can clarify the flexibility of the workflow / what can be set up. e.g. does this function allow exploration / comparison of specific days and reps? or is that not really needed for the purposes of the vignette? 
+ example_counts <- example_data %>% dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>% as.matrix() @@ -49,10 +54,15 @@ The next two datasets are metadata that describe the dimensions of the count dat ```{r} # pg metadata is the information that describes the paired guide RNA targets + +# TODO: ## this is really minor but I think a more accurate description would be closer to: "pg metadata contains a table of paired guide RNAs targeting a paralog pair and their corresponding nucleotide sequences". What's being targeted is a 20bp region of genomic DNA, which is complementary to the gRNA sequence listed + example_pg_metadata <- example_data %>% dplyr::select(c("id", "seq_1", "seq_2")) # sample metadata is the information that describes +# TODO: ## not sure if the comment above was cut off or not, but more description could help here + example_sample_metadata <- data.frame( id = 1:5, day = as.factor(c("Day00", "Day05", "Day22", "Day22", "Day22")), From a4e77675b0c8ceae081e63b036c98fbab95a43c1 Mon Sep 17 00:00:00 2001 From: "Daniel J. Groso" <95438884+danieljgroso@users.noreply.github.com> Date: Wed, 14 Feb 2024 15:36:18 -0800 Subject: [PATCH 2/9] Update getting-started.Rmd example data caller (comment) --- vignettes/getting-started.Rmd | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index 17dfdea..9fe5ef8 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -24,9 +24,11 @@ library(gimap) ```{r} example_data <- example_data() -# Let's examine this example metadata - -# TODO: ## this may or may not be the right place to do it, but perhaps a more descriptive comment of what we're looking at would help. e.g. "Let's examine this example pgPEN counts table, with column IDs representing time points at which gDNA was extracted and the experimental (biological/technical?) 
replicate number" +# Let's examine this example pgPEN counts table. It's divided into columns containing: +# - an ID corresponding to the names of paired guides +# - gRNA sequence 1, targeting "paralog A" +# - gRNA sequence 2, targeting "paralog B" +# - The sample, day, and replicate number for which gRNAs were sequenced example_data ``` From 8aeeec1ad5217a503ddf052a11fe0ac6e4637a18 Mon Sep 17 00:00:00 2001 From: sobrien29 <144387992+sobrien29@users.noreply.github.com> Date: Tue, 20 Feb 2024 13:23:06 -0800 Subject: [PATCH 3/9] SO_comments_getting-started.Rmd --- vignettes/getting-started.Rmd | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index 9fe5ef8..cb42bdb 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -42,7 +42,7 @@ colnames(example_data) We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for. ```{r} -# TODO: ## perhaps here we can clarify the flexibility of the workflow / what can be set up. e.g. does this function allow exploration / comparison of specific days and reps? or is that not really needed for the purposes of the vignette? +## The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. 
example_counts <- example_data %>% dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>% @@ -55,15 +55,13 @@ The next two datasets are metadata that describe the dimensions of the count dat - The first column of the pg_metadata must be a unique id ```{r} -# pg metadata is the information that describes the paired guide RNA targets - -# TODO: ## this is really minor but I think a more accurate description would be closer to: "pg metadata contains a table of paired guide RNAs targeting a paralog pair and their corresponding nucleotide sequences". What's being targeted is a 20bp region of genomic DNA, which is complementary to the gRNA sequence listed +# pg metadata is the information that describes the paired guide RNA targets. This information contains a table of the paired guide RNA sequences and the corresponding paralog gene that is being targeted for cutting by the gRNA-Cas9 complex. example_pg_metadata <- example_data %>% dplyr::select(c("id", "seq_1", "seq_2")) -# sample metadata is the information that describes -# TODO: ## not sure if the comment above was cut off or not, but more description could help here +# sample metadata is the information that describes timepoint information and replicate information relating to each sample. In general, two replicates at each timepoint are carried through to analysis, where they are later collapsed. 
+ example_sample_metadata <- data.frame( id = 1:5, From caa4688bb1154f5a2b72bfcd3ca51e225e4d71cf Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Wed, 21 Feb 2024 12:45:10 -0500 Subject: [PATCH 4/9] Candace and Daniel pair programming --- vignettes/getting-started.Rmd | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index cb42bdb..ff98ff2 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -16,19 +16,26 @@ knitr::opts_chunk$set( # gimap tutorial +TODO: Describe what this genetic interaction analysis actually is and why someone would want to do it. + ```{r} library(magrittr) library(gimap) ``` -```{r} -example_data <- example_data() +## Data requirements + +TODO: What kind of data would someone need to run this? How much flexibility is there in what the experimental set up might look like? -# Let's examine this example pgPEN counts table. It's divided into columns containing: -# - an ID corresponding to the names of paired guides -# - gRNA sequence 1, targeting "paralog A" -# - gRNA sequence 2, targeting "paralog B" -# - The sample, day, and replicate number for which gRNAs were sequenced +Let's examine this example pgPEN counts table. It's divided into columns containing: + +- an ID corresponding to the names of paired guides +- gRNA sequence 1, targeting "paralog A" +- gRNA sequence 2, targeting "paralog B" +- The sample, day, and replicate number for which gRNAs were sequenced + +```{r} +example_data <- gimap::example_data() example_data ``` @@ -41,9 +48,9 @@ colnames(example_data) We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for. -```{r} -## The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. 
QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. +The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. +```{r} example_counts <- example_data %>% dplyr::select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>% as.matrix() @@ -54,15 +61,17 @@ The next two datasets are metadata that describe the dimensions of the count dat - The sizes of these metadata must correspond to the dimensions of the counts data. - The first column of the pg_metadata must be a unique id +`pg_metadata` is the information that describes the paired guide RNA targets. This information contains a table of the paired guide RNA sequences and the corresponding paralog gene that is being targeted for cutting by the gRNA-Cas9 complex. + ```{r} -# pg metadata is the information that describes the paired guide RNA targets. This information contains a table of the paired guide RNA sequences and the corresponding paralog gene that is being targeted for cutting by the gRNA-Cas9 complex. example_pg_metadata <- example_data %>% dplyr::select(c("id", "seq_1", "seq_2")) +``` -# sample metadata is the information that describes timepoint information and replicate information relating to each sample. In general, two replicates at each timepoint are carried through to analysis, where they are later collapsed. - +`sample_metadata` is the information that describes timepoint information and replicate information relating to each sample. In general, two replicates at each timepoint are carried through to analysis, where they are later collapsed. 
+```{r} example_sample_metadata <- data.frame( id = 1:5, day = as.factor(c("Day00", "Day05", "Day22", "Day22", "Day22")), @@ -73,7 +82,7 @@ example_sample_metadata <- data.frame( Now let's setup our data using `setup_data()`. Optionally we can provide the metadata in this function as well so that it is stored with the data. ```{r} -gimap_dataset <- setup_data(counts = example_counts, +gimap_dataset <- gimap::setup_data(counts = example_counts, pg_metadata = example_pg_metadata, sample_metadata = example_sample_metadata) ``` From daf44467cde2872b118ca7d295987fdb1b9d4891 Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Mon, 4 Mar 2024 13:45:02 -0500 Subject: [PATCH 5/9] Update vignettes/getting-started.Rmd --- vignettes/getting-started.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index ff98ff2..d3b2c3b 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -36,7 +36,7 @@ Let's examine this example pgPEN counts table. It's divided into columns contain ```{r} example_data <- gimap::example_data() - +The metadata you have may vary slightly from this but you'll want to make sure you have the essential variables and information regarding how you collected your data. example_data ``` From 0a2b0c114d7feb7e0b6595031420749379a7fc4a Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Mon, 4 Mar 2024 13:45:58 -0500 Subject: [PATCH 6/9] Update vignettes/getting-started.Rmd Co-authored-by: Daniel J. 
Groso <95438884+danieljgroso@users.noreply.github.com> --- vignettes/getting-started.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index d3b2c3b..0c4e186 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -17,7 +17,7 @@ knitr::opts_chunk$set( # gimap tutorial TODO: Describe what this genetic interaction analysis actually is and why someone would want to do it. - +gimap performs analysis of dual-targeting CRISPR screening data, with the goal of aiding the identification of genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. gimap analyzes functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying growth effects of single and paired gene knockouts upon application of a CRISPR library. ```{r} library(magrittr) library(gimap) From 12ef57282439811d636ff54456c61b615579f2cf Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Wed, 6 Mar 2024 12:41:38 -0500 Subject: [PATCH 7/9] Candace and Daniel pair code commit --- vignettes/getting-started.Rmd | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index 0c4e186..ea95a46 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -16,8 +16,8 @@ knitr::opts_chunk$set( # gimap tutorial -TODO: Describe what this genetic interaction analysis actually is and why someone would want to do it. gimap performs analysis of dual-targeting CRISPR screening data, with the goal of aiding the identification of genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. 
gimap analyzes functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying growth effects of single and paired gene knockouts upon application of a CRISPR library. + ```{r} library(magrittr) library(gimap) @@ -25,8 +25,6 @@ library(gimap) ## Data requirements -TODO: What kind of data would someone need to run this? How much flexibility is there in what the experimental set up might look like? - Let's examine this example pgPEN counts table. It's divided into columns containing: - an ID corresponding to the names of paired guides @@ -36,9 +34,9 @@ Let's examine this example pgPEN counts table. It's divided into columns contain ```{r} example_data <- gimap::example_data() +``` The metadata you have may vary slightly from this but you'll want to make sure you have the essential variables and information regarding how you collected your data. example_data -``` ```{r} colnames(example_data) From 883734b9aa84c851e547a1fa4ae9944931ffd917 Mon Sep 17 00:00:00 2001 From: sobrien29 <144387992+sobrien29@users.noreply.github.com> Date: Thu, 7 Mar 2024 09:39:44 -0800 Subject: [PATCH 8/9] Update getting-started.Rmd --- vignettes/getting-started.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index ea95a46..f3d2c25 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -16,7 +16,7 @@ knitr::opts_chunk$set( # gimap tutorial -gimap performs analysis of dual-targeting CRISPR screening data, with the goal of aiding the identification of genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. gimap analyzes functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying growth effects of single and paired gene knockouts upon application of a CRISPR library. 
+gimap performs analysis of dual-targeting CRISPR screening data, with the goal of aiding the identification of genetic interactions (e.g. cooperativity, synthetic lethality) in models of disease and other biological contexts. gimap analyzes functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying growth effects of single and paired gene knockouts upon application of a CRISPR library. A multitude of CRISPR screen types can be used for this analysis, with helpful descriptions found in this review (https://www.nature.com/articles/s43586-021-00093-4). Use of pgPEN and GI-mapping in a paired gRNA format can be found here (https://pubmed.ncbi.nlm.nih.gov/34469736/). ```{r} library(magrittr) @@ -46,7 +46,7 @@ colnames(example_data) We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for. -The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. +The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. The T0 sample, or plasmid sample, will represent the entire library before any type of selection has occurred during the length of the screen. This is the baseline for guide RNA representation. QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. 
```{r} example_counts <- example_data %>% From 5781565cf34c7507455bb45fd8d42650150363ee Mon Sep 17 00:00:00 2001 From: sobrien29 <144387992+sobrien29@users.noreply.github.com> Date: Thu, 7 Mar 2024 09:46:59 -0800 Subject: [PATCH 9/9] Update getting-started.Rmd --- vignettes/getting-started.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd index f3d2c25..bb074c7 100644 --- a/vignettes/getting-started.Rmd +++ b/vignettes/getting-started.Rmd @@ -46,7 +46,7 @@ colnames(example_data) We're going to set up three datasets. The first is required, it's the counts that the genetic interaction analysis will be used for. -The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. The T0 sample, or plasmid sample, will represent the entire library before any type of selection has occurred during the length of the screen. This is the baseline for guide RNA representation. QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. +The first data set contains the readcounts from each sample type. Required for analysis is a Day 0 (or plasmid) sample, and at least one further timepoint sample. The T0 sample, or plasmid sample, will represent the entire library before any type of selection has occurred during the length of the screen. This is the baseline for guide RNA representation. The length of time cells should remain in culture throughout the screen is heavily dependent on the type of selection occurring, helpful advice can be found in (https://www.nature.com/articles/s43586-021-00093-4). QC analysis will follow to correlate replicates if inputted. Comparison of early and late timepoints is possible in this function, but not required if early timepoints were not taken. 
```{r} example_counts <- example_data %>%
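
Review note on the vignette text touched by this series: it states that the sizes of the two metadata tables must correspond to the dimensions of the counts data, and that the first column of `pg_metadata` must be a unique id. The sketch below shows one way a reader could check those requirements before calling `setup_data()`. It rests on assumptions: the rule is read as one `pg_metadata` row per row of counts and one `sample_metadata` row per column of counts, the toy values are made up, and `check_gimap_inputs()` is a hypothetical helper written in base R, not a gimap function.

```r
# Toy stand-in for the example data; all values are invented for illustration.
toy_data <- data.frame(
  id = c("pg_1", "pg_2", "pg_3"),
  seq_1 = c("AAAA", "CCCC", "GGGG"),
  seq_2 = c("TTTT", "AACC", "GGTT"),
  Day00_RepA = c(100, 120, 90),
  Day22_RepA = c(80, 10, 95),
  Day22_RepB = c(75, 12, 99)
)

count_cols <- c("Day00_RepA", "Day22_RepA", "Day22_RepB")

# Counts as a numeric matrix: one row per paired guide, one column per sample.
toy_counts <- as.matrix(toy_data[, count_cols])

# Paired guide metadata: the first column is the unique id.
toy_pg_metadata <- toy_data[, c("id", "seq_1", "seq_2")]

# Sample metadata: one row per counts column (the `rep` column is an assumption).
toy_sample_metadata <- data.frame(
  id = seq_along(count_cols),
  day = factor(c("Day00", "Day22", "Day22")),
  rep = factor(c("RepA", "RepA", "RepB"))
)

# Hypothetical helper, NOT part of gimap: stops with an error if the
# assumed dimension and uniqueness requirements are not met.
check_gimap_inputs <- function(counts, pg_metadata, sample_metadata) {
  stopifnot(
    is.matrix(counts),
    nrow(pg_metadata) == nrow(counts),      # one metadata row per guide pair
    nrow(sample_metadata) == ncol(counts),  # one metadata row per sample
    !anyDuplicated(pg_metadata[[1]])        # first column must be a unique id
  )
  invisible(TRUE)
}

check_gimap_inputs(toy_counts, toy_pg_metadata, toy_sample_metadata)
```

If a check like this passes, the three objects are at least shaped consistently. Whether `setup_data()` performs its own validation is not shown in this patch series, so treat the helper as a pre-flight check only.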