update QC report with two graphs #33

kweav · 2024-06-12T20:54:17Z

This PR adds two graphs to the QC report as requested in the May 20th meeting.

A histogram of all data (no filters applied) showing the distribution of the within-replicate variance for each pgRNA.
A bar plot focusing on pgRNAs flagged by the zero count filter (i.e., working with filtered subsets of the data). The x-axis shows the number of replicates (0, 1, 2, or 3) that have a zero count, and the y-axis shows the number of pgRNAs in each of those groups. (In my scratch code this was labeled "How many day 22 pgRNAs have counts of 0 across x number of replicates".)
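A minimal base-R sketch of the computations behind the two plots, for orientation only. The toy count matrix is hypothetical, the log2 CPM transform with a pseudocount of 1 is an assumption (gimap's own transform may differ), and columns 3-5 are treated as the final-day replicates, as this PR does. The package functions themselves use ggplot2; base graphics are used here only to keep the sketch dependency-free.

```r
# Hypothetical toy data: rows are pgRNA constructs, columns are samples.
# Column 1 stands in for plasmid/day 0 and columns 3-5 for the final-day
# replicates, mirroring the column assumption used in this PR.
raw_counts <- matrix(
  c(120,  80,  0,  5,  7,
     90,  60,  0,  0,  3,
    200, 150, 40, 35, 50,
     10,   4,  0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pg", 1:4), paste0("s", 1:5))
)
replicate_cols <- 3:5

# log2 CPM with a pseudocount of 1 (assumed; gimap's own transform may differ)
cpm <- sweep(raw_counts, 2, colSums(raw_counts), "/") * 1e6
log2_cpm <- log2(cpm + 1)

# Histogram: variance across the replicate columns for every pgRNA
replicate_variance <- apply(log2_cpm[, replicate_cols], 1, var)
hist(replicate_variance, main = "Variance within replicates",
     xlab = "Variance of log2 CPM")

# Zero count filter: flag pgRNAs with a 0 in any sample column
flagged <- rowSums(raw_counts == 0) > 0

# Bar plot: among flagged pgRNAs, how many replicates (0-3) are zero?
n_zero <- rowSums(raw_counts[, replicate_cols] == 0)
zero_table <- table(factor(n_zero[flagged], levels = 0:length(replicate_cols)))
barplot(zero_table, xlab = "Replicates with a zero count",
        ylab = "Number of pgRNAs")

# pgRNAs that are zero in every replicate (the planned ID report)
all_zero_ids <- names(n_zero)[n_zero == length(replicate_cols)]
```

The "0" bar exists because the filter checks every sample column while the x-axis only counts replicate columns: a pgRNA flagged solely for a zero in, say, the plasmid column lands in that group.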

Changes made in this PR to accomplish this goal:

Added two functions to the R/plots-qc.R file: one for the histogram (qc_variance_hist()) and one for the bar plot (qc_constructs_countzero_bar()). Both use the gimap_dataset and assume that replicates are stored in columns 3-5.
Called these functions within the inst/rmd/gimapQCTemplate.Rmd file (and added some appropriate headers)
Added an R/filters.R file where possible filter functions will be stored. Currently the only filter in there is the zero count filter (qc_filter_zerocounts())
Edited the descriptions in the vignette to include descriptions of these new plots

Open issues:
You may notice that I removed the three dots (...) from the heatmap plot function, both in its definition and where it is called. When I tested my code locally by rendering the vignette (which drives rendering of the QC report), the heatmap was rendered in the vignette rather than in the output QC report. I tried to troubleshoot this but wasn't able to resolve it, so it's still an open issue.

Next steps for upcoming stacked PRs:

Include tables from the filters in the QC report so that users can make informed choices about which filters to apply
Add an option for the user to specify a cutoff different from the 1.5 * IQR preset for the plasmid filter
Move the plasmid filter steps themselves out of the histogram plotting function into the new R/filters.R file.
Add a report that lists pgRNA IDs when those pgRNAs have zero counts for all 3 replicates.
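For the cutoff option, one possible shape is sketched below. The helper name flag_low_plasmid() and its iqr_multiplier argument are hypothetical, and the assumption that the plasmid filter flags only the low tail (below Q1 - 1.5 * IQR) should be checked against the actual filter code.

```r
# Hypothetical helper (not part of gimap): flag constructs whose plasmid
# log2 CPM falls below the lower Tukey fence, Q1 - k * IQR. The default
# k = 1.5 matches the current preset; exposing k is the proposed option.
flag_low_plasmid <- function(plasmid_log2_cpm, iqr_multiplier = 1.5) {
  q <- stats::quantile(plasmid_log2_cpm, probs = c(0.25, 0.75), names = FALSE)
  cutoff <- q[1] - iqr_multiplier * (q[2] - q[1])
  plasmid_log2_cpm < cutoff
}

plasmid <- c(10, 11, 12, 13, 14, 2)
flag_low_plasmid(plasmid)                      # default 1.5 * IQR
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
flag_low_plasmid(plasmid, iqr_multiplier = 4)  # more permissive cutoff
# [1] FALSE FALSE FALSE FALSE FALSE FALSE
```

Keeping 1.5 as the default means existing calls behave the same, and users only pass iqr_multiplier when they want a different cutoff.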

Requested Review:

Is the code for the new plots robust enough?
How to fix the heatmap rendering in the wrong location
Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?
Any feedback or requested changes are more than welcome!

cansavvy

First off, thanks for your great PR description. Really appreciate the rundown.

I think overall this looks great. ❤️

For me to give feedback on the requested review questions (also great, love this ❤️), can you let me know what example code I should run to test this? i.e. what code have you been running while developing this? Starting with what you've been using will help guide me through what to review.

> Requested Review:
>
> Is the code for the new plots robust enough?

Looks good on a first pass; my main question is about columns 3-5.

> How to fix the heatmap rendering in the wrong location

I'll dig into this a bit in the next round.

> Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?

Can we add these to the 02-filter.R file?

cansavvy · 2024-06-13T13:11:01Z

R/filters.R

+
+qc_filter_zerocounts <- function(gimap_dataset){
+
+  counts_filter <- unlist(lapply(1:nrow(gimap_dataset$raw_counts), function(x) 0 %in% gimap_dataset$raw_counts[x,]))


Can you explain the goal of this line?

The goal of this line is to create a vector that reports, for each pgRNA construct, whether any of the samples has a count of 0. I used lapply to loop through each row (pgRNA construct) and check it for a zero, and then unlist to make the result a simple vector rather than a nested list.

I figured out a tidyverse way to do it! Do you prefer the tidyverse way below or a different way altogether?

counts_filter <- data.frame(gimap_dataset$raw_counts) %>% map(~.x %in% c(0)) %>% reduce(`|`)

I like this tidyverse way! I think it's a bit more readable. Can you transfer this in?
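To check that the row-wise lapply version and the column-wise map/reduce version agree, here is a small comparison on a hypothetical toy matrix. Base R's Reduce() stands in for purrr::reduce() so the core comparison runs without extra packages; the purrr version from the thread is only run if purrr is installed. The rowSums() line is an additional base-R alternative, not something proposed in the PR.

```r
# Toy counts (hypothetical): 3 pgRNAs x 4 samples
raw_counts <- matrix(c(5, 0, 3, 9,
                       4, 2, 8, 1,
                       0, 0, 7, 6),
                     nrow = 3, byrow = TRUE)

# 1. Original approach: loop over rows, asking "is there a 0 in this row?"
row_loop <- unlist(lapply(seq_len(nrow(raw_counts)),
                          function(i) 0 %in% raw_counts[i, ]))

# 2. Column-wise approach: mark zeros in each sample column, then OR the
#    columns together element-wise (same idea as map() + reduce(`|`))
col_reduce <- Reduce(`|`, lapply(as.data.frame(raw_counts),
                                 function(col) col %in% 0))

# 3. Vectorized base R alternative
vectorized <- rowSums(raw_counts == 0) > 0

# The purrr form from the thread, written without the pipe
if (requireNamespace("purrr", quietly = TRUE)) {
  tidy <- purrr::reduce(purrr::map(as.data.frame(raw_counts), ~ .x %in% 0), `|`)
  stopifnot(identical(tidy, row_loop))
}

stopifnot(identical(row_loop, col_reduce), identical(row_loop, vectorized))
row_loop
# [1]  TRUE FALSE  TRUE
```

One difference to keep in mind: `%in%` treats NA as "not zero", whereas `==` propagates NA, so the rowSums() form would need `na.rm = TRUE` if counts can be missing.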

cansavvy · 2024-06-13T13:16:42Z

R/plots-qc.R

+qc_variance_hist <- function(gimap_dataset, wide_ar = 0.75){
+
+  return(
+    gimap_dataset$transformed_data$log2_cpm[,3:5] %>%


What's the reasoning for picking columns 3 - 5 here? Is this going to be standard across datasets?

I don't know whether it will be standard across datasets :( It's come up in our conversations with the Berger lab, and I'm still unsure. When I'm working with columns 3-5, I'm assuming I'm working with the last-day replicates; when I'm working with the first column, I'm assuming I'm working with plasmid/day 0. I would love to have a more robust way to do that! I think requiring sample metadata is probably the best way, but even then, the sample metadata would have to be formatted in ways we expect.

We could add an argument that specifies which samples to apply this filter to (i.e., supply a vector of the sample columns we want to target; in this case 3-5).

How does this idea sound to you?

Alternatively, at this point in the workflow users should already have sample metadata, so we could have them supply a metadata column that indicates which columns to use for a filter. But I think starting with the simpler method I described above is fine if you are cool with that.
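The metadata-based alternative could look roughly like the sketch below. The data frame layout, sample names, and the day column are all hypothetical (not gimap's schema); the point is only that column indices are derived from metadata rather than hard-coded as 1 and 3:5.

```r
# Hypothetical sample metadata: one row per count column, in column order.
# The sample names and timepoints are illustrative, not gimap's schema.
sample_metadata <- data.frame(
  sample = c("plasmid", "day05_repA", "day22_repA", "day22_repB", "day22_repC"),
  day    = c(0, 5, 22, 22, 22)
)

# Derive column indices from metadata instead of hard-coding 1 and 3:5
plasmid_col   <- which(sample_metadata$day == 0)
last_day_cols <- which(sample_metadata$day == max(sample_metadata$day))

plasmid_col
# [1] 1
last_day_cols
# [1] 3 4 5
```

These indices could then be used directly, e.g. counts[, last_day_cols], as long as the metadata rows are guaranteed to be in the same order as the count columns.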

Love that idea! So I should just add arguments with defaults specifying the columns, and users can change them?

Yeah, so something like filter_target_col = NULL as the default, and then

if (is.null(filter_target_col)) filter_target_col <- 1:ncol(gimap_dataset$transformed_data$log2_cpm)

Then, everywhere else in the code:

gimap_dataset$transformed_data$log2_cpm[,filter_target_col]

Open to a better argument name, but this is the gist of what I'm thinking.

Works for me, and I'm working on adding several so each type of filter has its own. It'll be in the next stacked PR since it's a bit of a heavier change across several files. Thank you so much!
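The suggested argument pattern, wrapped in a hypothetical function so it can be run on its own. construct_variance() is an illustrative name, not a gimap function; it takes a bare log2 CPM matrix instead of a gimap_dataset to stay self-contained.

```r
# Hypothetical function illustrating the suggested argument pattern:
# filter_target_col = NULL means "use every column"; otherwise the caller
# passes the column indices to use (e.g. 3:5 for last-day replicates).
construct_variance <- function(log2_cpm, filter_target_col = NULL) {
  if (is.null(filter_target_col)) {
    filter_target_col <- seq_len(ncol(log2_cpm))
  }
  # drop = FALSE keeps a matrix even when only one column is selected
  apply(log2_cpm[, filter_target_col, drop = FALSE], 1, var)
}

log2_cpm <- matrix(c(1, 2, 3, 5, 5,
                     4, 4, 4, 4, 4),
                   nrow = 2, byrow = TRUE)

construct_variance(log2_cpm)                           # all five columns
# [1] 3.2 0.0
construct_variance(log2_cpm, filter_target_col = 3:5)  # last-day replicates
```

seq_len(ncol(...)) is used instead of 1:ncol(...) so that a zero-column input yields an empty selection rather than c(1, 0).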

cansavvy · 2024-06-13T13:17:14Z

R/plots-qc.R

+#' @import ggplot2 
+#' @return a ggplot barplot
+#' @examples \dontrun{
+#' 


Can you add examples here? Helps me review this if I know how you envision this function will be called.

I found your function calls in the QC report and brought them here. Correct these if this is not how the function should be called.

cansavvy · 2024-06-13T13:17:31Z

R/plots-qc.R

+  qc_filter_output <- qc_filter_zerocounts(gimap_dataset)
+
+  return(
+    example_counts[qc_filter_output$filter, c(3:5)] %>%


I have the same question here about 3:5.

cansavvy · 2024-06-13T13:18:00Z

inst/rmd/gimapQCTemplate.Rmd

+## Variance within replicates
+
+```{r}
+qc_variance_hist(gimap_dataset)


Ah. I see. 👍


kweav · 2024-06-13T13:28:26Z

> First off, thanks for your great PR description. Really appreciate the rundown.
>
> I think overall this looks great. ❤️

Thanks!

> For me to give the feedback on the requested review questions (also great love this ❤️), can you let me know what example code I should run to test this? i.e. what code have you been running when you developed this? If I start with what you've been using to develop it will help guide me through what to review.

The example code I ran was rendering the getting_started.Rmd vignette.

> Requested Review:
>
> Is the code for the new plots robust enough?
>
> Looks good on a first pass, main question is 3 - 5 column

That's a question I have as well; sorry if I forgot to note it.

> How to fix the heatmap rendering in the wrong location
> I'll dig into this a bit on next round.

Thanks!

> Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?
>
> Can we add these to the 02-filter.R file?

Definitely can put them there!


cansavvy · 2024-06-21T13:49:03Z

@kweav I think all we need to do to merge this is:

1. Switch out that line of code for the tidyverse version you have.
2. Add the argument that allows us to specify which columns to filter by. The default should be to use all columns (see the discussion above to jog your memory if needed; I need to remind myself 😄).

kweav · 2024-06-21T14:21:24Z

R/filters.R

1. I can push the tidyverse way to this branch.
2. This will be in a later PR. I added an example in PR #35, "Start with adding parameters that allow the user to select which column(s) are used in various filtering or filtering-related visualization steps".


kweav · 2024-06-21T14:33:45Z

R/filters.R

> I can push the tidyverse way to this branch
>
> Will be in a later PR. I added an example one in PR Start with adding parameters that allow the user to select which column(s) are used in various filtering or filtering-related visualization steps #35

Used commit edb662a to push change 1.

kweav · 2024-06-24T19:57:12Z

@cansavvy pkgdown is happy now!

cansavvy · 2024-06-24T19:58:05Z

> @cansavvy pkgdown is happy now!

YAY!!!! 🎉

cansavvy

Tests are passing, and I think this gets us closer to our full pipeline, so let's merge!!

update QC report with two graphs

967361b

kweav requested a review from cansavvy June 12, 2024 20:54

add description of plots to vignette

24ce58f

kweav mentioned this pull request Jun 12, 2024

split up filters from plots and add tables to report #34

Merged

cansavvy reviewed Jun 13, 2024


kweav and others added 5 commits June 13, 2024 09:34

Update R/plots-qc.R

5799955

Co-authored-by: Candace Savonen <cansav09@gmail.com>

Update R/plots-qc.R

b6a8ded

Co-authored-by: Candace Savonen <cansav09@gmail.com>

move filter to other file

e86efc3

remove filters file to resolve conflict

70a1d1a

Merge branch 'main' into filter_qc_ki

07b86f5

kweav mentioned this pull request Jun 13, 2024

Start with adding parameters that allow the user to select which column(s) are used in various filtering or filtering-related visualization steps #35

Merged

Merge branch 'main' into filter_qc_ki

ebd9b40

kweav added 2 commits June 21, 2024 09:23

switch out tidyverse way

edb662a

Merge branch 'filter_qc_ki' of https://github.com/FredHutch/gimap into filter_qc_ki

c835a8f

kweav and others added 4 commits June 24, 2024 13:03

tell it to import reduce

3c86ea8

Add purrr map @importFrom too

e2d0e92

Add Kate and add purrr to description file

ce69182

move some parentheses and see if pkgdown is happier

97241a7

cansavvy approved these changes Jun 25, 2024


cansavvy merged commit fa3cf08 into main Jun 25, 2024
6 of 7 checks passed

cansavvy deleted the filter_qc_ki branch June 25, 2024 14:30

kweav mentioned this pull request Jun 25, 2024

Add ability to select columns for the last sample replicates #39

Merged
