update QC report with two graphs #33

Merged
merged 14 commits into from
Jun 25, 2024

@kweav kweav commented Jun 12, 2024

This PR adds two graphs to the QC report as requested in the May 20th meeting.

  1. Histogram on all data (no filters applied) that shows the distribution of the variance within replicates of each pgRNA.
  2. Bar plot focusing on pgRNAs flagged by the zero count filter (working with subsets of data/filtered data). The x-axis reports the number of replicates (0, 1, 2, or 3) that have a zero count, and the y-axis reports the number of pgRNAs in each of those groups. (In my scratch code this was labeled "How many day 22 pgRNAs have counts of 0 across x number of replicates".)
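
As a rough illustration of what the two plots summarize, here is a minimal base-R sketch on a made-up matrix. The toy values and the names `toy_counts`, `rep_variance`, and `zero_rep_table` are hypothetical, and `log2(count + 1)` is only a stand-in for the package's log2 CPM values; this is not the code added in this PR.

```r
# Toy data: 4 pgRNAs (rows) x 3 replicates (columns); values are made up.
toy_counts <- matrix(
  c(0, 0, 0,
    5, 0, 7,
    3, 4, 0,
    9, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("pgRNA", 1:4), paste0("rep", 1:3))
)

# Plot 1 idea: variance across replicates for each pgRNA (one value per row).
# log2(count + 1) is a stand-in for log2 CPM in this sketch.
rep_variance <- apply(log2(toy_counts + 1), 1, var)

# Plot 2 idea: among pgRNAs with at least one zero count, count how many
# replicates are zero, then tabulate the number of pgRNAs in each group.
n_zero_reps <- rowSums(toy_counts == 0)
flagged <- n_zero_reps[n_zero_reps > 0]
zero_rep_table <- table(factor(flagged, levels = 0:3))

hist(rep_variance, main = "Variance within replicates")
barplot(zero_rep_table,
        xlab = "Replicates with a zero count",
        ylab = "Number of pgRNAs")
```

Using `factor(..., levels = 0:3)` keeps empty groups on the x-axis, which may or may not match how the real bar plot handles them.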

Changes made in this PR to accomplish this goal:

  1. Added two functions to the R/plots-qc.R file: one for the histogram (qc_variance_hist()) and one for the bar plot (qc_constructs_countzero_bar()). Both use the gimap_dataset and assume that replicates are stored in columns 3-5.
  2. Called these functions within the inst/rmd/gimapQCTemplate.Rmd file (and added some appropriate headers).
  3. Added an R/filters.R file where possible filter functions will be stored. Currently the only filter in there is the zero count filter (qc_filter_zerocounts()).
  4. Edited the vignette to include descriptions of these new plots.

Open issues:
You may notice that I removed the three dots for the heatmap plot when calling the function and within the function. When I tested my code locally by rendering the vignette (which in turn drives rendering of the QC report), the heatmap was rendered within the vignette rather than in the output QC report. I tried to troubleshoot that, but it's still an open issue that I wasn't able to resolve.

Next steps for upcoming stacked PRs:

  • Include tables from the filters in the QC report so that users can make informed choices about which filters to apply
  • Add an option for the user to specify a cutoff different from the 1.5 * IQR preset for the plasmid filter
  • Move the plasmid filter steps themselves out of the histogram plotting function into the new R/filters.R file.
  • Add a report that lists pgRNA IDs when those pgRNAs have zero counts for all 3 replicates.
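
To make the second bullet concrete, here is a minimal sketch of a cutoff with a user-adjustable IQR multiplier. The function name `flag_low_outliers`, the argument `iqr_multiplier`, and the choice to flag values below `Q1 - multiplier * IQR` are all assumptions for illustration; the PR only states that the plasmid filter currently uses a 1.5 * IQR preset.

```r
# Hypothetical helper: flag values that fall below Q1 - iqr_multiplier * IQR.
# The direction of the cutoff (low outliers) is an assumption of this sketch.
flag_low_outliers <- function(x, iqr_multiplier = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  lower_cutoff <- q[1] - iqr_multiplier * (q[2] - q[1])
  x < lower_cutoff
}

# Made-up plasmid log2 CPM values with one obviously low construct.
toy_plasmid_log2_cpm <- c(1.0, 6.0, 6.2, 6.4, 6.5, 6.6, 6.8, 7.0)

flag_low_outliers(toy_plasmid_log2_cpm)                       # default 1.5 * IQR
flag_low_outliers(toy_plasmid_log2_cpm, iqr_multiplier = 0.2) # stricter cutoff
```

Keeping 1.5 as the default would preserve the current behavior while letting users tighten or loosen the filter.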

Requested Review:

  • Is the code for the new plots robust enough?
  • How to fix the heatmap rendering in the wrong location
  • Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?
  • Any feedback or requested changes are more than welcome!

@kweav kweav requested a review from cansavvy June 12, 2024 20:54
@cansavvy cansavvy left a comment

First off, thanks for your great PR description. Really appreciate the rundown.

I think overall this looks great. ❤️

For me to give feedback on the requested review questions (also great, love this ❤️), can you let me know what example code I should run to test this? I.e., what code have you been running while you developed this? Starting with what you've been using will help guide me through what to review.

Requested Review:

  • Is the code for the new plots robust enough?

Looks good on a first pass; the main question is about columns 3-5.

  • How to fix the heatmap rendering in the wrong location
    I'll dig into this a bit on next round.
  • Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?

Can we add these to the 02-filter.R file?

R/filters.R Outdated

qc_filter_zerocounts <- function(gimap_dataset){

counts_filter <- unlist(lapply(1:nrow(gimap_dataset$raw_counts), function(x) 0 %in% gimap_dataset$raw_counts[x,]))
Collaborator

Can you explain the goal of this line?

Collaborator Author

The goal of this line is to create a vector that reports, for each pgRNA construct, whether any one of the samples has a count of 0. I used lapply to loop through each row/pgRNA construct and check each row for a zero, and then unlist to make sure the result was a simple vector, not a nested list.

Collaborator Author

I figured out a tidyverse way to do it! Do you prefer the tidyverse way below or a different way altogether?

counts_filter <- data.frame(gimap_dataset$raw_counts) %>% map(~.x %in% c(0)) %>% reduce(`|`)

Collaborator

I like this tidyverse way! I think it's a bit more readable! Can you transfer this in?
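
For comparison, here is a small sketch on made-up counts showing that the row-wise version and a column-wise "map then OR-reduce" version should agree. To keep it dependency-free, base R's `lapply()` and `Reduce()` stand in for purrr's `map()` and `reduce()`; `toy_counts` and the result names are hypothetical.

```r
# Made-up counts: 4 pgRNAs (rows) x 3 samples (columns).
toy_counts <- data.frame(
  plasmid = c(10, 0, 4, 7),
  rep1    = c(3, 2, 0, 8),
  rep2    = c(1, 5, 6, 9)
)
toy_mat <- as.matrix(toy_counts)

# Row-wise: for each pgRNA, is there a 0 anywhere in its row?
by_row <- unlist(lapply(seq_len(nrow(toy_mat)), function(i) 0 %in% toy_mat[i, ]))

# Column-wise: one logical vector per sample, combined with element-wise OR.
# This mirrors map(~ .x %in% c(0)) followed by reduce(`|`).
by_col <- Reduce(`|`, lapply(toy_counts, function(col) col %in% 0))

# Another base-R option: count zeros per row.
by_rowsums <- unname(rowSums(toy_mat == 0) > 0)

stopifnot(identical(by_row, by_col), identical(by_col, by_rowsums))
```

One difference worth knowing: `%in%` returns `FALSE` for `NA` counts, while `== 0` propagates `NA`, so the `rowSums()` variant would need `na.rm = TRUE` if missing values are possible.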

qc_variance_hist <- function(gimap_dataset, wide_ar = 0.75){

return(
gimap_dataset$transformed_data$log2_cpm[,3:5] %>%
Collaborator

What's the reasoning for picking columns 3 - 5 here? Is this going to be standard across datasets?

Collaborator Author

I don't know if it's going to be standard across datasets :( It's one of the conversations we've had with the Berger lab, and I'm still unsure. When I'm working with columns 3-5, I'm assuming I'm working with last day replicates. When I'm working with the first column, I'm assuming I'm working with plasmid/day 0. Would love to have a more robust way to do that! I think requiring sample metadata is probably the best way, but even then, the sample metadata would have to be formatted in ways we expect.

@cansavvy cansavvy Jun 13, 2024

We can add an argument that specifies which samples to apply this filter to (i.e., supply a vector that corresponds to the samples we want to target; in this case 3-5).

How does this idea sound to you?

Collaborator

Alternatively at this point in the workflow, they should have metadata involved so we could also have them supply a column that says which columns should be used to determine a filter. But I think doing an easier method like what I've described above is fine to start if you are cool with that.

Collaborator Author

Love that idea! So I should just add arguments with defaults specifying the column and they can change the columns?

Collaborator

Yeah so like filter_target_col = NULL and then

if (is.null(filter_target_col)) filter_target_col <- 1:ncol(gimap_dataset$transformed_data$log2_cpm)

But otherwise in the code:

gimap_dataset$transformed_data$log2_cpm[,filter_target_col]

Open to better argument name but this is the gist of what I'm thinking.
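
Put together, the NULL-default pattern described above might look like the following sketch. The helper name `select_filter_cols` and the toy column names are hypothetical, and the range check is an extra assumption rather than something discussed in this thread.

```r
# Hypothetical helper illustrating the NULL-default column argument.
select_filter_cols <- function(log2_cpm, filter_target_col = NULL) {
  # Default: use every column when the caller does not choose any.
  if (is.null(filter_target_col)) {
    filter_target_col <- seq_len(ncol(log2_cpm))
  }
  # Fail early if an index is outside the available columns (assumption).
  stopifnot(all(filter_target_col %in% seq_len(ncol(log2_cpm))))
  # drop = FALSE keeps a matrix even when a single column is selected.
  log2_cpm[, filter_target_col, drop = FALSE]
}

# Made-up 2 x 5 matrix; the column names are placeholders.
toy <- matrix(1:10, nrow = 2,
              dimnames = list(NULL, c("plasmid", "early", "rep1", "rep2", "rep3")))

ncol(select_filter_cols(toy))           # 5: all columns by default
colnames(select_filter_cols(toy, 3:5))  # "rep1" "rep2" "rep3"
```

`seq_len(ncol(x))` is used instead of `1:ncol(x)` so that a zero-column input yields an empty index rather than `c(1, 0)`.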

Collaborator Author

Works for me, and I'm working on adding several so each type of filter has its own. It'll be in the next stacked PR since it's a bit of a heavier change across several files. Thank you so much!

R/plots-qc.R Outdated
#' @import ggplot2
#' @return a ggplot barplot
#' @examples \dontrun{
#'
Collaborator

Can you add examples here? Helps me review this if I know how you envision this function will be called.

Collaborator

I found your function calls from the QC report and just brought them here. Correct these if this is not how this function could be called.

qc_filter_output <- qc_filter_zerocounts(gimap_dataset)

return(
example_counts[qc_filter_output$filter, c(3:5)] %>%
Collaborator

I have the same question here about 3:5.

## Variance within replicates

```{r}
qc_variance_hist(gimap_dataset)
```
Collaborator

Ah. I see. 👍

kweav commented Jun 13, 2024

> First off, thanks for your great PR description. Really appreciate the rundown.
>
> I think overall this looks great. ❤️

Thanks!

> For me to give feedback on the requested review questions (also great, love this ❤️), can you let me know what example code I should run to test this? I.e., what code have you been running while you developed this? Starting with what you've been using will help guide me through what to review.

The example code that I ran was rendering the vignette getting_started.Rmd.

> Requested Review:
>
>   • Is the code for the new plots robust enough?
>
> Looks good on a first pass; the main question is about columns 3-5.

That's a question I have as well, sorry if I forgot to note it.

>   • How to fix the heatmap rendering in the wrong location
>     I'll dig into this a bit on next round.

Thanks!

>   • Is the new R/filters.R file an ok addition, or would you rather I put those functions elsewhere?
>
> Can we add these to the 02-filter.R file?

Definitely can put them there!

kweav and others added 5 commits June 13, 2024 09:34
@cansavvy
Collaborator

@kweav I think all we need to do to merge this is:

  1. Switch out that line of code for the tidyverse version you have
  2. Make the argument that allows us to specify which columns to filter by. Default should be to use all columns (see above discussion we had to jog your memory if needed - I need to remind myself 😄 )

kweav commented Jun 21, 2024

R/filters.R

  1. I can push the tidyverse way to this branch
  2. Will be in a later PR. I added an example one in PR #35 ("Start with adding parameters that allow the user to select which column(s) are used in various filtering or filtering-related visualization steps").

kweav commented Jun 21, 2024

> R/filters.R
>
>   1. I can push the tidyverse way to this branch
>   2. Will be in a later PR. I added an example one in PR #35 ("Start with adding parameters that allow the user to select which column(s) are used in various filtering or filtering-related visualization steps").

Used commit edb662a to push change 1.

kweav commented Jun 24, 2024

@cansavvy pkgdown is happy now!

@cansavvy
Collaborator

> @cansavvy pkgdown is happy now!

YAY!!!! 🎉

@cansavvy cansavvy left a comment

Tests are passing and I think this gets us closer to our full pipeline so let's merge!!

@cansavvy cansavvy merged commit fa3cf08 into main Jun 25, 2024
6 of 7 checks passed
@cansavvy cansavvy deleted the filter_qc_ki branch June 25, 2024 14:30