-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update QC report with two graphs #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, thanks for your great PR description. Really appreciate the rundown.
I think overall this looks great. ❤️
For me to give the feedback on the requested review questions (also great love this ❤️), can you let me know what example code I should run to test this? i.e. what code have you been running when you developed this? If I start with what you've been using to develop it will help guide me through what to review.
Requested Review:
- Is the code for the new plots robust enough?
Looks good on a first pass, main question is 3 - 5 column
- How to fix the heatmap rendering in the wrong location
I'll dig into this a bit on next round.
- Is the new
R/filters.R
file an ok addition, or would you rather I put those functions elsewhere?
Can we add these to the 02-filter.R
file?
R/filters.R
Outdated
|
||
qc_filter_zerocounts <- function(gimap_dataset){ | ||
|
||
counts_filter <- unlist(lapply(1:nrow(gimap_dataset$raw_counts), function(x) 0 %in% gimap_dataset$raw_counts[x,])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the goal of this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this line is to create a vector that reports for each pgRNA construct whether any one of the samples has a count of 0. I used lapply
to loop through each row/pgRNA construct and for each row check for a zero and then unlist
to make sure it was a simple vector, not nested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out a tidyverse way to do it! Do you prefer the tidyverse way below or a different way all together?
counts_filter <- data.frame(gimap_dataset$raw_counts) %>% map(~.x %in% c(0)) %>% reduce(`|`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this tidyverse way! I think its a bit more readable! Can you transfer this in?
qc_variance_hist <- function(gimap_dataset, wide_ar = 0.75){ | ||
|
||
return( | ||
gimap_dataset$transformed_data$log2_cpm[,3:5] %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning for picking columns 3 - 5 here? Is this going to be standard across datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's going to be standard across datasets :( And it's one of the conversations we've had with the Berger lab, and I'm still unsure. But when I'm working with columns 3-5, I'm assuming I'm working with last day replicates. When I'm working with the first column, I'm assuming I'm working with plasmid/day 0. Would love to have a more robust way to do that! I think requiring sample metadata is probably the best way, but even then, the sample metadata would have to be formatted in ways we expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make an argument that basically tells which samples to apply this filter by (so supply a vector that corresponds to the samples we want to target for this, so in this case 3-5).
How does this idea sound to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively at this point in the workflow, they should have metadata involved so we could also have them supply a column that says which columns should be used to determine a filter. But I think doing an easier method like what I've described above is fine to start if you are cool with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love that idea! So I should just add arguments with defaults specifying the column and they can change the columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so like filter_target_col = NULL
and then
is.null(filter_target_col) filter_target_col <- 1:ncol(gimap_dataset$transformed_data$log2_cpm)
But otherwise in the code:
gimap_dataset$transformed_data$log2_cpm[,filter_target_col]
Open to better argument name but this is the gist of what I'm thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, and I'm working on adding several so each type of filter has its own. It'll be in the next stacked PR since it's a bit of a heavier change across several files. Thank you so much!
R/plots-qc.R
Outdated
#' @import ggplot2 | ||
#' @return a ggplot barplot | ||
#' @examples \dontrun{ | ||
#' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add examples here? Helps me review this if I know how you envision this function will be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found your function calls from the QC report and just brought them here. Correct these if this is not how this function could be called.
qc_filter_output <- qc_filter_zerocounts(gimap_dataset) | ||
|
||
return( | ||
example_counts[qc_filter_output$filter, c(3:5)] %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same question here about 3:5
.
## Variance within replicates | ||
|
||
```{r} | ||
qc_variance_hist(gimap_dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I see. 👍
Thanks!
The example code that I ran was rendering the vignette
That's a question I have as well, sorry if I forgot to notate it.
Thanks!
Definitely can put them there! |
Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>
@kweav I think all we need to do to merge this is:
|
|
Used commit edb662a to push change 1 |
@cansavvy pkgdown is happy now! |
YAY!!!! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are passing and I think this gets us closer to our full pipeline so let's merge!!
This PR adds two graphs to the QC report as requested in the May 20th meeting.
Changes made in this PR to accomplish this goal:
R/qc-plots.R
file. One for the histogram (qc_variance_hist()
) and one for the bar plot (qc_constructs_countzero_bar()
). Both use the gimap_dataset and assume that replicates are stored in columns 3-5.inst/rmd/gimapQCTemplate.Rmd
file (and added some appropriate headers)R/filters.R
file where possible filter functions will be stored. Currently the only filter in there is the zero count filter (qc_filter_zerocounts()
)Open issues:
You may notice that I removed the three dots for the heatmap plot when calling the function and within the function. When I tested my code locally and rendered the vignette to drive rendering the qc report, the heatmap was rendered within the vignette rather than the output qc report. So I was trying to troubleshoot that, but it's still an open issue that I wasn't able to resolve.
Next steps for upcoming stacked PRs:
R/filters.R
file.Requested Review:
R/filters.R
file an ok addition, or would you rather I put those functions elsewhere?