add check for factor consistency #134

Status: Open (3 commits to merge into base: master)
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -43,9 +43,11 @@ importFrom(boot,logit)
importFrom(dplyr,any_of)
importFrom(dplyr,arrange)
importFrom(dplyr,cummean)
importFrom(dplyr,distinct)
importFrom(dplyr,distinct_at)
importFrom(dplyr,enquo)
importFrom(dplyr,filter)
importFrom(dplyr,pull)
importFrom(dplyr,if_else)
importFrom(dplyr,left_join)
importFrom(dplyr,mutate)
4 changes: 4 additions & 0 deletions R/methods.R
@@ -387,6 +387,10 @@ sccomp_estimate.data.frame = function(.data,
.count = enquo(.count)
.sample_cell_group_pairs_to_exclude = enquo(.sample_cell_group_pairs_to_exclude)

# Check Sample Consistency of Factors
check_sample_consistency_of_factors(.data, formula_composition, !!.sample)


if( quo_is_null(.count))
res = sccomp_glm_data_frame_raw(
.data,
54 changes: 54 additions & 0 deletions R/utilities.R
@@ -2866,6 +2866,60 @@ contains_only_valid_chars_for_column <- function(column_names) {
sapply(column_names, check_validity)
}

#' Check Sample Consistency of Factors
#'
#' This function checks that every sample in the provided data frame has exactly one
#' value for each covariate named in a formula. The input holds several rows per sample
#' (e.g. one per cell type), so a covariate that takes more than one value within a
#' sample indicates inconsistent metadata. The function stops with an error if any such
#' covariate is found.
#'
#' @importFrom dplyr select
#' @importFrom dplyr filter
#' @importFrom dplyr mutate
#' @importFrom dplyr pull
#' @importFrom dplyr distinct
#' @importFrom tidyr pivot_longer
#' @importFrom tidyr nest
#' @importFrom purrr map_lgl
#'
#' @param .data A data frame containing the samples and covariates.
#' @param my_formula A one-sided formula (e.g. `~ factor_foo`) naming the covariates to check.
#' @param .sample The column of `.data` that identifies the sample (unquoted).
#'
#' @details The function selects the sample column and the covariates in `my_formula`,
#' pivots the data longer so each row holds one sample-covariate-value combination, nests
#' the data by covariate name, and, for each covariate, compares the number of distinct
#' sample-value pairs with the number of distinct samples. The two counts are equal only
#' when every sample has a single value for that covariate.
#'
#' @return No meaningful return value. The function is called for its side effect: it
#' stops with an error message if any inconsistencies are found.
#'
#' @noRd
#' @keywords internal
check_sample_consistency_of_factors = function(.data, my_formula, .sample){

.sample = enquo(.sample)

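# Check that each sample has a single value for every covariate
any_covariate_not_matching_sample_size =
.data |>
select(!!.sample, parse_formula(my_formula)) |>
# Cast values to character so covariates of different types can share one column
pivot_longer(-!!.sample, values_transform = list(value = as.character)) |>
nest(data = -name) |>
# The distinct sample-value pairs equal the distinct samples only when
# every sample has exactly one value for the covariate
mutate(correct_size = map_lgl(data,
~
(.x |> distinct(!!.sample, value) |> nrow()) ==
(.x |> distinct(!!.sample) |> nrow())
)) |>
filter(!correct_size)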

if( any_covariate_not_matching_sample_size |> nrow() > 0 ) stop(
sprintf("sccomp says: your \"%s\" factor(s) is(are) mismatched across samples. ", any_covariate_not_matching_sample_size |> pull(name) |> paste(collapse = ", ")),
"For example, sample_bar has more than one value for factor_foo. ",
"Each sample should have a single value for each factor, consistent across groups (e.g. cell types)."
)

}
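
To illustrate what the check above is meant to catch, here is a minimal standalone sketch on toy data. It is not the package's implementation: the data frame `toy_counts` and its column names are hypothetical, base R `all.vars()` stands in for the package's `parse_formula()`, and the per-sample counting uses `count()` rather than the nested comparison in the diff.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per sample / cell group pair.
# Sample "s2" carries two different values of `treatment`, which is the
# kind of inconsistency the check is meant to flag.
toy_counts <- tibble(
  sample     = c("s1", "s1", "s2", "s2"),
  cell_group = c("B", "T", "B", "T"),
  treatment  = c("ctrl", "ctrl", "ctrl", "drug"),
  age        = c(30, 30, 45, 45)
)

# Assumption: all.vars() is used here in place of sccomp's parse_formula()
# to list the variable names of a one-sided formula.
covariates <- all.vars(~ treatment + age)

# Count the distinct values of each covariate within each sample and keep
# the covariates for which some sample has more than one value.
inconsistent <- toy_counts |>
  select(sample, all_of(covariates)) |>
  pivot_longer(
    -sample,
    names_to = "covariate",
    # Cast to character so character and numeric covariates can share a column
    values_transform = list(value = as.character)
  ) |>
  distinct(covariate, sample, value) |>
  count(covariate, sample, name = "n_values") |>
  filter(n_values > 1) |>
  distinct(covariate) |>
  pull(covariate)

print(inconsistent)
```

In this toy input only `treatment` is reported, because `age` is constant within each sample even though it differs between samples.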



#' chatGPT - Intelligently Remove Surrounding Brackets from Each String in a Vector
#'