preserve task attributes when creating reduced_task by modifying data using mlr3pipelines #11

Open
wants to merge 2 commits into master

Conversation

@mikoontz commented Dec 7, 2022

This PR fixes what I think could be a problematic approach to creating a new mlr3 task with knockoff data.

Currently, the reduced task is created by building a new data.frame with knockoff data substituted for a feature (or group of features), then creating a new task with mlr3::as_task_regr() or mlr3::as_task_classif(), using the new data.frame as the backend and the original task's target as the target.

This approach doesn't respect other settings of the Task, such as observation-level weights or coordinates (as when a user is setting up spatial cross-validation). It also requires logic that queries whether a task is a regression or classification task in order to call the correct function (mlr3::as_task_regr() or mlr3::as_task_classif()). That logic breaks for other task types that should still work, e.g. "classif_st", where a user sets up spatial cross-validation with coordinates attached to each observation via mlr3spatiotempcv::as_task_classif_st().

From here, the preferred way to modify the data in a task is to use mlr3pipelines. This PR implements that approach and adds a Suggests dependency on the mlr3pipelines package in order to use the PipeOpMutate operator. It works for both individual features and grouped features. An ancillary benefit is reduced logic complexity, because we no longer need to query whether the task is regression or classification. It also allows other task types that aren't strictly "classif" or "regr" (e.g. "classif_st" for spatial cross-validation).
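For illustration, here is a minimal sketch of the idea with PipeOpMutate; the mtcars task, the wt feature, and the rnorm() knockoff values are placeholders, not the actual code in this PR or the knockoff machinery in {cpi}:

  library(mlr3)
  library(mlr3pipelines)

  task <- tsk("mtcars")

  # Placeholder knockoff values for the 'wt' feature; in {cpi} these would
  # come from the knockoff generation step, not rnorm().
  knockoff_wt <- rnorm(task$nrow)

  # PipeOpMutate overwrites 'wt' with the knockoff values while leaving the
  # target, task type, weights, coordinates, etc. untouched.
  po_knockoff <- po("mutate", mutation = list(wt = ~ knockoff_wt))
  reduced_task <- po_knockoff$train(list(task))[[1]]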

All checks pass when building. The NAMESPACE and DESCRIPTION files were updated using roxygen2::roxygenize() and manual editing, respectively.

…koff data substituted for one feature or a group of features) using the mlr3pipelines mutate operation rather than creating a new task with the same target and a new data backend. This reduces the need to query what kind of task (regression or classification), and also preserves other task attributes (e.g., weights, coordinates) in the reduced task.
@mikoontz (Author) commented Dec 8, 2022

I just noticed that the logic of deciding whether the task_type is "regr" or "classif" exists in at least one other place in {cpi}. In the cpi() function, this block:

  if (is.null(measure)) {
    if (task$task_type == "regr") {
      measure <- msr("regr.mse")
    } else if (task$task_type == "classif") {
      measure <- msr("classif.logloss")
    } else {
      stop("Unknown task type.")
    }
  }

So this will break if a user has a "classif_st" task_type, even though the code block that would execute for the "classif" task_type should work fine.

Just noting that the "ancillary benefit" I point out above isn't fully realized if a user doesn't specify a value for the measure argument!

Using grepl could work, but maybe that overcorrects and is too permissive (happy to make a separate PR if helpful):

  if (is.null(measure)) {
    if (grepl(x = task$task_type, pattern = "regr")) {
      measure <- msr("regr.mse")
    } else if (grepl(x = task$task_type, pattern = "classif")) {
      measure <- msr("classif.logloss")
    } else {
      stop("Unknown task type.")
    }
  }
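A quick illustration that the prefix match also covers the spatiotemporal task types from mlr3spatiotempcv while still rejecting everything else:

  grepl(x = "classif_st", pattern = "classif")  # TRUE
  grepl(x = "classif", pattern = "classif")     # TRUE
  grepl(x = "regr", pattern = "classif")        # FALSE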

@mikoontz mikoontz changed the title preserve task features when creating reduced_task by modifying data using mlr3pipelines preserve task attributes when creating reduced_task by modifying data using mlr3pipelines Dec 8, 2022
@mnwright (Member) commented

Thanks for the PR! I think using pipelines for this makes perfect sense and I wonder why I didn't come up with that myself?

I will look more into this later.
