preserve task attributes when creating reduced_task by modifying data using mlr3pipelines #11
+20
−13
This PR fixes what I think could be a problematic approach to creating a new mlr3 task with knockoff data.
Currently, the reduced task is created by building a new data.frame in which knockoff data is substituted for a feature (or group of features), then creating a new task with `mlr3::as_task_regr()` or `mlr3::as_task_classif()`, specifying the new data.frame as the `backend` and the same target as the original task as the `target`.

This approach doesn't respect other settings of the `Task`, such as observation-level weights or coordinates (as in the case when a user is setting up a spatial cross validation). It also requires logic that queries whether a task is a regression or classification task in order to call the correct function (`mlr3::as_task_regr()` or `mlr3::as_task_classif()`). This logic breaks with other possible task types that should still work (e.g., `classif_st`, in the case where a user wants to set up spatial cross validation using coordinates associated with each observation with `mlr3spatiotempcv::as_task_classif_st()`).

From here, the preferred way to modify the data in a task is to use mlr3pipelines. This PR implements this approach and adds a "suggests" dependency (the mlr3pipelines package) in order to use the `PipeOpMutate` operator. It works for both individual features and grouped features. An ancillary benefit is that we can reduce the logic complexity because we no longer need to query whether the task is regression or classification. It also allows for other task types that aren't strictly "classif" or "regr" (e.g., "classif_st" for spatial cross validation).

All checks pass when building, and the NAMESPACE and DESCRIPTION files were updated using `roxygen2::roxygenize()` and manual editing, respectively.
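
For readers unfamiliar with `PipeOpMutate`, the general idea can be sketched roughly as follows. This is only an illustration, not the code in this PR: the built-in `mtcars` task, the variable names (`knockoff_hp`, `knockoff_wt`, `reduced_task`), and the use of permuted columns as a stand-in for real knockoff data are all assumptions made for the example.

```r
library(mlr3)
library(mlr3pipelines)

# Built-in regression task, used here only for illustration.
task <- tsk("mtcars")

# Stand-in for knockoff data (assumption): permuted copies of the original columns.
set.seed(1)
original <- task$data(cols = c("hp", "wt"))
knockoff_hp <- sample(original$hp)
knockoff_wt <- sample(original$wt)

# Each formula overwrites the feature of the same name. Listing one feature
# swaps a single feature; listing several swaps a group in one step.
po_mutate <- po("mutate", mutation = list(
  hp = ~ knockoff_hp,
  wt = ~ knockoff_wt
))

# Training the PipeOp returns a modified task of the same class as the input,
# so no branching on regression vs. classification is needed.
reduced_task <- po_mutate$train(list(task))[[1]]

class(reduced_task)
reduced_task$target_names
```

Because the task itself is transformed rather than rebuilt from a data.frame, its class and target carry over without any task-type checks. Note that the mutated columns may end up in a different position within `feature_names` than in the original task.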