preserve task attributes when creating reduced_task by modifying data using mlr3pipelines #11
Open
mikoontz wants to merge 2 commits into bips-hb:master from
…koff data substituted for one feature or a group of features) using the mlr3pipelines mutate operation rather than creating a new task with the same target and a new data backend. This reduces the need to query what kind of task (regression or classification), and also preserves other task attributes (e.g., weights, coordinates) in the reduced task.
Author
I just noticed that the logic of deciding whether the So this will break if a user has a "classif_st" Just noting that the "ancillary benefit" I point out above doesn't fully realize that benefit if a user doesn't specify a value for the Using
…>4.0 dependency that adds)
Member
Thanks for the PR! I think using pipelines for this makes perfect sense and I wonder why I didn't come up with that myself? I will look more into this later.
This PR fixes what I think could be a problematic approach to creating a new mlr3 task with knockoff data.
Currently, the reduced task is created by building a new data.frame with knockoff data substituted for a feature (or group of features), then creating a new task with `mlr3::as_task_regr()` or `mlr3::as_task_classif()`, passing the new data.frame as the `backend` and the original task's target as the `target`.
This approach doesn't respect other settings of the `Task`, such as observation-level weights or coordinates (as when a user is setting up a spatial cross validation). It also requires logic that queries whether a task is a regression or classification task in order to call the correct function (`mlr3::as_task_regr()` or `mlr3::as_task_classif()`). This logic breaks with other possible task types that should still work (e.g., `classif_st`, when a user wants to set up spatial cross validation using coordinates associated with each observation with `mlr3spatiotempcv::as_task_classif_st()`).
From here, the preferred way to modify the data in a task is to use mlr3pipelines. This PR implements this approach and adds a "suggests" dependency (the mlr3pipelines package) in order to use `PipeOpMutate`. It works for both individual features and grouped features. An ancillary benefit is that we can reduce the logic complexity because we no longer need to query whether the task is regression or classification. It also allows for other task types that aren't strictly "classif" or "regr" (e.g., "classif_st" for spatial cross validation).
All checks pass when building, and NAMESPACE and DESCRIPTION were updated using `roxygen2::roxygenize()` and manual editing, respectively.
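For reviewers who want to see the idea in isolation, here is a minimal sketch of replacing one feature through `PipeOpMutate` instead of rebuilding the task. It is not the code in this PR: the `mtcars` task, the variable names (`knockoff_hp`, `po_mutate`, `reduced_task`) and the permutation used as stand-in "knockoff" data are all assumptions for illustration.

```r
library(mlr3)
library(mlr3pipelines)

# Built-in regression task, used here only for illustration.
task <- tsk("mtcars")

# Hypothetical stand-in for knockoff values of the feature `hp`:
# a simple permutation, NOT a real knockoff sampler.
set.seed(1)
knockoff_hp <- sample(task$data(cols = "hp")[["hp"]])

# Each mutation is a named formula; naming an existing feature replaces it.
# The formula can see variables visible where it was created (knockoff_hp).
po_mutate <- po("mutate")
po_mutate$param_set$values$mutation <- list(hp = ~ knockoff_hp)

# train() takes a list of inputs and returns a list of outputs.
# The input task is cloned, so `task` itself is left unchanged.
reduced_task <- po_mutate$train(list(task))[[1]]

# No branching on task type was needed; the task class carries over.
class(reduced_task)[1] == class(task)[1]
```

For a group of features, the same `mutation` list would carry one named formula per feature in the group.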