Supervised tasks with missing data in the target #98

@be-marc

Hi, I’m a member of the mlr3 team. We’re currently considering disallowing supervised tasks with missing values in the target column. With that change, your vimpute.Rmd vignette no longer runs.

library(reactable)
library(data.table)  # provides copy(), used below
library(VIM)
data(iris)

# Create complete copy before introducing NAs
complete_data <- iris
colnames(complete_data) <- c("S.Length","S.Width","P.Length","P.Width","Species")
df <- copy(complete_data)

# Randomly produce missing values
set.seed(1)
nbr_missing <- 50
y <- data.frame(row = sample(nrow(df), size = nbr_missing, replace = TRUE),
                col = sample(ncol(df), size = nbr_missing, replace = TRUE))
y <- y[!duplicated(y),]
df[as.matrix(y)] <- NA

# Perform imputation with proper method specification
result <- vimpute(
  data = df,
  method = setNames(lapply(names(df), function(x) "xgboost"), names(df)),
  pred_history = TRUE
)

#>Variables with Missing Data: S.Length,S.Width,P.Length,P.Width,Species
#>data is data.table
#> Precheck done.
#> Error: 
#> ✖ Target column 'S.Length' must not contain missing values
#> → Class: Mlr3ErrorInput

Would this mlr3 change be a general issue for your package, or would it only affect this particular example?
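
For reference, here is a minimal sketch of one pattern that would remain valid under the proposed restriction: train only on the rows where the target is observed, then predict the rows where it is missing. This is an illustration, not how vimpute is implemented internally. The choice of `regr.rpart`, the task id, and the variable names `observed`, `newdata`, and `imputed` are assumptions made for the example, and missing values are placed in the target column only to keep it short.

```r
library(mlr3)

set.seed(1)
df <- iris

# Illustrative setup: put missing values into the target column only
miss <- sample(nrow(df), 10)
df$Sepal.Length[miss] <- NA

# Rows where the target is observed
observed <- !is.na(df$Sepal.Length)

# Build the task from complete-target rows only, so the target has no NAs
task <- as_task_regr(df[observed, ], target = "Sepal.Length", id = "iris_observed")

# regr.rpart is an arbitrary choice of learner for this sketch
learner <- lrn("regr.rpart")
learner$train(task)

# Predict the rows with a missing target; the target column is dropped
newdata <- df[!observed, setdiff(names(df), "Sepal.Length")]
pred <- learner$predict_newdata(newdata)

# Fill the predictions back into a copy of the data
imputed <- df
imputed$Sepal.Length[!observed] <- pred$response
```

If vimpute already separates observed and missing rows like this before constructing the task, the change would presumably only affect code paths that pass the full data, including the NA targets, to the task.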
