Calculate shap values only for the selected features #18

Open
mnwright wants to merge 1 commit into master from fix_features_shap

@mnwright
Contributor

While the components were correct, the SHAP values were all 0 for the non-selected features, as in this example from #13:

library(glex)
library(xgboost)

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

xg <- xgboost(data = x, label = y, params = list(max_depth = 4, eta = .1), nrounds = 100, verbose = 0)

glex(xg, x, features = c("cyl", "disp"))

But I wonder whether we should calculate the SHAP values at all. They are not correct SHAP values anyway, because we simply ignore some of the features. That is, with the model from the example above and

res1 <- glex(xg, x, features = c("cyl", "disp"))
res2 <- glex(xg, x)

the components res1$m and res2$m are the same, but the SHAP values res1$shap and res2$shap are not.
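
A minimal base-R sketch of why the SHAP values can differ even though the components agree. It assumes that a SHAP value is assembled by splitting each component equally among the features it involves, and that restricting the features keeps only the components built entirely from the selected ones. The component values are made up for illustration and are not output of glex().

```r
# Made-up component values for a single observation (not glex output)
m <- c("cyl" = 1.0, "disp" = -0.5, "hp" = 0.3,
       "cyl:disp" = 0.4, "cyl:hp" = 0.2)

# Split each component equally among the features it involves
# and sum the shares belonging to `feature`
shap_from_components <- function(m, feature) {
  terms <- strsplit(names(m), ":", fixed = TRUE)
  has_feature <- vapply(terms, function(s) feature %in% s, logical(1))
  sizes <- lengths(terms)
  sum(m[has_feature] / sizes[has_feature])
}

# Assumption: a restricted call keeps only components whose terms
# are all among the selected features
selected <- c("cyl", "disp")
keep <- vapply(strsplit(names(m), ":", fixed = TRUE),
               function(s) all(s %in% selected), logical(1))
m_restricted <- m[keep]

shap_full <- shap_from_components(m, "cyl")
shap_restricted <- shap_from_components(m_restricted, "cyl")

# The gap is the share of the dropped cyl:hp interaction
shap_full - shap_restricted
```

In this toy setting the components that survive the restriction are unchanged, but the value for cyl loses its half of the cyl:hp interaction.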

@jemus42
Collaborator

jemus42 commented Mar 19, 2024

We started carrying around a $remainder term for more or less that reason, no?
Feels like it would need a huge disclaimer in the docs though.

@jemus42
Collaborator

jemus42 commented Mar 19, 2024

After discussing this a little offline, I think the safest choice is to omit the $shap part whenever we artificially restrict the set of features in glex(). We should maybe also consider doing the same when max_interactions is restricted.
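
As a rough sketch of that idea only: the helper below is hypothetical and not part of glex. It assumes a result is a list with $m and $shap elements and drops $shap when the requested features do not cover all columns.

```r
# Hypothetical helper, not part of the glex API: remove the SHAP values
# from a glex-like result list when the feature set was restricted.
drop_shap_if_restricted <- function(res, features, all_features) {
  restricted <- !is.null(features) && !all(all_features %in% features)
  if (restricted) {
    res$shap <- NULL
  }
  res
}

# Stand-in for a glex result; the real object has more structure
res <- list(m = data.frame(cyl = 0.1), shap = data.frame(cyl = 0.1))

# Restricted call: $shap is dropped, $m is kept
out_restricted <- drop_shap_if_restricted(
  res, features = c("cyl", "disp"), all_features = c("cyl", "disp", "hp")
)

# Unrestricted call (features = NULL): result is returned unchanged
out_full <- drop_shap_if_restricted(
  res, features = NULL, all_features = c("cyl", "disp", "hp")
)
```

A similar check on max_interactions could follow the same pattern, though how the package would detect that restriction is left open here.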
