If max_interaction in glex is set below the corresponding limit in the fitted model, the per-observation sum of the terms in m (or shap) plus the intercept no longer equals the model prediction.
This affects both max_interaction in rpf and max_depth in xgboost, as both glex methods have a max_interaction argument. (Technically, rpf even allows limiting to certain predictors.)
To avoid silently dropping that information, we considered adding a remainder vector to the return value of glex. It could also be used to gauge whether max_interaction in glex was set to a reasonable value or whether too much of the prediction was lost.
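A minimal sketch of the remainder idea, written in plain Python rather than R to keep it self-contained. The function names remainder and lost_share, the data layout (a dict mapping term names to per-observation contributions), and all numbers are assumptions made for illustration. They do not reflect the actual return structure of glex.

```python
def remainder(predictions, intercept, terms):
    """Per-observation part of the prediction not covered by the kept terms.

    `terms` is a hypothetical stand-in for m: term name -> list of contributions.
    """
    out = []
    for i, pred in enumerate(predictions):
        explained = intercept + sum(col[i] for col in terms.values())
        out.append(pred - explained)
    return out


def lost_share(remainder_vec, predictions, intercept):
    """Ratio of sum(|remainder|) to sum(|prediction - intercept|).

    One possible (assumed) diagnostic for whether max_interaction was too low.
    """
    denom = sum(abs(p - intercept) for p in predictions)
    if denom == 0:
        return 0.0
    return sum(abs(r) for r in remainder_vec) / denom


# Made-up example: a model with interactions, decomposed with main effects only.
predictions = [3.0, 1.5, 2.0]
intercept = 2.0
terms = {"x1": [0.5, -0.25, 0.0], "x2": [0.25, -0.25, 0.25]}

rem = remainder(predictions, intercept, terms)
share = lost_share(rem, predictions, intercept)
```

With a remainder like this, terms plus intercept plus remainder would again add up to the prediction, and a large share would hint that max_interaction was set too low.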
Generally, adding a remainder (or similarly named) vector to m would complicate downstream handling in plot functions, glex_vi, glex_explain etc.
Then again, these functions already allow limiting the output to terms below a given degree of interaction and/or terms with negligible contribution to the prediction, aggregating them under "Remaining terms" (as of now).
This might lead to confusion, because we would then have two stages of remainder-ness: the terms left out by glex, and the terms left out by the plot functions.
Maybe the plot functions should simply always include a "Remaining terms" element that contains at least the glex remainder?
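One way to picture the single "Remaining terms" element is sketched below in plain Python. Everything here is hypothetical: the function aggregate_for_plot, the max_degree argument, and the assumption that interaction terms are named with ":" separators (e.g. "x1:x2") are illustrative choices, not existing glex behavior.

```python
def aggregate_for_plot(terms, glex_remainder, max_degree):
    """Keep terms up to `max_degree`; fold everything else into one element.

    "Remaining terms" starts from the remainder left over by the decomposition
    and then absorbs every term dropped at the plotting stage.
    """
    kept = {}
    remaining = list(glex_remainder)
    for name, col in terms.items():
        degree = name.count(":") + 1  # assumed naming: "x1:x2" has degree 2
        if degree <= max_degree:
            kept[name] = list(col)
        else:
            remaining = [r + c for r, c in zip(remaining, col)]
    kept["Remaining terms"] = remaining
    return kept


# Made-up example with two observations.
terms = {"x1": [0.5, -0.25], "x2": [0.25, 0.0], "x1:x2": [0.125, 0.25]}
glex_remainder = [0.25, -0.5]

plot_data = aggregate_for_plot(terms, glex_remainder, max_degree=1)
```

In this sketch both stages of remainder end up in the same element, so the plotted terms still sum to the prediction minus the intercept, at the cost of no longer distinguishing what glex dropped from what the plot dropped.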