Add remainder vector to glex output #11

@jemus42

If max_interaction in glex is set below the corresponding limit of the fitted model, the sum of the m/shap terms plus the intercept no longer equals the model prediction.
This affects both max_interaction in rpf and max_depth in xgboost, since both glex methods have a max_interaction argument. (rpf technically even allows limiting interactions to certain predictors.)

To avoid silently dropping that information, we considered adding a remainder vector to the return value of glex. It could also be used to gauge whether max_interaction in glex was set to a reasonable value or whether too much of the prediction was lost.
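To make the idea concrete, here is a minimal sketch of how such a remainder could be computed and summarized. It is written in Python purely for illustration and is not glex code: the function names, the dictionary layout for the retained terms, and the "share" summary are all assumptions made for this example.

```python
# Hypothetical sketch: none of these names are part of the glex API.

def compute_remainder(predictions, intercept, m_terms):
    """Per-observation remainder: prediction - (intercept + retained terms).

    predictions: list of model predictions, one per observation
    intercept:   scalar intercept of the decomposition
    m_terms:     dict mapping a term name to its per-observation contributions
    """
    remainder = []
    for i, prediction in enumerate(predictions):
        explained = intercept + sum(values[i] for values in m_terms.values())
        remainder.append(prediction - explained)
    return remainder


def remainder_share(predictions, intercept, remainder):
    """Assumed summary: absolute remainder relative to the absolute
    deviation of the predictions from the intercept."""
    total = sum(abs(p - intercept) for p in predictions)
    lost = sum(abs(r) for r in remainder)
    return lost / total if total > 0 else 0.0


# Toy data: two observations, two retained main-effect terms.
predictions = [3.0, 1.5]
intercept = 1.0
m_terms = {"x1": [1.0, 0.25], "x2": [0.5, 0.0]}

remainder = compute_remainder(predictions, intercept, m_terms)
share = remainder_share(predictions, intercept, remainder)
```

With this toy data the remainder is [0.5, 0.25]. A large share would suggest that max_interaction was set too low for the model at hand.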

Generally, adding a remainder (or similarly named) vector to m would complicate downstream handling in the plot functions, glex_vi, glex_explain, etc.
Then again, these functions already allow limiting the output to terms below a given degree of interaction and/or dropping terms with a negligible contribution to the prediction, aggregating the rest under "Remaining terms" (as of now).

This might lead to confusion, because we would then have two stages of remainder-ness: the terms left out by glex, and the terms left out by the plot functions.
Maybe the plot functions should just always include a "Remaining terms" element that contains at least the glex remainder?
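One way to picture the single-bucket option is sketched below, again in Python and not taken from glex. It assumes that interaction terms are named with ":" separators (so the degree can be read off the name) and that a plot-side helper, here the hypothetical collapse_terms, folds both the glex remainder and the terms above a display threshold into one "Remaining terms" entry.

```python
# Hypothetical sketch: collapse_terms and the "x1:x2" naming are assumptions.

def collapse_terms(m_terms, glex_remainder, max_degree, label="Remaining terms"):
    """Keep terms up to max_degree; add everything else to one bucket
    that always starts from the glex remainder."""
    kept = {}
    remaining = list(glex_remainder)  # stage 1: what glex left out
    for name, values in m_terms.items():
        degree = name.count(":") + 1
        if degree <= max_degree:
            kept[name] = list(values)
        else:
            # stage 2: what the plot function leaves out
            remaining = [r + v for r, v in zip(remaining, values)]
    kept[label] = remaining
    return kept


m_terms = {"x1": [1.0, 0.25], "x2": [0.5, 0.0], "x1:x2": [0.25, 0.5]}
collapsed = collapse_terms(m_terms, glex_remainder=[0.5, 0.25], max_degree=1)
```

Here the "Remaining terms" entry is [0.75, 0.75]: the glex remainder plus the dropped x1:x2 term. The kept terms plus this entry still sum to the same total as before, so the user sees one remainder instead of two.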

Metadata

Labels: enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions