Improve feature_selection: richer return type + verbose progress #293
Open
adrian-prior wants to merge 2 commits into
…bose + KV-cache caveat
- Rewrite `feature_selection` around a `FeatureSelectionResult` dataclass.
The wrapper now returns the fitted SFS plus the support mask, selected
indices/names, and the pre/post CV scores it already had to compute —
collapses the typical caller-side mask -> names dance into one
attribute access. Backward-incompatible: callers using
`sfs.get_support()` directly need to switch to `result.support_mask`
(or `result.selector.get_support()`).
- Make `n_features_to_select` a required positional argument — there's
no sensible default.
- Expose the SFS knobs we were swallowing: `cv`, `scoring`, `direction`,
`n_jobs`, `tol`. All keyword-only after the `*`. `cv=5` is the
pre-existing hardcoded value, just configurable now.
- Always compute baseline (all-features) and selected (subset) CV scores
using the same `cv` / `scoring` as SFS, surface them on the result.
- Add `verbose: bool = True`. When set:
  - print a config header (direction, cv, scoring, k)
  - print the baseline CV score before SFS runs
  - print per-round picks ("round i/k: picked feature 'x', cv = ...")
    via a `_VerboseSFS` subclass that overrides the private
    `_get_best_new_feature_score` method (sklearn doesn't expose a
    `verbose` parameter or callback hook on SFS itself; this is the
    cleanest workaround, documented in a class docstring with the
    private-API dependency caveat)
  - print the selected names + final CV score

  `verbose=False` keeps everything silent; the scores are still available
  on the returned `FeatureSelectionResult`.
- Add a docstring note: TabPFN is very robust to noisy features in its
in-context-learning regime, so accuracy gain from running SFS is often
marginal — the value is more interpretability / parsimony / faster
predict-time. Verified by a quick noise-trajectory benchmark
(n_features 3 -> 13, CV score stays in 0.92–0.94 throughout). Mention
SHAP as the alternative interpretability route since it can use the
KV cache and is generally much faster.
- Re-export `FeatureSelectionResult` from `tabpfn_extensions.interpretability`
so callers can type-annotate without reaching into the submodule.
- Update `examples/interpretability/feature_selection.py` to use the new
return shape, and align params with the public TabPFN demo notebook
(`n_estimators=1`, `n_features_to_select=4`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
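
For readers skimming the commit message, here is a rough sketch of what the new return shape might look like. It is not the code from this PR: only `selector` and `support_mask` are named above, so the other field names (`selected_indices`, `selected_names`, `baseline_score`, `selected_score`) and the `build_result` helper are hypothetical, chosen only to illustrate how the mask -> names step could be folded into the result object.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class FeatureSelectionResult:
    """Illustrative stand-in; only `selector` and `support_mask` appear in the PR text."""

    selector: Any                       # the fitted selector object (e.g. an SFS instance)
    support_mask: Tuple[bool, ...]      # True where a feature was kept
    selected_indices: Tuple[int, ...]   # hypothetical field name
    selected_names: Tuple[str, ...]     # hypothetical field name
    baseline_score: float               # hypothetical: CV score with all features
    selected_score: float               # hypothetical: CV score with the selected subset


def build_result(
    selector: Any,
    mask: Sequence[bool],
    feature_names: Sequence[str],
    baseline_score: float,
    selected_score: float,
) -> FeatureSelectionResult:
    """Hypothetical helper: derive indices and names from the support mask once."""
    mask_t = tuple(bool(m) for m in mask)
    if len(mask_t) != len(feature_names):
        raise ValueError("mask and feature_names must have the same length")
    indices = tuple(i for i, keep in enumerate(mask_t) if keep)
    names = tuple(feature_names[i] for i in indices)
    return FeatureSelectionResult(
        selector, mask_t, indices, names, float(baseline_score), float(selected_score)
    )


# Usage with placeholder values (the scores are made-up numbers, not benchmark results).
result = build_result(None, [True, False, True], ["age", "bmi", "bp"], 0.5, 0.5)
print(result.selected_names)
```

With a shape like this, a caller reads `result.selected_names` directly instead of indexing the column list with `get_support()` by hand.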
…e-selection # Conflicts: # src/tabpfn_extensions/interpretability/__init__.py
Contributor
Code Review
This pull request enhances the feature selection module by introducing a `FeatureSelectionResult` dataclass and a verbose mode for tracking selection progress. The `feature_selection` utility was refactored to return detailed results, including baseline and post-selection cross-validation scores, and now supports additional parameters such as `cv`, `scoring`, and `n_jobs`. The example script was updated to reflect these changes. I have no feedback to provide.
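
To make the verbose progress flow concrete, the following is a minimal pure-Python sketch of greedy forward selection with per-round reporting. It is an assumption-laden stand-in, not the PR's implementation: the PR subclasses sklearn's `SequentialFeatureSelector`, whereas `forward_select`, `score_fn`, and `toy_score` here are hypothetical names, and the toy score replaces real cross-validation. It does mirror two details from the description: `n_features_to_select` is a required positional argument, and `verbose` is keyword-only.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    score_fn: Callable[[Tuple[int, ...]], float],
    feature_names: Sequence[str],
    n_features_to_select: int,
    *,
    verbose: bool = True,
) -> Tuple[List[int], float, float]:
    """Greedy forward selection.

    Returns (sorted selected indices, baseline score, final score).
    `score_fn` stands in for a cross-validated score of a feature subset.
    """
    n_total = len(feature_names)
    if not 0 < n_features_to_select <= n_total:
        raise ValueError("n_features_to_select must be in 1..len(feature_names)")

    # Baseline: score with every feature, computed before selection starts.
    baseline = score_fn(tuple(range(n_total)))
    if verbose:
        print(f"direction=forward, k={n_features_to_select}")
        print(f"baseline score (all {n_total} features) = {baseline:.4f}")

    selected: List[int] = []
    for round_no in range(1, n_features_to_select + 1):
        # Try adding each remaining feature and keep the best-scoring one.
        candidates = [i for i in range(n_total) if i not in selected]
        scored = [(score_fn(tuple(sorted(selected + [i]))), i) for i in candidates]
        best_score, best_idx = max(scored, key=lambda pair: pair[0])
        selected.append(best_idx)
        if verbose:
            print(
                f"round {round_no}/{n_features_to_select}: "
                f"picked feature {feature_names[best_idx]!r}, score = {best_score:.4f}"
            )

    final = score_fn(tuple(sorted(selected)))
    if verbose:
        names = [feature_names[i] for i in sorted(selected)]
        print(f"selected {names}, final score = {final:.4f}")
    return sorted(selected), baseline, final


def toy_score(subset: Tuple[int, ...]) -> float:
    """Toy additive score (not a real CV metric): each feature has a fixed weight."""
    weights = {0: 5.0, 1: 0.0, 2: 3.0, 3: 1.0}
    return sum(weights[i] for i in subset)


selected, baseline, final = forward_select(toy_score, ["a", "b", "c", "d"], 2)
```

Passing `verbose=False` skips every `print` while the baseline and final scores are still returned, which matches the behaviour the PR describes for the silent mode.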
This PR improves the usability of the feature selection extensions by: