Improve feature_selection: richer return type + verbose progress#293

Open
adrian-prior wants to merge 2 commits into
mainfrom
adrian/improve-feature-selection
Collaborator

@adrian-prior adrian-prior commented May 13, 2026

This PR improves the usability of the feature selection extensions by:

  • changing the return interface
  • adding verbosity so users can see what's going on (in particular, the CV scores computed before and after selection, which were previously never logged or saved)
  • adding notes that feature selection is often of limited benefit for TabPFN

adrian-prior and others added 2 commits May 13, 2026 17:14
…bose + KV-cache caveat

- Rewrite `feature_selection` around a `FeatureSelectionResult` dataclass.
  The wrapper now returns the fitted SFS plus the support mask, selected
  indices/names, and the pre/post CV scores it already had to compute —
  collapses the typical caller-side mask -> names dance into one
  attribute access. Backward-incompatible: callers using
  `sfs.get_support()` directly need to switch to `result.support_mask`
  (or `result.selector.get_support()`).
- Make `n_features_to_select` a required positional argument — there's
  no sensible default.
- Expose the SFS knobs we were swallowing: `cv`, `scoring`, `direction`,
  `n_jobs`, `tol`. All keyword-only after the `*`. `cv=5` is the
  pre-existing hardcoded value, just configurable now.
- Always compute baseline (all-features) and selected (subset) CV scores
  using the same `cv` / `scoring` as SFS, surface them on the result.
- Add `verbose: bool = True`. When set:
  - print a config header (direction, cv, scoring, k)
  - print the baseline CV score before SFS runs
  - print per-round picks ("round i/k: picked feature 'x', cv = ...")
    via a `_VerboseSFS` subclass that overrides the private
    `_get_best_new_feature_score` method (sklearn doesn't expose a
    `verbose` parameter or callback hook on SFS itself — this is the
    cleanest workaround; documented in a class docstring with the
    private-API dependency caveat)
  - print the selected names + final CV score
  `verbose=False` keeps everything silent; the scores are still available
  on the returned `FeatureSelectionResult`.
- Add a docstring note: TabPFN is very robust to noisy features in its
  in-context-learning regime, so accuracy gain from running SFS is often
  marginal — the value is more interpretability / parsimony / faster
  predict-time. Verified by a quick noise-trajectory benchmark
  (n_features 3 -> 13, CV score stays in 0.92–0.94 throughout). Mention
  SHAP as the alternative interpretability route since it can use the
  KV cache and is generally much faster.
- Re-export `FeatureSelectionResult` from `tabpfn_extensions.interpretability`
  so callers can type-annotate without reaching into the submodule.
- Update `examples/interpretability/feature_selection.py` to use the new
  return shape, and align params with the public TabPFN demo notebook
  (`n_estimators=1`, `n_features_to_select=4`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
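
The result shape described in the commit message above could look roughly like the sketch below. Only `FeatureSelectionResult`, `selector`, and `support_mask` are named in the PR text; the other field names, the `build_result` helper, and the sample values are assumptions made for illustration, and the actual dataclass in the PR may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class FeatureSelectionResult:
    """Illustrative stand-in, not the PR's actual class."""

    selector: Any                # the fitted SFS object (opaque here)
    support_mask: List[bool]     # True where a feature was kept
    selected_indices: List[int]  # hypothetical field name
    selected_names: List[str]    # hypothetical field name
    baseline_score: float        # hypothetical: CV score with all features
    selected_score: float        # hypothetical: CV score with the subset


def build_result(
    selector: Any,
    support_mask: Sequence[bool],
    feature_names: Sequence[str],
    baseline_score: float,
    selected_score: float,
) -> FeatureSelectionResult:
    """Derive indices and names from the mask once, so callers don't have to."""
    mask = [bool(m) for m in support_mask]
    if len(mask) != len(feature_names):
        raise ValueError("support_mask and feature_names must have the same length")
    indices = [i for i, keep in enumerate(mask) if keep]
    names = [feature_names[i] for i in indices]
    return FeatureSelectionResult(
        selector, mask, indices, names, float(baseline_score), float(selected_score)
    )


# Made-up inputs, purely to show the attribute access pattern.
result = build_result(
    selector=None,  # placeholder for a fitted selector
    support_mask=[True, False, True, False],
    feature_names=["age", "income", "height", "noise"],
    baseline_score=0.93,
    selected_score=0.92,
)
print(result.selected_names)
```

With a shape like this, the mask-to-names conversion happens once inside the wrapper, and callers read `result.selected_names` directly.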
…e-selection

# Conflicts:
#	src/tabpfn_extensions/interpretability/__init__.py
@adrian-prior adrian-prior marked this pull request as ready for review May 13, 2026 15:19
@adrian-prior adrian-prior requested a review from a team as a code owner May 13, 2026 15:19
@adrian-prior adrian-prior requested review from priorjulien and removed request for a team May 13, 2026 15:19
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the feature selection module by introducing a "FeatureSelectionResult" dataclass and a verbose mode for tracking selection progress. The "feature_selection" utility was refactored to return detailed results, including baseline and post-selection cross-validation scores, and now supports additional parameters such as "cv", "scoring", and "n_jobs". The example script was updated to reflect these changes. I have no feedback to provide.
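
To make the verbose mode concrete, here is a minimal pure-Python sketch of greedy forward selection that prints one line per round, in the spirit of the "round i/k" output described in the commit message. It is not the PR's `_VerboseSFS` and does not use scikit-learn; `forward_select`, `toy_score`, the feature names, and the gain values are all hypothetical, and the toy score stands in for a real cross-validation score.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    names: Sequence[str],
    k: int,
    score_fn: Callable[[Tuple[int, ...]], float],
    verbose: bool = True,
) -> Tuple[List[int], List[float]]:
    """Greedy forward selection: each round adds the feature that maximizes score_fn."""
    if not 0 < k <= len(names):
        raise ValueError("k must be between 1 and len(names)")
    selected: List[int] = []
    history: List[float] = []
    for round_no in range(1, k + 1):
        candidates = [i for i in range(len(names)) if i not in selected]
        # Score each candidate subset; on ties, max() keeps the lowest index.
        best = max(candidates, key=lambda i: score_fn(tuple(selected + [i])))
        best_score = score_fn(tuple(selected + [best]))
        selected.append(best)
        history.append(best_score)
        if verbose:
            print(
                f"round {round_no}/{k}: picked feature {names[best]!r}, "
                f"score = {best_score:.3f}"
            )
    return selected, history


# Hypothetical additive gains standing in for a CV score.
GAINS = {0: 0.50, 1: 0.05, 2: 0.30, 3: 0.00}


def toy_score(subset: Tuple[int, ...]) -> float:
    return sum(GAINS[i] for i in subset)


feature_names = ["signal_a", "weak", "signal_b", "noise"]
selected, history = forward_select(feature_names, k=2, score_fn=toy_score)
```

Passing `verbose=False` silences the per-round lines while `history` still carries the scores, which mirrors the behavior the PR describes for its returned result.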

@adrian-prior adrian-prior changed the title from "Improve feature_selection: richer return type + verbose progress + KV-cache caveat" to "Improve feature_selection: richer return type + verbose progress" May 13, 2026