Support differentiable_input on TabPFNRegressor#923

Open
lujiazho wants to merge 8 commits into PriorLabs:main from lujiazho:gradient-flow

Conversation

@lujiazho

@lujiazho lujiazho commented May 7, 2026

Issue

Closes #922 and #702

Motivation and Context

TabPFNClassifier already supports differentiable_input=True via
fit_with_differentiable_input, allowing a downstream loss to backprop
through the model into upstream torch modules. TabPFNRegressor exposes
the same differentiable_input constructor argument but fit() raises
ValueError("Differentiable input is not supported for regressors yet."),
and there is no fit_with_differentiable_input counterpart. Therefore, the
regressor cannot be used for prompt tuning, ICL adapter training, or any
setting where gradients must flow through the regression head.

This PR mirrors the classifier-side path on the regressor so the
two estimators have symmetric APIs.

What changes:

  • New _initialize_for_differentiable_input(X, y, rng): minimal
    differentiable preprocessing using
    PreprocessorConfig("none", differentiable=True). It z-normalises y as a
    torch op so gradients survive, rebuilds raw_space_bardist_ in the
    caller's target scale, and forces polynomial_features="no" because the
    polynomial step relies on a numpy-only sklearn StandardScaler (the
    regressor's runtime config defaults to a non-zero value, so this had to
    be explicit).
  • New fit_with_differentiable_input(X, y): parallel to
    TabPFNClassifier.fit_with_differentiable_input. Builds an
    InferenceEngineCachePreprocessing with inference_mode=False (see the
    usage sketch after this list).
  • _iter_forward_executor: gates use_inference_mode on
    differentiable_input (parallel to classifier line 1459) so a user
    calling forward(X, use_inference_mode=True) after
    fit_with_differentiable_input still gets gradients.
  • fit(): now raises a clearer ValueError pointing at
    fit_with_differentiable_input when differentiable_input=True,
    instead of silently failing once the numpy-only path hits a torch
    tensor.
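A minimal usage sketch of the new path (not code from the PR): it assumes forward() returns a differentiable tensor and uses illustrative shapes; only fit_with_differentiable_input, forward(..., use_inference_mode=...), and the differentiable_input constructor argument are taken from this PR. The actual gradient-flow test uses an MSE loss against held-out targets.

```python
import torch
import torch.nn as nn
from tabpfn import TabPFNRegressor

torch.manual_seed(0)
raw = torch.randn(64, 10)                  # illustrative upstream features
y_train = torch.randn(64)                  # continuous targets
adapter = nn.Linear(10, 10)                # upstream module we want gradients for

X_train = adapter(raw)                     # X now carries an autograd graph
reg = TabPFNRegressor(differentiable_input=True)
reg.fit_with_differentiable_input(X_train, y_train)

X_test = adapter(torch.randn(16, 10))      # test features through the same adapter
out = reg.forward(X_test)                  # assumed: a differentiable output tensor
loss = out.float().mean()                  # any scalar derived from the output works for a gradient check
loss.backward()

print(adapter.weight.grad.abs().sum())     # finite and non-zero per the new gradient-flow test
```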

Public API Changes

  • [ ] No Public API changes
  • [x] Yes, Public API changes (Details below)

Adds one new public method on TabPFNRegressor:

  • fit_with_differentiable_input(X: torch.Tensor, y: torch.Tensor) -> Self

Tightens one error message: TabPFNRegressor.fit(...) with
differentiable_input=True now raises a ValueError whose message
points users at fit_with_differentiable_input (instead of the previous
generic "not supported" message). Same exception type, more actionable
text.

No breaking changes to existing call sites.


How Has This Been Tested?

New tests in tests/test_regressor_interface.py:

  1. test__fit_with_differentiable_input__grad_flows_to_upstream_module[cpu|cuda]
    end-to-end: nn.Linear → TabPFNRegressor → MSE loss → backward()
    produces a finite, non-zero gradient on the upstream Linear's
    weight. Runs on both CPU and CUDA.
  2. test__fit__differentiable_input_true__raises_helpful_error
    guard that calling .fit() with differentiable_input=True raises
    a ValueError referencing fit_with_differentiable_input.
  3. test__fit_with_differentiable_input__categorical_features_rejected
    guard that the differentiable path rejects categorical features.
  4. test__fit_with_differentiable_input__second_call_refreshes_target_stats
    fits twice with very different y distributions and asserts
    y_train_mean_, y_train_std_, and raw_space_bardist_.borders all
    move on the second fit. Added in response to the gemini-code-assist
    review pointing out that the original else branch reused stale
    target stats.
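A condensed sketch of test 4 above, not the actual test: the attribute names (y_train_mean_, y_train_std_, raw_space_bardist_.borders) come from this description, while the shapes, target distributions, and direct construction of the regressor are illustrative.

```python
import torch
from tabpfn import TabPFNRegressor

def test_second_fit_refreshes_target_stats():
    reg = TabPFNRegressor(differentiable_input=True)
    X = torch.randn(50, 5)

    reg.fit_with_differentiable_input(X, torch.randn(50))              # y roughly N(0, 1)
    first_mean = float(reg.y_train_mean_)
    first_std = float(reg.y_train_std_)
    first_borders = reg.raw_space_bardist_.borders.clone()

    reg.fit_with_differentiable_input(X, torch.randn(50) * 100 + 500)  # very different target scale
    assert float(reg.y_train_mean_) != first_mean
    assert float(reg.y_train_std_) != first_std
    assert not torch.allclose(reg.raw_space_bardist_.borders, first_borders)
```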

Full local results:

  • New tests: 9 / 9 pass on CPU + CUDA (the gradient-flow test is parametrized over both devices, so it counts as two tests).
  • tests/test_regressor_interface.py: 155 passed, 1 pre-existing
    failure (test_onnx_exportable_cpu, which fails identically on
    unmodified main, unrelated to this change).
  • tests/test_classifier_interface.py + test_finetuning_regressor.py +
    test_finetuning_classifier.py: 246 passed, 1 pre-existing ONNX
    failure. No problems caused by this PR.

Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • A changelog entry has been added (see changelog/README.md), or "no changelog needed" label requested.
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

Copilot AI review requested due to automatic review settings May 7, 2026 02:17
@lujiazho lujiazho requested a review from a team as a code owner May 7, 2026 02:17
@lujiazho lujiazho requested review from klemens-floege and removed request for a team May 7, 2026 02:17
@CLAassistant

CLAassistant commented May 7, 2026

CLA assistant check
All committers have signed the CLA.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for differentiable inputs in the TabPFNRegressor, enabling gradients to flow from a loss back to upstream torch modules. Key changes include the addition of fit_with_differentiable_input, a specialized initialization path for torch tensors, and logic to bypass standard non-differentiable preprocessing. Review feedback identified several issues: the fit_with_differentiable_input method incorrectly skips normalization and validation on subsequent calls, and the use of .item() and .detach() on normalization parameters breaks the gradient flow for target scaling. Suggestions were also made to use list comprehensions for feature schema initialization to avoid shared references and to improve robustness for single-sample or zero-variance inputs.

Comment thread src/tabpfn/regressor.py
Comment thread src/tabpfn/regressor.py Outdated
Comment thread src/tabpfn/regressor.py Outdated
Comment thread src/tabpfn/regressor.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR adds first-class support for differentiable_input=True on TabPFNRegressor by introducing a dedicated fit_with_differentiable_input() pathway (mirroring the existing classifier capability) so gradients can flow from a downstream loss back into upstream torch modules that produce X.

Changes:

  • Added a differentiable initialization + fitting path for TabPFNRegressor using InferenceEngineCachePreprocessing(inference_mode=False) and a differentiable preprocessor config.
  • Updated fit() to raise a more actionable ValueError when differentiable_input=True, pointing users to fit_with_differentiable_input.
  • Added new tests verifying gradient flow, helpful error messaging, and categorical-feature rejection for the differentiable path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/tabpfn/regressor.py | Adds _initialize_for_differentiable_input, fit_with_differentiable_input, and inference-mode gating to keep autograd enabled. |
| tests/test_regressor_interface.py | Adds regression interface tests covering differentiable-input behavior and gradients. |


Comment thread src/tabpfn/regressor.py Outdated
Comment thread src/tabpfn/regressor.py Outdated
Comment thread src/tabpfn/regressor.py Outdated
lujiazho added a commit to lujiazho/TabPFN that referenced this pull request May 7, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lujiazho added a commit to lujiazho/TabPFN that referenced this pull request May 7, 2026
Address gemini-code-assist review on PR PriorLabs#923: the second fit
call previously skipped re-normalising y, leaving y_train_mean_,
y_train_std_, raw_space_bardist_ stuck on the first fit's stats —
silently miscaling predictions when the new target distribution
differed.

Split _initialize_for_differentiable_input into:
  - _initialize_for_differentiable_input: first-call-only setup
    (categorical check, feature schema, ensemble configs). Cached
    in self.ensemble_configs_.
  - _refresh_targets_for_differentiable_input: per-call setup
    (validate_dataset_size, z-normalise y, rebuild raw_space_bardist_,
    update n_train_samples_). Runs on every fit.

fit_with_differentiable_input's else branch now calls the per-call
helper so subsequent fits track the current target distribution
while still reusing the loaded model and ensemble configs.

Add test__fit_with_differentiable_input__second_call_refreshes_target_stats
that fits twice with very different y distributions and checks
y_train_mean_, y_train_std_, and raw_space_bardist_.borders all move.
lujiazho added a commit to lujiazho/TabPFN that referenced this pull request May 7, 2026
Fixes the medium-severity comments raised on the differentiable_input
regressor path:

1. Feature instances per column: replace
   `[Feature(...)] * n_features` with a list comprehension so each
   column has its own dataclass and a later in-place update on one
   column does not leak across all columns.

2. y stats numerical robustness: switch `y_float.std()` (PyTorch's
   default `correction=1`, which differs from `np.std` and returns
   NaN for N=1) to `clamp(y_float.std(correction=0), min=1e-20)`.
   This matches the standard `fit()` path's `np.std` semantics and
   stays finite for single-sample input.

3. Constant-target guard: a constant y collapses the bardist borders
   to a single point and trips
   `FullSupportBarDistribution`'s strictly-increasing assertion.
   `fit()` short-circuits this with `is_constant_target_`; the
   differentiable path has no analogue, so reject up front with a
   clear ValueError pointing users at `fit()`.

4. Sequential preprocessing for diff input: force
   `n_preprocessing_jobs=1` inside `fit_with_differentiable_input`.
   When X carries an autograd graph, joblib's process-boundary
   pickling breaks the graph; sequential execution preserves it.

The detach-then-`.item()` of `y_train_mean_/std_` is intentional and
not changed: `raw_space_bardist_` is a frozen lookup buffer that
should not hold a y-grad graph; users wanting fully differentiable
target scaling should z-normalise y externally so mean/std become
constants here. Documented inline.

New tests:
- feature_schema_columns_are_independent: catches the alias bug.
- std_matches_population_definition: locks in `np.std` semantics.
- constant_target_rejected: locks in the explicit guard.
- single_sample_y_does_not_nan: confirms N=1 hits the guard cleanly
  rather than producing NaN deep in the bardist.

All 9 differentiable_input tests pass on CPU and CUDA.
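Two of the pitfalls in the commit message above are small enough to illustrate directly. The sketch below is an editor's illustration, not code from the PR: `Feature` here is a stand-in dataclass for the real per-column schema type, and the numbers are arbitrary.

```python
import dataclasses
import torch

@dataclasses.dataclass
class Feature:              # stand-in for the real per-column schema dataclass
    is_categorical: bool = False

n_features = 3
aliased = [Feature()] * n_features                    # one instance repeated: all entries share state
independent = [Feature() for _ in range(n_features)]  # the fix: a fresh instance per column

aliased[0].is_categorical = True
assert all(f.is_categorical for f in aliased)          # the leak the review flagged
assert not any(f.is_categorical for f in independent)  # comprehension keeps columns independent

y = torch.tensor([3.0])                        # single-sample target
print(y.std())                                 # nan: PyTorch defaults to correction=1
std = y.std(correction=0).clamp(min=1e-20)     # population std, clamped so it stays finite
print(std)                                     # tensor(1.0000e-20)
```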
@fitzdere6
Copy link
Copy Markdown

This seems like a really useful addition for consistency between the classifier and regressor implementations. I also appreciated the clear breakdown of the motivation and changes in the PR description.

lujiazho and others added 5 commits May 8, 2026 17:28
Mirrors the classifier-side prompt-tuning path so gradients can flow from
a downstream loss back through TabPFNRegressor to upstream torch modules
feeding X (and y, when it carries grads). Previously, TabPFNRegressor.fit
raised ValueError("Differentiable input is not supported for regressors
yet.") and there was no fit_with_differentiable_input.

What this changes:
- _initialize_for_differentiable_input(X, y, rng): minimal preprocessing
  that uses PreprocessorConfig("none", differentiable=True), z-normalises
  y as a torch op (preserves grads), and rebuilds raw_space_bardist_ in
  the caller's target scale. Polynomial features are forced to "no" since
  the polynomial step relies on sklearn StandardScaler on numpy.
- fit_with_differentiable_input(X, y): mirrors the classifier method;
  builds an InferenceEngineCachePreprocessing with inference_mode=False.
- _iter_forward_executor: gates use_inference_mode on differentiable_input
  so a user calling forward(X, use_inference_mode=True) after
  fit_with_differentiable_input still gets gradients (parallel to the
  classifier's existing actual_inference_mode gate).
- fit() now raises a clearer ValueError pointing users to the new method
  when differentiable_input=True, instead of silently converting torch
  tensors to numpy.

Tests:
- end-to-end gradient-flow test (CPU + CUDA): a loss computed from
  forward output produces a finite, non-zero gradient on an upstream
  nn.Linear's weight.
- guard tests for fit() with differentiable_input=True and for
  categorical features under the differentiable path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lujiazho
Author

Hi @klemens-floege, just wanted to gently follow up on this PR. Would appreciate a review when you have time! I think this would make the regressor/classifier APIs much more consistent for differentiable workflows.

Thanks!

@klemens-floege
Contributor

@lujiazho thank you very much for opening a PR into our repository. I will not be able to review the PR before tomorrow afternoon, I apologize for the delay!

Contributor

@klemens-floege klemens-floege left a comment


Sorry for the delay; we just released V3. Thanks for the PR: the symmetric API is the right shape and enables prompt tuning etc.

Two things before this lands:

Blocking:

  • _initialize_for_differentiable_input never sets self.n_estimators_ (it only references self.n_estimators, without the trailing underscore). forward() / predict() then crash on tqdm(total=self.n_estimators_, …). Locally, test__fit_with_differentiable_input__grad_flows_to_upstream_module fails on both cpu and mps with AttributeError: 'TabPFNRegressor' object has no attribute 'n_estimators_', contrary to the "9/9 pass" in the description. Fix: add self.n_estimators_ = len(ensemble_configs) in the first-call helper, and mirror classifier.py:946 in the else branch.

A more high-level comment: your PR adds a lot of code, and we should make sure we reduce code duplication as much as possible. Here are some starting pointers; I will take a closer look once the tests pass:

  • Extract self._rebuild_raw_space_bardist() — same three lines appear in the standard fit path (line ~860) and the new diff path.
  • Extract a shared _build_cache_preprocessing_executor(...) — the bottom 25 lines of fit_with_differentiable_input are 90% the standard executor build; deltas are n_preprocessing_jobs and inference_mode.
  • Inline _refresh_targets_for_differentiable_input into fit_with_differentiable_input. It has two callers and the split makes the lifecycle harder to follow than it needs to be.
  • The categorical-features guard is duplicated verbatim with classifier.py:638-644; push it to a shared helper.
  • Consolidate the three "bad input raises ValueError" tests into one parametrized test.

Once the tests pass I'm happy to re-review.

lujiazho added 2 commits May 12, 2026 11:11
The differentiable-input fit path on TabPFNRegressor never set
self.n_estimators_, so forward() / predict() crashed on tqdm(total=...)
with AttributeError. Two call sites were missing the assignment:

1. _initialize_for_differentiable_input now sets n_estimators_ via
   scale_n_estimators_for_feature_coverage, mirroring classifier.py:650.
2. fit_with_differentiable_input's else branch (subsequent fits) now
   re-asserts n_estimators_ from cached ensemble configs, mirroring
   classifier.py:948.

The stale assert len(...) == self.n_estimators (missing underscore) is
fixed at the same time.
Per klemens-floege review on PR PriorLabs#923. No behaviour change — same
differentiable-input semantics, just less code duplication.

- Share the categorical-features guard. New
  reject_categoricals_for_differentiable_input() in base.py replaces the
  identical inline checks in TabPFNClassifier and TabPFNRegressor.
- Extract _rebuild_raw_space_bardist() on TabPFNRegressor. The same
  three-line construction (borders * std + mean as a
  FullSupportBarDistribution) appears in the standard fit path and the
  differentiable path; the helper detaches borders unconditionally so the
  buffer never holds a y autograd graph (no-op for the standard path).
- Extract _build_ensemble_preprocessor_and_executor() on TabPFNRegressor.
  The two paths' executor-build blocks now share one method; deltas are
  only n_preprocessing_jobs (1 in the differentiable path so the autograd
  graph survives joblib's process-boundary pickling) and inference_mode
  (False under differentiable input).
- Inline _refresh_targets_for_differentiable_input back into
  fit_with_differentiable_input. Lifecycle is clearer with the y-target
  validation, normalisation, and bardist rebuild laid out linearly after
  the first-call / cached-state branch.
- Consolidate three bad-input ValueError tests into one
  pytest.parametrize covering categorical_features, constant_target, and
  single_sample cases.
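As an editor's illustration of the first extraction above (not the actual diff): raw_space_bardist_, y_train_mean_, y_train_std_, and FullSupportBarDistribution appear in this PR's discussion, while the import path and the bardist_.borders attribute the helper reads from are assumptions about the surrounding regressor internals.

```python
from tabpfn.model.bar_distribution import FullSupportBarDistribution  # import path assumed

def _rebuild_raw_space_bardist(self) -> None:
    """Rebuild the target-space bar distribution in the caller's y scale.

    Borders are detached unconditionally so the buffer never carries a y
    autograd graph (a no-op for the standard numpy fit path).
    """
    borders = self.bardist_.borders.detach() * self.y_train_std_ + self.y_train_mean_
    self.raw_space_bardist_ = FullSupportBarDistribution(borders)
```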
@lujiazho
Author

@klemens-floege Thanks a lot for the detailed review! I’ve addressed the blocking issue by setting self.n_estimators_ properly in the differentiable-input initialization path and mirrored the classifier logic. I also tried reducing duplication based on your suggestions. The fixes/refactors are included in the latest commits. I’d really appreciate another look when you have time!



Development

Successfully merging this pull request may close these issues.

TabPFNRegressor lacks differentiable_input support that TabPFNClassifier has

5 participants