Support differentiable_input on TabPFNRegressor by lujiazho · Pull Request #923 · PriorLabs/TabPFN

lujiazho · 2026-05-07T02:17:37Z

Issue

Closes #922 and #702

Motivation and Context

TabPFNClassifier already supports differentiable_input=True via
fit_with_differentiable_input, allowing a downstream loss to backprop
through the model into upstream torch modules. TabPFNRegressor exposes
the same differentiable_input constructor argument but fit() raises
ValueError("Differentiable input is not supported for regressors yet."),
and there is no fit_with_differentiable_input counterpart. Therefore, the
regressor cannot be used for prompt tuning, ICL adapter training, or any
setting where gradients must flow through the regression head.

This PR mirrors the classifier-side path on the regressor so the
two estimators have symmetric APIs.

What changes:

New _initialize_for_differentiable_input(X, y, rng): minimal,
differentiable preprocessing using PreprocessorConfig("none", differentiable=True). z-normalises y as a torch op so grads survive,
rebuilds raw_space_bardist_ in the caller's target scale, and forces
polynomial_features="no" since the polynomial step relies on a
numpy-only sklearn StandardScaler (the regressor's runtime config
defaults to a non-zero value, so this had to be explicit).
New fit_with_differentiable_input(X, y): parallel to
TabPFNClassifier.fit_with_differentiable_input. Builds an
InferenceEngineCachePreprocessing with inference_mode=False.
_iter_forward_executor: gates use_inference_mode on
differentiable_input (parallel to classifier line 1459) so a user
calling forward(X, use_inference_mode=True) after
fit_with_differentiable_input still gets gradients.
fit(): now raises a clearer ValueError pointing at
fit_with_differentiable_input when differentiable_input=True,
instead of silently failing once the numpy-only path hits a torch
tensor.

Public API Changes

No Public API changes
Yes, Public API changes (Details below)

Adds one new public method on TabPFNRegressor:

fit_with_differentiable_input(X: torch.Tensor, y: torch.Tensor) -> Self

Tightens one error message: TabPFNRegressor.fit(...) with
differentiable_input=True now raises a ValueError whose message
points users at fit_with_differentiable_input (instead of the previous
generic "not supported" message). Same exception type, more actionable
text.

No breaking changes to existing call sites.

How Has This Been Tested?

New tests in tests/test_regressor_interface.py:

test__fit_with_differentiable_input__grad_flows_to_upstream_module[cpu|cuda]
end-to-end: nn.Linear → TabPFNRegressor → MSE loss → backward()
produces a finite, non-zero gradient on the upstream Linear's
weight. Runs on both CPU and CUDA.
test__fit__differentiable_input_true__raises_helpful_error
guard that calling .fit() with differentiable_input=True raises
a ValueError referencing fit_with_differentiable_input.
test__fit_with_differentiable_input__categorical_features_rejected
guard that the differentiable path rejects categorical features.
test__fit_with_differentiable_input__second_call_refreshes_target_stats
fits twice with very different y distributions and asserts
y_train_mean_, y_train_std_, and raw_space_bardist_.borders all
move on the second fit. Added in response to the gemini-code-assist
review pointing out that the original else branch reused stale
target stats.

Full local results:

New tests: 9 / 9 pass on CPU + CUDA (The first test is expanded into two tests because of different devices: CPU & GPU).
tests/test_regressor_interface.py: 155 passed, 1 pre-existing
failure (test_onnx_exportable_cpu, which fails identically on
unmodified main, unrelated to this change).
tests/test_classifier_interface.py + test_finetuning_regressor.py +
test_finetuning_classifier.py: 246 passed, 1 pre-existing ONNX
failure. No problems caused by this PR.

Checklist

The changes have been tested locally.
Documentation has been updated (if the public API or usage changes).
A changelog entry has been added (see changelog/README.md), or "no changelog needed" label requested.
The code follows the project's style guidelines.
I have considered the impact of these changes on the public API.

CLAassistant · 2026-05-07T02:17:45Z

All committers have signed the CLA.

chatgpt-codex-connector · 2026-05-07T02:18:26Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

gemini-code-assist

Code Review

This pull request introduces support for differentiable inputs in the TabPFNRegressor, enabling gradients to flow from a loss back to upstream torch modules. Key changes include the addition of fit_with_differentiable_input, a specialized initialization path for torch tensors, and logic to bypass standard non-differentiable preprocessing. Review feedback identified several issues: the fit_with_differentiable_input method incorrectly skips normalization and validation on subsequent calls, and the use of .item() and .detach() on normalization parameters breaks the gradient flow for target scaling. Suggestions were also made to use list comprehensions for feature schema initialization to avoid shared references and to improve robustness for single-sample or zero-variance inputs.

Copilot

Pull request overview

This PR adds first-class support for differentiable_input=True on TabPFNRegressor by introducing a dedicated fit_with_differentiable_input() pathway (mirroring the existing classifier capability) so gradients can flow from a downstream loss back into upstream torch modules that produce X.

Changes:

Added a differentiable initialization + fitting path for TabPFNRegressor using InferenceEngineCachePreprocessing(inference_mode=False) and a differentiable preprocessor config.
Updated fit() to raise a more actionable ValueError when differentiable_input=True, pointing users to fit_with_differentiable_input.
Added new tests verifying gradient flow, helpful error messaging, and categorical-feature rejection for the differentiable path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/tabpfn/regressor.py`	Adds `_initialize_for_differentiable_input`, `fit_with_differentiable_input`, and inference-mode gating to keep autograd enabled.
`tests/test_regressor_interface.py`	Adds regression interface tests covering differentiable-input behavior and gradients.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address gemini-code-assist review on PR PriorLabs#923: the second fit call previously skipped re-normalising y, leaving y_train_mean_, y_train_std_, raw_space_bardist_ stuck on the first fit's stats — silently miscaling predictions when the new target distribution differed. Split _initialize_for_differentiable_input into: - _initialize_for_differentiable_input: first-call-only setup (categorical check, feature schema, ensemble configs). Cached in self.ensemble_configs_. - _refresh_targets_for_differentiable_input: per-call setup (validate_dataset_size, z-normalise y, rebuild raw_space_bardist_, update n_train_samples_). Runs on every fit. fit_with_differentiable_input's else branch now calls the per-call helper so subsequent fits track the current target distribution while still reusing the loaded model and ensemble configs. Add test__fit_with_differentiable_input__second_call_refreshes_target_stats that fits twice with very different y distributions and checks y_train_mean_, y_train_std_, and raw_space_bardist_.borders all move.

Fixes the medium-severity comments raised on the differentiable_input regressor path: 1. Feature instances per column: replace `[Feature(...)] * n_features` with a list comprehension so each column has its own dataclass and a later in-place update on one column does not leak across all columns. 2. y stats numerical robustness: switch `y_float.std()` (PyTorch's default `correction=1`, which differs from `np.std` and returns NaN for N=1) to `clamp(y_float.std(correction=0), min=1e-20)`. This matches the standard `fit()` path's `np.std` semantics and stays finite for single-sample input. 3. Constant-target guard: a constant y collapses the bardist borders to a single point and trips `FullSupportBarDistribution`'s strictly-increasing assertion. `fit()` short-circuits this with `is_constant_target_`; the differentiable path has no analogue, so reject up front with a clear ValueError pointing users at `fit()`. 4. Sequential preprocessing for diff input: force `n_preprocessing_jobs=1` inside `fit_with_differentiable_input`. When X carries an autograd graph, joblib's process-boundary pickling breaks the graph; sequential execution preserves it. The detach-then-`.item()` of `y_train_mean_/std_` is intentional and not changed: `raw_space_bardist_` is a frozen lookup buffer that should not hold a y-grad graph; users wanting fully differentiable target scaling should z-normalise y externally so mean/std become constants here. Documented inline. New tests: - feature_schema_columns_are_independent: catches the alias bug. - std_matches_population_definition: locks in `np.std` semantics. - constant_target_rejected: locks in the explicit guard. - single_sample_y_does_not_nan: confirms N=1 hits the guard cleanly rather than producing NaN deep in the bardist. All 9 differentiable_input tests pass on CPU and CUDA.

fitzdere6 · 2026-05-07T04:16:26Z

This seems like a really useful addition for consistency between the classifier and regressor implementations. I also appreciated the clear breakdown of the motivation and changes in the PR description.

Mirrors the classifier-side prompt-tuning path so gradients can flow from a downstream loss back through TabPFNRegressor to upstream torch modules feeding X (and y, when it carries grads). Previously, TabPFNRegressor.fit raised ValueError("Differentiable input is not supported for regressors yet.") and there was no fit_with_differentiable_input. What this changes: - _initialize_for_differentiable_input(X, y, rng): minimal preprocessing that uses PreprocessorConfig("none", differentiable=True), z-normalises y as a torch op (preserves grads), and rebuilds raw_space_bardist_ in the caller's target scale. Polynomial features are forced to "no" since the polynomial step relies on sklearn StandardScaler on numpy. - fit_with_differentiable_input(X, y): mirrors the classifier method; builds an InferenceEngineCachePreprocessing with inference_mode=False. - _iter_forward_executor: gates use_inference_mode on differentiable_input so a user calling forward(X, use_inference_mode=True) after fit_with_differentiable_input still gets gradients (parallel to the classifier's existing actual_inference_mode gate). - fit() now raises a clearer ValueError pointing users to the new method when differentiable_input=True, instead of silently converting torch tensors to numpy. Tests: - end-to-end gradient-flow test (CPU + CUDA): a loss computed from forward output produces a finite, non-zero gradient on an upstream nn.Linear's weight. - guard tests for fit() with differentiable_input=True and for categorical features under the differentiable path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address gemini-code-assist review on PR PriorLabs#923: the second fit call previously skipped re-normalising y, leaving y_train_mean_, y_train_std_, raw_space_bardist_ stuck on the first fit's stats — silently miscaling predictions when the new target distribution differed. Split _initialize_for_differentiable_input into: - _initialize_for_differentiable_input: first-call-only setup (categorical check, feature schema, ensemble configs). Cached in self.ensemble_configs_. - _refresh_targets_for_differentiable_input: per-call setup (validate_dataset_size, z-normalise y, rebuild raw_space_bardist_, update n_train_samples_). Runs on every fit. fit_with_differentiable_input's else branch now calls the per-call helper so subsequent fits track the current target distribution while still reusing the loaded model and ensemble configs. Add test__fit_with_differentiable_input__second_call_refreshes_target_stats that fits twice with very different y distributions and checks y_train_mean_, y_train_std_, and raw_space_bardist_.borders all move.

Fixes the medium-severity comments raised on the differentiable_input regressor path: 1. Feature instances per column: replace `[Feature(...)] * n_features` with a list comprehension so each column has its own dataclass and a later in-place update on one column does not leak across all columns. 2. y stats numerical robustness: switch `y_float.std()` (PyTorch's default `correction=1`, which differs from `np.std` and returns NaN for N=1) to `clamp(y_float.std(correction=0), min=1e-20)`. This matches the standard `fit()` path's `np.std` semantics and stays finite for single-sample input. 3. Constant-target guard: a constant y collapses the bardist borders to a single point and trips `FullSupportBarDistribution`'s strictly-increasing assertion. `fit()` short-circuits this with `is_constant_target_`; the differentiable path has no analogue, so reject up front with a clear ValueError pointing users at `fit()`. 4. Sequential preprocessing for diff input: force `n_preprocessing_jobs=1` inside `fit_with_differentiable_input`. When X carries an autograd graph, joblib's process-boundary pickling breaks the graph; sequential execution preserves it. The detach-then-`.item()` of `y_train_mean_/std_` is intentional and not changed: `raw_space_bardist_` is a frozen lookup buffer that should not hold a y-grad graph; users wanting fully differentiable target scaling should z-normalise y externally so mean/std become constants here. Documented inline. New tests: - feature_schema_columns_are_independent: catches the alias bug. - std_matches_population_definition: locks in `np.std` semantics. - constant_target_rejected: locks in the explicit guard. - single_sample_y_does_not_nan: confirms N=1 hits the guard cleanly rather than producing NaN deep in the bardist. All 9 differentiable_input tests pass on CPU and CUDA.

lujiazho · 2026-05-11T17:30:59Z

Hi @klemens-floege, just wanted to gently follow up on this PR. Would appreciate a review when you have time! I think this would make the regressor/classifier APIs much more consistent for differentiable workflows.

Thanks!

klemens-floege · 2026-05-11T17:34:08Z

@lujiazho thank you very much for opening a PR into our repository. I will not be able to review the PR before tmw afternoon, I apologize for the delay!

klemens-floege

Sorry for the delay we just released V3. Thanks for the PR -> the symmetric API is the right shape + allows prompt tuning etc.

Two things before this lands:

Blocking:

initialize_for_differentiable_input never sets self.n_estimators (uses self.n_estimators without the underscore). forward() / predict() then crash on tqdm(total=self.n_estimators_, …). Locally, test__fit_with_differentiable_input__grad_flows_to_upstream_module fails on both cpu and mps with AttributeError: 'TabPFNRegressor' object has no attribute 'n_estimators_' — contrary to the "9/9 pass" in the description. Fix: add self.n_estimators_ = len(ensemble_configs) in the first-call helper, and mirror classifier.py:946 in the else branch.

A more high level comment, you are PR is adding a lot of code, we should make sure the reduce code duplication as much as possible. Here some starting pointers, but will take a closer look once the tests pass:

Extract self._rebuild_raw_space_bardist() — same three lines appear in the standard fit path (line ~860) and the new diff path.
Extract a shared _build_cache_preprocessing_executor(...) — the bottom 25 lines of fit_with_differentiable_input are 90% the standard executor build; deltas are n_preprocessing_jobs and inference_mode.
Inline _refresh_targets_for_differentiable_input into fit_with_differentiable_input. It has two callers and the split makes the lifecycle harder to follow than it needs to be.
The categorical-features guard is duplicated verbatim with classifier.py:638 644 — push to a shared helper.
Consolidate the three "bad input raises ValueError" tests into one parametrized test.

Once the tests pass I'm happy to re-review.

The differentiable-input fit path on TabPFNRegressor never set self.n_estimators_, so forward() / predict() crashed on tqdm(total=...) with AttributeError. Two call sites were missing the assignment: 1. _initialize_for_differentiable_input now sets n_estimators_ via scale_n_estimators_for_feature_coverage, mirroring classifier.py:650. 2. fit_with_differentiable_input's else branch (subsequent fits) now re-asserts n_estimators_ from cached ensemble configs, mirroring classifier.py:948. The stale assert len(...) == self.n_estimators (missing underscore) is fixed at the same time.

Per klemens-floege review on PR PriorLabs#923. No behaviour change — same differentiable-input semantics, just less code duplication. - Share the categorical-features guard. New reject_categoricals_for_differentiable_input() in base.py replaces the identical inline checks in TabPFNClassifier and TabPFNRegressor. - Extract _rebuild_raw_space_bardist() on TabPFNRegressor. The same three-line construction (borders * std + mean as a FullSupportBarDistribution) appears in the standard fit path and the differentiable path; the helper detaches borders unconditionally so the buffer never holds a y autograd graph (no-op for the standard path). - Extract _build_ensemble_preprocessor_and_executor() on TabPFNRegressor. The two paths' executor-build blocks now share one method; deltas are only n_preprocessing_jobs (1 in the differentiable path so the autograd graph survives joblib's process-boundary pickling) and inference_mode (False under differentiable input). - Inline _refresh_targets_for_differentiable_input back into fit_with_differentiable_input. Lifecycle is clearer with the y-target validation, normalisation, and bardist rebuild laid out linearly after the first-call / cached-state branch. - Consolidate three bad-input ValueError tests into one pytest.parametrize covering categorical_features, constant_target, and single_sample cases.

lujiazho · 2026-05-15T03:57:00Z

@klemens-floege Thanks a lot for the detailed review! I’ve addressed the blocking issue by setting self.n_estimators_ properly in the differentiable-input initialization path and mirrored the classifier logic. I also tried reducing duplication based on your suggestions. The fixes/refactors are included in the latest commits. I’d really appreciate another look when you have time!

Copilot AI review requested due to automatic review settings May 7, 2026 02:17

lujiazho requested a review from a team as a code owner May 7, 2026 02:17

lujiazho requested review from klemens-floege and removed request for a team May 7, 2026 02:17

Copilot started reviewing on behalf of lujiazho May 7, 2026 02:18 View session

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

Comment thread src/tabpfn/regressor.py

Comment thread src/tabpfn/regressor.py Outdated

Comment thread src/tabpfn/regressor.py Outdated

Comment thread src/tabpfn/regressor.py Outdated

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread src/tabpfn/regressor.py Outdated

Comment thread src/tabpfn/regressor.py Outdated

Comment thread src/tabpfn/regressor.py Outdated

lujiazho added a commit to lujiazho/TabPFN that referenced this pull request May 7, 2026

Add changelog entry for PR PriorLabs#923

70d0210

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lujiazho and others added 5 commits May 8, 2026 17:28

Add changelog entry for PR PriorLabs#923

f8f97e2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fix ruff D209 and E501 in differentiable_input tests

bebc232

lujiazho force-pushed the gradient-flow branch from 5335f5e to bebc232 Compare May 9, 2026 01:03

Apply ruff format to regressor and test files

c25071a

klemens-floege reviewed May 12, 2026

View reviewed changes

lujiazho added 2 commits May 12, 2026 11:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support differentiable_input on TabPFNRegressor#923

Support differentiable_input on TabPFNRegressor#923
lujiazho wants to merge 8 commits into
PriorLabs:mainfrom
lujiazho:gradient-flow

lujiazho commented May 7, 2026 •

edited

Loading

Uh oh!

CLAassistant commented May 7, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot commented May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fitzdere6 commented May 7, 2026

Uh oh!

lujiazho commented May 11, 2026

Uh oh!

klemens-floege commented May 11, 2026

Uh oh!

klemens-floege left a comment

Uh oh!

lujiazho commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

lujiazho commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Motivation and Context

Public API Changes

How Has This Been Tested?

Checklist

Uh oh!

CLAassistant commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fitzdere6 commented May 7, 2026

Uh oh!

lujiazho commented May 11, 2026

Uh oh!

klemens-floege commented May 11, 2026

Uh oh!

klemens-floege left a comment

Choose a reason for hiding this comment

Uh oh!

lujiazho commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

lujiazho commented May 7, 2026 •

edited

Loading

CLAassistant commented May 7, 2026 •

edited

Loading