Release 0.4.1: fix sklearn 1.8 ImportError #286
scikit-learn 1.8 renamed `sklearn.utils.validation._is_pandas_df` to `is_pandas_df` (dropping the leading underscore). `misc/sklearn_compat.py` imports the old name at module top level, so on a fresh `pip install tabpfn-extensions` that resolves sklearn>=1.8, every `from tabpfn_extensions import ...` immediately raises `ImportError`.

Make the import resilient: try `_is_pandas_df`, fall back to `is_pandas_df`, and finally fall back to the same pure-Python implementation already used in the older-sklearn branch.

Caught by a TestPyPI smoke install of 0.4.0 before publishing to real PyPI; 0.4.0 was never published, so this bumps straight to 0.4.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request updates the project version to 0.4.1 and addresses an import failure caused by the renaming of _is_pandas_df in scikit-learn 1.8+. The fix includes a compatibility shim with a pure-Python fallback. Feedback highlights a missing sys import in the fallback function, a version mismatch in the uv.lock file, and the accidental removal of platform-specific wheels, suggesting a regeneration of the lock file for all architectures.
src/tabpfn_extensions/misc/sklearn_compat.py (273-279)
The fallback implementation of _is_pandas_df uses sys.modules, but sys is not imported in this scope. This will cause a NameError if the fallback path is ever executed. Adding a local import ensures the fallback works correctly.
```python
def _is_pandas_df(X):
    """Return True if X is a pandas DataFrame."""
    import sys

    try:
        pd = sys.modules["pandas"]
    except KeyError:
        return False
    return isinstance(X, pd.DataFrame)
```
uv.lock (3356)
There is a version mismatch in the lock file. `pyproject.toml` is being updated to 0.4.1, but `uv.lock` is only being updated to 0.4.0. Please run `uv lock` to ensure the lock file correctly reflects the project version.
uv.lock (299-300)
Multiple platform-specific wheels (e.g., ppc64le, s390x) are being removed from the lock file. This often happens when the lock file is updated in an environment that doesn't resolve all supported architectures. If these platforms are still supported, please regenerate the lock file using `uv lock --all-platforms`.
Replaces the targeted three-line shim from the prior commit with the upstream sklearn-compat 0.1.5 file. The previously vendored copy was at 0.1.3 (March 2025) and predated scikit-learn 1.8, which renamed `sklearn.utils.validation._is_pandas_df` -> `is_pandas_df`. Upstream 0.1.5 ships a proper "# Upgrading for scikit-learn 1.8" block plus general cleanups (typing, docstrings, etc.).

Only `validate_data` is consumed downstream (by `many_class` and `hpo`), and its signature hasn't changed across these point releases. Verified that the full public API imports cleanly against sklearn 1.8 in a clean TestPyPI-style smoke venv.

Also: add `# ruff: noqa` to the file. It's vendored from upstream, so we don't want our linter pushing us to modify it (we'd just have to redo the same hand-edits on every re-vendor). `# mypy: ignore-errors` was already there for the same reason.

Source: https://github.com/sklearn-compat/sklearn-compat/blob/0.1.5/src/sklearn_compat/_sklearn_compat.py

(NB: the docstring version string in upstream 0.1.5 still says "Version: 0.1.4"; they don't always bump the docstring with the tag. Set to 0.1.5 in our copy to match the tag we pulled from.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
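The clean-venv smoke check can be approximated with a small standard-library helper. This is a hypothetical sketch, not the script used for this release; which module names to pass (for example `tabpfn_extensions` or its submodules) is up to the caller.

```python
import importlib


def smoke_import(module_names: list[str]) -> dict[str, str]:
    """Import each module and return {name: error summary} for any that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, but also errors raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run it inside the fresh venv, e.g. `smoke_import(["tabpfn_extensions"])`; an empty dict means every module imported. Catching `Exception` rather than only `ImportError` also surfaces other errors raised while a module runs its top-level code.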
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
LeoGrin left a comment:
lgtm! You updated sklearn_compat to the latest version, right?
@LeoGrin yes, indeed
Ben caught that my previous re-vendor commit wasn't actually 1:1 with upstream: pre-commit ruff had auto-fixed 48 things during one of the intermediate commit attempts (stripping `# noqa: F401` markers, merging adjacent imports, adding ":" to "Returns" docstring headers, etc.).

Re-fetch from https://github.com/sklearn-compat/sklearn-compat/blob/0.1.5/src/sklearn_compat/_sklearn_compat.py and replace the file. The diff against upstream now reduces to one line: the docstring `Version: 0.1.4` -> `Version: 0.1.5` (upstream forgot to bump the docstring with the 0.1.5 tag), plus a trailing newline at EOF added by pre-commit's end-of-file-fixer.

To keep it clean going forward, add the file to `[tool.ruff] extend-exclude` in `pyproject.toml` so both `ruff check` and `ruff format` skip it (`# ruff: noqa` only covers linting, not the formatter). The ruff-pre-commit hooks already pass `--force-exclude` internally, so the exclude is honored even when pre-commit passes the file as a positional argument; no `.pre-commit-config.yaml` change is needed.

Verified that the full public-API import chain and `ManyClassClassifier` fit/predict still work against sklearn 1.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
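The "diff against upstream reduces to one line" check can be reproduced with the standard library. A minimal sketch, assuming both file contents are already available as strings (fetching the upstream file is left out); the helper name is illustrative.

```python
import difflib


def changed_lines(upstream: str, vendored: str) -> list[str]:
    """Return the added/removed lines of a unified diff between two file versions."""
    diff = list(
        difflib.unified_diff(upstream.splitlines(), vendored.splitlines(), lineterm="")
    )
    # When the inputs differ, diff[0:2] are the '---'/'+++' file headers;
    # after that, keep only '-' and '+' lines (dropping context and '@@' hunk markers).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

Because `splitlines()` drops line endings, the end-of-file newline added by pre-commit does not show up in this comparison; only the `Version:` docstring change would.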
Hotfix on top of #285 (v0.4.0). Caught by a TestPyPI smoke install before publishing 0.4.0 to real PyPI; v0.4.0 was never published to PyPI, so we're going straight to v0.4.1.
What broke
scikit-learn 1.8.0 (released recently; the latest version resolved on a fresh install) renamed `sklearn.utils.validation._is_pandas_df` → `is_pandas_df` (no underscore). `src/tabpfn_extensions/misc/sklearn_compat.py` imports the old name at module top level, so any `from tabpfn_extensions import ...` on sklearn ≥ 1.8 immediately raises `ImportError`. That breaks the entire public API for anyone resolving sklearn 1.8, which is the default on a clean `pip install tabpfn-extensions` today.

Fix
`sklearn_compat.py` now tries `_is_pandas_df`, falls back to `is_pandas_df`, and finally falls back to the same pure-Python implementation already used in the older-sklearn branch. Three-line defensive shim.

Verification
Reproduced the original bug with a clean venv:
`pip install -i https://test.pypi.org/simple/ tabpfn-extensions==0.4.0` resolves sklearn 1.8 and crashes on import. After this patch (installed locally), the same import chain succeeds on sklearn 1.8.

Smoke results post-patch:
- `from tabpfn_extensions import *` on sklearn 1.8
- `warn_if_no_kv_cache` silent on non-TabPFN estimator
- `SurvivalTabPFN` import
- `TabEBM` import

Release flow after merge
1. `git tag v0.4.1 <merge-commit> && git push origin v0.4.1`
2. `python -m build && twine upload --repository testpypi dist/*` → install in a clean venv to re-verify.
3. `twine upload dist/*`.
4. `[0.4.1]` section of `CHANGELOG.md`.

🤖 Generated with Claude Code