39 changes: 39 additions & 0 deletions THIRD-PARTY-NOTICES.md
@@ -0,0 +1,39 @@
# Third-Party Notices

This file documents third-party code adapted into this repository, with
upstream attribution preserved. Transitive dependencies installed via
`pip` are governed by their own licenses (see `pyproject.toml` for the
canonical list).

---

## Summary

| Upstream | Local path | Upstream license |
|---|---|---|
| skrub — `SquashingScaler` | `src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py`<br>`src/tabpfn/preprocessing/torch/torch_squashing_scaler.py` | BSD-3-Clause |

---

## Per-upstream notices

### skrub — SquashingScaler

**Upstream:** https://github.com/skrub-data/skrub
**Local paths:**
- `src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py` — CPU/scikit-learn implementation
- `src/tabpfn/preprocessing/torch/torch_squashing_scaler.py` — PyTorch port of the same algorithm, with explicit-state fit/apply semantics

**License:** BSD-3-Clause
**Copyright:** Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers. All rights reserved. (per the skrub `LICENSE.txt`)
**Modifications:** Adapted to fit TabPFN's preprocessing pipeline; algorithmic logic preserved across both implementations. Upstream does not ship a per-file copyright header; attribution is carried in this NOTICE plus the in-file blocks.

---

## Adding new entries

When vendoring or adapting third-party code:

1. Preserve any upstream per-file copyright and license header verbatim. If the upstream does not ship a per-file header, add an attribution block citing the upstream URL, copyright holder, and SPDX license identifier.
2. When vendoring a whole directory of upstream code, also vendor the upstream `LICENSE` / `NOTICE` file alongside it. For single-file adaptations, the in-file attribution plus the entry in this NOTICE file is sufficient.
3. Add a row to the summary table and a per-upstream notice to this file, including the upstream copyright line when one is published.
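As a rough illustration of step 1, the sketch below assembles an attribution block for a file whose upstream ships no per-file header. The helper name `attribution_block` and its parameters are hypothetical and not part of this repository; the layout simply mirrors the in-file blocks used for the skrub entry above.

```python
def attribution_block(upstream_url: str, copyright_line: str, spdx_id: str) -> str:
    """Build a plain-text attribution block (hypothetical helper).

    The result is meant to be pasted into a module docstring or a comment
    header at the top of the adapted file.
    """
    lines = [
        f"Adapted from: {upstream_url}",
        "",
        copyright_line,
        "All rights reserved.",
        f"SPDX-License-Identifier: {spdx_id}",
    ]
    return "\n".join(lines)


block = attribution_block(
    upstream_url="https://github.com/skrub-data/skrub",
    copyright_line=(
        "Copyright (c) 2018-2023, The dirty_cat developers, "
        "2023-2026 the skrub developers."
    ),
    spdx_id="BSD-3-Clause",
)
print(block)
```

Keeping the SPDX identifier on its own line makes the block easy to find with a plain text search across the repository.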
1 change: 1 addition & 0 deletions changelog/964.added.md
@@ -0,0 +1 @@
Add `THIRD-PARTY-NOTICES.md` at repo root documenting third-party code adapted into TabPFN (currently: skrub's `SquashingScaler`, used by both the CPU and PyTorch preprocessing implementations) with upstream attribution preserved.
@@ -1,6 +1,11 @@
"""Implementation of the SquashingScaler, adapted from skrub.

See https://skrub-data.org/stable/reference/generated/skrub.SquashingScaler.html
Adapted from skrub: https://github.com/skrub-data/skrub
reference: https://skrub-data.org/stable/reference/generated/skrub.SquashingScaler.html

Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers.
All rights reserved.
SPDX-License-Identifier: BSD-3-Clause

This preprocessing is used e.g. in RealMLP, see https://arxiv.org/abs/2407.04491
"""
14 changes: 10 additions & 4 deletions src/tabpfn/preprocessing/torch/torch_squashing_scaler.py
@@ -3,10 +3,16 @@
"""Torch implementation of SquashingScaler with NaN handling.

Mirrors the CPU
:class:`tabpfn.preprocessing.steps.squashing_scaler_transformer.SquashingScaler`:
robust median-centering with quartile scaling (or a min-max fallback when the
inter-quartile range collapses), followed by an injective soft-clip into
``[-max_absolute_value, +max_absolute_value]``.
:class:`tabpfn.preprocessing.steps.squashing_scaler_transformer.SquashingScaler`,
which is itself adapted from skrub:
https://github.com/skrub-data/skrub
The algorithmic logic (robust median-centering with quartile scaling, min-max
fallback, soft-clip) is derived from skrub's ``SquashingScaler``.

Original skrub attribution:
Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers.
All rights reserved.
SPDX-License-Identifier: BSD-3-Clause

The state is returned explicitly from ``fit`` rather than stored on the
instance, matching the rest of ``preprocessing/torch``.
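
The algorithm summarized in this docstring can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the TabPFN or skrub implementation: the function names `fit_squashing_state` and `apply_squashing`, the dictionary-based state, the half-range fallback scale, and the default `max_absolute_value=3.0` are all assumptions made for the example. The soft-clip form `z / sqrt(1 + (z / B)**2)` is one injective function that maps into the open interval `(-B, B)`.

```python
import numpy as np


def fit_squashing_state(X: np.ndarray) -> dict:
    """Compute per-column center and scale, ignoring NaNs (illustrative)."""
    center = np.nanmedian(X, axis=0)
    q25, q75 = np.nanpercentile(X, [25.0, 75.0], axis=0)
    iqr = q75 - q25
    value_range = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
    # Assumed fallback: when the inter-quartile range collapses,
    # scale by half of the min-max range instead.
    scale = np.where(iqr > 0, iqr, value_range / 2.0)
    # Constant columns: use scale 1 so every value maps to zero.
    scale = np.where(scale > 0, scale, 1.0)
    # State is returned explicitly rather than stored on an object.
    return {"center": center, "scale": scale}


def apply_squashing(
    X: np.ndarray, state: dict, max_absolute_value: float = 3.0
) -> np.ndarray:
    """Robust-scale, then soft-clip into (-max_absolute_value, +max_absolute_value)."""
    z = (X - state["center"]) / state["scale"]
    # Strictly increasing in z, so distinct inputs stay distinct; NaNs pass through.
    return z / np.sqrt(1.0 + (z / max_absolute_value) ** 2)


X = np.array(
    [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [1000.0, np.nan]]
)
state = fit_squashing_state(X)
out = apply_squashing(X, state)
```

Returning the state from the fit step, instead of mutating an estimator, keeps the apply step a pure function of its inputs, which is the explicit-state convention the docstring describes.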