diff --git a/THIRD-PARTY-NOTICES.md b/THIRD-PARTY-NOTICES.md new file mode 100644 index 000000000..0e417bca7 --- /dev/null +++ b/THIRD-PARTY-NOTICES.md @@ -0,0 +1,39 @@ +# Third-Party Notices + +This file documents third-party code adapted into this repository, with +upstream attribution preserved. Transitive dependencies installed via +`pip` are governed by their own licenses (see `pyproject.toml` for the +canonical list). + +--- + +## Summary + +| Upstream | Local path | Upstream license | +|---|---|---| +| skrub — `SquashingScaler` | `src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py`<br>`src/tabpfn/preprocessing/torch/torch_squashing_scaler.py` | BSD-3-Clause | + +--- + +## Per-upstream notices + +### skrub — SquashingScaler + +**Upstream:** https://github.com/skrub-data/skrub +**Local paths:** +- `src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py` — CPU/scikit-learn implementation +- `src/tabpfn/preprocessing/torch/torch_squashing_scaler.py` — PyTorch port of the same algorithm, with explicit-state fit/apply semantics + +**License:** BSD-3-Clause +**Copyright:** Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers. All rights reserved. (per the skrub `LICENSE.txt`) +**Modifications:** Adapted to fit TabPFN's preprocessing pipeline; algorithmic logic preserved across both implementations. Upstream does not ship a per-file copyright header; attribution is carried in this NOTICE plus the in-file blocks. + +--- + +## Adding new entries + +When vendoring or adapting third-party code: + +1. Preserve any upstream per-file copyright and license header verbatim. If the upstream does not ship a per-file header, add an attribution block citing the upstream URL, copyright holder, and SPDX license identifier. +2. When vendoring a whole directory of upstream code, also vendor the upstream `LICENSE` / `NOTICE` file alongside it. For single-file adaptations, the in-file attribution plus the entry in this NOTICE file is sufficient. +3. Add a row to the summary table and a per-upstream notice to this file, including the upstream copyright line when one is published. diff --git a/changelog/964.added.md b/changelog/964.added.md new file mode 100644 index 000000000..edbea8220 --- /dev/null +++ b/changelog/964.added.md @@ -0,0 +1 @@ +Add `THIRD-PARTY-NOTICES.md` at repo root documenting third-party code adapted into TabPFN (currently: skrub's `SquashingScaler`, used by both the CPU and PyTorch preprocessing implementations) with upstream attribution preserved.
diff --git a/src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py b/src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py index 7ff176afe..6b3c0abdd 100644 --- a/src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py +++ b/src/tabpfn/preprocessing/steps/squashing_scaler_transformer.py @@ -1,6 +1,11 @@ """Implementation of the SquashingScaler, adapted from skrub. -See https://skrub-data.org/stable/reference/generated/skrub.SquashingScaler.html +Adapted from skrub: https://github.com/skrub-data/skrub + reference: https://skrub-data.org/stable/reference/generated/skrub.SquashingScaler.html + +Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers. +All rights reserved. +SPDX-License-Identifier: BSD-3-Clause This preprocessing is used e.g. in RealMLP, see https://arxiv.org/abs/2407.04491 """ diff --git a/src/tabpfn/preprocessing/torch/torch_squashing_scaler.py b/src/tabpfn/preprocessing/torch/torch_squashing_scaler.py index 356e02bb5..19a14ee26 100644 --- a/src/tabpfn/preprocessing/torch/torch_squashing_scaler.py +++ b/src/tabpfn/preprocessing/torch/torch_squashing_scaler.py @@ -3,10 +3,16 @@ """Torch implementation of SquashingScaler with NaN handling. Mirrors the CPU -:class:`tabpfn.preprocessing.steps.squashing_scaler_transformer.SquashingScaler`: -robust median-centering with quartile scaling (or a min-max fallback when the -inter-quartile range collapses), followed by an injective soft-clip into -``[-max_absolute_value, +max_absolute_value]``. +:class:`tabpfn.preprocessing.steps.squashing_scaler_transformer.SquashingScaler`, +which is itself adapted from skrub: + https://github.com/skrub-data/skrub +The algorithmic logic (robust median-centering with quartile scaling, min-max +fallback, soft-clip) is derived from skrub's ``SquashingScaler``. + +Original skrub attribution: + Copyright (c) 2018-2023, The dirty_cat developers, 2023-2026 the skrub developers. + All rights reserved. 
+ SPDX-License-Identifier: BSD-3-Clause The state is returned explicitly from ``fit`` rather than stored on the instance, matching the rest of ``preprocessing/torch``.
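For orientation, the algorithm the two adapted files share (robust median-centering with quartile scaling, a min-max fallback when the inter-quartile range collapses, then an injective soft-clip into `[-max_absolute_value, +max_absolute_value]`) can be sketched roughly as below. This is a hypothetical single-column NumPy illustration, not the actual skrub or TabPFN code; in particular the soft-clip formula `x / sqrt(1 + (x/B)^2)` is an assumption about how the injective squashing is realized:

```python
import numpy as np


def squashing_scale_sketch(col: np.ndarray, max_abs: float = 3.0) -> np.ndarray:
    """Hypothetical sketch of the SquashingScaler idea for one column.

    NaNs are excluded from the fitted statistics and propagate through the
    transform unchanged.
    """
    finite = col[np.isfinite(col)]
    median = np.median(finite)
    q1, q3 = np.percentile(finite, [25, 75])
    iqr = q3 - q1
    if iqr > 0:
        scale = 1.0 / iqr  # robust quartile scaling
    else:
        # min-max fallback when the inter-quartile range collapses
        lo, hi = finite.min(), finite.max()
        scale = 2.0 / (hi - lo) if hi > lo else 1.0
    x = (col - median) * scale
    # injective soft-clip into (-max_abs, +max_abs): strictly monotone,
    # so distinct inputs never collide, unlike a hard clip
    return x / np.sqrt(1.0 + (x / max_abs) ** 2)
```

A PyTorch port following the explicit-state convention described above would split this into a `fit` that returns `(median, scale)` and an `apply` that takes that state as an argument, rather than storing it on the instance.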