7 changes: 3 additions & 4 deletions README.md
@@ -496,15 +496,14 @@ pre-commit run --all-files
pytest tests/
```

## Anonymized Telemetry
## Telemetry

This project collects fully anonymous usage telemetry with an option to opt-out of any telemetry or opt-in to extended telemetry.
This project collects usage telemetry with an option to opt out.

The data is used exclusively to help us provide stability to the relevant products and compute environments and guide future improvements.

- **No personal data is collected**
- **Personal data is collected only if the user has provided consent and accepted the terms of service**
- **No code, model inputs, or outputs are ever sent**
- **Data is strictly anonymous and cannot be linked to individuals**

For details on telemetry, please see our [Telemetry Reference](https://github.com/PriorLabs/TabPFN/blob/main/TELEMETRY.md) and our [Privacy Policy](https://priorlabs.ai/privacy-policy/).

86 changes: 38 additions & 48 deletions TELEMETRY.md
@@ -1,71 +1,61 @@
# 📊 Telemetry
# Telemetry

This project includes lightweight, anonymous telemetry to help us improve TabPFN.
We've designed this with two goals in mind:
TabPFN includes lightweight, optional telemetry that helps us understand how the library is used and where to focus development. This page explains exactly what is collected, how it's handled, and how to opt out.

1. ✅ Be **fully GDPR-compliant** (no personal data, no sensitive data, no surprises)
2. ✅ Be **OSS-friendly and transparent** about what we track and why
## What we collect

If you'd rather not send telemetry, you can always opt out (see **Opting out**).
We gather high-level usage signals - enough to guide development, never enough to expose your data or code.

---
**Events**

## 🔍 What we collect
- `session` - sent when a TabPFN estimator is initialized
- `ping` - liveness check on model initialization
- `model_load` - sent when a model is loaded from disk or cache
- `fit_called` / `predict_called` - sent when you call `fit` or `predict`

We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data.
**Metadata (all events)**

Here's the full list:
- `tabpfn_version`, `python_version`, `numpy_version`, `pandas_version` - software versions
- `gpu_type` - GPU type TabPFN is running on
- `timestamp` - time of the event
- `install_date` - date TabPFN was first used (year-month-day)
- `install_id` - random, locally generated installation identifier (see "Privacy" below)
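An installation identifier like this is typically a random UUID generated once and persisted to a local file so it stays stable across sessions. A minimal sketch of the idea (the path, filename, and format TabPFN actually uses may differ):

```python
import uuid
from pathlib import Path


def get_install_id(cache_dir: Path) -> str:
    """Return a persistent random install ID, creating it on first use.

    The ID is a random UUID: it contains no personal data and cannot
    be traced back to a user or machine.
    """
    id_file = cache_dir / "install_id"
    if id_file.exists():
        return id_file.read_text().strip()
    install_id = str(uuid.uuid4())  # random, generated locally
    cache_dir.mkdir(parents=True, exist_ok=True)
    id_file.write_text(install_id)
    return install_id
```

Because the file is only written on first use, every later call returns the same ID.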

### Events
- `ping` – sent when models initialize, used to check liveness
- `fit_called` – sent when you call `fit`
- `predict_called` – sent when you call `predict`
- `session` - sent whenever a user initializes a TabPFN estimator.
**Additional metadata (fit / predict only)**

### Metadata (all events)
- `python_version` – version of Python you're running
- `tabpfn_version` – TabPFN package version
- `timestamp` – time of the event
- `numpy_vesion` - local Numpy version
- `pandas_version` - local Pandas version
- `gpu_type` - type of GPU TabPFN is running on.
- `install_date` - `year-month-day` when TabPFN was used for the first time
- `install_id` - unique, random and anonymous installation ID.
- `task` - classification or regression
- `num_rows`, `num_columns` - dataset shape, rounded into ranges (exact values are never recorded)
- `duration_ms` - wall-clock time of the call

### Extra metadata (`fit` / `predict` only)
- `task` – whether the call was for **classification** or **regression**
- `num_rows` – *rounded* number of rows in your dataset
- `num_columns` – *rounded* number of columns in your dataset
- `duration_ms` – time it took to complete the call
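The rounding of dataset dimensions can be pictured as snapping each value up to a coarse bucket boundary so that exact sizes are never recorded. A hypothetical sketch consistent with the `(953, 17) -> (1000, 20)` example (the actual bucketing scheme is not specified here):

```python
def round_to_range(n: int) -> int:
    """Round a dataset dimension up to a coarse bucket.

    Illustrative scheme: round up to one significant figure,
    e.g. 953 -> 1000 and 17 -> 20, so exact shapes are hidden.
    """
    if n <= 0:
        return 0
    magnitude = 10 ** (len(str(n)) - 1)  # e.g. 953 -> 100, 17 -> 10
    return -(-n // magnitude) * magnitude  # ceiling division, then rescale
```

With this kind of bucketing, many distinct datasets map to the same reported shape, which is what makes linking a telemetry event back to a specific dataset infeasible.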
## What we never collect

---
Regardless of account status, we never collect:

## 🛡️ How we protect your privacy
- Training data, features, labels, or model outputs
- File paths, environment variables, or hostnames
- Exact dataset dimensions
- Code of any kind

- **No inputs, no outputs, no code** ever leave your machine.
- **No personal data** is collected.
- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you.
- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets.
No inputs, outputs, or model weights ever leave your machine.

This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed.
## Privacy

---
TabPFN operates in two modes with different privacy properties:

## 🤔 Why collect telemetry?
**Without an account (anonymous).** Telemetry is tied only to a random `install_id` generated locally on first use. This ID is not linked to any personal information and cannot be traced back to you.

Open-source projects don't get much feedback unless people file issues. Telemetry helps us:
- See which parts of TabPFN are most used (fit vs predict, classification vs regression)
- Detect performance bottlenecks and stability issues
- Prioritize improvements that benefit the most users
**With an account (pseudonymous).** If you create a TabPFN account, your `user_id` is included in telemetry events.

This information goes directly into **making TabPFN better** for the community.
For further details, see our [privacy policy](https://priorlabs.ai/privacy-policy).

---
## Opting out

## 🚫 Opting out

Don't want to send telemetry? No problem — just set the environment variable:
Set one environment variable to disable all telemetry:

```bash
export TABPFN_DISABLE_TELEMETRY=1
```
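If exporting the variable in your shell is inconvenient, it can also be set from Python before `tabpfn` is imported (a usage sketch; only the variable name comes from the docs above):

```python
import os

# Must be set before tabpfn is imported, so telemetry is never initialized.
os.environ["TABPFN_DISABLE_TELEMETRY"] = "1"

# import tabpfn  # safe to import afterwards; telemetry stays disabled
```

Setting it in the process environment this way only affects the current Python process, which is handy for notebooks and tests.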

## Why collect telemetry?

Open-source projects get limited feedback unless people file issues. Telemetry helps us see which parts of TabPFN are most used, detect performance bottlenecks, and prioritize improvements that benefit the most users.
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -21,8 +21,9 @@ dependencies = [
# Once Python 3.10 is the minimum version, this can be removed.
"eval-type-backport>=0.2.2",
"joblib>=1.2.0",
"tabpfn-common-utils[telemetry-interactive]>=0.2.13",
"tabpfn-common-utils[telemetry-interactive]>=0.2.19",
"filelock>=3.11.0",
"pyjwt>=2.12.1",
]
requires-python = ">=3.9"
authors = [
11 changes: 0 additions & 11 deletions src/tabpfn/base.py
@@ -14,7 +14,6 @@
from sklearn.base import (
check_is_fitted,
)
from tabpfn_common_utils.telemetry.interactive import capture_session, ping

# --- TabPFN imports ---
from tabpfn.constants import (
@@ -418,16 +417,6 @@ def estimator_to_device(
return byte_size


def initialize_telemetry() -> None:
"""Initialize telemetry and acknowledge anonymous session.

If user opted out of telemetry using `TABPFN_DISABLE_TELEMETRY`,
no action is taken.
"""
ping()
capture_session()


def get_embeddings(
model: TabPFNClassifier | TabPFNRegressor,
X: XType,
8 changes: 5 additions & 3 deletions src/tabpfn/classifier.py
@@ -30,7 +30,6 @@
import torch
from sklearn import config_context
from sklearn.base import BaseEstimator, ClassifierMixin, check_is_fitted
from tabpfn_common_utils.telemetry import track_model_call

from tabpfn.base import (
ClassifierModelSpecs,
@@ -39,7 +38,6 @@
estimator_to_device,
get_embeddings,
initialize_model_variables_helper,
initialize_telemetry,
)
from tabpfn.constants import (
PROBABILITY_EPSILON_ROUND_ZERO,
@@ -81,6 +79,10 @@
from tabpfn.preprocessing.ensemble import TabPFNEnsemblePreprocessor
from tabpfn.preprocessing.label_encoder import TabPFNLabelEncoder
from tabpfn.preprocessing.modality_detection import detect_feature_modalities
from tabpfn.telemetry import (
init as init_telemetry,
track_model_call,
)
from tabpfn.utils import (
DevicesSpecification,
balance_probas_by_class_counts,
@@ -482,7 +484,7 @@ class in Fine-Tuning. The fit_from_preprocessed() function sets this
self.n_preprocessing_jobs = n_preprocessing_jobs
self.eval_metric = eval_metric
self.tuning_config = tuning_config
initialize_telemetry()
init_telemetry()

# Only anonymously record `fit_mode` usage
log_model_init_params(self, {"fit_mode": self.fit_mode})
4 changes: 2 additions & 2 deletions src/tabpfn/model_loading.py
@@ -26,7 +26,6 @@
import joblib
import torch
from filelock import FileLock
from tabpfn_common_utils.telemetry import set_model_config
from torch import nn

from tabpfn.architectures import ARCHITECTURES
@@ -36,6 +35,7 @@
from tabpfn.inference import InferenceEngine
from tabpfn.inference_config import InferenceConfig
from tabpfn.settings import settings
from tabpfn.telemetry import set_model_config

if TYPE_CHECKING:
from sklearn.base import BaseEstimator
@@ -767,7 +767,7 @@ def log_model_init_params(
# We conditionally import here to avoid introducing breaking changes as
# this interface was introduced in tabpfn_common_utils 0.2.13 and not all
# users have upgraded to this version yet.
from tabpfn_common_utils.telemetry import set_init_params # noqa: PLC0415
from tabpfn.telemetry import set_init_params # noqa: PLC0415

set_init_params(logged_params)
except ImportError:
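The conditional import in the hunk above follows a common graceful-degradation pattern: the telemetry helper is resolved at call time, and a missing or outdated dependency silently turns the call into a no-op instead of breaking user code. A standalone sketch of the pattern (the module name `hypothetical_telemetry` is illustrative, not TabPFN's actual API):

```python
def log_params_safely(params: dict) -> None:
    """Forward params to an optional telemetry backend; no-op if unavailable."""
    try:
        # Deferred import: older installs may not ship this module at all.
        from hypothetical_telemetry import set_init_params  # noqa: PLC0415
    except ImportError:
        return  # telemetry backend absent -- silently skip
    set_init_params(params)
```

Deferring the import to the call site is what keeps the package importable even when the optional telemetry dependency is absent or too old.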
8 changes: 5 additions & 3 deletions src/tabpfn/regressor.py
@@ -35,7 +35,6 @@
TransformerMixin,
check_is_fitted,
)
from tabpfn_common_utils.telemetry import track_model_call

from tabpfn.architectures.base.bar_distribution import FullSupportBarDistribution
from tabpfn.base import (
@@ -45,7 +44,6 @@
estimator_to_device,
get_embeddings,
initialize_model_variables_helper,
initialize_telemetry,
)
from tabpfn.constants import REGRESSION_CONSTANT_TARGET_BORDER_EPSILON, ModelVersion
from tabpfn.errors import TabPFNValidationError, handle_oom_errors
@@ -70,6 +68,10 @@
from tabpfn.preprocessing.steps import (
get_all_reshape_feature_distribution_preprocessors,
)
from tabpfn.telemetry import (
init as init_telemetry,
track_model_call,
)
from tabpfn.utils import (
DevicesSpecification,
convert_batch_of_cat_ix_to_schema,
@@ -466,7 +468,7 @@ class in Fine-Tuning. The fit_from_preprocessed() function sets this
)
self.n_jobs = n_jobs
self.n_preprocessing_jobs = n_preprocessing_jobs
initialize_telemetry()
init_telemetry()

# Only anonymously record `fit_mode` usage
log_model_init_params(self, {"fit_mode": self.fit_mode})