
Additive Terms #158

Open
simon-hirsch wants to merge 239 commits into main from structured-additive-model

Conversation

@simon-hirsch (Owner)

No description provided.


simon-hirsch commented Nov 12, 2025

Questions

Questions I'm encountering during development

  • What about global model selection?
  • Do we want to push forget to the EstimationMethod? I think we don't, so that we can have different forgets for different terms that are estimated jointly.
  • Do we want to save the first result of each inner iteration in terms, regardless of performance, to have a fall-back?

Remaining To-Do

  • Make non-constant start values
  • Generalize Terms a bit more, such that all terms using a certain estimation method can be estimated jointly
  • Add a weight to coordinate descent for the regularization, to allow regularized estimation of centered but unscaled variables with correct relative penalization
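The last to-do (per-coefficient regularization weights) can be sketched as a weighted soft-threshold inside the cyclic coordinate-descent update. This is an illustrative sketch, not ondil's implementation; all names are hypothetical, and the idea is simply that `penalty_weight[j]` rescales the effective lambda per coefficient, e.g. by a feature's standard deviation for centered but unscaled inputs.

```python
import numpy as np

def soft_threshold(z: float, gamma: float) -> float:
    """Standard lasso soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def cd_step(X, y, beta, lam, penalty_weight):
    """One cyclic coordinate-descent pass for a weighted lasso.

    penalty_weight[j] rescales lambda for coefficient j, so features on
    different scales receive a comparable relative penalty (sketch only).
    """
    n = X.shape[0]
    for j in range(X.shape[1]):
        # Partial residual excluding coefficient j
        r = y - X @ beta + X[:, j] * beta[j]
        rho = X[:, j] @ r / n
        denom = X[:, j] @ X[:, j] / n
        beta[j] = soft_threshold(rho, lam * penalty_weight[j]) / denom
    return beta
```

With `penalty_weight` set to all ones this reduces to plain lasso coordinate descent; a large weight on one coefficient shrinks it to zero while leaving the others unpenalized.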

@simon-hirsch simon-hirsch requested a review from BerriJ November 14, 2025 13:53
simon-hirsch and others added 25 commits February 4, 2026 14:13
Copilot AI review requested due to automatic review settings April 22, 2026 12:59
Copilot AI (Contributor) left a comment


Pull request overview

This PR introduces a “terms + feature transformations” architecture to support structured additive distributional regression (including time-series/autoregressive terms), expands estimation method capabilities (bounds/constraints, extra kwargs), and adds/updates several distributions and supporting utilities.

Changes:

  • Added new ondil.terms module (linear / regularized linear, time-series terms, feature transformations, and scikit-learn estimator wrapper).
  • Added a new estimator OnlineStructuredAdditiveDistributionRegressor implementing an additive IRLS-style fit/update loop.
  • Extended estimation methods to accept additional kwargs (bounds, constraints, regularization weights), introduced constrained coordinate descent utilities, and added logging infrastructure (Loguru).

Reviewed changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated 19 comments.

Summary per file:

  • tests/distributions/test_dist.py: Adds debug output and richer assert messages for distribution function parity vs R
  • tests/distributions/test_derivatives.py: Adds debug output for derivative parity tests
  • src/ondil/terms/time_series.py: New time-series term implementations (regularized + online updates)
  • src/ondil/terms/special.py: Adds a Term wrapper around scikit-learn estimators
  • src/ondil/terms/linear.py: Introduces linear / regularized linear terms and shared fit/update logic
  • src/ondil/terms/features.py: Adds feature transformation classes, including lagged time-series features
  • src/ondil/terms/__init__.py: Exposes terms/features in the public ondil.terms API
  • src/ondil/methods/ridge.py: Extends ridge/CD handling for bounds/weights and adds a constrained CD implementation
  • src/ondil/methods/recursive_least_squares.py: Adjusts method signatures to accept kwargs and adds logging
  • src/ondil/methods/linear_constrained.py: Adds constrained coordinate descent + constrained elastic net path method
  • src/ondil/methods/factory.py: Adds "ocd" method option and tightens typing for method strings
  • src/ondil/methods/elasticnet.py: Refactors regularization weights handling and adds logging/kwargs plumbing
  • src/ondil/methods/__init__.py: Re-exports newly added methods (constraint + coordinate descent variants)
  • src/ondil/logging.py: Adds package-level logging configuration (Loguru-based)
  • src/ondil/links/identitylinks.py: Adds LowerTruncatedIdentity link
  • src/ondil/links/__init__.py: Exposes LowerTruncatedIdentity in the public link API
  • src/ondil/incremental_statistics.py: Adds online mean/variance utilities (used for weighted regularization)
  • src/ondil/estimators/online_struct_add_distreg.py: Adds the structured additive distribution regressor estimator
  • src/ondil/estimators/online_linear_model.py: Updates fit call to pass named args to fit_beta
  • src/ondil/estimators/__init__.py: Exports the new structured additive estimator
  • src/ondil/distributions/studentt.py: Adds start-value mixing/optimism and constant initial values
  • src/ondil/distributions/skew_t.py: Adds start-value mixing and constant/dynamic initial-values logic
  • src/ondil/distributions/poisson.py: Adds start-value mixing and constant initial values
  • src/ondil/distributions/normal.py: Adds start-value mixing and constant initial values for Normal (+ MeanVariance)
  • src/ondil/distributions/negative_binomial.py: Adds a new Negative Binomial distribution
  • src/ondil/distributions/johnsonsu.py: Adds start-value mixing and splits constant vs dynamic initial values
  • src/ondil/distributions/__init__.py: Exposes NegativeBinomial in the public API
  • src/ondil/design_matrix.py: Adds make_lags() and reworks some utilities
  • src/ondil/coordinate_descent/utils.py: Extracts shared CD helpers (soft_threshold, get_start_beta)
  • src/ondil/coordinate_descent/cd_linear_constrained.py: Adds constrained coordinate descent kernels (numba)
  • src/ondil/coordinate_descent/cd_base.py: Refactors base CD to use shared helpers
  • src/ondil/coordinate_descent/__init__.py: Exposes coordinate descent functions in a proper package
  • src/ondil/base/terms.py: Introduces Term and FeatureTransformation base classes and helper logic
  • src/ondil/base/estimation_method.py: Refactors method interfaces to accept **kwargs
  • src/ondil/base/distribution.py: Adds _svm helper and distribution variance helper
  • src/ondil/base/__init__.py: Exposes Term from ondil.base
  • src/ondil/__init__.py: Imports new submodules and configures logging at import time
  • pyproject.toml: Adds loguru dependency
  • mkdocs.yml: Adds a new docs page for terms and features
  • docs/terms_and_features.md: Documents the new estimator → equation → terms → features hierarchy
  • docs/coordinate_descent.md: Fixes doc reference to soft_threshold after refactor
  • design/terms.md: Design notes for terms/feature transformations
  • design/additive_models.md: Design proposal for additive models in ondil
  • .gitignore: Normalizes formatting (adds line number alignment)


Comment thread src/ondil/terms/linear.py
Comment on lines +571 to +577
```python
if self.is_regularized:
    is_regularized = self.is_regularized
else:
    n_features = X_mat.shape[1]
    is_regularized = np.repeat(True, n_features)
    if self.fit_intercept and not self.regularize_intercept:
        is_regularized[0] = False
```

Copilot AI Apr 22, 2026


Same issue in the path-based model-selection code: if self.is_regularized: is ambiguous when is_regularized is an ndarray mask. This will prevent users from providing per-coefficient regularization masks. Use if self.is_regularized is not None: (and validate the length matches the design matrix).
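The suggested fix can be sketched as a small helper that treats the mask as optional and validates its length; the function name and defaults are illustrative, not ondil's API.

```python
import numpy as np

def resolve_regularization_mask(is_regularized, n_features,
                                fit_intercept=True, regularize_intercept=False):
    """Return a boolean per-coefficient regularization mask.

    Uses an explicit `is not None` check so a user-supplied ndarray mask
    is never evaluated for truthiness (which raises for arrays of size > 1).
    """
    if is_regularized is not None:
        mask = np.asarray(is_regularized, dtype=bool)
        if mask.shape[0] != n_features:
            raise ValueError(
                f"is_regularized has length {mask.shape[0]}, expected {n_features}"
            )
        return mask
    # Default: regularize everything except (optionally) the intercept
    mask = np.repeat(True, n_features)
    if fit_intercept and not regularize_intercept:
        mask[0] = False
    return mask
```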

Comment on lines +273 to 277
def fit_beta(self, x_gram, y_gram, **kwargs):
pass

def update_beta(self, x_gram, y_gram, is_regularized):
def update_beta(self, x_gram, y_gram, **kwargs):
pass

Copilot AI Apr 22, 2026


ElasticNetPath.fit_beta() / update_beta() are currently pass, so if they are ever called they will silently return None instead of failing fast. Since this is a path-based method, these should raise NotImplementedError (or delegate to an appropriate single-lambda implementation) to avoid hard-to-debug downstream None errors.
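A minimal fail-fast sketch of the suggested change (the stub class name is illustrative; the real class is ElasticNetPath):

```python
class ElasticNetPathStub:
    """Path-based method: single-lambda entry points should fail fast."""

    def fit_beta(self, x_gram, y_gram, **kwargs):
        raise NotImplementedError(
            "This path-based method does not implement single-lambda fit_beta."
        )

    def update_beta(self, x_gram, y_gram, **kwargs):
        raise NotImplementedError(
            "This path-based method does not implement single-lambda update_beta."
        )
```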

Comment on lines +555 to +557
self._method = get_estimation_method(self.method)
if not self._method._path_based_method:
raise ValueError("Non-Path-based methods are not supported for LinearTerm.")

Copilot AI Apr 22, 2026


Same issue here: the exception says "...for LinearTerm" inside RegularizedTimeSeriesTerm._prepare_term(), but this class is RegularizedTimeSeriesTerm. Adjust the message so it points to the correct term type.

Comment on lines +49 to +86
```python
class OnlineStructuredAdditiveDistributionRegressor(
    OndilEstimatorMixin,
    RegressorMixin,
    BaseEstimator,
):
    _parameter_constraints = {
        "distribution": [Distribution],
        "estimation_method": [EstimationMethod],
        "terms": [dict, type(None)],
        "scaler": [OnlineScaler, None],
    }

    def __init__(
        self,
        distribution: Distribution = Normal(),
        terms: Optional[Dict[str, Any]] = None,
        scale_inputs: bool = True,
        learning_rate: float = 0.0,
        max_outer_iterations: int = 10,
        max_inner_iterations: int = 10,
        rel_tol_outer: float = 1e-3,
        rel_tol_inner: float = 1e-3,
        abs_tol_outer: float = 1e-3,
        abs_tol_inner: float = 1e-3,
        step_size: float = 1.0,
    ):
        self.distribution = distribution
        self.terms = terms
        self.scale_inputs = scale_inputs
        self.learning_rate = learning_rate
        self.max_outer_iterations = max_outer_iterations
        self.max_inner_iterations = max_inner_iterations
        self.rel_tol_outer = rel_tol_outer
        self.rel_tol_inner = rel_tol_inner
        self.abs_tol_outer = abs_tol_outer
        self.abs_tol_inner = abs_tol_inner
        self.step_size = step_size
```

Copilot AI Apr 22, 2026


OnlineStructuredAdditiveDistributionRegressor introduces a large new estimator API (fit/update/predict, term orchestration, convergence logic), but there are no unit tests covering basic fit/update behavior or failure modes (e.g. step-size reduction, term state handling). Since the repo already has estimator-level tests, adding a minimal smoke test (fit + update on a small synthetic dataset) would help prevent regressions.

Comment on lines +36 to +49
```python
if isinstance(lags, int):
    logger.trace(f"Got integer lags, converting to list. Lags: {lags}")
    lags = np.linspace(1, lags, lags, dtype=int)
    logger.trace(f"Converted lags: {lags}")
n_lags = len(lags)

if n_lags == 0:
    logger.warning("No lags specified. Expect a crash.")
if np.any(lags < 1):
    logger.warning("We have lags smaller than 1. Are you sure?")

X = np.hstack([np.roll(y[:, None], i) for i in lags])
X[np.triu_indices(n_lags, k=0)] = np.mean(y)
```


Copilot AI Apr 22, 2026


make_lags() masks the rolled values using X[np.triu_indices(n_lags, k=0)] = mean(y), which only works when lags are exactly [1, 2, ..., n_lags] in order. For arbitrary lag sets (e.g. [1, 4]) this leaves look-ahead leakage from the np.roll wrap-around. Mask the first lag rows per lag-column (e.g. loop over (col, lag) and set X[:lag, col] = mean(y)), or build the lag matrix without wrap-around in the first place.
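The per-lag-column masking suggested above can be sketched as follows; the function name is illustrative (the PR's function is make_lags()), but the masking logic is the point: each column gets exactly its first `lag` rows overwritten, so arbitrary lag sets like [1, 4] carry no wrap-around look-ahead.

```python
import numpy as np

def make_lags_safe(y: np.ndarray, lags) -> np.ndarray:
    """Build a lag matrix without np.roll wrap-around leakage.

    Column for lag k holds y[t - k]; the first k rows, where no history
    exists, are filled with the mean of y instead of wrapped future values.
    """
    lags = np.asarray(lags, dtype=int)
    X = np.hstack([np.roll(y[:, None], lag, axis=0) for lag in lags])
    fill = y.mean()
    for col, lag in enumerate(lags):
        X[:lag, col] = fill  # mask exactly the rows that wrapped around
    return X
```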

Comment on lines +99 to +101
```python
self._method = get_estimation_method(self.method)
if self._method._path_based_method:
    raise ValueError("Path-based methods are not supported for LinearTerm.")
```

Copilot AI Apr 22, 2026


The error messages in _prepare_term() refer to LinearTerm, but this is TimeSeriesTerm / RegularizedTimeSeriesTerm. This makes debugging confusing when users select an unsupported method. Update the message strings to reference the correct class names.

Comment on lines +206 to +210
```python
class LinearConstrainedCoordinateDescent(EstimationMethod):
    r"""Linear Constrained (unconstrained) Estimation.

    We use `numba` to speed up the coordinate descent algorithm.
    """
```

Copilot AI Apr 22, 2026


LinearConstrainedCoordinateDescent is defined here in ridge.py and also in methods/linear_constrained.py (same class name/purpose). Keeping two implementations increases maintenance burden and makes it unclear which one is the supported public API. Consider removing one (or re-exporting a single implementation) to avoid divergence.

Comment thread src/ondil/__init__.py
Comment on lines 23 to +41
```python
from . import (
    base,
    diagnostics,
    distributions,
    error,
    estimators,
    incremental_statistics,
    information_criteria,
    links,
    logging,
    methods,
    scaler,
    terms,
    utils,
    warnings,
)

logging.set_log_level("INFO")
```

Copilot AI Apr 22, 2026


Calling logging.set_log_level("INFO") at import time introduces global side effects (it installs a stderr sink and emits a SUCCESS log). This can be surprising for library users and can interfere with applications/tests that configure logging themselves. Prefer leaving logging unconfigured by default and letting users opt in (or gate this behind an env var / explicit config call).
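The opt-in pattern can be sketched as follows. This uses the stdlib logging module purely for illustration (ondil uses Loguru), and the environment variable name is hypothetical; the point is that import leaves global logging state untouched unless the user asks otherwise.

```python
import logging
import os

def configure_package_logging() -> None:
    """Configure the package logger only when explicitly requested.

    By default this is a no-op, so importing the library never installs
    sinks or emits log records on its own.
    """
    level = os.environ.get("ONDIL_LOG_LEVEL")  # hypothetical opt-in variable
    if level is None:
        return  # leave logging unconfigured by default
    logging.getLogger("ondil").setLevel(level.upper())
```

At import time the package would call this function instead of unconditionally calling `logging.set_log_level("INFO")`.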

```python
forget_weight = init_forget_vector(forget, X.shape[0])
effective_weight = sample_weight * forget_weight

mean = np.average(X, weights=sample_weight * effective_weight, axis=0)
```

Copilot AI Apr 22, 2026


calculate_statistics() applies sample_weight twice when computing mean (weights=sample_weight * effective_weight where effective_weight already includes sample_weight). This will bias the mean/variance whenever non-uniform weights are provided. Use effective_weight as the weights (or keep forget_weight separate) so weights are applied exactly once.

Suggested change:

```diff
- mean = np.average(X, weights=sample_weight * effective_weight, axis=0)
+ mean = np.average(X, weights=effective_weight, axis=0)
```
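A small numeric illustration of the bias: multiplying `sample_weight` into `effective_weight` twice squares the weights, pulling the mean toward heavily weighted rows (values below are made up for the demonstration).

```python
import numpy as np

X = np.array([[1.0], [3.0]])
sample_weight = np.array([1.0, 2.0])
forget_weight = np.ones(2)  # no forgetting, for clarity
effective_weight = sample_weight * forget_weight  # weights applied once

# Buggy version: sample_weight enters twice, so row 2 effectively gets weight 4
biased = np.average(X, weights=sample_weight * effective_weight, axis=0)
# Fixed version: each row weighted exactly once
correct = np.average(X, weights=effective_weight, axis=0)
```

Here `biased` is (1·1 + 4·3)/5 = 2.6 while `correct` is (1·1 + 2·3)/3 ≈ 2.33, so the two disagree whenever weights are non-uniform.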

Comment on lines +193 to +199
```python
constraint_matrix: np.ndarray | None,
constraint_bounds: np.ndarray | None,
relaxation_method: Literal["alm", "dpga"] = "alm",
which_start_value: Literal[
    "previous_lambda", "previous_fit", "average"
] = "previous_lambda",
selection: Literal["cyclic", "random"] = "cyclic",
```

Copilot AI Apr 22, 2026


online_linear_constrained_coordinate_descent_path() declares relaxation_method: Literal["alm", "dpga"], which is a different spelling from the earlier "pdga" and from the method-level API ("pgda"). This mismatch makes it very easy to pass an option that compiles but is never handled. Use a single spelling (e.g. "pgda") consistently throughout signatures and conditionals.
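One way to enforce a single spelling is a shared `Literal` alias validated at runtime; the sketch below is illustrative, and "pgda" is used as the canonical spelling only because the review suggests it (the correct choice is whichever spelling the implementation's conditionals actually handle).

```python
from typing import Literal, get_args

# One canonical alias, imported by every signature that takes this option,
# so a typo like "dpga" can no longer compile silently into a dead branch.
RelaxationMethod = Literal["alm", "pgda"]

def check_relaxation_method(value: str) -> str:
    """Validate the option eagerly instead of letting it fall through."""
    allowed = get_args(RelaxationMethod)
    if value not in allowed:
        raise ValueError(
            f"relaxation_method must be one of {allowed}, got {value!r}"
        )
    return value
```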


4 participants