Additive Terms #158
Questions I'm encountering during development
Remaining To-Do
Pull request overview
This PR introduces a “terms + feature transformations” architecture to support structured additive distributional regression (including time-series/autoregressive terms), expands estimation method capabilities (bounds/constraints, extra kwargs), and adds/updates several distributions and supporting utilities.
Changes:
- Added a new `ondil.terms` module (linear / regularized linear terms, time-series terms, feature transformations, and a scikit-learn estimator wrapper).
- Added a new estimator, `OnlineStructuredAdditiveDistributionRegressor`, implementing an additive IRLS-style fit/update loop.
- Extended estimation methods to accept additional kwargs (bounds, constraints, regularization weights), introduced constrained coordinate descent utilities, and added logging infrastructure (Loguru).
Reviewed changes
Copilot reviewed 43 out of 44 changed files in this pull request and generated 19 comments.
Summary per file:
| File | Description |
|---|---|
| tests/distributions/test_dist.py | Adds debug output and richer assert messages for distribution function parity vs R |
| tests/distributions/test_derivatives.py | Adds debug output for derivative parity tests |
| src/ondil/terms/time_series.py | New time-series term implementations (regularized + online updates) |
| src/ondil/terms/special.py | Adds a Term wrapper around scikit-learn estimators |
| src/ondil/terms/linear.py | Introduces linear / regularized linear terms and shared fit/update logic |
| src/ondil/terms/features.py | Adds feature transformation classes, including lagged time-series features |
| src/ondil/terms/__init__.py | Exposes terms/features in the public ondil.terms API |
| src/ondil/methods/ridge.py | Extends ridge/CD handling for bounds/weights and adds constrained CD implementation |
| src/ondil/methods/recursive_least_squares.py | Adjusts method signatures to accept kwargs and adds logging |
| src/ondil/methods/linear_constrained.py | Adds constrained coordinate descent + constrained elastic net path method |
| src/ondil/methods/factory.py | Adds "ocd" method option and tightens typing for method strings |
| src/ondil/methods/elasticnet.py | Refactors regularization weights handling and adds logging/kwargs plumbing |
| src/ondil/methods/__init__.py | Re-exports newly added methods (constraint + coordinate descent variants) |
| src/ondil/logging.py | Adds package-level logging configuration (Loguru-based) |
| src/ondil/links/identitylinks.py | Adds LowerTruncatedIdentity link |
| src/ondil/links/__init__.py | Exposes LowerTruncatedIdentity in link public API |
| src/ondil/incremental_statistics.py | Adds online mean/variance utilities (used for weighted regularization) |
| src/ondil/estimators/online_struct_add_distreg.py | Adds structured additive distribution regressor estimator |
| src/ondil/estimators/online_linear_model.py | Updates fit call to pass named args to fit_beta |
| src/ondil/estimators/__init__.py | Exports the new structured additive estimator |
| src/ondil/distributions/studentt.py | Adds start-value mixing/optimism and constant initial values |
| src/ondil/distributions/skew_t.py | Adds start-value mixing and constant/dynamic initial values logic |
| src/ondil/distributions/poisson.py | Adds start-value mixing and constant initial values |
| src/ondil/distributions/normal.py | Adds start-value mixing and constant initial values for Normal(+MeanVariance) |
| src/ondil/distributions/negative_binomial.py | Adds a new Negative Binomial distribution |
| src/ondil/distributions/johnsonsu.py | Adds start-value mixing and splits constant vs dynamic initial values |
| src/ondil/distributions/__init__.py | Exposes NegativeBinomial in public API |
| src/ondil/design_matrix.py | Adds make_lags() and reworks some utilities |
| src/ondil/coordinate_descent/utils.py | Extracts shared CD helpers (soft_threshold, get_start_beta) |
| src/ondil/coordinate_descent/cd_linear_constrained.py | Adds constrained coordinate descent kernels (numba) |
| src/ondil/coordinate_descent/cd_base.py | Refactors base CD to use shared helpers |
| src/ondil/coordinate_descent/__init__.py | Exposes coordinate descent functions in a proper package |
| src/ondil/base/terms.py | Introduces Term and FeatureTransformation base classes and helper logic |
| src/ondil/base/estimation_method.py | Refactors method interfaces to accept **kwargs |
| src/ondil/base/distribution.py | Adds _svm helper and distribution variance helper |
| src/ondil/base/__init__.py | Exposes Term from ondil.base |
| src/ondil/__init__.py | Imports new submodules and configures logging at import time |
| pyproject.toml | Adds loguru dependency |
| mkdocs.yml | Adds a new docs page for terms and features |
| docs/terms_and_features.md | Documents the new estimator→equation→terms→features hierarchy |
| docs/coordinate_descent.md | Fixes doc reference to soft_threshold after refactor |
| design/terms.md | Design notes for terms/feature transformations |
| design/additive_models.md | Design proposal for additive models in ondil |
| .gitignore | Normalizes formatting (adds line number alignment) |
```python
if self.is_regularized:
    is_regularized = self.is_regularized
else:
    n_features = X_mat.shape[1]
    is_regularized = np.repeat(True, n_features)
    if self.fit_intercept and not self.regularize_intercept:
        is_regularized[0] = False
```
Same issue in the path-based model-selection code: `if self.is_regularized:` is ambiguous when `is_regularized` is an ndarray mask. This will prevent users from providing per-coefficient regularization masks. Use `if self.is_regularized is not None:` (and validate that the length matches the design matrix).
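A minimal sketch of the suggested fix (the helper name and argument list are hypothetical; only the mask logic mirrors the diff above):

```python
import numpy as np

def resolve_regularization_mask(
    is_regularized,
    n_features,
    fit_intercept,
    regularize_intercept,
):
    # Hypothetical helper: an explicit None-check avoids the ambiguous
    # truth value of a multi-element ndarray mask.
    if is_regularized is not None:
        mask = np.asarray(is_regularized, dtype=bool)
        if mask.shape[0] != n_features:
            raise ValueError(
                f"is_regularized has length {mask.shape[0]}, "
                f"but the design matrix has {n_features} columns."
            )
        return mask
    mask = np.repeat(True, n_features)
    if fit_intercept and not regularize_intercept:
        mask[0] = False
    return mask
```

With this shape check, a user-provided mask of the wrong length fails loudly at fit time instead of silently misaligning coefficients.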
```diff
 def fit_beta(self, x_gram, y_gram, **kwargs):
     pass

-def update_beta(self, x_gram, y_gram, is_regularized):
+def update_beta(self, x_gram, y_gram, **kwargs):
     pass
```
`ElasticNetPath.fit_beta()` / `update_beta()` are currently `pass`, so if they are ever called they will silently return `None` instead of failing fast. Since this is a path-based method, these should raise `NotImplementedError` (or delegate to an appropriate single-lambda implementation) to avoid hard-to-debug downstream `None` errors.
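A sketch of the fail-fast behavior (the class is an illustrative stand-in, not the real `ElasticNetPath`; the path-based entry-point names in the messages are assumed):

```python
class PathBasedMethodSketch:
    """Illustrative stand-in for a path-based estimation method: the
    single-lambda entry points raise instead of silently returning None."""

    def fit_beta(self, x_gram, y_gram, **kwargs):
        raise NotImplementedError(
            "This is a path-based method; call the path-based fit instead."
        )

    def update_beta(self, x_gram, y_gram, **kwargs):
        raise NotImplementedError(
            "This is a path-based method; call the path-based update instead."
        )
```

Any accidental call site then surfaces immediately with a descriptive error rather than propagating `None` into downstream matrix algebra.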
```python
self._method = get_estimation_method(self.method)
if not self._method._path_based_method:
    raise ValueError("Non-Path-based methods are not supported for LinearTerm.")
```
Same issue here: the exception says "...for LinearTerm" inside `RegularizedTimeSeriesTerm._prepare_term()`, but this class is `RegularizedTimeSeriesTerm`. Adjust the message so it points to the correct term type.
```python
class OnlineStructuredAdditiveDistributionRegressor(
    OndilEstimatorMixin,
    RegressorMixin,
    BaseEstimator,
):
    _parameter_constraints = {
        "distribution": [Distribution],
        "estimation_method": [EstimationMethod],
        "terms": [dict, type(None)],
        "scaler": [OnlineScaler, None],
    }

    def __init__(
        self,
        distribution: Distribution = Normal(),
        terms: Optional[Dict[str, Any]] = None,
        scale_inputs: bool = True,
        learning_rate: float = 0.0,
        max_outer_iterations: int = 10,
        max_inner_iterations: int = 10,
        rel_tol_outer: float = 1e-3,
        rel_tol_inner: float = 1e-3,
        abs_tol_outer: float = 1e-3,
        abs_tol_inner: float = 1e-3,
        step_size: float = 1.0,
    ):
        self.distribution = distribution
        self.terms = terms
        self.scale_inputs = scale_inputs
        self.learning_rate = learning_rate
        self.max_outer_iterations = max_outer_iterations
        self.max_inner_iterations = max_inner_iterations
        self.rel_tol_outer = rel_tol_outer
        self.rel_tol_inner = rel_tol_inner
        self.abs_tol_outer = abs_tol_outer
        self.abs_tol_inner = abs_tol_inner
        self.step_size = step_size
```
`OnlineStructuredAdditiveDistributionRegressor` introduces a large new estimator API (fit/update/predict, term orchestration, convergence logic), but there are no unit tests covering basic fit/update behavior or failure modes (e.g. step-size reduction, term state handling). Since the repo already has estimator-level tests, adding a minimal smoke test (fit + update on a small synthetic dataset) would help prevent regressions.
```python
if isinstance(lags, int):
    logger.trace(f"Got integer lags, converting to list. Lags: {lags}")
    lags = np.linspace(1, lags, lags, dtype=int)
    logger.trace(f"Converted lags: {lags}")
n_lags = len(lags)

if n_lags == 0:
    logger.warning("No lags specified. Expect a crash.")
if np.any(lags < 1):
    logger.warning("We have lags smaller than 1. Are you sure?")

X = np.hstack([np.roll(y[:, None], i) for i in lags])
X[np.triu_indices(n_lags, k=0)] = np.mean(y)
```
`make_lags()` masks the rolled values using `X[np.triu_indices(n_lags, k=0)] = np.mean(y)`, which only works when lags are exactly `[1, 2, ..., n_lags]` in order. For arbitrary lag sets (e.g. `[1, 4]`) this leaves look-ahead leakage from the `np.roll` wrap-around. Mask the first `lag` rows per lag column (e.g. loop over `(col, lag)` and set `X[:lag, col] = np.mean(y)`), or build the lag matrix without wrap-around in the first place.
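One way to build the lag matrix without wrap-around, as a self-contained sketch (the function name is hypothetical and this is not the actual `make_lags()` implementation, just the masking idea from the comment above):

```python
import numpy as np

def make_lags_no_wraparound(y, lags):
    # Sketch: one column per lag, pre-filled with the series mean where no
    # history exists, so arbitrary lag sets like [1, 4] never leak future
    # values the way np.roll's wrap-around does.
    lags = np.atleast_1d(np.asarray(lags, dtype=int))
    n = y.shape[0]
    X = np.full((n, lags.size), np.mean(y))
    for col, lag in enumerate(lags):
        # Rows 0..lag-1 have no observation `lag` steps back; keep the fill.
        X[lag:, col] = y[: n - lag]
    return X
```

For `y = [0, 1, 2, 3, 4]` and `lags = [1, 4]`, the lag-4 column correctly keeps the mean fill in its first four rows instead of wrapped-around future values.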
```python
self._method = get_estimation_method(self.method)
if self._method._path_based_method:
    raise ValueError("Path-based methods are not supported for LinearTerm.")
```
The error messages in `_prepare_term()` refer to `LinearTerm`, but this is `TimeSeriesTerm` / `RegularizedTimeSeriesTerm`. This makes debugging confusing when users select an unsupported method. Update the message strings to reference the correct class names.
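A small sketch of one way to keep the message correct in every subclass: derive it from `type(self).__name__` instead of hard-coding a class name (the classes here are illustrative, not the real ondil term hierarchy):

```python
class TermSketch:
    """Illustrative base class; the real check would live in _prepare_term()."""

    def _prepare_term(self, path_based_method: bool) -> None:
        if path_based_method:
            # type(self).__name__ resolves to the concrete subclass, so
            # TimeSeriesTermSketch reports itself, not its parent.
            raise ValueError(
                f"Path-based methods are not supported for {type(self).__name__}."
            )

class TimeSeriesTermSketch(TermSketch):
    pass
```

This removes the copy-paste hazard entirely: new term subclasses get a correct message for free.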
```python
class LinearConstrainedCoordinateDescent(EstimationMethod):
    r"""Linear Constrained (unconstrained) Estimation.

    We use `numba` to speed up the coordinate descent algorithm.
    """
```
`LinearConstrainedCoordinateDescent` is defined here in `ridge.py` and also in `methods/linear_constrained.py` (same class name/purpose). Keeping two implementations increases the maintenance burden and makes it unclear which one is the supported public API. Consider removing one (or re-exporting a single implementation) to avoid divergence.
```python
from . import (
    base,
    diagnostics,
    distributions,
    error,
    estimators,
    incremental_statistics,
    information_criteria,
    links,
    logging,
    methods,
    scaler,
    terms,
    utils,
    warnings,
)

logging.set_log_level("INFO")
```
Calling `logging.set_log_level("INFO")` at import time introduces global side effects (it installs a stderr sink and emits a SUCCESS log). This can be surprising for library users and can interfere with applications/tests that configure logging themselves. Prefer leaving logging unconfigured by default and letting users opt in (or gate this behind an env var / explicit config call).
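One opt-in pattern, sketched with the stdlib `logging` module as an analogue for the Loguru setup (the env-var name `ONDIL_LOG_LEVEL` and the helper function are hypothetical):

```python
import logging as pylogging
import os

def configure_ondil_logging() -> None:
    """Hypothetical opt-in setup: the library stays silent by default and
    only attaches a real handler when the user sets ONDIL_LOG_LEVEL."""
    logger = pylogging.getLogger("ondil")
    level = os.environ.get("ONDIL_LOG_LEVEL")  # assumed variable name
    if level is None:
        # Library default per the stdlib's guidance: attach a NullHandler
        # and emit nothing unless the application configures logging.
        logger.addHandler(pylogging.NullHandler())
        return
    handler = pylogging.StreamHandler()
    handler.setFormatter(
        pylogging.Formatter("%(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level.upper())
```

The same shape works with Loguru: skip the sink installation entirely unless the env var (or an explicit `set_log_level()` call) asks for it.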
```python
forget_weight = init_forget_vector(forget, X.shape[0])
effective_weight = sample_weight * forget_weight

mean = np.average(X, weights=sample_weight * effective_weight, axis=0)
```
`calculate_statistics()` applies `sample_weight` twice when computing the mean (`weights=sample_weight * effective_weight`, where `effective_weight` already includes `sample_weight`). This will bias the mean/variance whenever non-uniform weights are provided. Use `effective_weight` as the weights (or keep `forget_weight` separate) so weights are applied exactly once.
Suggested change:

```diff
-mean = np.average(X, weights=sample_weight * effective_weight, axis=0)
+mean = np.average(X, weights=effective_weight, axis=0)
```
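A quick self-contained check makes the bias concrete (variable names follow the snippet; `init_forget_vector` is replaced by an all-ones decay for clarity):

```python
import numpy as np

X = np.array([[0.0], [10.0]])
sample_weight = np.array([1.0, 3.0])
forget_weight = np.array([1.0, 1.0])  # no forgetting, to isolate the bug
effective_weight = sample_weight * forget_weight

# Buggy: sample_weight enters twice, so the weights become [1, 9].
buggy = np.average(X, weights=sample_weight * effective_weight, axis=0)

# Fixed: each observation is weighted exactly once (weights are [1, 3]).
fixed = np.average(X, weights=effective_weight, axis=0)
```

Here the buggy mean is 9.0 while the correctly weighted mean is 7.5, so the double weighting quietly skews results toward heavily weighted observations.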
```python
constraint_matrix: np.ndarray | None,
constraint_bounds: np.ndarray | None,
relaxation_method: Literal["alm", "dpga"] = "alm",
which_start_value: Literal[
    "previous_lambda", "previous_fit", "average"
] = "previous_lambda",
selection: Literal["cyclic", "random"] = "cyclic",
```
`online_linear_constrained_coordinate_descent_path()` declares `relaxation_method: Literal["alm", "dpga"]`, which is a different spelling from the earlier `"pdga"` and from the method-level API (`"pgda"`). This mismatch makes it very easy to pass an option that passes type checking but is never handled. Use a single spelling (e.g. `"pgda"`) consistently throughout signatures and conditionals.
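One way to enforce a single spelling is a shared `Literal` alias plus a runtime validator, sketched here (all names are hypothetical, and `"pgda"` is picked arbitrarily as the canonical spelling):

```python
from typing import Literal, get_args

# Hypothetical single source of truth for the option's spelling; every
# signature would import and reuse this alias instead of re-typing it.
RelaxationMethod = Literal["alm", "pgda"]

def validate_relaxation_method(method: str) -> str:
    """Reject misspellings at runtime, since Literal alone only helps
    callers who run a type checker."""
    allowed = get_args(RelaxationMethod)
    if method not in allowed:
        raise ValueError(
            f"Unknown relaxation_method {method!r}; expected one of {allowed}."
        )
    return method
```

With the alias in one place, a typo like `"dpga"` can no longer slip through one signature while being silently ignored by the conditionals.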