The factorlasso package implements sign-constrained LASSO, prior-centered regularisation, and Hierarchical Clustering Group LASSO (HCGL) for sparse multi-output factor model estimation with integrated factor covariance assembly.
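In many applications (portfolio construction, genomics, macro-econometrics) you need to estimate a factor model

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

where $Y_t$ is the vector of $N$ response variables, $X_t$ is the vector of $M$ factors, $\beta$ is the $N \times M$ matrix of factor loadings, $\alpha$ is the $N$-vector of intercepts, and $\varepsilon_t$ is the residual.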
In practice, you face several challenges that standard LASSO packages don't handle:
- Domain knowledge constrains coefficient signs: equity assets should have non-negative equity beta; government bonds should not load on commodity factors. Standard LASSO ignores this.
- You have prior estimates and want to shrink toward them, not toward zero: the penalty should be on $|\beta - \beta_0|$, not $|\beta|$.
- Variables have different history lengths: some assets start trading later than others. Dropping rows with any NaN discards valid data for all other variables.
- You need a consistent covariance matrix: the factor covariance $\Sigma_y = \beta \Sigma_x \beta^\top + D$ must use the same $\beta$ from estimation, not a separate estimate.
- Data is non-stationary: recent observations should carry more weight (EWMA weighting).
factorlasso solves all five in a single `fit()` call. The implementation follows scikit-learn conventions (`fit` / `predict` / `score` / `coef_` / `intercept_`).
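To make the history-length challenge concrete, the sketch below counts how many observations each variable keeps under the two strategies described above: dropping every row that contains a NaN, versus keeping a per-variable validity mask. It uses plain NumPy on simulated data and only illustrates the bookkeeping; it is not factorlasso's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 5
Y = rng.standard_normal((T, N))
Y[:50, 3] = np.nan     # variable 3 starts 50 periods late
Y[:100, 4] = np.nan    # variable 4 starts 100 periods late

valid = ~np.isnan(Y)   # (T, N) binary validity mask

# Strategy 1: drop every row that contains at least one NaN
rows_kept = int(valid.all(axis=1).sum())
obs_dropna = np.full(N, rows_kept)

# Strategy 2: keep each variable's own valid observations
obs_masked = valid.sum(axis=0)

print(obs_dropna)   # [100 100 100 100 100]
print(obs_masked)   # [200 200 200 150 100]
```

Row-dropping halves the sample for the three complete variables, while the mask leaves their 200 observations intact.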
The methodology is based on the Hierarchical Clustering Group LASSO (HCGL) framework introduced in:
Sepp A., Ossa I., Kastenholz M. (2026), "Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios", The Journal of Portfolio Management, 52(4), 86–120.
and the Capital Market Assumptions framework in the companion paper:
Sepp A., Hansen E., Kastenholz M. (2026), "Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors", Under revision at the Journal of Portfolio Management.
Install using

```shell
pip install factorlasso
```

Upgrade using

```shell
pip install --upgrade factorlasso
```

Clone using

```shell
git clone https://github.com/ArturSepp/factorlasso.git
```

Core dependencies: numpy, pandas, scipy, cvxpy, openpyxl
- Quick Start
- Convention: Paper vs Code
- Sign Constraints
- Prior-Centered Regularisation
- Hierarchical Clustering Group LASSO (HCGL)
- NaN-Aware Estimation
- Factor Covariance Assembly
- API Summary
- Estimation Methods
- Applications
- Related Packages
- References
- Citation
```python
import numpy as np
import pandas as pd
from factorlasso import LassoModel, LassoModelType

# Simulate Y_t = β X_t + noise (code uses row-major: Y = X @ β' + noise)
np.random.seed(42)
T, M, N = 200, 3, 5
X = pd.DataFrame(np.random.randn(T, M), columns=['f0', 'f1', 'f2'])
beta_true = np.array([[1, 0, .5], [0, 1, 0], [.3, 0, 0], [0, .8, .2], [1, .5, 0]])
Y = pd.DataFrame(X.values @ beta_true.T + .1*np.random.randn(T, N),
                 columns=[f'y{i}' for i in range(N)])

# Fit sparse factor model
model = LassoModel(model_type=LassoModelType.LASSO, reg_lambda=1e-4)
model.fit(x=X, y=Y)
print(model.coef_.round(2))       # β (N × M)
print(model.intercept_.round(4))  # α (N,)

# Predict and score (scikit-learn compatible)
y_hat = model.predict(X)          # Ŷ_t = α + β X_t  (code: X @ β' + α)
r2 = model.score(X, Y)            # mean R² across response variables
```

The factor model in the paper uses column vectors:

$$Y_t = \alpha + \beta X_t$$

where $Y_t$ is the $N \times 1$ vector of responses at time $t$, $X_t$ is the $M \times 1$ vector of factors, $\beta$ is the $N \times M$ loading matrix, and $\alpha$ is the $N \times 1$ intercept.
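In Python, pandas DataFrames store observations as rows. The code works with the row-major equivalent:

$$Y = X \beta^\top + \alpha$$

where each row of $Y$ ($T \times N$) and $X$ ($T \times M$) is one observation and $\alpha$ is broadcast across rows.

| Symbol | Paper (column-vector) | Code (row-major, pandas) |
|---|---|---|
| $Y$ | $Y_t$, $N \times 1$ | `y: DataFrame`, $T \times N$ |
| $X$ | $X_t$, $M \times 1$ | `x: DataFrame`, $T \times M$ |
| $\beta$ | $N \times M$ | `coef_: DataFrame`, $N \times M$ |
| $\alpha$ | $N \times 1$ | `intercept_: Series`, length $N$ |

The coefficient matrix `coef_` is stored in the paper convention ($N \times M$), so `Y = X @ β' + α` in code is the row-major form of the paper's $Y_t = \alpha + \beta X_t$.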
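The equivalence of the two conventions can be checked numerically. The sketch below uses plain NumPy arrays with made-up dimensions (it does not call factorlasso) and compares the paper form, evaluated one observation at a time, with the row-major form evaluated in a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, N = 6, 3, 5
X = rng.standard_normal((T, M))      # each row is one observation X_t'
beta = rng.standard_normal((N, M))   # paper convention: N x M
alpha = rng.standard_normal(N)

# Paper form, one observation at a time: Y_t = alpha + beta @ X_t
Y_paper = np.stack([alpha + beta @ X[t] for t in range(T)])

# Code form, all observations at once: Y = X @ beta' + alpha
Y_code = X @ beta.T + alpha

print(np.allclose(Y_paper, Y_code))  # True
```

Because `beta` keeps its $N \times M$ shape in both forms, the only change in code is the transpose inside the matrix product.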
Enforce domain knowledge on coefficient signs using a constraint matrix where
1 = non-negative, -1 = non-positive, 0 = constrained to zero, NaN = free:
```python
signs = pd.DataFrame([[1, np.nan, 1], [np.nan, 1, 0], [1, 0, np.nan],
                      [np.nan, 1, 1], [1, 1, np.nan]],
                     index=Y.columns, columns=X.columns)
model = LassoModel(reg_lambda=1e-4, factors_beta_loading_signs=signs)
model.fit(x=X, y=Y)
# All constrained coefficients satisfy their sign requirements by construction
```

Shrink toward a non-zero prior instead of zero. When you have prior estimates $\beta_0$, the penalty is applied to $|\beta - \beta_0|$ rather than $|\beta|$:
```python
beta_prior = pd.DataFrame(beta_true, index=Y.columns, columns=X.columns)
model = LassoModel(reg_lambda=1e-2, factors_beta_prior=beta_prior)
model.fit(x=X, y=Y)  # shrinks toward beta_prior instead of zero
```

Automatically discover group structure among response variables via hierarchical clustering on their correlation matrix (Ward's method), then apply Group LASSO with group-adaptive penalties:
```python
model = LassoModel(
    model_type=LassoModelType.GROUP_LASSO_CLUSTERS,
    reg_lambda=1e-5, span=52,
)
model.fit(x=X, y=Y)
print(model.clusters_)  # auto-discovered groups
```

Variables with different history lengths are handled naturally.
Instead of dropping any row containing a NaN (which discards valid observations
for all other variables), factorlasso applies a binary validity mask that
zeros out the contribution of missing observations per variable while
preserving all available data:
```python
Y_with_gaps = Y.copy()
Y_with_gaps.iloc[:50, 3] = np.nan   # variable y3 starts 50 periods later
Y_with_gaps.iloc[:100, 4] = np.nan  # variable y4 starts 100 periods later
model = LassoModel(reg_lambda=1e-4)
model.fit(x=X, y=Y_with_gaps)
# All 5 variables estimated using their full available history
# No data discarded for y0, y1, y2 despite gaps in y3, y4
```

After estimation, assemble the consistent factor covariance decomposition $\Sigma_y = \beta \Sigma_x \beta^\top + D$:
```python
from factorlasso import CurrentFactorCovarData, VarianceColumns

# factor_covariance and diagnostics_df are placeholders for inputs supplied by the caller
sigma_y = CurrentFactorCovarData(
    x_covar=factor_covariance,   # Σ_x (M × M)
    y_betas=model.coef_,         # β (N × M) from estimation
    y_variances=diagnostics_df,  # residual variances D
).get_y_covar()
# sigma_y is (N × N) positive semi-definite by construction
```

The API follows scikit-learn conventions: `fit` / `predict` / `score`.
| Method | Description |
|---|---|
| `model.fit(x, y)` | Estimate α, β; returns `self` |
| `model.predict(x)` | Return Ŷ_t = α + β X_t (row-major: `X @ β' + α`) |
| `model.score(x, y)` | Return mean R² |
| Fitted attribute | Shape | Description |
|---|---|---|
| `coef_` | (N, M) | Factor loadings β |
| `intercept_` | (N,) | Intercept α |
| `estimated_betas` | (N, M) | Alias for `coef_` (backward compat) |
| `clusters_` | (N,) | HCGL cluster labels |
| `estimation_result_` | n/a | Full diagnostics (`r2`, `ss_res`, `ss_total`) |
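The `r2`, `ss_res` and `ss_total` diagnostics suggest the usual definition $R^2 = 1 - \mathrm{SS}_{res} / \mathrm{SS}_{total}$ per response variable, averaged across variables by `score`. The sketch below spells that out in NumPy. `mean_r2` is a hypothetical helper written for illustration, and it assumes unweighted, NaN-free data; whether factorlasso applies EWMA weights or a validity mask inside `score` is not stated here.

```python
import numpy as np

def mean_r2(y, y_hat):
    """Mean over columns of R² = 1 - ss_res / ss_total (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = ((y - y_hat) ** 2).sum(axis=0)            # residual sum of squares per column
    ss_total = ((y - y.mean(axis=0)) ** 2).sum(axis=0) # total sum of squares per column
    return float(np.mean(1.0 - ss_res / ss_total))

y = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
perfect = mean_r2(y, y)                                  # exact predictions
baseline = mean_r2(y, np.tile(y.mean(axis=0), (3, 1)))   # predicting the column means
print(perfect, baseline)  # 1.0 0.0
```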
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_type` | `LassoModelType` | `LASSO` | Estimation method |
| `reg_lambda` | `float` | `1e-5` | Regularisation strength |
| `span` | `int` | `None` | EWMA span for observation weighting |
| `factors_beta_loading_signs` | `DataFrame` | `None` | Sign constraint matrix (N × M) |
| `factors_beta_prior` | `DataFrame` | `None` | Prior β₀ matrix (N × M) |
| `group_data` | `Series` | `None` | Group labels (required for `GROUP_LASSO`) |
| `demean` | `bool` | `True` | Subtract (rolling) mean before estimation |
| `solver` | `str` | `'CLARABEL'` | CVXPY solver name |
| `warmup_period` | `int` | `12` | Min observations before including a variable |
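As a rough guide to what a given `span` means, the sketch below builds normalised exponential weights using the pandas `ewm(span=...)` convention, where the smoothing factor is $2 / (\text{span} + 1)$. That factorlasso uses exactly this convention is an assumption, and `ewma_weights` is a hypothetical helper, not part of the package.

```python
import numpy as np

def ewma_weights(n_obs, span):
    """Normalised exponential weights, most recent observation last (hypothetical helper)."""
    alpha = 2.0 / (span + 1.0)              # pandas `span` convention (assumed)
    ages = np.arange(n_obs - 1, -1, -1)     # age 0 = most recent observation
    w = (1.0 - alpha) ** ages
    return w / w.sum()

w = ewma_weights(n_obs=200, span=52)

# Number of periods after which an observation's weight has halved
half_life = np.log(0.5) / np.log(1.0 - 2.0 / (52 + 1.0))
print(round(float(half_life), 1))
```

Under this convention a span of 52 corresponds to a half-life of roughly 18 periods, so span and half-life should not be used interchangeably.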
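| Method | `LassoModelType` | Penalty |
|---|---|---|
| LASSO | `LASSO` | $\ell_1$ penalty on $\beta - \beta_0$, scaled by $\lambda$ |
| Group LASSO | `GROUP_LASSO` | Group $\ell_2$ penalty, $\sum_g \lambda\sqrt{\ldots}\,\ldots$ |
| HCGL | `GROUP_LASSO_CLUSTERS` | Same as Group LASSO with auto-clustering |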
All methods support sign constraints, prior-centered shrinkage, EWMA weighting, and NaN-aware estimation.
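To build intuition for how prior-centered shrinkage and sign constraints interact, consider the simplest separable case: one coefficient at a time with loss $\tfrac{1}{2}(b - z)^2 + \lambda |b - \beta_0|$, where $z$ is the unpenalised estimate. The minimiser is a soft-threshold centred on $\beta_0$, and because the problem is convex in one dimension, a sign constraint is enforced by clipping that minimiser to the feasible interval. The sketch below is a simplified illustration under these assumptions; `prior_centred_shrink` is a hypothetical helper and not the CVXPY-based solver that factorlasso uses.

```python
import numpy as np

def prior_centred_shrink(z, beta0, lam, signs):
    """Coordinate-wise minimiser of 0.5*(b - z)**2 + lam*|b - beta0| under sign rules.

    signs: 1 = non-negative, -1 = non-positive, 0 = forced to zero, NaN = free.
    """
    d = z - beta0
    # Soft-threshold the deviation from the prior, then shift back
    b = beta0 + np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)
    # Translate the sign codes into per-coefficient bounds
    lower = np.where(signs == 1, 0.0, -np.inf)
    upper = np.where(signs == -1, 0.0, np.inf)
    lower = np.where(signs == 0, 0.0, lower)
    upper = np.where(signs == 0, 0.0, upper)
    return np.clip(b, lower, upper)

z = np.array([0.9, -0.2, 0.5, 0.6])        # unpenalised estimates
beta0 = np.array([1.0, 0.0, 0.0, 0.5])     # prior
signs = np.array([1.0, 1.0, np.nan, 0.0])  # sign rules
b = prior_centred_shrink(z, beta0, lam=0.15, signs=signs)
print(b.round(2))  # approximately [1, 0, 0.35, 0]
```

The first coefficient snaps to its prior of 1.0, the second is pushed to the boundary by the non-negativity rule, the third (free) is shrunk toward its zero prior, and the fourth is forced to zero.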
The methodology is domain-agnostic. Examples are provided for:
- `examples/finance_factor_model.py`: multi-asset factor models with sign-constrained betas and consistent covariance estimation
- `examples/genomics_factor_model.py`: gene expression driven by pathway activity factors with biological sign priors
The same estimation problem (sparse factor loadings with sign priors and consistent covariance) appears in macro-econometrics, signal processing, and multi-task learning.
```python
from factorlasso import LassoModel, LassoModelType

# sign_matrix, prior_betas, factor_returns and asset_returns are user-supplied DataFrames
model = LassoModel(
    model_type=LassoModelType.GROUP_LASSO_CLUSTERS,
    reg_lambda=1e-5,
    span=52,                                  # EWMA span of 52 periods (1 year of weekly data)
    factors_beta_loading_signs=sign_matrix,   # domain-knowledge constraints
    factors_beta_prior=prior_betas,           # shrink toward prior, not zero
)
model.fit(x=factor_returns, y=asset_returns)

# Inspect results
print(model.coef_)        # sparse factor loadings (N × M)
print(model.intercept_)   # intercept α (N,)
print(model.clusters_)    # auto-discovered asset groups
print(model.score(factor_returns, asset_returns))  # mean R²
```

| Package | Key Difference |
|---|---|
| scikit-learn `Lasso` | No sign constraints, no multi-output Group LASSO |
| skglm | No sign constraints, no prior-centered shrinkage |
| abess | Best-subset selection (L0), not L1/Group L2 |
| group-lasso | No sign constraints, no EWMA, no prior-centered |
factorlasso is the only package that combines sign-constrained penalised regression, prior-centered shrinkage, HCGL clustering, NaN-aware estimation, and integrated factor covariance assembly.
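The covariance assembly referred to here is the decomposition $\Sigma_y = \beta \Sigma_x \beta^\top + D$. As a minimal sketch of why reusing the estimated $\beta$ yields a valid covariance matrix, the NumPy example below builds $\Sigma_y$ from simulated inputs and checks symmetry and the eigenvalues. All inputs are made up for illustration, and the code does not use factorlasso's `CurrentFactorCovarData`.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 3
beta = rng.standard_normal((N, M))       # stand-in for estimated loadings (N x M)
A = rng.standard_normal((M, M))
sigma_x = A @ A.T                        # factor covariance, PSD by construction
resid_var = rng.uniform(0.01, 0.05, N)   # residual variances (diagonal of D)

# Systematic part plus diagonal residual part
sigma_y = beta @ sigma_x @ beta.T + np.diag(resid_var)

eigvals = np.linalg.eigvalsh(sigma_y)
print(sigma_y.shape, bool(np.allclose(sigma_y, sigma_y.T)), bool(eigvals.min() > 0))
```

The systematic part is positive semi-definite whenever $\Sigma_x$ is, and adding strictly positive residual variances on the diagonal makes the total positive definite.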
- Sepp A., Ossa I., Kastenholz M. (2026), "Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios", The Journal of Portfolio Management, 52(4), 86–120.
- Sepp A., Hansen E., Kastenholz M. (2026), "Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors", Under revision at the Journal of Portfolio Management.
If you use factorlasso in your research, please cite the software and the underlying papers:
```bibtex
@software{sepp2026factorlasso,
  author = {Sepp, Artur},
  title = {factorlasso: Sparse Factor Model Estimation with Constrained LASSO in Python},
  year = {2026},
  url = {https://github.com/ArturSepp/factorlasso}
}

@article{seppossa2026,
  author = {Sepp, Artur and Ossa, Ivan and Kastenholz, Mika},
  title = {Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios},
  journal = {The Journal of Portfolio Management},
  volume = {52},
  number = {4},
  pages = {86--120},
  year = {2026}
}

@article{sepphansen2026,
  author = {Sepp, Artur and Hansen, Emilie and Kastenholz, Mika},
  title = {Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors},
  journal = {Under revision at the Journal of Portfolio Management},
  year = {2026}
}
```

The factorlasso package is distributed FREE & WITHOUT ANY WARRANTY under the MIT License.
See LICENSE for details.
Please report any bugs or suggestions by opening an issue.