This repository contains the code for the paper:
When Deep Learning Fails: Treatment Effect Estimation in Small-Animal Longitudinal Studies
We demonstrate that simple statistical models (Gaussian Process, Exponential Curve) outperform complex SSM architectures for treatment effect estimation in small-N veterinary studies, using feline chronic kidney disease (CKD) as a case study.
Despite the popularity of state-space models (SSM/Mamba) for longitudinal causal inference, we show that with fewer than ten subjects (N<10) the simpler models fit better:
- Gaussian Process: R² ≈ 0.24 (best)
- Exponential Curve: R² ≈ 0.23
- SSM/Mamba: R² ≈ -0.63 (severely overfit)
This finding challenges the assumption that deep learning methods are universally superior.
.
├── README.md
├── requirements.txt
├── setup.py
├── ckd_experiments/
│ ├── __init__.py
│ ├── main.py # Main entry point
│ ├── run_experiments.py # Experiment runner
│ ├── data_loader.py # Data loading from Excel files
│ ├── preprocessing.py # Data preprocessing
│ ├── simulation.py # Synthetic data generation
│ ├── baseline_models.py # GP, ITSA, ExpCurve models
│ └── ssm_model.py # SSM/Mamba treatment effect model
└── data/ # Place your data here
pip install -r requirements.txt
- Python 3.10+
- PyTorch 2.0+ (for SSM models, optional)
- NumPy, Pandas, SciPy
- scikit-learn
- openpyxl (for Excel file reading)
- GPyTorch (optional, for advanced GP models)
Place your longitudinal data in the data/ directory. The expected format is:
- Excel files with sheet names representing dates
- Columns for different subjects/cats
- Rows for different variables
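As a rough illustration of the layout described above, the following sketch reshapes such a workbook into one long table. It assumes the first column of each sheet holds the variable names and that every sheet name parses as a date. The helper `workbook_to_long`, the subject names, and all values are hypothetical and not part of `ckd_experiments`, whose own loader is `load_all_data`.

```python
import pandas as pd


def workbook_to_long(sheets):
    """Stack {sheet_name: DataFrame} into one long table.

    Each DataFrame is assumed to have variables as its index and
    one column per subject (hypothetical layout, see lead-in).
    """
    frames = []
    for sheet_name, df in sheets.items():
        long_df = (
            df.rename_axis("variable")
            .reset_index()
            .melt(id_vars="variable", var_name="subject", value_name="value")
        )
        # The sheet name is assumed to be a parseable date string.
        long_df["date"] = pd.to_datetime(sheet_name)
        frames.append(long_df)
    out = pd.concat(frames, ignore_index=True)
    return out[["subject", "date", "variable", "value"]]


# In-memory stand-in for pd.read_excel(path, sheet_name=None, index_col=0).
# All numbers are made up for illustration.
sheets = {
    "2024-01-15": pd.DataFrame(
        {"cat_A": [2.1, 30.0], "cat_B": [1.8, 25.0]},
        index=["creatinine", "BUN"],
    ),
    "2024-02-15": pd.DataFrame(
        {"cat_A": [2.4, 34.0], "cat_B": [1.9, 27.0]},
        index=["creatinine", "BUN"],
    ),
}
long_table = workbook_to_long(sheets)
print(long_table)
```

With a real file, `pd.read_excel(path, sheet_name=None, index_col=0)` returns the same kind of dictionary, keyed by sheet name.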
cd ckd_experiments
python main.py
This will:
- Load and preprocess your data
- Run Leave-One-Subject-Out Cross-Validation (LOSO-CV)
- Estimate treatment effects with bootstrap confidence intervals
- Run sensitivity analyses
- Generate simulation studies
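The LOSO-CV step listed above holds out every observation of one subject at a time and trains on the rest. A minimal sketch of the split logic follows. The function `loso_splits` and the subject labels are hypothetical and only illustrate the idea; the actual splitting code lives in the repository.

```python
def loso_splits(groups):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    groups: one subject label per observation (row).
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical labels: three cats with two visits each.
groups = ["cat_A", "cat_A", "cat_B", "cat_B", "cat_C", "cat_C"]
splits = list(loso_splits(groups))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because a subject never appears in both sets, the test error reflects generalization to an unseen animal rather than to unseen visits of a known one. scikit-learn's `LeaveOneGroupOut` provides the same kind of split.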
from ckd_experiments import load_all_data, create_unified_dataframe
from ckd_experiments.preprocessing import select_core_variables, align_to_common_timepoints
from ckd_experiments.baseline_models import GaussianProcessModel, LinearITSA, ExponentialCurveModel
from ckd_experiments.ssm_model import SSMTreatmentEffectModel, S4ModelWrapper
from ckd_experiments.simulation import SyntheticCKDGenerator
# Load data
data = load_all_data('path/to/data')
unified = create_unified_dataframe(data)
# Select variables and align
df_sel = select_core_variables(unified)
df_aligned = align_to_common_timepoints(df_sel)
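# Train models (X_train, y_train, groups_train and X_test are placeholders
# for arrays you prepare yourself, e.g. from df_aligned)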
gp = GaussianProcessModel()
gp.fit(X_train, y_train, groups_train)
predictions = gp.predict(X_test)
# Estimate treatment effects with bootstrap
from ckd_experiments.run_experiments import run_bootstrap_ate
bootstrap_results = run_bootstrap_ate(X, y, groups, n_bootstrap=1000)
- Linear ITSA - Interrupted Time Series Analysis
- Gaussian Process - RBF kernel regression
- Exponential Curve - Log-linear trajectory modeling
- Simple LSTM - PyTorch LSTM baseline
- SSM (Mamba) - State-space model with treatment conditioning
- S4 Model - Structured State Space Sequence Model
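"Log-linear trajectory modeling" usually means fitting a straight line to log(y) over time, which corresponds to y = a·exp(b·t). The sketch below shows that idea with NumPy. It is an assumption about the general technique, not a copy of `ExponentialCurveModel`; the function names and the synthetic trajectory are hypothetical.

```python
import numpy as np


def fit_exponential_curve(t, y):
    """Fit y ≈ a * exp(b * t) by least squares on log(y). Requires y > 0."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit needs strictly positive y")
    # polyfit returns coefficients from highest degree down: slope, intercept.
    b, log_a = np.polyfit(t, np.log(y), deg=1)
    return float(np.exp(log_a)), float(b)


def predict_exponential_curve(t, a, b):
    """Evaluate a * exp(b * t) at the given time points."""
    return a * np.exp(b * np.asarray(t, dtype=float))


# Noise-free synthetic trajectory generated with a=2.0, b=0.1.
t = np.arange(6)
y = 2.0 * np.exp(0.1 * t)
a_hat, b_hat = fit_exponential_curve(t, y)
y_hat = predict_exponential_curve(t, a_hat, b_hat)
print(a_hat, b_hat)
```

With only two parameters per trajectory, a model of this kind has little room to overfit when each subject contributes roughly a dozen timepoints.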
We use subject-level bootstrap resampling (1000 iterations) for ATE uncertainty quantification:
def run_bootstrap_ate(X, y, groups, n_bootstrap=1000):
    """
    Bootstrap confidence intervals for Average Treatment Effect.
    Resamples entire subjects within groups.
    """
    # Implementation in run_experiments.py
On our feline CKD dataset (N=9 cats, 11 timepoints):
| Model | MAE | RMSE | R² |
|---|---|---|---|
| Exponential Curve | 55.90 | 96.12 | 0.234 |
| Gaussian Process | 56.25 | 96.24 | 0.240 |
| Linear ITSA | 56.91 | 91.28 | -0.628 |
| S4 Treatment | 70.72 | 124.35 | -0.549 |
| SSM Mamba | 75.09 | 127.52 | -0.631 |
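For readers unfamiliar with negative R² values, the sketch below computes the three metrics from the table using their standard pooled definitions. This is an assumption: the repository may aggregate across cross-validation folds differently. The function `regression_metrics` and the toy numbers are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) using the standard pooled definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 drops below zero when the model is worse than predicting the mean.
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


y_true = [1.0, 2.0, 3.0]
perfect = regression_metrics(y_true, [1.0, 2.0, 3.0])
reversed_fit = regression_metrics(y_true, [3.0, 2.0, 1.0])
print(perfect, reversed_fit)
```

An R² below zero therefore means the model's predictions carry more squared error than a constant prediction at the mean of the held-out values.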
Bootstrap ATE: -61.13 mg/dL (95% CI [-80.79, -42.29])
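A percentile interval like the one above can be obtained by resampling whole subjects within each treatment arm. The following is a minimal sketch under strong assumptions: one summary outcome per subject and a difference-in-means estimator. It is not the implementation of `run_bootstrap_ate`, which takes `X, y, groups`; the name `bootstrap_ate_sketch` and the toy data are hypothetical.

```python
import random
import statistics


def bootstrap_ate_sketch(outcomes, treated, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a difference in mean subject outcomes.

    outcomes: {subject_id: float}, one summary value per subject (assumption).
    treated:  {subject_id: bool}, treatment arm per subject (assumption).
    """
    rng = random.Random(seed)
    treated_ids = [s for s in outcomes if treated[s]]
    control_ids = [s for s in outcomes if not treated[s]]

    def ate(t_ids, c_ids):
        return (statistics.fmean(outcomes[s] for s in t_ids)
                - statistics.fmean(outcomes[s] for s in c_ids))

    draws = []
    for _ in range(n_bootstrap):
        # Resample whole subjects with replacement, separately in each arm.
        t_sample = rng.choices(treated_ids, k=len(treated_ids))
        c_sample = rng.choices(control_ids, k=len(control_ids))
        draws.append(ate(t_sample, c_sample))
    draws.sort()
    # Symmetric order-statistic cut-offs approximating the percentile interval.
    lo_idx = int(n_bootstrap * alpha / 2)
    hi_idx = n_bootstrap - 1 - lo_idx
    return ate(treated_ids, control_ids), (draws[lo_idx], draws[hi_idx])


# Toy data, made up for illustration only.
outcomes = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 5.0, "e": 6.0, "f": 7.0}
treated = {"a": True, "b": True, "c": True, "d": False, "e": False, "f": False}
point, (ci_low, ci_high) = bootstrap_ate_sketch(outcomes, treated)
print(point, ci_low, ci_high)
```

Resampling subjects rather than individual measurements keeps each animal's repeated visits together, so the within-subject correlation is preserved in every bootstrap replicate.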
If you use this code in your research, please cite:
@article{ckd2026treatment,
title={When Deep Learning Fails: Treatment Effect Estimation in Small-Animal Longitudinal Studies},
author={},
journal={},
year={2026}
}
MIT License
For questions about the code, please open an issue on GitHub.