Alberta Income Support: Caseload Forecast

End-to-end forecasting pipeline — from open data to 24-month projections.

This repository implements a complete monthly caseload forecasting system for Alberta's Income Support program. It ingests publicly available Government of Alberta data (April 2005 – September 2025, ~50,000 rows), transforms it through a six-stage reproducible pipeline, and produces a static HTML report with 24-month ARIMA forecasts and prediction intervals. The entire pipeline runs on-premises with a single command (Rscript flow.R) and is designed as the cloud-agnostic core for subsequent migration to Azure ML and Snowflake.

At a Glance

Parameter	Value
Data Source	Alberta Open Data – Income Support Caseload
Temporal Coverage	April 2005 – September 2025 (246 months)
Focal Date	`2025-09-01` (last month with observed data)
Forecast Horizon	24 months (October 2025 – September 2027)
Primary Model	ARIMA(3,1,1)(1,0,0)[12] with log transform
Baseline Model	Seasonal Naïve (Tier 1)
Output	Static HTML report with forecasts + 95% prediction intervals
Reproduce	`Rscript flow.R`

Pipeline Architecture

The system follows a Ferry → Ellis → Mint → Train → Forecast → Report pattern, adapted from RAnalysisSkeleton. Each stage is a self-contained R script orchestrated by flow.R.

Diagram source: manipulation/pipeline.md — render with Rscript utility/render-pipeline-diagram.R

Pipeline Stages

#	Stage	Script	Role	Key Output
1	Ferry	`1-ferry.R`	Import raw data from 4 equivalent sources (URL, CSV, SQLite, SQL Server)	Staging SQLite database
2	Ellis	`2-ellis.R`	Transform raw data into 11 analysis-ready tables (wide + long)	Parquet files + SQLite + CACHE-manifest
—	EDA	`eda-2.qmd`	Advisory — trend, seasonality, stationarity diagnostics	Quarto HTML report (not in pipeline flow)
3	Mint	`3-mint-IS.R`	Train/test split, log transform, regressor matrices	`forge/` Parquet slices + `forge_manifest.yml`
4	Train	`4-train-IS.R`	Fit Tier 1 (Seasonal Naïve) and Tier 2 (ARIMA) models	Model `.rds` objects + `model_registry.csv`
5	Forecast	`5-forecast-IS.R`	Generate 24-month projections + backtest diagnostics	Forecast CSVs + `forecast_manifest.yml`
6	Report	`report-1.qmd`	Assemble final HTML deliverable	`report-1.html`

Lineage: A forge_hash in forge_manifest.yml links every model and forecast back to the exact data slice that produced it. Changing the focal_date in config.yml invalidates all Mint/Train/Forecast artifacts downstream.

Dual format: Ellis outputs both Apache Parquet (primary — fast, columnar, cloud-ready) and SQLite (secondary — SQL exploration, portability).

See manipulation/pipeline.md for full technical documentation.

Reports & Exploratory Analysis

Forecast Report

The final deliverable: 24-month forward projections for Alberta Income Support caseload (October 2025 – September 2027).

Includes executive summary, hero forecast chart, model comparison (Seasonal Naïve vs ARIMA), backtest validation against a held-out 24-month window, policy implications, and full data provenance.

_{20-year history + 24-month ARIMA projection with 80%/95% prediction intervals}

_{Backtest evidence — actual vs. fitted on held-out 24-month window}

Output: analysis/report-1/report-1.html · Generated as Lane 6 of flow.R

Exploratory Data Analysis (EDA)

The EDA report (eda-2.qmd) runs outside the pipeline as an advisory step. It diagnoses the time series properties that inform modeling decisions in Mint and Train.

Coverage: 20-year trends · client type composition · 7 historical periods (2008 crisis, oil collapse, COVID-19, recovery) · year-over-year seasonality · growth rates · stationarity tests (ADF, KPSS) · ACF/PACF for ARIMA order selection · STL decomposition · log transform assessment.

_{Total caseload — 20 years of monthly Income Support data annotated with historical periods}

_{Train/test partition used for backtest evaluation}

_{STL seasonal decomposition — trend, seasonal, and remainder components}

Output: analysis/eda-2/eda-2.html · Run manually, not part of flow.R

Data

Source: Alberta Open Data — monthly aggregates of Income Support caseload, intakes, and exits by geography and demographics.

Ellis transforms the raw data into 11 analysis-ready tables:

Dimension	Wide	Long	Coverage
Total Caseload	246 rows	—	Apr 2005 – Sep 2025
Client Type	162	648	Apr 2012 – Sep 2025
Family Composition	162	648	Apr 2012 – Sep 2025
Regions	90	720	Apr 2018 – Sep 2025
Age Groups	66	990	Apr 2020 – Sep 2025
Gender	66	198	Apr 2020 – Sep 2025

Dimensional availability expanded over five historical phases as the Government of Alberta added breakdowns to the published dataset.

See data-public/metadata/CACHE-manifest.md for table schemas and row counts.

Cloud Migration Roadmap

This repository is the cloud-agnostic on-premises core. It establishes a fully functional forecasting pipeline that can be extended to cloud ML platforms through provider-specific forks.

Phase 1 — On-Premises (Current)

All six pipeline stages execute end-to-end on a local workstation. Reproduction requires only R, Quarto, and Rscript flow.R. Outputs are static HTML reports suitable for manual distribution or SharePoint hosting.

Phase 2 — Cloud ML Platforms (Planned)

Capability	Azure ML (primary)	Snowflake ML (secondary)
Compute	Azure ML Compute Instances	Snowflake Warehouses
Pipeline Orchestration	Azure ML Pipelines	Snowflake Tasks
Model Registry	Azure ML Model Registry + MLflow	Snowflake Model Registry
Experiment Tracking	MLflow	Snowflake ML
Report Delivery	Cloud-hosted app with Entra ID auth	Streamlit in Snowflake
Scheduling	Azure ML scheduled runs	Snowflake Tasks (cron)
Fork Repository	`caseload-forecast-demo-azure`	`caseload-forecast-demo-snowflake`

What Stays / What Changes

Stays On-Prem (R)	Migrates to Cloud
Data wrangling (Ferry, Ellis)	Pipeline orchestration (flow.R → cloud scheduler)
EDA diagnostics	Model training (if Python SDK integration is smoother)
Domain logic & config	Model registry & experiment tracking
	Report hosting & access control
	Monthly automated refresh

Design-for-Portability Decisions

Apache Parquet artifacts — columnar, cross-platform, cloud-native
Modular pipeline stages — each script is a self-contained step
Config-driven paths — all parameters in config.yml, no hardcoded values
Forge manifest — versioned data contract linking Mint → Train → Forecast
Deterministic seeds — reproducible results across environments

The companion workspace azure-aml-demo/ contains a read-only Azure ML reference project with notebooks for data exploration, feature engineering, training pipelines, and deployment.

Reproduce the Pipeline

Prerequisites

R (4.0+) · Quarto · Git
VS Code or RStudio

Run

git clone https://github.com/andkov/caseload-forecast-demo.git
cd caseload-forecast-demo
Rscript flow.R

This executes all pipeline stages in order and produces analysis/report-1/report-1.html.

Install R Dependencies

Choose one approach (see guides/environment-management.md for details):

Approach	Command	Best For
CSV System (default)	`Rscript utility/install-packages.R`	Day-to-day development
renv	`Rscript utility/init-renv.R`	Exact reproducibility
Conda	`conda env create -f environment.yml`	R + Python workflows

Configuration

All pipeline parameters live in config.yml: focal date, forecast horizon, model settings, directory paths, and database connections. Changing focal_date triggers a full re-run of Mint → Train → Forecast.

Repository Structure

caseload-forecast-demo/
├── flow.R                  # Pipeline orchestrator — single entry point
├── config.yml              # All pipeline parameters
├── manipulation/           # Pipeline scripts (1-ferry → 5-forecast)
│   ├── 1-ferry.R           #   Data import from 4 equivalent sources
│   ├── 2-ellis.R           #   Transform to 11 analysis-ready tables
│   ├── 3-mint-IS.R         #   Model-ready data preparation
│   ├── 4-train-IS.R        #   Model estimation (Naïve + ARIMA)
│   ├── 5-forecast-IS.R     #   24-month projections + backtest
│   └── pipeline.md         #   Full pipeline documentation
├── analysis/               # Reports and exploratory analysis
│   ├── eda-2/              #   EDA: trends, seasonality, stationarity
│   └── report-1/           #   Final forecast report (HTML)
├── data-public/            # Open data, metadata, manifests
├── data-private/           # Derived artifacts (gitignored)
├── scripts/                # Shared functions and graphing utilities
├── guides/                 # Documentation and how-to guides
├── ai/                     # AI support system (personas, memory)
└── philosophy/             # Research methodology (FIDES framework)

AI Support

This project includes a persona-based AI support system with 9 specialized roles: Developer, Project Manager, Research Scientist, Prompt Engineer, Data Engineer, Grapher, Reporter, DevOps Engineer, and Frontend Architect. Context is managed dynamically via VS Code tasks (Ctrl+Shift+P → "Tasks: Run Task" → "Activate [Persona] Persona"). See ai/README.md for full documentation.

Additional Resources

manipulation/pipeline.md — Full pipeline architecture and data flow documentation
guides/getting-started.md — Extended setup walkthrough
guides/flow-usage.md — How flow.R orchestration works
data-public/metadata/CACHE-manifest.md — Ellis output table schemas
philosophy/ — FIDES research methodology framework
RAnalysisSkeleton — Foundational reproducible research patterns

Licensed under MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 114 Commits
.github		.github
.vscode		.vscode
_frontend-1		_frontend-1
ai		ai
analysis		analysis
data-private		data-private
data-public		data-public
guides		guides
libs		libs
manipulation		manipulation
philosophy		philosophy
scripts		scripts
utility		utility
.copilot-persona		.copilot-persona
.gitignore		.gitignore
.lintr		.lintr
LICENSE		LICENSE
README.md		README.md
caseload-forecast-demo.Rproj		caseload-forecast-demo.Rproj
caseload-forecast-demo.code-workspace		caseload-forecast-demo.code-workspace
config.yml		config.yml
environment.yml		environment.yml
flow.R		flow.R
install-all-packages.R		install-all-packages.R
llms.txt		llms.txt
setup-nodejs.ps1		setup-nodejs.ps1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Alberta Income Support: Caseload Forecast

At a Glance

Pipeline Architecture

Pipeline Stages

Reports & Exploratory Analysis

Forecast Report

Exploratory Data Analysis (EDA)

Data

Cloud Migration Roadmap

Phase 1 — On-Premises (Current)

Phase 2 — Cloud ML Platforms (Planned)

What Stays / What Changes

Design-for-Portability Decisions

Reproduce the Pipeline

Prerequisites

Run

Install R Dependencies

Configuration

Repository Structure

AI Support

Additional Resources

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Alberta Income Support: Caseload Forecast

At a Glance

Pipeline Architecture

Pipeline Stages

Reports & Exploratory Analysis

Forecast Report

Exploratory Data Analysis (EDA)

Data

Cloud Migration Roadmap

Phase 1 — On-Premises (Current)

Phase 2 — Cloud ML Platforms (Planned)

What Stays / What Changes

Design-for-Portability Decisions

Reproduce the Pipeline

Prerequisites

Run

Install R Dependencies

Configuration

Repository Structure

AI Support

Additional Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages