Merged
24 commits
c446d64
feat: implement external signals ingestion pipeline with database int…
JasonEran Feb 1, 2026
66be642
feat: Add external signal feed health tracking and status API
JasonEran Feb 2, 2026
8b48bee
feat: add external signals and feeds integration with dashboard compo…
JasonEran Feb 2, 2026
330de64
feat: add support for optional HTTP listener when mTLS is enabled for…
JasonEran Feb 4, 2026
da6c1ce
feat(v2.3): finish milestone 0 signals retention and normalization
JasonEran Feb 11, 2026
cf49688
Define and version the S_v/P_v/B_s schema
JasonEran Feb 11, 2026
7cc4627
Added FinBERT integration + rollback mechanism
JasonEran Feb 11, 2026
d162326
Added separate SUMMARY_SCHEMA_VERSION and /signals/summarize output v…
JasonEran Feb 11, 2026
7cb95b3
Core integration with AI Engine
JasonEran Feb 11, 2026
99a5864
feat(v2.3): implement batch enrichment endpoint and update enrichment…
JasonEran Feb 19, 2026
d77f3b2
feat(v2.3): enhance observability with custom metrics and tracing for…
JasonEran Feb 22, 2026
6703703
feat(v2.3): add OTLP endpoint resolution for tracing and metrics conf…
JasonEran Feb 22, 2026
44ced1c
feat(v2.3): add M2 data acquisition scripts and provenance docs
JasonEran Feb 22, 2026
43cc805
feat(v2.3): add TSMixer baseline training and ONNX export workflow
JasonEran Feb 24, 2026
8f2b5ba
feat(v2.3): add fusion baseline training and input contract
JasonEran Feb 24, 2026
cfc8064
feat(v2.3): add backtesting harness for fusion vs v2.2 heuristic
JasonEran Feb 24, 2026
86bcc96
feat(v2.3): add model artifact versioning and reproducibility checks
JasonEran Feb 25, 2026
95b4f4f
feat(v2.3): extend heartbeat payload with semantic features
JasonEran Feb 25, 2026
0783022
feat(v2.3): add agent local inference runtime with rollout gates
JasonEran Feb 25, 2026
e817897
feat(v2.3): add per-agent inference rollout gating and fallback
JasonEran Feb 25, 2026
71b66d4
feat(v2.3): add canary rollback runbook and evaluator
JasonEran Feb 25, 2026
112ab1a
feat(v2.3): implement dynamic risk alpha policy with guardrails
JasonEran Feb 25, 2026
ebb8b4b
feat(m4): add dashboard explainability with alpha and top signals (#44)
JasonEran Feb 25, 2026
5887cad
docs(v2.3): add release notes and acceptance PR checklist
JasonEran Feb 25, 2026
3 changes: 3 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# Pull Request

For v2.3 release-track PRs, you can use:
`docs/PR-Template-v2.3-Acceptance.md`

## Summary
Describe the changes and their purpose.

Expand Down
9 changes: 9 additions & 0 deletions CHANGELOG.md
Expand Up @@ -12,14 +12,23 @@ Semantic Versioning.
- Snapshot retention sweeper with optional S3 lifecycle configuration.
- Supply-chain workflow for SBOM generation, cosign signing, and SLSA container provenance.
- API key protection for telemetry ingestion and snapshot artifact endpoints.
- External signals ingestion pipeline (RSS feeds) with persisted `external_signals` table.
- External signal feed health tracking (`external_signal_feeds`) and feed status API.
- Parser regression tests for RSS/Atom feeds.
- AI Engine semantic enrichment service (`/signals/enrich`) with FinBERT/heuristic fallback.
- Batch semantic enrichment endpoint (`/signals/enrich/batch`) with schema-versioned vectors.
- v2.3 Milestone 1 smoke-test checklist in `docs/QA-SmokeTest-v2.3-M1.md`.
- v2.3 multimodal predictive architecture document in `docs/ARCHITECTURE-v2.3.md`.
- v2.3 delivery roadmap in `docs/ROADMAP-v2.3.md`.
- Expanded v2.3 roadmap with model choices, data sources, and validation guidance.
- Verification scripts now support API key headers and optional agent build flags.
- Optional HTTP listener when mTLS is enabled to keep dashboard/AI traffic on port 8080.
- v2.3 release notes (`docs/Release-Notes-v2.3.md`) and PR acceptance template (`docs/PR-Template-v2.3-Acceptance.md`).

### Changed
- Agent now injects W3C trace headers for HTTP requests.
- Dashboard dependencies updated to Next.js 16.1.6.
- Core external-signal ingestion now prefers batch enrichment and falls back to per-item enrichment.

### Deprecated
-
Expand Down
28 changes: 23 additions & 5 deletions README.md
Expand Up @@ -18,7 +18,7 @@ v2.2 reference architecture with a concrete implementation guide.

## Project Status

- Stage: v2.2 baseline delivered (Phase 0-4). v2.3 transition roadmap in docs/ARCHITECTURE-v2.3.md.
- Stage: v2.2 baseline delivered (Phase 0-4). v2.3 Milestones 1-4 delivered.
- License: MIT
- Authors: Qi Junyi, Xiao Erdong (2026)
- Sponsor: https://github.com/sponsors/JasonEran
Expand Down Expand Up @@ -131,10 +131,10 @@ This project targets a product-grade release, not a demo. The following standard
- [x] Add snapshot retention automation and S3 lifecycle policy support.
- [x] Generate SBOMs and sign container images with cosign in CI.

## v2.3 Preview (Roadmap)
## v2.3 Delivery (Roadmap)

We keep the current README focused on v2.2 implementation details. The next evolution is documented in
`docs/ARCHITECTURE-v2.3.md`. In brief, v2.3 moves from reactive thresholds to predictive, multimodal risk allocation:
v2.3 architecture and delivery detail are documented in `docs/ARCHITECTURE-v2.3.md` and
`docs/ROADMAP-v2.3.md`. In brief, v2.3 moves from reactive thresholds to predictive, multimodal risk allocation:

- Multimodal inputs: telemetry plus external cloud signals (status pages, incident reports, capacity advisories).
- Lightweight time-series forecasting on agents, with semantic enrichment computed in the control plane.
Expand Down Expand Up @@ -194,6 +194,23 @@ Open the dashboard at http://localhost:3000.
- Observability (OpenTelemetry): docs/Observability.md
- v2.3 architecture roadmap: docs/ARCHITECTURE-v2.3.md
- v2.3 delivery roadmap: docs/ROADMAP-v2.3.md
- v2.3 Milestone 1 smoke test: docs/QA-SmokeTest-v2.3-M1.md
- v2.3 M2 data provenance: docs/Data-Provenance-v2.3-M2.md
- v2.3 M2 data acquisition scripts: scripts/data_acquisition/README.md
- v2.3 M2 TSMixer baseline guide: docs/AI-TSMixer-Baseline-v2.3-M2.md
- v2.3 M2 fusion baseline guide: docs/AI-Fusion-Model-v2.3-M2.md
- v2.3 M2 backtesting guide: docs/AI-Backtesting-v2.3-M2.md
- v2.3 M2 artifact versioning + reproducibility guide: docs/AI-Artifact-Versioning-v2.3-M2.md
- v2.3 M2 model training scripts: scripts/model_training/README.md
- v2.3 M3 heartbeat semantic payload contract: docs/PROTO-Heartbeat-Semantic-v2.3-M3.md
- v2.3 M3 agent ONNX inference + gating: docs/Agent-ONNX-Inference-v2.3-M3.md
- v2.3 M3 core semantic rollout + per-agent gating: docs/Core-Semantic-Rollout-v2.3-M3.md
- v2.3 M3 canary + rollback plan: docs/QA-Canary-Rollback-v2.3-M3.md
- v2.3 M3 canary evaluator script: scripts/qa/README.md
- v2.3 M4 dynamic risk allocation (core): docs/Core-Dynamic-Risk-v2.3-M4.md
- v2.3 M4 dashboard explainability: docs/Web-Explainability-v2.3-M4.md
- v2.3 release notes: docs/Release-Notes-v2.3.md
- v2.3 PR acceptance template: docs/PR-Template-v2.3-Acceptance.md

If you want to simulate migrations, start at least two agents:

Expand Down Expand Up @@ -358,7 +375,8 @@ sidecars to issue and rotate X.509 SVIDs:

- Core serves mTLS on `https://core-service:8443` (host-mapped to 5001).
- Agent uses SPIFFE-issued certs from `/run/spiffe/certs` and calls the mTLS endpoint.
- HTTP on `http://core-service:8080` remains for dashboard/AI traffic.
- When `Security__Mtls__AllowHttp=true`, Core also listens on `http://core-service:8080` for
dashboard/AI traffic (host-mapped to 5000).

Disable mTLS locally by setting:

Expand Down
2 changes: 2 additions & 0 deletions docker-compose.yml
Expand Up @@ -21,6 +21,8 @@ services:
ArtifactBaseUrl: "https://core-service:8443"
Security__Mtls__Enabled: "true"
Security__Mtls__Port: "8443"
Security__Mtls__AllowHttp: "true"
Security__Mtls__HttpPort: "8080"
Security__Mtls__CertificatePath: "/run/spiffe/certs/svid.pem"
Security__Mtls__KeyPath: "/run/spiffe/certs/svid_key.pem"
Security__Mtls__BundlePath: "/run/spiffe/certs/bundle.pem"
Expand Down
77 changes: 77 additions & 0 deletions docs/AI-Artifact-Versioning-v2.3-M2.md
@@ -0,0 +1,77 @@
# v2.3 M2 Model Artifact Versioning + Reproducible Runs

This document captures the delivery for issue #38 (`[Ops] Model artifact versioning + reproducible runs`).

## Goal

Make offline model artifacts release-safe and reproducible for Milestone 2.

## Implemented Components

- Shared utility: `scripts/model_training/artifact_registry.py`
- Repro check runner: `scripts/model_training/verify_reproducible_run.py`
- Integrated into:
- `scripts/model_training/train_tsmixer_baseline.py`
- `scripts/model_training/train_fusion_baseline.py`
- `scripts/model_training/backtest_fusion_vs_v22.py`

## Artifact Naming / Versioning Scheme

Each run computes:

- `run_version` (CLI flag, default `v2.3-m2`)
- deterministic `run_id` = `<run_version>-<12-char-fingerprint>`
- full `run_fingerprint_sha256` derived from config + dataset descriptor + git commit
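The fingerprint derivation above can be sketched in a few lines. The exact canonicalization in `artifact_registry.py` may differ, so treat this as an illustration of the scheme rather than its implementation:

```python
import hashlib
import json

def compute_run_id(run_version: str, config: dict,
                   dataset_descriptor: dict, git_commit: str) -> tuple[str, str]:
    # Canonical JSON (sorted keys) so the same inputs always
    # produce the same fingerprint, and therefore the same run_id.
    payload = json.dumps(
        {"config": config, "dataset": dataset_descriptor, "commit": git_commit},
        sort_keys=True,
    )
    fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # run_id = <run_version>-<12-char-fingerprint>
    return f"{run_version}-{fingerprint[:12]}", fingerprint
```

Any change to the config, dataset descriptor, or commit yields a new `run_id`, so versioned artifact names never collide across differing runs.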

Each run outputs:

- base artifacts (legacy names kept for compatibility)
- `run_manifest.json` with file hashes and provenance metadata
- `versioned/` copies named:
- `<pipeline>-<run_id>-<artifact-role>.<ext>`

## Run Manifest Schema (`run_manifest.json`)

`schema_version: v1` payload includes:

- pipeline metadata (`pipeline`, `run_version`, `run_id`, `run_fingerprint_sha256`)
- git metadata (`commit`, `dirty_worktree`)
- deterministic run config
- dataset/input descriptors (including file hash when a file path exists)
- key metrics used for promotion decisions
- artifact inventory (`path`, `sha256`, `bytes`)

## Reproducibility Verification

Use `verify_reproducible_run.py` to execute the same command twice and compare artifact hashes.

TSMixer example:

```bash
python scripts/model_training/verify_reproducible_run.py \
--script scripts/model_training/train_tsmixer_baseline.py \
--base-output-dir .tmp/repro-check/tsmixer \
--artifacts tsmixer_baseline.pt,tsmixer_baseline.onnx,training_summary.json,run_manifest.json \
-- --epochs 6 --batch-size 128
```

Fusion example:

```bash
python scripts/model_training/verify_reproducible_run.py \
--script scripts/model_training/train_fusion_baseline.py \
--base-output-dir .tmp/repro-check/fusion \
--artifacts telemetry_only_baseline.pt,fusion_baseline.pt,fusion_evaluation_summary.json,run_manifest.json \
-- --epochs 8 --batch-size 128
```

Verification report location:

- `<base-output-dir>/reproducibility_check.json`

Acceptance is met when `all_artifacts_identical` is `true`.

## Acceptance Criteria Mapping

- [x] Artifact naming/versioning scheme
- [x] Re-run produces identical outputs (validated by hash comparison)
56 changes: 56 additions & 0 deletions docs/AI-Backtesting-v2.3-M2.md
@@ -0,0 +1,56 @@
# v2.3 M2 Backtesting Harness (v2.3 Fusion vs v2.2 Heuristic)

This document describes the offline backtesting runner delivered for issue #37.

## Goal

Validate v2.3 fusion model improvements against v2.2 heuristic decisions on held-out windows.

## Runner

- `scripts/model_training/backtest_fusion_vs_v22.py`

## Inputs

- Fusion checkpoint (`fusion_baseline.pt`) from issue #36.
- Optional dataset CSV with telemetry + semantic columns.
- If the dataset contract is not met, a deterministic synthetic fallback is used and `fallback_reason` is recorded.

## Compared Strategies

1. **v2.2 heuristic**
- Uses legacy `RiskScorer` decision (`CRITICAL` => positive preemption signal).
2. **v2.3 fusion**
- Uses fusion model probability with configurable decision threshold.

## Held-Out Backtest Protocol

- Build chronological windows from replay dataset.
- Reserve the tail portion (`backtest_ratio`) as held-out period.
- Evaluate both strategies on the same held-out windows.
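The tail-reservation step can be sketched as follows (the real runner's `backtest_ratio` handling, e.g. rounding, may differ):

```python
def split_backtest_windows(windows: list, backtest_ratio: float) -> tuple[list, list]:
    # Windows must already be in chronological order; the tail portion
    # is held out so neither strategy is tuned on future data.
    holdout = max(1, int(len(windows) * backtest_ratio))
    return windows[:-holdout], windows[-holdout:]
```

Both strategies are then scored on the same held-out tail, which keeps the comparison apples-to-apples.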

## Metrics

Reported per strategy:

- Accuracy
- Precision
- Recall
- F1
- AUROC (if both classes present)
- Average Precision (if both classes present)
- Positive prediction rate

Reported deltas:

- `f1_delta_fusion_minus_v22`
- `recall_delta_fusion_minus_v22`
- `precision_delta_fusion_minus_v22`
- `auroc_delta_fusion_minus_v22`

## Outputs

Per run output directory contains:

- `backtest_summary.json`
- `backtest_report.md`
76 changes: 76 additions & 0 deletions docs/AI-Fusion-Model-v2.3-M2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# v2.3 M2 Fusion Model Baseline (`P(preempt)`)

This document defines the input contract and offline baseline evaluation for issue #36.

## Goal

Fuse telemetry windows with semantic exogenous vectors (`S_v`, `P_v`, `B_s`) and produce `P(preempt)`.

## Training Entry Point

- `scripts/model_training/train_fusion_baseline.py`

## Input Contract (Offline CSV)

### Required semantic columns

- `s_v_negative`
- `s_v_neutral`
- `s_v_positive`
- `p_v`
- `b_s`

### Telemetry columns

Configured by `--telemetry-columns`. Default order:

- `spot_price_usd`
- `cpu_utilization`
- `memory_utilization`
- `network_io`

At least one configured telemetry column must exist in the dataset.

### Optional label

- `label_preempt` (binary, 0/1)

If missing, labels are derived by:

- future return >= `--label-threshold` OR
- current `p_v >= 0.75`

### Windowing semantics

- `window_size`: telemetry lookback length.
- `horizon`: prediction target offset.
- Per training sample:
- telemetry tensor: `[window_size, telemetry_dim]`
- semantic tensor: `[semantic_dim]` at the end of window
- label: binary target at `end + horizon`
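The windowing semantics above can be illustrated with a plain-Python sketch. The training script operates on tensors; this only shows the indexing:

```python
def build_samples(telemetry, semantic, labels, window_size, horizon):
    # telemetry: length-T list of telemetry rows; semantic: length-T list
    # of semantic vectors; labels: length-T list of binary targets.
    samples = []
    for end in range(window_size - 1, len(telemetry) - horizon):
        samples.append({
            "telemetry": telemetry[end - window_size + 1 : end + 1],  # lookback window
            "semantic": semantic[end],       # semantic vector at the end of the window
            "label": labels[end + horizon],  # binary target at end + horizon
        })
    return samples
```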

## Offline Baseline Evaluation

The script trains and evaluates two models on the same split:

1. Telemetry-only baseline
2. Fusion baseline (telemetry branch + semantic branch)

Outputs:

- `telemetry_only_baseline.pt`
- `fusion_baseline.pt`
- `fusion_evaluation_summary.json`

Summary includes:

- train/val/test metrics: loss, accuracy, precision, recall, F1, AUROC, average precision
- comparison deltas:
- `test_f1_delta_fusion_minus_telemetry`
- `test_auroc_delta_fusion_minus_telemetry`

## Reproducibility

- Fixed `--seed` for Python/NumPy/PyTorch RNG.
- Deterministic PyTorch algorithms enabled.
- Deterministic split and normalization based on train partition.
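The fixed-seed principle can be illustrated with the standard library alone; the actual script additionally seeds NumPy and PyTorch and enables deterministic PyTorch algorithms:

```python
import random

def seeded_run(seed: int, steps: int = 5) -> list[float]:
    # With the same seed, every run yields the identical sequence,
    # which is what makes artifact hashes comparable across runs.
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]
```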
49 changes: 49 additions & 0 deletions docs/AI-TSMixer-Baseline-v2.3-M2.md
@@ -0,0 +1,49 @@
# v2.3 M2 TSMixer Baseline + ONNX Export

This guide documents the baseline workflow delivered for issue #35.

## Goal

Train a lightweight time-series model for `P(preempt)` baseline inference and export ONNX artifacts for agent-side runtime integration.

## Entry Points

- Training script: `scripts/model_training/train_tsmixer_baseline.py`
- Script usage: `scripts/model_training/README.md`
- Dependency file: `scripts/model_training/requirements.txt`

## Reproducible Training

The script is reproducible by design:

- Fixed seed controls Python, NumPy, and PyTorch RNG.
- Deterministic PyTorch execution is enabled.
- Dataset split is deterministic for the same seed and input.
- Run configuration and metrics are written into `training_summary.json`.

## Dataset Modes

1. Real dataset mode: pass a spot-history CSV generated by data acquisition scripts.
2. Synthetic fallback mode: automatically used when the input dataset cannot produce enough windows.

Fallback reason is persisted in the summary metadata.

## ONNX Export and Validation

The script exports `tsmixer_baseline.onnx` and validates it by default:

1. ONNX structure check (`onnx.checker`).
2. Inference parity check (PyTorch vs ONNX Runtime logits on held-out samples).

Validation details are saved under `onnx_validation` in `training_summary.json`.
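The parity check in step 2 reduces to an element-wise closeness test between the PyTorch and ONNX Runtime logits; a dependency-free sketch mirroring `numpy.allclose` semantics (the tolerance values here are assumptions, not the script's settings):

```python
def logits_match(torch_logits, onnx_logits, rtol=1e-4, atol=1e-5):
    # Element-wise |a - b| <= atol + rtol * |b|, as in numpy.allclose.
    if len(torch_logits) != len(onnx_logits):
        return False
    return all(
        abs(a - b) <= atol + rtol * abs(b)
        for a, b in zip(torch_logits, onnx_logits)
    )
```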

## Artifacts

Per run output directory contains:

- `tsmixer_baseline.pt`
- `tsmixer_baseline.onnx`
- `training_summary.json`

These artifacts are versioned by issue #38; see `docs/AI-Artifact-Versioning-v2.3-M2.md` for the artifact versioning and reproducibility flow.
1 change: 1 addition & 0 deletions docs/ARCHITECTURE-v2.3.md
Expand Up @@ -65,6 +65,7 @@ Telemetry alone misses off-chart events. We introduce a semantic pipeline for cl
- **Model**: a domain-adapted transformer (BERT-class). If economic signals are used, FinBERT is a reasonable baseline
for finance-domain text; an LLM summarizer handles longer advisories and provider policy updates.
- **Outputs (standardized)**:
- `schemaVersion`: semantic vector schema version.
- `S_v`: sentiment vector (normalized polarity + severity).
- `P_v`: volatility probability (0-1).
- `B_s`: supply or capacity bias (long-horizon).
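As a hedged illustration, an enrichment payload following this contract might look like the following (all field values are hypothetical):

```python
enrichment_output = {
    "schemaVersion": "v1",  # semantic vector schema version
    "S_v": {"negative": 0.7, "neutral": 0.2, "positive": 0.1},  # sentiment distribution
    "P_v": 0.82,   # volatility probability, 0-1
    "B_s": -0.3,   # supply or capacity bias, long-horizon
}
```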
Expand Down