Merged
24 commits
c446d64
feat: implement external signals ingestion pipeline with database int…
JasonEran Feb 1, 2026
66be642
feat: Add external signal feed health tracking and status API
JasonEran Feb 2, 2026
8b48bee
feat: add external signals and feeds integration with dashboard compo…
JasonEran Feb 2, 2026
330de64
feat: add support for optional HTTP listener when mTLS is enabled for…
JasonEran Feb 4, 2026
da6c1ce
feat(v2.3): finish milestone 0 signals retention and normalization
JasonEran Feb 11, 2026
cf49688
Define and version the S_v/P_v/B_s schema
JasonEran Feb 11, 2026
7cc4627
Added FinBERT integration + rollback mechanism
JasonEran Feb 11, 2026
d162326
Added separate SUMMARY_SCHEMA_VERSION and /signals/summarize output v…
JasonEran Feb 11, 2026
7cb95b3
Core integration with AI Engine
JasonEran Feb 11, 2026
99a5864
feat(v2.3): implement batch enrichment endpoint and update enrichment…
JasonEran Feb 19, 2026
d77f3b2
feat(v2.3): enhance observability with custom metrics and tracing for…
JasonEran Feb 22, 2026
6703703
feat(v2.3): add OTLP endpoint resolution for tracing and metrics conf…
JasonEran Feb 22, 2026
44ced1c
feat(v2.3): add M2 data acquisition scripts and provenance docs
JasonEran Feb 22, 2026
43cc805
feat(v2.3): add TSMixer baseline training and ONNX export workflow
JasonEran Feb 24, 2026
8f2b5ba
feat(v2.3): add fusion baseline training and input contract
JasonEran Feb 24, 2026
cfc8064
feat(v2.3): add backtesting harness for fusion vs v2.2 heuristic
JasonEran Feb 24, 2026
86bcc96
feat(v2.3): add model artifact versioning and reproducibility checks
JasonEran Feb 25, 2026
95b4f4f
feat(v2.3): extend heartbeat payload with semantic features
JasonEran Feb 25, 2026
0783022
feat(v2.3): add agent local inference runtime with rollout gates
JasonEran Feb 25, 2026
e817897
feat(v2.3): add per-agent inference rollout gating and fallback
JasonEran Feb 25, 2026
71b66d4
feat(v2.3): add canary rollback runbook and evaluator
JasonEran Feb 25, 2026
112ab1a
feat(v2.3): implement dynamic risk alpha policy with guardrails
JasonEran Feb 25, 2026
ebb8b4b
feat(m4): add dashboard explainability with alpha and top signals (#44)
JasonEran Feb 25, 2026
5887cad
docs(v2.3): add release notes and acceptance PR checklist
JasonEran Feb 25, 2026
3 changes: 3 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# Pull Request

For v2.3 release-track PRs, you can use:
`docs/PR-Template-v2.3-Acceptance.md`

## Summary
Describe the changes and their purpose.

Expand Down
9 changes: 9 additions & 0 deletions CHANGELOG.md
Expand Up @@ -12,14 +12,23 @@ Semantic Versioning.
- Snapshot retention sweeper with optional S3 lifecycle configuration.
- Supply-chain workflow for SBOM generation, cosign signing, and SLSA container provenance.
- API key protection for telemetry ingestion and snapshot artifact endpoints.
- External signals ingestion pipeline (RSS feeds) with persisted `external_signals` table.
- External signal feed health tracking (`external_signal_feeds`) and feed status API.
- Parser regression tests for RSS/Atom feeds.
- AI Engine semantic enrichment service (`/signals/enrich`) with FinBERT/heuristic fallback.
- Batch semantic enrichment endpoint (`/signals/enrich/batch`) with schema-versioned vectors.
- v2.3 Milestone 1 smoke-test checklist in `docs/QA-SmokeTest-v2.3-M1.md`.
- v2.3 multimodal predictive architecture document in `docs/ARCHITECTURE-v2.3.md`.
- v2.3 delivery roadmap in `docs/ROADMAP-v2.3.md`.
- Expanded v2.3 roadmap with model choices, data sources, and validation guidance.
- Verification scripts now support API key headers and optional agent build flags.
- Optional HTTP listener when mTLS is enabled to keep dashboard/AI traffic on port 8080.
- v2.3 release notes (`docs/Release-Notes-v2.3.md`) and PR acceptance template (`docs/PR-Template-v2.3-Acceptance.md`).

### Changed
- Agent now injects W3C trace headers for HTTP requests.
- Dashboard dependencies updated to Next.js 16.1.6.
- Core external-signal ingestion now prefers batch enrichment and falls back to per-item enrichment.

### Deprecated
-
Expand Down
28 changes: 23 additions & 5 deletions README.md
Expand Up @@ -18,7 +18,7 @@ v2.2 reference architecture with a concrete implementation guide.

## Project Status

- Stage: v2.2 baseline delivered (Phase 0-4). v2.3 transition roadmap in docs/ARCHITECTURE-v2.3.md.
- Stage: v2.2 baseline delivered (Phase 0-4). v2.3 Milestones 1-4 delivered.
- License: MIT
- Authors: Qi Junyi, Xiao Erdong (2026)
- Sponsor: https://github.com/sponsors/JasonEran
Expand Down Expand Up @@ -131,10 +131,10 @@ This project targets a product-grade release, not a demo. The following standard
- [x] Add snapshot retention automation and S3 lifecycle policy support.
- [x] Generate SBOMs and sign container images with cosign in CI.

## v2.3 Preview (Roadmap)
## v2.3 Delivery (Roadmap)

We keep the current README focused on v2.2 implementation details. The next evolution is documented in
`docs/ARCHITECTURE-v2.3.md`. In brief, v2.3 moves from reactive thresholds to predictive, multimodal risk allocation:
v2.3 architecture and delivery detail are documented in `docs/ARCHITECTURE-v2.3.md` and
`docs/ROADMAP-v2.3.md`. In brief, v2.3 moves from reactive thresholds to predictive, multimodal risk allocation:

- Multimodal inputs: telemetry plus external cloud signals (status pages, incident reports, capacity advisories).
- Lightweight time-series forecasting on agents, with semantic enrichment computed in the control plane.
Expand Down Expand Up @@ -194,6 +194,23 @@ Open the dashboard at http://localhost:3000.
- Observability (OpenTelemetry): docs/Observability.md
- v2.3 architecture roadmap: docs/ARCHITECTURE-v2.3.md
- v2.3 delivery roadmap: docs/ROADMAP-v2.3.md
- v2.3 Milestone 1 smoke test: docs/QA-SmokeTest-v2.3-M1.md
- v2.3 M2 data provenance: docs/Data-Provenance-v2.3-M2.md
- v2.3 M2 data acquisition scripts: scripts/data_acquisition/README.md
- v2.3 M2 TSMixer baseline guide: docs/AI-TSMixer-Baseline-v2.3-M2.md
- v2.3 M2 fusion baseline guide: docs/AI-Fusion-Model-v2.3-M2.md
- v2.3 M2 backtesting guide: docs/AI-Backtesting-v2.3-M2.md
- v2.3 M2 artifact versioning + reproducibility guide: docs/AI-Artifact-Versioning-v2.3-M2.md
- v2.3 M2 model training scripts: scripts/model_training/README.md
- v2.3 M3 heartbeat semantic payload contract: docs/PROTO-Heartbeat-Semantic-v2.3-M3.md
- v2.3 M3 agent ONNX inference + gating: docs/Agent-ONNX-Inference-v2.3-M3.md
- v2.3 M3 core semantic rollout + per-agent gating: docs/Core-Semantic-Rollout-v2.3-M3.md
- v2.3 M3 canary + rollback plan: docs/QA-Canary-Rollback-v2.3-M3.md
- v2.3 M3 canary evaluator script: scripts/qa/README.md
- v2.3 M4 dynamic risk allocation (core): docs/Core-Dynamic-Risk-v2.3-M4.md
- v2.3 M4 dashboard explainability: docs/Web-Explainability-v2.3-M4.md
- v2.3 release notes: docs/Release-Notes-v2.3.md
- v2.3 PR acceptance template: docs/PR-Template-v2.3-Acceptance.md

If you want to simulate migrations, start at least two agents:

Expand Down Expand Up @@ -358,7 +375,8 @@ sidecars to issue and rotate X.509 SVIDs:

- Core serves mTLS on `https://core-service:8443` (host-mapped to 5001).
- Agent uses SPIFFE-issued certs from `/run/spiffe/certs` and calls the mTLS endpoint.
- HTTP on `http://core-service:8080` remains for dashboard/AI traffic.
- When `Security__Mtls__AllowHttp=true`, Core also listens on `http://core-service:8080` for
dashboard/AI traffic (host-mapped to 5000).

Disable mTLS locally by setting:

Expand Down
2 changes: 2 additions & 0 deletions docker-compose.yml
Expand Up @@ -21,6 +21,8 @@ services:
ArtifactBaseUrl: "https://core-service:8443"
Security__Mtls__Enabled: "true"
Security__Mtls__Port: "8443"
Security__Mtls__AllowHttp: "true"
Security__Mtls__HttpPort: "8080"
Security__Mtls__CertificatePath: "/run/spiffe/certs/svid.pem"
Security__Mtls__KeyPath: "/run/spiffe/certs/svid_key.pem"
Security__Mtls__BundlePath: "/run/spiffe/certs/bundle.pem"
Expand Down
77 changes: 77 additions & 0 deletions docs/AI-Artifact-Versioning-v2.3-M2.md
@@ -0,0 +1,77 @@
# v2.3 M2 Model Artifact Versioning + Reproducible Runs

This document captures the delivery for issue #38 (`[Ops] Model artifact versioning + reproducible runs`).

## Goal

Make offline model artifacts release-safe and reproducible for Milestone 2.

## Implemented Components

- Shared utility: `scripts/model_training/artifact_registry.py`
- Repro check runner: `scripts/model_training/verify_reproducible_run.py`
- Integrated into:
- `scripts/model_training/train_tsmixer_baseline.py`
- `scripts/model_training/train_fusion_baseline.py`
- `scripts/model_training/backtest_fusion_vs_v22.py`

## Artifact Naming / Versioning Scheme

Each run computes:

- `run_version` (CLI flag, default `v2.3-m2`)
- deterministic `run_id` = `<run_version>-<12-char-fingerprint>`
- full `run_fingerprint_sha256` derived from config + dataset descriptor + git commit
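The fingerprint derivation above can be sketched in a few lines. The exact canonicalization in `artifact_registry.py` may differ, so treat this as an illustration of the scheme rather than its implementation:

```python
import hashlib
import json

def compute_run_id(run_version: str, config: dict,
                   dataset_descriptor: dict, git_commit: str) -> tuple[str, str]:
    # Canonical JSON (sorted keys) so the same inputs always
    # produce the same fingerprint, and therefore the same run_id.
    payload = json.dumps(
        {"config": config, "dataset": dataset_descriptor, "commit": git_commit},
        sort_keys=True,
    )
    fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # run_id = <run_version>-<12-char-fingerprint>
    return f"{run_version}-{fingerprint[:12]}", fingerprint
```

Any change to the config, dataset descriptor, or commit yields a new `run_id`, so versioned artifact names never collide across differing runs.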

Each run outputs:

- base artifacts (legacy names kept for compatibility)
- `run_manifest.json` with file hashes and provenance metadata
- `versioned/` copies named:
- `<pipeline>-<run_id>-<artifact-role>.<ext>`

## Run Manifest Schema (`run_manifest.json`)

`schema_version: v1` payload includes:

- pipeline metadata (`pipeline`, `run_version`, `run_id`, `run_fingerprint_sha256`)
- git metadata (`commit`, `dirty_worktree`)
- deterministic run config
- dataset/input descriptors (including file hash when a file path exists)
- key metrics used for promotion decisions
- artifact inventory (`path`, `sha256`, `bytes`)

## Reproducibility Verification

Use `verify_reproducible_run.py` to execute the same command twice and compare artifact hashes.

TSMixer example:

```bash
python scripts/model_training/verify_reproducible_run.py \
--script scripts/model_training/train_tsmixer_baseline.py \
--base-output-dir .tmp/repro-check/tsmixer \
--artifacts tsmixer_baseline.pt,tsmixer_baseline.onnx,training_summary.json,run_manifest.json \
-- --epochs 6 --batch-size 128
```

Fusion example:

```bash
python scripts/model_training/verify_reproducible_run.py \
--script scripts/model_training/train_fusion_baseline.py \
--base-output-dir .tmp/repro-check/fusion \
--artifacts telemetry_only_baseline.pt,fusion_baseline.pt,fusion_evaluation_summary.json,run_manifest.json \
-- --epochs 8 --batch-size 128
```

Verification report location:

- `<base-output-dir>/reproducibility_check.json`

Acceptance is met when `all_artifacts_identical` is `true`.

## Acceptance Criteria Mapping

- [x] Artifact naming/versioning scheme
- [x] Re-run produces identical outputs (validated by hash comparison)
56 changes: 56 additions & 0 deletions docs/AI-Backtesting-v2.3-M2.md
@@ -0,0 +1,56 @@
# v2.3 M2 Backtesting Harness (v2.3 Fusion vs v2.2 Heuristic)

This document describes the offline backtesting runner delivered for issue #37.

## Goal

Validate v2.3 fusion model improvements against v2.2 heuristic decisions on held-out windows.

## Runner

- `scripts/model_training/backtest_fusion_vs_v22.py`

## Inputs

- Fusion checkpoint (`fusion_baseline.pt`) from issue #36.
- Optional dataset CSV with telemetry + semantic columns.
- If the dataset contract is not met, a deterministic synthetic fallback is used and `fallback_reason` is recorded.

## Compared Strategies

1. **v2.2 heuristic**
- Uses legacy `RiskScorer` decision (`CRITICAL` => positive preemption signal).
2. **v2.3 fusion**
- Uses fusion model probability with configurable decision threshold.

## Held-Out Backtest Protocol

- Build chronological windows from replay dataset.
- Reserve the tail portion (`backtest_ratio`) as held-out period.
- Evaluate both strategies on the same held-out windows.
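The tail-reservation step can be sketched as follows (the real runner's `backtest_ratio` handling, e.g. rounding, may differ):

```python
def split_backtest_windows(windows: list, backtest_ratio: float) -> tuple[list, list]:
    # Windows must already be in chronological order; the tail portion
    # is held out so neither strategy is tuned on future data.
    holdout = max(1, int(len(windows) * backtest_ratio))
    return windows[:-holdout], windows[-holdout:]
```

Both strategies are then scored on the same held-out tail, which keeps the comparison apples-to-apples.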

## Metrics

Reported per strategy:

- Accuracy
- Precision
- Recall
- F1
- AUROC (if both classes present)
- Average Precision (if both classes present)
- Positive prediction rate

Reported deltas:

- `f1_delta_fusion_minus_v22`
- `recall_delta_fusion_minus_v22`
- `precision_delta_fusion_minus_v22`
- `auroc_delta_fusion_minus_v22`

## Outputs

Per run output directory contains:

- `backtest_summary.json`
- `backtest_report.md`
76 changes: 76 additions & 0 deletions docs/AI-Fusion-Model-v2.3-M2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# v2.3 M2 Fusion Model Baseline (`P(preempt)`)

This document defines the input contract and offline baseline evaluation for issue #36.

## Goal

Fuse telemetry windows with semantic exogenous vectors (`S_v`, `P_v`, `B_s`) and produce `P(preempt)`.

## Training Entry Point

- `scripts/model_training/train_fusion_baseline.py`

## Input Contract (Offline CSV)

### Required semantic columns

- `s_v_negative`
- `s_v_neutral`
- `s_v_positive`
- `p_v`
- `b_s`

### Telemetry columns

Configured by `--telemetry-columns`. Default order:

- `spot_price_usd`
- `cpu_utilization`
- `memory_utilization`
- `network_io`

At least one configured telemetry column must exist in the dataset.

### Optional label

- `label_preempt` (binary, 0/1)

If missing, labels are derived by:

- future return >= `--label-threshold` OR
- current `p_v >= 0.75`

### Windowing semantics

- `window_size`: telemetry lookback length.
- `horizon`: prediction target offset.
- Per training sample:
- telemetry tensor: `[window_size, telemetry_dim]`
- semantic tensor: `[semantic_dim]` at the end of window
- label: binary target at `end + horizon`
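The windowing semantics above can be illustrated with a plain-Python sketch. The training script operates on tensors; this only shows the indexing:

```python
def build_samples(telemetry, semantic, labels, window_size, horizon):
    # telemetry: length-T list of telemetry rows; semantic: length-T list
    # of semantic vectors; labels: length-T list of binary targets.
    samples = []
    for end in range(window_size - 1, len(telemetry) - horizon):
        samples.append({
            "telemetry": telemetry[end - window_size + 1 : end + 1],  # lookback window
            "semantic": semantic[end],       # semantic vector at the end of the window
            "label": labels[end + horizon],  # binary target at end + horizon
        })
    return samples
```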

## Offline Baseline Evaluation

The script trains and evaluates two models on the same split:

1. Telemetry-only baseline
2. Fusion baseline (telemetry branch + semantic branch)

Outputs:

- `telemetry_only_baseline.pt`
- `fusion_baseline.pt`
- `fusion_evaluation_summary.json`

Summary includes:

- train/val/test metrics: loss, accuracy, precision, recall, F1, AUROC, average precision
- comparison deltas:
- `test_f1_delta_fusion_minus_telemetry`
- `test_auroc_delta_fusion_minus_telemetry`

## Reproducibility

- Fixed `--seed` for Python/NumPy/PyTorch RNG.
- Deterministic PyTorch algorithms enabled.
- Deterministic split and normalization based on train partition.
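The fixed-seed principle can be illustrated with the standard library alone; the actual script additionally seeds NumPy and PyTorch and enables deterministic PyTorch algorithms:

```python
import random

def seeded_run(seed: int, steps: int = 5) -> list[float]:
    # With the same seed, every run yields the identical sequence,
    # which is what makes artifact hashes comparable across runs.
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]
```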
49 changes: 49 additions & 0 deletions docs/AI-TSMixer-Baseline-v2.3-M2.md
@@ -0,0 +1,49 @@
# v2.3 M2 TSMixer Baseline + ONNX Export

This guide documents the baseline workflow delivered for issue #35.

## Goal

Train a lightweight time-series model for `P(preempt)` baseline inference and export ONNX artifacts for agent-side runtime integration.

## Entry Points

- Training script: `scripts/model_training/train_tsmixer_baseline.py`
- Script usage: `scripts/model_training/README.md`
- Dependency file: `scripts/model_training/requirements.txt`

## Reproducible Training

The script is reproducible by design:

- Fixed seed controls Python, NumPy, and PyTorch RNG.
- Deterministic PyTorch execution is enabled.
- Dataset split is deterministic for the same seed and input.
- Run configuration and metrics are written into `training_summary.json`.

## Dataset Modes

1. Real dataset mode: pass a spot-history CSV generated by data acquisition scripts.
2. Synthetic fallback mode: automatically used when the input dataset cannot produce enough windows.

Fallback reason is persisted in the summary metadata.

## ONNX Export and Validation

The script exports `tsmixer_baseline.onnx` and validates it by default:

1. ONNX structure check (`onnx.checker`).
2. Inference parity check (PyTorch vs ONNX Runtime logits on held-out samples).

Validation details are saved under `onnx_validation` in `training_summary.json`.
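The parity check in step 2 reduces to an element-wise closeness test between the PyTorch and ONNX Runtime logits; a dependency-free sketch mirroring `numpy.allclose` semantics (the tolerance values here are assumptions, not the script's settings):

```python
def logits_match(torch_logits, onnx_logits, rtol=1e-4, atol=1e-5):
    # Element-wise |a - b| <= atol + rtol * |b|, as in numpy.allclose.
    if len(torch_logits) != len(onnx_logits):
        return False
    return all(
        abs(a - b) <= atol + rtol * abs(b)
        for a, b in zip(torch_logits, onnx_logits)
    )
```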

## Artifacts

Per run output directory contains:

- `tsmixer_baseline.pt`
- `tsmixer_baseline.onnx`
- `training_summary.json`

These artifacts are versioned by issue #38; see `docs/AI-Artifact-Versioning-v2.3-M2.md` for the artifact versioning and reproducibility flow.
1 change: 1 addition & 0 deletions docs/ARCHITECTURE-v2.3.md
Expand Up @@ -65,6 +65,7 @@ Telemetry alone misses off-chart events. We introduce a semantic pipeline for cl
- **Model**: a domain-adapted transformer (BERT-class). If economic signals are used, FinBERT is a reasonable baseline
for finance-domain text; an LLM summarizer handles longer advisories and provider policy updates.
- **Outputs (standardized)**:
- `schemaVersion`: semantic vector schema version.
- `S_v`: sentiment vector (normalized polarity + severity).
- `P_v`: volatility probability (0-1).
- `B_s`: supply or capacity bias (long-horizon).
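As a hedged illustration, an enrichment payload following this contract might look like the following (all field values are hypothetical):

```python
enrichment_output = {
    "schemaVersion": "v1",  # semantic vector schema version
    "S_v": {"negative": 0.7, "neutral": 0.2, "positive": 0.1},  # sentiment distribution
    "P_v": 0.82,   # volatility probability, 0-1
    "B_s": -0.3,   # supply or capacity bias, long-horizon
}
```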
Expand Down