diff --git a/docs/adr/ADR-180-through-wall-handoff-demo.md b/docs/adr/ADR-180-through-wall-handoff-demo.md
new file mode 100644
index 0000000000..df65a9b503
--- /dev/null
+++ b/docs/adr/ADR-180-through-wall-handoff-demo.md
@@ -0,0 +1,272 @@
+# ADR-180: Through-Wall Camera↔CSI Hand-off Demo ("Behind the Wall")
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-15 |
+| **Deciders** | ruv |
+| **Codename** | **BEHIND-THE-WALL** |
+| **Builds on** | ADR-079 (camera ground-truth training), ADR-031 (sensing-first RF mode), ADR-134 (CSI→CIR multipath), ADR-029/030 (RuvSense multistatic + persistent field), ADR-024 (AETHER re-ID), ADR-151 (per-room calibration), ADR-173 (metric-locked PCK), ADR-095/096 (rvcsi nexmon) |
+
+## Context
+
+### The demo we want
+A single self-contained **HTML page** that tells one honest, visceral story:
+
+1. You stand in front of the laptop. The camera tracks your **full skeletal pose**;
+   the WiFi-CSI model, trained on *your* movements moments earlier, infers the **same
+   skeleton** in parallel — a side-by-side "camera vs RF agree" view.
+2. You **walk out the door and behind the wall**. The camera **goes blind** (you are
+   occluded — it honestly shows "no person in frame"). The CSI model **keeps inferring
+   your skeleton** from the WiFi signal alone — the 3D figure keeps walking, behind the
+   wall, smoothly. A badge flips from `CAMERA` to `RF-INFERRED (through-wall)`.
+3. You **walk back into view**. The camera **re-acquires**; the badge flips back to
+   `CAMERA`, and the RF-inferred and camera skeletons reconverge.
+
+This is the "WiFi sees through walls" demo — and the user explicitly wants the **inferred
+skeleton through the wall**, not just a blob. The project's "prove everything / no AI-slop"
+bar means we make that claim **only because we measure it**: a second camera on the far side
+of the wall records ground-truth pose *behind* the wall, so the through-wall skeleton's
+accuracy is a **reported, reproducible number** — never an unfalsifiable "trust me."
+
+### Honest capability framing (the load-bearing section)
+Through-wall **per-joint skeletal inference from WiFi CSI is not a generally-validated
+capability** in open settings — WiFi-DensePose (CMU) is camera-*co-located*. What makes it
+defensible *here* is the tightly-controlled regime and the measurement:
+
+- **Controlled regime:** one room, one subject (you), one doorway, a model **camera-supervised
+  on your exact gait and your exact through-door transition** (ADR-079) minutes earlier. This
+  is in-distribution for *this* demo, not a universal claim.
+- **Measured, not asserted:** a far-side camera (cognitum-v0 has 17 `/dev/video*` nodes — use
+  one, or a phone) records ground-truth pose behind the wall. The through-wall CSI skeleton is
+  scored against it with the metric-locked PCK harness (ADR-173). **We publish the number.**
+- **Uncertainty is rendered, not hidden:** the through-wall skeleton is drawn **translucent**,
+  with a live **per-joint confidence** and an explicit `RF-INFERRED` badge. High-confidence
+  joints render solid; low-confidence joints fade. It never masquerades as the camera's
+  ground-truth pose.
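The scoring step named in the bullets above (through-wall skeleton versus far-camera ground truth) could look roughly like the sketch below. It is not the ADR-173 harness: the function name `pck_at_k`, the 2D pixel coordinates, the absolute-pixel threshold, and the `min_visibility` cutoff are all assumptions made for illustration. PCK implementations often normalize the threshold by torso or bounding-box size, and the real harness may do so as well.

```python
import math


def pck_at_k(pred, truth, k_px, min_visibility=0.5):
    """Fraction of scored joints whose prediction is within k_px of ground truth.

    pred:  list of (x, y) predicted joint positions, in pixels (assumed 2D).
    truth: list of (x, y, visibility) far-camera labels for the same joints.
    Joints the far camera could not see are skipped rather than counted as misses.
    Returns None when nothing could be scored, so "not measured" stays explicit.
    """
    hits = 0
    scored = 0
    for (px, py), (tx, ty, vis) in zip(pred, truth):
        if vis < min_visibility:
            continue
        scored += 1
        if math.hypot(px - tx, py - ty) <= k_px:
            hits += 1
    return hits / scored if scored else None


# Three joints: one hit (10 px off), one miss (50 px off), one unseen by the camera.
pred = [(100.0, 100.0), (200.0, 200.0), (300.0, 300.0)]
truth = [(110.0, 100.0, 0.9), (200.0, 250.0, 0.9), (0.0, 0.0, 0.1)]
print(pck_at_k(pred, truth, k_px=20))  # 1 hit out of 2 scored joints -> 0.5
```

Returning `None` instead of `0.0` for an unscored pass keeps "no data" distinct from "measured and poor", which matches the ADR's stance that the number is either measured or absent.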
+
+| While… | Camera | WiFi CSI (S3 / Pi5 nexmon, fused) | 60 GHz mmWave (C6 + MR60BHA2) |
+|--------|--------|-----------------------------------|-------------------------------|
+| In frame | **Full 17-kpt pose** — ground truth | full skeleton (supervised model) — *agrees with camera* | presence + range + micro-motion |
+| Behind a **drywall** | nothing (occluded) | **inferred full skeleton** (camera-supervised model + multistatic fusion), confidence-scored, **measured vs far-side camera** | presence + range + breathing — independent through-thin-wall confirm |
+| Behind **brick/metal** | nothing | degrades to coarse motion/position only — report honestly | blocked |
+
+**The claim — stated precisely:** *"A WiFi-CSI model, camera-supervised on this subject and
+room, infers a continuous skeletal pose that tracks the subject through a drywall partition;
+through-wall accuracy is measured at X% PCK@k against a far-side camera (declared, not
+claimed)."* If X turns out low, that is the **honest result we report** — the skeleton is still
+rendered (the user wants it) but flagged with its true confidence, and the headline number is
+whatever we measured, good or bad.
+
+### Why multistatic + supervision is the enabler
+A single node behind a wall sees only "something moved." Three spatially-diverse vantage points
+around the doorway (RuvSense multistatic + cross-viewpoint fusion, ADR-029/030) triangulate the
+moving scatterer — drywall attenuates and diffracts 2.4/5 GHz but does not block it — giving the
+model a rich enough multipath signature to regress a skeleton it was *trained* to associate with
+your through-door motion. AETHER re-ID embeddings (ADR-024) keep it locked to **you** across the
+camera→RF→camera hand-off.
+
+### Available hardware (the user's actual rig)
+| Role | Device | Where | Stream |
+|------|--------|-------|--------|
+| Near ground truth (visible) | Laptop / USB camera | front of workstation (ruvzen) | MediaPipe pose → keypoints |
+| **Far ground truth (validation)** | cognitum-v0 camera (1 of 17 `/dev/video*`) or a phone | **behind the wall** | MediaPipe pose → keypoints (for MEASURING the through-wall skeleton) |
+| CSI node A | ESP32-S3 (8 MB) | COM9 (ruvzen) | UDP CSI :5005 |
+| CSI + mmWave node B | ESP32-C6 + Seeed MR60BHA2 | COM12 (ruvzen) | WiFi CSI + 60 GHz FMCW presence/range |
+| CSI node C (through-wall vantage) | Pi 5, BCM43455c0 | cognitum-v0 (other room) | nexmon_csi `.pcap` → rvcsi → CsiFrame |
+| Fusion + serving | sensing-server | ruvzen :3000/:8765 | `/ws/sensing`, `/ws/pose`, new `/ws/handoff` |
+
+Place **node C (Pi 5) and the far camera on the far side of the wall** — the Pi 5 gives the
+fuser a vantage the camera lacks, and the far camera turns the through-wall claim into a
+measurement.
+
+## Decision
+
+Build a **camera↔CSI hand-off demo** as a thin, additive layer over existing components (no new
+heavy crate). Five parts: a multi-source capture plane, a camera-supervised calibration walk
+that **learns to infer the skeleton through the wall**, a **hand-off state machine**, a
+**dead-reckoning smoother** so dropped CSI never makes the figure jump, and a single-file HTML
+viewer that renders the inferred skeleton with honest confidence.
+
+### 1. Capture plane (reuse, don't rebuild)
+- **Near camera:** `scripts/collect-ground-truth.py` already does MediaPipe pose + ESP32 CSI
+  paired capture (ADR-079). Extend it to also subscribe to the Pi 5 nexmon stream (rvcsi), the
+  C6 mmWave presence, **and the far camera**, so every frame is
+  `(near_pose|null, far_pose|null, csi_S3, csi_C6, mmwave_C6, csi_Pi5, t)`.
+- **CSI nodes:** S3 over UDP :5005, Pi 5 via `rvcsi` (vendor/rvcsi nexmon adapter → `CsiFrame`),
+  C6 WiFi CSI + the MR60BHA2 60 GHz presence/range/breathing.
+- **Fusion:** all CSI sources into the existing `MultistaticFuser`
+  (`signal/src/ruvsense/multistatic.rs`); node positions around the doorway via
+  `--node-positions` (geometric-diversity index drives confidence). **#1049:** with 3
+  independently-clocked nodes set `WDP_GUARD_INTERVAL_US` to the real inter-node spread or
+  fusion demotes.
+
+### 2. Calibration walk — "it learns my movements **and infers them through the wall**" (ADR-079)
+A 3–5 minute guided routine. The HTML page scripts the walk: stand, step left/right, walk to the
+door, **cross fully behind the wall and back**, repeat — covering the visible AND the occluded
+zone, because **both cameras label ground truth**:
+- **Visible-zone supervision:** near camera labels pose; synchronized CSI window is the input.
+- **Through-wall supervision (the key part):** while you are behind the wall, the **far camera**
+  labels your pose. So the CSI→skeleton model is trained on *real behind-wall poses* paired with
+  the *behind-wall multistatic CSI* — the model genuinely learns to infer your skeleton through
+  the wall, supervised by ground truth, not extrapolated blindly.
+- Train/fine-tune on `ruvultra` (RTX 5080) if available, else the local recipe. Persist as a
+  per-room calibration bank (ADR-151 `baseline → enroll → extract → train`). AETHER re-ID
+  embeddings (ADR-024) bind the track to you across the hand-off.
+- **Held-out split:** reserve some behind-wall passes for evaluation so through-wall PCK is
+  measured on data the model never trained on (no leakage — the ADR-152 measurement discipline).
+
+### 3. Hand-off state machine (`sensing-server/src/handoff.rs`, < 300 lines)
+States: `CAMERA` → `HANDOFF_OUT` → `RF_INFERRED` → `HANDOFF_IN` → `CAMERA` (+ `LOST`).
+- **`CAMERA`** — near camera has a confident pose → render it; RF-inferred skeleton ghosted
+  alongside for the "they agree" effect.
+- **`HANDOFF_OUT`** — near-camera confidence drops at the doorway **while** CSI motion stays high
+  and the multistatic track heads into the door zone → cross-fade source camera→RF.
+- **`RF_INFERRED`** — no camera pose; the CSI model emits a **full 17-kpt skeleton** + per-joint
+  confidence; AETHER confirms it is still you. Render the translucent skeleton + confidence,
+  badge `RF-INFERRED (through-wall)`. (When fusion confidence is too low for a credible skeleton,
+  degrade gracefully to a coarse marker rather than a flailing one — honest fallback.)
+- **`HANDOFF_IN`** — near camera re-acquires a pose positionally consistent with the last RF
+  skeleton (continuity gate) → cross-fade RF→camera.
+- **`LOST`** — neither source for N cycles → "no track," never invented.
+
+Fail-closed: `RF_INFERRED` requires real multistatic motion energy + an AETHER identity match
+above calibrated floors; absent that → `LOST`, never a phantom. Mirrors the governed-trust gate
+(ADR-031 / ADR-141).
+
+### 4. Dead reckoning & smoothing — fluid, never jumpy (the user's requirement)
+CSI does **not** arrive cleanly: UDP frames drop, nexmon `.pcap` has gaps, the fuser skips
+cycles when the #1049 guard rejects a spread, and the model's per-frame skeleton jitters. Render
+only on real frames and the figure teleports and shakes — which also *reads as fake*. A
+**predict/correct (dead-reckoning) layer** keeps the skeleton continuous and smooth between
+measurements, with **bounded** extrapolation so we never invent motion that didn't happen:
+
+- **Per-joint constant-velocity Kalman filter** — reuse `signal/src/ruvsense/pose_tracker.rs`
+  (the project's existing 17-keypoint Kalman tracker with AETHER re-ID). The renderer runs at a
+  **fixed ~30 Hz, decoupled from CSI arrival**:
+  - **Measurement this tick** → Kalman *update* (correct) each joint with the new inferred pose.
+  - **Dropped CSI this tick** → Kalman *predict* only: advance each joint by `x += v·dt`, so the
+    skeleton keeps moving along its trajectory instead of freezing then snapping. **This is the
+    dead reckoning** — the limbs keep their motion through a dropout.
+- **Confidence decay (honesty governor):** every predict-only tick multiplies confidence by a factor
+  below 1 and widens covariance. Dead reckoning is trusted for a **bounded** horizon (default ≤ ~500 ms,
+  `WDP_DEADRECKON_MAX_MS`); past it, confidence hits the floor → state machine → `LOST`. **We
+  coast briefly to stay smooth; we never coast forever to fake a track.** Someone who actually
+  stopped behind the wall converges to a still pose then `LOST`, not perpetual phantom walking.
+- **Re-acquire smoothing:** a returning measurement after a gap is blended in with a
+  critically-damped step (no overshoot) over 2–3 ticks, so the skeleton eases onto truth.
+- **Client render smoothing (already present):** `ui/observatory/js/figure-pool.js`
+  `applyKeypoints` already `lerp`s joints with a small velocity overshoot for secondary motion;
+  the hand-off viewer reuses it. The camera↔RF cross-fade is an alpha-lerp over ~300 ms.
+
+**Dead-reckoning honesty invariants (testable):**
+1. Predicted-only frames carry `"dead_reckoned": true` + `"age_ms"`; the UI dims them —
+   extrapolation is never shown as a fresh measurement.
+2. Confidence is **monotonically non-increasing** across consecutive predict-only ticks.
+3. After `WDP_DEADRECKON_MAX_MS` of silence the state **must** become `LOST` (pinned test:
+   measurements then silence → assert transition within the horizon; no perpetual motion).
+4. Dead reckoning extrapolates an **existing** track only — no measurement ever ⇒ no track ⇒
+   `LOST`, never a phantom from zero.
+
+### 5. The HTML demo (single file, vanilla — mirrors the Observatory)
+`ui/through-wall/index.html` (+ a small JS bundle, zero build step, like `ui/observatory/`):
+- **Left:** near camera feed with the MediaPipe skeleton overlaid while visible; greys to
+  "CAMERA BLIND" when occluded. (Optional second tile: the far camera, shown only in a
+  "validation" view, not the hero view.)
+- **Right:** a top-down 3D room (Three.js) with the **wall** drawn, the doorway, the three
+  sensor positions, and the figure: a **solid skeleton** in `CAMERA`, a **translucent skeleton
+  with per-joint confidence fade** in `RF_INFERRED`, eased by the dead-reckoning smoother.
+- **Banner / `BannerState`** (strict, mirrors rufield-viewer): `CAMERA` / `RF-INFERRED — through
+  wall (conf X%, measured Y% PCK@k)` / `DEAD-RECKONED (age N ms)` / `LOST` — mutually exclusive,
+  with a one-line honesty caption. The measured through-wall PCK is shown, not invented.
+- Consumes a new `GET /ws/handoff` WS/SSE topic of `HandoffFrame`s; `?demo=1` replays a recorded
+  session badged `REPLAY`.
+
+### Output contract (`HandoffFrame`, JSON)
+```jsonc
+{
+  "t_ns": 1718400000000,
+  "state": "RF_INFERRED",  // CAMERA | HANDOFF_OUT | RF_INFERRED | HANDOFF_IN | LOST
+  "source": "fused_csi",  // camera | fused_csi | mmwave | dead_reckoned
+  "pose": [[x,y,z,conf], …×17],  // inferred skeleton WITH per-joint confidence (present in CAMERA/HANDOFF/RF_INFERRED)
+  "pose_confidence": 0.58,  // aggregate; the rendered translucency
+  "identity_match": 0.81,  // AETHER re-ID — is it still you?
+  "coarse": { "cell":[x,y], "zone":"behind_wall", "heading_deg":95, "node_diversity":0.48 },
+  "dead_reckoned": false,  // true on predict-only (extrapolated) ticks
+  "age_ms": 0,  // ms since the last real measurement (0 = fresh)
+  "camera_blind": true,
+  "measured_pck": { "k": 20, "value": null },  // filled from the far-camera validation run; null until measured
+  "caption": "RF-inferred skeleton — model camera-supervised on this room; through-wall PCK measured separately"
+}
+```
+
+## Phased plan (each phase independently demoable + falsifiable)
+- **P1 — wiring (no claim):** 3-source CSI capture (S3+C6+Pi5) + near camera into the multistatic
+  fuser. Gate: `/ws/sensing` shows ≥3 active nodes + a fused position with the camera running.
+- **P2 — supervised calibration + through-wall training:** the guided walk with **both cameras**;
+  fine-tune CSI→skeleton on visible AND far-camera-labeled behind-wall poses (ADR-079). Gate:
+  while-visible PCK declared (metric-locked, ADR-173) on a held-out segment.
+- **P3 — MEASURE the through-wall skeleton:** score the RF-inferred skeleton against the far
+  camera on held-out behind-wall passes → **publish the through-wall PCK@k** (good or bad). Gate:
+  a committed eval script reproduces the number; honest negative if low.
+- **P4 — hand-off + dead reckoning + HTML:** the camera→RF→camera transition renders end-to-end,
+  smooth through dropped CSI. Gate: a recorded live walk where the camera goes blind, the inferred
+  skeleton keeps walking fluidly behind the wall, dead-reckons through dropouts without jumps, and
+  re-acquisition is position-continuous. **This is the demo.**
+- **P5 — multi-modal corroboration (optional):** overlay C6 60 GHz presence/range as an
+  independent through-thin-wall confirm (two physics, one conclusion).
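The hand-off transitions that P4 exercises (section 3) can be sketched as a small transition function. This is an illustration in Python, not the planned `handoff.rs`: the class name `Handoff`, the numeric floors, the cycle count, the single-tick cross-fades, and the rule that only a camera pose can start a track from `LOST` are all assumptions.

```python
from enum import Enum


class State(Enum):
    CAMERA = "CAMERA"
    HANDOFF_OUT = "HANDOFF_OUT"
    RF_INFERRED = "RF_INFERRED"
    HANDOFF_IN = "HANDOFF_IN"
    LOST = "LOST"


# Hypothetical floors; the ADR calls for calibrated values, which are not given.
CAM_FLOOR = 0.5
MOTION_FLOOR = 0.3
IDENTITY_FLOOR = 0.7
LOST_AFTER_CYCLES = 5  # the "N cycles" of the LOST state; value assumed


class Handoff:
    def __init__(self):
        self.state = State.LOST
        self.silent_cycles = 0

    def step(self, cam_conf, motion, identity, continuous=True):
        """Advance one cycle and return the new state.

        continuous: whether the camera pose is positionally consistent with the
        last RF skeleton (the continuity gate); only consulted in RF_INFERRED.
        """
        cam_ok = cam_conf >= CAM_FLOOR
        # Fail-closed: RF needs motion energy AND an identity match.
        rf_ok = motion >= MOTION_FLOOR and identity >= IDENTITY_FLOOR

        if not cam_ok and not rf_ok:
            # Neither source: hold the state briefly, then give up.
            self.silent_cycles += 1
            if self.silent_cycles >= LOST_AFTER_CYCLES:
                self.state = State.LOST
            return self.state
        self.silent_cycles = 0

        s = self.state
        if s is State.CAMERA:
            self.state = State.CAMERA if cam_ok else State.HANDOFF_OUT
        elif s is State.HANDOFF_OUT:
            self.state = State.CAMERA if cam_ok else State.RF_INFERRED
        elif s is State.RF_INFERRED:
            if cam_ok and continuous:
                self.state = State.HANDOFF_IN
            elif not rf_ok:
                self.state = State.LOST  # camera pose failed the gate, no RF
        elif s is State.HANDOFF_IN:
            self.state = State.CAMERA if cam_ok else State.RF_INFERRED
        elif cam_ok:  # LOST: assume only a camera pose can start a track
            self.state = State.CAMERA
        return self.state


h = Handoff()
walk = [
    (0.9, 0.5, 0.9),  # in frame
    (0.2, 0.6, 0.9),  # camera fading at the doorway, CSI motion high
    (0.1, 0.6, 0.9),  # behind the wall
    (0.0, 0.6, 0.9),
    (0.8, 0.6, 0.9),  # camera re-acquires
    (0.9, 0.5, 0.9),
]
trace = [h.step(*obs).value for obs in walk]
print(trace)
```

Holding the current state during silent cycles stands in for the bounded dead-reckoning coast of section 4; a fuller version would drive that from `age_ms` rather than a cycle counter.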
+
+## Consequences
+
+### Positive
+- A genuinely compelling demo that does what the user asked — **infers and renders the skeleton
+  through the wall** — while staying honest because the through-wall accuracy is **measured**
+  against a far-side camera, not claimed. Reuses the multistatic fuser, ADR-079 supervision, the
+  Kalman pose tracker, AETHER re-ID, the calibration crate, and the Observatory UI: the new code
+  is a hand-off module + dead-reckoning smoother + an HTML page.
+
+### Negative / Risks
+- **Through-wall skeletal accuracy may be modest or poor.** That is acceptable *iff* reported
+  honestly — the headline is the measured PCK, whatever it is; the skeleton renders with its true
+  per-joint confidence (low-confidence joints fade), never as fake certainty.
+- **Material dependence:** drywall good; brick/metal degrades to coarse-only — shoot on drywall
+  and say so.
+- **3-node clock sync** is the #1049 hazard — tune `WDP_GUARD_INTERVAL_US`.
+- **Per-room, per-subject:** the model that "learned your movements" does not transfer without
+  re-calibration — stated on the page.
+- **Over-claiming is the failure mode.** Mitigations baked in: translucent confidence-faded
+  skeleton, `dead_reckoned`/`age_ms` flags, the measured-PCK banner, bounded extrapolation→`LOST`.
+
+### Neutral
+- No new heavy crate; signal-path proof (`verify.py`) untouched — capture/fusion/UI orchestration
+  over hardened, already-reviewed components.
+
+## Acceptance criteria (falsifiable — "prove the haters wrong")
+On a recorded live session, all must hold:
+1. A contiguous window where the **near camera reports no person** (verifiable from raw frames)
+   **and** the system renders an `RF_INFERRED` skeleton.
+2. The inferred skeleton's **gross motion matches reality** — direction of travel and rough gait
+   phase — confirmed against the **far camera** (not eyeballed).
+3. **Through-wall per-joint accuracy is MEASURED** against the far camera and **reported** as
+   PCK@k from a committed script. Low is fine *if* honestly published; fabricated is not.
+4. The figure is **smooth through dropped CSI** — no teleports/jitter — and every predicted-only
+   frame is flagged `dead_reckoned`; after `WDP_DEADRECKON_MAX_MS` of silence it goes `LOST`.
+5. Re-acquisition is **position-continuous** (camera re-detects within a cell of the last RF
+   position), and AETHER confirms identity across the hand-off.
+6. Every number (visible PCK, through-wall PCK, confidences) is MEASURED and reproducible — no
+   hand-typed metrics.
+
+A demo that cannot meet (1)–(2) and (4)–(5) on the available hardware is reported as a **negative
+result** (honest), not dressed up; a poor (3) is published as the real number.
+
+## Links
+- ADR-079 — camera ground-truth training (supervision pipeline; extended here to a far camera)
+- ADR-031 — sensing-first RF mode / coherence gate (fail-closed honesty pattern)
+- ADR-134 — CSI→CIR multipath (through-wall multipath physics)
+- ADR-029 / ADR-030 — RuvSense multistatic + persistent field (the localization engine)
+- ADR-024 — AETHER contrastive re-ID (identity lock across the hand-off)
+- ADR-151 — per-room calibration crate (bank persistence)
+- ADR-152 / ADR-173 — measurement discipline + metric-locked PCK (the honest accuracy readout)
+- ADR-095 / ADR-096 — rvcsi nexmon (Pi 5 BCM43455c0 capture)
+- `signal/src/ruvsense/pose_tracker.rs` — 17-kpt Kalman tracker reused for dead reckoning
+- `ui/observatory/` — the vanilla-JS 3D viewer pattern this demo mirrors
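The predict/correct behaviour and the honesty invariants of section 4 (flagged extrapolation, non-increasing confidence, a hard stop past the horizon, no track without a measurement) can be illustrated on a single joint coordinate. The sketch below uses a simplified alpha-beta (constant-velocity) filter, swapped in for the Kalman filter in `pose_tracker.rs` to keep it short. `JointAxis`, `alpha`, `beta`, and `decay` are hypothetical names and values; the 500 ms horizon follows the `WDP_DEADRECKON_MAX_MS` default stated above.

```python
class JointAxis:
    """One coordinate of one joint: predict every tick, correct when measured."""

    def __init__(self, alpha=0.5, beta=0.1, decay=0.8, max_ms=500):
        self.alpha = alpha    # position gain (hypothetical value)
        self.beta = beta      # velocity gain (hypothetical value)
        self.decay = decay    # confidence multiplier per predict-only tick
        self.max_ms = max_ms  # coast horizon, cf. WDP_DEADRECKON_MAX_MS
        self._reset()

    def _reset(self):
        self.x = None
        self.v = 0.0
        self.conf = 0.0
        self.age_ms = 0

    def _frame(self, dead_reckoned):
        return {"x": self.x, "conf": self.conf,
                "dead_reckoned": dead_reckoned, "age_ms": self.age_ms}

    def tick(self, dt_ms, z=None, z_conf=0.0):
        """Advance one render tick (dt_ms > 0); z is None on a dropout.

        Returns a frame dict, or None when there is no track (LOST).
        """
        if self.x is None:
            if z is None:
                return None  # never measured: no track, no phantom
            self.x, self.conf = z, z_conf
            return self._frame(False)

        dt = dt_ms / 1000.0
        predicted = self.x + self.v * dt  # the x += v*dt step
        if z is None:
            self.age_ms += dt_ms
            if self.age_ms > self.max_ms:
                self._reset()  # past the horizon: stop coasting
                return None
            self.x = predicted
            self.conf *= self.decay
            return self._frame(True)

        # Correct: blend the measurement into position and velocity.
        residual = z - predicted
        self.x = predicted + self.alpha * residual
        self.v += self.beta * residual / dt
        self.conf = z_conf
        self.age_ms = 0
        return self._frame(False)


joint = JointAxis()
joint.tick(33, z=0.00, z_conf=0.9)
joint.tick(33, z=0.33, z_conf=0.9)
for _ in range(3):
    print(joint.tick(33))  # three dropped ticks: position keeps advancing
```

A full tracker would hold one such filter per axis for each of the 17 joints and would add the critically-damped re-acquire blend, which this sketch leaves out.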