final-run · droid-ash · Apr 15, 2026
diff --git a/docs/memory/cli/index.md b/docs/memory/cli/index.md
@@ -7,3 +7,17 @@ The `packages/cli` package is the main CLI entry point. It orchestrates test run
 | File | Description |
 |------|-------------|
 | [report-writer.md](report-writer.md) | ReportWriter artifact handling including device log copy and redaction |
+| [multi-device-workspace.md](multi-device-workspace.md) | Multi-device test workspace layout, `devices.yaml` schema, token syntax, and CLI entry points |
+| [multi-device-orchestration.md](multi-device-orchestration.md) | Multi-device test compilation, session runner, and router routing rules |
+
+## Cross-Package Cross-References
+
+Multi-device orchestration spans five packages; each owns a slice of the pipeline:
+
+| Concern | Package | Memory |
+|---------|---------|--------|
+| Workspace + CLI entry + compile + session | `packages/cli` | [multi-device-workspace.md](multi-device-workspace.md), [multi-device-orchestration.md](multi-device-orchestration.md) |
+| Orchestrator loop + planner sibling API + prompt | `packages/goal-executor` | [goal-executor/multi-device-planner.md](../goal-executor/multi-device-planner.md) |
+| Shared data models (`MultiDeviceConfig`, `PerDeviceArtifact`, optional fields) | `packages/common` | [common/multi-device-models.md](../common/multi-device-models.md) |
+| Per-device recording key scoping | `packages/device-node` | [device-node/recording-manager.md](../device-node/recording-manager.md) |
+| Sandwich report UI | `packages/report-web` | [report-web/renderers.md](../report-web/renderers.md) |
diff --git a/docs/memory/cli/multi-device-orchestration.md b/docs/memory/cli/multi-device-orchestration.md
@@ -0,0 +1,63 @@
+# Multi-Device Orchestration (cli)
+
+How the CLI loads, compiles, and routes multi-device tests into the `MultiDeviceOrchestrator`. Introduced by change `260415-1mzp-multi-device-orchestration`.
+
+## Loader: `multiDeviceTestLoader.ts`
+
+`packages/cli/src/multiDeviceTestLoader.ts` parses `.finalrun/multi-device/` workspaces. Validation invariants (all hard-fail):
+
+- Exactly 2 entries under `devices:` in `devices.yaml` (not 1, not 3+).
+- Both entries share identical `platform`; v1 rejects any value other than `android`.
+- Both entries carry a non-empty `app`.
+- Keys match `[A-Za-z0-9_-]+` and never contain `###` (the `MAP_KEY_DELIMITER`).
+- Every step in every multi-device test YAML references at least one `${devices.<key>}` token; an unknown key fails fast, and the error identifies the offending step.
+
+Token regex (shared with the orchestrator): `/\$\{(variables|secrets|devices)\.([A-Za-z0-9_-]+)\}/g`.
+
+## Compiler: `multiDeviceTestCompiler.ts`
+
+`packages/cli/src/multiDeviceTestCompiler.ts` is a sibling to `testCompiler.ts` (single-device compiler untouched). Emits a goal string:
+
+- Interpolates `${variables.*}` eagerly.
+- Preserves `${devices.*}` and `${secrets.*}` literally for the planner.
+- Prepends a "Devices" header block listing each key, platform, and app before the numbered steps.
+
+## Session Runner: `multiDeviceSessionRunner.ts`
+
+`prepareMultiDeviceTestSession()` in `packages/cli/src/multiDeviceSessionRunner.ts`:
+
+1. Calls `DeviceNode.detectInventory()` and picks the first matching inventory entry for each key in declaration order. No interactive prompts (CI-compatible).
+2. Fails fast when fewer than 2 matching devices exist (message names the requested count and found count).
+3. Boots emulators and calls `setUpDevice()` on each device via `Promise.all`.
+4. Returns `MultiDeviceTestSession` with two independent `DeviceAgent` instances and a `cleanup()` method that invokes `stopRecording` and `tearDown` on both devices in parallel.
+
+Detection order is stable within one `detectInventory()` call but not across runs — auto-assignment is deterministic per invocation only.
+
+## Router Branch
+
+`packages/cli/src/testRunner.ts` and the CLI entrypoint branch on the selector prefix:
+
+| Selector prefix | Dispatch |
+|-----------------|----------|
+| `multi-device/tests/` | `MultiDeviceOrchestrator` via `multiDeviceTestRunner.ts` |
+| `multi-device/suites/` | Multi-device suite runner |
+| anything else | Existing `TestExecutor` (byte-identical) |
+
+Mixed selectors in one invocation are rejected at parse time. The multi-device test runner composes `prepareMultiDeviceTestSession()` + `MultiDeviceOrchestrator` + `reportWriter.writeTestRecord()`.
+
+## Per-Device Report Writer Branch
+
+`reportWriter.writeTestRecord()` branches on `result.multiDevice` at entry. Present → per-device subfolders `tests/{testId}/<deviceKey>/{screenshots,actions}/`. Absent → byte-identical single-device output.
+
+Key rules for the multi-device path:
+- Step numbering is shared across devices: `stepNumber = iteration`, zero-padded 3 digits.
+- Sequential step acting on alice writes `tests/{testId}/alice/actions/008.json`; bob's slot 008 is **absent on disk** (sparse). The report UI renders a dimmed spacer for the inactive device.
+- Parallel step fills both slots at the same iteration number.
+- Each `AgentAction` JSON includes the `device` key field.
+- `videoOffsetMs` is computed per device: `max(0, stepTimestamp - deviceRecordingStartedAt)`. The shared report scrubber anchors t=0 at `min(alice.startedAt, bob.startedAt)`.
+
+## Key Design Decisions (from change 260415-1mzp)
+
+- **Router-level branching, not `TestExecutor` subclassing** — `MultiDeviceOrchestrator` is a sibling. `TestExecutor` stays untouched so the single-device regression surface is zero.
+- **Multi-device suites reference multi-device tests only** — no mixed-mode suites. Suite runner branches once on the first test's prefix.
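+- **Feature-gated writes**: every multi-device field is optional in JSON and absent in single-device output, so `run.json` for single-device runs matches the baseline apart from the `schemaVersion` bump described in [common/multi-device-models.md](../common/multi-device-models.md).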
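The per-device offset and step-numbering rules from the "Per-Device Report Writer Branch" section of `multi-device-orchestration.md` can be illustrated with a minimal sketch. It assumes timestamps are ISO-8601 strings (as `recordingStartedAt` is); the function names `videoOffsetMs`, `scrubberOriginMs`, and `actionFile` are hypothetical and are not taken from `reportWriter`.

```typescript
// Assumption: all timestamps are ISO-8601 strings, like `recordingStartedAt`.
// Function names are illustrative only, not the real reportWriter API.

/** Offset of a step into one device's recording, clamped at zero. */
function videoOffsetMs(stepTimestamp: string, deviceRecordingStartedAt: string): number {
  return Math.max(0, Date.parse(stepTimestamp) - Date.parse(deviceRecordingStartedAt));
}

/** Shared scrubber origin: the earliest recording start across devices. */
function scrubberOriginMs(startedAt: string[]): number {
  return Math.min(...startedAt.map((iso) => Date.parse(iso)));
}

/** Builds the per-device action path; the iteration is zero-padded to 3 digits. */
function actionFile(testId: string, deviceKey: string, iteration: number): string {
  const digits = String(iteration);
  const padded = digits.length >= 3 ? digits : "000".slice(digits.length) + digits;
  return `tests/${testId}/${deviceKey}/actions/${padded}.json`;
}

const aliceStart = "2026-04-15T10:00:02.000Z";
const bobStart = "2026-04-15T10:00:00.000Z";

console.log(videoOffsetMs("2026-04-15T10:00:05.000Z", aliceStart)); // 3000
console.log(videoOffsetMs("2026-04-15T10:00:01.000Z", aliceStart)); // 0 (step precedes recording)
console.log(scrubberOriginMs([aliceStart, bobStart]) === Date.parse(bobStart)); // true
console.log(actionFile("chat-send", "alice", 8)); // tests/chat-send/alice/actions/008.json
```

Because a sequential step writes only the acting device's file, a reader of the output would expect `actionFile(..., "bob", 8)` to be missing on disk for that iteration.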
diff --git a/docs/memory/cli/multi-device-workspace.md b/docs/memory/cli/multi-device-workspace.md
@@ -0,0 +1,84 @@
+# Multi-Device Workspace (v1: 2 Devices, Same Platform)
+
+Multi-device tests exercise two devices running the same platform — for example, Alice's phone and Bob's phone in a chat scenario. v1 is Android-only; iOS multi-device lands in v2.
+
+## Workspace Layout
+
+A multi-device-enabled workspace lives under `.finalrun/multi-device/` alongside the single-device subtree:
+
+```
+.finalrun/
+  tests/                       # single-device tests (existing)
+  multi-device/
+    devices.yaml               # exactly 2 entries; shared platform
+    tests/
+      chat/
+        send_message.yaml      # test using ${devices.alice} / ${devices.bob}
+    suites/
+      chat-smoke.yaml          # optional suite aggregating multi-device tests
+```
+
+## `devices.yaml` Schema
+
+Exactly 2 entries, both sharing a `platform`. v1 rejects any `platform` other than `android`.
+
+```yaml
+devices:
+  alice:
+    platform: android
+    app: com.example.chat
+  bob:
+    platform: android
+    app: com.example.chat
+```
+
+Keys (`alice`, `bob`) become the identifiers referenced by `${devices.<key>}` tokens inside tests.
+
+## Token Syntax
+
+Multi-device tests use three interpolation namespaces:
+
+| Token                  | Evaluation                                                                 |
+| ---------------------- | -------------------------------------------------------------------------- |
+| `${variables.NAME}`    | Interpolated eagerly at compile time (same as single-device tests).        |
+| `${devices.<key>}`     | Passed through literally to the planner; marks the active device for a step. |
+| `${secrets.SECRET}`    | Passed through literally; redacted in logs/reports.                        |
+
+Every step must reference at least one `${devices.<key>}` token. A step referencing both devices (`${devices.alice} ${devices.bob} observe message`) signals a parallel step — the planner emits up to 2 actions dispatched via `Promise.all`.
+
+## CLI Entry Points
+
+Multi-device selectors start with `multi-device/tests/`:
+
+```bash
+# Run a single multi-device test
+finalrun test multi-device/tests/chat/send_message.yaml
+
+# Run all multi-device tests
+finalrun test multi-device/tests/
+
+# Run a multi-device suite
+finalrun test multi-device/suites/chat-smoke.yaml
+```
+
+Single-device selectors (`tests/...`) continue to route to the existing executor — multi-device and single-device selectors cannot be mixed in the same invocation.
+
+## Report Layout
+
+Runs with `multiDevice: true` render the sandwich workspace in the report UI:
+
+```
+┌─ alice video ─┐  ┌─ chat timeline ─┐  ┌─ bob video ─┐
+│   9:19 ratio  │  │  step bubbles   │  │  9:19 ratio │
+└───────────────┘  └─────────────────┘  └─────────────┘
+┌──────────── synced scrubber (click to seek both) ────────────┐
+```
+
+Per-device artifacts live under `tests/{testId}/{alice,bob}/{actions,screenshots}/`. Step JSON is numbered by iteration (zero-padded 3 digits); parallel iterations yield one file per device at the same number.
+
+## Fail-Fast Behavior
+
+- Action failure on either device → whole test aborts (5-second cleanup ceiling).
+- gRPC disconnect → whole test aborts; surviving recording stopped cleanly.
+- Planner emits `>2` actions, duplicate-device actions, or unknown device key → retry once, then abort.
+- Watchdog: same step pointer persists >5 iterations without progress → abort with reason `watchdog: step {N} stuck for >5 iterations`.
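The token rules in `multi-device-workspace.md` (eager `${variables.*}`, literal `${devices.*}` and `${secrets.*}`, at least one device token per step) can be sketched roughly as below. The regex is the one quoted in `multi-device-orchestration.md`; `compileStep` and `CompiledStep` are hypothetical names, and treating an undefined variable as an error is an assumption, not documented behavior.

```typescript
// Token regex as quoted in multi-device-orchestration.md.
const TOKEN_RE = /\$\{(variables|secrets|devices)\.([A-Za-z0-9_-]+)\}/g;

interface CompiledStep {
  text: string;       // variables resolved; devices and secrets left literal
  devices: string[];  // distinct device keys referenced by the step
  parallel: boolean;  // true when the step names both devices
}

// Hypothetical helper, not the real multiDeviceTestCompiler API.
// Assumption: an undefined variable is treated as a hard error.
function compileStep(
  step: string,
  variables: Record<string, string>,
  deviceKeys: string[],
  stepNumber: number,
): CompiledStep {
  const seen: string[] = [];
  const text = step.replace(TOKEN_RE, (token: string, ns: string, name: string): string => {
    if (ns === "variables") {
      const known = Object.prototype.hasOwnProperty.call(variables, name);
      const value = known ? variables[name] : undefined;
      if (value === undefined) {
        throw new Error("step " + stepNumber + ": unknown variable " + name);
      }
      return value; // interpolated eagerly
    }
    if (ns === "devices") {
      if (deviceKeys.indexOf(name) === -1) {
        throw new Error("step " + stepNumber + ": unknown device key " + name);
      }
      if (seen.indexOf(name) === -1) seen.push(name);
    }
    return token; // devices and secrets pass through for the planner
  });
  if (seen.length === 0) {
    throw new Error("step " + stepNumber + ": no devices token found");
  }
  return { text, devices: seen, parallel: seen.length === deviceKeys.length };
}

const compiled = compileStep(
  "${devices.alice} sends ${variables.greeting} using ${secrets.pin}",
  { greeting: "hello" },
  ["alice", "bob"],
  1,
);
console.log(compiled.text); // ${devices.alice} sends hello using ${secrets.pin}
console.log(compiled.parallel); // false
```

Leaving `${secrets.*}` untouched in the compiled text keeps secret values out of the goal string, which is consistent with the redaction note in the token table.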
diff --git a/docs/memory/common/index.md b/docs/memory/common/index.md
@@ -0,0 +1,9 @@
+# common
+
+The `packages/common` package exports shared data models, constants, and utilities used across CLI, device-node, goal-executor, and report-web.
+
+## Memory Files
+
+| File | Description |
+|------|-------------|
+| [multi-device-models.md](multi-device-models.md) | `MultiDeviceConfig`, optional per-device fields on `AgentAction` / `TestResult`, `RunManifest.multiDevice`, schema v3 |
diff --git a/docs/memory/common/multi-device-models.md b/docs/memory/common/multi-device-models.md
@@ -0,0 +1,80 @@
+# Multi-Device Data Models (common)
+
+Shared types that flow through CLI → goal-executor → device-node → report-web for multi-device runs. Every multi-device field is **optional** and absent in single-device runs, so emitted JSON stays byte-identical to the pre-change baseline apart from the `schemaVersion` bump described under Schema Version Policy. Introduced by change `260415-1mzp-multi-device-orchestration`.
+
+## `MultiDeviceConfig`
+
+`packages/common/src/models/MultiDeviceConfig.ts`:
+
+```typescript
+interface DeviceDefinition {
+  platform: string;   // v1: must be 'android'
+  app: string;        // packageName (Android) or bundleId (iOS, rejected in v1)
+}
+interface MultiDeviceConfig {
+  devices: Record<string, DeviceDefinition>;   // exactly 2 keys, shared platform
+}
+```
+
+Loader-enforced invariants are captured in [cli/multi-device-orchestration.md](../cli/multi-device-orchestration.md). The type is shared between loader and `prepareMultiDeviceTestSession()` without casts.
+
+## Optional Fields on `AgentAction`
+
+`packages/common/src/models/TestResult.ts`:
+
+```typescript
+interface AgentAction {
+  // ... all existing fields unchanged ...
+  device?: string;    // device key in multi-device runs; omitted in single-device
+}
+```
+
+Single-device runs MUST omit `device` entirely (not serialize `undefined`) so the JSON diff is empty.
+
+## `PerDeviceArtifact`
+
+```typescript
+interface PerDeviceArtifact {
+  folder: string;               // e.g. 'alice'
+  recordingFile?: string;       // mp4/mov path relative to run root
+  deviceLogFile?: string;       // device log path relative to run root
+  recordingStartedAt?: string;  // ISO timestamp, used for scrubber anchoring
+}
+```
+
+## Optional Fields on `TestResult`
+
+```typescript
+interface TestResult {
+  // ... existing fields ...
+  multiDevice?: { devices: Record<string, PerDeviceArtifact> };
+}
+```
+
+Present → `reportWriter.writeTestRecord()` branches into the per-device path. Absent → byte-identical single-device writer output.
+
+## `RunManifest.multiDevice`
+
+`packages/common/src/models/RunManifest.ts`:
+
+```typescript
+interface RunManifest {
+  schemaVersion: 2 | 3;          // 3 is the current cursor; 2 still loads in report-web
+  // ... existing fields ...
+  multiDevice?: {
+    devices: Record<string, { platform: string; app?: string; hardwareName: string }>;
+  };
+}
+```
+
+`hardwareName` records which physical/virtual device was auto-assigned to each logical key by `prepareMultiDeviceTestSession()`.
+
+## Schema Version Policy
+
+`schemaVersion` was bumped from `2` to `3` to accommodate multi-device fields. The type is `2 | 3` for backward compatibility: report-web's `artifacts.ts` accepts both. Single-device runs emit `schemaVersion: 3` without any `multiDevice` or per-device fields, so the diff against a baseline v2 run is limited to the `schemaVersion` value itself.
+
+## Design Decisions (from change 260415-1mzp)
+
+- **Optional fields over new sibling types** at the top-level `TestResult` / `AgentAction` level (unlike the planner, which uses siblings) — these types cross many package boundaries; sibling types would force every consumer to branch. Optional fields keep the single-device path's type shape intact.
+- **Map-key delimiter `###` reserved** — loader rejects device keys containing `###` so recording map keys never collide.
+- **`schemaVersion: 2 | 3`** — union type rather than number; forces exhaustive handling in writers/readers and prevents accidental version regressions.
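The "omit `device` entirely" rule from `multi-device-models.md` can be sketched as follows. `AgentActionSketch` is a trimmed, hypothetical stand-in for the real `AgentAction` (only `device` comes from the source; `stepNumber` and `action` are placeholder fields), and `buildAction` is an illustrative helper, not an existing function.

```typescript
// Trimmed stand-in for AgentAction; only `device` is taken from the source doc.
interface AgentActionSketch {
  stepNumber: number; // placeholder field for illustration
  action: string;     // placeholder field for illustration
  device?: string;    // device key in multi-device runs; omitted otherwise
}

// Hypothetical builder: sets `device` only when a key is supplied, so the
// property does not exist at all on single-device records.
function buildAction(stepNumber: number, action: string, deviceKey?: string): AgentActionSketch {
  const record: AgentActionSketch = { stepNumber, action };
  if (deviceKey !== undefined) {
    record.device = deviceKey;
  }
  return record;
}

const single = buildAction(8, "tap");
const multi = buildAction(8, "tap", "alice");

console.log(JSON.stringify(single)); // {"stepNumber":8,"action":"tap"}
console.log("device" in single);     // false
console.log(JSON.stringify(multi));  // {"stepNumber":8,"action":"tap","device":"alice"}
```

`JSON.stringify` already drops properties whose value is `undefined`, but leaving the key off the object also keeps `in` checks and deep-equality comparisons identical to the single-device baseline.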
diff --git a/docs/memory/device-node/index.md b/docs/memory/device-node/index.md
@@ -7,3 +7,4 @@ The `packages/device-node` package manages physical/simulator device interaction
 | File | Description |
 |------|-------------|
 | [log-capture.md](log-capture.md) | Per-test device log capture: manager, providers, Device integration |
+| [recording-manager.md](recording-manager.md) | `RecordingManager` map-key scoping and parallel per-device recording |
diff --git a/docs/memory/device-node/recording-manager.md b/docs/memory/device-node/recording-manager.md
@@ -0,0 +1,49 @@
+# Recording Manager (device-node)
+
+`RecordingManager` (`packages/device-node/src/device/RecordingManager.ts`) tracks active screen recordings per test. It was extended by change `260415-1mzp-multi-device-orchestration` to support parallel per-device recordings for the same `(runId, testId)` pair without breaking any single-device call site.
+
+## Map-Key Scheme
+
+Internal maps (`_recordingProcessMap`, `_recordingInfoMap`, etc.) are keyed by a string built by `getMapKey()`:
+
+```typescript
+getMapKey(runId: string, testId: string, deviceId?: string): string
+```
+
+| Args | Return | Used by |
+|------|--------|---------|
+| `getMapKey(runId, testId)` | `${runId}###${testId}` | All existing single-device call sites (byte-identical to pre-change) |
+| `getMapKey(runId, testId, deviceId)` with non-empty `deviceId` | `${runId}###${testId}###${sanitizedDeviceId}` | Multi-device callers only |
+
+`MAP_KEY_DELIMITER` is `###`. Device keys are also sanitized as defense-in-depth (the loader already rejects keys containing `###`).
+
+## Opt-In Multi-Device Recording
+
+`RecordingSessionStartParams` and `RecordingStopOptions` gained optional opt-in fields:
+
+```typescript
+interface RecordingSessionStartParams {
+  deviceId: string;
+  recordingRequest: RecordingRequest;
+  platform: string;
+  sdkVersion?: string;
+  useDeviceScopedKey?: boolean;   // default false = byte-identical single-device behavior
+}
+interface RecordingStopOptions {
+  platform: string;
+  keepOutput?: boolean;
+  deviceId?: string;              // omitted = legacy 2-part key; provided = 3-part key
+}
+```
+
+Multi-device callers pass `useDeviceScopedKey: true` to `startRecording()` and `deviceId: <key>` to `stopRecording()`. Single-device callers keep calling `getMapKey()` with two arguments and produce the same 2-part keys as before.
+
+## Parallel Recording
+
+Multi-device orchestration wraps `Promise.all([startRecording(...alice), startRecording(...bob)])` at test start and `Promise.all([stopRecording(...alice), stopRecording(...bob)])` at teardown. Distinct map keys ensure no collision in any internal map.
+
+## Design Decisions (from change 260415-1mzp)
+
+- **Non-breaking optional 3rd arg**: the rejected alternative was to rename or redesign the method signature. Every existing call site passes 2 args; a required 3rd arg would ripple across all single-device recording paths. The optional arg preserves byte-identical single-device behavior and lets multi-device callers opt in explicitly.
+- **Device-scoped key via opt-in flag** — `useDeviceScopedKey: boolean` on `startRecording` params (rather than auto-deriving from `deviceId` presence) makes the intent explicit at the call site. The flag and `deviceId` on stop options mirror each other.
+- **Defense-in-depth sanitization** — loader enforcement is the primary guard against `###` in keys; `_sanitizeForFilename()` is the fallback in case an internal caller bypasses the loader.
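The map-key table in `recording-manager.md` can be illustrated with a small standalone sketch. The `getMapKey` signature and the `###` delimiter come from the document; the body is a guess at the described behavior, and `sanitizeDeviceId` is a hypothetical stand-in for `_sanitizeForFilename()`, whose real replacement rules are not shown in the source.

```typescript
const MAP_KEY_DELIMITER = "###";

// Assumption: stand-in for _sanitizeForFilename(); the real rules are not shown.
// Here every character outside [A-Za-z0-9_-] becomes "_", which also removes "#".
function sanitizeDeviceId(deviceId: string): string {
  return deviceId.replace(/[^A-Za-z0-9_-]/g, "_");
}

// Sketch of the documented behavior: 2-part key without a deviceId,
// 3-part key when a non-empty deviceId is supplied.
function getMapKey(runId: string, testId: string, deviceId?: string): string {
  const base = `${runId}${MAP_KEY_DELIMITER}${testId}`;
  if (deviceId === undefined || deviceId === "") {
    return base; // legacy single-device key, unchanged
  }
  return `${base}${MAP_KEY_DELIMITER}${sanitizeDeviceId(deviceId)}`;
}

console.log(getMapKey("run1", "test1"));          // run1###test1
console.log(getMapKey("run1", "test1", "alice")); // run1###test1###alice
console.log(getMapKey("run1", "test1", "a###b")); // run1###test1###a___b
```

With this shape, two devices recording the same `(runId, testId)` pair get distinct keys, while any caller that omits `deviceId` keeps the original key.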
diff --git a/docs/memory/goal-executor/index.md b/docs/memory/goal-executor/index.md
@@ -0,0 +1,9 @@
+# goal-executor
+
+The `packages/goal-executor` package owns the iteration loop and planner calls. `TestExecutor` runs single-device tests; `MultiDeviceOrchestrator` runs 2-device tests. Both compose the shared `AIAgent` (Vercel AI SDK wrapper) and `ActionExecutor`.
+
+## Memory Files
+
+| File | Description |
+|------|-------------|
+| [multi-device-planner.md](multi-device-planner.md) | `AIAgent.planMulti()` sibling API, `MultiDeviceOrchestrator` iteration loop, step pointer, fail-fast |