14 changes: 14 additions & 0 deletions docs/memory/cli/index.md
@@ -7,3 +7,17 @@ The `packages/cli` package is the main CLI entry point. It orchestrates test run
| File | Description |
|------|-------------|
| [report-writer.md](report-writer.md) | ReportWriter artifact handling including device log copy and redaction |
| [multi-device-workspace.md](multi-device-workspace.md) | Multi-device test workspace layout, `devices.yaml` schema, token syntax, and CLI entry points |
| [multi-device-orchestration.md](multi-device-orchestration.md) | Multi-device test compilation, session runner, and router routing rules |

## Cross-Package Cross-References

Multi-device orchestration spans five packages; each owns a slice of the pipeline:

| Concern | Package | Memory |
|---------|---------|--------|
| Workspace + CLI entry + compile + session | `packages/cli` | [multi-device-workspace.md](multi-device-workspace.md), [multi-device-orchestration.md](multi-device-orchestration.md) |
| Orchestrator loop + planner sibling API + prompt | `packages/goal-executor` | [goal-executor/multi-device-planner.md](../goal-executor/multi-device-planner.md) |
| Shared data models (`MultiDeviceConfig`, `PerDeviceArtifact`, optional fields) | `packages/common` | [common/multi-device-models.md](../common/multi-device-models.md) |
| Per-device recording key scoping | `packages/device-node` | [device-node/recording-manager.md](../device-node/recording-manager.md) |
| Sandwich report UI | `packages/report-web` | [report-web/renderers.md](../report-web/renderers.md) |
63 changes: 63 additions & 0 deletions docs/memory/cli/multi-device-orchestration.md
@@ -0,0 +1,63 @@
# Multi-Device Orchestration (cli)

How the CLI loads, compiles, and routes multi-device tests into the `MultiDeviceOrchestrator`. Introduced by change `260415-1mzp-multi-device-orchestration`.

## Loader: `multiDeviceTestLoader.ts`

`packages/cli/src/multiDeviceTestLoader.ts` parses `.finalrun/multi-device/` workspaces. Validation invariants (all hard-fail):

- Exactly 2 entries under `devices:` in `devices.yaml` (not 1, not 3+).
- Both entries share identical `platform`; v1 rejects any value other than `android`.
- Both entries carry a non-empty `app`.
- Keys match `[A-Za-z0-9_-]+` and never contain `###` (the `MAP_KEY_DELIMITER`).
- Every step in every multi-device test YAML references at least one `${devices.<key>}` token; unknown keys fail-fast with the offending step identified.
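
The `devices.yaml` invariants above can be sketched as a single validation pass. This is a minimal illustration, not the loader's actual code: `validateDevices` and its error messages are hypothetical, and it collects violations into a list for readability, whereas the real loader is described as hard-failing.

```typescript
// Hypothetical shape; mirrors the devices.yaml entries described above.
interface DeviceDefinition {
  platform: string;
  app: string;
}

const MAP_KEY_DELIMITER = "###";
const DEVICE_KEY_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: returns a list of violations; an empty list means the config passes.
function validateDevices(devices: Record<string, DeviceDefinition>): string[] {
  const errors: string[] = [];
  const keys = Object.keys(devices);

  if (keys.length !== 2) {
    errors.push(`expected exactly 2 devices, found ${keys.length}`);
  }
  for (const key of keys) {
    // The pattern already excludes '#', so the delimiter check is a redundant guard.
    if (!DEVICE_KEY_PATTERN.test(key) || key.includes(MAP_KEY_DELIMITER)) {
      errors.push(`invalid device key: ${key}`);
    }
    const def = devices[key];
    if (def.platform !== "android") {
      errors.push(`${key}: v1 supports only platform 'android', got '${def.platform}'`);
    }
    if (!def.app || def.app.trim() === "") {
      errors.push(`${key}: app must be non-empty`);
    }
  }
  const platforms = keys.map((k) => devices[k].platform);
  if (platforms.length === 2 && platforms[0] !== platforms[1]) {
    errors.push("both devices must share the same platform");
  }
  return errors;
}
```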

Token regex (shared with the orchestrator): `/\$\{(variables|secrets|devices)\.([A-Za-z0-9_-]+)\}/g`.
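
To make the token regex concrete, here is a small sketch of how it might be applied to one step. The regex is copied from the line above; `extractTokens` and `unknownDeviceKeys` are hypothetical helpers, not functions confirmed to exist in the loader.

```typescript
// Regex copied from the document; everything else here is illustrative.
const TOKEN_REGEX = /\$\{(variables|secrets|devices)\.([A-Za-z0-9_-]+)\}/g;

interface Token {
  namespace: string;
  name: string;
}

// Hypothetical helper: lists every token in a step, in order of appearance.
function extractTokens(step: string): Token[] {
  const tokens: Token[] = [];
  // Use a fresh RegExp so the global flag's lastIndex never leaks between calls.
  const re = new RegExp(TOKEN_REGEX.source, "g");
  let match: RegExpExecArray | null;
  while ((match = re.exec(step)) !== null) {
    tokens.push({ namespace: match[1], name: match[2] });
  }
  return tokens;
}

// Hypothetical helper: device keys a step references that devices.yaml does not declare.
function unknownDeviceKeys(step: string, declared: string[]): string[] {
  return extractTokens(step)
    .filter((t) => t.namespace === "devices" && declared.indexOf(t.name) === -1)
    .map((t) => t.name);
}
```

A loader built this way could report the offending step by pairing a non-empty `unknownDeviceKeys` result with the step's index.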

## Compiler: `multiDeviceTestCompiler.ts`

`packages/cli/src/multiDeviceTestCompiler.ts` is a sibling to `testCompiler.ts` (single-device compiler untouched). Emits a goal string:

- Interpolates `${variables.*}` eagerly.
- Preserves `${devices.*}` and `${secrets.*}` literally for the planner.
- Prepends a "Devices" header block listing each key, platform, and app before the numbered steps.
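
The three compiler behaviors above can be sketched roughly as follows. `compileGoal` is a hypothetical name, and the exact wording of the "Devices" header is an assumption; only the eager `${variables.*}` substitution and the literal pass-through of the other two namespaces come from the description above.

```typescript
interface DeviceDefinition {
  platform: string;
  app: string;
}

// Hypothetical compile function; header wording and line layout are assumptions.
function compileGoal(
  devices: Record<string, DeviceDefinition>,
  steps: string[],
  variables: Record<string, string>,
): string {
  const header = ["Devices:"];
  for (const key of Object.keys(devices)) {
    header.push(`- ${key}: platform=${devices[key].platform}, app=${devices[key].app}`);
  }
  const numbered = steps.map((step, i) => {
    // Only the variables namespace is resolved; devices and secrets tokens stay literal.
    // Assumption: an undefined variable is left as-is (the real compiler may fail instead).
    const resolved = step.replace(/\$\{variables\.([A-Za-z0-9_-]+)\}/g, (whole, name: string) =>
      Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : whole,
    );
    return `${i + 1}. ${resolved}`;
  });
  return header.concat(numbered).join("\n");
}
```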

## Session Runner: `multiDeviceSessionRunner.ts`

`prepareMultiDeviceTestSession()` in `packages/cli/src/multiDeviceSessionRunner.ts`:

1. Calls `DeviceNode.detectInventory()` and picks the first matching inventory entry for each key in declaration order. No interactive prompts (CI-compatible).
2. Fails fast when fewer than 2 matching devices exist (message names the requested count and found count).
3. Boots emulators and calls `setUpDevice()` on each device via `Promise.all`.
4. Returns `MultiDeviceTestSession` with two independent `DeviceAgent` instances and a `cleanup()` method that invokes `stopRecording` and `tearDown` on both devices in parallel.

Detection order is stable within one `detectInventory()` call but not across runs — auto-assignment is deterministic per invocation only.
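
The auto-assignment in steps 1 and 2 amounts to a first-match walk over the inventory. The sketch below assumes a simplified inventory shape (`InventoryEntry`) and a hypothetical `assignDevices` helper; the real `detectInventory()` result and error text may differ.

```typescript
// Hypothetical inventory shape; the real detectInventory() result may carry more fields.
interface InventoryEntry {
  id: string;
  platform: string;
}

// Hypothetical helper: assigns the i-th matching inventory entry to the i-th declared key.
function assignDevices(
  keys: string[], // device keys in declaration order
  platform: string,
  inventory: InventoryEntry[],
): Record<string, InventoryEntry> {
  const matching = inventory.filter((entry) => entry.platform === platform);
  if (matching.length < keys.length) {
    // Names both the requested and the found count, as the fail-fast rule requires.
    throw new Error(`requested ${keys.length} ${platform} devices, found ${matching.length}`);
  }
  const assignment: Record<string, InventoryEntry> = {};
  keys.forEach((key, i) => {
    assignment[key] = matching[i];
  });
  return assignment;
}
```

Because the result depends only on inventory order, it is reproducible within one invocation but can change between runs, which matches the note on detection order above.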

## Router Branch

`packages/cli/src/testRunner.ts` and the CLI entrypoint branch on the selector prefix:

| Selector prefix | Dispatch |
|-----------------|----------|
| `multi-device/tests/` | `MultiDeviceOrchestrator` via `multiDeviceTestRunner.ts` |
| `multi-device/suites/` | Multi-device suite runner |
| anything else | Existing `TestExecutor` (byte-identical) |

Mixed selectors in one invocation are rejected at parse time. The multi-device test runner composes `prepareMultiDeviceTestSession()` + `MultiDeviceOrchestrator` + `reportWriter.writeTestRecord()`.
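
The prefix table and the mixed-selector rule can be expressed as two small functions. This is a sketch under assumptions: `classifySelector` and `assertNotMixed` are hypothetical names, and how the real router treats a mix of multi-device tests and multi-device suites is not stated here, so the sketch only rejects single-device/multi-device mixes.

```typescript
type Dispatch = "multi-device-test" | "multi-device-suite" | "single-device";

// Hypothetical helper mirroring the selector-prefix table above.
function classifySelector(selector: string): Dispatch {
  if (selector.startsWith("multi-device/tests/")) return "multi-device-test";
  if (selector.startsWith("multi-device/suites/")) return "multi-device-suite";
  return "single-device";
}

// Hypothetical parse-time check: throws when one invocation mixes both kinds.
function assertNotMixed(selectors: string[]): void {
  const multiCount = selectors.filter((s) => classifySelector(s) !== "single-device").length;
  if (multiCount !== 0 && multiCount !== selectors.length) {
    throw new Error("single-device and multi-device selectors cannot be mixed");
  }
}
```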

## Per-Device Report Writer Branch

`reportWriter.writeTestRecord()` branches on `result.multiDevice` at entry. Present → per-device subfolders `tests/{testId}/<deviceKey>/{screenshots,actions}/`. Absent → byte-identical single-device output.

Key rules for the multi-device path:
- Step numbering is shared across devices: `stepNumber = iteration`, zero-padded 3 digits.
- Sequential step acting on alice writes `tests/{testId}/alice/actions/008.json`; bob's slot 008 is **absent on disk** (sparse). The report UI renders a dimmed spacer for the inactive device.
- Parallel step fills both slots at the same iteration number.
- Each `AgentAction` JSON includes the `device` key field.
- `videoOffsetMs` is computed per device: `max(0, stepTimestamp - deviceRecordingStartedAt)`. The shared report scrubber anchors t=0 at `min(alice.startedAt, bob.startedAt)`.
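
The numbering and offset rules above reduce to a few lines of arithmetic. The helper names below are hypothetical, and timestamps are assumed to be epoch milliseconds (the stored `recordingStartedAt` is an ISO string, so a real implementation would parse it first).

```typescript
// Hypothetical helper: path of one device's action JSON for a given iteration.
function actionPath(testId: string, deviceKey: string, iteration: number): string {
  const stepNumber = String(iteration).padStart(3, "0"); // stepNumber = iteration, 3 digits
  return `tests/${testId}/${deviceKey}/actions/${stepNumber}.json`;
}

// Formula from the text: max(0, stepTimestamp - deviceRecordingStartedAt).
function videoOffsetMs(stepTimestamp: number, deviceRecordingStartedAt: number): number {
  return Math.max(0, stepTimestamp - deviceRecordingStartedAt);
}

// Shared scrubber origin: the earliest recording start across devices.
function scrubberOrigin(startedAt: Record<string, number>): number {
  return Math.min(...Object.keys(startedAt).map((key) => startedAt[key]));
}
```

For a sequential step on alice at iteration 8, only `actionPath(testId, "alice", 8)` is written; no file is produced for bob at that number.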

## Key Design Decisions (from change 260415-1mzp)

- **Router-level branching, not `TestExecutor` subclassing** — `MultiDeviceOrchestrator` is a sibling. `TestExecutor` stays untouched so the single-device regression surface is zero.
- **Multi-device suites reference multi-device tests only** — no mixed-mode suites. Suite runner branches once on the first test's prefix.
- **Feature-gated writes** — every multi-device field is optional in JSON and absent in single-device output, making `run.json` byte-identical to baseline for single-device runs.
84 changes: 84 additions & 0 deletions docs/memory/cli/multi-device-workspace.md
@@ -0,0 +1,84 @@
# Multi-Device Workspace (v1: 2 Devices, Same Platform)

Multi-device tests exercise two devices running the same platform — for example, Alice's phone and Bob's phone in a chat scenario. v1 is Android-only; iOS multi-device lands in v2.

## Workspace Layout

A multi-device-enabled workspace lives under `.finalrun/multi-device/` alongside the single-device subtree:

```
.finalrun/
  tests/                     # single-device tests (existing)
  multi-device/
    devices.yaml             # exactly 2 entries; shared platform
    tests/
      chat/
        send_message.yaml    # test using ${devices.alice} / ${devices.bob}
    suites/
      chat-smoke.yaml        # optional suite aggregating multi-device tests
```

## `devices.yaml` Schema

Exactly 2 entries, both sharing a `platform`. v1 rejects any `platform` other than `android`.

```yaml
devices:
  alice:
    platform: android
    app: com.example.chat
  bob:
    platform: android
    app: com.example.chat
```

Keys (`alice`, `bob`) become the identifiers referenced by `${devices.<key>}` tokens inside tests.

## Token Syntax

Multi-device tests use three interpolation namespaces:

| Token | Evaluation |
| ---------------------- | -------------------------------------------------------------------------- |
| `${variables.NAME}` | Interpolated eagerly at compile time (same as single-device tests). |
| `${devices.<key>}` | Passed through literally to the planner; marks the active device for a step. |
| `${secrets.SECRET}` | Passed through literally; redacted in logs/reports. |

Every step must reference at least one `${devices.<key>}` token. A step referencing both devices (`${devices.alice} ${devices.bob} observe message`) signals a parallel step — the planner emits up to 2 actions dispatched via `Promise.all`.
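
One way to read the rule above: the number of distinct device keys in a step decides whether it is sequential or parallel. The sketch below is illustrative only; `devicesInStep` and `stepMode` are hypothetical names, and the actual decision is made by the planner rather than by a static check like this.

```typescript
type StepMode = "sequential" | "parallel";

// Hypothetical helper: distinct device keys referenced by a step, in order of appearance.
function devicesInStep(step: string): string[] {
  const re = /\$\{devices\.([A-Za-z0-9_-]+)\}/g;
  const seen: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(step)) !== null) {
    if (seen.indexOf(match[1]) === -1) seen.push(match[1]);
  }
  return seen;
}

// Hypothetical classifier: one device key means sequential, two mean parallel.
function stepMode(step: string): StepMode {
  const keys = devicesInStep(step);
  if (keys.length === 0) {
    throw new Error("step references no ${devices.<key>} token");
  }
  return keys.length > 1 ? "parallel" : "sequential";
}
```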

## CLI Entry Points

Multi-device selectors start with `multi-device/tests/`:

```bash
# Run a single multi-device test
finalrun test multi-device/tests/chat/send_message.yaml

# Run all multi-device tests
finalrun test multi-device/tests/

# Run a multi-device suite
finalrun test multi-device/suites/chat-smoke.yaml
```

Single-device selectors (`tests/...`) continue to route to the existing executor — multi-device and single-device selectors cannot be mixed in the same invocation.

## Report Layout

Runs whose manifest carries a `multiDevice` block render the sandwich workspace in the report UI:

```
┌─ alice video ─┐ ┌─ chat timeline ─┐ ┌─ bob video ─┐
│ 9:19 ratio │ │ step bubbles │ │ 9:19 ratio │
└───────────────┘ └─────────────────┘ └─────────────┘
┌──────────── synced scrubber (click to seek both) ────────────┐
```

Per-device artifacts live under `tests/{testId}/{alice,bob}/{actions,screenshots}/`. Step JSON is numbered by iteration (zero-padded 3 digits); parallel iterations yield one file per device at the same number.

## Fail-Fast Behavior

- Action failure on either device → whole test aborts (5-second cleanup ceiling).
- gRPC disconnect → whole test aborts; surviving recording stopped cleanly.
- Planner emits `>2` actions, duplicate-device actions, or unknown device key → retry once, then abort.
- Watchdog: same step pointer persists >5 iterations without progress → abort with reason `watchdog: step {N} stuck for >5 iterations`.
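
The watchdog rule can be pictured as a counter that resets whenever the step pointer moves. This is a minimal sketch with a hypothetical `createWatchdog` factory; the exact off-by-one behavior (here, aborting on the sixth consecutive iteration at the same pointer) is an assumption about what ">5 iterations" means.

```typescript
// Hypothetical factory: returns a function to call once per orchestrator iteration.
function createWatchdog(limit: number = 5): (stepPointer: number) => string | null {
  let lastPointer: number | null = null;
  let iterationsAtPointer = 0;

  return function observe(stepPointer: number): string | null {
    if (stepPointer === lastPointer) {
      iterationsAtPointer += 1;
    } else {
      // Progress was made: restart the count for the new pointer.
      lastPointer = stepPointer;
      iterationsAtPointer = 1;
    }
    // Abort reason follows the format quoted above; null means keep going.
    return iterationsAtPointer > limit
      ? `watchdog: step ${stepPointer} stuck for >${limit} iterations`
      : null;
  };
}
```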
9 changes: 9 additions & 0 deletions docs/memory/common/index.md
@@ -0,0 +1,9 @@
# common

The `packages/common` package exports shared data models, constants, and utilities used across CLI, device-node, goal-executor, and report-web.

## Memory Files

| File | Description |
|------|-------------|
| [multi-device-models.md](multi-device-models.md) | `MultiDeviceConfig`, optional per-device fields on `AgentAction` / `TestResult`, `RunManifest.multiDevice`, schema v3 |
80 changes: 80 additions & 0 deletions docs/memory/common/multi-device-models.md
@@ -0,0 +1,80 @@
# Multi-Device Data Models (common)

Shared types that flow through CLI → goal-executor → device-node → report-web for multi-device runs. Every multi-device field is **optional** and absent in single-device runs so emitted JSON remains byte-identical to the pre-change baseline. Introduced by change `260415-1mzp-multi-device-orchestration`.

## `MultiDeviceConfig`

`packages/common/src/models/MultiDeviceConfig.ts`:

```typescript
interface DeviceDefinition {
  platform: string; // v1: must be 'android'
  app: string;      // packageName (Android) or bundleId (iOS, rejected in v1)
}

interface MultiDeviceConfig {
  devices: Record<string, DeviceDefinition>; // exactly 2 keys, shared platform
}
```

Loader-enforced invariants are captured in [cli/multi-device-orchestration.md](../cli/multi-device-orchestration.md). The type is shared between loader and `prepareMultiDeviceTestSession()` without casts.

## Optional Fields on `AgentAction`

`packages/common/src/models/TestResult.ts`:

```typescript
interface AgentAction {
  // ... all existing fields unchanged ...
  device?: string; // device key in multi-device runs; omitted in single-device
}
```

Single-device runs MUST omit `device` entirely (not serialize `undefined`) so the JSON diff is empty.
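
One way to satisfy the omission rule is to attach `device` only when a key is actually present, so the property never exists on single-device objects. The sketch below uses a cut-down `AgentAction` with a hypothetical `type` field and a hypothetical `buildAction` helper.

```typescript
// Cut-down stand-in for AgentAction; `type` is a hypothetical placeholder field.
interface AgentAction {
  type: string;
  device?: string;
}

// Hypothetical builder: adds `device` only for multi-device runs.
function buildAction(type: string, device?: string): AgentAction {
  const action: AgentAction = { type };
  if (device !== undefined) {
    action.device = device;
  }
  return action;
}
```

With this shape, `Object.keys(buildAction("tap"))` contains no `device` entry at all, which is stricter than relying on the serializer to drop `undefined` values.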

## `PerDeviceArtifact`

```typescript
interface PerDeviceArtifact {
  folder: string;              // e.g. 'alice'
  recordingFile?: string;      // mp4/mov path relative to run root
  deviceLogFile?: string;      // device log path relative to run root
  recordingStartedAt?: string; // ISO timestamp, used for scrubber anchoring
}
```

## Optional Fields on `TestResult`

```typescript
interface TestResult {
  // ... existing fields ...
  multiDevice?: { devices: Record<string, PerDeviceArtifact> };
}
```

Present → `reportWriter.writeTestRecord()` branches into the per-device path. Absent → byte-identical single-device writer output.

## `RunManifest.multiDevice`

`packages/common/src/models/RunManifest.ts`:

```typescript
interface RunManifest {
  schemaVersion: 2 | 3; // 3 is the current cursor; 2 still loads in report-web
  // ... existing fields ...
  multiDevice?: {
    devices: Record<string, { platform: string; app?: string; hardwareName: string }>;
  };
}
```

`hardwareName` records which physical/virtual device was auto-assigned to each logical key by `prepareMultiDeviceTestSession()`.

## Schema Version Policy

`schemaVersion` was bumped from `2` to `3` to accommodate multi-device fields. The type is `2 | 3` for backward compatibility — report-web's `artifacts.ts` accepts both. Single-device runs emit `schemaVersion: 3` but without any `multiDevice` or per-device fields, so the diff against a baseline v2 run is limited to the `schemaVersion` byte itself (the report-web loader accepts both values transparently).
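
The benefit of typing the version as `2 | 3` rather than `number` shows up when a reader switches on it. The sketch below is illustrative: `parseSchemaVersion` and `describeSchema` are hypothetical helpers, not the contents of report-web's `artifacts.ts`.

```typescript
type SchemaVersion = 2 | 3;

// Hypothetical reader-side guard: accepts both supported versions, rejects anything else.
function parseSchemaVersion(raw: unknown): SchemaVersion {
  if (raw === 2) return 2;
  if (raw === 3) return 3;
  throw new Error(`unsupported schemaVersion: ${String(raw)}`);
}

// Exhaustive switch: adding a new member to SchemaVersion without handling it
// here becomes a compile-time error at the `never` assignment.
function describeSchema(version: SchemaVersion): string {
  switch (version) {
    case 2:
      return "v2: no multi-device fields";
    case 3:
      return "v3: optional multi-device fields allowed";
    default: {
      const unreachable: never = version;
      throw new Error(`unhandled schemaVersion: ${String(unreachable)}`);
    }
  }
}
```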

## Design Decisions (from change 260415-1mzp)

- **Optional fields over new sibling types** at the top-level `TestResult` / `AgentAction` level (unlike the planner, which uses siblings) — these types cross many package boundaries; sibling types would force every consumer to branch. Optional fields keep the single-device path's type shape intact.
- **Map-key delimiter `###` reserved** — loader rejects device keys containing `###` so recording map keys never collide.
- **`schemaVersion: 2 | 3`** — union type rather than number; forces exhaustive handling in writers/readers and prevents accidental version regressions.
1 change: 1 addition & 0 deletions docs/memory/device-node/index.md
@@ -7,3 +7,4 @@ The `packages/device-node` package manages physical/simulator device interaction
| File | Description |
|------|-------------|
| [log-capture.md](log-capture.md) | Per-test device log capture: manager, providers, Device integration |
| [recording-manager.md](recording-manager.md) | `RecordingManager` map-key scoping and parallel per-device recording |
49 changes: 49 additions & 0 deletions docs/memory/device-node/recording-manager.md
@@ -0,0 +1,49 @@
# Recording Manager (device-node)

`RecordingManager` (`packages/device-node/src/device/RecordingManager.ts`) tracks active screen recordings per test. It was extended by change `260415-1mzp-multi-device-orchestration` to support parallel per-device recordings for the same `(runId, testId)` pair without breaking any single-device call site.

## Map-Key Scheme

Internal maps (`_recordingProcessMap`, `_recordingInfoMap`, etc.) are keyed by a string built by `getMapKey()`:

```typescript
getMapKey(runId: string, testId: string, deviceId?: string): string
```

| Args | Return | Used by |
|------|--------|---------|
| `getMapKey(runId, testId)` | `${runId}###${testId}` | All existing single-device call sites (byte-identical to pre-change) |
| `getMapKey(runId, testId, deviceId)` with non-empty `deviceId` | `${runId}###${testId}###${sanitizedDeviceId}` | Multi-device callers only |

`MAP_KEY_DELIMITER` is `###`. Device keys are sanitized defense-in-depth (the loader already rejects keys containing `###`).
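
The two rows of the table can be reproduced with a short function. This is a sketch, not the `RecordingManager` source: `sanitizeDeviceId` is a hypothetical stand-in for `_sanitizeForFilename()`, whose actual replacement rules are not shown in this document.

```typescript
const MAP_KEY_DELIMITER = "###";

// Hypothetical stand-in for _sanitizeForFilename(); the real rules may differ.
function sanitizeDeviceId(deviceId: string): string {
  return deviceId.replace(/[^A-Za-z0-9_-]/g, "_");
}

// 2-part key when deviceId is omitted or empty, 3-part key otherwise.
function getMapKey(runId: string, testId: string, deviceId?: string): string {
  const base = `${runId}${MAP_KEY_DELIMITER}${testId}`;
  return deviceId ? `${base}${MAP_KEY_DELIMITER}${sanitizeDeviceId(deviceId)}` : base;
}
```

Sanitizing the third segment means a stray `###` in a device id cannot be mistaken for a segment boundary.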

## Opt-In Multi-Device Recording

`RecordingSessionStartParams` and `RecordingStopOptions` gained optional opt-in fields:

```typescript
interface RecordingSessionStartParams {
  deviceId: string;
  recordingRequest: RecordingRequest;
  platform: string;
  sdkVersion?: string;
  useDeviceScopedKey?: boolean; // default false = byte-identical single-device behavior
}

interface RecordingStopOptions {
  platform: string;
  keepOutput?: boolean;
  deviceId?: string; // omitted = legacy 2-part key; provided = 3-part key
}
```

Multi-device callers pass `useDeviceScopedKey: true` to `startRecording()` and `deviceId: <key>` to `stopRecording()`. Single-device callers continue to call `getMapKey()` with two arguments and produce 2-part keys, unchanged on every byte.

## Parallel Recording

Multi-device orchestration wraps `Promise.all([startRecording(...alice), startRecording(...bob)])` at test start and `Promise.all([stopRecording(...alice), stopRecording(...bob)])` at teardown. Distinct map keys ensure no collision in any internal map.

## Design Decisions (from change 260415-1mzp)

- **Non-breaking optional 3rd arg**: the rejected alternative was to rename or redesign the method signature. Every existing call site passes 2 args; a required 3rd arg would ripple across all single-device recording paths. The optional arg preserves byte-identical single-device behavior and lets multi-device callers opt in explicitly.
- **Device-scoped key via opt-in flag** — `useDeviceScopedKey: boolean` on `startRecording` params (rather than auto-deriving from `deviceId` presence) makes the intent explicit at the call site. The flag and `deviceId` on stop options mirror each other.
- **Defense-in-depth sanitization** — loader enforcement is the primary guard against `###` in keys; `_sanitizeForFilename()` is the fallback in case an internal caller bypasses the loader.
9 changes: 9 additions & 0 deletions docs/memory/goal-executor/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# goal-executor

The `packages/goal-executor` package owns the iteration loop and planner calls. `TestExecutor` runs single-device tests; `MultiDeviceOrchestrator` runs 2-device tests. Both compose the shared `AIAgent` (Vercel AI SDK wrapper) and `ActionExecutor`.

## Memory Files

| File | Description |
|------|-------------|
| [multi-device-planner.md](multi-device-planner.md) | `AIAgent.planMulti()` sibling API, `MultiDeviceOrchestrator` iteration loop, step pointer, fail-fast |