From b62d8a1627ab674db56c3b1d3d02680ba4edae3b Mon Sep 17 00:00:00 2001 From: Rafael Garcia Date: Thu, 5 Feb 2026 15:14:15 -0500 Subject: [PATCH 1/3] RFC: browser event capture Add design document for a configurable browser event streaming system that captures CDP events (console, network, DOM, layout shifts, screenshots, interactions), tags them with tab/frame context, and writes them durably to S2 streams. Co-authored-by: Cursor --- .cursor/plans/2026-02-05-events.md | 259 +++++++++++++++++++++++++++++ devtools-protocol | 1 + 2 files changed, 260 insertions(+) create mode 100644 .cursor/plans/2026-02-05-events.md create mode 160000 devtools-protocol diff --git a/.cursor/plans/2026-02-05-events.md b/.cursor/plans/2026-02-05-events.md new file mode 100644 index 00000000..6fe160fe --- /dev/null +++ b/.cursor/plans/2026-02-05-events.md @@ -0,0 +1,259 @@ +# RFC: Browser Event Capture + +## Summary + +Add a configurable browser event streaming system to the image server that captures CDP events (console, network, DOM, layout shifts, screenshots, interactions), tags them with tab/frame context, and durably writes them to S2 streams for near-real-time multi-consumer access. Events are also available locally via an SSE endpoint. + +## Motivation + +Browser agents need real-time observability into what the browser is doing: console output, network traffic, DOM changes, navigation, layout shifts, and user interactions. Today there is no structured event stream from the image server. Agents rely on polling screenshots or manual CDP connections. + +This system provides: + +1. **Fine-grained, configurable capture** -- choose exactly which event categories to record, with per-category options (e.g., network with or without response bodies). +2. **Tab/iframe awareness** -- every event is tagged with target ID, session ID, and frame ID so consumers can distinguish events from different tabs and iframes. +3. 
**Smart waiting signals** -- computed meta-events (`network_idle`, `layout_settled`, `navigation_settled`) that are strictly more informative than Playwright's `networkidle` or `domcontentloaded`, enabling smarter wait strategies. +4. **Durable streaming via S2** -- events are written to an S2 stream for multi-consumer near-real-time access. + +## Architecture + +```mermaid +flowchart LR + Chrome[Chromium CDP] + Monitor[CDPMonitor goroutine] + LocalBuf[Local Ring Buffer] + S2Stream[S2 Stream] + SSE["GET /events/stream SSE"] + Agents[Agents / Consumers] + + Chrome -->|"WebSocket events"| Monitor + Monitor -->|"dual write"| LocalBuf + Monitor -->|"dual write"| S2Stream + LocalBuf --> SSE + SSE --> Agents + S2Stream --> Agents +``` + +The CDPMonitor opens its own CDP WebSocket to Chrome (using the existing `UpstreamManager.Current()` URL) and subscribes to configured CDP domains. It normalizes events into a common schema, tags each with tab/frame/target context, and dual-writes to both an S2 stream and a local ring buffer. The local buffer backs a `GET /events/stream` SSE endpoint. + +Default state is **off**. An explicit `POST /events/start` is required to begin capture. + +## CDP Library Choice + +Raw `coder/websocket` (already in `go.mod`). The protocol is just JSON-RPC over WebSocket: send `{id, method, params}`, receive events `{method, params, sessionId}` and responses `{id, result/error}`. This is the same approach the existing devtools proxy uses (`server/lib/devtoolsproxy/proxy.go`). No need for chromedp's abstraction layer since we're tapping events, not driving the browser. + +Reference protocol definitions are in `./devtools-protocol/` (cloned from [ChromeDevTools/devtools-protocol](https://github.com/ChromeDevTools/devtools-protocol)). 
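
The message shapes described above (`{id, method, params}` requests, `{method, params, sessionId}` events, `{id, result/error}` responses) can be told apart with a single decode step. The following is a minimal sketch, not the monitor's actual code: the `cdpMessage` and `cdpError` types and the `classify` helper are hypothetical names introduced for illustration, and only the standard library is used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpMessage (hypothetical) covers both shapes Chrome sends over the socket:
// responses carry "id" plus "result" or "error"; events carry "method" and
// "params", plus "sessionId" when they come from an attached child session.
type cdpMessage struct {
	ID        *int64          `json:"id,omitempty"`
	Method    string          `json:"method,omitempty"`
	Params    json.RawMessage `json:"params,omitempty"`
	SessionID string          `json:"sessionId,omitempty"`
	Result    json.RawMessage `json:"result,omitempty"`
	Error     *cdpError       `json:"error,omitempty"`
}

// cdpError (hypothetical) mirrors the JSON-RPC style error object.
type cdpError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// classify decodes one frame and reports whether it is a command response
// (has an id) or an event (has a method but no id).
func classify(raw []byte) (string, cdpMessage, error) {
	var m cdpMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", m, err
	}
	switch {
	case m.ID != nil:
		return "response", m, nil
	case m.Method != "":
		return "event", m, nil
	default:
		return "unknown", m, nil
	}
}

func main() {
	frames := []string{
		`{"id":1,"result":{}}`,
		`{"method":"Network.requestWillBeSent","params":{"requestId":"1"},"sessionId":"ABC"}`,
	}
	for _, f := range frames {
		kind, m, err := classify([]byte(f))
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(kind, m.Method, m.SessionID)
	}
}
```

Keeping `Params` and `Result` as `json.RawMessage` defers payload decoding until the event type is known, which suits a monitor that only taps events.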
+ +## Event Schema + +Each event is a JSON record, capped at **1MB** (S2's record size limit): + +```go +type BrowserEvent struct { + Timestamp int64 `json:"ts"` // unix millis + Type string `json:"type"` // snake_case event name + TargetID string `json:"target_id,omitempty"` // CDP target ID (tab/window) + SessionID string `json:"session_id,omitempty"` // CDP session ID + FrameID string `json:"frame_id,omitempty"` // CDP frame ID + ParentFrameID string `json:"parent_frame_id,omitempty"` // non-empty = iframe + URL string `json:"url,omitempty"` // URL context + Data json.RawMessage `json:"data"` // event-specific payload + Truncated bool `json:"truncated,omitempty"` // true if payload was cut to fit 1MB +} +``` + +### Event Types + +**Raw CDP events** (forwarded from Chrome, enriched with target/frame context): + +| Type | CDP Source | Key Fields in `data` | +|------|-----------|---------------------| +| `console_log` | Runtime.consoleAPICalled | level, text, args, stack_trace | +| `console_error` | Runtime.exceptionThrown | text, line, column, url, stack_trace | +| `network_request` | Network.requestWillBeSent | method, url, headers, post_data, resource_type, initiator | +| `network_response` | Network.responseReceived + getResponseBody | status, status_text, url, headers, mime_type, timing, body (truncated at ~900KB) | +| `network_loading_failed` | Network.loadingFailed | url, error_text, canceled | +| `navigation` | Page.frameNavigated | url, frame_id, parent_frame_id | +| `dom_content_loaded` | Page.domContentEventFired | — | +| `page_load` | Page.loadEventFired | — | +| `dom_updated` | DOM.documentUpdated | — | +| `target_created` | Target.targetCreated | target_id, url, type | +| `target_destroyed` | Target.targetDestroyed | target_id | +| `interaction_click` | Injected JS | x, y, selector, tag, text | +| `interaction_key` | Injected JS | key, selector, tag | +| `interaction_scroll` | Injected JS | from_x, from_y, to_x, to_y, target_selector | +| 
`layout_shift` | Injected PerformanceObserver | score, sources (element, previous_rect, current_rect) | +| `screenshot` | ffmpeg x11grab (full display) | base64 PNG in data | + +**Computed meta-events** (emitted by the monitor's settling logic): + +| Type | Trigger | +|------|---------| +| `network_idle` | Pending request count at 0 for 500ms after navigation | +| `layout_settled` | No layout-shift entries for 1s after page_load | +| `scroll_settled` | No scroll events for 300ms with >5px movement | +| `navigation_settled` | `dom_content_loaded` AND `network_idle` AND `layout_settled` all fired | + +### How Computed Events Work + +**`network_idle`**: Counter incremented on `Network.requestWillBeSent`, decremented on `Network.loadingFinished` / `Network.loadingFailed`. After `Page.frameNavigated`, when counter hits 0, start a 500ms timer. If no new requests arrive in 500ms, emit `network_idle`. Reset on next navigation. + +**`layout_settled`**: After `Page.loadEventFired`, inject a [`PerformanceObserver`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver) watching for [`layout-shift`](https://developer.mozilla.org/en-US/docs/Web/API/LayoutShift) entries. This is a browser API that fires whenever visible elements move position without user input (e.g., an image loads and pushes text down, a font swap changes line heights, lazy content appears). Each shift entry has a `value` (0-1 score) and `sources` (which DOM nodes moved, from/to rects). Poll via `Runtime.evaluate` every 500ms. Start a 1s timer after the last shift. If no new shifts in 1s, emit `layout_settled`. This captures visual stability that neither `networkidle` nor `domcontentloaded` can detect. + +**`scroll_settled`**: The injected interaction tracking JS coalesces scroll events with a 300ms debounce. When scrolling stops for 300ms with >5px total movement, emit `scroll_settled`. + +**`navigation_settled`**: Composite signal. 
After a navigation, track three booleans: `dom_content_loaded_fired`, `network_idle_fired`, `layout_settled_fired`. When all three are true, emit `navigation_settled`. This is strictly more informative than Playwright's `networkidle` or `domcontentloaded` because it also waits for visual stability. + +## API Endpoints + +Consistent with existing prefix pattern (`/recording/`, `/process/`, `/computer/`, `/fs/`, etc.): + +### `POST /events/start` + +Start event capture. Takes config body. If already running, reconfigures on the fly. Returns 200. + +```json +{ + "console": true, + "network": true, + "network_response_body": true, + "navigation": true, + "dom": true, + "layout_shifts": true, + "screenshots": true, + "screenshot_triggers": ["error", "navigation_settled"], + "targets": true, + "interactions": true, + "computed_events": true +} +``` + +All fields default to `false`. A minimal call: + +```json +{ "network": true } +``` + +### `POST /events/stop` + +Stop event capture. Returns 200. + +### `GET /events/stream` + +SSE stream of events from local ring buffer. Returns `text/event-stream`. Each SSE `data:` line is one `BrowserEvent` JSON. + +### Config Schema + +```yaml +EventCaptureConfig: + type: object + properties: + console: + type: boolean + description: Capture console logs and exceptions + network: + type: boolean + description: Capture network requests and responses + network_response_body: + type: boolean + description: Include response bodies (up to ~900KB, truncated beyond). 
Requires network=true + navigation: + type: boolean + description: Capture page navigation and load events + dom: + type: boolean + description: Capture DOM update events + layout_shifts: + type: boolean + description: Inject PerformanceObserver for layout shift detection + screenshots: + type: boolean + description: Capture full-display screenshots at key moments + screenshot_triggers: + type: array + items: + type: string + enum: [error, page_load, navigation_settled, scroll_settled, network_idle] + description: Which events trigger a screenshot. Default [error, navigation_settled] + targets: + type: boolean + description: Capture target (tab/window) creation/destruction + interactions: + type: boolean + description: Inject JS to track clicks, keys, scrolls + computed_events: + type: boolean + description: Emit computed meta-events (network_idle, layout_settled, scroll_settled, navigation_settled) +``` + +## Multi-Target via setAutoAttach + +To monitor all tabs and iframes, the monitor calls `Target.setAutoAttach` with `{autoAttach: true, waitForDebuggerOnStart: false, flatten: true}` on the browser-level CDP session. With `flatten: true`, all events from child targets arrive on the same WebSocket connection annotated with `sessionId`. The monitor maintains a `sessionId -> targetInfo` map (populated from `Target.targetCreated` / `Target.attachedToTarget` events) to enrich each event with target context (URL, type, targetId). + +## Screenshots + +Full-display screenshots using the existing ffmpeg x11grab approach (same as `TakeScreenshot` in `computer.go`). The PNG is base64-encoded and placed in the event `data` field. A typical 1920x1080 PNG screenshot is ~200-500KB base64, well under the 1MB S2 limit. Screenshots are triggered by configurable events (default: `error`, `navigation_settled`). 
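
The size estimate above follows from base64 producing 4 output bytes for every 3 input bytes. As a rough illustration, the check below computes the encoded size of a raw PNG before embedding it. The `fitsBudget` helper and the byte budgets passed to it are assumptions made for this sketch; a real budget would also leave room for the rest of the event record.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// fitsBudget (hypothetical helper) reports whether a PNG of rawLen bytes
// stays within budget bytes once encoded with standard padded base64.
func fitsBudget(rawLen, budget int) bool {
	return base64.StdEncoding.EncodedLen(rawLen) <= budget
}

func main() {
	// Illustrative numbers only: a 300,000-byte PNG against a 1,000,000-byte budget.
	fmt.Println(base64.StdEncoding.EncodedLen(300_000))
	fmt.Println(fitsBudget(300_000, 1_000_000))
	fmt.Println(fitsBudget(800_000, 1_000_000))
}
```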
+ +## S2 Integration + +- **New dependency**: `github.com/s2-streamstore/s2-sdk-go` (v0.11.8, same as kernel repo) +- **Config env vars** (in `server/cmd/config/config.go`): + - `S2_ACCESS_TOKEN` -- S2 access token (optional; if absent, S2 writes are skipped) + - `S2_BASIN` -- S2 basin name + - `S2_STREAM_NAME` -- stream name for browser events +- **Write path**: CDPMonitor batches events (every 100ms or 50 events, whichever comes first) and calls `streamClient.Append()` with `[]AppendRecord`. Each record body is the JSON-serialized `BrowserEvent`. +- **Graceful degradation**: If S2 config is not provided, dual-write only goes to local buffer. SSE still works. + +## Files to Create / Modify + +### New Files + +| File | Purpose | +|------|---------| +| `server/lib/cdpmonitor/monitor.go` | Core: raw coder/websocket CDP client, domain enablement, setAutoAttach, event dispatch loop | +| `server/lib/cdpmonitor/events.go` | BrowserEvent struct, event type constants, JSON serialization, 1MB truncation | +| `server/lib/cdpmonitor/config.go` | EventCaptureConfig struct, validation, reconfiguration | +| `server/lib/cdpmonitor/settling.go` | Network idle state machine, layout shift observer injection/polling, composite navigation_settled | +| `server/lib/cdpmonitor/interactions.go` | JS injection for click/key/scroll tracking, 500ms polling, scroll 300ms debounce | +| `server/lib/cdpmonitor/screenshot.go` | Full-display screenshot via ffmpeg x11grab, base64 encode, triggered by event hooks | +| `server/lib/cdpmonitor/s2writer.go` | Batched S2 append writer, graceful degradation | +| `server/lib/cdpmonitor/buffer.go` | Ring buffer for local SSE subscribers | +| `server/cmd/api/api/events.go` | HTTP handlers for /events/start, /events/stop, /events/stream | + +### Modified Files + +| File | Changes | +|------|---------| +| `server/openapi.yaml` | Add POST /events/start, POST /events/stop, GET /events/stream endpoints | +| `server/cmd/api/api/api.go` | Add CDPMonitor field to 
ApiService | +| `server/cmd/api/main.go` | Wire up CDPMonitor with optional S2 client | +| `server/cmd/config/config.go` | Add S2_ACCESS_TOKEN, S2_BASIN, S2_STREAM_NAME env vars | +| `server/go.mod` | Add s2-sdk-go dependency | + +## Testing Plan + +### Unit Tests (`server/lib/cdpmonitor/*_test.go`) + +| File | Coverage | +|------|----------| +| `events_test.go` | Event serialization, 1MB truncation (verify truncated flag set, payload under limit), snake_case type validation | +| `config_test.go` | Config validation, defaults, reconfiguration merging, network_response_body requires network | +| `settling_test.go` | Network idle state machine (request counting, 500ms timer, reset on navigation), layout settled 1s timer, composite navigation_settled requires all 3 signals | +| `buffer_test.go` | Ring buffer overflow, subscriber catch-up, concurrent read/write safety | +| `s2writer_test.go` | Time-based and count-based flush batching, graceful skip when S2 not configured | + +### Integration Tests (`server/e2e/`) + +Tests are grouped to minimize container overhead. Each test function runs in a shared container. + +| File | Scenarios Covered | +|------|-------------------| +| `e2e_events_core_test.go` | **Lifecycle**: start/stop/restart capture. **Reconfigure**: start with network-only, verify no console events, reconfigure to add console, verify console events appear. **Console**: navigate to page with console.log/console.error, verify `console_log` and `console_error` events. **Network**: navigate to page that fetches an API, verify `network_request` + `network_response`, test with response bodies enabled, test large response truncation. | +| `e2e_events_navigation_test.go` | **Navigation & settling**: navigate between pages, verify `navigation`, `dom_content_loaded`, `page_load` events. Verify `network_idle`, `layout_settled`, `navigation_settled` fire in correct order. **Iframes**: load page with iframe, verify events carry correct `frame_id` and `parent_frame_id`. 
**Screenshots**: configure screenshot on `navigation_settled`, verify `screenshot` event with base64 PNG data. | +| `e2e_events_targets_test.go` | **Multi-target (setAutoAttach)**: open new tab via `window.open()`, verify `target_created` with correct URL and distinct `session_id`. Navigate in second tab, verify events attributed correctly. Close tab, verify `target_destroyed`. **Interactions**: click element, type in input, scroll page; verify `interaction_click`, `interaction_key`, `interaction_scroll`, `scroll_settled` events. | + +## Appendix: Prior Art + +- [dev3000 CDPMonitor](./dev3000/src/cdp-monitor.ts) -- TypeScript implementation of CDP event capture using raw `ws` WebSocket. Covers console, network, navigation, DOM, interactions (injected JS), and screenshot triggers. Connects to a single page target. +- [dev3000 ScreencastManager](./dev3000/src/screencast-manager.ts) -- Passive screencast capture and CLS detection using injected PerformanceObserver. Captures layout shift sources with element/rect details. +- [kernel API S2 usage](https://github.com/onkernel/kernel/tree/main/packages/api/lib/s2util) -- Go patterns for S2 read/write sessions using `s2-sdk-go`. diff --git a/devtools-protocol b/devtools-protocol new file mode 160000 index 00000000..92e7a2fa --- /dev/null +++ b/devtools-protocol @@ -0,0 +1 @@ +Subproject commit 92e7a2fa66a75e1d1d034c3da4606745d32f2e13 From 7b9c491bafebccf8b9e267fa83cfaed5b43ca30f Mon Sep 17 00:00:00 2001 From: Rafael Garcia Date: Thu, 5 Feb 2026 15:36:30 -0500 Subject: [PATCH 2/3] fix: clarify layout_settled semantics and screenshot sizing strategy - layout_settled: start 1s timer after page_load, reset on each shift, emit when timer expires. Handles zero-shift pages correctly. - screenshots: downscale PNG by halving dimensions if base64 exceeds ~950KB, rather than truncating (which corrupts binary data). 
Co-authored-by: Cursor --- .cursor/plans/2026-02-05-events.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.cursor/plans/2026-02-05-events.md b/.cursor/plans/2026-02-05-events.md index 6fe160fe..050f4381 100644 --- a/.cursor/plans/2026-02-05-events.md +++ b/.cursor/plans/2026-02-05-events.md @@ -90,7 +90,7 @@ type BrowserEvent struct { | Type | Trigger | |------|---------| | `network_idle` | Pending request count at 0 for 500ms after navigation | -| `layout_settled` | No layout-shift entries for 1s after page_load | +| `layout_settled` | 1s of no layout-shift entries after page_load (timer resets on each shift) | | `scroll_settled` | No scroll events for 300ms with >5px movement | | `navigation_settled` | `dom_content_loaded` AND `network_idle` AND `layout_settled` all fired | @@ -98,7 +98,7 @@ type BrowserEvent struct { **`network_idle`**: Counter incremented on `Network.requestWillBeSent`, decremented on `Network.loadingFinished` / `Network.loadingFailed`. After `Page.frameNavigated`, when counter hits 0, start a 500ms timer. If no new requests arrive in 500ms, emit `network_idle`. Reset on next navigation. -**`layout_settled`**: After `Page.loadEventFired`, inject a [`PerformanceObserver`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver) watching for [`layout-shift`](https://developer.mozilla.org/en-US/docs/Web/API/LayoutShift) entries. This is a browser API that fires whenever visible elements move position without user input (e.g., an image loads and pushes text down, a font swap changes line heights, lazy content appears). Each shift entry has a `value` (0-1 score) and `sources` (which DOM nodes moved, from/to rects). Poll via `Runtime.evaluate` every 500ms. Start a 1s timer after the last shift. If no new shifts in 1s, emit `layout_settled`. This captures visual stability that neither `networkidle` nor `domcontentloaded` can detect. 
+**`layout_settled`**: After `Page.loadEventFired`, inject a [`PerformanceObserver`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver) watching for [`layout-shift`](https://developer.mozilla.org/en-US/docs/Web/API/LayoutShift) entries. This is a browser API that fires whenever visible elements move position without user input (e.g., an image loads and pushes text down, a font swap changes line heights, lazy content appears). Each shift entry has a `value` (0-1 score) and `sources` (which DOM nodes moved, from/to rects). Poll via `Runtime.evaluate` every 500ms. After `page_load`, start a 1s timer. Each time a layout shift is detected, reset the timer. When the timer expires (1s of quiet), emit `layout_settled`. For pages with zero layout shifts, this fires 1s after page_load. This captures visual stability that neither `networkidle` nor `domcontentloaded` can detect. **`scroll_settled`**: The injected interaction tracking JS coalesces scroll events with a 300ms debounce. When scrolling stops for 300ms with >5px total movement, emit `scroll_settled`. @@ -192,7 +192,7 @@ To monitor all tabs and iframes, the monitor calls `Target.setAutoAttach` with ` ## Screenshots -Full-display screenshots using the existing ffmpeg x11grab approach (same as `TakeScreenshot` in `computer.go`). The PNG is base64-encoded and placed in the event `data` field. A typical 1920x1080 PNG screenshot is ~200-500KB base64, well under the 1MB S2 limit. Screenshots are triggered by configurable events (default: `error`, `navigation_settled`). +Full-display screenshots using the existing ffmpeg x11grab approach (same as `TakeScreenshot` in `computer.go`). The PNG is base64-encoded and placed in the event `data` field. A typical 1920x1080 PNG screenshot is ~200-500KB base64, well under the 1MB S2 limit. If a screenshot exceeds ~950KB base64 (e.g., unusually complex screen content), downscale the image by halving dimensions and re-encode before embedding. 
This keeps the event under S2's 1MB record limit while preserving a usable PNG (never truncate binary data). Screenshots are triggered by configurable events (default: `error`, `navigation_settled`). ## S2 Integration From 4a3839829817a9510fcf892ce41f1152e065a801 Mon Sep 17 00:00:00 2001 From: Rafael Garcia Date: Fri, 6 Feb 2026 15:06:28 -0500 Subject: [PATCH 3/3] address review feedback: seq field, cdp_session_id, ring buffer architecture, failure tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add monotonic `seq` field to BrowserEvent for total ordering and SSE reconnection - Rename session_id → cdp_session_id to avoid confusion with Kernel sessions - Rewire architecture: monitor writes only to ring buffer, S2 writer is a consumer - Add CDP connection isolation note (confirmed from Chromium source) - Add monitor_disconnected/reconnected synthetic events for gap detection - Add e2e_events_failure_test.go for Chrome crash, ring buffer overflow, early start - Update SSE endpoint to include id: for Last-Event-ID support Co-authored-by: Cursor --- .cursor/plans/2026-02-05-events.md | 64 ++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 21 deletions(-) diff --git a/.cursor/plans/2026-02-05-events.md b/.cursor/plans/2026-02-05-events.md index 050f4381..2a82be56 100644 --- a/.cursor/plans/2026-02-05-events.md +++ b/.cursor/plans/2026-02-05-events.md @@ -11,7 +11,7 @@ Browser agents need real-time observability into what the browser is doing: cons This system provides: 1. **Fine-grained, configurable capture** -- choose exactly which event categories to record, with per-category options (e.g., network with or without response bodies). -2. **Tab/iframe awareness** -- every event is tagged with target ID, session ID, and frame ID so consumers can distinguish events from different tabs and iframes. +2. 
**Tab/iframe awareness** -- every event is tagged with target ID, CDP session ID, and frame ID so consumers can distinguish events from different tabs and iframes. 3. **Smart waiting signals** -- computed meta-events (`network_idle`, `layout_settled`, `navigation_settled`) that are strictly more informative than Playwright's `networkidle` or `domcontentloaded`, enabling smarter wait strategies. 4. **Durable streaming via S2** -- events are written to an S2 stream for multi-consumer near-real-time access. @@ -21,20 +21,24 @@ This system provides: flowchart LR Chrome[Chromium CDP] Monitor[CDPMonitor goroutine] - LocalBuf[Local Ring Buffer] - S2Stream[S2 Stream] + RingBuf[Ring Buffer] + S2Writer[S2 Writer goroutine] SSE["GET /events/stream SSE"] + S2Stream[S2 Stream] Agents[Agents / Consumers] Chrome -->|"WebSocket events"| Monitor - Monitor -->|"dual write"| LocalBuf - Monitor -->|"dual write"| S2Stream - LocalBuf --> SSE + Monitor -->|"write"| RingBuf + RingBuf --> SSE + RingBuf --> S2Writer + S2Writer --> S2Stream SSE --> Agents S2Stream --> Agents ``` -The CDPMonitor opens its own CDP WebSocket to Chrome (using the existing `UpstreamManager.Current()` URL) and subscribes to configured CDP domains. It normalizes events into a common schema, tags each with tab/frame/target context, and dual-writes to both an S2 stream and a local ring buffer. The local buffer backs a `GET /events/stream` SSE endpoint. +The CDPMonitor opens its own CDP WebSocket to Chrome (using the existing `UpstreamManager.Current()` URL) and subscribes to configured CDP domains. It normalizes events into a common schema, tags each with tab/frame/target context, and writes to a local ring buffer. The ring buffer is the single write path; consumers include the SSE endpoint (`GET /events/stream`) and an S2 writer goroutine that batches and appends events to an S2 stream. This decouples S2 latency from CDP event processing. 
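
The single-write-path design described above can be sketched as a fixed-capacity ring that the monitor writes to without ever waiting on a consumer, while each consumer (SSE handler, S2 writer) reads by sequence number at its own pace. This is a sketch under assumptions, not the planned `buffer.go`: the `Ring`, `Publish`, and `ReadAfter` names are hypothetical, the ring assigns the sequence number itself, and consumers are assumed to poll rather than receive wakeups.

```go
package main

import (
	"fmt"
	"sync"
)

// Event stands in for BrowserEvent; only the fields this sketch needs.
type Event struct {
	Seq  uint64
	Type string
}

// Ring (hypothetical) is a fixed-capacity buffer. The writer never blocks:
// once the ring is full, each new event overwrites the oldest one.
type Ring struct {
	mu   sync.Mutex
	buf  []Event
	next uint64 // sequence number for the next published event, starting at 1
}

func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]Event, capacity), next: 1}
}

// Publish assigns the next sequence number and stores the event.
func (r *Ring) Publish(typ string) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	seq := r.next
	r.buf[seq%uint64(len(r.buf))] = Event{Seq: seq, Type: typ}
	r.next++
	return seq
}

// ReadAfter returns buffered events with Seq > after, oldest first, and the
// number of events the caller missed because they were already evicted.
func (r *Ring) ReadAfter(after uint64) (events []Event, missed uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := uint64(len(r.buf))
	oldest := uint64(1)
	if r.next > n {
		oldest = r.next - n
	}
	start := after + 1
	if start < oldest {
		missed = oldest - start
		start = oldest
	}
	for seq := start; seq < r.next; seq++ {
		events = append(events, r.buf[seq%n])
	}
	return events, missed
}

func main() {
	r := NewRing(3)
	for i := 0; i < 5; i++ {
		r.Publish("network_request")
	}
	events, missed := r.ReadAfter(0)
	fmt.Println(len(events), missed)
}
```

Because a slow reader only observes a `missed` count instead of stalling `Publish`, a stalled S2 append cannot hold up CDP event processing in this model.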
+ +**CDP connection isolation**: Each CDP WebSocket connection gets its own `DevToolsSession` in Chrome with independent domain handler state. Enabling `Network` on the monitor's connection does not affect the user's CDP connection — events are dispatched only to the session that enabled the domain (confirmed from Chromium source: `devtools_session.cc`, `devtools_agent_host_impl.cc`). The overhead is one additional WebSocket + serialization of subscribed events. Benchmark under load once implemented. Default state is **off**. An explicit `POST /events/start` is required to begin capture. @@ -50,18 +54,21 @@ Each event is a JSON record, capped at **1MB** (S2's record size limit): ```go type BrowserEvent struct { - Timestamp int64 `json:"ts"` // unix millis - Type string `json:"type"` // snake_case event name - TargetID string `json:"target_id,omitempty"` // CDP target ID (tab/window) - SessionID string `json:"session_id,omitempty"` // CDP session ID - FrameID string `json:"frame_id,omitempty"` // CDP frame ID - ParentFrameID string `json:"parent_frame_id,omitempty"` // non-empty = iframe - URL string `json:"url,omitempty"` // URL context - Data json.RawMessage `json:"data"` // event-specific payload - Truncated bool `json:"truncated,omitempty"` // true if payload was cut to fit 1MB + Seq uint64 `json:"seq"` // monotonic sequence number, resets on server startup + Timestamp int64 `json:"ts"` // unix millis + Type string `json:"type"` // snake_case event name + TargetID string `json:"target_id,omitempty"` // CDP target ID (tab/window) + CDPSessionID string `json:"cdp_session_id,omitempty"` // CDP session ID (not Kernel session) + FrameID string `json:"frame_id,omitempty"` // CDP frame ID + ParentFrameID string `json:"parent_frame_id,omitempty"` // non-empty = iframe + URL string `json:"url,omitempty"` // URL context + Data json.RawMessage `json:"data"` // event-specific payload + Truncated bool `json:"truncated,omitempty"` // true if payload was cut to fit 1MB } ``` 
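
The size cap and the `Truncated` flag in the struct above could be enforced at serialization time. The sketch below is one possible approach, not the RFC's specified behavior: `marshalWithinLimit` is a hypothetical helper, the struct is trimmed to the fields it needs, and replacing `Data` with a JSON string that holds a prefix of the original payload is an assumption made so the record stays valid JSON after the cut.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// BrowserEvent is trimmed to the fields this sketch needs.
type BrowserEvent struct {
	Seq       uint64          `json:"seq"`
	Type      string          `json:"type"`
	Data      json.RawMessage `json:"data"`
	Truncated bool            `json:"truncated,omitempty"`
}

// marshalWithinLimit (hypothetical) serializes ev. If the result exceeds
// limit bytes, it sets Truncated and replaces Data with a JSON string that
// holds a prefix of the original payload (an assumption of this sketch).
func marshalWithinLimit(ev BrowserEvent, limit int) ([]byte, error) {
	out, err := json.Marshal(ev)
	if err != nil || len(out) <= limit {
		return out, err
	}
	orig := ev.Data
	ev.Truncated = true
	// Halve the kept prefix until the record fits. JSON escaping can expand
	// the prefix, so the size is re-checked after every attempt.
	for keep := len(orig) / 2; ; keep /= 2 {
		preview, err := json.Marshal(string(orig[:keep]))
		if err != nil {
			return nil, err
		}
		ev.Data = preview
		if out, err = json.Marshal(ev); err != nil {
			return nil, err
		}
		if len(out) <= limit {
			return out, nil
		}
		if keep == 0 {
			return nil, errors.New("event exceeds limit even with empty data")
		}
	}
}

func main() {
	// A small limit is used here so the effect is visible without a 1MB payload.
	big := json.RawMessage(`"` + strings.Repeat("x", 4096) + `"`)
	out, err := marshalWithinLimit(BrowserEvent{Seq: 1, Type: "network_response", Data: big}, 1024)
	if err != nil {
		panic(err)
	}
	var got BrowserEvent
	if err := json.Unmarshal(out, &got); err != nil {
		panic(err)
	}
	fmt.Println(len(out) <= 1024, got.Truncated)
}
```

Halving discards more of the payload than strictly necessary; a production version would likely search for the largest prefix that fits.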
+The `seq` field provides total ordering within a capture session. Consumers can use `(seq, type, ts)` triples for deduplication (S2 provides at-least-once delivery). The counter is a `uint64` incremented atomically and resets when the server process restarts. + ### Event Types **Raw CDP events** (forwarded from Chrome, enriched with target/frame context): @@ -85,6 +92,15 @@ type BrowserEvent struct { | `layout_shift` | Injected PerformanceObserver | score, sources (element, previous_rect, current_rect) | | `screenshot` | ffmpeg x11grab (full display) | base64 PNG in data | +**Synthetic monitor events** (emitted by the monitor itself): + +| Type | Trigger | Key Fields in `data` | +|------|---------|---------------------| +| `monitor_disconnected` | CDP WebSocket to Chrome closed (crash, restart) | reason | +| `monitor_reconnected` | CDP WebSocket re-established after disconnect | reconnect_duration_ms | + +These events let consumers detect gaps in the event stream rather than silently missing events during Chrome restarts. + **Computed meta-events** (emitted by the monitor's settling logic): | Type | Trigger | @@ -140,7 +156,12 @@ Stop event capture. Returns 200. ### `GET /events/stream` -SSE stream of events from local ring buffer. Returns `text/event-stream`. Each SSE `data:` line is one `BrowserEvent` JSON. +SSE stream of events from local ring buffer. Returns `text/event-stream`. Each SSE event includes: + +- `id: ` -- the event's sequence number, enabling `Last-Event-ID` reconnection +- `data: ` -- one `BrowserEvent` per SSE event + +Clients can reconnect with `Last-Event-ID` to resume from where they left off (subject to ring buffer capacity). ### Config Schema @@ -188,7 +209,7 @@ EventCaptureConfig: ## Multi-Target via setAutoAttach -To monitor all tabs and iframes, the monitor calls `Target.setAutoAttach` with `{autoAttach: true, waitForDebuggerOnStart: false, flatten: true}` on the browser-level CDP session. 
With `flatten: true`, all events from child targets arrive on the same WebSocket connection annotated with `sessionId`. The monitor maintains a `sessionId -> targetInfo` map (populated from `Target.targetCreated` / `Target.attachedToTarget` events) to enrich each event with target context (URL, type, targetId). +To monitor all tabs and iframes, the monitor calls `Target.setAutoAttach` with `{autoAttach: true, waitForDebuggerOnStart: false, flatten: true}` on the browser-level CDP session. With `flatten: true`, all events from child targets arrive on the same WebSocket connection annotated with `sessionId`. The monitor maintains a `sessionId -> targetInfo` map (populated from `Target.targetCreated` / `Target.attachedToTarget` events) to enrich each event with target context (URL, type, targetId). The CDP `sessionId` is mapped to the `cdp_session_id` field in `BrowserEvent`. ## Screenshots @@ -201,8 +222,8 @@ Full-display screenshots using the existing ffmpeg x11grab approach (same as `Ta - `S2_ACCESS_TOKEN` -- S2 access token (optional; if absent, S2 writes are skipped) - `S2_BASIN` -- S2 basin name - `S2_STREAM_NAME` -- stream name for browser events -- **Write path**: CDPMonitor batches events (every 100ms or 50 events, whichever comes first) and calls `streamClient.Append()` with `[]AppendRecord`. Each record body is the JSON-serialized `BrowserEvent`. -- **Graceful degradation**: If S2 config is not provided, dual-write only goes to local buffer. SSE still works. +- **Write path**: The S2 writer is a consumer of the ring buffer, just like SSE clients. It reads events from the ring buffer, batches them (every 100ms or 50 events, whichever comes first), and calls `streamClient.Append()` with `[]AppendRecord`. Each record body is the JSON-serialized `BrowserEvent`. This single-write-path design means the CDP monitor never blocks on S2 latency. +- **Graceful degradation**: If S2 config is not provided, the S2 writer goroutine is not started. 
The ring buffer and SSE endpoint still work. ## Files to Create / Modify @@ -250,7 +271,8 @@ Tests are grouped to minimize container overhead. Each test function runs in a s |------|-------------------| | `e2e_events_core_test.go` | **Lifecycle**: start/stop/restart capture. **Reconfigure**: start with network-only, verify no console events, reconfigure to add console, verify console events appear. **Console**: navigate to page with console.log/console.error, verify `console_log` and `console_error` events. **Network**: navigate to page that fetches an API, verify `network_request` + `network_response`, test with response bodies enabled, test large response truncation. | | `e2e_events_navigation_test.go` | **Navigation & settling**: navigate between pages, verify `navigation`, `dom_content_loaded`, `page_load` events. Verify `network_idle`, `layout_settled`, `navigation_settled` fire in correct order. **Iframes**: load page with iframe, verify events carry correct `frame_id` and `parent_frame_id`. **Screenshots**: configure screenshot on `navigation_settled`, verify `screenshot` event with base64 PNG data. | -| `e2e_events_targets_test.go` | **Multi-target (setAutoAttach)**: open new tab via `window.open()`, verify `target_created` with correct URL and distinct `session_id`. Navigate in second tab, verify events attributed correctly. Close tab, verify `target_destroyed`. **Interactions**: click element, type in input, scroll page; verify `interaction_click`, `interaction_key`, `interaction_scroll`, `scroll_settled` events. | +| `e2e_events_targets_test.go` | **Multi-target (setAutoAttach)**: open new tab via `window.open()`, verify `target_created` with correct URL and distinct `cdp_session_id`. Navigate in second tab, verify events attributed correctly. Close tab, verify `target_destroyed`. **Interactions**: click element, type in input, scroll page; verify `interaction_click`, `interaction_key`, `interaction_scroll`, `scroll_settled` events. 
| +| `e2e_events_failure_test.go` | **Chrome crash/restart**: kill Chrome process during active capture, verify `monitor_disconnected` event with reason, verify automatic reconnection and `monitor_reconnected` event, verify domain re-subscription and events resume. **Ring buffer overflow**: generate high event volume (e.g., tight network request loop), verify oldest events are evicted without crash, verify SSE clients receive latest events. **Start before Chrome ready**: call `/events/start` before Chrome has finished launching, verify graceful error response (503) or queued start that activates once Chrome is available. | ## Appendix: Prior Art