RFC: browser event capture #145

Open
rgarcia wants to merge 2 commits into main from rfc/browser-event-capture

Contributor

@rgarcia rgarcia commented Feb 5, 2026

Summary

Design document for a configurable browser event streaming system on the image server.

  • Captures CDP events (console, network, DOM, layout shifts, screenshots, interactions) via raw WebSocket to Chrome
  • Tags every event with tab/frame/target context (session ID, target ID, frame ID) using Target.setAutoAttach with flatten: true
  • Computes meta-events for smart waiting: network_idle, layout_settled, navigation_settled (composite of dom_content_loaded + network_idle + layout_settled)
  • Dual-writes events to S2 streams (durable, multi-consumer) and a local ring buffer (SSE endpoint)
  • All capture is off by default; turned on/reconfigured via POST /events/start with a config body
  • Events capped at 1MB (S2 limit); large network response bodies truncated with a truncated flag

The full RFC is in .cursor/plans/2026-02-05-events.md. Also adds devtools-protocol/ as a reference for CDP domain definitions.

Test plan

  • Review RFC for completeness and correctness
  • Validate event schema covers agent use cases
  • Validate computed settling signals are useful wait primitives
  • Confirm S2 integration approach matches existing kernel patterns

Made with Cursor


Note

Low Risk
Documentation-only change; no runtime behavior, APIs, or dependencies are modified in this PR.

Overview
Adds a new RFC document (.cursor/plans/2026-02-05-events.md) proposing a configurable browser event capture pipeline, including an event schema, computed “settling” meta-events, and APIs (POST /events/start, POST /events/stop, GET /events/stream) plus optional durable streaming to S2.

No production code changes are included in this PR; it is design/specification only, outlining planned new packages, config env vars, and a testing approach.

Written by Cursor Bugbot for commit 7b9c491. This will update automatically on new commits. Configure here.

Add design document for a configurable browser event streaming system
that captures CDP events (console, network, DOM, layout shifts,
screenshots, interactions), tags them with tab/frame context, and
writes them durably to S2 streams.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

- layout_settled: start 1s timer after page_load, reset on each shift,
  emit when timer expires. Handles zero-shift pages correctly.
- screenshots: downscale PNG by halving dimensions if base64 exceeds
  ~950KB, rather than truncating (which corrupts binary data).

Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor Author

@rgarcia rgarcia left a comment

Both addressed in 7b9c491. Thanks for the catches.

| Type | Trigger |
|------|---------|
| `network_idle` | Pending request count at 0 for 500ms after navigation |
| `layout_settled` | 1s of no layout-shift entries after page_load (timer resets on each shift) |
Contributor Author

Good catch -- the table and description were contradictory. Fixed in 7b9c491: after page_load, start a 1s timer. Each layout shift resets the timer. layout_settled fires when the timer expires (1s of quiet). For zero-shift pages, this correctly fires 1s after page_load.

| `interaction_key` | Injected JS | key, selector, tag |
| `interaction_scroll` | Injected JS | from_x, from_y, to_x, to_y, target_selector |
| `layout_shift` | Injected PerformanceObserver | score, sources (element, previous_rect, current_rect) |
| `screenshot` | ffmpeg x11grab (full display) | base64 PNG in data |
Contributor Author

Valid concern -- truncating base64 PNG data produces corrupt output. We don't support 4K displays so this is unlikely in practice, but the plan now specifies: if the base64 PNG exceeds ~950KB, downscale by halving dimensions and re-encode. This keeps a usable PNG under the 1MB S2 limit. Fixed in 7b9c491.

Contributor

@ulziibay-kernel ulziibay-kernel left a comment

is this an infra through which we can log the IP addresses that browser sessions are assigned? That is highly relevant for https://linear.app/onkernel/issue/KERNEL-801/residential-ip-reputation-measurement

Contributor

@Sayan- Sayan- left a comment

building on top of CDP and S2 makes sense! I think the main risks are going to be some of the signal settling + chromium lifecycle handling, but those are all solvable problems

S2Stream --> Agents
```

The CDPMonitor opens its own CDP WebSocket to Chrome (using the existing `UpstreamManager.Current()` URL) and subscribes to configured CDP domains. It normalizes events into a common schema, tags each with tab/frame/target context, and dual-writes to both an S2 stream and a local ring buffer. The local buffer backs a `GET /events/stream` SSE endpoint.
Contributor

open question: what are the performance / IO implications of a second CDP WebSocket connection on these unikernels? with both the user's CDP session and the monitor subscribing to overlapping domains (e.g. Network), Chrome doubles the event traffic. worth benchmarking under load once implemented.


The CDPMonitor opens its own CDP WebSocket to Chrome (using the existing `UpstreamManager.Current()` URL) and subscribes to configured CDP domains. It normalizes events into a common schema, tags each with tab/frame/target context, and dual-writes to both an S2 stream and a local ring buffer. The local buffer backs a `GET /events/stream` SSE endpoint.

Default state is **off**. An explicit `POST /events/start` is required to begin capture.
Contributor

when Chrome crashes and restarts mid-capture, the monitor's WebSocket dies and events are lost until reconnect. consider emitting synthetic monitor_disconnected / monitor_reconnected events so consumers know there's a gap in the stream rather than silently missing events.

Each event is a JSON record, capped at **1MB** (S2's record size limit):

```go
type BrowserEvent struct {
Contributor

how should a downstream consumer ensure event ordering? two events can share the same millisecond timestamp. also, how should consumers deduplicate events? (S2 provides at-least-once delivery, so duplicates are possible.)
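
The quoted excerpt does not answer this question. One possible approach, sketched here purely as an assumption, is for the monitor to stamp each event with a monotonically increasing sequence number and for consumers to order and deduplicate on it. The `Seq` field and the `dedupeAndOrder` function are hypothetical and are not part of `BrowserEvent` as proposed.

```go
package main

import "sort"

// event mirrors only the fields needed for this sketch. Seq is a
// hypothetical per-monitor counter, not a field from the RFC.
type event struct {
	Seq       uint64
	Timestamp int64
	Type      string
}

// dedupeAndOrder drops records whose Seq was already seen and returns the
// remainder sorted by Seq, so events sharing a millisecond keep a stable order.
func dedupeAndOrder(in []event) []event {
	seen := make(map[uint64]struct{}, len(in))
	out := make([]event, 0, len(in))
	for _, e := range in {
		if _, dup := seen[e.Seq]; dup {
			continue
		}
		seen[e.Seq] = struct{}{}
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Seq < out[j].Seq })
	return out
}
```

A counter like this would need to be scoped to a monitor lifetime (or paired with an epoch) to stay unique across restarts; that detail is left open here.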

	Timestamp int64  `json:"ts"`                   // unix millis
	Type      string `json:"type"`                 // snake_case event name
	TargetID  string `json:"target_id,omitempty"`  // CDP target ID (tab/window)
	SessionID string `json:"session_id,omitempty"` // CDP session ID
Contributor

nit: session_id here means the CDP session ID, but in the broader Kernel system "session" means the user's browser session. consider cdp_session_id to avoid confusion for consumers, or at minimum add a doc comment clarifying.


### Event Types

**Raw CDP events** (forwarded from Chrome, enriched with target/frame context):
Contributor

as designed, each event type requires a custom transform from CDP params to the data schema; adding a new event type means writing a new handler, which seems reasonable. I don't think attempting to generically pass through all CDP events across whatever domains users enable is quite right, but figured I'd double-check the semantics we're initially landing on

- `S2_ACCESS_TOKEN` -- S2 access token (optional; if absent, S2 writes are skipped)
- `S2_BASIN` -- S2 basin name
- `S2_STREAM_NAME` -- stream name for browser events
- **Write path**: CDPMonitor batches events (every 100ms or 50 events, whichever comes first) and calls `streamClient.Append()` with `[]AppendRecord`. Each record body is the JSON-serialized `BrowserEvent`.
Contributor

should the monitor write to the ring buffer only, with the S2 writer as another reader? single write path, naturally decouples S2 latency from CDP processing, and the S2 writer is just another consumer like SSE clients.
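
To make the suggestion concrete, here is a minimal Go sketch of a ring buffer with a single writer and independent reader cursors, so an S2 writer and SSE clients would each read at their own pace. The `ring` type and its methods are hypothetical; the RFC excerpt does not specify the buffer's API.

```go
package main

import "sync"

// ring is a fixed-capacity buffer of serialized events. The monitor is the
// only writer; each consumer keeps its own cursor.
type ring struct {
	mu   sync.Mutex
	buf  [][]byte
	next uint64 // total number of events written so far
}

func newRing(capacity int) *ring {
	if capacity < 1 {
		capacity = 1
	}
	return &ring{buf: make([][]byte, capacity)}
}

// Write stores one event, overwriting the oldest slot when full.
// It never waits on readers.
func (r *ring) Write(ev []byte) {
	r.mu.Lock()
	r.buf[r.next%uint64(len(r.buf))] = ev
	r.next++
	r.mu.Unlock()
}

// ReadFrom returns the events at or after cursor, the cursor to use on the
// next call, and how many events were overwritten before this reader saw them.
func (r *ring) ReadFrom(cursor uint64) (events [][]byte, newCursor, dropped uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	size := uint64(len(r.buf))
	var oldest uint64
	if r.next > size {
		oldest = r.next - size
	}
	if cursor < oldest {
		dropped = oldest - cursor
		cursor = oldest
	}
	for i := cursor; i < r.next; i++ {
		events = append(events, r.buf[i%size])
	}
	return events, r.next, dropped
}
```

In this shape a slow S2 writer falls behind and sees a non-zero `dropped` count rather than stalling CDP processing, which is the decoupling the comment asks about.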


## Testing Plan

### Unit Tests (`server/lib/cdpmonitor/*_test.go`)
Contributor

the test plan covers happy paths well but doesn't mention failure modes: Chrome crash/restart during capture, ring buffer overflow under high event volume, or calling /events/start when Chrome isn't ready yet. worth adding at least the Chrome lifecycle case since that's a real production scenario.
