feat: support direct CDP webview targets (/devtools/page/<id>) #1

Draft
seflless wants to merge 2 commits into main from francois/direct-webview-cdp-target
@seflless commented Mar 2, 2026

Summary

This PR adds direct CDP target support for WebSocket endpoints like:

  • ws://127.0.0.1:9222/devtools/page/<targetId>

This unblocks automating Electron webview targets that are visible in http://127.0.0.1:9222/json/list but are not exposed as normal Playwright Page objects via chromium.connectOverCDP().
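As a rough illustration of where these endpoints come from, the sketch below filters a parsed `/json/list` response down to the WebSocket URLs of webview targets. The `CdpTargetInfo` shape and the `webviewEndpoints` helper are hypothetical names for this example and are not part of this PR; the `"webview"` type value is an assumption about how the Electron app reports its guests.

```typescript
// Subset of the fields found in each /json/list entry (illustrative shape).
interface CdpTargetInfo {
  id: string;
  type: string; // assumed values such as "page" or "webview"
  title: string;
  url: string;
  webSocketDebuggerUrl?: string; // treated as optional in this sketch
}

// Hypothetical helper: collect direct WebSocket endpoints for webview targets.
function webviewEndpoints(targets: CdpTargetInfo[]): string[] {
  const endpoints: string[] = [];
  for (const target of targets) {
    if (target.type === "webview" && typeof target.webSocketDebuggerUrl === "string") {
      endpoints.push(target.webSocketDebuggerUrl);
    }
  }
  return endpoints;
}
```

Each returned URL has the `/devtools/page/<targetId>` form that this PR accepts.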

Why this exists

agent-browser currently assumes browser-level CDP endpoints (/devtools/browser/...) and requires Playwright contexts/pages after connecting. With a direct target endpoint (/devtools/page/...), the connection fails with:

  • No page found. Make sure the app has loaded content.

This PR introduces a second execution mode for direct targets.

What's in this PR (implemented)

1) New direct-target mode in BrowserManager

  • Detects direct target URLs (/devtools/page/<id>)
  • Opens a raw CDP WebSocket connection for those targets
  • Adds direct CDP command transport (Runtime.evaluate, Page.captureScreenshot, etc.)
  • Tracks direct target connection liveness and cleanup
  • Treats direct mode as single-target with tab index 0
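The first three points can be sketched roughly as follows. This is not the code in src/browser.ts: `isDirectTargetUrl` and `CdpCommandTracker` are hypothetical names, and the socket is abstracted as a `write` callback so the example stays free of any WebSocket library. It only shows the general CDP pattern of tagging each command with a numeric `id` and matching the response by that `id`.

```typescript
// Hypothetical helper: true when the endpoint path is /devtools/page/<id>.
function isDirectTargetUrl(endpoint: string): boolean {
  try {
    const url = new URL(endpoint);
    const isWebSocket = url.protocol === "ws:" || url.protocol === "wss:";
    return isWebSocket && /^\/devtools\/page\/[^/]+$/.test(url.pathname);
  } catch (_err) {
    return false; // not a parseable URL
  }
}

type CdpReply = { id?: number; result?: unknown; error?: { message: string } };
type Waiter = { resolve: (value: unknown) => void; reject: (err: Error) => void };

// Hypothetical transport core: correlates commands and responses by id.
class CdpCommandTracker {
  private nextId = 1;
  private pending = new Map<number, Waiter>();

  // Serialize a command, hand it to the socket writer, and return a promise
  // that settles when the response with the same id arrives.
  send(method: string, params: Record<string, unknown>, write: (frame: string) => void): Promise<unknown> {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    write(JSON.stringify({ id, method, params }));
    return promise;
  }

  // Feed one incoming text frame. Frames without an id are CDP events and
  // are ignored in this sketch.
  handleFrame(frame: string): void {
    const message = JSON.parse(frame) as CdpReply;
    if (typeof message.id !== "number") return;
    const waiter = this.pending.get(message.id);
    if (!waiter) return;
    this.pending.delete(message.id);
    if (message.error) waiter.reject(new Error(message.error.message));
    else waiter.resolve(message.result);
  }

  // Reject everything still in flight, for example when the socket closes.
  closeAll(reason: string): void {
    this.pending.forEach((waiter) => waiter.reject(new Error(reason)));
    this.pending.clear();
  }
}
```

In use, `write` would wrap the socket's send call and `handleFrame` would be wired to its message event; `closeAll` is one way to cover the cleanup point in the list.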

2) Action handler support for direct mode

The following actions now branch to direct-target implementations:

  • navigate
  • snapshot
  • evaluate
  • click
  • fill
  • type
  • press
  • screenshot
  • wait
  • scroll
  • url
  • title

3) Tab behavior in direct mode

  • tab works and returns one target
  • tab 0 works
  • switching to a non-zero tab in direct mode returns an explicit error
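A minimal sketch of these single-target tab semantics, assuming hypothetical helper names (`listDirectTabs`, `switchDirectTab`) and an illustrative error message rather than the exact wording used in src/actions.ts:

```typescript
interface DirectTab {
  index: number;
  targetId: string;
  active: boolean;
}

// Direct mode exposes exactly one target, always at index 0.
function listDirectTabs(targetId: string): DirectTab[] {
  return [{ index: 0, targetId, active: true }];
}

// Switching to index 0 is a no-op; any other index is rejected explicitly.
function switchDirectTab(index: number): number {
  if (index !== 0) {
    throw new Error(
      `Direct CDP target mode has a single tab (index 0); cannot switch to tab ${index}.`
    );
  }
  return 0;
}
```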

4) Regression test

Added test coverage for direct /devtools/page/... endpoint support in src/browser.test.ts using a local mock WebSocket CDP server.

What's not in this PR (yet)

  • Full parity for every command in direct mode
  • Dedicated target discovery UX (targets, webviews, target switch)
  • Unified abstraction across Playwright-page and direct-CDP backends
  • Docs/help text updates for new direct target behavior

Validation run

  • npx vitest run src/browser.test.ts
  • npm run build
  • Manual e2e on Electron webview target:
    • open HN
    • click comments
    • click article link from comments page
    • screenshot + URL/title verification

Risks / tradeoffs

  • Direct mode currently uses JS evaluation for many interactions (not Playwright locators)
  • Behavior differences may exist for complex selectors, frames, and advanced interactions
  • Some commands remain intentionally unsupported outside Playwright-page mode
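To make the first tradeoff concrete, here is a sketch of what a JS-evaluation-based click might look like when expressed as `Runtime.evaluate` parameters. The helper name `buildClickParams` and the returned `{ ok, reason }` shape are assumptions for illustration, not the PR's actual implementation.

```typescript
// Hypothetical helper: build Runtime.evaluate params that click the first
// element matching a CSS selector inside the target page.
function buildClickParams(selector: string): { expression: string; returnByValue: boolean } {
  // JSON.stringify embeds the selector as a safely quoted JS string literal.
  const expression = `(() => {
    const el = document.querySelector(${JSON.stringify(selector)});
    if (!el) return { ok: false, reason: "not found" };
    el.scrollIntoView({ block: "center" });
    el.click();
    return { ok: true };
  })()`;
  return { expression, returnByValue: true };
}
```

Unlike a Playwright locator, this does no actionability waiting, does not reach into frames, and dispatches a synthetic click rather than real input events, which is where the behavior differences noted above would be expected to show up.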

Recommended next steps (near-term)

  1. Add docs/help updates for direct target mode and caveats
  2. Add explicit error messaging for unsupported commands in direct mode
  3. Add tests for:
    • click/fill/type/press by selector and by snapshot ref
    • navigation wait semantics
    • screenshot path handling
  4. Add a command for target enumeration/selection (see architecture note below)
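For step 2, one possible shape is a guard that runs before dispatch. The action set below mirrors the list of direct-mode actions in this description (tab handling is separate); the names `DIRECT_MODE_ACTIONS` and `assertDirectModeSupport` and the message text are hypothetical.

```typescript
// Actions this PR describes as having direct-target implementations.
const DIRECT_MODE_ACTIONS = new Set<string>([
  "navigate", "snapshot", "evaluate", "click", "fill", "type",
  "press", "screenshot", "wait", "scroll", "url", "title",
]);

// Hypothetical guard: fail fast with a message that names the action and
// points the user back to a browser-level endpoint.
function assertDirectModeSupport(action: string): void {
  if (!DIRECT_MODE_ACTIONS.has(action)) {
    throw new Error(
      `"${action}" is not supported for direct CDP targets (/devtools/page/<id>). ` +
        `Connect to a browser-level endpoint (/devtools/browser/...) to use it.`
    );
  }
}
```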

End game for parity (proposed architecture)

Recommendation: target abstraction, not a separate product surface

I recommend keeping the primary command surface (open, click, snapshot, etc.) unchanged and implementing a backend adapter model:

  • PlaywrightPageTargetAdapter (existing behavior)
  • DirectCdpTargetAdapter (webview/direct-target behavior)

Then optionally add target management commands (not a separate automation language), e.g.:

  • agent-browser target list
  • agent-browser target use <id>
  • agent-browser target list --type webview

This gives users one mental model for interactions while making target selection explicit.

Should webviews be “just another page”?

  • UX-wise: yes, as much as possible.
  • Implementation-wise: no, not literally the same Page object, because Playwright does not always expose direct webview targets as Page objects in connected contexts.

So, emulate page parity at command/API level using adapters.
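A rough sketch of the adapter idea, reduced to three operations. The `TargetAdapter` and `PageLike` interfaces and the injected `send` function are assumptions made for this example; only the two adapter class names come from the proposal above. `PageLike` is a structural subset of what a Playwright `Page` offers, so the sketch does not import Playwright.

```typescript
// Common surface that action handlers would program against (hypothetical).
interface TargetAdapter {
  readonly kind: "playwright-page" | "direct-cdp";
  evaluate(expression: string): Promise<unknown>;
  url(): Promise<string>;
  title(): Promise<string>;
}

// Minimal structural stand-in for a Playwright Page.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
  url(): string;
  title(): Promise<string>;
}

class PlaywrightPageTargetAdapter implements TargetAdapter {
  readonly kind = "playwright-page" as const;
  private readonly page: PageLike;
  constructor(page: PageLike) {
    this.page = page;
  }
  evaluate(expression: string): Promise<unknown> {
    return this.page.evaluate(expression);
  }
  async url(): Promise<string> {
    return this.page.url();
  }
  title(): Promise<string> {
    return this.page.title();
  }
}

type CdpSend = (method: string, params: Record<string, unknown>) => Promise<unknown>;

class DirectCdpTargetAdapter implements TargetAdapter {
  readonly kind = "direct-cdp" as const;
  private readonly send: CdpSend;
  constructor(send: CdpSend) {
    this.send = send;
  }
  async evaluate(expression: string): Promise<unknown> {
    // Runtime.evaluate replies with { result: RemoteObject }; with
    // returnByValue the serialized value is on result.value.
    const reply = (await this.send("Runtime.evaluate", {
      expression,
      returnByValue: true,
    })) as { result?: { value?: unknown } };
    return reply.result?.value;
  }
  async url(): Promise<string> {
    return String(await this.evaluate("location.href"));
  }
  async title(): Promise<string> {
    return String(await this.evaluate("document.title"));
  }
}
```

With this split, a handler such as `title` would call `adapter.title()` without knowing which backend is active, and the branching described in section 2 would move into the adapters.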

Upstream interaction strategy (recommended)

  1. Keep this fork branch as a working implementation baseline
  2. Open an upstream issue with:
    • minimal reproduction
    • failing behavior in current main
    • this PR’s architecture notes and scope
  3. Share this fork branch/commit as a concrete reference implementation
  4. Offer to upstream in phases:
    • Phase A: direct target connect + minimal command set
    • Phase B: target discovery/selection UX
    • Phase C: broader parity and adapter refactor

Fork maintenance recommendation

  • Maintain a short-lived functional fork only while upstream discussion/PRs are active
  • Rebase frequently on upstream main
  • Avoid divergence in CLI semantics unless absolutely necessary for your product

Files changed

  • src/browser.ts
  • src/actions.ts
  • src/browser.test.ts

Attached Investigation Note

Checked in for traceability:

  • .github/notes/agent-browser-electron-webview-cdp-issue-v1.md

Permalink in this PR branch:
