80 changes: 80 additions & 0 deletions .github/notes/agent-browser-electron-webview-cdp-issue-v1.md
@@ -0,0 +1,80 @@
# Agent-Browser + Electron `<webview>` CDP Issue

**Date:** March 1, 2026
**Project:** Decode desktop app (`Electron`)
**Context:** Trying to automate embedded `<webview>` instances (for example, the Google home page) via `agent-browser` over CDP.

## Summary

`agent-browser` can connect to the Electron CDP port (e.g. `9222`) and control the top-level app window, but it cannot attach to a direct target endpoint like:

- `ws://localhost:9222/devtools/page/<webviewTargetId>`

Even though Electron exposes the webview target correctly at `/json`, `agent-browser` throws:

- `No page found. Make sure the app has loaded content.`

## What Was Verified

1. Electron CDP target discovery is working.
   - `http://localhost:9222/json` lists both:
     - `type: "page"` (Decode main window)
     - `type: "webview"` (Google)

2. The webview WebSocket endpoint itself is valid.
   - Direct raw CDP calls to `ws://localhost:9222/devtools/page/<webviewId>` successfully returned:
     - `document.title = "Google"`
     - `location.href = "https://www.google.com/"`

3. `agent-browser` fails on that same target endpoint.
   - Command reproduced:
     - `agent-browser --cdp "ws://localhost:9222/devtools/page/<webviewId>" snapshot`
   - Result:
     - `No page found. Make sure the app has loaded content.`
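
As a rough illustration of the discovery step in item 1, the sketch below filters a `/json` listing for webview targets and returns their WebSocket endpoints. The helper name `findWebviewEndpoints`, the target IDs, and the main window URL are assumptions made for the example; only the `type`, `title`, `url`, and `webSocketDebuggerUrl` fields of the listing are relied on.

```typescript
// Shape of one entry in the CDP discovery listing (/json).
// Only the fields used here are declared; real entries carry more.
interface CdpTarget {
  id: string;
  type: string; // e.g. "page" or "webview"
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Hypothetical helper: collect the WebSocket endpoints of all webview targets.
function findWebviewEndpoints(targets: CdpTarget[]): string[] {
  return targets
    .filter((t) => t.type === 'webview' && typeof t.webSocketDebuggerUrl === 'string')
    .map((t) => t.webSocketDebuggerUrl as string);
}

// Sample data modelled on the listing described above (IDs and the file URL are made up).
const sampleTargets: CdpTarget[] = [
  {
    id: 'MAIN1',
    type: 'page',
    title: 'Decode',
    url: 'file:///index.html',
    webSocketDebuggerUrl: 'ws://localhost:9222/devtools/page/MAIN1',
  },
  {
    id: 'WV1',
    type: 'webview',
    title: 'Google',
    url: 'https://www.google.com/',
    webSocketDebuggerUrl: 'ws://localhost:9222/devtools/page/WV1',
  },
];

console.log(findWebviewEndpoints(sampleTargets));
```

In a real session the array would come from fetching `http://localhost:9222/json` rather than from inline sample data.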

## Root Cause

`agent-browser` currently assumes that every CDP connection resolves to browser-level Playwright contexts and pages.

When given a **target-level** endpoint (`/devtools/page/<id>`), it still runs the browser-context/page validation. That validation finds no page for this kind of connection, so the command exits early with the `No page found` error.

In short:
- Discovery endpoint is fine.
- Webview target endpoint is fine.
- The `agent-browser` connection model does not support this endpoint type yet.
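
The failure mode can be modelled with a small sketch. This is not the actual `agent-browser` code: `ContextLike` and `assertBrowserLevelPage` are hypothetical stand-ins, and the only detail taken from the report is the error message. It shows how a check that requires at least one page in some context rejects a connection that surfaces none, regardless of whether the underlying target is healthy.

```typescript
// Minimal stand-in for a browser context; hypothetical, for illustration only.
interface ContextLike {
  pages(): unknown[];
}

// Simplified model of a browser-level check: at least one context must hold a page.
function assertBrowserLevelPage(contexts: ContextLike[]): void {
  const hasPage = contexts.some((context) => context.pages().length > 0);
  if (!hasPage) {
    throw new Error('No page found. Make sure the app has loaded content.');
  }
}

// A connection that surfaces no pages fails the check.
const emptyConnection: ContextLike[] = [{ pages: () => [] }];
try {
  assertBrowserLevelPage(emptyConnection);
} catch (error) {
  console.log((error as Error).message);
}
```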

## Code References (Where It Fails)

The installed `agent-browser` (`0.7.6`) checks contexts/pages after `connectOverCDP`:

- `/Users/francoislaberge/.nvm/versions/node/v22.16.0/lib/node_modules/agent-browser/dist/browser.js#L796`
- `/Users/francoislaberge/.nvm/versions/node/v22.16.0/lib/node_modules/agent-browser/dist/browser.js#L804`
- `/Users/francoislaberge/.nvm/versions/node/v22.16.0/lib/node_modules/agent-browser/dist/browser.js#L809`
- `/Users/francoislaberge/.nvm/versions/node/v22.16.0/lib/node_modules/agent-browser/dist/browser.js#L811`

The latest `agent-browser` (`0.15.1`) has the same pattern:

- `/Users/francoislaberge/conductor/workspaces/decode-next/berlin-v3/.context/tmp-agent-browser/package/dist/browser.js#L1277`
- `/Users/francoislaberge/conductor/workspaces/decode-next/berlin-v3/.context/tmp-agent-browser/package/dist/browser.js#L1292`
- `/Users/francoislaberge/conductor/workspaces/decode-next/berlin-v3/.context/tmp-agent-browser/package/dist/browser.js#L1294`

## Notes About Electron Skill Docs

The upstream Electron skill doc suggests webviews should be accessible via `agent-browser tab`, but it does not document a direct `/devtools/page/<id>` workflow or workaround.

Reference:
- https://github.com/vercel-labs/agent-browser/blob/main/skills/electron/SKILL.md

## Practical Workarounds Right Now

1. Use `agent-browser` with a browser-level endpoint (`9222` or `/devtools/browser/<id>`) for main-window automation.
2. Use raw CDP (a custom script or tool) for direct webview-target automation (`/devtools/page/<webviewId>`).
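
For the raw CDP route in workaround 2, the message framing is small enough to sketch. The helpers below (`buildEvaluateRequest` and `readEvaluateReply`, both hypothetical names) build a `Runtime.evaluate` request and unwrap the reply. The WebSocket transport is deliberately left out, so the example runs against a canned reply instead of a live webview.

```typescript
// Subset of a CDP reply to Runtime.evaluate; only the fields used below are declared.
interface EvaluateReply {
  id: number;
  result?: {
    result: { type: string; value?: unknown };
    exceptionDetails?: { text: string };
  };
  error?: { code: number; message: string };
}

// Serialize a Runtime.evaluate request; returnByValue asks for a JSON value, not an object handle.
function buildEvaluateRequest(id: number, expression: string): string {
  return JSON.stringify({
    id,
    method: 'Runtime.evaluate',
    params: { expression, returnByValue: true },
  });
}

// Unwrap the evaluated value, surfacing protocol errors and page-side exceptions.
function readEvaluateReply(raw: string, expectedId: number): unknown {
  const reply = JSON.parse(raw) as EvaluateReply;
  if (reply.id !== expectedId) throw new Error(`Unexpected reply id ${reply.id}`);
  if (reply.error) throw new Error(`CDP error: ${reply.error.message}`);
  if (!reply.result) throw new Error('CDP reply has neither result nor error');
  if (reply.result.exceptionDetails) {
    throw new Error(`Page exception: ${reply.result.exceptionDetails.text}`);
  }
  return reply.result.result.value;
}

// Canned exchange standing in for a live webview target.
const titleRequest = buildEvaluateRequest(1, 'document.title');
const cannedTitleReply = JSON.stringify({
  id: 1,
  result: { result: { type: 'string', value: 'Google' } },
});
console.log(titleRequest);
console.log(readEvaluateReply(cannedTitleReply, 1));
```

A custom script would send `titleRequest` over a WebSocket opened on `ws://localhost:9222/devtools/page/<webviewId>` and pass the incoming frame to `readEvaluateReply`.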

## Proposed Fix in Agent-Browser

To support webview target endpoints directly, `agent-browser` would need a dedicated target mode:

1. Detect `.../devtools/page/<id>` endpoints in `connectViaCDP`.
2. Skip browser `contexts/pages` validation in that mode.
3. Route commands (`snapshot`, `click`, `type`, `fill`, `press`, `eval`, `screenshot`) through raw CDP domains (`Runtime`, `DOM`, `Input`, `Page`) instead of Playwright `Page` APIs.
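
Step 1 of the proposal could be approached along these lines. This is a sketch under assumptions rather than the shipped detection logic: `classifyCdpEndpoint` and the `CdpEndpointKind` type are hypothetical names, and anything that is not a `ws://` or `wss://` URL with a `/devtools/page/<id>` path is treated as browser-level.

```typescript
// Hypothetical result type: either a direct page/webview target or a browser-level endpoint.
type CdpEndpointKind = { kind: 'target'; targetId: string } | { kind: 'browser' };

// Classify a --cdp value. Bare ports and /devtools/browser/<id> URLs fall through to 'browser'.
function classifyCdpEndpoint(endpoint: string): CdpEndpointKind {
  const match = /^wss?:\/\/[^/]+\/devtools\/page\/([^/?#]+)/.exec(endpoint);
  const targetId = match?.[1];
  return targetId ? { kind: 'target', targetId } : { kind: 'browser' };
}

console.log(classifyCdpEndpoint('ws://localhost:9222/devtools/page/ABC123'));
console.log(classifyCdpEndpoint('ws://localhost:9222/devtools/browser/XYZ'));
console.log(classifyCdpEndpoint('9222'));
```

With a classification like this available, the connection code can branch once at connect time and skip the contexts/pages validation only for the `target` case.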

115 changes: 105 additions & 10 deletions src/actions.ts
@@ -615,6 +615,14 @@ async function handleNavigate(
): Promise<Response<NavigateData>> {
browser.checkDomainAllowed(command.url);

if (browser.isDirectTargetMode()) {
await browser.directNavigate(command.url, command.waitUntil ?? 'load');
return successResponse(command.id, {
url: await browser.directGetUrl(),
title: await browser.directGetTitle(),
});
}

const page = browser.getPage();

// If headers are provided, set up scoped headers for this origin
@@ -633,6 +641,14 @@ async function handleNavigate(
}

async function handleClick(command: ClickCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
if (command.newTab) {
throw new Error('--new-tab is not supported in direct target mode');
}
await browser.directClick(command.selector);
return successResponse(command.id, { clicked: true });
}

// Support both refs (@e1) and regular selectors
const locator = browser.getLocator(command.selector);

@@ -676,6 +692,11 @@ async function handleClick(command: ClickCommand, browser: BrowserManager): Prom
}

async function handleType(command: TypeCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
await browser.directType(command.selector, command.text, command.clear);
return successResponse(command.id, { typed: true });
}

const locator = browser.getLocator(command.selector);

try {
@@ -694,6 +715,11 @@ async function handleType(command: TypeCommand, browser: BrowserManager): Promis
}

async function handlePress(command: PressCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
await browser.directPress(command.key, command.selector);
return successResponse(command.id, { pressed: true });
}

const page = browser.getPage();

if (command.selector) {
@@ -719,6 +745,21 @@ async function handleScreenshot(
command: ScreenshotCommand,
browser: BrowserManager
): Promise<Response<ScreenshotData>> {
if (browser.isDirectTargetMode()) {
let savePath = command.path;
if (!savePath) {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
const random = Math.random().toString(36).substring(2, 8);
const filename = `screenshot-${timestamp}-${random}.png`;
const screenshotDir = path.join(getAppDir(), 'tmp', 'screenshots');
mkdirSync(screenshotDir, { recursive: true });
savePath = path.join(screenshotDir, filename);
}
const b64 = await browser.directCaptureScreenshotPngBase64();
fs.writeFileSync(savePath, Buffer.from(b64, 'base64'));
return successResponse(command.id, { path: savePath });
}

const page = browser.getPage();

const options: Parameters<Page['screenshot']>[0] = {
@@ -919,33 +960,41 @@ async function handleSnapshot(
},
browser: BrowserManager
): Promise<Response<SnapshotData>> {
- // Use enhanced snapshot with refs and optional filtering
- const { tree, refs } = await browser.getSnapshot({
- interactive: command.interactive,
- cursor: command.cursor,
- maxDepth: command.maxDepth,
- compact: command.compact,
- selector: command.selector,
- });
+ const { tree, refs } = browser.isDirectTargetMode()
+ ? await browser.directSnapshot({ interactive: command.interactive })
+ : await browser.getSnapshot({
+ interactive: command.interactive,
+ cursor: command.cursor,
+ maxDepth: command.maxDepth,
+ compact: command.compact,
+ selector: command.selector,
+ });

// Simplify refs for output (just role and name)
const simpleRefs: Record<string, { role: string; name: string }> = {};
for (const [ref, data] of Object.entries(refs)) {
simpleRefs[ref] = { role: data.role, name: data.name };
}

- const page = browser.getPage();
+ const origin = browser.isDirectTargetMode()
+ ? await browser.directGetUrl()
+ : browser.getPage().url();
return successResponse(command.id, {
snapshot: tree || 'Empty page',
refs: Object.keys(simpleRefs).length > 0 ? simpleRefs : undefined,
- origin: page.url(),
+ origin,
});
}

async function handleEvaluate(
command: EvaluateCommand,
browser: BrowserManager
): Promise<Response<EvaluateData>> {
if (browser.isDirectTargetMode()) {
const result = await browser.directEvaluate(command.script);
return successResponse(command.id, { result, origin: await browser.directGetUrl() });
}

const page = browser.getPage();

// Evaluate the script directly as a string expression
@@ -955,6 +1004,15 @@ async function handleEvaluate(
}

async function handleWait(command: WaitCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
await browser.directWait({
selector: command.selector,
state: command.selector ? command.state : undefined,
timeout: command.timeout,
});
return successResponse(command.id, { waited: true });
}

const page = browser.getPage();

if (command.selector) {
@@ -973,6 +1031,30 @@ async function handleWait(command: WaitCommand, browser: BrowserManager): Promis
}

async function handleScroll(command: ScrollCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
let deltaX = command.x ?? 0;
let deltaY = command.y ?? 0;
if (command.direction) {
const amount = command.amount ?? 100;
switch (command.direction) {
case 'up':
deltaY = -amount;
break;
case 'down':
deltaY = amount;
break;
case 'left':
deltaX = -amount;
break;
case 'right':
deltaX = amount;
break;
}
}
await browser.directScroll(command.selector, deltaX, deltaY);
return successResponse(command.id, { scrolled: true });
}

const page = browser.getPage();

let deltaX = command.x ?? 0;
@@ -1121,6 +1203,11 @@ async function handleWindowNew(
// New handlers for enhanced Playwright parity

async function handleFill(command: FillCommand, browser: BrowserManager): Promise<Response> {
if (browser.isDirectTargetMode()) {
await browser.directFill(command.selector, command.value);
return successResponse(command.id, { filled: true });
}

const locator = browser.getLocator(command.selector);
try {
await locator.fill(command.value);
@@ -1548,6 +1635,10 @@ async function handleUrl(
command: Command & { action: 'url' },
browser: BrowserManager
): Promise<Response> {
if (browser.isDirectTargetMode()) {
return successResponse(command.id, { url: await browser.directGetUrl() });
}

const page = browser.getPage();
return successResponse(command.id, { url: page.url() });
}
@@ -1556,6 +1647,10 @@ async function handleTitle(
command: Command & { action: 'title' },
browser: BrowserManager
): Promise<Response> {
if (browser.isDirectTargetMode()) {
return successResponse(command.id, { title: await browser.directGetTitle() });
}

const page = browser.getPage();
const title = await page.title();
return successResponse(command.id, { title });
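
Review note on `handleScroll`: the direct-target branch spells out the direction-to-delta mapping inline, and the Playwright path below it starts from the same `command.x ?? 0` defaults, so the two may be candidates for a shared helper. A minimal sketch, assuming a hypothetical `resolveScrollDeltas` function and a pared-down `ScrollInput` type that are not part of this diff:

```typescript
type ScrollDirection = 'up' | 'down' | 'left' | 'right';

// Hypothetical subset of ScrollCommand: only the fields the mapping needs.
interface ScrollInput {
  x?: number;
  y?: number;
  direction?: ScrollDirection;
  amount?: number;
}

// Mirrors the inline logic: explicit x/y are the base, and a direction overrides one axis.
function resolveScrollDeltas(input: ScrollInput): { deltaX: number; deltaY: number } {
  let deltaX = input.x ?? 0;
  let deltaY = input.y ?? 0;
  if (input.direction) {
    const amount = input.amount ?? 100; // same default as the handler
    switch (input.direction) {
      case 'up':
        deltaY = -amount;
        break;
      case 'down':
        deltaY = amount;
        break;
      case 'left':
        deltaX = -amount;
        break;
      case 'right':
        deltaX = amount;
        break;
    }
  }
  return { deltaX, deltaY };
}

console.log(resolveScrollDeltas({ direction: 'up' }));
console.log(resolveScrollDeltas({ direction: 'right', amount: 30, y: 7 }));
```

Each branch of the handler could then call the helper and pass the result to either `browser.directScroll` or the Playwright scroll path.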
57 changes: 57 additions & 0 deletions src/browser.test.ts
@@ -2,6 +2,7 @@ import { describe, it, expect, beforeAll, afterAll, beforeEach, afterEach, vi }
import { BrowserManager, getDefaultTimeout } from './browser.js';
import { executeCommand } from './actions.js';
import { chromium } from 'playwright-core';
import { WebSocketServer } from 'ws';

describe('BrowserManager', () => {
let browser: BrowserManager;
@@ -902,6 +903,62 @@ describe('BrowserManager', () => {
expect(urls).toContain('http://example.com');
spy.mockRestore();
});

it('should connect to a direct /devtools/page target and expose a single tab', async () => {
const wss = new WebSocketServer({ port: 0 });
const address = wss.address();
const port = typeof address === 'object' && address ? address.port : 0;

wss.on('connection', (socket) => {
socket.on('message', (raw) => {
const msg = JSON.parse(raw.toString()) as {
id: number;
method: string;
params?: { expression?: string };
};

if (msg.method === 'Runtime.evaluate') {
const expression = msg.params?.expression ?? '';
let value: unknown = true;
if (expression.includes('location.href')) value = 'https://www.google.com/';
if (expression.includes('document.title')) value = 'Google';
socket.send(
JSON.stringify({
id: msg.id,
result: {
result: {
type: typeof value,
value,
},
},
})
);
return;
}

socket.send(JSON.stringify({ id: msg.id, result: {} }));
});
});

const directBrowser = new BrowserManager();
await directBrowser.launch({
id: 'direct-1',
action: 'launch',
cdpUrl: `ws://127.0.0.1:${port}/devtools/page/ABC123`,
});

const tabs = await directBrowser.listTabs();
expect(directBrowser.isDirectTargetMode()).toBe(true);
expect(tabs).toHaveLength(1);
expect(tabs[0].url).toBe('https://www.google.com/');
expect(tabs[0].title).toBe('Google');
await expect(directBrowser.switchTo(1)).rejects.toThrow(
'Direct target mode only supports tab index 0'
);

await directBrowser.close();
await new Promise<void>((resolve) => wss.close(() => resolve()));
});
});

describe('screencast', () => {