Developer Integration Guide

Build your own browser automation agent with BrowserHand. This guide covers both local development and cloud deployment.

Architecture Overview

Your Agent / Platform
    ↓  HTTP POST /command (+ API Key in cloud mode)
BrowserHand Relay
    ↓  WebSocket (per-user routing in multi-user mode)
Chrome Extension (in user's browser)
    ↓  Chrome DevTools Protocol
User's real browser (logged in, cookies intact)

Local Development

For building and testing your agent on your own machine.

1. Start the Relay

cd relay && npm install && npx tsx index.ts

By default the relay listens on 127.0.0.1:29981 in single-user mode, with no authentication required.

2. Load the Chrome Extension

chrome://extensions → Developer mode ON → Load unpacked → select extension/

3. Send Commands

Every command is a POST /command with {"action": "<name>", "params": {...}}.

curl -X POST http://127.0.0.1:29981/command \
  -H "Content-Type: application/json" \
  -d '{"action": "navigate", "params": {"url": "https://example.com"}}'

Response:

{
  "v": "1.1",
  "id": "cmd-1711700000-abc123",
  "success": true,
  "data": {
    "pageTitle": "Example Domain",
    "pageUrl": "https://example.com",
    "tabId": 42
  }
}

4. Typical Agent Flow

import requests

RELAY = "http://127.0.0.1:29981/command"

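def cmd(action, **params):
    """POST one command to the local relay and return the parsed JSON response."""
    # A timeout keeps the agent from hanging forever if the relay is down;
    # raise it if you issue long `wait` commands.
    r = requests.post(RELAY, json={"action": action, "params": params}, timeout=60)
    return r.json()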

# 1. Navigate to target page
cmd("navigate", url="https://twitter.com")

# 2. Extract all interactive elements
result = cmd("extract")
elements = result["data"]["elements"]
# Each element: {"index": 0, "tag": "a", "text": "Home", "type": "link", "rect": {...}}

# 3. Find and click the element you need
for el in elements:
    if "compose" in el["text"].lower():
        cmd("click", index=el["index"])
        break

# 4. Type content
result = cmd("extract")  # re-extract after page change
for el in result["data"]["elements"]:
    if el["type"] == "textarea":
        cmd("type", index=el["index"], text="Hello from my agent!")
        break

# 5. Take a screenshot to verify
shot = cmd("screenshot")  # shot["data"]["screenshot"] holds the image as a base64 JPEG data URL
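
The flow above scans the element list twice with hand-written loops. As a minimal sketch, that lookup can be factored into one helper. The name `find_element` and its arguments are hypothetical (not part of BrowserHand), and the sample data only mimics the element shape shown in the comment above.

```python
from typing import Optional


def find_element(elements: list[dict], text: Optional[str] = None,
                 type_: Optional[str] = None) -> Optional[dict]:
    """Return the first element matching a text substring and/or a type, else None."""
    for el in elements:
        # Assumption: "text" may be empty or missing (e.g. icon-only buttons).
        el_text = (el.get("text") or "").lower()
        if text is not None and text.lower() not in el_text:
            continue
        if type_ is not None and el.get("type") != type_:
            continue
        return el
    return None


# Sample data in the shape returned by "extract" (values are made up).
sample = [
    {"index": 0, "tag": "a", "text": "Home", "type": "link"},
    {"index": 1, "tag": "a", "text": "Compose post", "type": "link"},
    {"index": 2, "tag": "textarea", "text": "", "type": "textarea"},
]
compose = find_element(sample, text="compose")
box = find_element(sample, type_="textarea")
```

Returning `None` instead of raising lets the agent decide whether to scroll, wait, or re-extract when the element is not on the page yet.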

Cloud Deployment

For deploying BrowserHand as part of your platform, serving multiple users.

Environment Variables

Variable	Default	Description
`BROWSERHAND_HOST`	`127.0.0.1`	Bind address. Use `0.0.0.0` for cloud
`BROWSERHAND_PORT`	`29981`	Listen port
`BROWSERHAND_MODE`	(single-user)	Set to `multi` for multi-user connection pool
`BROWSERHAND_API_KEY`	(no auth)	When set, `/command` requires `Authorization: Bearer <key>`
`BROWSERHAND_TOKEN`	(random)	Manual token for WebSocket auth and `/debug` endpoint
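
To make the defaults in the table concrete, here is a sketch in Python of how the variables resolve to an effective configuration. The relay itself is a TypeScript program and its actual parsing is not shown in this guide, so `RelayConfig`, `resolve_config`, and `is_exposed_without_auth` are illustrative names only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RelayConfig:
    host: str
    port: int
    multi_user: bool
    api_key: Optional[str]  # None: /command requires no Authorization header
    token: Optional[str]    # None: the relay picks a random token


def resolve_config(env: Mapping[str, str] = os.environ) -> RelayConfig:
    """Apply the defaults from the table to a mapping of environment variables."""
    return RelayConfig(
        host=env.get("BROWSERHAND_HOST", "127.0.0.1"),
        port=int(env.get("BROWSERHAND_PORT", "29981")),
        multi_user=env.get("BROWSERHAND_MODE") == "multi",
        api_key=env.get("BROWSERHAND_API_KEY") or None,
        token=env.get("BROWSERHAND_TOKEN") or None,
    )


def is_exposed_without_auth(cfg: RelayConfig) -> bool:
    """True when the relay binds to all interfaces but has no API key set."""
    return cfg.host == "0.0.0.0" and cfg.api_key is None


local = resolve_config({})
cloud = resolve_config({
    "BROWSERHAND_HOST": "0.0.0.0",
    "BROWSERHAND_MODE": "multi",
    "BROWSERHAND_API_KEY": "your-secret-api-key",
})
```

A check like `is_exposed_without_auth` is handy in a deployment script: it flags the one combination the security guidance warns about before the relay ever starts.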

Start in Cloud Mode

BROWSERHAND_HOST=0.0.0.0 \
BROWSERHAND_API_KEY=your-secret-api-key \
BROWSERHAND_MODE=multi \
BROWSERHAND_TOKEN=your-extension-token \
npx tsx index.ts

User Extension Connects

Each user installs the BrowserHand Chrome extension. There are two ways to authenticate (Options A and B); Option C covers customizing the extension itself.

Option A: Auto-token from Cookie (Recommended)

After the user logs into your platform, the extension automatically reads the auth token from a cookie. Zero manual configuration for the user.

Your platform sets a cookie on login:

// Your platform's login handler
res.cookie("auth_token", jwt, { domain: "your-platform.com", httpOnly: false });

The extension is pre-configured (or the user configures once) with:

Relay URL: ws://your-server.com:29981
Token Cookie URL: https://your-platform.com
Token Cookie Name: auth_token
User ID: user_123 (or auto-derived from the token on the relay side)

The extension reads the cookie, extracts the token, and connects: ws://your-server.com:29981?token=<jwt>&userId=user_123

The user just logs in and it works. No token copy-pasting.
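
The connection URL above carries the token and user ID as query parameters. The extension is JavaScript, but the same construction is sketched here in Python, which is useful when writing a test client. `build_ws_url` is a hypothetical helper; the point it illustrates is that the token should be URL-encoded, since tokens may contain characters such as `+`, `/`, or `=`.

```python
from typing import Optional
from urllib.parse import urlencode


def build_ws_url(relay_url: str, token: str, user_id: Optional[str] = None) -> str:
    """Append token (and optionally userId) to the relay URL as a query string."""
    query = {"token": token}
    if user_id is not None:
        query["userId"] = user_id
    # urlencode percent-escapes reserved characters in each value.
    return f"{relay_url}?{urlencode(query)}"


url = build_ws_url("ws://your-server.com:29981", "abc.def.ghi", "user_123")
```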

Option B: Manual Token

For simpler setups, the user pastes a token directly in the extension popup:

Relay URL: ws://your-server.com:29981
Token: your-extension-token

In the extension popup, the cloud settings are hidden by default; open "Cloud / Platform Settings" to reveal them.

Option C: Customize the Extension

The extension source code is fully open. You can fork and modify it to fit your platform:

- Hard-code the relay URL and cookie config so users don't need to configure anything
- Add your own branding (icons, popup UI)
- Pre-fill `userId` from your platform's login session
- Distribute as your own Chrome extension on the Web Store

See extension/background.js — the resolveToken() and connect() functions are the entry points for auth customization.

Agent Sends Commands

Your agent sends commands with userId to target the right user's browser:

import requests

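# https:// assumes TLS is terminated in front of the relay (see Security Notes);
# use http:// when talking to the relay port directly.
RELAY = "https://your-server.com:29981/command"
API_KEY = "your-secret-api-key"

def cmd(user_id, action, **params):
    """POST one command for a specific user's browser and return the parsed JSON response."""
    r = requests.post(RELAY,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"action": action, "params": params, "userId": user_id},
        timeout=60)  # avoid hanging forever if the relay is unreachable
    return r.json()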

# Control user_123's browser
cmd("user_123", "navigate", url="https://twitter.com")
cmd("user_123", "extract")
cmd("user_123", "click", index=5)
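
The local and cloud request shapes differ only in the `Authorization` header and the `userId` field. As a sketch, one builder can cover both modes, so the same agent code runs against either relay. `build_request` is a hypothetical helper, not part of BrowserHand.

```python
from typing import Optional


def build_request(action: str, params: Optional[dict] = None,
                  user_id: Optional[str] = None,
                  api_key: Optional[str] = None) -> tuple[dict, dict]:
    """Return (headers, json_body) for a POST /command request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the relay was started with BROWSERHAND_API_KEY.
        headers["Authorization"] = f"Bearer {api_key}"
    body = {"action": action, "params": params or {}}
    if user_id is not None:
        # Only needed in multi-user mode, to route to one user's browser.
        body["userId"] = user_id
    return headers, body


local_headers, local_body = build_request("extract")
cloud_headers, cloud_body = build_request(
    "click", {"index": 5}, user_id="user_123", api_key="your-secret-api-key")
```

The returned pair can be passed straight to `requests.post(RELAY, headers=headers, json=body, timeout=60)`.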

Multi-User Flow

User A's browser                    Your Server
  Extension ──WebSocket──→  Relay (userId: user_a)
                               ↑
User B's browser               │  POST /command {"userId":"user_a"}
  Extension ──WebSocket──→  Relay (userId: user_b)
                               ↑
                            Your Agent
                               │  POST /command {"userId":"user_b"}

Check Connection Status

# Single-user mode
curl http://localhost:29981/status
# {"connected": true}

# Multi-user mode
curl http://your-server.com:29981/status
# {"connected": true, "users": ["user_a", "user_b"]}
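
An agent will typically want to check these status payloads before sending commands. The sketch below interprets both shapes shown above. `is_user_connected` is a hypothetical helper, and it assumes that in multi-user mode the `users` list is the authoritative record of who is connected.

```python
from typing import Optional


def is_user_connected(status: dict, user_id: Optional[str] = None) -> bool:
    """Interpret a /status payload for single-user or multi-user mode."""
    if user_id is not None:
        # Multi-user mode: look for this user in the connected-users list.
        return user_id in status.get("users", [])
    # Single-user mode: a single boolean flag.
    return bool(status.get("connected", False))


single = {"connected": True}
multi = {"connected": True, "users": ["user_a", "user_b"]}
```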

MCP Integration

If your agent framework supports MCP (Model Context Protocol), you can use the BrowserHand MCP server instead of HTTP.

{
  "mcpServers": {
    "browserhand": {
      "command": "npx",
      "args": ["browserhand-mcp"]
    }
  }
}

The MCP server auto-starts the relay and registers all 26 actions as MCP tools. Your LLM agent can discover and call them automatically.

All 26 Actions

Core

Action	Description	Params
`navigate`	Open a URL	`url`, `groupLabel?`
`extract`	Get all interactive elements	—
`click`	Click element by index	`index`
`type`	Type text into element	`index`, `text`
`scroll`	Scroll page	`direction` (up/down)
`screenshot`	Capture viewport (returns base64 JPEG)	—
`wait`	Wait N seconds	`seconds?`

Navigation

Action	Description	Params
`goback`	Browser back	—
`goforward`	Browser forward	—
`refresh`	Reload page	—
`tab`	Manage tabs	`tabAction` (list/switch/close/create), `tabId?`, `url?`

Interaction

Action	Description	Params
`select`	Choose dropdown option	`index`, `value`
`hover`	Mouse hover	`index`
`keypress`	Key combination	`key`, `modifiers?` (alt/ctrl/meta/shift)
`clear`	Clear input field	`index`
`upload`	Upload file	`index`, `filePath`
`dialog`	Handle alert/confirm/prompt	`dialogAction` (accept/dismiss), `dialogText?`

Data

Action	Description	Params
`gettext`	Extract page text (max 10k chars)	—
`exec`	Execute JavaScript	`script`
`cookie`	Manage cookies	`cookieAction` (get/set/delete), `cookieName`, `cookieValue?`
`network`	Monitor network	`networkAction` (start/stop/get)

Capture

Action	Description	Params
`fullscreenshot`	Full-page screenshot	`format?` (png/jpeg)
`pdf`	Export page as PDF (returns base64)	—
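
Capture actions return their payload as base64. The `screenshot` field in the response format is a data URL (`data:image/jpeg;base64,...`); this guide does not specify the exact field or prefix for `fullscreenshot` and `pdf`, so the sketch below accepts either a data URL or bare base64. `decode_base64_payload` is a hypothetical helper name.

```python
import base64


def decode_base64_payload(payload: str) -> bytes:
    """Decode a base64 string, dropping a 'data:<mime>;base64,' prefix if present."""
    if payload.startswith("data:"):
        # Everything after the first comma is the base64 body.
        _, _, payload = payload.partition(",")
    return base64.b64decode(payload)


# Build a tiny fake payload (the first three bytes of a JPEG header).
sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("ascii")
raw = decode_base64_payload(sample)
```

The resulting bytes can be written to disk with `open("shot.jpg", "wb").write(raw)` or handed to an image library.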

Scroll & Wait

Action	Description	Params
`scrollto`	Scroll to element	`index`
`waitfor`	Wait for CSS selector	`selector`, `timeout?` (ms, default 10000)
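
The Params columns in the tables above can double as a pre-flight check, which is especially useful when an LLM generates the commands. This sketch transcribes the parameters without a `?` suffix as required. `REQUIRED_PARAMS` and `missing_params` are hypothetical names, and the relay may apply its own validation on top.

```python
# Required params per action, transcribed from the tables above
# (params marked with "?" in the tables are optional and omitted here).
REQUIRED_PARAMS: dict[str, tuple[str, ...]] = {
    "navigate": ("url",), "extract": (), "click": ("index",),
    "type": ("index", "text"), "scroll": ("direction",),
    "screenshot": (), "wait": (),
    "goback": (), "goforward": (), "refresh": (), "tab": ("tabAction",),
    "select": ("index", "value"), "hover": ("index",), "keypress": ("key",),
    "clear": ("index",), "upload": ("index", "filePath"),
    "dialog": ("dialogAction",),
    "gettext": (), "exec": ("script",),
    "cookie": ("cookieAction", "cookieName"), "network": ("networkAction",),
    "fullscreenshot": (), "pdf": (),
    "scrollto": ("index",), "waitfor": ("selector",),
}


def missing_params(action: str, params: dict) -> list[str]:
    """Return the required params absent from `params`, in table order."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    return [name for name in REQUIRED_PARAMS[action] if name not in params]


problems = missing_params("type", {"index": 3})
```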

Response Format

All responses share the same envelope. The fields inside `data` vary by action; the example below shows them all at once:

{
  "v": "1.1",
  "id": "cmd-...",
  "success": true,
  "data": {
    "pageTitle": "...",
    "pageUrl": "...",
    "tabId": 42,
    "elements": [...],
    "screenshot": "data:image/jpeg;base64,...",
    "error": "..."
  }
}

- `success: true`: command executed; check `data` for results
- `success: false`: command failed; check `data.error` for the reason
- `data.error: "sensitive_page"`: operation blocked by the security policy
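
The rules above translate into a small unwrapping step on the agent side. The sketch below raises on failure and marks policy blocks separately, so the agent can skip a page rather than retry it. `CommandError` and `unwrap` are hypothetical names, not part of BrowserHand.

```python
class CommandError(Exception):
    """Raised when a response has success == false."""

    def __init__(self, reason: str, response: dict):
        super().__init__(reason)
        self.reason = reason
        self.response = response

    @property
    def blocked_by_policy(self) -> bool:
        # "sensitive_page" is the error value documented above.
        return self.reason == "sensitive_page"


def unwrap(response: dict) -> dict:
    """Return response["data"] on success, raise CommandError otherwise."""
    data = response.get("data") or {}
    if not response.get("success"):
        raise CommandError(data.get("error", "unknown error"), response)
    return data


ok = unwrap({"v": "1.1", "id": "cmd-1", "success": True,
             "data": {"pageTitle": "Example Domain"}})
```

Typical usage is `data = unwrap(cmd("navigate", url=...))` inside a `try` block, with an `except CommandError as e` branch that checks `e.blocked_by_policy`.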

Security Notes

- Sensitive pages are automatically blocked: banking sites (PayPal, Chase, HSBC...), password managers (1Password, Bitwarden, LastPass...), and sensitive paths (`/login`, `/password`, `/payment`, `/checkout`)
- Tab grouping: pass `groupLabel` in `navigate`/`tab` actions to visually group agent-operated tabs
- Cloud mode: always set `BROWSERHAND_API_KEY` to prevent unauthorized access to your relay
- TLS: in production, put the relay behind a reverse proxy (nginx/Caddy) with TLS termination
