
Developer Integration Guide

Build your own browser automation agent with BrowserHand. This guide covers both local development and cloud deployment.

Architecture Overview

Your Agent / Platform
    ↓  HTTP POST /command (+ API Key in cloud mode)
BrowserHand Relay
    ↓  WebSocket (per-user routing in multi-user mode)
Chrome Extension (in user's browser)
    ↓  Chrome DevTools Protocol
User's real browser (logged in, cookies intact)

Local Development

For building and testing your agent on your own machine.

1. Start the Relay

cd relay && npm install && npx tsx index.ts

The relay listens on 127.0.0.1:29981 in single-user mode, with no authentication required.

2. Load the Chrome Extension

chrome://extensions → Developer mode ON → Load unpacked → select extension/

3. Send Commands

Every command is a POST /command with {"action": "<name>", "params": {...}}.

curl -X POST http://127.0.0.1:29981/command \
  -H "Content-Type: application/json" \
  -d '{"action": "navigate", "params": {"url": "https://example.com"}}'

Response:

{
  "v": "1.1",
  "id": "cmd-1711700000-abc123",
  "success": true,
  "data": {
    "pageTitle": "Example Domain",
    "pageUrl": "https://example.com",
    "tabId": 42
  }
}
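You can send the same request from Python without curl or any third-party package. Below is a minimal sketch that uses only the standard library and assumes the local relay address above. `build_command_request` and `send_command` are illustrative names, not part of BrowserHand.

```python
import json
import urllib.request

RELAY_URL = "http://127.0.0.1:29981/command"  # local single-user relay


def build_command_request(action, params=None, relay_url=RELAY_URL):
    """Build (but do not send) a POST /command request for the relay."""
    body = json.dumps({"action": action, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        relay_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(action, params=None, relay_url=RELAY_URL, timeout=60):
    """Send one command and return the decoded JSON response."""
    req = build_command_request(action, params, relay_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running relay and a connected extension):
# send_command("navigate", {"url": "https://example.com"})
```

Splitting request construction from sending means you can inspect the request without a live relay.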

4. Typical Agent Flow

import requests

RELAY = "http://127.0.0.1:29981/command"

def cmd(action, **params):
    r = requests.post(RELAY, json={"action": action, "params": params}, timeout=60)
    return r.json()

# 1. Navigate to target page
cmd("navigate", url="https://twitter.com")

# 2. Extract all interactive elements
result = cmd("extract")
elements = result["data"]["elements"]
# Each element: {"index": 0, "tag": "a", "text": "Home", "type": "link", "rect": {...}}

# 3. Find and click the element you need
for el in elements:
    if "compose" in (el.get("text") or "").lower():  # guard against missing or empty text
        cmd("click", index=el["index"])
        break

# 4. Type content
cmd("wait", seconds=1)   # give the page a moment to update after the click
result = cmd("extract")  # re-extract after page change; earlier indices may be stale
for el in result["data"]["elements"]:
    if el["type"] == "textarea":
        cmd("type", index=el["index"], text="Hello from my agent!")
        break

# 5. Take a screenshot to verify
cmd("screenshot")
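One way to make the flow above more robust is to match elements by text or type, and to re-extract until the target appears. The sketch below defines two hypothetical helpers, `find_element` and `poll_for_element`. It assumes elements carry the `text`, `type`, and `index` fields shown in the sample element above.

```python
import time


def find_element(elements, text=None, el_type=None):
    """Return the first element whose text contains `text` (case-insensitive)
    and/or whose type equals `el_type`; None if nothing matches."""
    for el in elements:
        label = (el.get("text") or "").lower()
        if text is not None and text.lower() not in label:
            continue
        if el_type is not None and el.get("type") != el_type:
            continue
        return el
    return None


def poll_for_element(cmd, text=None, el_type=None, attempts=5, delay=1.0):
    """Re-run `extract` until a matching element appears or attempts run out.

    `cmd` is any callable shaped like the cmd() helper above.
    """
    for attempt in range(attempts):
        result = cmd("extract")
        if result.get("success"):
            el = find_element(result["data"].get("elements", []), text, el_type)
            if el is not None:
                return el
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before re-extracting
    return None
```

With the `cmd()` helper above, step 4 could become `el = poll_for_element(cmd, el_type="textarea")`, followed by `cmd("type", index=el["index"], text=...)` when `el` is not None.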

Cloud Deployment

For deploying BrowserHand as part of your platform, serving multiple users.

Environment Variables

Variable             Default        Description
BROWSERHAND_HOST     127.0.0.1      Bind address. Use 0.0.0.0 for cloud
BROWSERHAND_PORT     29981          Listen port
BROWSERHAND_MODE     (single-user)  Set to multi for multi-user connection pool
BROWSERHAND_API_KEY  (no auth)      When set, /command requires Authorization: Bearer <key>
BROWSERHAND_TOKEN    (random)       Manual token for WebSocket auth and /debug endpoint

Start in Cloud Mode

BROWSERHAND_HOST=0.0.0.0 \
BROWSERHAND_API_KEY=your-secret-api-key \
BROWSERHAND_MODE=multi \
BROWSERHAND_TOKEN=your-extension-token \
npx tsx index.ts
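If you start the relay from a Python deployment script, the sketch below builds the environment from the variables in the table above. It follows the Security Notes advice to always set BROWSERHAND_API_KEY in cloud mode, so it refuses to expose a non-loopback address without one. `cloud_relay_env` is a hypothetical helper.

```python
import os

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def cloud_relay_env(api_key, token, host="0.0.0.0", port=29981, base_env=None):
    """Return an environment mapping for starting the relay in multi-user mode.

    Hypothetical helper: raises ValueError if the relay would listen on a
    non-loopback address without an API key.
    """
    if host not in LOOPBACK_HOSTS and not api_key:
        raise ValueError("set BROWSERHAND_API_KEY before exposing the relay")
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "BROWSERHAND_HOST": host,
        "BROWSERHAND_PORT": str(port),
        "BROWSERHAND_MODE": "multi",
        "BROWSERHAND_API_KEY": api_key,
        "BROWSERHAND_TOKEN": token,
    })
    return env


# Usage, run from the relay/ directory:
# import subprocess
# env = cloud_relay_env("your-secret-api-key", "your-extension-token")
# subprocess.run(["npx", "tsx", "index.ts"], env=env, check=True)
```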

User Extension Connects

Each user installs the BrowserHand Chrome extension. It can authenticate in two ways (Options A and B below); Option C covers customizing the extension for your platform.

Option A: Auto-token from Cookie (Recommended)

After the user logs into your platform, the extension reads the auth token from a cookie automatically, so the user never has to handle the token.

Your platform sets a cookie on login:

// Your platform's login handler
res.cookie("auth_token", jwt, { domain: "your-platform.com", httpOnly: false });

The extension is pre-configured (or the user configures once) with:

  • Relay URL: ws://your-server.com:29981
  • Token Cookie URL: https://your-platform.com
  • Token Cookie Name: auth_token
  • User ID: user_123 (or auto-derived from the token on the relay side)

The extension reads the cookie, extracts the token, and connects: ws://your-server.com:29981?token=<jwt>&userId=user_123

The user just logs in and it works. No token copy-pasting.
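The connection URL format above can also be reproduced outside the extension, for example to test a relay from a script or to see how a token gets encoded. The extension itself is JavaScript. This Python sketch only illustrates the URL shape, and `extension_ws_url` is a hypothetical name.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def extension_ws_url(relay_url, token, user_id=None):
    """Append ?token=...&userId=... to the relay URL, URL-encoding the values."""
    parts = urlsplit(relay_url)
    query = {"token": token}
    if user_id is not None:
        query["userId"] = user_id
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


# extension_ws_url("ws://your-server.com:29981", "<jwt>", "user_123")
```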

Option B: Manual Token

For simpler setups, the user pastes a token directly in the extension popup:

  • Relay URL: ws://your-server.com:29981
  • Token: your-extension-token

In the extension popup, these cloud fields are hidden by default under the "Cloud / Platform Settings" section.

Option C: Customize the Extension

The extension source code is fully open. You can fork and modify it to fit your platform:

  • Hard-code the relay URL and cookie config so users don't need to configure anything
  • Add your own branding (icons, popup UI)
  • Pre-fill userId from your platform's login session
  • Distribute as your own Chrome extension on the Web Store

See extension/background.js — the resolveToken() and connect() functions are the entry points for auth customization.

Agent Sends Commands

Your agent adds a top-level userId field to each command to target that user's browser:

import requests

RELAY = "http://your-server.com:29981/command"  # behind a TLS reverse proxy, use its https:// URL instead
API_KEY = "your-secret-api-key"

def cmd(user_id, action, **params):
    r = requests.post(RELAY,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"action": action, "params": params, "userId": user_id},  # userId is top-level, not in params
        timeout=60)
    return r.json()

# Control user_123's browser
cmd("user_123", "navigate", url="https://twitter.com")
cmd("user_123", "extract")
cmd("user_123", "click", index=5)
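If an agent serves one user for a whole session, binding the user ID and API key once can be tidier than passing them on every call. Below is a stdlib-only sketch. `UserBrowser` is a hypothetical class, and its header and body shape follow the `cmd()` example above.

```python
import json
import urllib.request


class UserBrowser:
    """Hypothetical helper: sends relay commands on behalf of one user."""

    def __init__(self, relay_url, api_key, user_id, timeout=60):
        self.relay_url = relay_url
        self.api_key = api_key
        self.user_id = user_id
        self.timeout = timeout  # HTTP timeout in seconds (not an action param)

    def build_request(self, action, **params):
        """Build the POST /command request without sending it."""
        body = {"action": action, "params": params, "userId": self.user_id}
        return urllib.request.Request(
            self.relay_url,
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
            method="POST",
        )

    def __call__(self, action, **params):
        """Send one command to this user's browser and return the JSON reply."""
        req = self.build_request(action, **params)
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running relay with user_123's extension connected):
# browser = UserBrowser("http://your-server.com:29981/command", "your-secret-api-key", "user_123")
# browser("navigate", url="https://twitter.com")
# browser("waitfor", selector="textarea", timeout=5000)  # the action's own ms timeout
```

The HTTP timeout lives on the object rather than in `**params` because `waitfor` has its own `timeout` parameter in milliseconds. A keyword argument named `timeout` on `__call__` would otherwise swallow it.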

Multi-User Flow

User A's browser                    Your Server
  Extension ──WebSocket──→  Relay (userId: user_a)
                               ↑
                               │  POST /command {"userId":"user_a"}
                            Your Agent
                               │  POST /command {"userId":"user_b"}
                               ↓
  Extension ──WebSocket──→  Relay (userId: user_b)
User B's browser

Check Connection Status

# Single-user mode
curl http://localhost:29981/status
# {"connected": true}

# Multi-user mode
curl http://your-server.com:29981/status
# {"connected": true, "users": ["user_a", "user_b"]}
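Before dispatching work, an agent can check that the target user's extension is connected. This sketch interprets the two /status shapes shown above. The helper names are hypothetical. Treating a single-user response (no users list) as "not connected" for a specific user_id is this helper's own assumption.

```python
import json
import urllib.request


def fetch_status(base_url="http://127.0.0.1:29981", timeout=5):
    """GET /status from the relay and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_user_connected(status, user_id=None):
    """Interpret a /status body.

    Single-user mode: {"connected": true}.
    Multi-user mode:  {"connected": true, "users": [...]}; when user_id is
    given, also require that it appears in the users list.
    """
    if not status.get("connected"):
        return False
    if user_id is None:
        return True
    return user_id in status.get("users", [])


# is_user_connected(fetch_status("http://your-server.com:29981"), "user_a")
```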

MCP Integration

If your agent framework supports MCP (Model Context Protocol), you can use the BrowserHand MCP server instead of HTTP.

{
  "mcpServers": {
    "browserhand": {
      "command": "npx",
      "args": ["browserhand-mcp"]
    }
  }
}

The MCP server auto-starts the relay and registers all 26 actions as MCP tools. Your LLM agent can discover and call them automatically.

All 26 Actions

Core

Action          Description             Params
navigate        Open a URL              url, groupLabel?
extract         Get all interactive elements
click           Click element by index  index
type            Type text into element  index, text
scroll          Scroll page             direction (up/down)
screenshot      Capture viewport (returns base64 JPEG)
wait            Wait N seconds          seconds?

Navigation

Action          Description  Params
goback          Browser back
goforward       Browser forward
refresh         Reload page
tab             Manage tabs  tabAction (list/switch/close/create), tabId?, url?

Interaction

Action          Description                  Params
select          Choose dropdown option       index, value
hover           Mouse hover                  index
keypress        Key combination              key, modifiers? (alt/ctrl/meta/shift)
clear           Clear input field            index
upload          Upload file                  index, filePath
dialog          Handle alert/confirm/prompt  dialogAction (accept/dismiss), dialogText?

Data

Action          Description         Params
gettext         Extract page text (max 10k chars)
exec            Execute JavaScript  script
cookie          Manage cookies      cookieAction (get/set/delete), cookieName, cookieValue?
network         Monitor network     networkAction (start/stop/get)

Capture

Action          Description           Params
fullscreenshot  Full-page screenshot  format? (png/jpeg)
pdf             Export page as PDF (returns base64)
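Screenshots and PDFs come back base64-encoded. The response example in this guide shows a screenshot as a `data:image/jpeg;base64,...` URL. The exact shape and field name for `fullscreenshot` and `pdf` output are not documented here, so this sketch accepts either a data URL or bare base64. `decode_capture` and `save_capture` are hypothetical helpers.

```python
import base64


def decode_capture(payload):
    """Decode a screenshot/PDF payload to raw bytes.

    Accepts a data URL ("data:image/jpeg;base64,...") or bare base64.
    """
    if payload.startswith("data:"):
        header, _, payload = payload.partition(",")
        if ";base64" not in header:
            raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


def save_capture(payload, path):
    """Write the decoded bytes to `path` (e.g. shot.jpg or page.pdf)."""
    with open(path, "wb") as f:
        f.write(decode_capture(payload))


# save_capture(cmd("screenshot")["data"]["screenshot"], "shot.jpg")
```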

Scroll & Wait

Action          Description            Params
scrollto        Scroll to element      index
waitfor         Wait for CSS selector  selector, timeout? (ms, default 10000)
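Checking parameters before a round trip to the browser can catch typos early. The mapping in this sketch is transcribed from the action tables above and covers only the actions listed there. `validate_command` is a hypothetical helper, and the relay remains the authority on what it accepts. It may, for instance, tolerate extra params that this sketch flags.

```python
# (required, optional) params per action, transcribed from the tables above.
ACTION_PARAMS = {
    "navigate": ({"url"}, {"groupLabel"}),
    "extract": (set(), set()),
    "click": ({"index"}, set()),
    "type": ({"index", "text"}, set()),
    "scroll": ({"direction"}, set()),
    "screenshot": (set(), set()),
    "wait": (set(), {"seconds"}),
    "goback": (set(), set()),
    "goforward": (set(), set()),
    "refresh": (set(), set()),
    "tab": ({"tabAction"}, {"tabId", "url"}),
    "select": ({"index", "value"}, set()),
    "hover": ({"index"}, set()),
    "keypress": ({"key"}, {"modifiers"}),
    "clear": ({"index"}, set()),
    "upload": ({"index", "filePath"}, set()),
    "dialog": ({"dialogAction"}, {"dialogText"}),
    "gettext": (set(), set()),
    "exec": ({"script"}, set()),
    "cookie": ({"cookieAction", "cookieName"}, {"cookieValue"}),
    "network": ({"networkAction"}, set()),
    "fullscreenshot": (set(), {"format"}),
    "pdf": (set(), set()),
    "scrollto": ({"index"}, set()),
    "waitfor": ({"selector"}, {"timeout"}),
}

# Enumerated values listed in the tables.
ALLOWED_VALUES = {
    ("scroll", "direction"): {"up", "down"},
    ("tab", "tabAction"): {"list", "switch", "close", "create"},
    ("dialog", "dialogAction"): {"accept", "dismiss"},
    ("cookie", "cookieAction"): {"get", "set", "delete"},
    ("network", "networkAction"): {"start", "stop", "get"},
    ("fullscreenshot", "format"): {"png", "jpeg"},
}


def validate_command(action, params):
    """Return a list of problems; an empty list means the command looks valid."""
    if action not in ACTION_PARAMS:
        return [f"unknown action: {action}"]
    required, optional = ACTION_PARAMS[action]
    given = set(params)
    problems = [f"missing param: {p}" for p in sorted(required - given)]
    problems += [f"unexpected param: {p}" for p in sorted(given - required - optional)]
    for (act, name), allowed in ALLOWED_VALUES.items():
        if act == action and name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems
```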

Response Format

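All responses share the same envelope. The example below combines the possible data fields; each response includes only those relevant to its action and outcome: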

{
  "v": "1.1",
  "id": "cmd-...",
  "success": true,
  "data": {
    "pageTitle": "...",
    "pageUrl": "...",
    "tabId": 42,
    "elements": [...],
    "screenshot": "data:image/jpeg;base64,...",
    "error": "..."
  }
}
  • success: true — command executed, check data for results
  • success: false — command failed, check data.error for reason
  • data.error: "sensitive_page" — operation blocked by security policy
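To turn the success flag into ordinary control flow, an agent can raise on failure and single out the sensitive_page case. Below is a sketch with the hypothetical names `unwrap` and `BrowserHandError`.

```python
class BrowserHandError(Exception):
    """Raised when the relay reports success: false."""

    def __init__(self, error, response):
        super().__init__(error)
        self.error = error
        self.response = response

    @property
    def blocked_by_policy(self):
        """True when the relay refused the page under its security policy."""
        return self.error == "sensitive_page"


def unwrap(response):
    """Return response["data"] on success, otherwise raise BrowserHandError."""
    data = response.get("data") or {}
    if response.get("success"):
        return data
    raise BrowserHandError(data.get("error", "unknown error"), response)
```

For example, `data = unwrap(cmd("extract"))` either returns the data dict or raises. Checking `err.blocked_by_policy` lets the agent skip pages the relay refuses to touch instead of retrying them.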

Security Notes

  • Sensitive pages are automatically blocked: banking sites (PayPal, Chase, HSBC...), password managers (1Password, Bitwarden, LastPass...), and sensitive paths (/login, /password, /payment, /checkout)
  • Tab grouping: pass groupLabel in navigate/tab actions to visually group agent-operated tabs
  • Cloud mode: always set BROWSERHAND_API_KEY to prevent unauthorized access to your relay
  • TLS: in production, put the relay behind a reverse proxy (nginx/Caddy) with TLS termination