Build your own browser automation agent with BrowserHand. This guide covers both local development and cloud deployment.
```
Your Agent / Platform
    ↓ HTTP POST /command (+ API Key in cloud mode)
BrowserHand Relay
    ↓ WebSocket (per-user routing in multi-user mode)
Chrome Extension (in user's browser)
    ↓ Chrome DevTools Protocol
User's real browser (logged in, cookies intact)
```
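Local development mode is for building and testing your agent on your own machine.

Start the relay:

```bash
cd relay && npm install && npx tsx index.ts
```

The relay starts on `127.0.0.1:29981` in single-user mode, with no authentication required.

Then load the extension: open `chrome://extensions`, turn Developer mode on, click "Load unpacked", and select the `extension/` directory.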
Every command is a `POST /command` request with the JSON body `{"action": "<name>", "params": {...}}`:

```bash
curl -X POST http://127.0.0.1:29981/command \
  -H "Content-Type: application/json" \
  -d '{"action": "navigate", "params": {"url": "https://example.com"}}'
```

Response:
```json
{
  "v": "1.1",
  "id": "cmd-1711700000-abc123",
  "success": true,
  "data": {
    "pageTitle": "Example Domain",
    "pageUrl": "https://example.com",
    "tabId": 42
  }
}
```

The same flow from Python, as a typical agent loop:

```python
import requests

RELAY = "http://127.0.0.1:29981/command"

def cmd(action, **params):
    """Send one command to the relay and return the decoded JSON response."""
    r = requests.post(RELAY, json={"action": action, "params": params})
    return r.json()

# 1. Navigate to the target page
cmd("navigate", url="https://twitter.com")

# 2. Extract all interactive elements
result = cmd("extract")
elements = result["data"]["elements"]
# Each element: {"index": 0, "tag": "a", "text": "Home", "type": "link", "rect": {...}}

# 3. Find and click the element you need
for el in elements:
    if "compose" in el.get("text", "").lower():
        cmd("click", index=el["index"])
        break

# 4. Type content (re-extract first, because the page has changed)
result = cmd("extract")
for el in result["data"]["elements"]:
    if el.get("type") == "textarea":
        cmd("type", index=el["index"], text="Hello from my agent!")
        break

# 5. Take a screenshot to verify
cmd("screenshot")
```

Cloud deployment mode is for running BrowserHand as part of your platform, serving multiple users. Configure the relay with these environment variables:
| Variable | Default | Description |
|---|---|---|
| `BROWSERHAND_HOST` | `127.0.0.1` | Bind address. Use `0.0.0.0` for cloud |
| `BROWSERHAND_PORT` | `29981` | Listen port |
| `BROWSERHAND_MODE` | (single-user) | Set to `multi` for a multi-user connection pool |
| `BROWSERHAND_API_KEY` | (no auth) | When set, `/command` requires `Authorization: Bearer <key>` |
| `BROWSERHAND_TOKEN` | (random) | Manual token for WebSocket auth and the `/debug` endpoint |
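As a sketch of how these variables combine, the hypothetical helper below applies the documented defaults to an environment mapping and flags the risky case of a non-loopback bind address with no API key. It is a pre-flight check you might run from your own deploy script, not part of BrowserHand; `RelayConfig`, `resolve_relay_config` and the loopback list are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical: bind addresses that only accept connections from the same machine.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


@dataclass
class RelayConfig:
    host: str
    port: int
    multi_user: bool
    api_key: Optional[str]
    token: Optional[str]  # None: the relay generates a random token itself


def resolve_relay_config(env: Mapping[str, str] = os.environ) -> tuple[RelayConfig, list[str]]:
    """Apply the documented defaults to an environment mapping and collect warnings."""
    cfg = RelayConfig(
        host=env.get("BROWSERHAND_HOST", "127.0.0.1"),
        port=int(env.get("BROWSERHAND_PORT", "29981")),
        multi_user=env.get("BROWSERHAND_MODE") == "multi",
        api_key=env.get("BROWSERHAND_API_KEY") or None,
        token=env.get("BROWSERHAND_TOKEN") or None,
    )
    warnings = []
    # Without an API key, /command requires no auth, so an off-host bind exposes it.
    if cfg.host not in LOOPBACK_HOSTS and cfg.api_key is None:
        warnings.append("BROWSERHAND_HOST is not loopback but BROWSERHAND_API_KEY is unset")
    return cfg, warnings
```

Calling `resolve_relay_config()` with no argument reads the real process environment; passing a plain dict makes it easy to check a planned configuration before deploying.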
Start the relay in cloud mode:

```bash
BROWSERHAND_HOST=0.0.0.0 \
BROWSERHAND_API_KEY=your-secret-api-key \
BROWSERHAND_MODE=multi \
BROWSERHAND_TOKEN=your-extension-token \
npx tsx index.ts
```

Each user installs the BrowserHand Chrome extension. There are two ways to authenticate it against the relay:
Cookie-based token (no manual configuration): after the user logs into your platform, the extension automatically reads the auth token from a cookie, so the user configures nothing by hand.

Your platform sets the cookie on login:

```javascript
// Your platform's login handler
res.cookie("auth_token", jwt, { domain: "your-platform.com", httpOnly: false });
```

The extension is pre-configured (or the user configures it once) with:
- Relay URL: `ws://your-server.com:29981`
- Token Cookie URL: `https://your-platform.com`
- Token Cookie Name: `auth_token`
- User ID: `user_123` (or derived automatically from the token on the relay side)
The extension reads the cookie, extracts the token, and connects to `ws://your-server.com:29981?token=<jwt>&userId=user_123`.

The user only has to log in; no token copy-pasting is needed.
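The connection URL format above can be reproduced with the Python standard library, for example when testing the relay without the extension. This is a minimal sketch: `build_relay_ws_url` is a hypothetical name, and the extension's real logic lives in `extension/background.js`. Tokens and user IDs are not guaranteed to be URL-safe in general (an email-style user ID contains `@`), so both are percent-encoded.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_relay_ws_url(relay_url: str, token: str, user_id: str) -> str:
    """Return relay_url with token and userId query parameters (hypothetical helper).

    Any query string already present on relay_url is replaced.
    """
    parts = urlsplit(relay_url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got {relay_url!r}")
    # urlencode percent-encodes reserved characters such as '@' and '&'.
    query = urlencode({"token": token, "userId": user_id})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

For example, `build_relay_ws_url("ws://your-server.com:29981", jwt, "user_123")` yields the same shape of URL the extension connects to.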
Manual token: for simpler setups, the user pastes a token directly into the extension popup:

- Relay URL: `ws://your-server.com:29981`
- Token: `your-extension-token` (the value of `BROWSERHAND_TOKEN`)

In the extension popup, the cloud settings are hidden by default under "Cloud / Platform Settings".
The extension source code is fully open, so you can fork and modify it to fit your platform:

- Hard-code the relay URL and cookie config so users don't need to configure anything
- Add your own branding (icons, popup UI)
- Pre-fill `userId` from your platform's login session
- Distribute it as your own Chrome extension on the Chrome Web Store

In `extension/background.js`, the `resolveToken()` and `connect()` functions are the entry points for auth customization.
Your agent includes `userId` in each command to target the right user's browser:

```python
import requests

RELAY = "https://your-server.com:29981/command"
API_KEY = "your-secret-api-key"

def cmd(user_id, action, **params):
    """Send a command to the browser of the user identified by user_id."""
    r = requests.post(
        RELAY,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"action": action, "params": params, "userId": user_id},
    )
    return r.json()

# Control user_123's browser
cmd("user_123", "navigate", url="https://twitter.com")
cmd("user_123", "extract")
cmd("user_123", "click", index=5)
```

Each extension connection is registered under its `userId`, and the relay routes every command to the matching connection:

```
User A's browser                     Your Server
  Extension ──WebSocket──→ Relay (userId: user_a)
                             ↑
User B's browser             │ POST /command {"userId":"user_a"}
  Extension ──WebSocket──→ Relay (userId: user_b)
                             ↑
                         Your Agent
                             │ POST /command {"userId":"user_b"}
```
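Per-user routing comes down to putting `userId` in each request body next to the bearer API key. If you would rather not depend on `requests`, here is a minimal standard-library sketch; `build_command_request` and `send_command` are hypothetical names, and the relay URL and key are placeholders.

```python
import json
import urllib.request


def build_command_request(relay_url: str, api_key: str, user_id: str,
                          action: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST /command request routed to one user's browser."""
    body = {"action": action, "params": params, "userId": user_id}
    return urllib.request.Request(
        relay_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_command(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the relay's JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `send_command(build_command_request("http://your-server.com:29981/command", "your-secret-api-key", "user_a", "navigate", url="https://example.com"))`. Separating building from sending keeps the payload easy to inspect or log before it goes out.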
Check which browsers are connected with `GET /status`:

```bash
# Single-user mode
curl http://localhost:29981/status
# {"connected": true}

# Multi-user mode
curl http://your-server.com:29981/status
# {"connected": true, "users": ["user_a", "user_b"]}
```

If your agent framework supports MCP (Model Context Protocol), you can use the BrowserHand MCP server instead of HTTP.
```json
{
  "mcpServers": {
    "browserhand": {
      "command": "npx",
      "args": ["browserhand-mcp"]
    }
  }
}
```

The MCP server auto-starts the relay and registers all 26 actions as MCP tools, so your LLM agent can discover and call them automatically.
| Action | Description | Params |
|---|---|---|
| `navigate` | Open a URL | `url`, `groupLabel?` |
| `extract` | Get all interactive elements | none |
| `click` | Click element by index | `index` |
| `type` | Type text into element | `index`, `text` |
| `scroll` | Scroll page | `direction` (`up`/`down`) |
| `screenshot` | Capture viewport (returns base64 JPEG) | none |
| `wait` | Wait N seconds | `seconds?` |
| Action | Description | Params |
|---|---|---|
| `goback` | Browser back | none |
| `goforward` | Browser forward | none |
| `refresh` | Reload page | none |
| `tab` | Manage tabs | `tabAction` (`list`/`switch`/`close`/`create`), `tabId?`, `url?` |
| Action | Description | Params |
|---|---|---|
| `select` | Choose dropdown option | `index`, `value` |
| `hover` | Mouse hover | `index` |
| `keypress` | Key combination | `key`, `modifiers?` (`alt`/`ctrl`/`meta`/`shift`) |
| `clear` | Clear input field | `index` |
| `upload` | Upload file | `index`, `filePath` |
| `dialog` | Handle alert/confirm/prompt | `dialogAction` (`accept`/`dismiss`), `dialogText?` |
| Action | Description | Params |
|---|---|---|
| `gettext` | Extract page text (up to 10,000 characters) | none |
| `exec` | Execute JavaScript | `script` |
| `cookie` | Manage cookies | `cookieAction` (`get`/`set`/`delete`), `cookieName`, `cookieValue?` |
| `network` | Monitor network | `networkAction` (`start`/`stop`/`get`) |
| Action | Description | Params |
|---|---|---|
| `fullscreenshot` | Full-page screenshot | `format?` (`png`/`jpeg`) |
| `pdf` | Export page as PDF (returns base64) | none |
| Action | Description | Params |
|---|---|---|
| `scrollto` | Scroll to element | `index` |
| `waitfor` | Wait for CSS selector | `selector`, `timeout?` (ms, default 10000) |
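Because params travel as free-form JSON, a client-side check against the tables above can catch typos before a round trip to the browser. The sketch below encodes the required (non-`?`) params and the enumerated values from those tables; `validate_command`, `REQUIRED_PARAMS` and `ALLOWED_VALUES` are hypothetical names, and the relay's own validation may be stricter or looser.

```python
# Required params per action, taken from the tables above (params marked "?" are optional).
REQUIRED_PARAMS = {
    "navigate": {"url"}, "extract": set(), "click": {"index"},
    "type": {"index", "text"}, "scroll": {"direction"}, "screenshot": set(),
    "wait": set(), "goback": set(), "goforward": set(), "refresh": set(),
    "tab": {"tabAction"}, "select": {"index", "value"}, "hover": {"index"},
    "keypress": {"key"}, "clear": {"index"}, "upload": {"index", "filePath"},
    "dialog": {"dialogAction"}, "gettext": set(), "exec": {"script"},
    "cookie": {"cookieAction", "cookieName"}, "network": {"networkAction"},
    "fullscreenshot": set(), "pdf": set(), "scrollto": {"index"},
    "waitfor": {"selector"},
}

# Enumerated values listed in the tables, keyed by (action, param).
ALLOWED_VALUES = {
    ("scroll", "direction"): {"up", "down"},
    ("tab", "tabAction"): {"list", "switch", "close", "create"},
    ("dialog", "dialogAction"): {"accept", "dismiss"},
    ("cookie", "cookieAction"): {"get", "set", "delete"},
    ("network", "networkAction"): {"start", "stop", "get"},
    ("fullscreenshot", "format"): {"png", "jpeg"},
}


def validate_command(action: str, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the command looks well-formed."""
    if action not in REQUIRED_PARAMS:
        return [f"unknown action {action!r}"]
    missing = sorted(REQUIRED_PARAMS[action] - set(params))
    problems = [f"missing param {name!r}" for name in missing]
    for (act, name), allowed in ALLOWED_VALUES.items():
        if act == action and name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]!r} not in {sorted(allowed)}")
    return problems
```

For example, `validate_command("scroll", {"direction": "left"})` reports that `direction` must be `up` or `down`, without sending anything to the relay.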
All responses share the same envelope; which fields appear in `data` depends on the action:

```json
{
  "v": "1.1",
  "id": "cmd-...",
  "success": true,
  "data": {
    "pageTitle": "...",
    "pageUrl": "...",
    "tabId": 42,
    "elements": [...],
    "screenshot": "data:image/jpeg;base64,...",
    "error": "..."
  }
}
```

- `success: true`: the command executed; check `data` for results.
- `success: false`: the command failed; check `data.error` for the reason.
- `data.error: "sensitive_page"`: the operation was blocked by the security policy.
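In agent code it is usually convenient to turn `success: false` into an exception and to treat `sensitive_page` separately, since retrying a page blocked by policy will not help. A minimal sketch; `unwrap`, `BrowserHandError` and `SensitivePageError` are hypothetical client-side names, not part of BrowserHand.

```python
from typing import Any


class BrowserHandError(RuntimeError):
    """Raised when a command returns success: false."""


class SensitivePageError(BrowserHandError):
    """Raised when the relay reports data.error == "sensitive_page"."""


def unwrap(response: dict[str, Any]) -> dict[str, Any]:
    """Return response["data"] on success; raise a BrowserHandError otherwise."""
    data = response.get("data") or {}
    if response.get("success"):
        return data
    error = data.get("error", "unknown error")
    if error == "sensitive_page":
        # Blocked by the security policy: skip this page rather than retrying.
        raise SensitivePageError(error)
    raise BrowserHandError(error)
```

With the `cmd` helper from the local example, `unwrap(cmd("gettext"))` returns the `data` object or raises, so callers can catch `SensitivePageError` on its own.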
- Sensitive pages are blocked automatically: banking sites (PayPal, Chase, HSBC, etc.), password managers (1Password, Bitwarden, LastPass, etc.), and sensitive paths (`/login`, `/password`, `/payment`, `/checkout`).
- Tab grouping: pass `groupLabel` in `navigate`/`tab` actions to visually group agent-operated tabs.
- Cloud mode: always set `BROWSERHAND_API_KEY` to prevent unauthorized access to your relay.
- TLS: in production, put the relay behind a reverse proxy (nginx/Caddy) with TLS termination.