Browser automation via the accessibility tree + keyboard navigation.
WCAG is all you need.
Tabi replaces the screenshot → vision-model → mouse-click loop with a text-only loop: read the browser's a11y tree, ask a small LLM for the next keystroke, dispatch it, repeat. Typical cost is ~$0.01 per 10-step task — roughly 30x cheaper than vision-based agents, and 3-5x faster.
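You can check the per-task cost against your own runs by pricing the `usage` counters that `agent.run()` returns. This is a minimal sketch and is not part of Tabi:

- `estimateCostUSD` is a hypothetical helper.
- Per-million-token prices are passed in rather than hard-coded.
- It assumes the counters follow the Anthropic API convention, where uncached input is counted separately from cache writes and cache reads.
- It assumes Anthropic's documented 5-minute cache multipliers: 1.25x the base input price for writes and 0.1x for reads.

```javascript
// Hypothetical helper (not part of Tabi): estimate the dollar cost of one run
// from the `usage` object returned by agent.run().
// Assumes inputTokens excludes tokens written to or read from the prompt cache,
// as in the Anthropic API's usage counters.
function estimateCostUSD(usage, prices, multipliers = {}) {
  const {
    inputTokens = 0,
    outputTokens = 0,
    cacheCreationInputTokens = 0,
    cacheReadInputTokens = 0,
  } = usage;
  // Assumed defaults: 5-minute cache writes at 1.25x and cache reads at 0.1x
  // the base input price. Override them if your pricing differs.
  const { cacheWrite = 1.25, cacheRead = 0.1 } = multipliers;
  const perToken = (pricePerMTok) => pricePerMTok / 1_000_000;

  return (
    inputTokens * perToken(prices.inputPerMTok) +
    cacheCreationInputTokens * perToken(prices.inputPerMTok) * cacheWrite +
    cacheReadInputTokens * perToken(prices.inputPerMTok) * cacheRead +
    outputTokens * perToken(prices.outputPerMTok)
  );
}
```

Call it as `estimateCostUSD(result.usage, { inputPerMTok, outputPerMTok })`, using the current rates for the model you select with `--model`.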
```bash
npm install
npx playwright install chromium
cp .env.example .env   # set ANTHROPIC_API_KEY
```

```bash
node src/index.js "Search Google for 'best flights to Tokyo'" \
  --url https://google.com \
  --headed --verbose
```

Flags: `--url <url>` (required), `--headed`, `--verbose`, `--max-steps <n>`, `--model <id>`, `--json`.
```javascript
import { Tabi } from './src/agent.js';

const agent = new Tabi({
  apiKey: process.env.ANTHROPIC_API_KEY,
  headless: false,
  verbose: true,
});

const result = await agent.run(
  "Search for flights from London to Tokyo on 15 May 2026, for 2 passengers",
  "https://www.google.com/travel/flights"
);
```

`result` shape:
```
{
  success: true,
  summary: "…",
  history: [ /* executed actions */ ],
  steps: 8,
  usage: { inputTokens, outputTokens, cacheCreationInputTokens, cacheReadInputTokens }
}
```

Accessibility Tree → Haiku → Keyboard Dispatch → Accessibility Tree → …
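The cycle above can be sketched as a plain loop. This illustrates the idea and is not the code in `src/agent.js`:

- `runLoop` is hypothetical, and its injected `perceive`, `plan`, and `execute` callbacks stand in for the three components.
- The stuck check is one simple assumed heuristic: stop when the same action is planned against an unchanged snapshot `stuckLimit` times in a row.
- Retries and domain guards are left out.

```javascript
// Hypothetical perceive -> plan -> execute loop (illustrative names and heuristics).
async function runLoop({ goal, perceive, plan, execute, maxSteps = 20, stuckLimit = 3 }) {
  const history = [];
  let lastKey = null;
  let repeats = 0;

  for (let step = 1; step <= maxSteps; step++) {
    const table = await perceive(); // text snapshot of interactive elements
    const action = await plan(goal, table, history);

    // Terminal actions end the run; `summary` and `reason` fields are assumed.
    if (action.type === "done") return { success: true, summary: action.summary, history, steps: step };
    if (action.type === "fail") return { success: false, summary: action.reason, history, steps: step };

    // Assumed stuck heuristic: the same action planned against the same snapshot.
    const key = JSON.stringify([table, action]);
    repeats = key === lastKey ? repeats + 1 : 1;
    lastKey = key;
    if (repeats >= stuckLimit) {
      return { success: false, summary: "stuck: repeated action with no page change", history, steps: step };
    }

    await execute(action);
    history.push(action);
  }
  return { success: false, summary: "max steps reached", history, steps: maxSteps };
}
```

Because the three stages are injected as callbacks, the loop can be tested without a browser. A fake `perceive` and a scripted `plan` are enough to exercise the done, stuck, and step-limit exits.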
- Perceiver (`src/perceiver.js`): flattens `page.accessibility.snapshot()` into a table of interactive elements (role, name, value, focused).
- Planner (`src/planner.js`): sends the goal + table to Claude Haiku with the system prompt cached; returns a single JSON action.
- Executor (`src/executor.js`): dispatches the action as native keyboard events (`Tab`, `Shift+Tab`, `Enter`, typing, arrows, etc.).
- Agent (`src/agent.js`): the perceive/plan/execute loop with stuck detection, retry logic, and domain guards.
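Below is a sketch of the Perceiver-to-Planner hand-off.

Several details are illustrative assumptions:

- the set of roles treated as interactive;
- the 200-row limit;
- the table layout;
- the prompt wording;
- the `type` field in the action JSON.

`flattenSnapshot`, `formatTable`, `buildPlannerRequest`, and `parseAction` are hypothetical names, not exports of `src/perceiver.js` or `src/planner.js`. The request object follows the Anthropic Messages API, where a `cache_control: { type: "ephemeral" }` marker on the system block enables prompt caching.

```javascript
// Hypothetical Perceiver -> Planner sketch (not Tabi's source).
// `root` has the shape returned by Playwright's page.accessibility.snapshot():
// { role, name, value, focused, children, ... }. The role set and limit are assumptions.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "searchbox", "combobox", "checkbox", "radio",
  "menuitem", "option", "tab", "switch", "slider", "spinbutton", "listbox",
]);

function flattenSnapshot(root, limit = 200) {
  const rows = [];
  // Depth-first walk keeps rows in tree order, which usually follows the default Tab order.
  const walk = (node) => {
    if (!node || rows.length >= limit) return;
    if (INTERACTIVE_ROLES.has(node.role)) {
      rows.push({ id: rows.length, role: node.role, name: node.name ?? "",
                  value: node.value ?? "", focused: Boolean(node.focused) });
    }
    for (const child of node.children ?? []) walk(child);
  };
  walk(root);
  return rows;
}

// Render rows as a compact text table; "*" marks the focused element.
function formatTable(rows) {
  const lines = rows.map((r) => `${r.id} | ${r.role} | ${r.name} | ${r.value} | ${r.focused ? "*" : ""}`);
  return ["id | role | name | value | focused", ...lines].join("\n");
}

// Messages API params. The static system prompt is marked cacheable so later
// steps can read it from the prompt cache instead of paying full input price.
function buildPlannerRequest({ model, systemPrompt, goal, table, maxTokens = 256 }) {
  return {
    model,
    max_tokens: maxTokens,
    system: [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }],
    messages: [{ role: "user", content: `Goal: ${goal}\n\nInteractive elements:\n${table}` }],
  };
}

const ACTION_TYPES = new Set(["tab_to", "type", "clear_and_type", "press_enter", "press_escape",
  "press_space", "arrow", "select", "navigate", "done", "fail"]);

// Pull the single JSON object out of the reply and reject unknown action types.
function parseAction(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("no JSON object in planner reply");
  const action = JSON.parse(text.slice(start, end + 1));
  if (!ACTION_TYPES.has(action.type)) throw new Error(`unknown action type: ${action.type}`);
  return action;
}
```

Using the official `@anthropic-ai/sdk` package:

1. Send the request with `client.messages.create(buildPlannerRequest({ ... }))`.
2. Turn the reply into an action with `parseAction(response.content[0].text)`.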
Actions: `tab_to`, `type`, `clear_and_type`, `press_enter`, `press_escape`, `press_space`, `arrow`, `select`, `navigate`, `done`, `fail`.
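Below is one way the action vocabulary could map onto Playwright input calls. `page.keyboard.press`, `page.keyboard.type`, and `page.goto` are standard Playwright APIs. Everything else is assumed:

- the action fields (`name`, `reverse`, `text`, `direction`, `url`);
- the `focusedName` callback used by `tab_to`;
- the `maxTabs` bound.

`select` is omitted because its parameters are not documented here.

```javascript
// Hypothetical executor sketch: turns one planner action into native keyboard input.
// `page` is a Playwright Page. Action fields and `focusedName` are illustrative assumptions.
const ARROW_KEYS = { up: "ArrowUp", down: "ArrowDown", left: "ArrowLeft", right: "ArrowRight" };

async function executeAction(page, action, { focusedName, maxTabs = 50 } = {}) {
  switch (action.type) {
    case "tab_to": {
      // Move focus one element at a time until it lands on the requested name.
      const key = action.reverse ? "Shift+Tab" : "Tab";
      for (let i = 0; i < maxTabs; i++) {
        await page.keyboard.press(key);
        if ((await focusedName()) === action.name) return { ok: true };
      }
      return { ok: false, error: `"${action.name}" not reached after ${maxTabs} presses` };
    }
    case "type":
      await page.keyboard.type(action.text);
      return { ok: true };
    case "clear_and_type":
      // Select all, delete, retype. On macOS the modifier would be Meta, not Control.
      await page.keyboard.press("Control+A");
      await page.keyboard.press("Backspace");
      await page.keyboard.type(action.text);
      return { ok: true };
    case "press_enter":
      await page.keyboard.press("Enter");
      return { ok: true };
    case "press_escape":
      await page.keyboard.press("Escape");
      return { ok: true };
    case "press_space":
      await page.keyboard.press("Space");
      return { ok: true };
    case "arrow": {
      const key = ARROW_KEYS[action.direction];
      if (!key) return { ok: false, error: `unknown arrow direction: ${action.direction}` };
      await page.keyboard.press(key);
      return { ok: true };
    }
    case "navigate":
      await page.goto(action.url);
      return { ok: true };
    case "done":
    case "fail":
      // Terminal actions: nothing to dispatch; the caller ends the run.
      return { ok: true, terminal: true };
    default:
      // `select` and anything unrecognized are outside this sketch.
      return { ok: false, error: `unsupported action: ${action.type}` };
  }
}
```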
See `DEFAULT_CONFIG` in `src/agent.js` for all tunables (viewport, model, retries, delays, allowed/blocked domains, element limits).
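A domain guard can be as simple as a hostname check on every navigation. This sketch assumes semantics that `DEFAULT_CONFIG` may not share:

- a blocked entry always wins;
- an empty allow list permits any host;
- an entry matches its own host plus its subdomains.

`isUrlAllowed` and the `allowedDomains`/`blockedDomains` option names are hypothetical.

```javascript
// Hypothetical domain guard; the matching rules are assumptions, not Tabi's behavior.
function hostMatches(host, domain) {
  const d = domain.toLowerCase().replace(/^\./, "");
  // Exact host or a true subdomain; "evilgoogle.com" does not match "google.com".
  return host === d || host.endsWith("." + d);
}

function isUrlAllowed(url, { allowedDomains = [], blockedDomains = [] } = {}) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URL: refuse rather than guess
  }
  if (blockedDomains.some((d) => hostMatches(host, d))) return false;
  if (allowedDomains.length === 0) return true;
  return allowedDomains.some((d) => hostMatches(host, d));
}
```

The check matches on a `.`-prefixed suffix rather than a plain `endsWith(domain)`. This keeps look-alike hosts such as `evilgoogle.com` from passing an allow list that contains `google.com`.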
Limitations: file uploads, drag-and-drop, canvas/WebGL content, CAPTCHAs, authentication (handle it separately via cookies/session), multi-tab workflows, and iframe-heavy SPAs.
License: MIT.