MikeSquared-Agency/TABI

Tabi

Browser automation via the accessibility tree + keyboard navigation.

WCAG is all you need.

Tabi replaces the screenshot → vision-model → mouse-click loop with a text-only loop: read the browser's accessibility (a11y) tree, ask a small LLM for the next keystroke, dispatch it, repeat. Typical cost is ~$0.01 per 10-step task, roughly 30x cheaper and 3-5x faster than vision-based agents.

Install

npm install
npx playwright install chromium
cp .env.example .env   # set ANTHROPIC_API_KEY

CLI

node src/index.js "Search Google for 'best flights to Tokyo'" \
  --url https://google.com \
  --headed --verbose

Flags: --url <url> (required), --headed, --verbose, --max-steps <n>, --model <id>, --json.

Programmatic

import { Tabi } from './src/agent.js';

const agent = new Tabi({
  apiKey: process.env.ANTHROPIC_API_KEY,
  headless: false,
  verbose: true,
});

const result = await agent.run(
  "Search for flights from London to Tokyo on 15 May 2026, for 2 passengers",
  "https://www.google.com/travel/flights"
);

agent.run() resolves to an object of this shape:

{
  success: true,
  summary: "…",
  history: [ /* executed actions */ ],
  steps: 8,
  usage: { inputTokens, outputTokens, cacheCreationInputTokens, cacheReadInputTokens }
}
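
The usage counters can be folded into one summary for logging or budgeting. The following is a minimal sketch, assuming the result shape shown above and assuming the three input counters are disjoint (as they are in the Anthropic Messages API usage object). summarizeUsage is a hypothetical helper, not part of Tabi, and the sample numbers are illustrative rather than measured.

```javascript
// Hypothetical helper (not part of Tabi): totals the token counters on a result.
function summarizeUsage(result) {
  const u = result.usage ?? {};
  // Missing counters are treated as zero.
  const input = u.inputTokens ?? 0;
  const output = u.outputTokens ?? 0;
  const cacheWrite = u.cacheCreationInputTokens ?? 0;
  const cacheRead = u.cacheReadInputTokens ?? 0;
  // Assumption: uncached, cache-write and cache-read tokens do not overlap.
  const totalInput = input + cacheWrite + cacheRead;
  return {
    steps: result.steps,
    totalInputTokens: totalInput,
    outputTokens: output,
    // Share of input tokens served from the prompt cache.
    cacheHitRatio: totalInput === 0 ? 0 : cacheRead / totalInput,
  };
}

// Illustrative values only.
const sample = {
  success: true,
  steps: 8,
  usage: {
    inputTokens: 1000,
    outputTokens: 200,
    cacheCreationInputTokens: 500,
    cacheReadInputTokens: 2500,
  },
};
console.log(summarizeUsage(sample));
```

A high cache-hit ratio is what one would expect if the cached system prompt dominates each request.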

How it works

Accessibility Tree → Haiku → Keyboard Dispatch → Accessibility Tree → …
  • Perceiver (src/perceiver.js) — flattens page.accessibility.snapshot() into a table of interactive elements (role, name, value, focused).
  • Planner (src/planner.js) — sends the goal + table to Claude Haiku with the system prompt cached; returns a single JSON action.
  • Executor (src/executor.js) — dispatches the action as native keyboard events (Tab, Shift+Tab, Enter, typing, arrows, etc.).
  • Agent (src/agent.js) — the perceive/plan/execute loop with stuck detection, retry logic, and domain guards.
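
The four components above can be sketched as one small loop. This is an illustration under stated assumptions, not the code in src/: the set of interactive roles, the `action` field name, the function names, and the "same action three times in a row" stuck rule are all hypothetical. The perceive, plan and execute steps are injected so the sketch runs without a browser or an API key.

```javascript
// Assumption: which roles count as interactive. Tabi's real list may differ.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "searchbox", "combobox", "checkbox",
]);

// Flatten a nested accessibility snapshot into rows of interactive elements.
function flattenTree(node, rows = []) {
  if (!node) return rows;
  if (INTERACTIVE_ROLES.has(node.role)) {
    rows.push({
      role: node.role,
      name: node.name ?? "",
      value: node.value ?? "",
      focused: Boolean(node.focused),
    });
  }
  for (const child of node.children ?? []) flattenTree(child, rows);
  return rows;
}

// Perceive/plan/execute loop with a simple stuck check: stop when the same
// action has been executed `stuckAfter` times in a row.
async function runLoop({ perceive, plan, execute, maxSteps = 10, stuckAfter = 3 }) {
  const history = [];
  for (let step = 1; step <= maxSteps; step++) {
    const table = flattenTree(await perceive());
    const action = await plan(table, history);
    if (action.action === "done" || action.action === "fail") {
      return { success: action.action === "done", steps: step, history };
    }
    await execute(action);
    history.push(action);
    const recent = history.slice(-stuckAfter).map((a) => JSON.stringify(a));
    if (recent.length === stuckAfter && new Set(recent).size === 1) {
      return { success: false, steps: step, history, reason: "stuck" };
    }
  }
  return { success: false, steps: maxSteps, history, reason: "max steps reached" };
}

// Usage with fakes: a static tree and a scripted planner.
const fakeTree = {
  role: "WebArea",
  name: "Demo",
  children: [
    { role: "textbox", name: "Search", focused: true },
    { role: "text", name: "Hello" },
    { role: "button", name: "Go" },
  ],
};
const script = [
  { action: "type", text: "tokyo" },
  { action: "press_enter" },
  { action: "done" },
];
runLoop({
  perceive: async () => fakeTree,
  plan: async (_table, history) => script[history.length],
  execute: async () => {},
}).then((r) => console.log(r.success, r.steps));
```

Injecting the three steps keeps the loop testable on its own; in a real run, perceive would wrap the browser snapshot, plan the model call, and execute the keyboard dispatch.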

Actions

tab_to, type, clear_and_type, press_enter, press_escape, press_space, arrow, select, navigate, done, fail.
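
Because the planner returns a single JSON action, a caller may want to check the reply before dispatching it. The sketch below validates a reply against the action names listed above. The `action` field name and the reply format are assumptions made for illustration, and parseAction is a hypothetical helper rather than Tabi's own parser.

```javascript
// Action names come from the list above; everything else here is assumed.
const KNOWN_ACTIONS = new Set([
  "tab_to", "type", "clear_and_type", "press_enter", "press_escape",
  "press_space", "arrow", "select", "navigate", "done", "fail",
]);

// Parse a planner reply and reject anything that is not a single known action.
function parseAction(replyText) {
  let parsed;
  try {
    parsed = JSON.parse(replyText);
  } catch {
    return { ok: false, error: "reply is not valid JSON" };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { ok: false, error: "reply is not a JSON object" };
  }
  if (!KNOWN_ACTIONS.has(parsed.action)) {
    return { ok: false, error: `unknown action: ${String(parsed.action)}` };
  }
  return { ok: true, action: parsed };
}

console.log(parseAction('{"action":"press_enter"}'));
console.log(parseAction('{"action":"click"}'));
```

Returning a result object instead of throwing lets a retry policy decide whether to re-prompt the model or give up.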

Config

See DEFAULT_CONFIG in src/agent.js for all tunables (viewport, model, retries, delays, allowed/blocked domains, element limits).
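
As an illustration of what an allowed/blocked domain check might look like, here is a minimal sketch. The option names allowedDomains and blockedDomains, the subdomain-matching rule, and the "empty allow-list means unrestricted" behavior are assumptions; check DEFAULT_CONFIG for the real keys and semantics.

```javascript
// Hypothetical guard, not Tabi's implementation.
function isNavigationAllowed(url, { allowedDomains = [], blockedDomains = [] } = {}) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  // A host matches a domain if it equals it or is a subdomain of it.
  const matches = (domain) => host === domain || host.endsWith("." + domain);
  // The block-list takes priority over the allow-list.
  if (blockedDomains.some(matches)) return false;
  // Assumption: an empty allow-list means no restriction.
  return allowedDomains.length === 0 || allowedDomains.some(matches);
}

const guard = { allowedDomains: ["google.com"], blockedDomains: ["ads.google.com"] };
console.log(isNavigationAllowed("https://www.google.com/travel/flights", guard));
console.log(isNavigationAllowed("https://ads.google.com/", guard));
```

Matching on a leading dot avoids treating look-alike hosts such as evilgoogle.com as part of google.com.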

Out of scope (v1)

File uploads, drag-and-drop, canvas/WebGL content, CAPTCHA, authentication (handle separately via cookies/session), multi-tab workflows, iframe-heavy SPAs.

License

MIT.
