MCP Playwright Browser Server

A production-grade Model Context Protocol (MCP) server that gives AI assistants full browser control through Playwright — using a hybrid DOM + Accessibility Tree + Visual approach. Built for real-world agentic automation: job applications, web scraping, form filling, and complex multi-tab workflows.

v2.0 is a complete rewrite. The server grew from 680 lines and 23 tools to nearly 5,000 lines and 71 tools, with a modular architecture, token-optimized capture profiles, hard payload budgets, and a full test suite.


What's New in v2.0

The Problem v1 Had

v1 was a working proof of concept. It could browse pages and extract jobs. But when used with Gemini CLI for real tasks — filling application forms, navigating multi-tab flows, handling downloads — it hit hard limits:

  • Token waste: Every tool response dumped everything it found. One browser.snapshot on a complex page could push 50KB+ into Gemini's context window in a single call, rapidly exhausting the budget.
  • No multi-tab support: If a link opened a new tab (very common in job applications), Gemini was stuck with no way to switch to it.
  • No form intelligence: Filling a form required manual click-by-click instructions. There was no way to ask "what fields are still empty?" or "fill all required fields."
  • Brittle DOM-only navigation: Shadow DOM, iframes, and obfuscated element IDs caused failures with no fallback.
  • No session persistence: Every run started fresh. Logging in again and again wasted time and triggered bot detection.
  • No safety rails: The AI could write files anywhere on disk, run arbitrary JS, or create its own automation scripts — unguarded.
  • Monolithic: One 680-line file with only 2 tests.

What v2.0 Solves

Every one of those problems has a specific solution in v2.0:

| Problem | v2.0 Solution |
| --- | --- |
| Token waste | Capture Profile System (light/balanced/full) + 280KB hard payload ceiling |
| Multi-tab stuck | Page Manager with stable pageIds, browser.list_pages, browser.select_page |
| Dumb form filling | browser.form_audit + browser.fill_form + Google Forms specialist tools |
| Shadow DOM / obfuscated IDs | A11y tree via CDP Accessibility.getFullAXTree with stable ax- UIDs |
| Session loss | Cookie export/import, browser.export_storage_state / browser.import_storage_state |
| No safety | Path allowlist in src/security/paths.js, MCP_ALLOW_EVALUATE guard |
| Monolithic | 10 focused modules in src/browser/ + src/security/ + 18-test suite |

v1 vs v2 Comparison

| Dimension | v1.0 | v2.0 |
| --- | --- | --- |
| Total MCP tools | 23 | 71 |
| Server size | 680 lines, 1 file | 4,966 lines, 11 modules |
| Token efficiency | Uncontrolled dumps | Capture profiles + 280KB hard ceiling |
| Multi-tab support | Single tab only | Full page manager (list, select, close) |
| Form automation | Manual click-by-click | form_audit + fill_form + Google Forms specialist |
| A11y / Shadow DOM | DOM-only, brittle | CDP Accessibility tree with stable UIDs |
| Scroll handling | Saw first viewport only | Scroll awareness + container scrolling |
| Session persistence | None | Cookie/storage export-import |
| Popup & dialog handling | None | Dialog accept/dismiss, popup pageId capture |
| Download management | None | Wait-for-download, save to path |
| File reading (CV/PDF) | None | files.read_text, files.read_pdf_text |
| Security | No restrictions | Allowlist-enforced read/write paths |
| Observability | None | Console log capture, network request log |
| Test coverage | 2 tests | 18 tests |
| Profiles | 3 | 5 (+ persistent variants) |
| Batch scripts | 5 .bat launchers | 7 .bat launchers |
| Error handling | Raw exceptions to AI | Normalized, structured, budgeted |

What stayed the same

  • Indeed job extractor (production-grade, multi-selector, deduplication)
  • Google search extractor (consent handling, URL deobfuscation)
  • Stealth mode (webdriver hiding, user agent spoofing)
  • CDP connection to real Chrome
  • Visual snapshot + coordinate-based clicking

How It Works

You / Gemini CLI
      │
      │ natural language prompt
      ▼
  Gemini CLI ──── loads MCP config ────► playwrightBrowser MCP server
                                               │
                              ┌────────────────┤
                              │                │
                         71 MCP Tools     Payload Budget
                         (browser.*)     (280KB ceiling)
                         (forms.*)       (capture profiles)
                         (files.*)       (retryWith hints)
                         (jobs.*)
                         (search.*)
                              │
                    ┌─────────┤──────────┐
                    │         │          │
               Playwright  CDP API   Security
               (browser)  (A11y,    (path
                          network,  allowlist)
                          clicks)
                    │
               Chrome / Chromium

The Capture Ladder

Every profile instructs Gemini to try tools in order, cheapest first:

1. browser.snapshot     → plain text summary       (cheapest, ~6KB in light mode)
2. browser.list         → interactive elements      (structured, ~8KB)
3. browser.query_dom    → targeted selector query   (focused, ~10KB)
4. browser.take_snapshot→ A11y tree with UIDs       (rich, only when uid-clicking needed)
5. browser.visual_snapshot → screenshot + bbox map  (most expensive, last resort)

Gemini only escalates to a more expensive tool when the cheaper one doesn't have what it needs. This is the core of why v2.0 uses far fewer tokens than v1.0.
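
The ordering above can be sketched in a few lines. In the real setup Gemini performs this escalation itself, guided by the profile's system instructions, so the code below is only an illustration. `callTool` and `found` are hypothetical stand-ins supplied by the caller, not part of the server.

```javascript
// Hypothetical sketch of the capture ladder: cheapest tool first.
// Tool names come from the list above; everything else is illustrative.
const LADDER = [
  "browser.snapshot",
  "browser.list",
  "browser.query_dom",
  "browser.take_snapshot",
  "browser.visual_snapshot",
];

// callTool(name, args) -> Promise<result>; found(result) -> boolean.
async function climbLadder(callTool, found) {
  const tried = [];
  for (const tool of LADDER) {
    tried.push(tool);
    const result = await callTool(tool, { detail: "low" });
    // Stop at the first (cheapest) tool that has what we need.
    if (found(result)) return { tool, result, tried };
  }
  // Nothing on the ladder produced a match.
  return { tool: null, result: null, tried };
}
```

The `tried` array makes the cost of a lookup visible: a short list means the cheap tools were enough.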

The Payload Budget

Every single tool response passes through enforcePayloadCeiling() before being sent to Gemini:

  1. Measure response size in bytes
  2. If under 280KB → send as-is
  3. If over → progressively truncate: arrays shrink, strings truncate, fields drop
  4. Always include retryWith hints telling Gemini exactly what parameters to reduce next time
  5. Absolute floor: {truncated: true} — Gemini never gets a context-crashing response
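
As a rough illustration of these steps, here is a minimal sketch. It is not the code in payload-budget.js: the function name `enforcePayloadCeilingSketch`, the halving strategy, and the reading of "280KB" as 280,000 bytes are all assumptions, and it shrinks arrays and strings together rather than dropping fields one by one.

```javascript
// Minimal sketch only; the real enforcePayloadCeiling() may work differently.
const DEFAULT_MAX_BYTES = 280000; // assumed byte value for "280KB"

function byteSize(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// One truncation pass: halve every array and every long string.
function shrink(value) {
  if (Array.isArray(value)) {
    return value.slice(0, Math.ceil(value.length / 2)).map(shrink);
  }
  if (typeof value === "string") {
    return value.length > 200 ? value.slice(0, Math.ceil(value.length / 2)) : value;
  }
  if (value && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, shrink(v)]));
  }
  return value;
}

// `response` is assumed to be a plain object (the tool's response envelope).
function enforcePayloadCeilingSketch(response, maxBytes = DEFAULT_MAX_BYTES) {
  if (byteSize(response) <= maxBytes) return response; // under budget: send as-is
  const retryWith = { detail: "low", maxItems: 80, limit: 20 }; // hint for the next call
  let current = response;
  for (let pass = 0; pass < 20; pass++) {
    current = shrink(current);
    const candidate = { ...current, truncated: true, retryWith };
    if (byteSize(candidate) <= maxBytes) return candidate;
  }
  return { truncated: true }; // absolute floor
}
```

Bounding the loop at 20 passes guarantees termination even when the payload cannot be shrunk far enough, in which case the floor object is returned.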

Quick Start

# Clone
git clone https://github.com/Mhrnqaruni/mcp-playwright-browser.git
cd mcp-playwright-browser

# Install
npm install
npx playwright install chromium

# Run (interactive mode - chat with Gemini)
scripts\run-dom-headless.bat

# Run (one-shot automation)
scripts\run-dom-headless.bat -p "Go to https://example.com and extract the page title"

# Run with real Chrome (for logged-in sessions)
scripts\run-chrome-profile.bat --kill-chrome

Installation

Prerequisites

  • Node.js 18+
  • npm
  • Gemini CLI: npm install -g @google/gemini-cli then gemini auth login
  • Google Chrome (for CDP and chrome-profile modes)

Setup

1. Install dependencies

npm install
npx playwright install chromium

2. Configure the MCP server path

Edit .gemini/settings.json and set cwd to your repo location:

{
  "mcpServers": {
    "playwrightBrowser": {
      "command": "node",
      "args": ["src/mcp-browser-server.js"],
      "cwd": "C:/path/to/mcp-playwright-browser"
    }
  }
}

3. (Optional) Disable Chrome background apps

Prevents profile locking:

Chrome Settings → Advanced → System →
☐ Continue running background apps when Google Chrome is closed

4. Verify

scripts\run-dom-headless.bat -p "Use MCP server playwrightBrowser. Launch browser. Go to https://example.com. Take a snapshot. Close."

Profile Launchers

Each .bat file pre-configures everything (browser type, stealth, profile, environment variables) and starts Gemini with the right system instructions. You never need to configure Gemini manually.

Available Profiles

| Script | Browser | Mode | Best For |
| --- | --- | --- | --- |
| run-dom-headless.bat | Chromium | Headless | ⚡ Bulk scraping, fastest |
| run-visual-headful.bat | Chromium | Visible + Screenshots | Debugging, visual verification |
| run-chrome-profile.bat | Real Chrome | Your profile | Logged-in sessions, form filling |
| run-cdp-profile.bat | Real Chrome | CDP | Maximum stealth |
| run-cdp-profile-screen.bat | Real Chrome | CDP + Visual | CDP with screenshot analysis |
| run-cdp-profile-persist.bat | Real Chrome | CDP + Persistent | Long sessions, multi-step flows |
| run-cdp-profile-screen-persist.bat | Real Chrome | CDP + Visual + Persistent | Full power mode |

Interactive Mode (Chat)

# Start Gemini and chat with it
scripts\run-chrome-profile.bat --kill-chrome

# Then just type:
# "Fill out the job application at [URL] using my CV"
# "Go to LinkedIn and apply to the first 5 jobs"
# "Extract all AI engineer jobs from Indeed and save them"

One-Shot Mode (Automation)

# Run a task and get a log file
scripts\run-dom-headless.bat -p "Your full task here"

# With custom output
scripts\run-dom-headless.bat -p "Extract 50 jobs from Indeed" --output logs\jobs.log

# Chrome profile one-shot
scripts\run-chrome-profile.bat --kill-chrome -p "Submit application at [URL]" --output logs\apply.log

Logs are auto-saved to logs/ with timestamps.

Profile Details

run-dom-headless.bat — Fastest

  • Chromium headless (no GUI)
  • Best for: bulk extraction, scraping, background tasks
  • Token usage: lowest (no screenshots)

run-visual-headful.bat — Debugging

  • Chromium with visible window
  • Screenshot-based navigation available
  • Best for: troubleshooting, visual verification

run-chrome-profile.bat — Authenticated Sessions

  • Real Chrome with your existing logged-in profile
  • Already signed into Gmail, LinkedIn, job sites
  • Use --kill-chrome to free profile before starting
  • Best for: job applications, authenticated scraping

run-cdp-profile.bat — Maximum Stealth

  • Connects to real Chrome via Chrome DevTools Protocol
  • Hardest for sites to detect as automation
  • Best for: sites that block Playwright/Chromium
  • Auto-closes any existing Chrome using the profile before launch

run-cdp-profile-persist.bat — Long Sessions

  • CDP mode with persistent browser (doesn't close between tasks)
  • Best for: multi-step workflows where browser state must survive

All 71 MCP Tools

Capture Profile Control

| Tool | Description |
| --- | --- |
| browser.set_capture_profile | Set light / balanced / full profile. Controls token usage across all tools. Call this first. |
| browser.get_capture_profile | Show current profile settings and payload budget. |

Browser Lifecycle

| Tool | Description |
| --- | --- |
| browser.launch | Launch Chromium with options: headless, stealth, userDataDir, profileDirectory, channel, slowMo, args |
| browser.launch_chrome_cdp | Launch real Chrome with remote debugging + connect in one step |
| browser.connect_cdp | Connect to existing Chrome with --remote-debugging-port |
| browser.close | Close browser session |
| browser.reload | Reload current page |

Multi-Tab Management

| Tool | Description |
| --- | --- |
| browser.new_page | Open new tab, tracked by page manager |
| browser.list_pages | List all open tabs with pageId, url, title, active/closed state |
| browser.select_page | Switch active tab by pageId |
| browser.close_page | Close a specific tab by pageId |
| browser.list_frames | List all iframes on the current page |
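
To make the idea of stable pageIds concrete, here is a small sketch of a page registry. It is not the code in src/browser/pages.js: the class name, the `page-N` id format, and the rule for choosing the next active tab are assumptions made for illustration.

```javascript
// Hypothetical page manager: each tab gets an id once, and ids are never reused.
class PageManagerSketch {
  constructor() {
    this.pages = new Map(); // pageId -> { page, closed }
    this.nextId = 1;
    this.activeId = null;
  }

  // Track a new tab (or popup) and make it the active one.
  register(page) {
    const pageId = `page-${this.nextId++}`;
    this.pages.set(pageId, { page, closed: false });
    this.activeId = pageId;
    return pageId;
  }

  // Switch the active tab; closed or unknown ids are rejected.
  select(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry || entry.closed) throw new Error(`Unknown or closed pageId: ${pageId}`);
    this.activeId = pageId;
    return entry.page;
  }

  // Mark a tab closed; if it was active, fall back to the newest open tab.
  close(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry) throw new Error(`Unknown pageId: ${pageId}`);
    entry.closed = true;
    if (this.activeId === pageId) {
      const open = [...this.pages].filter(([, e]) => !e.closed).map(([id]) => id);
      this.activeId = open.length ? open[open.length - 1] : null;
    }
  }

  list() {
    return [...this.pages].map(([pageId, e]) => ({
      pageId,
      active: pageId === this.activeId,
      closed: e.closed,
    }));
  }
}
```

Keeping closed entries in the map, instead of deleting them, is what lets a listing report an active/closed state for every id the assistant has seen.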

Navigation

| Tool | Description |
| --- | --- |
| browser.goto | Navigate to URL with configurable waitUntil and timeout |
| browser.back | Go back in history |
| browser.forward | Go forward in history |
| browser.wait | Wait for selector or fixed ms |
| browser.wait_for | Smart wait: selector, text, or uid (A11y) |

Event & Dialog Handling

| Tool | Description |
| --- | --- |
| browser.list_dialogs | List pending JS dialogs (alert, confirm, prompt) |
| browser.handle_dialog | Accept or dismiss a dialog, optionally with input text |
| browser.wait_for_download | Block until a download starts, returns downloadId |
| browser.save_download | Save a captured download to a specific path |
| browser.wait_for_popup | Wait for a new tab/popup to open, returns its pageId |
| browser.expect_event | Listen for a one-time event: dialog, download, navigation, request, response |

Session & Cookie Management

| Tool | Description |
| --- | --- |
| browser.get_cookies | List cookies, optionally filtered by URL |
| browser.set_cookies | Inject cookies into browser session |
| browser.clear_cookies | Clear all or URL-specific cookies |
| browser.export_storage_state | Export full session state (cookies + localStorage) to JSON file |
| browser.import_storage_state | Restore session from previously exported JSON |

Scroll Control

| Tool | Description |
| --- | --- |
| browser.get_scroll_state | Returns scrollY, scrollHeight, atTop, atBottom, viewport info |
| browser.scroll_by | Scroll page by delta pixels (vertical + horizontal) |
| browser.scroll_to | Scroll to absolute position |
| browser.get_scrollables | Detect all scrollable containers on the page |
| browser.get_container_scroll_state | Scroll metrics for a specific container selector |
| browser.scroll_container | Scroll a specific container by selector |

Page Reading & Snapshots

| Tool | Description |
| --- | --- |
| browser.snapshot | Plain text page summary: title, text, links, optional headings + forms summary |
| browser.take_snapshot | A11y tree via CDP: roles, names, UIDs (ax-{nodeId}), depth, state |
| browser.query_dom | Flexible selector query: text, value, bbox, visibility, state, tagName |
| browser.evaluate | Execute JavaScript (requires MCP_ALLOW_EVALUATE=true, origin-gated) |

Element Interaction

| Tool | Description |
| --- | --- |
| browser.list | List visible interactive elements with elementId, tag, text, href |
| browser.click | Click by elementId, uid, selector, or text |
| browser.hover | Hover over element (triggers dropdown menus, tooltips) |
| browser.type | Simulate keypress-by-keypress typing |
| browser.fill | Direct value fill (faster, no keypress simulation) |
| browser.press | Press keyboard key (Enter, Tab, Escape, etc.) |
| browser.set_input_files | Upload file to input[type=file] |
| browser.scroll_to_uid | Scroll a UID element into view |

Visual Navigation

| Tool | Description |
| --- | --- |
| browser.screenshot | Save screenshot to path |
| browser.visual_snapshot | Screenshot + element map with bounding boxes and IDs |
| browser.click_at | Click at viewport-relative X/Y coordinates |
| browser.click_at_page | Click at document-absolute X/Y coordinates |

Data Extraction

| Tool | Description |
| --- | --- |
| browser.extract_text | Extract text from CSS selector (single or all matches) |
| browser.extract_html | Extract outerHTML from selector |

Form Automation

| Tool | Description |
| --- | --- |
| browser.form_audit | Scan page for all unfilled required fields: text, select, radio, checkbox, contenteditable |
| browser.fill_form | Fill a list of {label, selector, value, kind} fields, label-driven or selector-driven |
| forms.google_audit | Google Forms specialist: list all questions and check aria-checked for answers |
| forms.google_set_text | Fill a Google Forms text question by question text |
| forms.google_set_dropdown | Select option in Google Forms dropdown |
| forms.google_set_checkbox | Check/uncheck Google Forms checkbox |
| forms.google_set_radio | Select option in Google Forms radio group |
| forms.google_set_grid | Select option in Google Forms grid question |

Observability

| Tool | Description |
| --- | --- |
| browser.list_console_messages | Show captured console.log/warn/error from the page |
| browser.list_network_requests | Show all network requests (URL, method, status, timing) |
| browser.get_network_request | Get full details for a specific request by ID |

File Operations

| Tool | Description |
| --- | --- |
| files.read_text | Read text file (restricted to allowed paths) |
| files.read_pdf_text | Extract text from PDF (used to read CV files) |
| files.list_dir | List directory contents |
| files.write_text | Write text to file (restricted to output/ and logs/) |

Specialized Extractors (Production Examples)

| Tool | Description |
| --- | --- |
| jobs.extract_indeed | Extract Indeed job listings with multi-selector fallbacks, deduplication, access detection |
| jobs.indeed_next_page | Navigate to next Indeed page (direct URL, click, or auto mode) |
| search.google | Open Google search and extract results with consent handling |
| search.extract_google | Extract results from current Google search page |

Architecture

Module Structure

src/
├── mcp-browser-server.js      # Main server: tool registration, env config, middleware
├── extractors.js              # Indeed + Google specialized extractors
├── browser/
│   ├── pages.js               # Multi-tab page manager (stable pageIds)
│   ├── snapshot.js            # A11y tree via CDP Accessibility.getFullAXTree
│   ├── capture-profiles.js    # light/balanced/full × low/high = 30 preset configs
│   ├── payload-budget.js      # Hard 280KB response ceiling with graceful truncation
│   ├── cdp.js                 # CDP session, click/hover/scroll by backendNodeId
│   ├── dom-version.js         # DOM mutation tracking, frame management
│   ├── forms.js               # Form audit + intelligent form fill
│   ├── observability.js       # Console + network request capture via CDP
│   └── wait.js                # Smart wait: selector, text, uid
└── security/
    └── paths.js               # Read/write path allowlist enforcement

Tool Registration Middleware

Every tool goes through a wrapper that runs before and after the handler:

AI calls tool
      │
      ▼
assign requestId
      │
      ▼
run handler
      │
      ▼
normalize errors (structured, no stack traces)
      │
      ▼
add envelope (ok, requestId, timestamp, url, domVersion)
      │
      ▼
enforcePayloadCeiling (truncate if > 280KB)
      │
      ▼
send to AI

This means every tool automatically benefits from error safety and payload budgeting without any extra code per tool.
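
A wrapper along these lines could implement the pipeline above. This is a sketch under stated assumptions, not the code in mcp-browser-server.js: `wrapTool`, `getContext`, and the shape of the error object are hypothetical, and the ceiling function is passed in so the example stays self-contained.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical middleware: wraps a tool handler with the steps in the diagram.
// getContext() is assumed to return the current { url, domVersion }.
function wrapTool(handler, getContext, enforceCeiling = (body) => body) {
  return async function wrapped(args) {
    const requestId = randomUUID(); // assign requestId
    let body;
    try {
      body = { ok: true, ...(await handler(args)) }; // run handler
    } catch (err) {
      // Normalize errors: structured fields only, no stack trace.
      body = {
        ok: false,
        error: { name: err?.name ?? "Error", message: String(err?.message ?? err) },
      };
    }
    const { url, domVersion } = getContext();
    // Add the envelope, then apply the payload ceiling as the last step.
    return enforceCeiling({
      ...body,
      requestId,
      timestamp: new Date().toISOString(),
      url,
      domVersion,
    });
  };
}
```

Because the handler only returns its own data, adding a new tool needs no extra code for error handling or size limits.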

UID System

The A11y snapshot (browser.take_snapshot) assigns every node a stable UID in the format ax-{nodeId}, tied to the CDP backendDOMNodeId. This UID can then be used with:

  • browser.click({ uid: "ax-123" }) — clicks via CDP directly on the backend node
  • browser.scroll_to_uid({ uid: "ax-123" }) — scrolls it into view first
  • browser.wait_for({ uid: "ax-123" }) — waits until it's visible

CDP-native clicks are more reliable than selector-based clicks because they bypass CSS selector resolution and work even in Shadow DOM.
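
The sketch below shows how a UID click might be carried out over CDP. It assumes the numeric part of the UID can be passed as `backendNodeId`, and that `session` is a CDP session object with a `send(method, params)` function, such as the one Playwright returns from `page.context().newCDPSession(page)`. The helper names are hypothetical; the three CDP commands are standard ones, but whether src/browser/cdp.js uses exactly these is not confirmed by this README.

```javascript
// Parse "ax-123" into the numeric node id (assumed to be a backend node id).
function parseUid(uid) {
  const match = /^ax-(\d+)$/.exec(uid);
  if (!match) throw new Error(`Not an A11y uid: ${uid}`);
  return Number(match[1]);
}

// Click the center of the node's content box using raw CDP commands.
async function clickByUid(session, uid) {
  const backendNodeId = parseUid(uid);
  await session.send("DOM.scrollIntoViewIfNeeded", { backendNodeId });
  const { model } = await session.send("DOM.getBoxModel", { backendNodeId });
  const q = model.content; // quad: [x1, y1, x2, y2, x3, y3, x4, y4]
  const x = (q[0] + q[2] + q[4] + q[6]) / 4;
  const y = (q[1] + q[3] + q[5] + q[7]) / 4;
  for (const type of ["mousePressed", "mouseReleased"]) {
    await session.send("Input.dispatchMouseEvent", {
      type,
      x,
      y,
      button: "left",
      clickCount: 1,
    });
  }
  return { x, y };
}
```

No CSS selector is involved at any point, which is why this style of click is unaffected by Shadow DOM boundaries or obfuscated ids.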


Token Efficiency: Capture Profiles

This is the most important v2.0 feature for real-world use.

The Problem

AI context windows are finite. Every tool response consumes tokens. A naive implementation that dumps everything on every call quickly exhausts the budget.

The Solution: Three Profiles

Set the profile once at session start, and every subsequent tool call automatically uses appropriate limits:

browser.set_capture_profile({ profile: "light" })

| Profile | Snapshot chars | List items | A11y nodes | Best For |
| --- | --- | --- | --- | --- |
| light | 6,000–9,000 | 120–180 | 220–320 | Job scraping, bulk tasks |
| balanced | 12,000–16,000 | 240–320 | 440–700 | Form filling, research |
| full | 20,000 | 500 | 1,200–2,000 | Deep debugging only |
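
One way to picture the profile table is as a lookup keyed by profile and detail level. The sketch below assumes that the lower bound of each range applies to detail "low" and the upper bound to detail "high"; that mapping, the field names, and `resolveLimits` are illustrative guesses, not the contents of capture-profiles.js.

```javascript
// Limits copied from the table; the low/high split is an assumption.
const CAPTURE_PROFILES = {
  light: {
    low: { snapshotChars: 6000, listItems: 120, a11yNodes: 220 },
    high: { snapshotChars: 9000, listItems: 180, a11yNodes: 320 },
  },
  balanced: {
    low: { snapshotChars: 12000, listItems: 240, a11yNodes: 440 },
    high: { snapshotChars: 16000, listItems: 320, a11yNodes: 700 },
  },
  full: {
    low: { snapshotChars: 20000, listItems: 500, a11yNodes: 1200 },
    high: { snapshotChars: 20000, listItems: 500, a11yNodes: 2000 },
  },
};

// Look up the limits a tool call should apply.
function resolveLimits(profile, detail) {
  const levels = CAPTURE_PROFILES[profile];
  if (!levels) throw new Error(`Unknown capture profile: ${profile}`);
  const limits = levels[detail];
  if (!limits) throw new Error(`Unknown detail level: ${detail}`);
  return limits;
}
```

With a table like this, setting the profile once is enough: each tool asks for its limits instead of carrying its own defaults.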

Two Detail Levels Per Profile

Within each profile, tools accept detail: "low" or detail: "high":

browser.snapshot({ detail: "low" })   # minimal, fast
browser.snapshot({ detail: "high" })  # more text, links, headings, form summary

The Capture Ladder in Practice

The profile system instructions teach Gemini to escalate only when needed:

✅ "I need to find the Apply button"
→ browser.snapshot (low)           # did I find it in plain text? usually yes
→ browser.list (low)               # still looking? check interactive elements
→ browser.take_snapshot (low)      # need uid for reliable click? A11y tree
→ browser.visual_snapshot (low)    # shadow DOM / can't find it at all? visual fallback

In light mode, this entire ladder costs roughly 8x fewer tokens than v1.0's single dump approach.

Hard Payload Budget

Even with capture profiles, some pages are just huge. The payload budget is a safety net:

  • Default ceiling: 280KB per response
  • If exceeded: truncate progressively (arrays → strings → object keys)
  • Include retryWith field: { detail: "low", maxItems: 80, limit: 20 }
  • Gemini reads this and retries with smaller parameters
  • Absolute fallback: { truncated: true, truncationReason: "..." }

The budget is configurable: MCP_MAX_RESPONSE_BYTES=150000 for tighter contexts.
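
From the caller's side, honoring a retryWith hint amounts to merging it into the next call's arguments. The sketch below is illustrative: Gemini does this on its own when it reads the hint, and `callWithRetry` and `callTool` are hypothetical names.

```javascript
// Hypothetical client-side retry loop driven by the retryWith hint.
// callTool(name, args) -> Promise<response>
async function callWithRetry(callTool, name, args, maxAttempts = 3) {
  let currentArgs = { ...args };
  let response;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    response = await callTool(name, currentArgs);
    // Done if the response fit in the budget or carries no hint to act on.
    if (!response.truncated || !response.retryWith) return response;
    // Merge the hint over the previous arguments and try again.
    currentArgs = { ...currentArgs, ...response.retryWith };
  }
  return response; // still truncated after maxAttempts; caller decides what to do
}
```

Capping the attempts matters: a page that stays over budget even at the smallest settings should surface as a truncated result, not loop forever.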


Common Use Cases

Job Application (Chrome Profile)

# Start with your real logged-in Chrome
scripts\run-chrome-profile.bat --kill-chrome

In Gemini:

Set capture profile to light.
Go to [application URL].
Run form_audit to see all required fields.
Fill them using fill_form with my details from Applied Jobs/CODEX/maincv.md.
Before submitting, take a screenshot and ask me to confirm.

Bulk Job Scraping (Headless)

scripts\run-dom-headless.bat -p "Use playwrightBrowser. Launch browser headless. Go to https://ae.indeed.com/q-ai-engineer-l-dubai-jobs.html. Extract jobs with jobs.extract_indeed limit 20, save to output/indeed/page-1. Go to next page with jobs.indeed_next_page. Extract again, save to output/indeed/page-2. Close."

Session Persistence (Login Once, Reuse)

# First time: login manually and export session
scripts\run-cdp-profile.bat

In Gemini:

Go to linkedin.com and wait for me to log in.
After I confirm logged in, run browser.export_storage_state to output/linkedin-session.json.

Next time:

Run browser.import_storage_state from output/linkedin-session.json.
Go to linkedin.com — should be logged in already.
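
For readers curious what sits underneath, Playwright itself exposes storage state through `browserContext.storageState({ path })` and `browser.newContext({ storageState })`. The sketch below wraps those two calls. That the export and import tools are built on them is an assumption, and the helper names are hypothetical.

```javascript
// Save cookies + localStorage of a Playwright BrowserContext to a JSON file.
async function exportSession(context, filePath) {
  return context.storageState({ path: filePath });
}

// Create a fresh context that starts with the saved session.
async function importSession(browser, filePath) {
  return browser.newContext({ storageState: filePath });
}
```

Usage would look like `await exportSession(context, "output/linkedin-session.json")` after logging in, and `await importSession(browser, "output/linkedin-session.json")` on the next run.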

Google Form Automation

scripts\run-dom-headless.bat

In Gemini:

Go to [Google Form URL].
Run forms.google_audit to see all questions.
Fill each question using the appropriate forms.google_set_* tool.
Run forms.google_audit again to verify all answered.
Submit.

PDF CV Reading

Gemini can read your CV directly without you pasting it:

Read my CV from Applied Jobs/CODEX/maincv.md using files.read_text.
Or read the PDF version: files.read_pdf_text from Applied Jobs/CODEX/CV.pdf.
Use that information to fill the job application form.

Debugging with Visual Mode

scripts\run-visual-headful.bat

In Gemini:

Go to [URL].
Take a visual_snapshot and save to output/debug.png.
Tell me what you see and identify any unusual elements.

Environment Variables

All variables have dual names for Gemini CLI compatibility. The launchers set both:

| Variable | Alias | Description |
| --- | --- | --- |
| MCP_HEADLESS | GEMINI_CLI_MCP_HEADLESS | true/false: run without GUI |
| MCP_STEALTH | GEMINI_CLI_MCP_STEALTH | true/false: enable anti-detection |
| MCP_CHANNEL | GEMINI_CLI_MCP_CHANNEL | chrome: use real Chrome |
| MCP_EXECUTABLE_PATH | GEMINI_CLI_MCP_EXECUTABLE_PATH | Absolute path to chrome.exe |
| MCP_USER_DATA_DIR | GEMINI_CLI_MCP_USER_DATA_DIR | Chrome profile directory |
| MCP_PROFILE | GEMINI_CLI_MCP_PROFILE | Profile name: Default, Profile 3 |
| MCP_CDP_ENDPOINT | GEMINI_CLI_MCP_CDP_ENDPOINT | CDP URL: http://127.0.0.1:9222 |
| MCP_CDP_PORT | GEMINI_CLI_MCP_CDP_PORT | CDP port number (default 9222) |
| MCP_CDP_AUTO_CLOSE | GEMINI_CLI_MCP_CDP_AUTO_CLOSE | Close Chrome on server exit |
| MCP_FORCE_CDP | GEMINI_CLI_MCP_FORCE_CDP | Disable browser.launch (CDP-only mode) |
| MCP_REQUIRE_PROFILE | GEMINI_CLI_MCP_REQUIRE_PROFILE | Require userDataDir (prevent bare Chromium) |
| MCP_ALLOW_EVALUATE | GEMINI_CLI_MCP_ALLOW_EVALUATE | Enable browser.evaluate tool |
| MCP_EVALUATE_ALLOW_ORIGINS | GEMINI_CLI_MCP_EVALUATE_ALLOW_ORIGINS | Comma-separated allowed origins for evaluate |
| MCP_CAPTURE_PROFILE | GEMINI_CLI_MCP_CAPTURE_PROFILE | Default profile: light, balanced, full |
| MCP_MAX_RESPONSE_BYTES | GEMINI_CLI_MCP_MAX_RESPONSE_BYTES | Override 280KB payload ceiling |
| MCP_SLOWMO_MS | GEMINI_CLI_MCP_SLOWMO_MS | Slow down actions by N ms (debugging) |

Why dual names? Gemini CLI sanitizes environment variables and may strip MCP_* prefixed keys. The GEMINI_CLI_MCP_* variants bypass this filtering. The server reads both and uses whichever is set.
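
Reading a setting under either name can be done with a small helper. This is a sketch, not the server's code: the helper names are hypothetical, and checking the MCP_* name before its alias is an assumption, since the text above only says the server uses whichever is set.

```javascript
// Return the first non-empty value among NAME and GEMINI_CLI_NAME.
// Precedence (MCP_* first) is an assumption made for this sketch.
function readEnv(name, env = process.env) {
  for (const key of [name, `GEMINI_CLI_${name}`]) {
    const value = env[key];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Parse "true"/"false" style flags such as MCP_HEADLESS.
function readBool(name, fallback, env = process.env) {
  const raw = readEnv(name, env);
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

// Parse positive integers such as MCP_MAX_RESPONSE_BYTES.
function readInt(name, fallback, env = process.env) {
  const raw = readEnv(name, env);
  if (raw === undefined) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

Passing `env` as a parameter keeps the helpers easy to test without touching the real process environment.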


Project Structure

mcp-playwright-browser/
│
├── src/
│   ├── mcp-browser-server.js        # Main server (71 tools, middleware, env config)
│   ├── extractors.js                # Indeed + Google production extractors
│   ├── browser/
│   │   ├── pages.js                 # Multi-tab page manager
│   │   ├── snapshot.js              # A11y tree (CDP Accessibility API)
│   │   ├── capture-profiles.js      # Token budget profiles (light/balanced/full)
│   │   ├── payload-budget.js        # Hard response size ceiling
│   │   ├── cdp.js                   # CDP primitives (click, hover, scroll by nodeId)
│   │   ├── dom-version.js           # DOM mutation tracking + frame management
│   │   ├── forms.js                 # Form audit + intelligent fill
│   │   ├── observability.js         # Console + network capture
│   │   └── wait.js                  # Smart wait (selector, text, uid)
│   ├── security/
│   │   └── paths.js                 # File read/write path allowlist
│   └── tests/
│       ├── page-manager-test.js
│       ├── security-paths-test.js
│       ├── snapshot-uid-test.js
│       ├── uid-click-fill-test.js
│       ├── elementid-no-stale-test.js
│       ├── wait-for-test.js
│       ├── form-audit-fill-test.js
│       ├── console-network-test.js
│       ├── visual-coords-test.js
│       ├── frame-domversion-test.js
│       ├── cdp-hover-test.js
│       ├── browser-events-test.js
│       ├── storage-state-test.js
│       ├── capture-profiles-test.js
│       ├── payload-budget-test.js
│       ├── google-form-test.js
│       ├── google-test.js
│       └── indeed-test.js
│
├── scripts/
│   ├── run-dom-headless.bat          # Fastest: headless Chromium
│   ├── run-visual-headful.bat        # Visual: Chromium + screenshots
│   ├── run-chrome-profile.bat        # Auth: real Chrome with your profile
│   ├── run-cdp-profile.bat           # Stealth: CDP mode
│   ├── run-cdp-profile-screen.bat    # Stealth + visual
│   ├── run-cdp-profile-persist.bat   # Stealth + persistent session
│   ├── run-cdp-profile-screen-persist.bat  # Full power
│   ├── autoconnect.js                # CDP auto-connect helper
│   └── .gemini/settings.json         # Fallback MCP config
│
├── profiles/
│   ├── dom/
│   │   ├── system.md                 # Gemini system instructions (DOM mode)
│   │   └── oneshot.md                # One-shot variant (closes browser at end)
│   ├── visual/
│   │   ├── system.md
│   │   └── oneshot.md
│   ├── cdp/
│   │   ├── system.md
│   │   ├── oneshot.md
│   │   └── persistent.md
│   └── cdp-visual/
│       ├── system.md
│       ├── oneshot.md
│       └── persistent.md
│
├── .gemini/settings.json             # Main MCP config (set your cwd here)
├── GEMINI.md                         # Project-level Gemini instructions
├── LICENSE                           # ISC License
└── README.md

Running Tests

# All tests that don't need network
npm run test:local

# Live network tests (Indeed + Google)
npm run test:remote

# Everything
npm run test:all

Troubleshooting

"Chrome is already running" / Profile locked

# Use --kill-chrome
scripts\run-chrome-profile.bat --kill-chrome

# Or manually
taskkill /F /IM chrome.exe

Chrome 136+ ignores the remote-debugging switches when it is pointed at the default User Data directory, so automation against that directory no longer works. Always use a dedicated profile or the ChromeForMCP data dir.

"Gmail says browser is not safe"

You're connected via Chromium, not your real Chrome. Ensure:

  1. Chrome is fully closed before starting (--kill-chrome)
  2. The launch response shows "persistent": true and your profile path
  3. If not, restart Gemini and verify that the .bat output includes the line "Using Chrome executable: ..."

MCP tools not found in Gemini

  • Run any .bat from any directory — they auto-fix cwd
  • Verify .gemini/settings.json has the correct cwd
  • The scripts/.gemini/settings.json is a fallback if Gemini starts in scripts/

Responses truncated / retryWith hint

This is the payload budget working correctly. Gemini will read the retryWith hint and retry with lower parameters. If it keeps happening, switch to light profile:

browser.set_capture_profile({ profile: "light" })

Slow performance

  • Use run-dom-headless.bat for bulk operations (no GUI = 3-4x faster)
  • Avoid browser.extract_html — it returns full HTML and wastes tokens
  • Use detail: "low" on all tools unless you specifically need more

Browser opens but ignores my profile

Check .bat output for:

Using Chrome executable: C:\Program Files\Google\Chrome\Application\chrome.exe
Using Chrome profile: Profile 3

If you see a different profile or "not found", edit the .bat and set MCP_PROFILE explicitly.


Security & Privacy

Path Restrictions

browser.evaluate (arbitrary JS execution) is disabled by default. To enable it, set MCP_ALLOW_EVALUATE=true explicitly; MCP_EVALUATE_ALLOW_ORIGINS can then limit it to specific origins.

files.read_text and files.write_text are restricted to:

  • Read: Applied Jobs/, Auto/output/, Auto/logs/
  • Write: Auto/output/, Auto/logs/

Any attempt to read or write outside these paths throws immediately. Symlinks are resolved before checking (prevents traversal attacks).
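
An allowlist check of this kind can be sketched with Node's built-in `fs.realpathSync` and `path.relative`. This is an illustration of the technique, not the code in src/security/paths.js; the function names are hypothetical, and the fallback for files that do not exist yet (resolving the parent directory) is one possible design.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// True when `target` is `root` itself or lies somewhere beneath it.
function isInside(root, target) {
  const rel = path.relative(root, target);
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
  );
}

// Resolve symlinks. For a file that does not exist yet (a write),
// resolve its parent directory and re-attach the file name.
function realPathOf(target) {
  const resolved = path.resolve(target);
  try {
    return fs.realpathSync(resolved);
  } catch {
    const parent = fs.realpathSync(path.dirname(resolved));
    return path.join(parent, path.basename(resolved));
  }
}

// Throw unless the real path falls inside one of the allowed roots.
function assertAllowed(target, allowedRoots) {
  const real = realPathOf(target);
  const ok = allowedRoots.some((root) => isInside(fs.realpathSync(root), real));
  if (!ok) throw new Error(`Path not allowed: ${target}`);
  return real;
}
```

Comparing real paths on both sides is the important part: a `..` segment or a symlink that points outside an allowed root is resolved before the comparison, so it cannot slip past the check.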

What Is Stored

| Data | Location | Git-ignored |
| --- | --- | --- |
| Execution logs | logs/ | ✅ Yes |
| Extracted jobs/data | output/ | ✅ Yes |
| Session state exports | output/ | ✅ Yes |
| Gemini CLI state | scripts/.gemini/state.json | ✅ Yes |
| .gemini/ config | root .gemini/ | ✅ Yes |

What Is Never Stored

  • ❌ Passwords or credentials
  • ❌ Credit card or payment information
  • ❌ Browser history
  • ❌ Personal documents outside the allowed paths

Ethical Use

This tool is provided for:

  • Learning browser automation and MCP development
  • Testing your own web applications
  • Automating tasks on sites you have permission to access
  • Legitimate job searching and application workflows

You are responsible for:

  • Respecting robots.txt and website Terms of Service
  • Complying with data protection regulations (GDPR, CCPA, etc.)
  • Rate-limiting your requests to avoid service disruption
  • Not using this to bypass paywalls or access controls without authorization

The authors assume no liability for misuse. Use responsibly.


How This Differs from Microsoft's Official playwright-mcp

Microsoft's playwright-mcp focuses on accessibility-tree based automation for test development in structured environments.

| Feature | Microsoft playwright-mcp | This project |
| --- | --- | --- |
| Navigation | Accessibility tree | Hybrid: DOM + A11y + Visual |
| Philosophy | "Blind" automation (fast, structured) | Human-like automation (robust, adaptive) |
| Primary use case | QA testing, defined workflows | Open-web agents, scraping, complex UIs |
| Token efficiency | Not optimized | Capture profiles + hard payload budget |
| Session persistence | Basic | Cookie/storage export-import |
| Form intelligence | Manual | form_audit + fill_form + Google Forms specialist |
| Multi-tab | Basic | Full page manager with stable pageIds |
| Setup | Generic | Batteries included (stealth, profiles, launchers) |

Use Microsoft's for: CI/CD test automation, structured accessibility-driven workflows

Use this for: Autonomous agents operating on the open web, job application automation, anti-detection scraping


Changelog

v2.0.0 (Current)

  • Complete architectural rewrite: monolithic → 11 modular files
  • 71 MCP tools (was 23)
  • Capture profile system (light/balanced/full) for token efficiency
  • Hard 280KB payload budget with graceful truncation and retryWith hints
  • Multi-tab page manager (list, select, close pages)
  • A11y tree snapshots via CDP with stable ax- UIDs
  • CDP-native click/hover/scroll by backendDOMNodeId (handles Shadow DOM)
  • Form audit + intelligent fill + Google Forms specialist (6 tools)
  • Session export/import (cookie + localStorage persistence)
  • Popup, dialog, download event handling
  • Scroll awareness: get state, scroll by delta, scroll containers
  • Network + console observability via CDP
  • File reading: text files + PDF extraction
  • Security: path allowlist enforcement, evaluate guard
  • 18-test suite (was 2)
  • 7 profile launchers (was 5): added persist variants for CDP
  • GEMINI_CLI_MCP_* dual env var support for Gemini sanitization

v1.1.0

  • Profile launcher system (.bat files)
  • Chrome profile integration
  • --kill-chrome flag
  • One-shot mode with automatic logging
  • GEMINI_CLI_MCP_* environment variable aliases
  • browser.visual_snapshot and browser.click_at

v1.0.0

  • Initial release
  • Basic MCP server with Playwright
  • Indeed + Google extractors
  • DOM and visual navigation

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Run npm run test:local to verify nothing breaks
  4. Commit (git commit -m 'Add your feature')
  5. Push and open a Pull Request

License

ISC License — see LICENSE file.

