feat(mcp): add compress option to browser_snapshot to collapse repeated ARIA nodes #41395

Description

@Josef-Le

Problem

On pages with large lists, data grids, or autocomplete menus, the ARIA snapshot returned by browser_snapshot can grow to thousands of lines. A GitHub issues page with 100 open issues produces ~1 800 lines; a spreadsheet with 500 rows produces ~6 000 lines. The vast majority of that content is structurally identical — only the text and element ref differ — so the model learns nothing from seeing item #51 through #500.

This wastes tokens, slows inference, and can cause the model to miss important interactive elements buried in the noise.

Proposed solution

Add a compress?: boolean parameter to browser_snapshot.
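To make the opt-in concrete, here is a sketch of the request an MCP client might send once the parameter exists. The `tools/call` envelope is the standard MCP shape; the `BrowserSnapshotArgs` interface name is hypothetical, and `compress` is the proposed argument, not something browser_snapshot accepts today.

```typescript
// Hypothetical argument type: only `compress` comes from this proposal.
interface BrowserSnapshotArgs {
  // Proposed flag. Leaving it out (or passing false) keeps the full snapshot.
  compress?: boolean;
}

// JSON-RPC 2.0 request using the MCP `tools/call` method.
const request = {
  jsonrpc: '2.0' as const,
  id: 1,
  method: 'tools/call' as const,
  params: {
    name: 'browser_snapshot',
    arguments: { compress: true } as BrowserSnapshotArgs,
  },
};

console.log(JSON.stringify(request, null, 2));
```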

When true, a two-pass algorithm (steps 1 and 2) collapses repeated sibling nodes, and a final step appends an explanatory note:

  1. Pre-scan (safety gate): Normalise every line to a structural signature (strip [ref=eN], accessible names, and numbers) and count how often each (indent, signature) pair appears. Only proceed if the maximum count exceeds 100 — this prevents false positives on diverse pages like dashboards.
  2. Compression pass: Walk the YAML line by line. Keep the first 10 occurrences of any repeated structural pattern; collapse the rest (along with their descendant subtrees). Elements with interactive roles (button, input, link, checkbox, …) are always kept.
  3. Trailing note: Append a line explaining what was removed and directing the model to use browser_evaluate() for full enumeration.
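The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the draft implementation: only `FIRE_THRESHOLD`, `KEEP_N`, and the note text come from this proposal, while `compressAriaSnapshot` and every helper name are hypothetical. The role list holds only the four roles named above. The sketch also assumes that interactive lines inside a collapsed subtree are still kept, which the proposal does not spell out.

```typescript
// Hypothetical sketch. Only FIRE_THRESHOLD, KEEP_N and the note text are from the proposal.
const FIRE_THRESHOLD = 100; // compress only if some pattern repeats more often than this
const KEEP_N = 10;          // occurrences of each pattern that are kept
// Only the roles the proposal names; its list continues beyond these.
const INTERACTIVE_ROLES: readonly string[] = ['button', 'input', 'link', 'checkbox'];

// Number of leading whitespace characters.
const indentOf = (line: string): number => line.search(/\S|$/);

// Structural signature: drop refs, quoted accessible names, and numbers.
function signatureOf(line: string): string {
  return line
    .trim()
    .replace(/\[ref=e\d+\]/g, '')
    .replace(/"(?:[^"\\]|\\.)*"/g, '')
    .replace(/\d+/g, '')
    .replace(/\s+/g, ' ');
}

const keyOf = (line: string): string => `${indentOf(line)}|${signatureOf(line)}`;

function isInteractive(line: string): boolean {
  const match = /^\s*- ([a-z]+)/.exec(line);
  return match !== null && INTERACTIVE_ROLES.indexOf(match[1]) !== -1;
}

function compressAriaSnapshot(yaml: string): string {
  const lines = yaml.split('\n').filter(line => line.trim() !== '');

  // Step 1, pre-scan: count each (indent, signature) pair and apply the gate.
  const counts: Record<string, number> = {};
  let maxCount = 0;
  for (const line of lines) {
    const key = keyOf(line);
    counts[key] = (counts[key] ?? 0) + 1;
    maxCount = Math.max(maxCount, counts[key]);
  }
  if (maxCount <= FIRE_THRESHOLD) return yaml; // gate not met: pass through

  // Step 2, compression pass: keep the first KEEP_N of each pattern.
  const seen: Record<string, number> = {};
  const kept: string[] = [];
  let collapsed = 0;
  let droppedIndent: number | null = null; // indent of the node being collapsed
  for (const line of lines) {
    const indent = indentOf(line);
    if (droppedIndent !== null && indent > droppedIndent) {
      // Descendant of a collapsed node: drop it unless it is interactive (assumption).
      if (isInteractive(line)) kept.push(line);
      continue;
    }
    droppedIndent = null;
    const key = keyOf(line);
    seen[key] = (seen[key] ?? 0) + 1;
    if (seen[key] > KEEP_N && !isInteractive(line)) {
      collapsed++;
      droppedIndent = indent;
      continue;
    }
    kept.push(line);
  }
  if (collapsed === 0) return yaml; // nothing removed, so no note is needed

  // Step 3, trailing note (\u2014 is the dash used in the proposal's example note).
  const note = `[playwright-compress: ${collapsed} repeated ARIA nodes collapsed \u2014 use browser_evaluate() to enumerate the full list]`;
  return `${kept.join('\n')}\n\n${note}`;
}
```

Because the signature here strips only refs, quoted names, and digits, list items whose inline text differs in more than digits would get different signatures and would not be collapsed. A real implementation may want to treat inline text after the colon as part of the name.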

Example

A page with 150 <li>Item N</li> elements:

Before (151 lines: 1 list node plus 150 list items):

- list [ref=e2]:
  - listitem [ref=e3]: Item 1
  - listitem [ref=e4]: Item 2
  ...
  - listitem [ref=e152]: Item 150

After (compress: true, 13 lines):

- list [ref=e2]:
  - listitem [ref=e3]: Item 1
  - listitem [ref=e4]: Item 2
  ...
  - listitem [ref=e12]: Item 10

[playwright-compress: 140 repeated ARIA nodes collapsed — use browser_evaluate() to enumerate the full list]

Implementation plan

  • packages/playwright-core/src/tools/backend/ariaCompression.ts — pure compression function
  • packages/playwright-core/src/tools/backend/snapshot.ts — add compress to browser_snapshot input schema (zod)
  • packages/playwright-core/src/tools/backend/response.ts — wire compress through setIncludeFullSnapshot() to the injection point
  • tests/mcp/snapshot-compression.spec.ts — 4 tests covering: happy path, passthrough on small lists, interactive-element preservation, compress: false escape hatch

Design decisions

| Decision | Rationale |
| --- | --- |
| FIRE_THRESHOLD = 100 | A typical nav/sidebar has <20 items; a real pagination list has >100. This prevents false positives on diverse pages. |
| KEEP_N = 10 | Enough examples for the model to understand the list structure without excessive verbosity. |
| Always keep interactive roles | Buttons, inputs, links, checkboxes, etc. carry distinct meaning even when visually identical. |
| Explicit opt-in (compress: true) | The model chooses when compression is appropriate; it is never forced on. |
| Trailing note | Tells the model that data was removed and provides a concrete action (browser_evaluate()) to retrieve it. |

I intend to work on this and have a draft implementation ready.
