Multi-engine accessibility scans that survive real crawls.
a11y-catscan crawls a website with Playwright and runs four accessibility engines — axe-core, Siteimprove Alfa, IBM Equal Access, and HTML_CodeSniffer — sharing one Chromium instance. Findings are deduped across engines, streamed to JSONL/HTML/JSON reports, and exposed as MCP tools so an LLM can analyze them directly.
Status: beta. Production-shaped and being exercised in dev; the recovery
cycle and worker pool work end-to-end on multi-thousand-page
authenticated crawls. Architecture and per-module design notes
live in DESIGN.md. The site handbook is rendered to
GitHub Pages from docs-src/; see the documentation index
below.
- Four scan engines. axe-core (Deque), Siteimprove Alfa
(ACT-rules native), IBM Equal Access, and HTML_CodeSniffer. Run
one or combine them (`--engine axe,alfa,ibm,htmlcs`); all of them share
one Chromium, so a multi-engine scan isn't 4× the page loads. Each
finding carries an `engine` attribution.
- Cross-engine dedup. Findings sharing `(selector, primary-tag, outcome)`
collapse into one entry with `engines: {axe: ..., ibm: ...}`, and the
merged entry's impact is upgraded to the worst severity any engine
reported. EARL outcomes (`failed`/`cantTell`/`passed`/`inapplicable`)
are the internal vocabulary.
- Streaming reports. JSONL is written one page per line, so memory
stays flat across 5000-page crawls; the HTML report and the
LLM-friendly markdown summary stream from disk on demand.
- Sliding-window async crawler. N-worker pool with one
Chromium, periodic browser restart for memory hygiene
(`restart_every`), atomic state save (`--resume`), graceful
shutdown on SIGTERM/SIGINT, and an on-demand snapshot via SIGUSR1.
- Authenticated scans with mid-scan session recovery. A Python
login plugin authenticates once, and the saved session state
shortcuts subsequent starts. If the session expires mid-crawl, the
scanner drains workers, re-logs-in, bans detected logout-trap URLs,
and resumes. Persistent re-login failure trips a circuit breaker so
the crawl exits instead of looping.
- Allowlist with engine + outcome filters. YAML allowlist suppresses known-acceptable findings by rule, URL, target, engine, and outcome — all AND'd. O(1) average lookup via a rule-id index.
- MCP server. `--mcp` exposes `scan_page`, `analyze_report`,
`find_issues`, `check_page`, `compare_scans`, `manage_scans`,
`lookup_wcag`, and `list_engines` as Claude Code tools. URL
arguments are validated to the http(s) schemes.
- Diff and rescan workflows. `--diff PREV.jsonl` shows
fixed/new/remaining findings; `--rescan PREV.jsonl` re-scans only
pages that previously had issues; `--violations-from` /
`--incompletes-from` extract specific URL sets from prior reports.
- Group-by analysis.
`--group-by {rule, selector, color, reason, wcag, level, engine, bp}`
prints a sorted summary with per-group page counts and one example.
- Niceness + OOM-resistance. Defaults to `nice 10` and
`oom_score_adj=1000`, so the scanner doesn't starve production
services on shared hosts and is the first process the kernel's OOM
killer reclaims under memory pressure.
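The cross-engine dedup rule can be sketched in a few lines of Python. This is a minimal illustration, not a11y-catscan's code: the `Finding` fields, the `SEVERITY` ladder (borrowed from axe-core's minor/moderate/serious/critical impact levels), and the output dict shape are all assumptions.

```python
from dataclasses import dataclass

# Assumed severity ladder (axe-core's impact names); the scanner's own scale may differ.
SEVERITY = {"minor": 0, "moderate": 1, "serious": 2, "critical": 3}


@dataclass
class Finding:
    """Hypothetical normalized finding from one engine."""
    engine: str
    selector: str
    primary_tag: str
    outcome: str  # EARL: failed / cantTell / passed / inapplicable
    impact: str
    detail: str = ""


def dedup(findings: list[Finding]) -> list[dict]:
    """Collapse findings sharing (selector, primary_tag, outcome) into one entry."""
    merged: dict[tuple[str, str, str], dict] = {}
    for f in findings:
        key = (f.selector, f.primary_tag, f.outcome)
        entry = merged.get(key)
        if entry is None:
            merged[key] = {
                "selector": f.selector,
                "primary_tag": f.primary_tag,
                "outcome": f.outcome,
                "impact": f.impact,
                "engines": {f.engine: f.detail},
            }
            continue
        entry["engines"][f.engine] = f.detail  # attribute the finding to this engine too
        # Upgrade the merged impact to the worst severity any engine reported.
        if SEVERITY[f.impact] > SEVERITY[entry["impact"]]:
            entry["impact"] = f.impact
    return list(merged.values())
```

Because dicts preserve insertion order, merged entries come out in the order each key was first seen.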
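Streaming JSONL is what keeps memory flat: a consumer reads one page record at a time instead of the whole report. A minimal reader, assuming each line is a JSON object with a `findings` list whose items carry an EARL `outcome` (the real record layout is described in docs-src/reports.md and may differ):

```python
import json
from collections.abc import Iterable, Iterator


def iter_pages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one page record per JSONL line; only one line is held in memory at a time."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # An interrupted write can leave a truncated last line; say where.
            raise ValueError(f"line {lineno}: {exc.msg}") from exc


def count_failed(lines: Iterable[str]) -> int:
    """Example consumer: total findings with EARL outcome 'failed' across all pages."""
    return sum(
        1
        for page in iter_pages(lines)
        for finding in page.get("findings", [])
        if finding.get("outcome") == "failed"
    )
```

An open text file is already an iterable of lines, so `count_failed(fh)` works directly on `open("report.jsonl", encoding="utf-8")` (the filename is illustrative).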
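The worker-pool half of the crawler can be illustrated with `asyncio`: N workers pull URLs from one queue, so up to N pages are in flight and a new load starts as soon as any finishes. This sketch omits the browser, periodic restarts, and signal handling; `crawl` and `fetch_links` are hypothetical names, with `fetch_links` standing in for "load the page, run the engines, return discovered links".

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical stand-in for "load the page, run the engines, return discovered links".
FetchLinks = Callable[[str], Awaitable[list[str]]]


async def crawl(start: str, fetch_links: FetchLinks,
                workers: int = 4, max_pages: int = 100) -> list[str]:
    """Scan up to max_pages URLs with `workers` concurrent tasks; return URLs in scan order."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    seen = {start}  # every URL ever enqueued, so nothing is scanned twice
    scanned: list[str] = []
    queue.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                # No await between the check and the append, so the page budget is exact.
                if len(scanned) < max_pages:
                    scanned.append(url)
                    for link in await fetch_links(url):
                        if link not in seen:
                            seen.add(link)
                            queue.put_nowait(link)
            except Exception:
                pass  # sketch only: a real crawler would log and maybe retry the URL
            finally:
                queue.task_done()  # always, so queue.join() can finish

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # returns once every enqueued URL has been taken and finished
    for task in tasks:
        task.cancel()  # remaining workers are idle in queue.get(); stop them
    await asyncio.gather(*tasks, return_exceptions=True)
    return scanned
```

New links are enqueued before the current item's `task_done()`, so the queue's unfinished count cannot reach zero while work is still being discovered.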
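The re-login circuit breaker reduces to a consecutive-failure counter. A minimal sketch; `ReloginBreaker`, `CircuitOpen`, the `login` callable, and the default threshold of 3 are hypothetical, not names or values taken from the scanner.

```python
from collections.abc import Callable


class CircuitOpen(RuntimeError):
    """Raised when re-login keeps failing, so the crawl should exit."""


class ReloginBreaker:
    """Count consecutive re-login failures; trip after max_failures in a row."""

    def __init__(self, login: Callable[[], bool], max_failures: int = 3) -> None:
        self.login = login  # returns True if a fresh session was obtained
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def recover(self) -> bool:
        """Attempt one re-login (workers assumed drained); raise once the breaker trips."""
        if self.login():
            self.consecutive_failures = 0  # a success resets the count
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            raise CircuitOpen(
                f"re-login failed {self.consecutive_failures} times in a row; giving up"
            )
        return False
```

Counting only consecutive failures means an occasional flaky login does not accumulate toward the limit across a long crawl.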
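The allowlist's rule-id index and AND'd filters can be sketched over entries already parsed from YAML (for example with `yaml.safe_load`). Assumptions, not the documented schema: each entry has a `rule` key, `url` and `target` are shell-style globs matched with `fnmatchcase`, `engine` and `outcome` must match exactly, and an omitted filter matches anything. See docs-src/configuration.md for the real format.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

FILTERS = ("url", "target", "engine", "outcome")
GLOB_FIELDS = {"url", "target"}  # assumption: glob patterns for these two


class Allowlist:
    def __init__(self, entries: list[dict]) -> None:
        # Index by rule id so a lookup only scans entries for that one rule.
        self._by_rule: dict[str, list[dict]] = defaultdict(list)
        for entry in entries:
            self._by_rule[entry["rule"]].append(entry)

    def suppresses(self, finding: dict) -> bool:
        """True if any entry for the finding's rule matches on every filter it sets."""
        for entry in self._by_rule.get(finding["rule"], ()):  # .get avoids creating keys
            if all(self._matches(entry, finding, key) for key in FILTERS):
                return True
        return False

    @staticmethod
    def _matches(entry: dict, finding: dict, key: str) -> bool:
        if key not in entry:
            return True  # omitted filter: matches anything
        value = finding.get(key, "")
        if key in GLOB_FIELDS:
            return fnmatchcase(value, entry[key])
        return value == entry[key]
```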
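The fixed/new/remaining split that `--diff` reports is set arithmetic once each finding has a stable key. A sketch assuming a hypothetical key of `(url, rule, selector)` and page records shaped `{"url": ..., "findings": [...]}`; the scanner's actual matching key may differ.

```python
from collections.abc import Iterable
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical identity of a finding across two scans."""
    url: str
    rule: str
    selector: str


def keys_from_pages(pages: Iterable[dict]) -> set[Key]:
    """Collect finding keys from page records (e.g. parsed JSONL lines)."""
    return {
        Key(page["url"], finding["rule"], finding["selector"])
        for page in pages
        for finding in page.get("findings", [])
    }


def diff_findings(previous: set[Key], current: set[Key]) -> dict[str, set[Key]]:
    return {
        "fixed": previous - current,      # in the baseline, gone now
        "new": current - previous,        # absent from the baseline
        "remaining": previous & current,  # present in both scans
    }
```

The choice of key decides what counts as "the same" finding: including the selector means a DOM restructure can turn one remaining issue into a fixed one plus a new one.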
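A `--group-by` style summary amounts to bucketing findings by one field, then counting distinct pages per bucket. The sketch assumes flat finding dicts with a `url` key and sorts by finding count (descending), then by group name; the scanner's actual sort order and output format are not specified here, and `group_by` is a hypothetical name.

```python
from collections import defaultdict


def group_by(findings: list[dict], field: str) -> list[tuple[str, int, int, dict]]:
    """Return (group, finding_count, page_count, example) rows, largest groups first."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        groups[str(f.get(field, "(none)"))].append(f)
    rows = [
        # The first finding seen in each group serves as its example.
        (name, len(items), len({f["url"] for f in items}), items[0])
        for name, items in groups.items()
    ]
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```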
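The niceness and OOM defaults map onto two Linux mechanisms: `os.nice()` raises the process's nice value, and writing to `/proc/self/oom_score_adj` marks the process as the OOM killer's preferred victim. Raising `oom_score_adj` needs no privileges, and both settings are inherited by child processes, so applying them before launching Chromium and Node would cover the whole scan. A sketch; `deprioritize` and its `oom_path` parameter (there only to make it testable) are hypothetical.

```python
import os


def deprioritize(niceness: int = 10, oom_score_adj: int = 1000,
                 oom_path: str = "/proc/self/oom_score_adj") -> tuple[int, bool]:
    """Lower CPU priority and volunteer for the OOM killer on a shared host.

    Returns (new niceness, whether oom_score_adj was written).
    """
    new_nice = os.nice(niceness)  # an increment, not absolute: 0 becomes 10 by default
    try:
        with open(oom_path, "w") as fh:
            fh.write(str(oom_score_adj))  # 1000 = first candidate under memory pressure
        return new_nice, True
    except OSError:
        return new_nice, False  # not Linux, or /proc unavailable; niceness still applies
```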
Requires Python 3.12 and Node.js 18+.
```sh
pip install -e .     # installs playwright, pyyaml, mcp
playwright install chromium
npm install          # bundles the four engines
```

Scan one URL:

```sh
./a11y-catscan.py --page https://example.com/
```

Crawl with all four engines and write an LLM-friendly report:

```sh
./a11y-catscan.py --engine all --max-pages 500 --llm \
    https://example.com/
```

Compare against last week's baseline:

```sh
./a11y-catscan.py --diff baseline.jsonl --max-pages 500 \
    https://example.com/
```

Full setup walkthrough in docs-src/getting-started.md.
Site handbook (rendered to hubzero.github.io/a11y-catscan from these sources):
| Topic | Source |
|---|---|
| Getting started — install, first scan, exit codes | docs-src/getting-started.md |
| Configuration — every YAML setting + CLI override | docs-src/configuration.md |
| Scan workflows — crawl, page, urls, rescan, diff, resume | docs-src/scan-workflows.md |
| Reports — JSON, JSONL, HTML, LLM markdown formats | docs-src/reports.md |
| Authentication — login plugin, session recovery, logout traps | docs-src/authentication.md |
| MCP server — tool surface for Claude Code | docs-src/mcp.md |
| Troubleshooting | docs-src/troubleshooting.md |
| FAQ | docs-src/faq.md |
Internal references:
- DESIGN.md — current-state design specification
- CHANGELOG.md — date-organized log of changes
| Engine | Flag | Type | License |
|---|---|---|---|
| axe-core (Deque) | `--engine axe` | Browser injection (default) | MPL-2.0 |
| Siteimprove Alfa | `--engine alfa` | Node.js subprocess via CDP | MIT |
| IBM Equal Access | `--engine ibm` | Browser injection | Apache-2.0 |
| HTML_CodeSniffer | `--engine htmlcs` | Browser injection | BSD-3 |
`--engine all` runs all four; engines that aren't listed are
skipped. axe-core, IBM, and HTML_CodeSniffer inject JavaScript
into the live page and run in-browser. Alfa's TypeScript engine
runs as a Node.js subprocess and connects to the shared Chromium
via CDP, so the page is not loaded a second time.
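How an `--engine` value might resolve to an engine list can be sketched as follows. Only comma-separated names and `all` come from this README; the case-insensitivity, canonical ordering, duplicate removal, error on unknown names, and the `"axe"` default (read from the table's "default" note) are assumptions of this sketch, and `parse_engines` is a hypothetical name.

```python
ENGINES = ("axe", "alfa", "ibm", "htmlcs")


def parse_engines(spec: str = "axe") -> list[str]:
    """Turn an --engine value such as 'axe,ibm' or 'all' into an ordered engine list."""
    names = [s.strip().lower() for s in spec.split(",") if s.strip()]
    if "all" in names:
        return list(ENGINES)
    unknown = sorted(set(names) - set(ENGINES))
    if unknown:
        raise ValueError(f"unknown engine(s): {', '.join(unknown)}")
    # Canonical order, duplicates dropped; engines not listed are skipped.
    return [e for e in ENGINES if e in names]
```

Failing fast on a misspelled engine name is cheaper than discovering, hundreds of pages into a crawl, that one engine never ran.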
The full test suite runs against the bundled fixtures:
```sh
pip install -e '.[dev]'
pytest                    # 368 tests, ~70s with browser
pytest -m "not browser"   # 285 fast tests, <10s
```

Coverage is configured in pyproject.toml; see
tests/ for the layout (test_engine_normalizers.py,
test_crawl_loop.py, test_mcp_tools.py, etc.).
MIT. See LICENSE.
Engine licenses: axe-core (MPL-2.0), Siteimprove Alfa (MIT), IBM Equal Access (Apache-2.0), HTML_CodeSniffer (BSD-3). The four engines are vendored via npm and ship under their own licenses; this repo wraps them.