kb223/consent-engine

consent-engine

Forensic engine that compares cookie + tag enforcement against user consent preferences. Built for enterprises facing privacy-litigation demand letters.

CI Python 3.12+ MIT License

Scans any web page with consent pre-set to reject all (S3 forensic methodology), captures every network request, then asks seven questions:

  1. What fires pre-consent (on landing)?
  2. What fires post-accept?
  3. What fires post-reject?
  4. Is GPC (Global Privacy Control) being honored?
  5. Is Consent Mode (Basic or Advanced) wired correctly?
  6. Which current enforcement patterns do the observed facts map to?
  7. Which tracking technologies need inventory or contract review?

Returns a structured audit result, an HTML report, an executive summary, and a client-ready Marp slide deck. The JSON, report, and deck include a deterministic enforcement-pattern map and tracking-technology inventory when the scan produces enough evidence to build them.
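
Questions 1 to 3 reduce to comparing what was captured in each consent state. The following is a minimal sketch of that comparison, not the engine's actual code: the function names, the result keys, and the idea of passing in a flat list of tracker hostnames are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def hosts(urls):
    """Reduce a list of request URLs to the set of hostnames contacted."""
    found = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            found.add(host)
    return found


def compare_phases(pre_consent, post_accept, post_reject, tracker_hosts):
    """Answer questions 1-3 as set operations over three captures.

    All parameter names and result keys are hypothetical.
    """
    trackers = set(tracker_hosts)
    pre = hosts(pre_consent) & trackers
    accept = hosts(post_accept) & trackers
    reject = hosts(post_reject) & trackers
    return {
        "fires_pre_consent": sorted(pre),
        "fires_post_accept": sorted(accept),
        "fires_post_reject": sorted(reject),
        # Trackers seen only after accept are gated the way a visitor expects.
        "gated_correctly": sorted(accept - pre - reject),
    }


GA = "https://www.google-analytics.com/g/collect?v=2"
FB = "https://connect.facebook.net/en_US/fbevents.js"

result = compare_phases(
    pre_consent=["https://example.com/", GA],
    post_accept=["https://example.com/", GA, FB],
    post_reject=["https://example.com/", GA],
    tracker_hosts=["www.google-analytics.com", "connect.facebook.net"],
)
print(result["fires_post_reject"])  # ['www.google-analytics.com']
print(result["gated_correctly"])    # ['connect.facebook.net']
```

Because the comparison is plain set arithmetic, the same three captures always produce the same answer, which is the property the next section argues for.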

Why "engine" not "agent"

The audit is deterministic. Decisions are made at build time, not at runtime. The LLM, when enabled, writes only the executive summary; everything else is code. That distinction is what makes the output legally defensible instead of plausibly correct.

                          Agentic                       Deterministic
When decisions are made   At runtime                    At build time
Behavior                  Probabilistic, flexible       Reproducible
Spec                      Implicit                      Explicit
Testability               Hard to test, hard to prove   Easy to test, debug, verify

The eight-tool pipeline below is all deterministic.

Architecture

        ┌──────────────────────────────────────────────────────────┐
        │                  POST /audit  { url }                    │
        └─────────────────────────────┬────────────────────────────┘
                                      ▼
   ┌──────────────────────────────────────────────────────────────────┐
   │ tool_01  GTM container parser (JSON / live network interception) │
   │ tool_02  Violation classifier (S2 inconclusive vs S3 definitive) │
   │ tool_03  Playwright browser scanner (consent pre-set)            │
   │ tool_04  HAR analyzer                                            │
   │ tool_05  Vendor library lookup (custom + Open Cookie DB)         │
   │ tool_06  Server-side GTM detector                                │
   │ tool_06b Pixel detector (out-of-GTM tracking)                    │
   │ derived  Enforcement-pattern map + tracking inventory            │
   │ tool_07  Knowledge-base retriever (markdown wiki, no vector DB)  │
   │ tool_08  Report + slide deck generator (LLM exec summary only)   │
   └─────────────────────────────────────┬────────────────────────────┘
                                         ▼
              ┌────────────────────────────────────────────────┐
              │  audit_result.json  +  report.html  +  deck.md │
              └────────────────────────────────────────────────┘

Full flow with sample inputs/outputs: see docs/scenarios.md.

Four ways to run it

1. CLI

uvx consent-engine audit https://example.com
# Writes: ./out/<audit_id>/report.html
#         ./out/<audit_id>/audit_result.json
#         ./out/<audit_id>/evidence.jsonl   ← every captured network request
#         ./out/<audit_id>/deck.marp.md

Install: pip install consent-engine or uvx consent-engine (zero-install).
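
Since evidence.jsonl holds one captured network request per line, it can be inspected with a few lines of Python. This sketch assumes each line is a JSON object with a "url" key. That field name is a guess, not the documented schema, so check a real file and adjust url_field before relying on it.

```python
import json
from collections import Counter
from pathlib import Path
from urllib.parse import urlsplit


def requests_per_host(evidence_path, url_field="url"):
    """Count captured requests per hostname in a JSON Lines evidence file.

    url_field is an assumed key name; change it to match the real schema.
    """
    counts = Counter()
    with Path(evidence_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            host = urlsplit(record.get(url_field, "")).hostname
            if host:
                counts[host] += 1
    return counts
```

Calling requests_per_host("out/<audit_id>/evidence.jsonl").most_common(10) would then list the ten busiest hosts in a scan.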

2. Claude Code skill

mkdir -p ~/.claude/skills && cp -r .claude/skills/consent-audit ~/.claude/skills/

Then in any Claude Code conversation:

Audit https://example.com for consent compliance.

The skill drives the engine, surfaces findings inline, and lets you ask follow-up questions grounded in the captured evidence.

3. MCP server

# Note the [mcp] extra. The MCP SDK is an optional dependency.
uvx --from 'consent-engine[mcp]' consent-engine-mcp

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "consent-engine": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consent-engine[mcp]", "consent-engine-mcp"]
    }
  }
}

Use the absolute path to uvx (run which uvx to find yours). The Claude Desktop app is a macOS GUI process and does not inherit your shell PATH, so a bare "command": "uvx" cannot be found and the server fails to start. Also quit Claude Desktop completely before editing this file: the app holds the config in memory and rewrites it on quit, so an edit made while it is running gets overwritten.
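
If you would rather generate the entry than hand-edit the path, a small helper can resolve uvx from your shell and fill it in. This is a convenience sketch only: build_mcp_config is a hypothetical name and not part of consent-engine. It returns the entry as a dict and writes nothing to disk.

```python
import shutil


def build_mcp_config(uvx_path=None):
    """Return the mcpServers entry using an absolute path to uvx.

    Hypothetical helper; the structure mirrors the JSON shown above.
    """
    uvx_path = uvx_path or shutil.which("uvx")
    if not uvx_path:
        raise FileNotFoundError("uvx not found on PATH; install uv first")
    return {
        "mcpServers": {
            "consent-engine": {
                "command": uvx_path,
                "args": ["--from", "consent-engine[mcp]", "consent-engine-mcp"],
            }
        }
    }
```

Run it from a terminal, where PATH is populated, print the result with json.dumps(build_mcp_config(), indent=2), and merge it into the config file by hand with Claude Desktop fully quit. If the file already lists other servers, add only the "consent-engine" key rather than replacing the whole "mcpServers" object.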

Claude Code (CLI): add it to ~/.claude.json, or run:

claude mcp add consent-engine -- uvx --from 'consent-engine[mcp]' consent-engine-mcp

The CLI inherits your shell PATH, so a bare uvx command works here (no absolute path needed).

Either way, the server exposes audit_url, read_audit_result, and query_evidence as MCP tools.

Not showing up? The first uvx call resolves the package from PyPI (10-20s) before the tool panel turns green. On the Desktop app the usual cause is a bare uvx command (missing PATH) or an edit saved while the app was still running.

4. FastAPI service

The /audit endpoint requires a bearer token. It returns 503 until you set CONSENT_ENGINE_API_TOKEN (this is deliberate: it refuses to run an unauthenticated public audit endpoint).

docker build -t consent-engine .
docker run -p 8080:8080 -e CONSENT_ENGINE_API_TOKEN=your-secret-token consent-engine

# Then call it with the token:
curl -X POST http://localhost:8080/audit \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Drop-in Cloud Run / Fly / Railway deployable. Set CONSENT_ENGINE_API_TOKEN as a secret in your platform's environment config.
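
The same call can be made from Python with only the standard library. This is a sketch under two assumptions: the response body is JSON, and a five-minute timeout is long enough for a scan. The helper names are illustrative and not part of the package.

```python
import json
import urllib.request


def build_audit_request(base_url, token, target_url):
    """Build the POST /audit request with a bearer token and JSON body."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/audit",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_audit(base_url, token, target_url, timeout=300):
    """Send the request and decode the response (assumed to be JSON)."""
    req = build_audit_request(base_url, token, target_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

urlopen raises urllib.error.HTTPError for non-2xx responses, so a 503 from a server without CONSENT_ENGINE_API_TOKEN set surfaces as an exception with code 503 rather than as a return value.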

Real-world stakes

This isn't an academic project. Demand-letter law firms have built a pipeline around exactly the failure modes this tool detects:

The common pattern is simple: a visitor rejects tracking, but advertising, analytics, or social tags continue firing anyway. Demand letters typically argue that this creates privacy exposure and seek settlements in the $10,000 to $50,000 range.

Recent regulator actions also test the mechanics behind the policy: GPC and universal opt-out handling, whether reject is as easy as accept, whether vendor contracts limit ad-tech use, and whether the company maintains a current tracking-technology inventory. consent-engine maps scan evidence to those patterns without turning the mapping into legal advice.

CCPA fines are $2,500 per non-intentional violation, $7,500 per intentional violation. CIPA (California Invasion of Privacy Act) wiretap claims are running $5,000 per violation in active class actions against retailers, healthcare systems, and B2B SaaS marketing sites. See data/wiki/enforcement/lawsuit-surge.md for the case file.

See a finished audit before running

A committed sample audit lives at docs/sample-audit/. Open report.html and deck.html in a browser to see what the tool produces without installing it first.

Once GitHub Pages is enabled for this repo, the live demo URLs are:

Optional: unlock LLM-written executive summaries

By default, consent-engine ships with the LLM call disabled. The audit runs the full deterministic pipeline (scan → classify → wiki retrieval → HTML report + Marp deck) and writes a templated executive summary that's hand-tuned to be readable. No LLM, no API keys, no LiteLLM provider-probe warnings on stderr. This is the OSS-shipping default.

If you want the LLM-written prose summary instead (slightly sharper framing, adapted per-audit to the actual findings + wiki citations), set any one of these env vars before running:

# Gemini direct (recommended: generous free tier, simple auth)
export GEMINI_API_KEY="..."

# OR Anthropic (best at legal/compliance nuance)
export ANTHROPIC_API_KEY="..."

# OR OpenAI
export OPENAI_API_KEY="..."

# OR Vertex AI (requires a GCP service-account JSON)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"

The engine uses LiteLLM under the hood to route to whatever provider you've configured, so no SDK swap is required. The default model targets are gemini/gemini-2.5-pro (audit) and gemini/gemini-2.5-flash (executive summary classification). You can override either via the default_audit_model / default_classify_model fields on consent_engine.config.Settings, using any model string LiteLLM understands. Setting LITELLM_LOG=ERROR keeps LiteLLM's own log output down to errors.

The audit pipeline always falls back to the deterministic template if the LLM call fails for any reason. So if your key is rate-limited or invalid, the audit still completes cleanly.
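
The opt-in and the fallback can be pictured as one small wrapper. This is an illustrative sketch, not the engine's implementation: the function names, the template wording, and the llm_call parameter are all assumptions. Only the four environment variable names come from the list above.

```python
import os

PROVIDER_ENV_VARS = (
    "GEMINI_API_KEY",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def llm_enabled(env=None):
    """True if any one provider credential variable is set and non-empty."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in PROVIDER_ENV_VARS)


def template_summary(findings):
    """Deterministic fallback: the same findings always give the same text."""
    return f"{len(findings)} finding(s): " + "; ".join(sorted(findings))


def executive_summary(findings, llm_call=None, env=None):
    """Use the LLM when configured; fall back to the template on any failure."""
    if llm_call is not None and llm_enabled(env):
        try:
            return llm_call(findings)
        except Exception:
            pass  # rate limit, bad key, network error: fall through
    return template_summary(findings)
```

With this shape, a broad except is a deliberate choice: the summary is the only non-deterministic step, so any failure there is swallowed and the audit output stays complete.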

Develop

uv sync
uv run playwright install chromium

uv run pytest tests/ -v       # one happy-path test per tool
uv run ruff check src/        # lint clean
uv run mypy src/              # types clean

Customize for your stack

The audit engine is configurable by data, not code:

  • Add a new CMP: the system ships with 35+ CMP detectors out of the box (OneTrust, Truyo, Cookiebot, CookieYes, Usercentrics, Didomi, TrustArc, Ketch, Sourcepoint, Quantcast, Osano, Axeptio, Klaro, CookieScript, CookieHub, Crownpeak, TrustCommander, Termly, Complianz, TrueVault, iubenda, Borlabs, Civic, Consentmanager, Shopify Customer Privacy, Pandectes, PiwikPRO, Transcend, Ensighten, DataGrail, CCM19, Wix, CookieInformation, CookieReports, Real Cookie Banner, plus IAB TCF + GPC/GPP). Add a new one by dropping a detector in src/consent_engine/tools/cmp_detector.py (JS-global, URL-pattern, and DOM-selector tiers) and a regional behavior profile in data/wiki/concepts/.
  • Add a vendor to the lawsuit-annotated library: edit data/vendor_library/vendors.json (priority lookup) or the Open Cookie Database CSV (fallback).
  • Add jurisdictional context (a new state, country, or sector): drop a markdown page in data/wiki/regulations/ and update data/wiki/index.md.
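
The "priority lookup, then fallback" order for vendors can be sketched as a two-tier dictionary lookup. The record shapes, keys, and source labels below are hypothetical. The real vendors.json and Open Cookie Database schemas are not shown here, so treat this as an illustration of the ordering only.

```python
def lookup_vendor(name, custom, fallback):
    """Return (record, source), preferring the custom library.

    custom and fallback are assumed to be dicts keyed by lowercase
    cookie or vendor name; the source labels are made up for this sketch.
    """
    key = name.strip().lower()
    if key in custom:
        return custom[key], "vendor_library"
    if key in fallback:
        return fallback[key], "open_cookie_db"
    return None, "unknown"


custom = {"_fbp": {"vendor": "Meta", "note": "custom annotation"}}
fallback = {
    "_fbp": {"vendor": "Facebook"},
    "_ga": {"vendor": "Google Analytics"},
}

print(lookup_vendor("_fbp", custom, fallback)[1])  # vendor_library
print(lookup_vendor("_ga", custom, fallback)[1])   # open_cookie_db
print(lookup_vendor("_xyz", custom, fallback)[1])  # unknown
```

An entry added to the custom library shadows the fallback entry with the same key, so a vendor annotation can be corrected without editing the upstream CSV.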

No vector database, no embeddings, no fine-tuning. The whole knowledge layer is markdown. Version it like any other code.
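
To make "retrieval without a vector database" concrete, here is one way such a lookup could work: rank markdown pages by how many query terms they contain. This is a sketch of the general idea, not the retriever that tool_07 actually uses, and the function names and scoring rule are assumptions.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(wiki_dir, query, top_k=3):
    """Rank markdown pages under wiki_dir by query-term overlap."""
    root = Path(wiki_dir)
    terms = tokenize(query)
    scored = []
    for page in sorted(root.rglob("*.md")):
        overlap = len(terms & tokenize(page.read_text(encoding="utf-8")))
        if overlap:
            scored.append((overlap, page.relative_to(root).as_posix()))
    # Highest overlap first; ties broken by path so results are reproducible.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]
```

Calling retrieve("data/wiki", "GPC opt-out") would return up to three relative page paths. Because scoring is a pure function of the files on disk, a diff to the wiki explains any change in what is retrieved.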

What this doesn't do

  • Does not submit anything anywhere. It's a read-only forensic tool.
  • Does not modify your GTM container. Use the companion gtm-ga4-sync for tag provisioning.
  • Does not produce legal advice. Outputs are evidence for legal counsel.

License

MIT. See LICENSE.

Credits

Created and maintained by Kenneth Buchanan.

The Open Cookie Database (~3,200 entries) is included under the project's permissive license.
