Add JS Asset Auditor plugin with Playwright CLI#633
Add JS Asset Auditor plugin with Playwright CLI#633ChristianPavilonis wants to merge 17 commits intomainfrom
Conversation
Engineering spec for the /audit-js-assets . Covers sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes. Closes #606
Fix incorrect MCP tool name prefix, replace misused wait_for with
evaluate_script setTimeout, correct list_network_requests filtering to
use resourceTypes, resolve path derivation contradiction with consistent
/js-assets/{prefix}/{stem}.js formula, pin slug separator and base62
charset, add URL Processing section with normalization rules and
first-party boundary definition, tighten wildcard regex to require mixed
character classes, and move skill location to .claude/commands/.
Implement the /audit-js-assets command that sweeps a publisher page via Chrome DevTools MCP, detects third-party JS assets, and generates js-assets.toml entries. Includes a shared slug generation script (SHA-256 + base62) and adds MCP permission grants for navigate_page, list_network_requests, and close_page.
Move URL normalization, filtering, wildcard detection, slug generation, and TOML formatting into scripts/audit-js-assets.mjs. The skill now collects raw browser data and delegates processing to the script, replacing fragile LLM-side URL manipulation. Expand heuristic filter with Google ad rendering, ad fraud detection, ad verification, and reCAPTCHA categories. Auto-include target URL host as first-party. Add --no-filter flag. Fix semver regex to match alpha suffixes like 1.19.8-hcskhn.
Replace MCP-driven browser automation with a standalone Playwright CLI at tools/js-asset-auditor/audit.mjs. One command sweeps a publisher page, collects script URLs, processes them through the shared pipeline, and writes js-assets.toml. Refactor scripts/audit-js-assets.mjs to export processAssets() so both the stdin-based pipeline and the Playwright CLI share the same processing logic. Simplify the Claude skill from 115 to 59 lines — it now calls the CLI and formats the JSON summary.
Rewrite sweep protocol, implementation, and verification sections to describe the three-component architecture: Playwright CLI, processing library, and Claude Code skill wrapper. Add direct CLI invocation examples, --headed flag, first-party auto-detection verification, and ad-rendering filter verification steps.
Restructure into packages/js-asset-auditor/ as a self-contained Claude Code plugin with .claude-plugin/plugin.json manifest, skills/ directory, bin/ executable, and lib/ processing modules. The plugin provides the audit-js-assets skill and CLI automatically when enabled. Remove tools/js-asset-auditor/, scripts/audit-js-assets.mjs, and .claude/commands/audit-js-assets.md — all replaced by the plugin.
Enables installing the JS Asset Auditor plugin from this repo via /plugin marketplace add <org>/trusted-server followed by /plugin install js-asset-auditor.
Add --domain flag and fall back to inferring from the target URL when trusted-server.toml is not present. Enables using the plugin in any project without project-specific config.
Reflect the plugin layout at packages/js-asset-auditor/, update all file paths, document the domain resolution fallback chain (--domain flag > trusted-server.toml > infer from URL), and update skill invocation to use the namespaced /js-asset-auditor:audit-js-assets format.
New --config [path] flag auto-detects integrations (GPT, GTM, Didomi, DataDome, Lockr, Permutive, Prebid, APS) from swept script URLs and generates a trusted-server.toml with appropriate [integrations.*] sections. Auto-extracts fields like GTM container_id from query params and Permutive org/workspace IDs from URL paths. Fields needing manual input are marked with TODO comments.
Switch from headless-by-default to headed-by-default. Sites with bot protection (DataDome, Cloudflare, etc.) block headless browsers. The --headed flag becomes --headless for CI/automation use cases.
aram356
left a comment
There was a problem hiding this comment.
Summary
Additive Claude Code plugin (packages/js-asset-auditor/) with a Playwright-based CLI for sweeping publisher pages, a processing library, and integration auto-detection. No Rust changes. Review uncovered three blocking issues: the format-docs CI gate is red, the generated trusted-server.toml emits an invalid bidders = "" for Prebid, and the CLI arg parser silently consumes the next flag when a value is omitted. A handful of non-blocking refactor/hardening suggestions follow.
Blocking
🔧 wrench
- format-docs CI failing: prettier wants table column realignment in
docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md(heuristic filter table and detection patterns table). Fix:cd docs && npx prettier --write superpowers/specs/2026-04-01-js-asset-auditor-design.md. (inline at line 100) - Generated
bidders = ""is invalid for Rust Prebid config:PrebidConfig::biddersisVec<String>(crates/trusted-server-core/src/integrations/prebid.rs:69). Users who flipenabled = trueon a generated[integrations.prebid]block get a deserialization failure. (inline atdetect.mjs:266) --first-party/--settle/--output/--domainconsume the next flag: passing a value-taking flag without a value silently swallows the following arg. Reproduced. (inline ataudit.mjs:84)
Non-blocking
♻️ refactor
- Slug algorithm duplicated in
scripts/js-asset-slug.mjs— make it import frompackages/js-asset-auditor/lib/process.mjs. Silent drift here breaks KV lookups against the Rust proxy. (inline atscripts/js-asset-slug.mjs:78) - Stale help block in
audit.mjs— references--headed(not implemented; actual flag is--headless) and omits--config,--force,--domain. (inline ataudit.mjs:18)
🤔 thinking
readPublisherDomainconflates parse errors with missing file — a malformed but presenttrusted-server.tomlsilently falls through to URL-inferred domain, producing wrong slugs (publisher prefix depends on domain). DistinguishENOENTfrom parse failure. (inline ataudit.mjs:142)--configexistence check usesreadFileSync— usefs.existsSync. (inline ataudit.mjs:238)- No unit tests for
process.mjs/detect.mjs— both have nontrivial logic (wildcard regex, first-party boundary, heuristic filter with two match shapes, integration field extraction). Vitest already runs incrates/js/lib; adding fixtures there (or alongside the plugin) would catch drift, especially for the slug hash that must match the Rust proxy.
⛏ nitpick
- GTM match uses
pathname.includes("/gtm.js")—.endsWith("/gtm.js")removes theoretical ambiguity. (inline atdetect.mjs:37) formatTomlValuedoesn't escape"/\— fine today since inputs are static, butJSON.stringifyon the string branch is free escape handling for future patterns. (inline atdetect.mjs:226)
🌱 seedling
- Cross-language slug fixture test — the spec claims the JS slug must produce identical output to the Rust proxy's KV key derivation, but the Rust proxy lives on a separate branch. Once it lands, a shared-fixture test (same
{domain, url}→ same slug in both JS and Rust) is the only reliable guard against silent drift. Worth a follow-up issue.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- rust tests: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: FAIL (see 🔧 above)
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This plugin is close, but there are three blocking behavior issues in the CLI/config pipeline that can mislead operators or produce unusable output. CI is green, but these cases need to be fixed before the tool is safe to rely on for onboarding and diff workflows.
Blocking
🔧 wrench
- Generated configs enable integrations before required fields are present (packages/js-asset-auditor/lib/detect.mjs:260)
- Diff mode keeps re-appending the same NEW assets on repeated runs (packages/js-asset-auditor/lib/process.mjs:214)
--configconflict only logs an error and still exits successfully (packages/js-asset-auditor/lib/audit.mjs:266)
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review against main after 66951b9. That commit cleanly addresses every finding from my prior review (format-docs, bidders = "", arg-parser flag-swallowing, slug duplication, stale help block, existsSync, GTM endsWith, TOML escaping, ENOENT-vs-parse distinction), with focused regression tests for each.
However, the 2026-04-22 review from @prk-Jr is not addressed by the latest push — I confirmed all three of their blockers in the worktree — and I found two additional blocking-class issues while re-reading the 18 changed files.
Blocking
🔧 wrench
enabled = trueforpartial/detect_onlyintegrations — generated config boot-fails on missing required fields (server_url,pub_id,app_id).packages/js-asset-auditor/lib/detect.mjs:260(inline).- Diff mode re-appends the same NEW assets on repeated runs — reproduced: same asset appears 6× after 3 runs because
parseExistingTomlignores commented suggestions.packages/js-asset-auditor/lib/process.mjs:214(inline). --configcollision exits 0 with success summary — violates spec, breaks automation.packages/js-asset-auditor/lib/audit.mjs:269(inline).--configdefault path collides with livetrusted-server.toml— with--force, silently overwrites the real publisher config with a TODO skeleton.packages/js-asset-auditor/lib/audit.mjs:136(inline).- Plugin unit tests are not wired into CI —
.github/workflows/test.ymlonly runs vitest incrates/js/lib. The 8 regression tests added in 66951b9 (the sole guard against drift on slug algorithm, integration detection, arg parsing) run locally only, so regressions will land onmainunnoticed. Add a step that runscd packages/js-asset-auditor && npm ci && npm teston PRs touching this directory, or unconditionally.
Non-blocking
🤔 thinking
- No diff-idempotence test —
processAssets(..., { diff: true })run twice against its own output should yieldsummary.new.length === 0. A fixture test inpackages/js-asset-auditor/test/process.test.mjswould have caught the re-append regression and will keep catching it. - SKILL.md still says "headless browser" —
packages/js-asset-auditor/skills/audit-js-assets/SKILL.md:24. The CLI has defaulted to headed since e0c7e0c. Misleading for sandbox users. data:/blob:script URLs produce"null"origin in slugs —new URL("data:…").origin === "null"soapplyWildcardsyieldsnull/<pathname>. Rare for<script src>, but one protocol guard is cheap.
♻️ refactor
readPublisherDomainhand-rolls a TOML scan (lib/audit.mjs:37-61) — rejects single-quoted strings and any non-^domain = "…"shape. A small TOML lib would be more robust. Non-urgent.
⛏ nitpick
- APS pattern uses
pathname.includes("/apstag")(lib/detect.mjs:152) — same class of ambiguity as the GTM case tightened in 66951b9. Consider anchoring.
CI Status
- cargo fmt: PASS
- cargo clippy / Analyze (rust): PASS
- cargo test: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
- plugin tests (
packages/js-asset-auditor/test): NOT WIRED INTO CI — see blocking #5
|
Addressed the remaining review feedback in 00ce787 and replied/resolved the inline threads:
Validation run after the changes:
|
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review against main after 5b3481a. Every prior blocker from the 2026-04-22 (prk-Jr) and 2026-04-23 (aram356) reviews is cleanly addressed in 00ce787 + 5b3481a, with focused regression tests for each fix and the plugin test job now wired into PR CI. CI is green (14 checks). I verified all 16 plugin tests pass locally and prettier is clean on the new docs files.
Two issues warrant changes before merge:
data:/blob:script URLs produce garbage TOML entries that look like real[[js_assets]]blocks but reference unfetchable origins. Reproduced both cases throughprocessAssets(). Inline below.- Spec example for Lockr now contradicts the implementation — the doc shows
enabled = truefor an integration that the (corrected) generator now emits asenabled = false. Inline below.
Plus two non-blocking suggestions: replace the hand-rolled TOML scan in readPublisherDomain with a real parser (carried over — the new test now locks in the strict-double-quote-only limitation), and tighten the loose pathname.includes(\"/apstag\") APS pattern the same way GTM was tightened in 66951b9.
Blocking
🔧 wrench
data:/blob:URLs produce garbage TOML (packages/js-asset-auditor/lib/process.mjs:321, inline)- Spec example contradicts implementation (
docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md:284, inline)
Non-blocking
♻️ refactor
readPublisherDomainhand-rolled TOML scan (packages/js-asset-auditor/lib/audit.mjs:39-63, inline)
🤔 thinking
- Integration detection runs against unfiltered URLs —
lib/audit.mjs:305callsdetectIntegrations(scriptUrls, …)on the raw network capture, not the post-first-party set. If a publisher self-hosts their prebid bundle atcdn.publisher.com/prebid.js, a[integrations.prebid]block lands in the generated config. Not strictly wrong (publisher does run prebid), but it bypasses the first-party boundary the asset pipeline carefully enforces. Consider passing the third-party-filtered URL set instead.
⛏ nitpick
- APS pattern is loose (
packages/js-asset-auditor/lib/detect.mjs:170, inline) --first-partyaccepts unvalidated values (packages/js-asset-auditor/lib/audit.mjs:163) — splits on,with no normalization, so--first-party https://www.example.comsilently no-ops becausestripWwwdoesn't strip the protocol. Either reject non-hostname values or normalize vianew URL(value).hostname.
CI Status
- cargo fmt: PASS
- cargo clippy / Analyze (rust): PASS
- cargo test: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
- js-asset-auditor tests: PASS (newly wired into CI in this PR)
|
I will use the generate docs skill |
- docs/guide/js-asset-auditor.md (expanded feature guide) - docs/superpowers/specs/implemented/2026-04-01-js-asset-auditor-design.md (marked implemented) Inline TODOs: 0
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review of feature/js-asset-auditor against main at head dd859bfa. The PR has been through three earlier review passes and Christian has cleanly addressed every prior blocker — fixes landed with focused regression tests, parseExistingToml now recognizes commented suggestions, --config collisions exit non-zero, integrations with TODO fields no longer auto-enable, data: / blob: URLs are rejected, and the plugin test job is wired into PR CI. 20/20 plugin tests pass locally.
One blocker: format-docs CI is failing on the new docs files. Same class of issue that landed in the original review (fixed in 66951b9); the doc additions in dd859bfa reintroduced it. Reproduced locally — see inline. Everything else is non-blocking polish.
Blocking
🔧 wrench
format-docsCI is failing — Prettier wants column-aligned tables indocs/guide/js-asset-auditor.md(L100–105 and L227–238) anddocs/superpowers/specs/implemented/2026-04-01-js-asset-auditor-design.md(L122–130 and L208–212). Inline.
Non-blocking
🤔 thinking
- Init mode silently overwrites an existing
js-assets.toml—audit-js-assets <url>with no--diffwritesargs.outputunconditionally atpackages/js-asset-auditor/lib/audit.mjs:314. Inconsistent with--config, which fails loudly viaensureConfigPathWritable()unless--forceis passed. If an operator re-runs init forgetting--diff, hand-curated entries are lost silently. Consider applyingensureConfigPathWritable(args.output, args.force)symmetrically (or at least when output is the default path). inject_in_headfirst-seen-wins on wildcard collapse — see inline atpackages/js-asset-auditor/lib/process.mjs:340.
♻️ refactor
- Lockr
hostname.includesis loose — see inline atpackages/js-asset-auditor/lib/detect.mjs:88. readPublisherDomainhand-rolled TOML scan — see inline atpackages/js-asset-auditor/lib/audit.mjs:39.
🌱 seedling
formatTomlEntrydoesn't escape strings —packages/js-asset-auditor/lib/process.mjs:185-196uses raw interpolation forslug,path,origin_url. detect.mjs'sformatTomlValuedoes escape viaJSON.stringify. All three values today come from URL-encoded ASCII so the inconsistency is harmless, but it's a footgun if anyone later adds a non-URL-derived field.- Missing direct unit tests for
applyWildcards,matchesHeuristicFilter,extractAssetStem,normalizeUrl,isFirstParty— All exercised only indirectly throughprocessAssets. Direct tests would catch regressions in the wildcard semver/hex/mixed-hash heuristics and in the dot-boundary suffix logic (e.g., a regression whereevil-googletagmanager.comstarts matching). page.waitForTimeoutis "discouraged" in Playwright —packages/js-asset-auditor/lib/audit.mjs:264. For an audit tool a fixed-window settle is reasonable, butpage.waitForLoadState('networkidle', { timeout: args.settle })is the documented alternative.- Permutive omits commented
cache_ttl_seconds/rewrite_sdkdefaults —packages/js-asset-auditor/lib/detect.mjs:126-129lists onlyapi_endpointandsecure_signals_endpoint. The Permutive Rust struct also definescache_ttl_secondsandrewrite_sdkwith serde defaults, and GPT lists analogous defaults in the generator. Inconsistent operator visibility into available overrides.
📌 out of scope
- Unrelated chrome-devtools MCP permissions in root
.claude/settings.json—.claude/settings.json:27-32addsnavigate_page,list_network_requests,close_page. The auditor uses Playwright, not chrome-devtools-mcp. These additions look orthogonal to this PR's scope and would be cleaner in a separate "expand harness permissions" change so the diff stays focused on the auditor.
CI Status
- cargo fmt: PASS
- cargo clippy / Analyze (rust): PASS
- cargo test: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: FAIL
- browser integration tests: PASS
- CodeQL: PASS
- js-asset-auditor tests: PASS
| inject_in_head = true | ||
| ``` | ||
|
|
||
| | Field | Description | |
There was a problem hiding this comment.
🔧 wrench — format-docs CI is failing on this file.
Reproduced locally with cd docs && npm run format: Prettier wants column-aligned tables here (the Field / Description table) and again at lines ~227–238 (the Common options table), plus the matching tables in docs/superpowers/specs/implemented/2026-04-01-js-asset-auditor-design.md (asset-entry table at L122–130 and diff-mode condition table at L208–212).
This is the same class of failure that landed in the original review and was fixed in 66951b9; the doc additions in dd859bfa reintroduced it. CI run: https://github.com/IABTechLab/trusted-server/actions/runs/25353966443/job/74339283778
Fix:
cd docs && npm run format:writethen commit the resulting whitespace-only changes. cd docs && npm run format should then pass.
|
|
||
| const slug = generateSlug(args.domain, wildcarded); | ||
| const prefix = slug.split(":")[0]; | ||
| const injectInHead = normalizedHead.has(url); |
There was a problem hiding this comment.
🤔 thinking — inject_in_head is determined from the pre-wildcard URL on a first-seen-wins basis.
injectInHead = normalizedHead.has(url) uses the original URL, but the dedup at L335 keys on the wildcarded URL. If a publisher's page loads …/prod/1.19.8/raven.js from <body> and …/prod/1.19.9/raven.js from <head> in the same sweep, both collapse to the same wildcard but inject_in_head reflects only whichever URL the iteration saw first.
Not common in practice (a single page sweep usually loads one version), but worth flagging — when a wildcarded entry is generated, inject_in_head should arguably be true if any of the originals appeared in head.
Suggested approach:
const injectInHead = [...normalizedHead].some((headUrl) => {
const { wildcarded: headWildcarded } = applyWildcards(headUrl);
return headWildcarded === wildcarded;
});or track the OR across originals collapsed into the same wildcard during the dedup loop.
| id: "lockr", | ||
| label: "Lockr Identity", | ||
| match: (url) => { | ||
| const href = url.href.toLowerCase(); |
There was a problem hiding this comment.
♻️ refactor — Lockr hostname match uses String.prototype.includes, which is looser than every other pattern in this file.
url.hostname.includes("aim.loc.kr") matches notaim.loc.kr and aim.loc.kr.attacker.com. Combined with the identity-lockr substring + .js suffix requirements (and the fact that partial integrations are emitted with enabled = false), the practical blast radius is limited to a noisy false positive in the generated config — but the test is still wrong, and inconsistent with how GTM / Datadome / Didomi / GPT / APS / Permutive all use === or endsWith.
Fix:
const { hostname } = url;
const hostMatch =
hostname === "aim.loc.kr" ||
hostname.endsWith(".aim.loc.kr") ||
hostname === "identity.loc.kr" ||
hostname.endsWith(".identity.loc.kr");
if (!hostMatch) return false;
const href = url.href.toLowerCase();
return href.includes("identity-lockr") && href.endsWith(".js");Worth covering with a regression test that asserts notaim.loc.kr/identity-lockr-foo.js does not match.
| // Config reading | ||
| // --------------------------------------------------------------------------- | ||
|
|
||
| export function readPublisherDomain(repoRoot, configPath = "trusted-server.toml") { |
There was a problem hiding this comment.
♻️ refactor — Carry-over from the 2026-04-29 review: readPublisherDomain is still a hand-rolled TOML scan that only matches domain = "…" (double-quoted basic strings).
Author explicitly deferred adding a TOML dependency in this pass, and the new test in audit.test.mjs locks in the strict-double-quote-only behavior — a single-quoted domain = 'publisher.com' is treated as missing and the CLI errors out, which is at least consistent. Re-flagging as non-blocking so the follow-up stays on the list: an actual TOML parser (e.g., smol-toml, ~6 KB) would gracefully accept literal strings and multi-line strings, and would future-proof against operators copy-pasting examples from the wider TOML ecosystem.
| implemented_in: feature/js-asset-auditor | ||
| last_reviewed: 2026-05-04 | ||
| verified_against_commit: 4773d13b | ||
| --- |
There was a problem hiding this comment.
⛏ nitpick — Frontmatter verified_against_commit: 4773d13b is stale.
Latest commit on this branch is dd859bfa (Update docs for JS Asset Auditor), which itself touches this file. Either bump the value to the current head or drop the field if it's not meant to track every doc edit.
Summary
packages/js-asset-auditor/with a standalone Playwright CLI that sweeps publisher pages for third-party JS assetstrusted-server.tomlconfig with--configflagtrusted-server.tomloptional with--domainflag for portabilityTry it out
1. Check out the branch
2. Install dependencies
3. Run the CLI directly
4. Use as a Claude Code plugin
Then in Claude Code:
CLI flags
--diffjs-assets.toml--settle <ms>--first-party <hosts>--domain <host>trusted-server.tomlor URL)--no-filter--headless--output <path>js-assets.toml)--config [path]trusted-server.tomlwith detected integrations--forceTest plan
js-assets.tomloutput--configand verify detected integrations in generatedtrusted-server.toml--diffagainst an existingjs-assets.tomland verify confirmed/new/missingtrusted-server.toml(e.g., from/tmp) and verify domain inference--configwithout--forceerrors when file already existsCloses #631