Address two inspection passes: security, detectors, hygiene by Conalh · Pull Request #41 · Conalh/ScopeTrail

Conalh · 2026-05-22T23:00:59Z

Summary

Two inspection passes worth of fixes against ScopeTrail. Ten commits, atomic and individually revertable. Test count went from 36 → 49 with regression coverage for every behavior change.

The most important commit is the security fix — please skim that one first.

🔴 Security fix (high priority)

0f5ff71 — Bare Bash/Read/Write/Edit not flagged as broad. isBroadAllow had an asymmetry: bare WebFetch/WebSearch/Task were flagged, but bare Bash (unlimited shell), Read/Write/Edit (unlimited filesystem) silently slipped through because the regex required parenthesized scope. A reviewer adding "allow": ["Bash"] got no ScopeTrail warning. Fixed by routing all dangerous verbs through the same bare-token check. Two regression tests — one for the gap, one asserting Bash(npm test) etc. stay narrow.

🟡 Detector improvements

07706c5 — Codex TOML MCP servers. .codex/config.toml's [mcp_servers.NAME] sections were invisible to the detector; only scalar fields (sandbox/approval/network/trust) were checked. Three new findings mirror the .mcp.json risk vocabulary: codex_mcp_server_added, codex_mcp_server_command_changed, codex_unpinned_mcp_command. Shared MCP risk heuristic extracted to src/mcp-risk.ts so JSON and TOML detectors stay aligned. Uses parseToml and lineOfTomlKey from agent-gov-core.
0aa8677 — Partial Claude hook command changes. hook_command_changed previously only fired on 1:1 command swaps. Now also catches: a no-op appended alongside a strict guard (count grows), and one guard dropped from a multi-guard hook (count shrinks).
fb56768 — bunx runner + parallel walk. Adds bunx (Bun's npx equivalent) to the unpinned-command runner list. Switches collectMcpSampleConfigPaths to Promise.all for parallel subdirectory walks. Notes in commit message why yarn dlx / npm exec need a structurally different check that's deferred.
4082373 — http:// sample endpoints fire high. mcp_sample_remote_endpoint was fixed-medium for both schemes. http:// now fires high ("unencrypted remote endpoint") since copy-pasted samples silently downgrade users to MitM-vulnerable transport. https:// stays at medium.

🟢 Performance / hygiene

416e37a — Single-scan multi-format Action. action.yml invoked the CLI three times per run, each repeating git snapshot materialization + detector work. New --out-markdown PATH / --out-json PATH flags let one scan render all three sinks. Stdout still carries the GitHub annotations the runner expects.
edfef04 — Whitelist publish surface. npm pack was shipping 67 files (161 kB) including test/fixtures/ (intentionally-risky agent configs), .github/, .claude/settings.json, .mcp.json, full TypeScript src/, and tsconfig.json. For a trust-focused permission-drift reviewer, shipping the adversarial fixtures as part of npm install is the wrong default. Now 16 files (141 kB, mostly the demo PNG). Regression test pins what's allowed and what isn't.
feadd61 — Dead link fix + coverage test. docs/ADOPTION.md linked ?template=team-adoption.yml but the template didn't exist. Added the template (ownership / reporting / exceptions / many-repos signal). New regression test walks every ?template=*.yml reference in README.md/ADOPTION.md/PILOT.md/TRUST.md and asserts each resolves to a real file — catches the next dead link too.
167a933 — Pin LF line endings. .gitattributes was just * text=auto, which under Windows core.autocrlf=true kept rewriting dist/detectors/*.js between LF and CRLF and producing a false dirty working tree after a clean checkout.

⚪ Already shipped

2449b84 — agent-gov-core 0.4.x → 0.5.0. Your existing bump; included here so the Codex MCP detector (which uses parseToml/lineOfTomlKey) builds cleanly.

What I deliberately did NOT do

From the second inspection:

Adding dlx/exec to the runner list as proposed — those aren't command names. yarn dlx foo is command: \"yarn\", args: [\"dlx\", \"foo\"], so the proposed ['npx','uvx','pipx','dlx','exec'] check catches nothing real. Needs a structurally different multi-arg pattern. Noted in fb56768 commit message.
Aider / .cursorrules / .windsurfrules detection — legitimate missing-surface requests, but each one needs a real risk model (especially .cursorrules/.windsurfrules since they're natural-language prompts, which would expand ScopeTrail's scope from structural config drift into prompt-content analysis — a deliberate product decision, not a code review fix). Filing as separate issues against the missing-surface template.
A render subcommand for multi-format — functionally equivalent to the --out-* flags but requires re-parsing JSON in a second CLI invocation. Flag-based approach renders from one in-memory report.

Test plan

npm test — 49/49 passing locally (up from 36)
npm run build clean
npm pack --dry-run shows 16 files, no fixtures or repo dotfiles
git status clean on fresh checkout (no CRLF flap)
CI green on PR
Action dogfood run on this PR shows the new findings against the repo's own .codex/config.toml (which has [mcp_servers.stripe-admin] with @latest)

🤖 Generated with Claude Code

Previously .gitattributes only declared `* text=auto`, which under core.autocrlf=true on Windows kept rewriting dist/detectors/*.js between LF and CRLF and producing a false dirty working tree after a clean checkout. Pin LF explicitly for source, generated JS, configs, and docs so the dirty state stops returning.

Before: `npm pack` shipped 67 files (161 kB) including test/fixtures of intentionally-risky agent configs, the dogfood .claude/settings.json and .mcp.json, .github/ workflows, full TypeScript src/, and tsconfig. For a trust-focused permission-drift reviewer this is the wrong default — every consumer was effectively `npm install`-ing the adversarial examples we use to test the detectors. Add a `files` whitelist that ships only dist/, action.yml, docs/, the README screenshot, README, and LICENSE. New tarball is 16 files (141 kB, almost entirely the PNG). A regression test asserts the runtime entry points stay present and the previously-leaked surfaces (.github, .claude, .codex, .cursor, .vscode, src, test, .mcp.json, tsconfig.json, repo-only dotfiles) cannot creep back in.

docs/ADOPTION.md step 7 advertised a `?template=team-adoption.yml` feedback channel, but no such template existed under .github/ISSUE_TEMPLATE/ — GitHub silently degraded the link to a generic new-issue page. The previous issue-template test only asserted the templates that already shipped, so it couldn't catch the missing one. Add the team-adoption template covering ownership, reporting, exceptions, and many-repository signals (the four themes ADOPTION.md already promises) and a new regression test that walks every `?template=*.yml` reference in README.md, ADOPTION.md, PILOT.md, and TRUST.md and asserts each one resolves to a real file. This catches any future doc that promises a template we don't actually ship.

`.codex/config.toml` carries the same `[mcp_servers.NAME]` shape that ScopeTrail already flags in `.mcp.json`, but the Codex detector previously only looked at scalar fields (sandbox, approval, network, project trust). A Codex user could drop in a `[mcp_servers.stripe-admin]` section with `args = ["-y", "@vendor/stripe-mcp@latest"]` and the unpinned-command risk model never saw it — exactly the gap the inspection caught against this repo's own .codex/config.toml dogfood example. Extract the MCP command-risk heuristic (`isUnpinnedCommand`, `serverCommand`, `isPipeToShellCommand`) into a shared `src/mcp-risk.ts` so the JSON and TOML detectors stay aligned as the risk model evolves. Add three new findings, mirroring the .mcp.json risk vocabulary: - `scope_trail.codex_mcp_server_added` (high) - `scope_trail.codex_mcp_server_command_changed` (medium) - `scope_trail.codex_unpinned_mcp_command` (high) Use agent-gov-core's `parseToml` and `lineOfTomlKey` so findings point at the `command`/`args` line inside the `[mcp_servers.NAME]` table. Also: - Update .gitignore so test/fixtures/**/.codex/ stays trackable while the top-level dogfood .codex/ remains ignored (matches the long-standing exception the existing codex-config-drift fixture relied on via `git add -f`). - Drop the DEP0190 deprecation warning from package-surface.test.mjs by routing the Windows npm.cmd invocation through cmd.exe instead of asking execFile to use the shell.

`hook_command_changed` previously only fired when an existing hook had the same number of commands but at least one was new — i.e., a 1:1 swap. That left two real weakening cases silent: - Appending a no-op alongside a strict guard (count grows). - Removing one guard from a multi-guard hook (count shrinks). Both are policy events the reviewer needs to see, but the `newCommands.size === oldCommands.size` guard skipped them. Replace the size-equality check with a symmetric-difference check: fire when any command was added *or* removed. Build a message that names the deltas so reviewers can see what changed without opening the diff. Add three regression tests covering the append, partial- removal, and unchanged-set cases.

Picks up v0.5.0: tokenizeShellDeep, ConfigParseError, generateWorkflowSummary, plus 10 pre-publish correctness fixes (JSON locator value-vs-key disambiguation, multi-line TOML escape tracking, dotted-key whitespace tolerance, line-ending backslash trim, boolean flag pairing, Windows exe case-fold + runtime path de-noise, HTML escape in summaries) and 11 golden compatibility tests pinning fingerprint/normalization output before v1.0 freeze. The path-denoise fix is especially relevant here: /usr/bin/node, node, and node.exe now produce identical canonical strings, closing a PolicyMesh mcp_command_mismatch false-positive class — but ScopeTrail benefits indirectly since the same shared normalizeMcpCommand is used for unpinned-command detection.

action.yml previously invoked the CLI three times per pull request — once with `--format markdown`, once with `--format json`, once with `--format github`. In git mode each call re-materialized both the base and head snapshots and re-ran every detector, tripling the work for no extra coverage. Add two CLI flags that let one scan render to multiple sinks: --out-markdown PATH Write the markdown report to a file. --out-json PATH Write the JSON report to a file. `--format` continues to control stdout, so the existing CLI surface is backward compatible. The Action now runs: scopetrail diff --format github \ --out-markdown $report_file --out-json $json_file Stdout streams the GitHub annotations the runner picks up, while the two file outputs feed the step summary append and the rating / finding-count output parsing. A separate `cat "$report_file"` keeps the markdown report visible in the Action log for parity with the prior `tee`. Tests: - New action-metadata assertion: exactly one `node $GITHUB_ACTION_PATH/dist/index.js` invocation appears in action.yml. - Existing `diff --repo` regex relaxed to tolerate the line continuation between `diff` and `--repo`. - New CLI test exercises `--out-markdown` + `--out-json` + `--format github` in one call and asserts both files match their single-format equivalents.

isBroadAllow had an asymmetry that silently allowed the most dangerous permissions: - Bare "WebFetch", "WebSearch", "Task" → correctly flagged as broad. - Bare "Bash", "Read", "Write", "Edit" → returned false. The bash/read/write/edit branches were regex-based and required a parenthesized scope (`Bash(npm *)` or `Read(~/**)`). A bare token like `"Bash"` — which grants unlimited shell execution without any prompt — had no parentheses, didn't match, and slipped through with no finding. Same for bare `"Read"`/`"Write"`/`"Edit"`, which grant unlimited filesystem access. Move all four into the same `isBroadVerbGrant` check that already handled WebFetch/WebSearch/Task. The bare-token + wildcard-scope detection now applies uniformly across every dangerous verb. Keep the rooted-path regex (`/\b(read|write|edit)$(~|...)$/`) since it catches `Read(/etc/passwd)` shapes that don't contain `*` — those aren't bare and the wildcard branch can't see them. Two new regression tests: - Asserts bare Bash/Read/Write/Edit are flagged broad. - Asserts narrowly-scoped Bash(npm test), Read(./src/foo.txt) etc. stay narrow — the fix must not over-fire on legitimate scoping.

Two small improvements bundled because they touch the same MCP detection code path. 1. `collectMcpSampleConfigPaths` walked subdirectories sequentially via `await` inside a `for` loop. Switch to `Promise.all` over `entries.map`. Each `readdir` is independent and `paths` is a Set mutated from the same event loop, so mutations are race-free in single-threaded Node. Caller already sorts the result, so insertion order doesn't matter. Practical impact is modest — the heavy hitters (node_modules, .git, dist, build, coverage, .next, .turbo) are already in IGNORED_SAMPLE_SCAN_DIRS and the GitHub Action walks a snapshot tarball that only contains the detector-relevant paths. But local `--repo .` runs on big monorepos benefit, and the change is free. 2. Add `bunx` to the npx/uvx/pipx runner list in mcp-risk.ts. `bunx` is Bun's npx equivalent and ships as its own binary, so it surfaces as `command: "bunx"` in MCP configs — directly analogous to npx. Add a regression test. Note on `yarn dlx` / `npm exec` / `pnpm dlx` — those are not added here because they have `command: "yarn"` (etc.) with the subcommand in args[0]. The current heuristic checks `spec.command` against a runner list, which is structurally the wrong shape for those. A proper fix would need to also check args[0] against subcommand names, then scan args[1..] for packages. Left out deliberately rather than shipping a wrong-shape change that catches nothing.

mcp_sample_remote_endpoint fired at fixed medium severity for both http:// and https:// endpoints. An unencrypted http:// MCP endpoint in a sample config is a different shape of risk: anyone who copies the sample inherits a MitM-vulnerable connection. Split severity by scheme: - http://* → high ("unencrypted remote endpoint") - https://* → medium ("remote endpoint", unchanged copy-paste check) Message and recommendation are scheme-aware too — http:// findings specifically call out that copy-pasted samples should not silently downgrade users to unencrypted transport. Existing tests covering https:// sample endpoints continue to assert medium severity (no fixture changes needed — all existing fixtures already used https). New regression test covers both schemes in one fixture to lock the split. Note: this is sample-only. The active .mcp.json detector doesn't currently flag any remote endpoints — http:// or https:// — and that's a separate gap. Filing as a follow-up issue rather than expanding scope here.

Extends `isUnpinnedCommand` to the multi-arg runner shape I'd deferred in fb56768 — `command: "yarn"` with `args: ["dlx", "pkg"]` and the npm / pnpm equivalents. Recognized executor subcommands: - npm exec | x - yarn dlx - pnpm dlx | exec | x When the command is one of the wrappers and args[0] is an executor subcommand, scan args[1..] (skipping flags) for an unpinned package name. Same `looksLikePackageName` / `hasExactVersion` check as the direct-runner path, so the heuristic stays consistent. New regression test covers all three wrappers in one fixture.

Conalh added 11 commits May 22, 2026 15:40

Conalh merged commit 5c02bed into main May 22, 2026
3 checks passed

Conalh deleted the inspector-fixes branch May 22, 2026 23:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address two inspection passes: security, detectors, hygiene#41

Address two inspection passes: security, detectors, hygiene#41
Conalh merged 11 commits into
mainfrom
inspector-fixes

Conalh commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Conalh commented May 22, 2026

Summary

🔴 Security fix (high priority)

🟡 Detector improvements

🟢 Performance / hygiene

⚪ Already shipped

What I deliberately did NOT do

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant