Address two inspection passes: security, detectors, hygiene#41
Merged
Conversation
Previously .gitattributes only declared `* text=auto`, which under core.autocrlf=true on Windows kept rewriting dist/detectors/*.js between LF and CRLF and producing a false dirty working tree after a clean checkout. Pin LF explicitly for source, generated JS, configs, and docs so the dirty state stops returning.
Before: `npm pack` shipped 67 files (161 kB) including test/fixtures of intentionally-risky agent configs, the dogfood .claude/settings.json and .mcp.json, .github/ workflows, full TypeScript src/, and tsconfig. For a trust-focused permission-drift reviewer this is the wrong default — every consumer was effectively `npm install`-ing the adversarial examples we use to test the detectors. Add a `files` whitelist that ships only dist/, action.yml, docs/, the README screenshot, README, and LICENSE. New tarball is 16 files (141 kB, almost entirely the PNG). A regression test asserts the runtime entry points stay present and the previously-leaked surfaces (.github, .claude, .codex, .cursor, .vscode, src, test, .mcp.json, tsconfig.json, repo-only dotfiles) cannot creep back in.
docs/ADOPTION.md step 7 advertised a `?template=team-adoption.yml` feedback channel, but no such template existed under .github/ISSUE_TEMPLATE/ — GitHub silently degraded the link to a generic new-issue page. The previous issue-template test only asserted the templates that already shipped, so it couldn't catch the missing one. Add the team-adoption template covering ownership, reporting, exceptions, and many-repository signals (the four themes ADOPTION.md already promises) and a new regression test that walks every `?template=*.yml` reference in README.md, ADOPTION.md, PILOT.md, and TRUST.md and asserts each one resolves to a real file. This catches any future doc that promises a template we don't actually ship.
`.codex/config.toml` carries the same `[mcp_servers.NAME]` shape that ScopeTrail already flags in `.mcp.json`, but the Codex detector previously only looked at scalar fields (sandbox, approval, network, project trust). A Codex user could drop in a `[mcp_servers.stripe-admin]` section with `args = ["-y", "@vendor/stripe-mcp@latest"]` and the unpinned-command risk model never saw it — exactly the gap the inspection caught against this repo's own .codex/config.toml dogfood example. Extract the MCP command-risk heuristic (`isUnpinnedCommand`, `serverCommand`, `isPipeToShellCommand`) into a shared `src/mcp-risk.ts` so the JSON and TOML detectors stay aligned as the risk model evolves. Add three new findings, mirroring the .mcp.json risk vocabulary: - `scope_trail.codex_mcp_server_added` (high) - `scope_trail.codex_mcp_server_command_changed` (medium) - `scope_trail.codex_unpinned_mcp_command` (high) Use agent-gov-core's `parseToml` and `lineOfTomlKey` so findings point at the `command`/`args` line inside the `[mcp_servers.NAME]` table. Also: - Update .gitignore so test/fixtures/**/.codex/ stays trackable while the top-level dogfood .codex/ remains ignored (matches the long-standing exception the existing codex-config-drift fixture relied on via `git add -f`). - Drop the DEP0190 deprecation warning from package-surface.test.mjs by routing the Windows npm.cmd invocation through cmd.exe instead of asking execFile to use the shell.
`hook_command_changed` previously only fired when an existing hook had the same number of commands but at least one was new — i.e., a 1:1 swap. That left two real weakening cases silent: - Appending a no-op alongside a strict guard (count grows). - Removing one guard from a multi-guard hook (count shrinks). Both are policy events the reviewer needs to see, but the `newCommands.size === oldCommands.size` guard skipped them. Replace the size-equality check with a symmetric-difference check: fire when any command was added *or* removed. Build a message that names the deltas so reviewers can see what changed without opening the diff. Add three regression tests covering the append, partial- removal, and unchanged-set cases.
Picks up v0.5.0: tokenizeShellDeep, ConfigParseError, generateWorkflowSummary, plus 10 pre-publish correctness fixes (JSON locator value-vs-key disambiguation, multi-line TOML escape tracking, dotted-key whitespace tolerance, line-ending backslash trim, boolean flag pairing, Windows exe case-fold + runtime path de-noise, HTML escape in summaries) and 11 golden compatibility tests pinning fingerprint/normalization output before v1.0 freeze. The path-denoise fix is especially relevant here: /usr/bin/node, node, and node.exe now produce identical canonical strings, closing a PolicyMesh mcp_command_mismatch false-positive class — but ScopeTrail benefits indirectly since the same shared normalizeMcpCommand is used for unpinned-command detection.
action.yml previously invoked the CLI three times per pull request —
once with `--format markdown`, once with `--format json`, once with
`--format github`. In git mode each call re-materialized both the
base and head snapshots and re-ran every detector, tripling the
work for no extra coverage.
Add two CLI flags that let one scan render to multiple sinks:
--out-markdown PATH Write the markdown report to a file.
--out-json PATH Write the JSON report to a file.
`--format` continues to control stdout, so the existing CLI surface
is backward compatible. The Action now runs:
scopetrail diff --format github \
--out-markdown $report_file --out-json $json_file
Stdout streams the GitHub annotations the runner picks up, while the
two file outputs feed the step summary append and the rating /
finding-count output parsing. A separate `cat "$report_file"` keeps
the markdown report visible in the Action log for parity with the
prior `tee`.
Tests:
- New action-metadata assertion: exactly one `node $GITHUB_ACTION_PATH/dist/index.js`
invocation appears in action.yml.
- Existing `diff --repo` regex relaxed to tolerate the line
continuation between `diff` and `--repo`.
- New CLI test exercises `--out-markdown` + `--out-json` + `--format github`
in one call and asserts both files match their single-format
equivalents.
isBroadAllow had an asymmetry that silently allowed the most dangerous permissions: - Bare "WebFetch", "WebSearch", "Task" → correctly flagged as broad. - Bare "Bash", "Read", "Write", "Edit" → returned false. The bash/read/write/edit branches were regex-based and required a parenthesized scope (`Bash(npm *)` or `Read(~/**)`). A bare token like `"Bash"` — which grants unlimited shell execution without any prompt — had no parentheses, didn't match, and slipped through with no finding. Same for bare `"Read"`/`"Write"`/`"Edit"`, which grant unlimited filesystem access. Move all four into the same `isBroadVerbGrant` check that already handled WebFetch/WebSearch/Task. The bare-token + wildcard-scope detection now applies uniformly across every dangerous verb. Keep the rooted-path regex (`/\b(read|write|edit)\((~|...)\)/`) since it catches `Read(/etc/passwd)` shapes that don't contain `*` — those aren't bare and the wildcard branch can't see them. Two new regression tests: - Asserts bare Bash/Read/Write/Edit are flagged broad. - Asserts narrowly-scoped Bash(npm test), Read(./src/foo.txt) etc. stay narrow — the fix must not over-fire on legitimate scoping.
Two small improvements bundled because they touch the same MCP detection code path. 1. `collectMcpSampleConfigPaths` walked subdirectories sequentially via `await` inside a `for` loop. Switch to `Promise.all` over `entries.map`. Each `readdir` is independent and `paths` is a Set mutated from the same event loop, so mutations are race-free in single-threaded Node. Caller already sorts the result, so insertion order doesn't matter. Practical impact is modest — the heavy hitters (node_modules, .git, dist, build, coverage, .next, .turbo) are already in IGNORED_SAMPLE_SCAN_DIRS and the GitHub Action walks a snapshot tarball that only contains the detector-relevant paths. But local `--repo .` runs on big monorepos benefit, and the change is free. 2. Add `bunx` to the npx/uvx/pipx runner list in mcp-risk.ts. `bunx` is Bun's npx equivalent and ships as its own binary, so it surfaces as `command: "bunx"` in MCP configs — directly analogous to npx. Add a regression test. Note on `yarn dlx` / `npm exec` / `pnpm dlx` — those are not added here because they have `command: "yarn"` (etc.) with the subcommand in args[0]. The current heuristic checks `spec.command` against a runner list, which is structurally the wrong shape for those. A proper fix would need to also check args[0] against subcommand names, then scan args[1..] for packages. Left out deliberately rather than shipping a wrong-shape change that catches nothing.
mcp_sample_remote_endpoint fired at fixed medium severity for both
http:// and https:// endpoints. An unencrypted http:// MCP endpoint
in a sample config is a different shape of risk: anyone who copies
the sample inherits a MitM-vulnerable connection.
Split severity by scheme:
- http://* → high ("unencrypted remote endpoint")
- https://* → medium ("remote endpoint", unchanged copy-paste check)
Message and recommendation are scheme-aware too — http:// findings
specifically call out that copy-pasted samples should not silently
downgrade users to unencrypted transport.
Existing tests covering https:// sample endpoints continue to assert
medium severity (no fixture changes needed — all existing fixtures
already used https). New regression test covers both schemes in one
fixture to lock the split.
Note: this is sample-only. The active .mcp.json detector doesn't
currently flag any remote endpoints — http:// or https:// — and
that's a separate gap. Filing as a follow-up issue rather than
expanding scope here.
Extends `isUnpinnedCommand` to the multi-arg runner shape I'd deferred in fb56768 — `command: "yarn"` with `args: ["dlx", "pkg"]` and the npm / pnpm equivalents. Recognized executor subcommands: - npm exec | x - yarn dlx - pnpm dlx | exec | x When the command is one of the wrappers and args[0] is an executor subcommand, scan args[1..] (skipping flags) for an unpinned package name. Same `looksLikePackageName` / `hasExactVersion` check as the direct-runner path, so the heuristic stays consistent. New regression test covers all three wrappers in one fixture.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two inspection passes worth of fixes against ScopeTrail. Ten commits, atomic and individually revertable. Test count went from 36 → 49 with regression coverage for every behavior change.
The most important commit is the security fix — please skim that one first.
🔴 Security fix (high priority)
0f5ff71— BareBash/Read/Write/Editnot flagged as broad.isBroadAllowhad an asymmetry: bareWebFetch/WebSearch/Taskwere flagged, but bareBash(unlimited shell),Read/Write/Edit(unlimited filesystem) silently slipped through because the regex required parenthesized scope. A reviewer adding"allow": ["Bash"]got no ScopeTrail warning. Fixed by routing all dangerous verbs through the same bare-token check. Two regression tests — one for the gap, one assertingBash(npm test)etc. stay narrow.🟡 Detector improvements
07706c5— Codex TOML MCP servers..codex/config.toml's[mcp_servers.NAME]sections were invisible to the detector; only scalar fields (sandbox/approval/network/trust) were checked. Three new findings mirror the.mcp.jsonrisk vocabulary:codex_mcp_server_added,codex_mcp_server_command_changed,codex_unpinned_mcp_command. Shared MCP risk heuristic extracted tosrc/mcp-risk.tsso JSON and TOML detectors stay aligned. UsesparseTomlandlineOfTomlKeyfrom agent-gov-core.0aa8677— Partial Claude hook command changes.hook_command_changedpreviously only fired on 1:1 command swaps. Now also catches: a no-op appended alongside a strict guard (count grows), and one guard dropped from a multi-guard hook (count shrinks).fb56768—bunxrunner + parallel walk. Addsbunx(Bun's npx equivalent) to the unpinned-command runner list. SwitchescollectMcpSampleConfigPathstoPromise.allfor parallel subdirectory walks. Notes in commit message whyyarn dlx/npm execneed a structurally different check that's deferred.4082373—http://sample endpoints fire high.mcp_sample_remote_endpointwas fixed-medium for both schemes.http://now fireshigh("unencrypted remote endpoint") since copy-pasted samples silently downgrade users to MitM-vulnerable transport.https://stays at medium.🟢 Performance / hygiene
416e37a— Single-scan multi-format Action.action.ymlinvoked the CLI three times per run, each repeating git snapshot materialization + detector work. New--out-markdown PATH/--out-json PATHflags let one scan render all three sinks. Stdout still carries the GitHub annotations the runner expects.edfef04— Whitelist publish surface.npm packwas shipping 67 files (161 kB) includingtest/fixtures/(intentionally-risky agent configs),.github/,.claude/settings.json,.mcp.json, full TypeScriptsrc/, andtsconfig.json. For a trust-focused permission-drift reviewer, shipping the adversarial fixtures as part ofnpm installis the wrong default. Now 16 files (141 kB, mostly the demo PNG). Regression test pins what's allowed and what isn't.feadd61— Dead link fix + coverage test.docs/ADOPTION.mdlinked?template=team-adoption.ymlbut the template didn't exist. Added the template (ownership / reporting / exceptions / many-repos signal). New regression test walks every?template=*.ymlreference inREADME.md/ADOPTION.md/PILOT.md/TRUST.mdand asserts each resolves to a real file — catches the next dead link too.167a933— Pin LF line endings..gitattributeswas just* text=auto, which under Windowscore.autocrlf=truekept rewritingdist/detectors/*.jsbetween LF and CRLF and producing a false dirty working tree after a clean checkout.⚪ Already shipped
2449b84—agent-gov-core0.4.x → 0.5.0. Your existing bump; included here so the Codex MCP detector (which usesparseToml/lineOfTomlKey) builds cleanly.What I deliberately did NOT do
From the second inspection:
dlx/execto the runner list as proposed — those aren't command names.yarn dlx fooiscommand: \"yarn\", args: [\"dlx\", \"foo\"], so the proposed['npx','uvx','pipx','dlx','exec']check catches nothing real. Needs a structurally different multi-arg pattern. Noted infb56768commit message..cursorrules/.windsurfrulesdetection — legitimate missing-surface requests, but each one needs a real risk model (especially.cursorrules/.windsurfrulessince they're natural-language prompts, which would expand ScopeTrail's scope from structural config drift into prompt-content analysis — a deliberate product decision, not a code review fix). Filing as separate issues against the missing-surface template.rendersubcommand for multi-format — functionally equivalent to the--out-*flags but requires re-parsing JSON in a second CLI invocation. Flag-based approach renders from one in-memory report.Test plan
npm test— 49/49 passing locally (up from 36)npm run buildcleannpm pack --dry-runshows 16 files, no fixtures or repo dotfilesgit statusclean on fresh checkout (no CRLF flap).codex/config.toml(which has[mcp_servers.stripe-admin]with@latest)🤖 Generated with Claude Code