Skip to content

Address two inspection passes: security, detectors, hygiene#41

Merged
Conalh merged 11 commits into
mainfrom
inspector-fixes
May 22, 2026
Merged

Address two inspection passes: security, detectors, hygiene#41
Conalh merged 11 commits into
mainfrom
inspector-fixes

Conversation

@Conalh
Copy link
Copy Markdown
Owner

@Conalh Conalh commented May 22, 2026

Summary

Two inspection passes worth of fixes against ScopeTrail. Ten commits, atomic and individually revertable. Test count went from 36 → 49 with regression coverage for every behavior change.

The most important commit is the security fix — please skim that one first.

🔴 Security fix (high priority)

  • 0f5ff71 — Bare Bash/Read/Write/Edit not flagged as broad. isBroadAllow had an asymmetry: bare WebFetch/WebSearch/Task were flagged, but bare Bash (unlimited shell), Read/Write/Edit (unlimited filesystem) silently slipped through because the regex required parenthesized scope. A reviewer adding "allow": ["Bash"] got no ScopeTrail warning. Fixed by routing all dangerous verbs through the same bare-token check. Two regression tests — one for the gap, one asserting Bash(npm test) etc. stay narrow.

🟡 Detector improvements

  • 07706c5 — Codex TOML MCP servers. .codex/config.toml's [mcp_servers.NAME] sections were invisible to the detector; only scalar fields (sandbox/approval/network/trust) were checked. Three new findings mirror the .mcp.json risk vocabulary: codex_mcp_server_added, codex_mcp_server_command_changed, codex_unpinned_mcp_command. Shared MCP risk heuristic extracted to src/mcp-risk.ts so JSON and TOML detectors stay aligned. Uses parseToml and lineOfTomlKey from agent-gov-core.
  • 0aa8677 — Partial Claude hook command changes. hook_command_changed previously only fired on 1:1 command swaps. Now also catches: a no-op appended alongside a strict guard (count grows), and one guard dropped from a multi-guard hook (count shrinks).
  • fb56768bunx runner + parallel walk. Adds bunx (Bun's npx equivalent) to the unpinned-command runner list. Switches collectMcpSampleConfigPaths to Promise.all for parallel subdirectory walks. Notes in commit message why yarn dlx / npm exec need a structurally different check that's deferred.
  • 4082373http:// sample endpoints fire high. mcp_sample_remote_endpoint was fixed-medium for both schemes. http:// now fires high ("unencrypted remote endpoint") since copy-pasted samples silently downgrade users to MitM-vulnerable transport. https:// stays at medium.

🟢 Performance / hygiene

  • 416e37a — Single-scan multi-format Action. action.yml invoked the CLI three times per run, each repeating git snapshot materialization + detector work. New --out-markdown PATH / --out-json PATH flags let one scan render all three sinks. Stdout still carries the GitHub annotations the runner expects.
  • edfef04 — Whitelist publish surface. npm pack was shipping 67 files (161 kB) including test/fixtures/ (intentionally-risky agent configs), .github/, .claude/settings.json, .mcp.json, full TypeScript src/, and tsconfig.json. For a trust-focused permission-drift reviewer, shipping the adversarial fixtures as part of npm install is the wrong default. Now 16 files (141 kB, mostly the demo PNG). Regression test pins what's allowed and what isn't.
  • feadd61 — Dead link fix + coverage test. docs/ADOPTION.md linked ?template=team-adoption.yml but the template didn't exist. Added the template (ownership / reporting / exceptions / many-repos signal). New regression test walks every ?template=*.yml reference in README.md/ADOPTION.md/PILOT.md/TRUST.md and asserts each resolves to a real file — catches the next dead link too.
  • 167a933 — Pin LF line endings. .gitattributes was just * text=auto, which under Windows core.autocrlf=true kept rewriting dist/detectors/*.js between LF and CRLF and producing a false dirty working tree after a clean checkout.

⚪ Already shipped

  • 2449b84agent-gov-core 0.4.x → 0.5.0. Your existing bump; included here so the Codex MCP detector (which uses parseToml/lineOfTomlKey) builds cleanly.

What I deliberately did NOT do

From the second inspection:

  • Adding dlx/exec to the runner list as proposed — those aren't command names. yarn dlx foo is command: \"yarn\", args: [\"dlx\", \"foo\"], so the proposed ['npx','uvx','pipx','dlx','exec'] check catches nothing real. Needs a structurally different multi-arg pattern. Noted in fb56768 commit message.
  • Aider / .cursorrules / .windsurfrules detection — legitimate missing-surface requests, but each one needs a real risk model (especially .cursorrules/.windsurfrules since they're natural-language prompts, which would expand ScopeTrail's scope from structural config drift into prompt-content analysis — a deliberate product decision, not a code review fix). Filing as separate issues against the missing-surface template.
  • A render subcommand for multi-format — functionally equivalent to the --out-* flags but requires re-parsing JSON in a second CLI invocation. Flag-based approach renders from one in-memory report.

Test plan

  • npm test — 49/49 passing locally (up from 36)
  • npm run build clean
  • npm pack --dry-run shows 16 files, no fixtures or repo dotfiles
  • git status clean on fresh checkout (no CRLF flap)
  • CI green on PR
  • Action dogfood run on this PR shows the new findings against the repo's own .codex/config.toml (which has [mcp_servers.stripe-admin] with @latest)

🤖 Generated with Claude Code

Conalh added 11 commits May 22, 2026 15:40
Previously .gitattributes only declared `* text=auto`, which under
core.autocrlf=true on Windows kept rewriting dist/detectors/*.js
between LF and CRLF and producing a false dirty working tree after a
clean checkout. Pin LF explicitly for source, generated JS, configs,
and docs so the dirty state stops returning.
Before: `npm pack` shipped 67 files (161 kB) including test/fixtures
of intentionally-risky agent configs, the dogfood .claude/settings.json
and .mcp.json, .github/ workflows, full TypeScript src/, and tsconfig.
For a trust-focused permission-drift reviewer this is the wrong
default — every consumer was effectively `npm install`-ing the
adversarial examples we use to test the detectors.

Add a `files` whitelist that ships only dist/, action.yml, docs/,
the README screenshot, README, and LICENSE. New tarball is 16 files
(141 kB, almost entirely the PNG). A regression test asserts the
runtime entry points stay present and the previously-leaked surfaces
(.github, .claude, .codex, .cursor, .vscode, src, test, .mcp.json,
tsconfig.json, repo-only dotfiles) cannot creep back in.
docs/ADOPTION.md step 7 advertised a `?template=team-adoption.yml`
feedback channel, but no such template existed under
.github/ISSUE_TEMPLATE/ — GitHub silently degraded the link to a
generic new-issue page. The previous issue-template test only
asserted the templates that already shipped, so it couldn't catch
the missing one.

Add the team-adoption template covering ownership, reporting,
exceptions, and many-repository signals (the four themes ADOPTION.md
already promises) and a new regression test that walks every
`?template=*.yml` reference in README.md, ADOPTION.md, PILOT.md, and
TRUST.md and asserts each one resolves to a real file. This catches
any future doc that promises a template we don't actually ship.
`.codex/config.toml` carries the same `[mcp_servers.NAME]` shape that
ScopeTrail already flags in `.mcp.json`, but the Codex detector
previously only looked at scalar fields (sandbox, approval, network,
project trust). A Codex user could drop in a `[mcp_servers.stripe-admin]`
section with `args = ["-y", "@vendor/stripe-mcp@latest"]` and the
unpinned-command risk model never saw it — exactly the gap the
inspection caught against this repo's own .codex/config.toml dogfood
example.

Extract the MCP command-risk heuristic (`isUnpinnedCommand`,
`serverCommand`, `isPipeToShellCommand`) into a shared `src/mcp-risk.ts`
so the JSON and TOML detectors stay aligned as the risk model evolves.

Add three new findings, mirroring the .mcp.json risk vocabulary:
- `scope_trail.codex_mcp_server_added` (high)
- `scope_trail.codex_mcp_server_command_changed` (medium)
- `scope_trail.codex_unpinned_mcp_command` (high)

Use agent-gov-core's `parseToml` and `lineOfTomlKey` so findings point
at the `command`/`args` line inside the `[mcp_servers.NAME]` table.

Also:
- Update .gitignore so test/fixtures/**/.codex/ stays trackable
  while the top-level dogfood .codex/ remains ignored (matches the
  long-standing exception the existing codex-config-drift fixture
  relied on via `git add -f`).
- Drop the DEP0190 deprecation warning from package-surface.test.mjs
  by routing the Windows npm.cmd invocation through cmd.exe instead
  of asking execFile to use the shell.
`hook_command_changed` previously only fired when an existing hook
had the same number of commands but at least one was new — i.e., a
1:1 swap. That left two real weakening cases silent:

- Appending a no-op alongside a strict guard (count grows).
- Removing one guard from a multi-guard hook (count shrinks).

Both are policy events the reviewer needs to see, but the
`newCommands.size === oldCommands.size` guard skipped them.

Replace the size-equality check with a symmetric-difference check:
fire when any command was added *or* removed. Build a message that
names the deltas so reviewers can see what changed without opening
the diff. Add three regression tests covering the append, partial-
removal, and unchanged-set cases.
Picks up v0.5.0: tokenizeShellDeep, ConfigParseError, generateWorkflowSummary,
plus 10 pre-publish correctness fixes (JSON locator value-vs-key disambiguation,
multi-line TOML escape tracking, dotted-key whitespace tolerance, line-ending
backslash trim, boolean flag pairing, Windows exe case-fold + runtime path
de-noise, HTML escape in summaries) and 11 golden compatibility tests pinning
fingerprint/normalization output before v1.0 freeze.

The path-denoise fix is especially relevant here: /usr/bin/node, node, and
node.exe now produce identical canonical strings, closing a PolicyMesh
mcp_command_mismatch false-positive class — but ScopeTrail benefits indirectly
since the same shared normalizeMcpCommand is used for unpinned-command detection.
action.yml previously invoked the CLI three times per pull request —
once with `--format markdown`, once with `--format json`, once with
`--format github`. In git mode each call re-materialized both the
base and head snapshots and re-ran every detector, tripling the
work for no extra coverage.

Add two CLI flags that let one scan render to multiple sinks:

  --out-markdown PATH   Write the markdown report to a file.
  --out-json PATH       Write the JSON report to a file.

`--format` continues to control stdout, so the existing CLI surface
is backward compatible. The Action now runs:

  scopetrail diff --format github \
    --out-markdown $report_file --out-json $json_file

Stdout streams the GitHub annotations the runner picks up, while the
two file outputs feed the step summary append and the rating /
finding-count output parsing. A separate `cat "$report_file"` keeps
the markdown report visible in the Action log for parity with the
prior `tee`.

Tests:
- New action-metadata assertion: exactly one `node $GITHUB_ACTION_PATH/dist/index.js`
  invocation appears in action.yml.
- Existing `diff --repo` regex relaxed to tolerate the line
  continuation between `diff` and `--repo`.
- New CLI test exercises `--out-markdown` + `--out-json` + `--format github`
  in one call and asserts both files match their single-format
  equivalents.
isBroadAllow had an asymmetry that silently allowed the most
dangerous permissions:

- Bare "WebFetch", "WebSearch", "Task" → correctly flagged as broad.
- Bare "Bash", "Read", "Write", "Edit" → returned false.

The bash/read/write/edit branches were regex-based and required a
parenthesized scope (`Bash(npm *)` or `Read(~/**)`). A bare token
like `"Bash"` — which grants unlimited shell execution without any
prompt — had no parentheses, didn't match, and slipped through with
no finding. Same for bare `"Read"`/`"Write"`/`"Edit"`, which grant
unlimited filesystem access.

Move all four into the same `isBroadVerbGrant` check that already
handled WebFetch/WebSearch/Task. The bare-token + wildcard-scope
detection now applies uniformly across every dangerous verb. Keep
the rooted-path regex (`/\b(read|write|edit)\((~|...)\)/`) since it
catches `Read(/etc/passwd)` shapes that don't contain `*` — those
aren't bare and the wildcard branch can't see them.

Two new regression tests:
- Asserts bare Bash/Read/Write/Edit are flagged broad.
- Asserts narrowly-scoped Bash(npm test), Read(./src/foo.txt) etc.
  stay narrow — the fix must not over-fire on legitimate scoping.
Two small improvements bundled because they touch the same MCP
detection code path.

1. `collectMcpSampleConfigPaths` walked subdirectories sequentially
   via `await` inside a `for` loop. Switch to `Promise.all` over
   `entries.map`. Each `readdir` is independent and `paths` is a
   Set mutated from the same event loop, so mutations are race-free
   in single-threaded Node. Caller already sorts the result, so
   insertion order doesn't matter.

   Practical impact is modest — the heavy hitters (node_modules,
   .git, dist, build, coverage, .next, .turbo) are already in
   IGNORED_SAMPLE_SCAN_DIRS and the GitHub Action walks a snapshot
   tarball that only contains the detector-relevant paths. But
   local `--repo .` runs on big monorepos benefit, and the change
   is free.

2. Add `bunx` to the npx/uvx/pipx runner list in mcp-risk.ts.
   `bunx` is Bun's npx equivalent and ships as its own binary, so
   it surfaces as `command: "bunx"` in MCP configs — directly
   analogous to npx. Add a regression test.

Note on `yarn dlx` / `npm exec` / `pnpm dlx` — those are not added
here because they have `command: "yarn"` (etc.) with the subcommand
in args[0]. The current heuristic checks `spec.command` against a
runner list, which is structurally the wrong shape for those. A
proper fix would need to also check args[0] against subcommand
names, then scan args[1..] for packages. Left out deliberately
rather than shipping a wrong-shape change that catches nothing.
mcp_sample_remote_endpoint fired at fixed medium severity for both
http:// and https:// endpoints. An unencrypted http:// MCP endpoint
in a sample config is a different shape of risk: anyone who copies
the sample inherits a MitM-vulnerable connection.

Split severity by scheme:
- http://*   → high  ("unencrypted remote endpoint")
- https://*  → medium ("remote endpoint", unchanged copy-paste check)

Message and recommendation are scheme-aware too — http:// findings
specifically call out that copy-pasted samples should not silently
downgrade users to unencrypted transport.

Existing tests covering https:// sample endpoints continue to assert
medium severity (no fixture changes needed — all existing fixtures
already used https). New regression test covers both schemes in one
fixture to lock the split.

Note: this is sample-only. The active .mcp.json detector doesn't
currently flag any remote endpoints — http:// or https:// — and
that's a separate gap. Filing as a follow-up issue rather than
expanding scope here.
Extends `isUnpinnedCommand` to the multi-arg runner shape I'd
deferred in fb56768 — `command: "yarn"` with `args: ["dlx", "pkg"]`
and the npm / pnpm equivalents.

Recognized executor subcommands:
- npm  exec | x
- yarn dlx
- pnpm dlx | exec | x

When the command is one of the wrappers and args[0] is an executor
subcommand, scan args[1..] (skipping flags) for an unpinned package
name. Same `looksLikePackageName` / `hasExactVersion` check as the
direct-runner path, so the heuristic stays consistent.

New regression test covers all three wrappers in one fixture.
@Conalh Conalh merged commit 5c02bed into main May 22, 2026
3 checks passed
@Conalh Conalh deleted the inspector-fixes branch May 22, 2026 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant