feat(sarif,detector): describe instruction/baseline kinds; stop over-flagging SHA-pinned GitHub URLs #53

Merged
Conalh merged 2 commits into main from pr7-sarif-benchmark
May 29, 2026

@Conalh Conalh commented May 29, 2026

Addresses review findings #10 and #11.

#10 — SARIF descriptions

shortDescriptionForKind covered the MCP / Claude / Codex / Aider kinds but fell back to the raw kind string for instruction and baseline-drift findings, so those rules showed up in the GitHub Security tab as policy_mesh.instructions_override_safety instead of a sentence. Added descriptions for:

  • instructions_skip_confirmation, instructions_override_safety, instructions_broad_write, instructions_auto_version_control
  • baseline_rating_drift, baseline_version_drift, baseline_parse_error

(exceptions_parse_error was already present.)
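A minimal sketch of the lookup-with-fallback shape described above. The description sentences here are illustrative placeholders, not the text shipped in report.ts, and the `KIND_DESCRIPTIONS` table name is an assumption; only the function name and the kind strings come from this PR.

```typescript
// Hypothetical table: the sentences are placeholders, not the wording in report.ts.
const KIND_DESCRIPTIONS = new Map<string, string>([
  ["instructions_skip_confirmation", "Instructions tell the agent to skip user confirmation."],
  ["instructions_override_safety", "Instructions tell the agent to override safety rules."],
  ["instructions_broad_write", "Instructions grant broad write access."],
  ["instructions_auto_version_control", "Instructions allow unattended version-control actions."],
  ["baseline_rating_drift", "A rating changed relative to the recorded baseline."],
  ["baseline_version_drift", "A version changed relative to the recorded baseline."],
  ["baseline_parse_error", "The baseline file could not be parsed."],
]);

// Returns a human-readable sentence for a finding kind, falling back to the
// raw kind string when no description is registered (the behaviour that made
// the undescribed kinds surface as bare identifiers).
function shortDescriptionForKind(kind: string): string {
  return KIND_DESCRIPTIONS.get(kind) ?? kind;
}
```

A `Map` is used rather than a plain object so that a kind such as `toString` cannot accidentally resolve to an inherited property.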

#11 — Detector precision + benchmark

Stop over-flagging SHA-pinned GitHub URLs. isUnpinnedCommand flagged every github.com URL as unpinned, including URLs pinned to an immutable 40-char commit SHA, even though a SHA makes the install reproducible. Now only branch / tag / HEAD URLs are flagged; a SHA-pinned URL is treated as pinned.
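The pinning rule can be sketched as follows. This is not the code in parsers/mcp.ts: the helper name `isUnpinnedGitHubUrl`, the `FULL_SHA` regex, and the example URLs in the usage note are assumptions made for illustration, and the real isUnpinnedCommand likely inspects more than a single argument.

```typescript
// A full commit SHA: exactly 40 hex characters, not embedded in a longer
// alphanumeric run (so a 41-character hex string does not count).
const FULL_SHA = /(?<![0-9a-z])[0-9a-f]{40}(?![0-9a-z])/i;

// Hypothetical helper: true when `arg` is a github.com URL that does not
// reference a full commit SHA (bare repo, branch, tag, or HEAD).
function isUnpinnedGitHubUrl(arg: string): boolean {
  if (!/github\.com/i.test(arg)) {
    return false; // not a GitHub URL; other pinning checks would apply
  }
  return !FULL_SHA.test(arg);
}
```

Under these assumptions, `git+https://github.com/acme/tool.git#main` and `https://github.com/acme/tool` are reported as unpinned, while the same URL ending in `#` followed by a 40-character hex SHA is treated as pinned. Short (abbreviated) SHAs are deliberately not accepted in this sketch, since they are not guaranteed to stay unambiguous.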

Benchmark expansion. Added two benign false-positive-trap fixtures + labels and regenerated RESULTS.md (now 33 cases, 9 benign, 0 false positives, 100% detection recall):

  • mcp-github-sha-pinned — SHA-pinned GitHub URL must not be mcp_unpinned.
  • mcp-absolute-script-path — absolute local script path outside the repo must not be missing_local_script (the detector already excludes absolute paths; this locks it in).
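For readers unfamiliar with how figures like "0 false positives" and "100% detection recall" are derived from labelled fixtures, here is a rough sketch. The `BenchmarkCase` shape and the scoring rules (a benign case counts as a false positive if it reports any finding; recall is detected expected kinds over all expected kinds) are assumptions, not the harness's actual label format or definitions.

```typescript
// Hypothetical shapes; the real label format is not shown in this PR.
interface BenchmarkCase {
  name: string;
  expectedKinds: string[]; // empty for a benign (false-positive-trap) case
  reportedKinds: string[]; // kinds the audit run emitted for the fixture
}

interface Score {
  cases: number;
  benign: number;
  falsePositives: number;
  recall: number; // fraction of expected kinds that were reported
}

function scoreBenchmark(cases: BenchmarkCase[]): Score {
  let benign = 0;
  let falsePositives = 0;
  let expected = 0;
  let detected = 0;
  for (const c of cases) {
    if (c.expectedKinds.length === 0) {
      benign += 1;
      // Assumed rule: any finding on a benign fixture is a false positive.
      if (c.reportedKinds.length > 0) falsePositives += 1;
      continue;
    }
    for (const kind of c.expectedKinds) {
      expected += 1;
      if (c.reportedKinds.includes(kind)) detected += 1;
    }
  }
  const recall = expected === 0 ? 1 : detected / expected;
  return { cases: cases.length, benign, falsePositives, recall };
}
```

With this scoring, a fixture such as mcp-github-sha-pinned is labelled with an empty `expectedKinds` list, so any mcp_unpinned finding reported for it would raise the false-positive count.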

Scope note

The other adversarial cases the review lists (same-severity diff change, fix-pin inline args, recursive instruction-only package, .cursorrules, fenced-code example) belong with PRs #47 / #48 / #50 / #51: this benchmark harness only runs audit, and those fixtures/behaviours live on those branches. They're covered by unit/CLI tests in their own PRs.

Tests

Unit test for isUnpinnedCommand (SHA-pinned vs branch vs @latest); CLI test asserting instruction & baseline SARIF rules carry real descriptions. All 120 tests pass; dist/ rebuilt and committed.

🤖 Generated with Claude Code

Conalh and others added 2 commits May 29, 2026 08:56
…flagging SHA-pinned GitHub URLs

SARIF (report.ts):
- shortDescriptionForKind covered the MCP / Claude / Codex / Aider kinds but
  fell back to the raw kind string for instruction and baseline-drift
  findings. Add descriptions for instructions_skip_confirmation /
  _override_safety / _broad_write / _auto_version_control and baseline_
  rating_drift / _version_drift / _parse_error.

Detector precision (parsers/mcp.ts):
- isUnpinnedCommand flagged every github.com URL as unpinned, including one
  pinned to an immutable 40-char commit SHA. A SHA makes the install
  reproducible, so only branch / tag / HEAD URLs are now flagged.

Benchmark (finding 11): add two benign false-positive-trap fixtures + labels —
a SHA-pinned GitHub URL (must not be mcp_unpinned) and an absolute local
script path outside the repo (must not be missing_local_script) — and
regenerate RESULTS.md (33 cases, 0 false positives).

Note: the diff/fix-pin/instruction-coverage adversarial cases from the review
belong with PRs 1/2/4/5 — the benchmark harness only runs `audit`, and those
fixtures/behaviours live on those branches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Conflicts:
#	test/heuristics.test.mjs
@Conalh Conalh merged commit e1e91e1 into main May 29, 2026
5 checks passed
@Conalh Conalh deleted the pr7-sarif-benchmark branch May 29, 2026 16:01