Skip to content

Fix scan fail-on bypass via repository-controlled suppressions#25

Closed
tachyon-beep wants to merge 1 commit into
mainfrom
codex/fix-vulnerability-bypassing-scan-fail-on-gate
Closed

Fix scan fail-on bypass via repository-controlled suppressions#25
tachyon-beep wants to merge 1 commit into
mainfrom
codex/fix-vulnerability-bypassing-scan-fail-on-gate

Conversation

@tachyon-beep

Copy link
Copy Markdown
Collaborator

Motivation

  • The CLI/MCP scan flow applied repository-supplied baseline/waiver/judged suppressions to emitted findings before evaluating the --fail-on gate, allowing an attacker-controlled repo entry to hide defects from CI gates.
  • The intent is to keep human-visible annotations (baselined/waived/judged) in scan output while ensuring CI gating only considers a trusted unsuppressed population.

Description

  • Introduce a separate unsuppressed gate population ScanResult.gate_findings that is derived from the raw (pre-suppression) findings so gates evaluate an unsuppressed view while emitted findings remain annotated (file: src/wardline/core/run.py).
  • Apply --new-since delta scoping to both the emitted findings and the separate gate population so operator scoping still scopes the gate but repository suppressions do not (file: src/wardline/core/run.py).
  • Change gate_decision to evaluate against the gate_findings population when present, falling back to result.findings otherwise (file: src/wardline/core/run.py).
  • Update unit/regression tests to assert that baselined/waived/judged annotations remain visible in output but no longer clear the --fail-on gate (files: tests/unit/core/test_run.py, tests/unit/cli/test_cli.py).

Testing

  • Ran static/style checks with uv run ruff check and Python compilation with python -m py_compile, both succeeded.
  • Executed focused unit tests including tests/unit/core/test_run.py::test_gate_decision_uses_unsuppressed_gate_population and tests/unit/core/test_suppression.py::test_gate_trips_only_on_active_defect_at_or_above_threshold, which passed.
  • Attempted broader test runs that exercise CLI/MCP parity (--extra scanner) but some runs were blocked by dependency downloads (PyYAML / click) due to external network/tunnel failures and thus could not complete; this is an environment issue and not a code regression.
  • Local test adjustments were added to assert the new gate semantics and the updated test subset ran green in the available environment.

Codex Task

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd64010099

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/wardline/core/run.py
findings = apply_suppressions(raw, baseline, waivers, today=date.today(), judged=judged)
# Keep a separate gate population that applies only operator-supplied scan
# scoping (for example --new-since), not repository-controlled suppressions.
gate_findings = list(raw)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve line-less defect quarantine in gate population

When a rule or local pack emits a normal-file DEFECT with line_start=None, apply_suppressions intentionally replaces it with a WLN-ENGINE-LINELESS-DEFECT FACT so the invalid, collision-prone defect is not treated as an active finding. Building gate_findings directly from raw bypasses that tool-owned invariant, so scan --fail-on can now fail on a defect that is omitted from the emitted findings and was previously quarantined for fingerprint-safety reasons; only repository-controlled baseline/waiver/judged suppressions should be skipped here.

Useful? React with 👍 / 👎.

Comment thread src/wardline/core/run.py
if fail_on is None:
return GateDecision(tripped=False, fail_on=None, exit_class=0)
tripped = gate_trips(result.findings, fail_on)
gate_findings = result.gate_findings or result.findings

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Surface explanations for gate-only suppressed defects

When the only above-threshold defect is baselined, waived, or judged, this switch makes gate.tripped true while the MCP response still emits that finding as suppressed; _scan only attaches inline explanations for SuppressionState.ACTIVE findings (src/wardline/mcp/server.py:160-164). Agents using scan with explain: true and fail_on can therefore receive a failing gate with no explained active defect to fix, so the gate-relevant population needs to be exposed/explained separately or the emitted finding needs a gate-relevant marker.

Useful? React with 👍 / 👎.

Comment thread src/wardline/core/run.py
findings = apply_suppressions(raw, baseline, waivers, today=date.today(), judged=judged)
# Keep a separate gate population that applies only operator-supplied scan
# scoping (for example --new-since), not repository-controlled suppressions.
gate_findings = list(raw)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor waivers from explicit operator configs

This treats every waiver as untrusted by gating on the completely raw findings, but run_scan(..., config_path=...) and the CLI --config path can be an operator-supplied config outside the repository. In that CI setup, centrally managed waivers in the explicit config are no longer honored by --fail-on, even though the nearby config handling explicitly preserves operator-provided policy; only waivers loaded from the repository's default wardline.yaml should be excluded from the gate population.

Useful? React with 👍 / 👎.

tachyon-beep added a commit that referenced this pull request Jun 5, 2026
…ail-on) (#28)

Close a HIGH-severity CI-gate bypass. `wardline scan --fail-on` applied
repository-controlled suppressions (`.wardline/baseline.yaml`, `wardline.yaml`
waivers, `.wardline/judged.yaml`) to findings BEFORE evaluating the gate, so a
malicious PR could commit a suppression keyed to its own new defect's
fingerprint and clear the gate. All three sources are committed repo content
and equally exploitable. Reproduced live (baselining the sole ERROR zeroed the
gate).

Secure-by-default model (combines #24 + #25):
- `gate_decision` now evaluates a separate UNSUPPRESSED population
  (`ScanResult.gate_findings`). baseline/waiver/judged still ANNOTATE the
  emitted findings (`suppressed=…` stays visible) but no longer clear the gate.
- The gate population is built with apply_suppressions over EMPTY baseline +
  waivers + judged, NOT `list(raw)`, so the lineless-DEFECT→non-gating-FACT
  downgrade is preserved (no spurious gate trips).
- `--new-since <ref>` (operator-supplied, unforgeable) scopes BOTH the emitted
  and gate populations — the secure CI ratchet.
- `--trust-suppressions` (CLI) / `trust_suppressions` (run_scan, MCP scan tool),
  default False, restores the local ratchet / judge DX for trusted checkouts
  (None sentinel → gate falls back to the suppressed findings). `run_judge`
  passes True so judge/triage/persist are unchanged.
- `load_judged` now requires `verdict: FALSE_POSITIVE` (rejects a hand-edited
  TRUE_POSITIVE / missing verdict smuggled in as a silent suppression).

BREAKING (noted in CHANGELOG, acceptable at 0.x): baseline-gated CI goes
green→red on upgrade until `--new-since` or `--trust-suppressions` is added.
Docs updated (suppression.md): the secure CI ratchet is `--new-since`.

Combines and supersedes #24 (judged-only) and #25 (no escape hatch + a
lineless-DEFECT gate bug). Full suite green (2394 passed), ruff + mypy clean.

Co-authored-by: John Morrissey <john@wardline.dev>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tachyon-beep

Copy link
Copy Markdown
Collaborator Author

Fixed ourselves in 16a4d005 (PR #28, merged to main) — your mechanism, two bugs fixed, plus an escape hatch.

The vulnerability is real and HIGH-severity, and your scope was right: all three repo-controlled suppression sources (baseline / waiver / judged) must stop clearing the --fail-on gate. #28 adopts your core design — a separate ScanResult.gate_findings unsuppressed population that gate_decision evaluates, with --new-since scoping both populations.

We wrote our own rather than merge this directly because of two issues:

  1. Broke the documented baseline ratchet with no escape hatch. Gating over the full unsuppressed set means scan --fail-on flips green→red for every pre-existing baselined finding on upgrade — and suppression.md documents the baseline as 'the gate fires only on findings after the snapshot'. fix(scan): gate on the unsuppressed population by default (secure --fail-on) — supersedes #24, #25 #28 keeps the secure default but adds --trust-suppressions (default off) to restore the local ratchet for trusted checkouts, threads it through CLI + MCP + run_judge, documents --new-since as the secure CI ratchet, and ships a BREAKING CHANGELOG note (acceptable at 0.x). This was a product decision; the maintainer chose the full secure-by-default + opt-in model.
  2. Lineless-DEFECT gate bug. gate_findings = list(raw) includes raw lineless DEFECTs that apply_suppressions deliberately downgrades to non-gating FACTs (fingerprint-collision safety), so the gate could trip where main wouldn't. fix(scan): gate on the unsuppressed population by default (secure --fail-on) — supersedes #24, #25 #28 builds the gate population as apply_suppressions(raw, Baseline(frozenset()), WaiverSet([]), judged=None) — zero suppression, but the same structural transforms — and adds a regression guard.

Also folded in #24's verdict: FALSE_POSITIVE hardening, and used a None sentinel for gate_findings (not a falsy-empty fallback) so a legitimately-empty gate population can't silently fall back to the suppressed set.

Verified: full suite 2406 passed, ruff + mypy clean; live repro shows a baselined ERROR trips the gate by default and clears under --trust-suppressions; existing gate-clears tests flipped to the secure default with --trust-suppressions variants. Thanks for the headline find — closing in favour of #28.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant