Skip to content

fix/reliability-safety-batch: close tier-1 reliability/safety cluster on the path to v2.0#32

Merged
vedanthvdev merged 1 commit intomasterfrom
fix/reliability-safety-batch
Apr 22, 2026
Merged

fix/reliability-safety-batch: close tier-1 reliability/safety cluster on the path to v2.0#32
vedanthvdev merged 1 commit intomasterfrom
fix/reliability-safety-batch

Conversation

@vedanthvdev
Copy link
Copy Markdown
Owner

Summary

Fifth of seven PRs on the path to v2.0. v1.9.20 closed the tier-1 correctness cluster (nine silent-drop bugs); this PR (v1.9.21) closes the tier-1 reliability and safety cluster from the same multi-reviewer sweep — issues that don't silently drop tests but weaken the merge-gate contract or the hardening work that shipped in v1.9.19.

Four production fixes, plus the review-found follow-ups applied before this PR opened.

The four fixes

1. File-level symlink rejection in SourceFileScanner

v1.9.19 hardened the directory leg via stayInsideProjectRoot + toRealPath, which closes src/main/java -> / planted by an attacker MR. The per-file leg of the same attack — committing src/main/java/com/x/Leak.java -> /etc/passwd — slid past because Files.walkFileTree still called visitFile on the symlink and the visitor didn't check attrs.isSymbolicLink(). The file would then be parsed by JavaParser, with its contents potentially surfaced in --explain samples or parse-failure WARN output.

Both file-walk visitors in the scanner (collectJavaFiles, walkFqnsUnder) now skip any entry flagged as a symlink unconditionally. This is deliberately stricter than the directory leg: a per-file toRealPath on every .java entry would add a syscall to the hot path for negligible benefit, and the usability cost of dropping intra-repo file symlinks is low (Git doesn't preserve hard links across clone either, so tooling that relies on canonical-reference aliasing already has to handle the regular-file case). The scanner javadoc documents the asymmetry so a future contributor isn't surprised.

Regression: SourceFileScannerTest#rejectsFileLevelSymlinkEscapingProjectRoot and #scansTestFqnsIgnoresFileLevelSymlinks both fail when the production guard is reverted, covering both the collectTestFiles/collectSourceFiles path and the scanTestFqns path so the two halves of discovery can't diverge on what counts as a valid source file.

2. Per-subtree IOException no longer truncates the whole walk

The default SimpleFileVisitor.visitFileFailed rethrows, and the scanner's outer IOException catch swallowed the remaining walk silently. A single unreadable nested module (permission denied, file-deleted-under-us race, filesystem loop) took every readable sibling with it — the operator got a generic "error searching for X" WARN and no way to know which subtree vanished or which tests were under-selected because of it.

All three walkers (collectJavaFiles, walkFqnsUnder, findAllMatchingDirs) now override visitFileFailed to log the unreadable entry at WARN via LogSanitizer and continue. Default-visible surfacing is the load-bearing change: operators can now see when a scan ran incomplete, where previously they'd silently under-select tests and have no way to correlate it with CI config drift.

No dedicated regression test — inducing a portable per-entry IOException requires either root-evadable chmod 000 semantics (fails on CI runners running as root) or heavyweight filesystem mocking. The fix is evident in the diff and backed by the 11 existing walk tests in SourceFileScannerTest.

3. AffectedTestsConfig.Builder#isAcceptableBaseRef rejects control characters at the entry gate

The rev-expression pattern used @\{[^}]+\} for the reflog segment — [^}] happily matched newline, ESC, CSI, DEL. A baseRef sourced from an attacker-controlled CI env var like

master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}

previously passed validation and then flowed unsanitised into log.info(\"Base ref: {}\", ...) in AffectedTestsEngine and into three IllegalStateException messages in GitChangeDetector (which Gradle renders verbatim into the build log). An attacker who could poison CI_BASE_REF (or any equivalent) could forge plugin-branded status lines into CI output and defeat grep-based merge gates.

The fix is a single blanket containsControlChars check before any of the downstream matchers run — C0 (0x00..0x1F), DEL (0x7F), and C1 (0x80..0x9F) are all rejected, closing every regex path at once without depending on getting each hand-crafted character class right.

Regression: AffectedTestsConfigTest#baseRefRejectsControlCharsInReflogSuffix covers the newline-in-reflog shape, bare ESC/DEL, and CRLF anywhere in the input; each case fails when the entry-gate check is removed.

4. LogSanitizer coverage extended to every remaining diff-sourced log site

v1.9.19 introduced the sanitizer and applied it at four sites; the review found ~28 additional sites passing raw filenames, FQNs, or exception messages straight to the logger — most at DEBUG, several at default-visible WARN, and two inside IllegalStateException messages that Gradle renders verbatim into the build log. An attacker-crafted filename or a poisoned JavaParser diagnostic carrying \u001b[2J...\u001b[m could still forge ops-visible log lines.

All remaining sites now route through LogSanitizer.sanitize:

  • JavaParsers.parseOrWarn — WARN + DEBUG
  • ImplementationStrategy.supertypesOf — WARN + per-depth match trace (DEBUG)
  • PathToClassMapper — eight classification DEBUG lines
  • UsageStrategy — seven tier-match DEBUG lines
  • NamingConventionStrategy — match DEBUG line
  • GitChangeDetector — per-file changed-entry DEBUG line + the IOException/GitAPIException wrappers it throws (JGit exception messages carry pack-file paths and ref segments that are attacker-influenceable on an MR branch)
  • AffectedTestsEngine — FQN-has-no-file DEBUG line
  • SourceFileScanner.findAllMatchingDirs — trailing DEBUG summary (brought into parity with its WARN path)
  • AffectedTestsConfig.Builder#baseRef — the offending value echoed inside the IllegalArgumentException thrown when validation fails (otherwise the reject branch would reopen the same surface containsControlChars closes on the accept branch)

Even DEBUG sites are sanitised because operators who bump the log level to chase a false-positive selection are exactly the audience an attacker would target — the log-forgery contract has to hold at every visibility level, not just WARN/INFO.

Regression: LogSanitizerTest covers the escape table; baseRefValidationErrorDoesNotEchoRawControlChars pins the newly-sanitised exception-message path and fails on revert of the LogSanitizer.sanitize wrap.

Review-applied follow-ups (before opening this PR)

A pre-merge code-reviewer pass against the initial diff surfaced two tier-2 gaps and four tier-3 cleanups, all applied in this PR:

  • T2-1: The IllegalArgumentException thrown by Builder#baseRef was echoing the raw tainted input back into its message — same forgery surface, just the reject branch. Wrapped via LogSanitizer.sanitize + added baseRefValidationErrorDoesNotEchoRawControlChars.
  • T2-2: GitChangeDetector's IOException and GitAPIException catch-blocks concatenated e.getMessage() raw into IllegalStateException messages. JGit's exception strings embed pack-file paths, ref names, and (on PackInvalidException / CorruptObjectException) raw object-database bytes — all attacker-influenceable. Both sites now wrap e.getMessage() through LogSanitizer.sanitize.
  • T3-1: Softened the SourceFileScanner javadoc which over-claimed that files "should just exist as regular files." Now documents the directory-vs-file asymmetry and the cost/benefit rationale.
  • T3-3: findAllMatchingDirs's trailing DEBUG was inconsistent with the WARN five lines above it — now sanitised at parity.
  • T3-6: CHANGELOG site counts corrected (PathToClassMapper has 8 sanitised sites, UsageStrategy has 7; total across the batch is ~28 sites, not ~20 as originally stated).
  • T3-7: isAcceptableBaseRef javadoc now explicitly notes the SHORT_SHA short-circuit bypasses isValidRefName but is safe by construction (hex-only class excludes . and /).

Deferred to batch 6 (v1.9.22)

Two related findings from the same review are intentionally deferred so each PR stays scoped and independently bisectable:

Test plan

  • ./gradlew :affected-tests-core:check — green
  • baseRefRejectsControlCharsInReflogSuffix fails on revert of containsControlChars
  • rejectsFileLevelSymlinkEscapingProjectRoot + scansTestFqnsIgnoresFileLevelSymlinks fail on revert of the symlink guard
  • baseRefValidationErrorDoesNotEchoRawControlChars fails on revert of the LogSanitizer.sanitize wrap in the IAE message
  • No new log.(warn|info|debug) sites remain that embed diff-sourced input raw (manual sweep across affected-tests-core/src/main/java)

Path to v2.0

  • ✅ v1.9.19 — log-forgery sanitiser + symlink/ref-name hardening (shipped)
  • ✅ v1.9.20 — tier-1 correctness cluster, 9 silent-drop bugs (shipped)
  • 🚢 v1.9.21 — this PR, tier-1 reliability/safety cluster
  • ⏭ v1.9.22 — batch 6: Fix release workflow credential handling #9 ProjectIndex parse-escalation + Prepare for Gradle Plugin Portal review (v1.9.0) #11 gradlew timeout
  • ⏭ v1.9.23 — batch 7: tier-2/3 cleanup + anything surfaced from SS pilot re-test
  • ⏭ v2.0.0 — cut release, drop deprecated flags, update docs to v2-only

… on the path to v2.0

This batch closes the tier-1 reliability/safety findings from the most recent
multi-reviewer sweep — the class of issues that don't silently drop tests but
weaken the merge-gate contract and the hardening work that shipped in v1.9.19.
Four production fixes plus the review-found follow-ups applied in this PR.

1. SourceFileScanner now rejects file-level symlinks during the walk. v1.9.19
   hardened the directory leg via stayInsideProjectRoot + toRealPath, which
   closes `src/main/java -> /` planted by an attacker MR. The per-file leg of
   the same attack — committing `src/main/java/com/x/Leak.java -> /etc/passwd`
   — slid past that guard because walkFileTree still called visitFile on the
   symlink and the visitor didn't check attrs.isSymbolicLink(). The two
   relevant visitors (collectJavaFiles, walkFqnsUnder) now skip any symlink
   unconditionally. The policy is deliberately stricter than the directory
   leg: we don't do per-file toRealPath on every .java entry because the
   syscall cost on the hot path isn't worth the usability gain, and the
   javadoc documents that intra-repo file symlinks are not traversed.
   Regression: SourceFileScannerTest#rejectsFileLevelSymlinkEscapingProjectRoot
   and #scansTestFqnsIgnoresFileLevelSymlinks both fail when the guard is
   reverted, covering both halves of discovery so they can never diverge on
   what counts as a valid source file.

2. Per-subtree IOExceptions no longer truncate the whole walk. The default
   SimpleFileVisitor.visitFileFailed rethrows, and the scanner's outer catch
   swallowed the remaining walk silently — a single unreadable nested module
   (permission denied, file-deleted-under-us race, filesystem loop) took
   every readable sibling with it. All three walkers (collectJavaFiles,
   walkFqnsUnder, findAllMatchingDirs) now override visitFileFailed to log
   the unreadable entry at WARN via LogSanitizer and continue. Default-visible
   surfacing is the load-bearing change: operators can now see when a scan
   ran incomplete where previously they'd silently under-select tests and
   have no way to correlate it with CI config drift.

3. AffectedTestsConfig.Builder#isAcceptableBaseRef now rejects control
   characters at the entry gate. The REV_EXPR pattern used `@\{[^}]+\}` for
   the reflog segment — `[^}]` happily matched newline, ESC, CSI, DEL. A
   baseRef sourced from an attacker-controlled CI env var like
   `master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}` previously passed
   validation and flowed unsanitised into log.info("Base ref: {}", …) in
   AffectedTestsEngine and into three IllegalStateException messages in
   GitChangeDetector — an attacker who could poison CI_BASE_REF could forge
   plugin-branded status lines and defeat grep-based merge gates. The fix is
   a single blanket containsControlChars check before any downstream matcher
   runs (C0, DEL, and C1 all rejected), closing every regex path at once
   without depending on getting each hand-crafted character class right.
   Regression: AffectedTestsConfigTest#baseRefRejectsControlCharsInReflogSuffix
   covers newline-in-reflog, bare ESC/DEL, and CRLF anywhere.

4. LogSanitizer coverage extended to every remaining diff-sourced log site.
   v1.9.19 introduced the sanitizer and applied it at four sites; the review
   found ~28 additional sites passing raw filenames, FQNs, or exception
   messages straight to the logger — most at DEBUG, several at default-
   visible WARN, and two inside IllegalStateException messages that Gradle
   renders verbatim into the build log. All remaining sites now route through
   LogSanitizer.sanitize: JavaParsers.parseOrWarn (WARN+DEBUG),
   ImplementationStrategy.supertypesOf (WARN+DEBUG trace), PathToClassMapper's
   eight classification DEBUG lines, UsageStrategy's seven tier-match DEBUG
   lines, NamingConventionStrategy's match DEBUG line, GitChangeDetector's
   per-file DEBUG line plus the IOException/GitAPIException wrappers it
   throws (JGit exception messages can carry attacker-influenced pack-file
   paths and ref segments), AffectedTestsEngine's FQN-has-no-file DEBUG line,
   SourceFileScanner.findAllMatchingDirs's trailing DEBUG summary, and the
   value echoed inside the IllegalArgumentException thrown by
   AffectedTestsConfig.Builder#baseRef when validation fails (otherwise the
   reject branch would reopen the same surface containsControlChars closes
   on the accept branch). Even DEBUG sites are sanitised because operators
   who bump the log level to chase a false-positive selection are exactly
   the audience an attacker would target — the log-forgery contract has to
   hold at every visibility level. Regression:
   baseRefValidationErrorDoesNotEchoRawControlChars pins the newly-sanitised
   exception-message path; LogSanitizerTest covers the escape table.

Two related findings are intentionally deferred to batch 6 (v1.9.22) so each
PR stays scoped and independently bisectable: #9 (ProjectIndex parse failures
should signal discovery-incomplete to the engine so the safety net can
escalate explicitly) and #11 (configurable timeout on the nested ./gradlew
invocation so a hung child test cannot pin a CI worker indefinitely).
@vedanthvdev vedanthvdev merged commit 8a5c68f into master Apr 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant