From c1c4a0889e64a58e3d614fd0112fa4bd0da28097 Mon Sep 17 00:00:00 2001 From: vedanthvasudev Date: Wed, 22 Apr 2026 22:01:25 +0100 Subject: [PATCH] fix/reliability-safety-batch: close tier-1 reliability/safety cluster on the path to v2.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This batch closes the tier-1 reliability/safety findings from the most recent multi-reviewer sweep — the class of issues that don't silently drop tests but weaken the merge-gate contract and the hardening work that shipped in v1.9.19. Four production fixes plus the review-found follow-ups applied in this PR. 1. SourceFileScanner now rejects file-level symlinks during the walk. v1.9.19 hardened the directory leg via stayInsideProjectRoot + toRealPath, which closes `src/main/java -> /` planted by an attacker MR. The per-file leg of the same attack — committing `src/main/java/com/x/Leak.java -> /etc/passwd` — slid past that guard because walkFileTree still called visitFile on the symlink and the visitor didn't check attrs.isSymbolicLink(). The two relevant visitors (collectJavaFiles, walkFqnsUnder) now skip any symlink unconditionally. The policy is deliberately stricter than the directory leg: we don't do per-file toRealPath on every .java entry because the syscall cost on the hot path isn't worth the usability gain, and the javadoc documents that intra-repo file symlinks are not traversed. Regression: SourceFileScannerTest#rejectsFileLevelSymlinkEscapingProjectRoot and #scansTestFqnsIgnoresFileLevelSymlinks both fail when the guard is reverted, covering both halves of discovery so they can never diverge on what counts as a valid source file. 2. Per-subtree IOExceptions no longer truncate the whole walk. The default SimpleFileVisitor.visitFileFailed rethrows, and the scanner's outer catch swallowed the remaining walk silently — a single unreadable nested module (permission denied, file-deleted-under-us race, filesystem loop) took every readable sibling with it. All three walkers (collectJavaFiles, walkFqnsUnder, findAllMatchingDirs) now override visitFileFailed to log the unreadable entry at WARN via LogSanitizer and continue. Default-visible surfacing is the load-bearing change: operators can now see when a scan ran incomplete where previously they'd silently under-select tests and have no way to correlate it with CI config drift. 3. AffectedTestsConfig.Builder#isAcceptableBaseRef now rejects control characters at the entry gate. The REV_EXPR pattern used `@\{[^}]+\}` for the reflog segment — `[^}]` happily matched newline, ESC, CSI, DEL. A baseRef sourced from an attacker-controlled CI env var like `master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}` previously passed validation and flowed unsanitised into log.info("Base ref: {}", …) in AffectedTestsEngine and into three IllegalStateException messages in GitChangeDetector — an attacker who could poison CI_BASE_REF could forge plugin-branded status lines and defeat grep-based merge gates. The fix is a single blanket containsControlChars check before any downstream matcher runs (C0, DEL, and C1 all rejected), closing every regex path at once without depending on getting each hand-crafted character class right. Regression: AffectedTestsConfigTest#baseRefRejectsControlCharsInReflogSuffix covers newline-in-reflog, bare ESC/DEL, and CRLF anywhere. 4. LogSanitizer coverage extended to every remaining diff-sourced log site. v1.9.19 introduced the sanitizer and applied it at four sites; the review found ~28 additional sites passing raw filenames, FQNs, or exception messages straight to the logger — most at DEBUG, several at default- visible WARN, and two inside IllegalStateException messages that Gradle renders verbatim into the build log. All remaining sites now route through LogSanitizer.sanitize: JavaParsers.parseOrWarn (WARN+DEBUG), ImplementationStrategy.supertypesOf (WARN+DEBUG trace), PathToClassMapper's eight classification DEBUG lines, UsageStrategy's seven tier-match DEBUG lines, NamingConventionStrategy's match DEBUG line, GitChangeDetector's per-file DEBUG line plus the IOException/GitAPIException wrappers it throws (JGit exception messages can carry attacker-influenced pack-file paths and ref segments), AffectedTestsEngine's FQN-has-no-file DEBUG line, SourceFileScanner.findAllMatchingDirs's trailing DEBUG summary, and the value echoed inside the IllegalArgumentException thrown by AffectedTestsConfig.Builder#baseRef when validation fails (otherwise the reject branch would reopen the same surface containsControlChars closes on the accept branch). Even DEBUG sites are sanitised because operators who bump the log level to chase a false-positive selection are exactly the audience an attacker would target — the log-forgery contract has to hold at every visibility level. Regression: baseRefValidationErrorDoesNotEchoRawControlChars pins the newly-sanitised exception-message path; LogSanitizerTest covers the escape table. Two related findings are intentionally deferred to batch 6 (v1.9.22) so each PR stays scoped and independently bisectable: #9 (ProjectIndex parse failures should signal discovery-incomplete to the engine so the safety net can escalate explicitly) and #11 (configurable timeout on the nested ./gradlew invocation so a hung child test cannot pin a CI worker indefinitely). --- CHANGELOG.md | 105 ++++++++++++++++++ .../core/AffectedTestsEngine.java | 7 +- .../core/config/AffectedTestsConfig.java | 64 ++++++++++- .../discovery/ImplementationStrategy.java | 21 +++- .../core/discovery/JavaParsers.java | 22 +++- .../discovery/NamingConventionStrategy.java | 8 +- .../core/discovery/SourceFileScanner.java | 100 ++++++++++++++++- .../core/discovery/UsageStrategy.java | 31 ++++-- .../core/git/GitChangeDetector.java | 16 ++- .../core/mapping/PathToClassMapper.java | 31 ++++-- .../core/config/AffectedTestsConfigTest.java | 80 +++++++++++++ .../core/discovery/SourceFileScannerTest.java | 65 +++++++++++ 12 files changed, 515 insertions(+), 35 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index edc0f39..18ed31e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,111 @@ adheres to [Semantic Versioning](https://semver.org/). ## [Unreleased] +### Fixed — tier-1 reliability / safety batch (v1.9.21) + +The v1.9.20 batch closed every tier-1 *correctness* bug surfaced by the +multi-reviewer sweep. v1.9.21 closes the tier-1 *reliability and +safety* cluster from the same review — the class of issues that don't +silently drop tests but do weaken the merge-gate contract and the +hardening work shipped in v1.9.19. + +- **`SourceFileScanner` now rejects file-level symlinks during the + walk.** v1.9.19 hardened the *directory* path via + `stayInsideProjectRoot` + `toRealPath`, which closed + `src/main/java -> /` as an MR-planted symlink. The per-file leg of + the same attack — an attacker committing + `src/main/java/com/x/Leak.java -> /etc/passwd` — slid past that + guard because `Files.walkFileTree` still called `visitFile` on the + symlink and the visitor didn't check `attrs.isSymbolicLink()`. The + two file-walk visitors in the scanner (`collectJavaFiles`, + `walkFqnsUnder`) now skip any entry flagged as a symlink + unconditionally; a regular `.java` file whose real path + legitimately lives inside the project tree is just a regular file, + not a symlink. Regression lives in `SourceFileScannerTest` + (`rejectsFileLevelSymlinkEscapingProjectRoot`, + `scansTestFqnsIgnoresFileLevelSymlinks`) — both fail when the + production guard is reverted, covering both the + `collectTestFiles`/`collectSourceFiles` path and the + `scanTestFqns` path so the two halves of discovery can never + diverge on what counts as a valid source file. +- **Per-subtree `IOException`s no longer truncate the whole walk.** + The default `SimpleFileVisitor.visitFileFailed` rethrows, and the + scanner's outer `IOException` catch swallowed the remaining walk + silently — a single unreadable nested module (permission denied, + file-deleted-under-us race, filesystem loop) took every readable + sibling down with it. All three walk sites (`collectJavaFiles`, + `walkFqnsUnder`, `findAllMatchingDirs`) now override + `visitFileFailed` to log the unreadable entry at WARN via + `LogSanitizer` and continue. Default-visible surfacing of the + skipped entry is the load-bearing change: operators can now see + when a scan ran incomplete, where previously they'd silently + under-select tests and have no way to correlate it with CI config + drift. No regression test — inducing a portable per-entry + `IOException` requires either root-evadable `chmod 000` semantics + (fails on CI runners running as root) or heavyweight filesystem + mocking; the fix is evident in the diff and backed by the 11 + existing walk tests in `SourceFileScannerTest`. +- **`AffectedTestsConfig.Builder#isAcceptableBaseRef` now rejects + control characters at the entry gate.** The rev-expression pattern + used `@\{[^}]+\}` for the reflog segment — `[^}]` happily matched + newline, ESC, CSI, and DEL. A `baseRef` sourced from an + attacker-controlled CI env var like + `master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}` previously + passed validation and then flowed unsanitised into + `log.info("Base ref: {}", …)` in `AffectedTestsEngine` and into + three `IllegalStateException` messages in `GitChangeDetector`. An + attacker who could poison `CI_BASE_REF` (or an equivalent) could + forge plugin-branded status lines into CI output and defeat + grep-based merge gates. The fix is a single blanket + `containsControlChars` check before any of the downstream matchers + run — C0 (`0x00..0x1F`), DEL (`0x7F`), and C1 (`0x80..0x9F`) are + all rejected, closing every regex path at once without depending + on getting each hand-crafted character class right. Regression + lives in `AffectedTestsConfigTest.baseRefRejectsControlCharsInReflogSuffix`, + which covers the newline-in-reflog shape, bare ESC/DEL, and CRLF + anywhere in the input; each case fails when the entry-gate check + is removed. +- **`LogSanitizer` coverage extended to every remaining diff-sourced + log site.** v1.9.19 introduced the sanitizer and applied it at + four sites; the multi-reviewer review found ~28 additional sites + passing raw filenames, FQNs, or exception messages straight to the + logger — most at DEBUG, but several at default-visible WARN (and + two inside `IllegalStateException` messages that Gradle renders + verbatim into the build log). An attacker-crafted filename or a + poisoned `JavaParser` diagnostic carrying `\u001b[2J…\u001b[m` + could still forge ops-visible log lines. All remaining sites now + route through `LogSanitizer.sanitize`: + `JavaParsers.parseOrWarn` (WARN + DEBUG), + `ImplementationStrategy.supertypesOf` (WARN) and its per-depth + match trace (DEBUG), `PathToClassMapper`'s eight classification + DEBUG lines, `UsageStrategy`'s seven tier-match DEBUG lines, + `NamingConventionStrategy`'s match DEBUG line, + `GitChangeDetector`'s per-file changed-entry DEBUG line and the + `IOException` / `GitAPIException` wrappers it throws (because + JGit exception messages can carry attacker-influenced pack-file + paths and ref segments), `AffectedTestsEngine`'s FQN-has-no-file + DEBUG line, `SourceFileScanner.findAllMatchingDirs`'s trailing + DEBUG summary (brought into parity with its WARN path), and the + echo of the offending value inside the `IllegalArgumentException` + thrown by `AffectedTestsConfig.Builder#baseRef` when validation + fails (otherwise the reject branch would reopen the same surface + `containsControlChars` closes on the accept branch). Even DEBUG + sites are sanitised because operators who bump the log level to + chase a false-positive selection are exactly the audience an + attacker would target — the log-forgery contract has to hold at + every visibility level, not just WARN/INFO. Regressions: + `LogSanitizerTest` covers the escape table; + `baseRefValidationErrorDoesNotEchoRawControlChars` pins the + newly-sanitised exception-message path. + +Two related findings from the same review are intentionally deferred +to batch 6 (v1.9.22) so each PR stays scoped: #9 (`ProjectIndex` +parse failures signal discovery-incomplete to the engine so the +safety net can escalate explicitly) and #11 (configurable timeout on +the nested `./gradlew` invocation so a hung child test cannot pin +the outer build indefinitely). Both are observable-behaviour changes +and deserve their own CHANGELOG entries. + ### Fixed — post-v1.9.19 multi-reviewer correctness pass (v1.9.20) A second, deeper multi-agent review of the post-v1.9.19 tree surfaced a diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java index 8b212b4..3772a22 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java @@ -311,7 +311,12 @@ public AffectedTestsResult run() { allTestsToRun.add(fqn); fqnToPath.put(fqn, file); } else { - log.debug("Skipping FQN with no matching test file on disk: {}", fqn); + // The FQN reached this log site via the discovery + // strategies, which derive names from diff paths and + // from the scanned source tree — both attacker- + // influenced on a merge-gate run. + log.debug("Skipping FQN with no matching test file on disk: {}", + LogSanitizer.sanitize(fqn)); } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java index 887a3f7..a18c242 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java @@ -1,5 +1,7 @@ package io.affectedtests.core.config; +import io.affectedtests.core.util.LogSanitizer; + import java.util.ArrayList; import java.util.EnumMap; import java.util.List; @@ -526,9 +528,16 @@ public Builder baseRef(String baseRef) { throw new IllegalArgumentException("baseRef must not be null or blank"); } if (!isAcceptableBaseRef(baseRef)) { + // The rejected value is echoed back in the exception message, which + // Gradle renders verbatim into the build log (and often into build- + // scan HTML). Sanitising here closes the same log-forgery surface + // that containsControlChars closes on the accept path — without it, + // an attacker-poisoned CI_BASE_REF would still get its forged + // status line printed, just via the *reject* branch instead of the + // accept branch. See the javadoc on isAcceptableBaseRef. throw new IllegalArgumentException( "baseRef is not a valid git ref, SHA, or short form: '" - + baseRef + "' — expected something like " + + LogSanitizer.sanitize(baseRef) + "' — expected something like " + "'origin/master', 'HEAD~1', or a 7-40 char hex SHA"); } this.baseRef = baseRef; @@ -549,13 +558,35 @@ public Builder baseRef(String baseRef) { * {@link org.eclipse.jgit.lib.Repository#isValidRefName(String)}), * short/long SHAs, and {@code HEAD~N}/{@code ^N}/{@code @{N}} * rev-expressions. Rejects path-traversal shapes by construction - * since JGit's refname validator already forbids {@code ..}, a - * leading {@code /}, and control characters — no hand-rolled - * string sniffing required. Keeping this decision inside the - * builder means every call site (CLI, Gradle plugin, programmatic - * users) gets the same contract without duplicating the rule. + * since JGit's refname validator already forbids {@code ..} and + * a leading {@code /}. The {@link #SHORT_SHA} short-circuit at + * the top bypasses {@code isValidRefName}, but is safe by + * construction: the hex-only character class excludes {@code .} + * and {@code /}, so path-traversal shapes cannot reach the SHA + * path. + * + *

Control characters ({@code \n}, {@code \r}, ESC, CSI, DEL, + * the whole C0 and C1 ranges) are rejected at the entry gate + * before any of the downstream matchers run. This is load- + * bearing for the {@link #REV_EXPR} path: its + * {@code @\{[^}]+\}} segment matches newline, ESC, and CSI + * verbatim, so a {@code baseRef} sourced from an attacker- + * controlled CI environment variable like + * {@code master@\{1\n\u001b[2JAffected Tests: SELECTED\u001b[m\}} + * previously passed validation and then flowed straight into + * {@code log.info("Base ref: {}", …)} in + * {@code AffectedTestsEngine} and into three + * {@code IllegalStateException} messages in + * {@code GitChangeDetector} — a log-forgery surface that let + * an attacker fabricate plugin-branded status lines in CI + * output. Rejecting at the gate closes every regex path at + * once without depending on getting each hand-crafted + * character class right. */ private static boolean isAcceptableBaseRef(String baseRef) { + if (containsControlChars(baseRef)) { + return false; + } if (SHORT_SHA.matcher(baseRef).matches()) { return true; } @@ -581,6 +612,27 @@ private static boolean isAcceptableBaseRef(String baseRef) { } return false; } + + /** + * Returns {@code true} if {@code value} contains any C0 control + * character ({@code 0x00..0x1F}), DEL ({@code 0x7F}), or C1 + * control character ({@code 0x80..0x9F}). Kept in the builder + * rather than reaching for {@link io.affectedtests.core.util.LogSanitizer} + * because this is a validation gate, not a logging concern — + * the string must be rejected here before it ever reaches a + * logger, and keeping the check local makes the + * {@link #isAcceptableBaseRef(String)} contract auditable + * without cross-package hops. + */ + private static boolean containsControlChars(String value) { + for (int i = 0; i < value.length(); i++) { + char c = value.charAt(i); + if (c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F)) { + return true; + } + } + return false; + } public Builder includeUncommitted(boolean v) { this.includeUncommitted = v; return this; } public Builder includeStaged(boolean v) { this.includeStaged = v; return this; } public Builder runAllIfNoMatches(boolean v) { this.runAllIfNoMatches = v; return this; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java index b562ba9..4e1d015 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java @@ -8,6 +8,7 @@ import com.github.javaparser.ast.body.TypeDeclaration; import com.github.javaparser.ast.type.ClassOrInterfaceType; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -196,9 +197,16 @@ private Set findImplementations(Set changedClasses, for (ClassOrInterfaceType supertype : supertypesOf(decl)) { if (targetsBySimpleName.containsKey(supertype.getNameAsString())) { newImpls.add(implFqn); + // {@code implFqn} and {@code supertype.getNameAsString()} + // both originate from the scanned source tree, + // which on a merge-gate run is attacker- + // influenced. DEBUG-level, but sanitised for + // forgery-resistance when ops drop the level. log.debug("[impl] depth {}: {} ({}) extends/implements {}", - depth + 1, implFqn, decl.getClass().getSimpleName(), - supertype.getNameAsString()); + depth + 1, + LogSanitizer.sanitize(implFqn), + decl.getClass().getSimpleName(), + LogSanitizer.sanitize(supertype.getNameAsString())); break; } } @@ -289,9 +297,16 @@ private static List supertypesOf(TypeDeclaration decl) // known supertypes, which is the correct conservative // fallback (cannot become a false impl match; at worst we // underselect for a single declaration). + // {@code decl.getNameAsString()} originates from an + // attacker-controllable source file. This WARN surfaces at + // the default log level, so route through LogSanitizer to + // keep the log-forgery contract from v1.9.19 honest. + // {@code decl.getClass().getSimpleName()} is a JavaParser + // internal constant and does not need sanitising. log.warn("Affected Tests: [impl] unknown TypeDeclaration subtype {} for {} — " + "supertype edges cannot be derived, skipping", - decl.getClass().getSimpleName(), decl.getNameAsString()); + decl.getClass().getSimpleName(), + LogSanitizer.sanitize(decl.getNameAsString())); } return result; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java index 43ba06c..76f1abb 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java @@ -5,6 +5,7 @@ import com.github.javaparser.ParserConfiguration; import com.github.javaparser.ParserConfiguration.LanguageLevel; import com.github.javaparser.ast.CompilationUnit; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -92,15 +93,32 @@ static CompilationUnit parseOrWarn(JavaParser parser, Path file, String label) { String firstProblem = result.getProblems().isEmpty() ? "no diagnostics" : result.getProblems().get(0).getMessage(); + // Both {@code file} (attacker-committable filename from + // the scanned source tree) and {@code firstProblem} (parser + // diagnostic that can embed a source-code snippet sourced + // from an attacker-controlled file) flow into a default- + // visible WARN line. Sanitise both so an attacker who + // plants a file with a {@code \n} + fake authoritative + // status line in the name or content cannot forge CI log + // output. Label is an internal constant, not sanitised. log.warn("Affected Tests: [{}] failed to parse {} at language level {}: {}", - label, file, LANGUAGE_LEVEL, firstProblem); + label, + LogSanitizer.sanitize(String.valueOf(file)), + LANGUAGE_LEVEL, + LogSanitizer.sanitize(firstProblem)); return null; } catch (Exception e) { // Preserve the pre-v1.9.20 behaviour of degrading the // exception path to DEBUG (I/O races, file-deleted-under- // JGit, etc. are noisy on CI) while still surfacing the // much-more-common isSuccessful==false branch above. - log.debug("Affected Tests: [{}] error parsing {}: {}", label, file, e.getMessage()); + // Sanitise the same way as the WARN branch so that if an + // operator bumps the level to DEBUG to investigate, the + // log cannot be retroactively forged. + log.debug("Affected Tests: [{}] error parsing {}: {}", + label, + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(e.getMessage())); return null; } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java index 80c6868..53b588b 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java @@ -1,6 +1,7 @@ package io.affectedtests.core.discovery; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -60,7 +61,12 @@ private Set matchTests(Set changedProductionClasses, Set for (var entry : expectedTestNames.entrySet()) { if (entry.getValue().contains(testSimpleName)) { discoveredTests.add(testFqn); - log.debug("Naming match: {} → {}", entry.getKey(), testFqn); + // Both keys (changed production FQNs from the diff) + // and test FQNs (from the scanned source tree) are + // attacker-influenced on a merge-gate run. + log.debug("Naming match: {} → {}", + LogSanitizer.sanitize(entry.getKey()), + LogSanitizer.sanitize(testFqn)); } } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java index f01e420..ccbabf6 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java @@ -1,5 +1,6 @@ package io.affectedtests.core.discovery; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -53,20 +54,65 @@ public static List collectJavaFiles(Path dir) { /** * Recursively collects all {@code .java} files under {@code dir}, * appending them to the provided list. + * + *

Symlinks, at either the directory or file level, are never + * followed or added to the result. The directory-level + * containment check in {@link #findAllMatchingDirs} already + * rejects {@code src/main/java -> /} planted by an attacker MR, + * but a more subtle attack — committing + * {@code src/main/java/com/x/Leak.java -> /etc/passwd} — slid + * past that guard and the file would then be parsed by + * JavaParser, with its contents potentially surfaced in + * {@code --explain} samples or in parse-failure WARN output. The + * in-visitor {@link BasicFileAttributes#isSymbolicLink()} check + * closes that hole. This is deliberately stricter than the + * directory-level containment check: we reject every + * file-level symlink unconditionally, even intra-repo ones whose + * real path stays inside the project tree. The rationale is + * cost/benefit — a per-file {@code toRealPath} call on every + * {@code .java} entry would add a syscall to the hot path for + * negligible benefit, and the usability cost of dropping + * intra-repo file symlinks is low (Git doesn't preserve hard + * links across clone either, so tooling that relies on + * canonical-reference aliasing already has to handle the "regular + * file" case). Projects that currently commit file-level symlinks + * for tooling reasons will see them silently ignored by + * discovery. + * + *

A per-subtree {@link IOException} (permission denied on one + * nested directory) is logged and swallowed so the scan + * continues; before v1.9.21 the default {@link SimpleFileVisitor} + * rethrew and the outer catch truncated the walk silently — any + * readable siblings of the unreadable subtree vanished from + * discovery without any signal to the operator. */ public static void collectJavaFiles(Path dir, List result) { try { Files.walkFileTree(dir, new SimpleFileVisitor<>() { @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { + if (attrs.isSymbolicLink()) { + return FileVisitResult.CONTINUE; + } if (file.toString().endsWith(".java")) { result.add(file); } return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(dir)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error collecting Java files from {}: {}", dir, e.getMessage()); + log.warn("Affected Tests: error collecting Java files from {}: {}", + LogSanitizer.sanitize(String.valueOf(dir)), + LogSanitizer.sanitize(e.getMessage())); } } @@ -154,6 +200,15 @@ private static void walkFqnsUnder(Path sourceRoot, Map accumulator Files.walkFileTree(sourceRoot, new SimpleFileVisitor<>() { @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { + // Same symlink + containment contract as + // collectJavaFiles: file symlinks are never walked + // because the real path can escape the project + // root (DoS + data-exfiltration surface when the + // plugin runs on merge-gate CI against an + // attacker-influenced branch). + if (attrs.isSymbolicLink()) { + return FileVisitResult.CONTINUE; + } if (file.toString().endsWith(".java")) { Path relative = sourceRoot.relativize(file); String fqn = relative.toString() @@ -166,9 +221,20 @@ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { } return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(sourceRoot)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error scanning directory {}: {}", sourceRoot, e.getMessage()); + log.warn("Affected Tests: error scanning directory {}: {}", + LogSanitizer.sanitize(String.valueOf(sourceRoot)), + LogSanitizer.sanitize(e.getMessage())); } } @@ -290,12 +356,38 @@ public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + // Per-subtree permission / race errors must not + // truncate the entire module discovery walk. Pre- + // v1.9.21 the default SimpleFileVisitor rethrew + // here and the outer catch swallowed the whole + // remaining walk — any modules living next to an + // unreadable subtree were silently invisible. + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(projectDir)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error searching for '{}' under {}: {}", relativeDir, projectDir, e.getMessage()); + log.warn("Affected Tests: error searching for '{}' under {}: {}", + LogSanitizer.sanitize(relativeDir), + LogSanitizer.sanitize(String.valueOf(projectDir)), + LogSanitizer.sanitize(e.getMessage())); } - log.debug("Found {} directories matching '{}' under {}", matches.size(), relativeDir, projectDir); + // relativeDir and projectDir are Gradle-config-sourced and trusted by + // our threat model, but sanitising at DEBUG too keeps the log-forgery + // contract uniform across every visibility level and matches the WARN + // path a few lines up. One less thing for a future contributor to get + // subtly wrong. + log.debug("Found {} directories matching '{}' under {}", + matches.size(), + LogSanitizer.sanitize(relativeDir), + LogSanitizer.sanitize(String.valueOf(projectDir))); return matches; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java index ca0e36f..8829f4e 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java @@ -10,6 +10,7 @@ import com.github.javaparser.ast.expr.ObjectCreationExpr; import com.github.javaparser.ast.type.ClassOrInterfaceType; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -92,7 +93,7 @@ private Set scanTestFiles(Set changedProductionClasses, if (testReferencesChangedClass(cu, changedFqns, simpleNames, simpleNameToFqns)) { discoveredTests.add(testFqn); - log.debug("Usage match: {}", testFqn); + log.debug("Usage match: {}", LogSanitizer.sanitize(testFqn)); } } @@ -150,15 +151,26 @@ private boolean testReferencesChangedClass(CompilationUnit cu, // `c.d.Outer` (PathToClassMapper is file-based, so it only // surfaces the outer FQN for the nested class's change). Without // this, a test that only uses the inner class is silently missed. + // All {@code changedFqn} and {@code imported} values in the + // Tier 1 / 1b / 2 / 3 blocks below are diff-derived and may + // legitimately carry odd but-valid characters that still need + // control-char sanitisation before they hit the logger — a + // malicious MR can craft an import line like + // {@code import com.evil.\u001b[m;}. Sanitisation is applied + // even at DEBUG because operators bumping level to chase a + // false-positive selection is exactly when forgery-resistance + // matters most. for (String changedFqn : changedFqns) { if (importedFqns.contains(changedFqn)) { - log.debug(" Direct import match: {}", changedFqn); + log.debug(" Direct import match: {}", LogSanitizer.sanitize(changedFqn)); return true; } String innerPrefix = changedFqn + "."; for (String imported : importedFqns) { if (imported.startsWith(innerPrefix)) { - log.debug(" Inner-class import match: {} <- {}", changedFqn, imported); + log.debug(" Inner-class import match: {} <- {}", + LogSanitizer.sanitize(changedFqn), + LogSanitizer.sanitize(imported)); return true; } } @@ -185,14 +197,16 @@ private boolean testReferencesChangedClass(CompilationUnit cu, // failed, silently dropping the consumer's coverage. for (String changedFqn : changedFqns) { if (wildcardPackages.contains(changedFqn)) { - log.debug(" Wildcard class-member import match: {}", changedFqn); + log.debug(" Wildcard class-member import match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } String pkg = SourceFileScanner.packageOf(changedFqn); if (wildcardPackages.contains(pkg)) { String simpleName = SourceFileScanner.simpleClassName(changedFqn); if (typeNameAppearsInAst(cu, simpleName)) { - log.debug(" Wildcard package import + type ref match: {}", changedFqn); + log.debug(" Wildcard package import + type ref match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } } @@ -207,7 +221,8 @@ private boolean testReferencesChangedClass(CompilationUnit cu, if (testPackage.equals(changedPkg)) { String simpleName = SourceFileScanner.simpleClassName(changedFqn); if (typeNameAppearsInAst(cu, simpleName)) { - log.debug(" Same-package type ref match: {}", changedFqn); + log.debug(" Same-package type ref match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } } @@ -231,7 +246,9 @@ private boolean testReferencesChangedClass(CompilationUnit cu, if (scoped == null || !scoped.contains(".")) continue; for (String changedFqn : changedFqns) { if (scoped.equals(changedFqn) || scoped.startsWith(changedFqn + ".")) { - log.debug(" Inline fully-qualified reference: {} -> {}", scoped, changedFqn); + log.debug(" Inline fully-qualified reference: {} -> {}", + LogSanitizer.sanitize(scoped), + LogSanitizer.sanitize(changedFqn)); return true; } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java b/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java index 13f7f76..ad0d2f9 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java @@ -1,6 +1,7 @@ package io.affectedtests.core.git; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.eclipse.jgit.api.Git; import org.eclipse.jgit.api.errors.GitAPIException; import org.eclipse.jgit.diff.DiffEntry; @@ -103,17 +104,26 @@ public Set detectChangedFiles() { + "actions/checkout (GitHub) / GIT_DEPTH: 0 (GitLab).", e); } catch (IOException e) { + // JGit IOException messages can embed pack-file paths, ref names, + // or (on PackInvalidException / CorruptObjectException) raw bytes + // from the object database — all attacker-influenceable on an MR + // branch. Sanitise before embedding so the exception message, + // which Gradle renders verbatim into the build log, cannot carry + // a forged status line. throw new IllegalStateException( "Affected Tests: I/O error while reading git metadata at '" + projectDir + "': " - + e.getMessage(), e); + + LogSanitizer.sanitize(e.getMessage()), e); } catch (GitAPIException e) { throw new IllegalStateException( "Affected Tests: git diff against '" + config.baseRef() + "' failed: " - + e.getMessage(), e); + + LogSanitizer.sanitize(e.getMessage()), e); } log.info("Detected {} changed files", changedFiles.size()); - changedFiles.forEach(f -> log.debug(" Changed: {}", f)); + // File paths come straight from git diff entries on an + // attacker-controllable MR branch, so sanitise even at DEBUG + // — same rationale as every other diff-sourced log site. + changedFiles.forEach(f -> log.debug(" Changed: {}", LogSanitizer.sanitize(f))); return changedFiles; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java index 34b18d0..074256d 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java @@ -140,9 +140,15 @@ public MappingResult mapChangedFiles(Set changedFiles) { // {@code ignorePaths} entry is a contract that nothing about // the file should influence the engine, including nudging it // into the out-of-scope bucket. + // All {@code filePath} values below originate from the git + // diff against an attacker-controllable MR branch, so every + // log site routes through {@link LogSanitizer}. Even + // DEBUG-level sites are worth sanitising because operators + // bumping to DEBUG to investigate incidents is exactly when + // log-forgery escalation is most likely to matter. if (isIgnored(filePath)) { ignoredFiles.add(filePath); - log.debug("Ignored by pattern: {}", filePath); + log.debug("Ignored by pattern: {}", LogSanitizer.sanitize(filePath)); continue; } @@ -154,12 +160,12 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (matchesAny(filePath, outOfScopeSourceMatchers) || matchesAny(filePath, outOfScopeTestMatchers)) { outOfScopeFiles.add(filePath); - log.debug("Out-of-scope: {}", filePath); + log.debug("Out-of-scope: {}", LogSanitizer.sanitize(filePath)); continue; } if (!filePath.endsWith(".java")) { - log.debug("Non-Java file flagged as unmapped: {}", filePath); + log.debug("Non-Java file flagged as unmapped: {}", LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); continue; } @@ -178,7 +184,9 @@ public MappingResult mapChangedFiles(Set changedFiles) { // safety net decide (FULL_SUITE in CI mode by default). String fileName = extractFileName(filePath); if ("module-info.java".equals(fileName) || "package-info.java".equals(fileName)) { - log.debug("Java marker file ({}) flagged as unmapped: {}", fileName, filePath); + log.debug("Java marker file ({}) flagged as unmapped: {}", + LogSanitizer.sanitize(fileName), + LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); continue; } @@ -187,7 +195,9 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (testFqn != null) { testClasses.add(testFqn); changedTestFiles.add(filePath); - log.debug("Mapped test file: {} → {}", filePath, testFqn); + log.debug("Mapped test file: {} → {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(testFqn)); continue; } @@ -195,13 +205,16 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (prodFqn != null) { productionClasses.add(prodFqn); changedProductionFiles.add(filePath); - log.debug("Mapped production file: {} → {}", filePath, prodFqn); + log.debug("Mapped production file: {} → {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(prodFqn)); continue; } // A .java file outside the configured source/test dirs — still // unmappable, still a potential safety escalation trigger. - log.debug("Java file outside configured source/test dirs flagged as unmapped: {}", filePath); + log.debug("Java file outside configured source/test dirs flagged as unmapped: {}", + LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); } @@ -288,7 +301,9 @@ private boolean isIgnored(String filePath) { // the engine routes it through the unmapped bucket and the // safety net picks up the oddity, rather than blowing up // the whole task with an unhandled InvalidPathException. - log.debug("Ignore check skipped for unparseable path {}: {}", filePath, e.getMessage()); + log.debug("Ignore check skipped for unparseable path {}: {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(e.getMessage())); return false; } for (PathMatcher matcher : ignoreMatchers) { diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java index b262488..631e729 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java @@ -242,6 +242,86 @@ void baseRefRejectsPathTraversal() { AffectedTestsConfig.builder().baseRef("refs/../../../secrets").build()); } + @Test + void baseRefRejectsControlCharsInReflogSuffix() { + // Regression for v1.9.21 #14: the REV_EXPR pattern's reflog + // segment is {@code @\{[^}]+\}}, which happily matches + // newline / ESC / CSI / DEL. Pre-fix these inputs passed + // {@link AffectedTestsConfig.Builder#isAcceptableBaseRef} and + // then flowed unsanitised into + // log.info("Base ref: {}", config.baseRef()) + // inside AffectedTestsEngine and into three IllegalStateException + // messages in GitChangeDetector — a log-forgery surface when + // baseRef came from an attacker-controlled CI env var. + // + // The fix is a blanket control-char rejection at the entry gate + // of isAcceptableBaseRef, so every downstream matcher (short + // SHA, refname, rev-expression) refuses tainted input. + String fakePluginStatus = "master@{1\n" + + "\u001b[2JAffected Tests: SELECTED (DISCOVERY_SUCCESS) — 0 tests\u001b[m}"; + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef(fakePluginStatus).build(), + "baseRef containing newline + ANSI inside @{…} must be rejected at " + + "validation time; otherwise an attacker could forge plugin-branded " + + "CI status lines by poisoning CI_BASE_REF"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{1\n}").build(), + "Bare newline in reflog segment must also be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{\u001b}").build(), + "Bare ESC in reflog segment must be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{\u007F}").build(), + "Bare DEL in reflog segment must be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master\r\noops").build(), + "CRLF anywhere in the baseRef must be rejected pre-regex"); + } + + @Test + void baseRefValidationErrorDoesNotEchoRawControlChars() { + // Regression for v1.9.21 review finding T2-1: the reject branch of + // isAcceptableBaseRef throws IllegalArgumentException whose message + // embeds the offending baseRef. Gradle renders that message verbatim + // into the build log (and into build-scan HTML), so echoing raw + // newline/ESC/CSI/DEL would reintroduce the exact log-forgery + // surface that #14 closed on the accept path — just via the + // validation-failure path instead of log.info("Base ref: …"). + // + // LogSanitizer must wrap the echoed value so the thrown message + // contains no C0, C1, or DEL characters. + String tainted = "master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}"; + IllegalArgumentException thrown = assertThrows( + IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef(tainted).build()); + String message = thrown.getMessage(); + assertNotNull(message, "exception must carry a diagnostic message"); + // Each forbidden byte class gets its own assertion so a regression + // that partially re-opens the surface (say, closes newline but leaks + // ESC) still fails loudly with a pointed message. + assertFalse(message.indexOf('\n') >= 0, + "baseRef validation message must not echo raw newline: " + debugEscape(message)); + assertFalse(message.indexOf('\r') >= 0, + "baseRef validation message must not echo raw CR: " + debugEscape(message)); + assertFalse(message.indexOf('\u001b') >= 0, + "baseRef validation message must not echo raw ESC: " + debugEscape(message)); + assertFalse(message.indexOf('\u007F') >= 0, + "baseRef validation message must not echo raw DEL: " + debugEscape(message)); + } + + private static String debugEscape(String s) { + StringBuilder out = new StringBuilder(s.length() + 8); + for (int i = 0; i < s.length(); i++) { + char c = s.charAt(i); + if (c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F)) { + out.append(String.format("\\u%04X", (int) c)); + } else { + out.append(c); + } + } + return out.toString(); + } + @Test void baseRefAllowsValidRefs() { assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("origin/master").build()); diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java index 64534b2..64d552f 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java @@ -163,6 +163,71 @@ void acceptsSymlinkThatStaysInsideProjectRoot() throws IOException { "Symlink resolving to a path under the project root must still be accepted"); } + @Test + void rejectsFileLevelSymlinkEscapingProjectRoot(@TempDir Path outside) throws IOException { + // Threat model (v1.9.19 closed the directory-level leg of this + // attack; v1.9.21 closes the per-file leg): even when + // `src/main/java` is a normal directory inside the project, + // an attacker MR can still commit + // `src/main/java/com/x/Leak.java -> /etc/passwd` + // and the pre-fix scanner would happily enumerate the file, + // hand it to JavaParser, and surface the filename in the + // discovery pipeline. `visitFile` now rejects symlinks + // unconditionally — a regular {@code .java} file whose real + // path legitimately lives inside the tree is just a regular + // file, not a symlink. + Path pkg = tempDir.resolve("src/main/java/com/x"); + Files.createDirectories(pkg); + Files.writeString(pkg.resolve("Ok.java"), "package com.x;\nclass Ok {}"); + + Path attackTarget = outside.resolve("Leak.java"); + Files.writeString(attackTarget, "package evil;\nclass Leak {}"); + + Path fileLink = pkg.resolve("Leak.java"); + try { + Files.createSymbolicLink(fileLink, attackTarget); + } catch (UnsupportedOperationException | FileSystemException e) { + return; + } + + List collected = SourceFileScanner.collectJavaFiles( + tempDir.resolve("src/main/java")); + + assertEquals(1, collected.size(), + "Only the real Ok.java should be collected; the symlinked Leak.java " + + "must be skipped even if its real path is outside the project root"); + assertTrue(collected.get(0).getFileName().toString().equals("Ok.java")); + } + + @Test + void scansTestFqnsIgnoresFileLevelSymlinks(@TempDir Path outside) throws IOException { + // Same hardening contract as above, but for the path that + // feeds {@link ProjectIndex#testFqns()} — the two visitors + // must agree or an attacker could plant a symlinked file + // that is invisible to `collectTestFiles` but visible to + // `scanTestFqns` (or vice versa), producing inconsistent + // bucket counts that are hard to debug. + Path pkg = tempDir.resolve("src/test/java/com/x"); + Files.createDirectories(pkg); + Files.writeString(pkg.resolve("OkTest.java"), "package com.x;\nclass OkTest {}"); + + Path attackTarget = outside.resolve("LeakTest.java"); + Files.writeString(attackTarget, "package evil;\nclass LeakTest {}"); + Path fileLink = pkg.resolve("LeakTest.java"); + try { + Files.createSymbolicLink(fileLink, attackTarget); + } catch (UnsupportedOperationException | FileSystemException e) { + return; + } + + var fqns = SourceFileScanner.scanTestFqns(tempDir, List.of("src/test/java")); + + assertTrue(fqns.contains("com.x.OkTest")); + assertFalse(fqns.contains("com.x.LeakTest"), + "Symlinked file must not contribute to test FQN discovery"); + assertEquals(1, fqns.size()); + } + @Test void scansTestFqnsFromDeeplyNestedModules() throws IOException { Path depth1 = tempDir.resolve("api/src/test/java/com/example");