diff --git a/CHANGELOG.md b/CHANGELOG.md index edc0f39..18ed31e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,111 @@ adheres to [Semantic Versioning](https://semver.org/). ## [Unreleased] +### Fixed — tier-1 reliability / safety batch (v1.9.21) + +The v1.9.20 batch closed every tier-1 *correctness* bug surfaced by the +multi-reviewer sweep. v1.9.21 closes the tier-1 *reliability and +safety* cluster from the same review — the class of issues that don't +silently drop tests but do weaken the merge-gate contract and the +hardening work shipped in v1.9.19. + +- **`SourceFileScanner` now rejects file-level symlinks during the + walk.** v1.9.19 hardened the *directory* path via + `stayInsideProjectRoot` + `toRealPath`, which closed + `src/main/java -> /` as an MR-planted symlink. The per-file leg of + the same attack — an attacker committing + `src/main/java/com/x/Leak.java -> /etc/passwd` — slid past that + guard because `Files.walkFileTree` still called `visitFile` on the + symlink and the visitor didn't check `attrs.isSymbolicLink()`. The + two file-walk visitors in the scanner (`collectJavaFiles`, + `walkFqnsUnder`) now skip any entry flagged as a symlink + unconditionally; a regular `.java` file whose real path + legitimately lives inside the project tree is just a regular file, + not a symlink. Regression lives in `SourceFileScannerTest` + (`rejectsFileLevelSymlinkEscapingProjectRoot`, + `scansTestFqnsIgnoresFileLevelSymlinks`) — both fail when the + production guard is reverted, covering both the + `collectTestFiles`/`collectSourceFiles` path and the + `scanTestFqns` path so the two halves of discovery can never + diverge on what counts as a valid source file. +- **Per-subtree `IOException`s no longer truncate the whole walk.** + The default `SimpleFileVisitor.visitFileFailed` rethrows, and the + scanner's outer `IOException` catch swallowed the remaining walk + silently — a single unreadable nested module (permission denied, + file-deleted-under-us race, filesystem loop) took every readable + sibling down with it. All three walk sites (`collectJavaFiles`, + `walkFqnsUnder`, `findAllMatchingDirs`) now override + `visitFileFailed` to log the unreadable entry at WARN via + `LogSanitizer` and continue. Default-visible surfacing of the + skipped entry is the load-bearing change: operators can now see + when a scan ran incomplete, where previously they'd silently + under-select tests and have no way to correlate it with CI config + drift. No regression test — inducing a portable per-entry + `IOException` requires either root-evadable `chmod 000` semantics + (fails on CI runners running as root) or heavyweight filesystem + mocking; the fix is evident in the diff and backed by the 11 + existing walk tests in `SourceFileScannerTest`. +- **`AffectedTestsConfig.Builder#isAcceptableBaseRef` now rejects + control characters at the entry gate.** The rev-expression pattern + used `@\{[^}]+\}` for the reflog segment — `[^}]` happily matched + newline, ESC, CSI, and DEL. A `baseRef` sourced from an + attacker-controlled CI env var like + `master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}` previously + passed validation and then flowed unsanitised into + `log.info("Base ref: {}", …)` in `AffectedTestsEngine` and into + three `IllegalStateException` messages in `GitChangeDetector`. An + attacker who could poison `CI_BASE_REF` (or an equivalent) could + forge plugin-branded status lines into CI output and defeat + grep-based merge gates. The fix is a single blanket + `containsControlChars` check before any of the downstream matchers + run — C0 (`0x00..0x1F`), DEL (`0x7F`), and C1 (`0x80..0x9F`) are + all rejected, closing every regex path at once without depending + on getting each hand-crafted character class right. Regression + lives in `AffectedTestsConfigTest.baseRefRejectsControlCharsInReflogSuffix`, + which covers the newline-in-reflog shape, bare ESC/DEL, and CRLF + anywhere in the input; each case fails when the entry-gate check + is removed. +- **`LogSanitizer` coverage extended to every remaining diff-sourced + log site.** v1.9.19 introduced the sanitizer and applied it at + four sites; the multi-reviewer review found ~28 additional sites + passing raw filenames, FQNs, or exception messages straight to the + logger — most at DEBUG, but several at default-visible WARN (and + two inside `IllegalStateException` messages that Gradle renders + verbatim into the build log). An attacker-crafted filename or a + poisoned `JavaParser` diagnostic carrying `\u001b[2J…\u001b[m` + could still forge ops-visible log lines. All remaining sites now + route through `LogSanitizer.sanitize`: + `JavaParsers.parseOrWarn` (WARN + DEBUG), + `ImplementationStrategy.supertypesOf` (WARN) and its per-depth + match trace (DEBUG), `PathToClassMapper`'s eight classification + DEBUG lines, `UsageStrategy`'s seven tier-match DEBUG lines, + `NamingConventionStrategy`'s match DEBUG line, + `GitChangeDetector`'s per-file changed-entry DEBUG line and the + `IOException` / `GitAPIException` wrappers it throws (because + JGit exception messages can carry attacker-influenced pack-file + paths and ref segments), `AffectedTestsEngine`'s FQN-has-no-file + DEBUG line, `SourceFileScanner.findAllMatchingDirs`'s trailing + DEBUG summary (brought into parity with its WARN path), and the + echo of the offending value inside the `IllegalArgumentException` + thrown by `AffectedTestsConfig.Builder#baseRef` when validation + fails (otherwise the reject branch would reopen the same surface + `containsControlChars` closes on the accept branch). Even DEBUG + sites are sanitised because operators who bump the log level to + chase a false-positive selection are exactly the audience an + attacker would target — the log-forgery contract has to hold at + every visibility level, not just WARN/INFO. Regressions: + `LogSanitizerTest` covers the escape table; + `baseRefValidationErrorDoesNotEchoRawControlChars` pins the + newly-sanitised exception-message path. + +Two related findings from the same review are intentionally deferred +to batch 6 (v1.9.22) so each PR stays scoped: #9 (`ProjectIndex` +parse failures signal discovery-incomplete to the engine so the +safety net can escalate explicitly) and #11 (configurable timeout on +the nested `./gradlew` invocation so a hung child test cannot pin +the outer build indefinitely). Both are observable-behaviour changes +and deserve their own CHANGELOG entries. + ### Fixed — post-v1.9.19 multi-reviewer correctness pass (v1.9.20) A second, deeper multi-agent review of the post-v1.9.19 tree surfaced a diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java index 8b212b4..3772a22 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java @@ -311,7 +311,12 @@ public AffectedTestsResult run() { allTestsToRun.add(fqn); fqnToPath.put(fqn, file); } else { - log.debug("Skipping FQN with no matching test file on disk: {}", fqn); + // The FQN reached this log site via the discovery + // strategies, which derive names from diff paths and + // from the scanned source tree — both attacker- + // influenced on a merge-gate run. + log.debug("Skipping FQN with no matching test file on disk: {}", + LogSanitizer.sanitize(fqn)); } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java index 887a3f7..a18c242 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java @@ -1,5 +1,7 @@ package io.affectedtests.core.config; +import io.affectedtests.core.util.LogSanitizer; + import java.util.ArrayList; import java.util.EnumMap; import java.util.List; @@ -526,9 +528,16 @@ public Builder baseRef(String baseRef) { throw new IllegalArgumentException("baseRef must not be null or blank"); } if (!isAcceptableBaseRef(baseRef)) { + // The rejected value is echoed back in the exception message, which + // Gradle renders verbatim into the build log (and often into build- + // scan HTML). Sanitising here closes the same log-forgery surface + // that containsControlChars closes on the accept path — without it, + // an attacker-poisoned CI_BASE_REF would still get its forged + // status line printed, just via the *reject* branch instead of the + // accept branch. See the javadoc on isAcceptableBaseRef. throw new IllegalArgumentException( "baseRef is not a valid git ref, SHA, or short form: '" - + baseRef + "' — expected something like " + + LogSanitizer.sanitize(baseRef) + "' — expected something like " + "'origin/master', 'HEAD~1', or a 7-40 char hex SHA"); } this.baseRef = baseRef; @@ -549,13 +558,35 @@ public Builder baseRef(String baseRef) { * {@link org.eclipse.jgit.lib.Repository#isValidRefName(String)}), * short/long SHAs, and {@code HEAD~N}/{@code ^N}/{@code @{N}} * rev-expressions. Rejects path-traversal shapes by construction - * since JGit's refname validator already forbids {@code ..}, a - * leading {@code /}, and control characters — no hand-rolled - * string sniffing required. Keeping this decision inside the - * builder means every call site (CLI, Gradle plugin, programmatic - * users) gets the same contract without duplicating the rule. + * since JGit's refname validator already forbids {@code ..} and + * a leading {@code /}. The {@link #SHORT_SHA} short-circuit at + * the top bypasses {@code isValidRefName}, but is safe by + * construction: the hex-only character class excludes {@code .} + * and {@code /}, so path-traversal shapes cannot reach the SHA + * path. + * + *

Control characters ({@code \n}, {@code \r}, ESC, CSI, DEL, + * the whole C0 and C1 ranges) are rejected at the entry gate + * before any of the downstream matchers run. This is load- + * bearing for the {@link #REV_EXPR} path: its + * {@code @\{[^}]+\}} segment matches newline, ESC, and CSI + * verbatim, so a {@code baseRef} sourced from an attacker- + * controlled CI environment variable like + * {@code master@\{1\n\u001b[2JAffected Tests: SELECTED\u001b[m\}} + * previously passed validation and then flowed straight into + * {@code log.info("Base ref: {}", …)} in + * {@code AffectedTestsEngine} and into three + * {@code IllegalStateException} messages in + * {@code GitChangeDetector} — a log-forgery surface that let + * an attacker fabricate plugin-branded status lines in CI + * output. Rejecting at the gate closes every regex path at + * once without depending on getting each hand-crafted + * character class right. */ private static boolean isAcceptableBaseRef(String baseRef) { + if (containsControlChars(baseRef)) { + return false; + } if (SHORT_SHA.matcher(baseRef).matches()) { return true; } @@ -581,6 +612,27 @@ private static boolean isAcceptableBaseRef(String baseRef) { } return false; } + + /** + * Returns {@code true} if {@code value} contains any C0 control + * character ({@code 0x00..0x1F}), DEL ({@code 0x7F}), or C1 + * control character ({@code 0x80..0x9F}). Kept in the builder + * rather than reaching for {@link io.affectedtests.core.util.LogSanitizer} + * because this is a validation gate, not a logging concern — + * the string must be rejected here before it ever reaches a + * logger, and keeping the check local makes the + * {@link #isAcceptableBaseRef(String)} contract auditable + * without cross-package hops. + */ + private static boolean containsControlChars(String value) { + for (int i = 0; i < value.length(); i++) { + char c = value.charAt(i); + if (c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F)) { + return true; + } + } + return false; + } public Builder includeUncommitted(boolean v) { this.includeUncommitted = v; return this; } public Builder includeStaged(boolean v) { this.includeStaged = v; return this; } public Builder runAllIfNoMatches(boolean v) { this.runAllIfNoMatches = v; return this; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java index b562ba9..4e1d015 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java @@ -8,6 +8,7 @@ import com.github.javaparser.ast.body.TypeDeclaration; import com.github.javaparser.ast.type.ClassOrInterfaceType; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -196,9 +197,16 @@ private Set findImplementations(Set changedClasses, for (ClassOrInterfaceType supertype : supertypesOf(decl)) { if (targetsBySimpleName.containsKey(supertype.getNameAsString())) { newImpls.add(implFqn); + // {@code implFqn} and {@code supertype.getNameAsString()} + // both originate from the scanned source tree, + // which on a merge-gate run is attacker- + // influenced. DEBUG-level, but sanitised for + // forgery-resistance when ops drop the level. log.debug("[impl] depth {}: {} ({}) extends/implements {}", - depth + 1, implFqn, decl.getClass().getSimpleName(), - supertype.getNameAsString()); + depth + 1, + LogSanitizer.sanitize(implFqn), + decl.getClass().getSimpleName(), + LogSanitizer.sanitize(supertype.getNameAsString())); break; } } @@ -289,9 +297,16 @@ private static List supertypesOf(TypeDeclaration decl) // known supertypes, which is the correct conservative // fallback (cannot become a false impl match; at worst we // underselect for a single declaration). + // {@code decl.getNameAsString()} originates from an + // attacker-controllable source file. This WARN surfaces at + // the default log level, so route through LogSanitizer to + // keep the log-forgery contract from v1.9.19 honest. + // {@code decl.getClass().getSimpleName()} is a JavaParser + // internal constant and does not need sanitising. log.warn("Affected Tests: [impl] unknown TypeDeclaration subtype {} for {} — " + "supertype edges cannot be derived, skipping", - decl.getClass().getSimpleName(), decl.getNameAsString()); + decl.getClass().getSimpleName(), + LogSanitizer.sanitize(decl.getNameAsString())); } return result; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java index 43ba06c..76f1abb 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/JavaParsers.java @@ -5,6 +5,7 @@ import com.github.javaparser.ParserConfiguration; import com.github.javaparser.ParserConfiguration.LanguageLevel; import com.github.javaparser.ast.CompilationUnit; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -92,15 +93,32 @@ static CompilationUnit parseOrWarn(JavaParser parser, Path file, String label) { String firstProblem = result.getProblems().isEmpty() ? "no diagnostics" : result.getProblems().get(0).getMessage(); + // Both {@code file} (attacker-committable filename from + // the scanned source tree) and {@code firstProblem} (parser + // diagnostic that can embed a source-code snippet sourced + // from an attacker-controlled file) flow into a default- + // visible WARN line. Sanitise both so an attacker who + // plants a file with a {@code \n} + fake authoritative + // status line in the name or content cannot forge CI log + // output. Label is an internal constant, not sanitised. log.warn("Affected Tests: [{}] failed to parse {} at language level {}: {}", - label, file, LANGUAGE_LEVEL, firstProblem); + label, + LogSanitizer.sanitize(String.valueOf(file)), + LANGUAGE_LEVEL, + LogSanitizer.sanitize(firstProblem)); return null; } catch (Exception e) { // Preserve the pre-v1.9.20 behaviour of degrading the // exception path to DEBUG (I/O races, file-deleted-under- // JGit, etc. are noisy on CI) while still surfacing the // much-more-common isSuccessful==false branch above. - log.debug("Affected Tests: [{}] error parsing {}: {}", label, file, e.getMessage()); + // Sanitise the same way as the WARN branch so that if an + // operator bumps the level to DEBUG to investigate, the + // log cannot be retroactively forged. + log.debug("Affected Tests: [{}] error parsing {}: {}", + label, + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(e.getMessage())); return null; } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java index 80c6868..53b588b 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/NamingConventionStrategy.java @@ -1,6 +1,7 @@ package io.affectedtests.core.discovery; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -60,7 +61,12 @@ private Set matchTests(Set changedProductionClasses, Set for (var entry : expectedTestNames.entrySet()) { if (entry.getValue().contains(testSimpleName)) { discoveredTests.add(testFqn); - log.debug("Naming match: {} → {}", entry.getKey(), testFqn); + // Both keys (changed production FQNs from the diff) + // and test FQNs (from the scanned source tree) are + // attacker-influenced on a merge-gate run. + log.debug("Naming match: {} → {}", + LogSanitizer.sanitize(entry.getKey()), + LogSanitizer.sanitize(testFqn)); } } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java index f01e420..ccbabf6 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java @@ -1,5 +1,6 @@ package io.affectedtests.core.discovery; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -53,20 +54,65 @@ public static List collectJavaFiles(Path dir) { /** * Recursively collects all {@code .java} files under {@code dir}, * appending them to the provided list. + * + *

Symlinks, at either the directory or file level, are never + * followed or added to the result. The directory-level + * containment check in {@link #findAllMatchingDirs} already + * rejects {@code src/main/java -> /} planted by an attacker MR, + * but a more subtle attack — committing + * {@code src/main/java/com/x/Leak.java -> /etc/passwd} — slid + * past that guard and the file would then be parsed by + * JavaParser, with its contents potentially surfaced in + * {@code --explain} samples or in parse-failure WARN output. The + * in-visitor {@link BasicFileAttributes#isSymbolicLink()} check + * closes that hole. This is deliberately stricter than the + * directory-level containment check: we reject every + * file-level symlink unconditionally, even intra-repo ones whose + * real path stays inside the project tree. The rationale is + * cost/benefit — a per-file {@code toRealPath} call on every + * {@code .java} entry would add a syscall to the hot path for + * negligible benefit, and the usability cost of dropping + * intra-repo file symlinks is low (Git doesn't preserve hard + * links across clone either, so tooling that relies on + * canonical-reference aliasing already has to handle the "regular + * file" case). Projects that currently commit file-level symlinks + * for tooling reasons will see them silently ignored by + * discovery. + * + *

A per-subtree {@link IOException} (permission denied on one + * nested directory) is logged and swallowed so the scan + * continues; before v1.9.21 the default {@link SimpleFileVisitor} + * rethrew and the outer catch truncated the walk silently — any + * readable siblings of the unreadable subtree vanished from + * discovery without any signal to the operator. */ public static void collectJavaFiles(Path dir, List result) { try { Files.walkFileTree(dir, new SimpleFileVisitor<>() { @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { + if (attrs.isSymbolicLink()) { + return FileVisitResult.CONTINUE; + } if (file.toString().endsWith(".java")) { result.add(file); } return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(dir)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error collecting Java files from {}: {}", dir, e.getMessage()); + log.warn("Affected Tests: error collecting Java files from {}: {}", + LogSanitizer.sanitize(String.valueOf(dir)), + LogSanitizer.sanitize(e.getMessage())); } } @@ -154,6 +200,15 @@ private static void walkFqnsUnder(Path sourceRoot, Map accumulator Files.walkFileTree(sourceRoot, new SimpleFileVisitor<>() { @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { + // Same symlink + containment contract as + // collectJavaFiles: file symlinks are never walked + // because the real path can escape the project + // root (DoS + data-exfiltration surface when the + // plugin runs on merge-gate CI against an + // attacker-influenced branch). + if (attrs.isSymbolicLink()) { + return FileVisitResult.CONTINUE; + } if (file.toString().endsWith(".java")) { Path relative = sourceRoot.relativize(file); String fqn = relative.toString() @@ -166,9 +221,20 @@ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { } return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(sourceRoot)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error scanning directory {}: {}", sourceRoot, e.getMessage()); + log.warn("Affected Tests: error scanning directory {}: {}", + LogSanitizer.sanitize(String.valueOf(sourceRoot)), + LogSanitizer.sanitize(e.getMessage())); } } @@ -290,12 +356,38 @@ public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { return FileVisitResult.CONTINUE; } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + // Per-subtree permission / race errors must not + // truncate the entire module discovery walk. Pre- + // v1.9.21 the default SimpleFileVisitor rethrew + // here and the outer catch swallowed the whole + // remaining walk — any modules living next to an + // unreadable subtree were silently invisible. + log.warn("Affected Tests: skipping unreadable entry {} under {}: {}", + LogSanitizer.sanitize(String.valueOf(file)), + LogSanitizer.sanitize(String.valueOf(projectDir)), + LogSanitizer.sanitize(exc.getMessage())); + return FileVisitResult.CONTINUE; + } }); } catch (IOException e) { - log.warn("Error searching for '{}' under {}: {}", relativeDir, projectDir, e.getMessage()); + log.warn("Affected Tests: error searching for '{}' under {}: {}", + LogSanitizer.sanitize(relativeDir), + LogSanitizer.sanitize(String.valueOf(projectDir)), + LogSanitizer.sanitize(e.getMessage())); } - log.debug("Found {} directories matching '{}' under {}", matches.size(), relativeDir, projectDir); + // relativeDir and projectDir are Gradle-config-sourced and trusted by + // our threat model, but sanitising at DEBUG too keeps the log-forgery + // contract uniform across every visibility level and matches the WARN + // path a few lines up. One less thing for a future contributor to get + // subtly wrong. + log.debug("Found {} directories matching '{}' under {}", + matches.size(), + LogSanitizer.sanitize(relativeDir), + LogSanitizer.sanitize(String.valueOf(projectDir))); return matches; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java index ca0e36f..8829f4e 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java @@ -10,6 +10,7 @@ import com.github.javaparser.ast.expr.ObjectCreationExpr; import com.github.javaparser.ast.type.ClassOrInterfaceType; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -92,7 +93,7 @@ private Set scanTestFiles(Set changedProductionClasses, if (testReferencesChangedClass(cu, changedFqns, simpleNames, simpleNameToFqns)) { discoveredTests.add(testFqn); - log.debug("Usage match: {}", testFqn); + log.debug("Usage match: {}", LogSanitizer.sanitize(testFqn)); } } @@ -150,15 +151,26 @@ private boolean testReferencesChangedClass(CompilationUnit cu, // `c.d.Outer` (PathToClassMapper is file-based, so it only // surfaces the outer FQN for the nested class's change). Without // this, a test that only uses the inner class is silently missed. + // All {@code changedFqn} and {@code imported} values in the + // Tier 1 / 1b / 2 / 3 blocks below are diff-derived and may + // legitimately carry odd but-valid characters that still need + // control-char sanitisation before they hit the logger — a + // malicious MR can craft an import line like + // {@code import com.evil.\u001b[m;}. Sanitisation is applied + // even at DEBUG because operators bumping level to chase a + // false-positive selection is exactly when forgery-resistance + // matters most. for (String changedFqn : changedFqns) { if (importedFqns.contains(changedFqn)) { - log.debug(" Direct import match: {}", changedFqn); + log.debug(" Direct import match: {}", LogSanitizer.sanitize(changedFqn)); return true; } String innerPrefix = changedFqn + "."; for (String imported : importedFqns) { if (imported.startsWith(innerPrefix)) { - log.debug(" Inner-class import match: {} <- {}", changedFqn, imported); + log.debug(" Inner-class import match: {} <- {}", + LogSanitizer.sanitize(changedFqn), + LogSanitizer.sanitize(imported)); return true; } } @@ -185,14 +197,16 @@ private boolean testReferencesChangedClass(CompilationUnit cu, // failed, silently dropping the consumer's coverage. for (String changedFqn : changedFqns) { if (wildcardPackages.contains(changedFqn)) { - log.debug(" Wildcard class-member import match: {}", changedFqn); + log.debug(" Wildcard class-member import match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } String pkg = SourceFileScanner.packageOf(changedFqn); if (wildcardPackages.contains(pkg)) { String simpleName = SourceFileScanner.simpleClassName(changedFqn); if (typeNameAppearsInAst(cu, simpleName)) { - log.debug(" Wildcard package import + type ref match: {}", changedFqn); + log.debug(" Wildcard package import + type ref match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } } @@ -207,7 +221,8 @@ private boolean testReferencesChangedClass(CompilationUnit cu, if (testPackage.equals(changedPkg)) { String simpleName = SourceFileScanner.simpleClassName(changedFqn); if (typeNameAppearsInAst(cu, simpleName)) { - log.debug(" Same-package type ref match: {}", changedFqn); + log.debug(" Same-package type ref match: {}", + LogSanitizer.sanitize(changedFqn)); return true; } } @@ -231,7 +246,9 @@ private boolean testReferencesChangedClass(CompilationUnit cu, if (scoped == null || !scoped.contains(".")) continue; for (String changedFqn : changedFqns) { if (scoped.equals(changedFqn) || scoped.startsWith(changedFqn + ".")) { - log.debug(" Inline fully-qualified reference: {} -> {}", scoped, changedFqn); + log.debug(" Inline fully-qualified reference: {} -> {}", + LogSanitizer.sanitize(scoped), + LogSanitizer.sanitize(changedFqn)); return true; } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java b/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java index 13f7f76..ad0d2f9 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/git/GitChangeDetector.java @@ -1,6 +1,7 @@ package io.affectedtests.core.git; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.eclipse.jgit.api.Git; import org.eclipse.jgit.api.errors.GitAPIException; import org.eclipse.jgit.diff.DiffEntry; @@ -103,17 +104,26 @@ public Set detectChangedFiles() { + "actions/checkout (GitHub) / GIT_DEPTH: 0 (GitLab).", e); } catch (IOException e) { + // JGit IOException messages can embed pack-file paths, ref names, + // or (on PackInvalidException / CorruptObjectException) raw bytes + // from the object database — all attacker-influenceable on an MR + // branch. Sanitise before embedding so the exception message, + // which Gradle renders verbatim into the build log, cannot carry + // a forged status line. throw new IllegalStateException( "Affected Tests: I/O error while reading git metadata at '" + projectDir + "': " - + e.getMessage(), e); + + LogSanitizer.sanitize(e.getMessage()), e); } catch (GitAPIException e) { throw new IllegalStateException( "Affected Tests: git diff against '" + config.baseRef() + "' failed: " - + e.getMessage(), e); + + LogSanitizer.sanitize(e.getMessage()), e); } log.info("Detected {} changed files", changedFiles.size()); - changedFiles.forEach(f -> log.debug(" Changed: {}", f)); + // File paths come straight from git diff entries on an + // attacker-controllable MR branch, so sanitise even at DEBUG + // — same rationale as every other diff-sourced log site. + changedFiles.forEach(f -> log.debug(" Changed: {}", LogSanitizer.sanitize(f))); return changedFiles; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java index 34b18d0..074256d 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java @@ -140,9 +140,15 @@ public MappingResult mapChangedFiles(Set changedFiles) { // {@code ignorePaths} entry is a contract that nothing about // the file should influence the engine, including nudging it // into the out-of-scope bucket. + // All {@code filePath} values below originate from the git + // diff against an attacker-controllable MR branch, so every + // log site routes through {@link LogSanitizer}. Even + // DEBUG-level sites are worth sanitising because operators + // bumping to DEBUG to investigate incidents is exactly when + // log-forgery escalation is most likely to matter. if (isIgnored(filePath)) { ignoredFiles.add(filePath); - log.debug("Ignored by pattern: {}", filePath); + log.debug("Ignored by pattern: {}", LogSanitizer.sanitize(filePath)); continue; } @@ -154,12 +160,12 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (matchesAny(filePath, outOfScopeSourceMatchers) || matchesAny(filePath, outOfScopeTestMatchers)) { outOfScopeFiles.add(filePath); - log.debug("Out-of-scope: {}", filePath); + log.debug("Out-of-scope: {}", LogSanitizer.sanitize(filePath)); continue; } if (!filePath.endsWith(".java")) { - log.debug("Non-Java file flagged as unmapped: {}", filePath); + log.debug("Non-Java file flagged as unmapped: {}", LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); continue; } @@ -178,7 +184,9 @@ public MappingResult mapChangedFiles(Set changedFiles) { // safety net decide (FULL_SUITE in CI mode by default). String fileName = extractFileName(filePath); if ("module-info.java".equals(fileName) || "package-info.java".equals(fileName)) { - log.debug("Java marker file ({}) flagged as unmapped: {}", fileName, filePath); + log.debug("Java marker file ({}) flagged as unmapped: {}", + LogSanitizer.sanitize(fileName), + LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); continue; } @@ -187,7 +195,9 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (testFqn != null) { testClasses.add(testFqn); changedTestFiles.add(filePath); - log.debug("Mapped test file: {} → {}", filePath, testFqn); + log.debug("Mapped test file: {} → {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(testFqn)); continue; } @@ -195,13 +205,16 @@ public MappingResult mapChangedFiles(Set changedFiles) { if (prodFqn != null) { productionClasses.add(prodFqn); changedProductionFiles.add(filePath); - log.debug("Mapped production file: {} → {}", filePath, prodFqn); + log.debug("Mapped production file: {} → {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(prodFqn)); continue; } // A .java file outside the configured source/test dirs — still // unmappable, still a potential safety escalation trigger. - log.debug("Java file outside configured source/test dirs flagged as unmapped: {}", filePath); + log.debug("Java file outside configured source/test dirs flagged as unmapped: {}", + LogSanitizer.sanitize(filePath)); unmappedChangedFiles.add(filePath); } @@ -288,7 +301,9 @@ private boolean isIgnored(String filePath) { // the engine routes it through the unmapped bucket and the // safety net picks up the oddity, rather than blowing up // the whole task with an unhandled InvalidPathException. - log.debug("Ignore check skipped for unparseable path {}: {}", filePath, e.getMessage()); + log.debug("Ignore check skipped for unparseable path {}: {}", + LogSanitizer.sanitize(filePath), + LogSanitizer.sanitize(e.getMessage())); return false; } for (PathMatcher matcher : ignoreMatchers) { diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java index b262488..631e729 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java @@ -242,6 +242,86 @@ void baseRefRejectsPathTraversal() { AffectedTestsConfig.builder().baseRef("refs/../../../secrets").build()); } + @Test + void baseRefRejectsControlCharsInReflogSuffix() { + // Regression for v1.9.21 #14: the REV_EXPR pattern's reflog + // segment is {@code @\{[^}]+\}}, which happily matches + // newline / ESC / CSI / DEL. Pre-fix these inputs passed + // {@link AffectedTestsConfig.Builder#isAcceptableBaseRef} and + // then flowed unsanitised into + // log.info("Base ref: {}", config.baseRef()) + // inside AffectedTestsEngine and into three IllegalStateException + // messages in GitChangeDetector — a log-forgery surface when + // baseRef came from an attacker-controlled CI env var. + // + // The fix is a blanket control-char rejection at the entry gate + // of isAcceptableBaseRef, so every downstream matcher (short + // SHA, refname, rev-expression) refuses tainted input. + String fakePluginStatus = "master@{1\n" + + "\u001b[2JAffected Tests: SELECTED (DISCOVERY_SUCCESS) — 0 tests\u001b[m}"; + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef(fakePluginStatus).build(), + "baseRef containing newline + ANSI inside @{…} must be rejected at " + + "validation time; otherwise an attacker could forge plugin-branded " + + "CI status lines by poisoning CI_BASE_REF"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{1\n}").build(), + "Bare newline in reflog segment must also be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{\u001b}").build(), + "Bare ESC in reflog segment must be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master@{\u007F}").build(), + "Bare DEL in reflog segment must be rejected"); + assertThrows(IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef("master\r\noops").build(), + "CRLF anywhere in the baseRef must be rejected pre-regex"); + } + + @Test + void baseRefValidationErrorDoesNotEchoRawControlChars() { + // Regression for v1.9.21 review finding T2-1: the reject branch of + // isAcceptableBaseRef throws IllegalArgumentException whose message + // embeds the offending baseRef. Gradle renders that message verbatim + // into the build log (and into build-scan HTML), so echoing raw + // newline/ESC/CSI/DEL would reintroduce the exact log-forgery + // surface that #14 closed on the accept path — just via the + // validation-failure path instead of log.info("Base ref: …"). + // + // LogSanitizer must wrap the echoed value so the thrown message + // contains no C0, C1, or DEL characters. + String tainted = "master@{1\n\u001b[2JAffected Tests: SELECTED\u001b[m}"; + IllegalArgumentException thrown = assertThrows( + IllegalArgumentException.class, + () -> AffectedTestsConfig.builder().baseRef(tainted).build()); + String message = thrown.getMessage(); + assertNotNull(message, "exception must carry a diagnostic message"); + // Each forbidden byte class gets its own assertion so a regression + // that partially re-opens the surface (say, closes newline but leaks + // ESC) still fails loudly with a pointed message. + assertFalse(message.indexOf('\n') >= 0, + "baseRef validation message must not echo raw newline: " + debugEscape(message)); + assertFalse(message.indexOf('\r') >= 0, + "baseRef validation message must not echo raw CR: " + debugEscape(message)); + assertFalse(message.indexOf('\u001b') >= 0, + "baseRef validation message must not echo raw ESC: " + debugEscape(message)); + assertFalse(message.indexOf('\u007F') >= 0, + "baseRef validation message must not echo raw DEL: " + debugEscape(message)); + } + + private static String debugEscape(String s) { + StringBuilder out = new StringBuilder(s.length() + 8); + for (int i = 0; i < s.length(); i++) { + char c = s.charAt(i); + if (c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F)) { + out.append(String.format("\\u%04X", (int) c)); + } else { + out.append(c); + } + } + return out.toString(); + } + @Test void baseRefAllowsValidRefs() { assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("origin/master").build()); diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java index 64534b2..64d552f 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java @@ -163,6 +163,71 @@ void acceptsSymlinkThatStaysInsideProjectRoot() throws IOException { "Symlink resolving to a path under the project root must still be accepted"); } + @Test + void rejectsFileLevelSymlinkEscapingProjectRoot(@TempDir Path outside) throws IOException { + // Threat model (v1.9.19 closed the directory-level leg of this + // attack; v1.9.21 closes the per-file leg): even when + // `src/main/java` is a normal directory inside the project, + // an attacker MR can still commit + // `src/main/java/com/x/Leak.java -> /etc/passwd` + // and the pre-fix scanner would happily enumerate the file, + // hand it to JavaParser, and surface the filename in the + // discovery pipeline. `visitFile` now rejects symlinks + // unconditionally — a regular {@code .java} file whose real + // path legitimately lives inside the tree is just a regular + // file, not a symlink. + Path pkg = tempDir.resolve("src/main/java/com/x"); + Files.createDirectories(pkg); + Files.writeString(pkg.resolve("Ok.java"), "package com.x;\nclass Ok {}"); + + Path attackTarget = outside.resolve("Leak.java"); + Files.writeString(attackTarget, "package evil;\nclass Leak {}"); + + Path fileLink = pkg.resolve("Leak.java"); + try { + Files.createSymbolicLink(fileLink, attackTarget); + } catch (UnsupportedOperationException | FileSystemException e) { + return; + } + + List collected = SourceFileScanner.collectJavaFiles( + tempDir.resolve("src/main/java")); + + assertEquals(1, collected.size(), + "Only the real Ok.java should be collected; the symlinked Leak.java " + + "must be skipped even if its real path is outside the project root"); + assertTrue(collected.get(0).getFileName().toString().equals("Ok.java")); + } + + @Test + void scansTestFqnsIgnoresFileLevelSymlinks(@TempDir Path outside) throws IOException { + // Same hardening contract as above, but for the path that + // feeds {@link ProjectIndex#testFqns()} — the two visitors + // must agree or an attacker could plant a symlinked file + // that is invisible to `collectTestFiles` but visible to + // `scanTestFqns` (or vice versa), producing inconsistent + // bucket counts that are hard to debug. + Path pkg = tempDir.resolve("src/test/java/com/x"); + Files.createDirectories(pkg); + Files.writeString(pkg.resolve("OkTest.java"), "package com.x;\nclass OkTest {}"); + + Path attackTarget = outside.resolve("LeakTest.java"); + Files.writeString(attackTarget, "package evil;\nclass LeakTest {}"); + Path fileLink = pkg.resolve("LeakTest.java"); + try { + Files.createSymbolicLink(fileLink, attackTarget); + } catch (UnsupportedOperationException | FileSystemException e) { + return; + } + + var fqns = SourceFileScanner.scanTestFqns(tempDir, List.of("src/test/java")); + + assertTrue(fqns.contains("com.x.OkTest")); + assertFalse(fqns.contains("com.x.LeakTest"), + "Symlinked file must not contribute to test FQN discovery"); + assertEquals(1, fqns.size()); + } + @Test void scansTestFqnsFromDeeplyNestedModules() throws IOException { Path depth1 = tempDir.resolve("api/src/test/java/com/example");