From 1426d65d41943248cd578174933b8643d4f9c79c Mon Sep 17 00:00:00 2001 From: vedanthvasudev Date: Wed, 22 Apr 2026 17:49:38 +0100 Subject: [PATCH] chore/plugin-wide-review: apply plugin-wide code-review findings (T1+T2+T3) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A four-reviewer sweep (correctness, security, maintainability, simplicity) against the plugin source tree as it landed in v1.9.18 surfaced ten actionable findings. This PR lands all of them in a single change so v1.9.19 carries the security-critical, correctness, and polish fixes together. Tier 1 — safety-critical: * TransitiveStrategy no longer drops consumer tests when a production class is deleted by the MR (`git rm FooService.java` MRs were silently losing coverage on every consumer). * SourceFileScanner now rejects symlinks whose real path escapes the project root, closing both a DoS (walking `/`) and an information- leak surface (`.java` filenames flowing into --explain / WARN). * New LogSanitizer escapes C0 + DEL + C1 control characters; every filename-derived log site (four total: --explain sample, engine unmapped-file WARN, AffectedTestTask malformed-FQN WARN, engine per-test INFO) now routes through it so a hostile MR cannot forge log lines or hide real warnings with ANSI escapes. * AffectedTestsConfig#effectiveMode() now always returns a concrete Mode (falls back to Builder.detectMode()), matching the Javadoc contract that downstream switches depend on. * OutOfScopeMatchers literal branch now matches path.equals(bare) and path.endsWith("/" + bare) — ProjectIndex relativises paths without a trailing slash, which was exactly the shape the matcher was missing. Tier 2 — correctness gaps that silently dropped tests: * UsageStrategy now selects tests that reference the changed class via inner-class imports (import x.Outer.Inner) or static imports (import static x.Constants.MAX), including the wildcard form. * ImplementationStrategy now performs a fixpoint subtype-closure walk bounded by transitiveDepth, so grandchildren of a changed interface (A <- B <- C) are discovered and C's tests run when A changes. Tier 3 — polish in adjacent code: * PathToClassMapper rejects path-traversal segments (`..`) in diff input (segment-aware, not substring) and routes them to the unmapped bucket at debug level. * AffectedTestsConfig baseRef validation now delegates to JGit's Repository.isValidRefName plus a SHA/rev-expression whitelist, replacing the brittle `contains("..")` rule. * Deleted dead PathToClassMapper#extractModule + its 3 unit tests; the method had no in-tree caller. * LogSanitizer gained null-tolerance on joinSanitized and Locale.ROOT on its hex format; ImplementationStrategy deep-copies its simple-name-to-FQN map so fixpoint mutations can't leak. CHANGELOG entry describes every listed fix and the behaviour change for operators. ./gradlew check passes (unit + functional). --- CHANGELOG.md | 117 ++++++++++++++++ .../core/AffectedTestsEngine.java | 19 ++- .../core/config/AffectedTestsConfig.java | 84 ++++++++++-- .../discovery/ImplementationStrategy.java | 89 +++++++++--- .../core/discovery/SourceFileScanner.java | 57 +++++++- .../core/discovery/TransitiveStrategy.java | 24 +++- .../core/discovery/UsageStrategy.java | 52 ++++++- .../core/mapping/OutOfScopeMatchers.java | 34 ++++- .../core/mapping/PathToClassMapper.java | 65 ++++----- .../affectedtests/core/util/LogSanitizer.java | 106 ++++++++++++++ .../core/config/AffectedTestsConfigTest.java | 41 ++++++ .../discovery/ImplementationStrategyTest.java | 33 +++++ .../core/discovery/SourceFileScannerTest.java | 60 ++++++++ .../discovery/TransitiveStrategyTest.java | 38 ++++++ .../core/discovery/UsageStrategyTest.java | 85 ++++++++++++ .../core/mapping/OutOfScopeMatchersTest.java | 25 ++++ .../core/mapping/PathToClassMapperTest.java | 62 ++++++--- .../core/util/LogSanitizerTest.java | 129 ++++++++++++++++++ .../gradle/AffectedTestTask.java | 43 ++++-- 19 files changed, 1049 insertions(+), 114 deletions(-) create mode 100644 affected-tests-core/src/main/java/io/affectedtests/core/util/LogSanitizer.java create mode 100644 affected-tests-core/src/test/java/io/affectedtests/core/util/LogSanitizerTest.java diff --git a/CHANGELOG.md b/CHANGELOG.md index 74bb677..ab392d1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,123 @@ adheres to [Semantic Versioning](https://semver.org/). ## [Unreleased] +### Fixed — post-v1.9.18 code-review pass + +A four-reviewer sweep (correctness, security, maintainability, simplicity) +ran against the plugin source tree as it landed in v1.9.18. Ten findings +were triaged into two tiers — safety-critical (Tier 1) and correctness +gaps that silently dropped tests in real hierarchies (Tier 2) — and both +tiers are resolved in this release. Tier 3 polish items ship in the same +PR because they touch adjacent code and keep v2.0 from carrying dead +weight. + +- **Transitive strategy no longer drops consumer tests when a production + class is deleted.** `TransitiveStrategy.buildReverseDependencyMap` + built its `allKnownFqns` set purely from files currently on disk. + A pure `git rm FooService.java` MR ended up with `FooService` in the + changed set (surfaced by `GitChangeDetector` via the old path) but + nowhere in the reverse map, so the consumers of `FooService` — the + only tests that would have caught the breakage — were silently + excluded. The fix unions `changedProductionClasses` into + `allKnownFqns` before walking. Regression lives in + `TransitiveStrategyTest#discoversConsumerTestsForDeletedProductionClass`. +- **`SourceFileScanner` no longer follows symlinks out of the project + root.** An MR that introduced `src/main/java` as a symlink to `/` + (or any path outside the project) would walk the victim's filesystem, + enumerating arbitrary `.java` files in the scanner's match list and + inflating the scan to the point of DoS on laptops. Matched filenames + then flowed into `--explain` samples and the WARN log, which is where + the leak would have become visible to the attacker on CI. + `findAllMatchingDirs` now canonicalises every candidate via + `Path.toRealPath()` and rejects anything whose real path does not + start with the project root's real path. Two regressions: + `rejectsSymlinkEscapingProjectRoot` covers the exploit; + `acceptsSymlinkThatStaysInsideProjectRoot` prevents the cure from + being worse than the disease for monorepos that symlink between + sibling modules. +- **Log output is now sanitised.** Filenames from the git diff can + legally contain newlines, carriage returns, and ANSI escape bytes — + JGit faithfully preserves them. Before this release the + `--explain` sample lines, the unmapped-file WARN, the malformed-FQN + WARN in the Gradle task, and the per-test INFO line in the engine + all fed those bytes straight into SLF4J, which means a hostile MR + could forge whole log lines (including fake + `Affected Tests: SELECTED` headers) or colour legitimate log lines + to hide warnings from a human reviewer. New `LogSanitizer` escapes + the C0 range (NUL..US), DEL (0x7F), and the C1 range (0x80..0x9F, + which includes single-byte CSI) into visible `\\xHH` / `\\n` / `\\r` + / `\\t` forms, and every filename-derived log site — four in total + — now routes through it. +- **`AffectedTestsConfig#effectiveMode()` now always returns a concrete + mode.** The Javadoc promised one of `LOCAL`, `CI`, `STRICT`; the + getter returned `null` for zero-config callers, so the documented + switch shape NPE'd on anyone who followed the Javadoc to the letter. + The public getter now falls back to `Builder.detectMode()` when the + internal field is unset — preserving the nullable internal signal + that `resolveSituationActions` uses to distinguish "mode was + explicitly set" from "we inferred it" — and a regression test + (`effectiveModeIsAlwaysConcreteForZeroConfigCallers`) locks the + contract down. +- **`outOfScopeTestDirs`/`outOfScopeSourceDirs` now catch literal + entries whose value equals the resolved directory path.** The + matcher treated `sub/src/main/java` as a prefix only and relied on + `contains("/sub/src/main/java/")`. `ProjectIndex` relativises source + directories with `Path.relativize`, which never emits a trailing + slash, so the contain-check missed its only real input shape. The + literal branch now also accepts `path.equals(bare)` and + `path.endsWith("/" + bare)`. Regression: + `OutOfScopeMatchersTest#literalEntryMatchesExactDirectoryPath`. +- **`UsageStrategy` now selects tests that import inner classes or use + static imports of the changed class.** Pre-fix, `import x.Outer.Inner` + and `import static x.Constants.MAX` did not register as a reference + to `x.Outer` / `x.Constants`, so a genuinely affected test whose only + reference to the changed surface was a static constant was silently + dropped. `testReferencesChangedClass` now strips the last segment + from static imports (handling both the named-member and + `x.C.*` shapes) and prefix-matches inner-class imports against + changed FQNs. Three regressions cover the inner-class direct import, + the static-member import, and the static wildcard. +- **`ImplementationStrategy` now performs a fixpoint walk of the + subtype closure.** Given `interface A <- abstract B implements A + <- class C extends B`, only B is a direct subtype of A. A single + pass stopped at B, so `CTest` — the only concrete-implementor test + — was silently dropped whenever A changed. The strategy now seeds + each subsequent pass with the freshly-found impls, bounded by + `transitiveDepth` as a sanity cap. Regression: + `findsGrandchildImplementationThroughMultiLevelHierarchy`. +- **`PathToClassMapper` rejects path-traversal segments in diff + input.** Git never emits `..` as a standalone segment in diff paths, + so one appearing there is either a malformed ref or an attempt to + confuse the mapper. Pre-fix, `../../etc/passwd.java` was handed to + `tryMapToClass`, which produced an FQN starting with `..` and + classified it as production — a shape the test dispatcher would then + try to run. The segment check now routes those inputs to the + unmapped bucket (at `debug` level, so a malicious diff cannot flood + the build log), which trips the `UNMAPPED_FILE` situation and + escalates to a full run on the engine's normal WARN path. A + segment-aware check keeps legitimate filenames like `foo..bar.java` + working. +- **`baseRef` validation is now JGit-native.** The v1.x check rejected + the single shape `contains("../")` and accepted everything else — + including control characters, leading slashes, and trailing + `.lock`, all of which JGit's own refname rules already forbid. The + builder now delegates to `Repository.isValidRefName` (canonical + `refs/...` names), accepts 7-40 char hex SHAs directly, and accepts + short rev-expressions like `HEAD~3` and `master^2`. The + error message on rejection explicitly names the three accepted + shapes so consumers have something actionable instead of "suspicious + path traversal". + +### Removed + +- `PathToClassMapper#extractModule` and its three unit tests. The + method had no in-tree caller and the `extractModuleFromDirs` branch + under it duplicated logic that `tryMapToClass` already performs more + carefully. Dead code by construction; removing it keeps the mapper's + surface aligned with what the engine actually uses. + +## [1.9.18] + ### Changed — behaviour flip, read this - **`includeUncommitted` and `includeStaged` now default to `false` diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java index 4490c63..6b7afa2 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/AffectedTestsEngine.java @@ -7,6 +7,7 @@ import io.affectedtests.core.git.GitChangeDetector; import io.affectedtests.core.mapping.PathToClassMapper; import io.affectedtests.core.mapping.PathToClassMapper.MappingResult; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -224,9 +225,16 @@ public AffectedTestsResult run() { } if (!mapping.unmappedChangedFiles().isEmpty()) { Action action = config.actionFor(Situation.UNMAPPED_FILE); + // Filenames flow from the untrusted MR tree straight into the + // logger here — sanitise or an attacker committing a file + // with `\n` + fake plugin status line can forge CI audit + // output. See LogSanitizer for the full rationale. + List examples = mapping.unmappedChangedFiles().stream() + .limit(5) + .map(LogSanitizer::sanitize) + .toList(); log.warn("Non-Java / unmapped change detected ({} file(s)). Action: {}. Examples: {}", - mapping.unmappedChangedFiles().size(), action, - mapping.unmappedChangedFiles().stream().limit(5).toList()); + mapping.unmappedChangedFiles().size(), action, examples); // SELECTED here means "ignore the unmapped file, proceed with // discovery on whatever production/test files were in the // diff" — this is the behaviour legacy @@ -290,7 +298,12 @@ public AffectedTestsResult run() { } log.info("=== Result: {} affected test classes ===", allTestsToRun.size()); - allTestsToRun.forEach(t -> log.info(" -> {}", t)); + // FQNs are derived from diff filenames by the discovery strategies + // and have not yet been through the AffectedTestTask#isValidFqn + // gate; sanitise here so an attacker-planted filename like + // `Test\nAffected Tests: SELECTED.java` can't forge a fake + // log line at INFO/--info level. + allTestsToRun.forEach(t -> log.info(" -> {}", LogSanitizer.sanitize(t))); return new AffectedTestsResult( allTestsToRun, diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java index 707bf67..887a3f7 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/config/AffectedTestsConfig.java @@ -365,12 +365,29 @@ private static Action defaultFor(Situation s, Mode effectiveMode) { public Mode mode() { return mode; } /** - * The mode after {@link Mode#AUTO} resolution. Always one of - * {@link Mode#LOCAL}, {@link Mode#CI} or {@link Mode#STRICT}. + * The mode after {@link Mode#AUTO} resolution. Always returns one of + * {@link Mode#LOCAL}, {@link Mode#CI} or {@link Mode#STRICT} — never + * {@code null}. * - * @return the resolved mode + *

When the caller did not configure a mode at all, the internal + * situation-action resolver deliberately falls through to pre-v2 + * hardcoded defaults (preserving zero-config behaviour parity with + * the legacy API). The public getter still reports the mode that + * {@code AUTO} would have selected — either the value detected from + * the environment, or {@link Mode#LOCAL} when detection finds no + * CI markers — so callers reading this value get a concrete Mode + * that accurately describes the execution environment. + * + * @return the resolved mode, never {@code null} */ - public Mode effectiveMode() { return effectiveMode; } + public Mode effectiveMode() { + // Honest fallback for the "caller passed nothing, we stayed on + // pre-v2 defaults" branch — resolve via the same AUTO-detection + // path the builder would have taken. Keeps the public contract + // non-null without destabilising the resolver, which reads the + // nullable internal field directly. + return effectiveMode != null ? effectiveMode : Builder.detectMode(); + } /** * The {@link Action} the engine will take for a given {@link Situation}. @@ -508,17 +525,62 @@ public Builder baseRef(String baseRef) { if (baseRef == null || baseRef.isBlank()) { throw new IllegalArgumentException("baseRef must not be null or blank"); } - if (baseRef.contains("..") && baseRef.contains("/")) { - // Prevent path traversal in ref names like "../../etc/passwd" - // (normal refs like "origin/master" or SHAs are fine) - String normalized = baseRef.replace("\\", "/"); - if (normalized.contains("../")) { - throw new IllegalArgumentException("baseRef contains suspicious path traversal: " + baseRef); - } + if (!isAcceptableBaseRef(baseRef)) { + throw new IllegalArgumentException( + "baseRef is not a valid git ref, SHA, or short form: '" + + baseRef + "' — expected something like " + + "'origin/master', 'HEAD~1', or a 7-40 char hex SHA"); } this.baseRef = baseRef; return this; } + + private static final java.util.regex.Pattern SHORT_SHA = + java.util.regex.Pattern.compile("^[0-9a-fA-F]{7,40}$"); + // Covers HEAD, HEAD~N, HEAD^, HEAD^N, HEAD@{0}, master~2, etc. + // The refname validity of the left-hand side is delegated to + // JGit below; this pattern only validates the suffix grammar. + private static final java.util.regex.Pattern REV_EXPR = + java.util.regex.Pattern.compile("^([^~^@]+)([~^][0-9]*|@\\{[^}]+\\})+$"); + + /** + * Accepts any input JGit would successfully resolve against a real + * repository: canonical ref names (delegated to + * {@link org.eclipse.jgit.lib.Repository#isValidRefName(String)}), + * short/long SHAs, and {@code HEAD~N}/{@code ^N}/{@code @{N}} + * rev-expressions. Rejects path-traversal shapes by construction + * since JGit's refname validator already forbids {@code ..}, a + * leading {@code /}, and control characters — no hand-rolled + * string sniffing required. Keeping this decision inside the + * builder means every call site (CLI, Gradle plugin, programmatic + * users) gets the same contract without duplicating the rule. + */ + private static boolean isAcceptableBaseRef(String baseRef) { + if (SHORT_SHA.matcher(baseRef).matches()) { + return true; + } + if (org.eclipse.jgit.lib.Repository.isValidRefName(baseRef)) { + return true; + } + // Short ref names like `master` or `origin/master` are rejected + // by isValidRefName (which requires a leading `refs/...`), so + // accept them if they at least survive the refname rules when + // prefixed. This preserves the pre-v1.9.19 behaviour of + // accepting `origin/master` as a base ref. + if (org.eclipse.jgit.lib.Repository.isValidRefName("refs/heads/" + baseRef)) { + return true; + } + java.util.regex.Matcher rev = REV_EXPR.matcher(baseRef); + if (rev.matches()) { + String head = rev.group(1); + if ("HEAD".equals(head) + || org.eclipse.jgit.lib.Repository.isValidRefName(head) + || org.eclipse.jgit.lib.Repository.isValidRefName("refs/heads/" + head)) { + return true; + } + } + return false; + } public Builder includeUncommitted(boolean v) { this.includeUncommitted = v; return this; } public Builder includeStaged(boolean v) { this.includeStaged = v; return this; } public Builder runAllIfNoMatches(boolean v) { this.runAllIfNoMatches = v; return this; } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java index 21befb6..44c8516 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/ImplementationStrategy.java @@ -132,36 +132,81 @@ private Set findImplementations(Set changedClasses, } } - // 2. AST scanning: find classes that extend/implement changed types + // 2. AST scanning: find classes that extend/implement changed types, + // iterated to a fixpoint so multi-level hierarchies are covered. + // + // Motivating case: `interface A` ← `abstract class B implements A` + // ← `class C extends B`, with `CTest` the only real test + // coverage of A's behaviour through the C implementation. When A + // changes, a single pass finds only B (because nothing declares + // `extends A` directly except B). Pre-fix, CTest was silently + // dropped. The fixpoint loop treats each newly-found impl as a + // fresh target for the next pass; the loop terminates when a + // pass adds nothing new, or when depth hits the configured + // transitiveDepth (reused as a sanity bound — in practice Java + // hierarchies are 2-3 deep). JavaParser fallbackParser = (index == null) ? new JavaParser() : null; - for (Path sourceFile : sourceFiles) { - CompilationUnit cu = parseOrGet(sourceFile, index, fallbackParser); - if (cu == null) continue; - - List declarations = - cu.findAll(ClassOrInterfaceDeclaration.class); - - for (ClassOrInterfaceDeclaration decl : declarations) { - for (ClassOrInterfaceType extended : decl.getExtendedTypes()) { - if (simpleNameToFqns.containsKey(extended.getNameAsString())) { - String implFqn = extractFqn(cu, decl); - if (implFqn != null) { - implementations.add(implFqn); - log.debug("[impl] {} extends {}", implFqn, extended.getNameAsString()); + // Deep-copy the inner sets so the fixpoint loop's subsequent + // `.add(implFqn)` calls don't leak mutations into + // simpleNameToFqns. The latter is unused after this point + // today, but a shared mutable reference is a footgun waiting + // for the next refactor. + Map> targetsBySimpleName = new HashMap<>(); + simpleNameToFqns.forEach((k, v) -> targetsBySimpleName.put(k, new HashSet<>(v))); + int depthCap = Math.max(1, config.transitiveDepth()); + for (int depth = 0; depth < depthCap; depth++) { + Set newImpls = new LinkedHashSet<>(); + for (Path sourceFile : sourceFiles) { + CompilationUnit cu = parseOrGet(sourceFile, index, fallbackParser); + if (cu == null) continue; + + List declarations = + cu.findAll(ClassOrInterfaceDeclaration.class); + + for (ClassOrInterfaceDeclaration decl : declarations) { + String implFqn = extractFqn(cu, decl); + if (implFqn == null || implementations.contains(implFqn)) { + // Already known — skip. `implementations.contains` + // is the loop's termination gate: once a class is + // in the set it stops seeding further passes. + continue; + } + for (ClassOrInterfaceType extended : decl.getExtendedTypes()) { + if (targetsBySimpleName.containsKey(extended.getNameAsString())) { + newImpls.add(implFqn); + log.debug("[impl] depth {}: {} extends {}", + depth + 1, implFqn, extended.getNameAsString()); + break; } } - } - for (ClassOrInterfaceType implemented : decl.getImplementedTypes()) { - if (simpleNameToFqns.containsKey(implemented.getNameAsString())) { - String implFqn = extractFqn(cu, decl); - if (implFqn != null) { - implementations.add(implFqn); - log.debug("[impl] {} implements {}", implFqn, implemented.getNameAsString()); + if (newImpls.contains(implFqn)) continue; + for (ClassOrInterfaceType implemented : decl.getImplementedTypes()) { + if (targetsBySimpleName.containsKey(implemented.getNameAsString())) { + newImpls.add(implFqn); + log.debug("[impl] depth {}: {} implements {}", + depth + 1, implFqn, implemented.getNameAsString()); + break; } } } } + + if (newImpls.isEmpty()) { + break; + } + implementations.addAll(newImpls); + // Seed the next pass with the newly-found impls keyed by + // simple name — subclasses of these impls will match on the + // next iteration. Previously-targeted names stay in the map + // so a single type can be discovered via two independent + // paths without losing an edge. + for (String implFqn : newImpls) { + targetsBySimpleName + .computeIfAbsent(SourceFileScanner.simpleClassName(implFqn), + k -> new HashSet<>()) + .add(implFqn); + } } return implementations; diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java index 65ac02a..f01e420 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/SourceFileScanner.java @@ -244,6 +244,15 @@ public static Path findTestRoot(Path file, String testDir) { * *

Directories listed in {@link #SKIP_DIRS} are pruned during the walk. * + *

Candidates whose real path escapes the project root (typically + * because a module's source directory is a symlink planted by an + * attacker-controlled MR branch) are silently dropped — the plugin + * runs on merge-gate CI against untrusted trees, and a + * {@code src/main/java -> /} symlink would otherwise make + * {@link Files#walkFileTree} scan the entire CI runner's filesystem + * (DoS + path-structure leakage into {@code --explain} output). + * See {@link #stayInsideProjectRoot}. + * * @param projectDir the root project directory * @param relativeDir the directory suffix to look for (e.g. {@code "src/test/java"}) * @return list of absolute paths that exist and match @@ -251,8 +260,10 @@ public static Path findTestRoot(Path file, String testDir) { static List findAllMatchingDirs(Path projectDir, String relativeDir) { List matches = new ArrayList<>(); + Path projectRoot = realPathOrNull(projectDir); + Path rootMatch = projectDir.resolve(relativeDir); - if (Files.isDirectory(rootMatch)) { + if (stayInsideProjectRoot(rootMatch, projectRoot)) { matches.add(rootMatch); } @@ -270,7 +281,7 @@ public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { } Path candidate = dir.resolve(relativeDir); - if (Files.isDirectory(candidate)) { + if (stayInsideProjectRoot(candidate, projectRoot)) { matches.add(candidate); // Don't descend into the source tree itself — there are no // sub-modules inside src/main/java/com/example/... @@ -287,4 +298,46 @@ public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { log.debug("Found {} directories matching '{}' under {}", matches.size(), relativeDir, projectDir); return matches; } + + /** + * Returns {@code true} iff {@code candidate} exists, is a directory + * when dereferenced, and — after symlink resolution — still lives + * under {@code projectRoot}. The real-path containment check is the + * load-bearing guard against an attacker-planted + * {@code src/main/java -> /} symlink in a merge-gate MR; a plain + * {@link Files#isDirectory(Path, java.nio.file.LinkOption...)} call + * without {@link LinkOption#NOFOLLOW_LINKS} would follow the + * symlink and produce a match pointing at the CI runner's + * filesystem root, which a subsequent {@link #collectJavaFiles} + * would then happily enumerate. + * + *

When {@code projectRoot} is {@code null} (e.g. the project + * directory itself cannot be canonicalised) we fall back to the + * pre-hardening behaviour — {@link Files#isDirectory} alone — so + * legitimate users on unusual filesystems don't silently lose + * discovery. The symlink hazard is specific to CI contexts where + * the tree is attacker-controlled, and in that threat model + * {@code toRealPath} on the project root is always well-defined. + */ + static boolean stayInsideProjectRoot(Path candidate, Path projectRoot) { + if (!Files.isDirectory(candidate)) { + return false; + } + if (projectRoot == null) { + return true; + } + Path candidateReal = realPathOrNull(candidate); + if (candidateReal == null) { + return false; + } + return candidateReal.startsWith(projectRoot); + } + + private static Path realPathOrNull(Path p) { + try { + return p.toRealPath(); + } catch (IOException e) { + return null; + } + } } diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/TransitiveStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/TransitiveStrategy.java index 9e96896..3f20c89 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/TransitiveStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/TransitiveStrategy.java @@ -73,7 +73,17 @@ private Set walkTransitives(Set changedProductionClasses, Set currentLevel = new LinkedHashSet<>(changedProductionClasses); Set allVisited = new LinkedHashSet<>(changedProductionClasses); - Map> dependencyMap = buildReverseDependencyMap(sourceFiles, index); + // The changed set is threaded into map construction so reverse edges + // pointing at a class that was just `git rm`'d still get recorded. + // Without this, a pure-delete MR (file on disk is gone, consumers + // still reference the FQN from their still-present sources) produces + // an empty `dependencyMap.get(deletedFqn)`, and the downstream tests + // that are now broken against the deleted symbol are silently skipped. + // GitChangeDetector already surfaces the old path for DELETEs to seed + // `changedProductionClasses`; this patch closes the other half of the + // contract on the index side. + Map> dependencyMap = + buildReverseDependencyMap(sourceFiles, index, changedProductionClasses); for (int depth = 1; depth <= config.transitiveDepth(); depth++) { Set nextLevel = new LinkedHashSet<>(); @@ -115,11 +125,16 @@ private Set walkTransitives(Set changedProductionClasses, * Builds a reverse dependency map: for each production class FQN, * lists the FQNs of other production classes that depend on it. */ - private Map> buildReverseDependencyMap(List sourceFiles, ProjectIndex index) { + private Map> buildReverseDependencyMap(List sourceFiles, + ProjectIndex index, + Set extraKnownFqns) { Map> reverseMap = new HashMap<>(); JavaParser fallbackParser = (index == null) ? new JavaParser() : null; - // First pass: collect all known FQNs so we can resolve simple names + // First pass: collect all known FQNs so we can resolve simple names. + // `extraKnownFqns` is unioned in so reverse edges to deleted classes + // (which no longer have a source file on disk) still resolve — see + // the caller's comment for the full rationale. Set allKnownFqns = new HashSet<>(); for (Path file : sourceFiles) { String fqn = pathToFqn(file); @@ -127,6 +142,9 @@ private Map> buildReverseDependencyMap(List sourceFile allKnownFqns.add(fqn); } } + if (extraKnownFqns != null) { + allKnownFqns.addAll(extraKnownFqns); + } // Second pass: for each source file, find field types and build reverse edges for (Path file : sourceFiles) { diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java index c74a56e..f0d3172 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/discovery/UsageStrategy.java @@ -128,19 +128,49 @@ private boolean testReferencesChangedClass(CompilationUnit cu, Set importedFqns = new HashSet<>(); Set wildcardPackages = new HashSet<>(); for (ImportDeclaration imp : cu.getImports()) { - if (imp.isAsterisk()) { - wildcardPackages.add(imp.getNameAsString()); + String name = imp.getNameAsString(); + if (imp.isStatic()) { + // Static imports are member-scoped, not type-scoped. The name + // reported by JavaParser for `import static a.b.C.MAX;` is + // `a.b.C.MAX` and for `import static a.b.C.*;` is `a.b.C` + // (with isAsterisk=true). The thing a test actually depends + // on in both cases is the class `a.b.C`, so we normalise + // back to the class FQN for direct-import matching. Before + // this, a test that only referenced a changed class through + // a `import static … .CONSTANT;` was silently missed + // because the name in importedFqns had a member suffix that + // never equalled any changedFqn. + String classFqn = imp.isAsterisk() + ? name + : stripLastSegment(name); + if (classFqn != null) { + importedFqns.add(classFqn); + } + } else if (imp.isAsterisk()) { + wildcardPackages.add(name); } else { - importedFqns.add(imp.getNameAsString()); + importedFqns.add(name); } } - // Tier 1: Direct import match + // Tier 1: Direct import match. `innerClassMatch` also fires when + // an import targets a nested class of the changed outer — e.g. + // the test writes `import c.d.Outer.Inner;` and the diff touches + // `c.d.Outer` (PathToClassMapper is file-based, so it only + // surfaces the outer FQN for the nested class's change). Without + // this, a test that only uses the inner class is silently missed. for (String changedFqn : changedFqns) { if (importedFqns.contains(changedFqn)) { log.debug(" Direct import match: {}", changedFqn); return true; } + String innerPrefix = changedFqn + "."; + for (String imported : importedFqns) { + if (imported.startsWith(innerPrefix)) { + log.debug(" Inner-class import match: {} <- {}", changedFqn, imported); + return true; + } + } } // Tier 1b: Wildcard import match @@ -201,6 +231,20 @@ private boolean typeNameAppearsInAst(CompilationUnit cu, String simpleName) { return false; } + /** + * Returns {@code name} with the final {@code .segment} removed. For + * {@code "a.b.C.MAX"} returns {@code "a.b.C"}; for a single-segment + * input returns {@code null} (the input is already as stripped as it + * can be and clearly wasn't a qualified member reference). + */ + static String stripLastSegment(String name) { + int idx = name.lastIndexOf('.'); + if (idx < 0) { + return null; + } + return name.substring(0, idx); + } + /** * Matches a type string against a simple name, handling generics. * Uses word-boundary matching to avoid false positives (e.g. "Id" matching "GridLayout"). diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/OutOfScopeMatchers.java b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/OutOfScopeMatchers.java index 4133f7a..dfbeca9 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/OutOfScopeMatchers.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/OutOfScopeMatchers.java @@ -107,9 +107,39 @@ public static List> compile(List dirs, String configKe } }); } else { - String prefix = normalizedDir.endsWith("/") ? normalizedDir : normalizedDir + "/"; + // The `bare` form is the directory without any trailing slash; + // `prefix` is the same thing with a trailing slash appended. + // + // The matcher accepts four shapes, all of which represent the + // same logical "path is under this directory" relationship: + // + // path == bare — caller handed us the directory + // itself (ProjectIndex does this + // after Path.relativize, which + // never emits a trailing slash). + // Pre-fix this case silently + // leaked: an entry exactly + // equal to a resolved source + // dir was not filtered. + // path.startsWith(prefix) — path is a descendant file or + // subdirectory. + // path ends with bare — path *is* the directory, but + // nested under a parent (covers + // the relativized-from-project- + // root case where the entry is + // a sub-path like + // "sub/src/main/java"). + // path.contains("/"+prefix) — legacy shape preserved for + // parity with v1 behaviour. + String bare = normalizedDir.endsWith("/") + ? normalizedDir.substring(0, normalizedDir.length() - 1) + : normalizedDir; + String prefix = bare + "/"; matchers.add(path -> - path.startsWith(prefix) || path.contains("/" + prefix)); + path.equals(bare) + || path.startsWith(prefix) + || path.endsWith("/" + bare) + || path.contains("/" + prefix)); } } return List.copyOf(matchers); diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java index 6918561..ba39054 100644 --- a/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java +++ b/affected-tests-core/src/main/java/io/affectedtests/core/mapping/PathToClassMapper.java @@ -1,6 +1,7 @@ package io.affectedtests.core.mapping; import io.affectedtests.core.config.AffectedTestsConfig; +import io.affectedtests.core.util.LogSanitizer; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -113,6 +114,28 @@ public MappingResult mapChangedFiles(Set changedFiles) { Set unmappedChangedFiles = new LinkedHashSet<>(); for (String filePath : changedFiles) { + // Reject path-traversal shapes up-front. Git never emits `..` + // segments in diff paths, so any such segment is either a + // malformed input or an attempt to confuse the mapper + // (e.g. `../../etc/passwd.java` would otherwise be handed to + // tryMapToClass which would happily produce an FQN starting + // with `..`). Treating them as unmapped — instead of a hard + // throw — keeps the tool resilient and lets the + // UNMAPPED_FILE safety net escalate to a full run, matching + // the general "we don't know what this is" contract. + if (hasTraversalSegment(filePath)) { + // Debug level (not warn) so a hostile MR can't blast + // the build output with N lines per traversal path — + // the engine's downstream `unmappedChangedFiles()` + // warning already surfaces the escalation once per + // diff at WARN level, which is the operator-facing + // signal that matters. + log.debug("Rejecting path with '..' segment as unmapped: {}", + LogSanitizer.sanitize(filePath)); + unmappedChangedFiles.add(filePath); + continue; + } + // Ignore rules are evaluated FIRST: a user's explicit // {@code ignorePaths} entry is a contract that nothing about // the file should influence the engine, including nudging it @@ -215,39 +238,17 @@ private String tryMapToClass(String filePath, java.util.List sourceDirs) return null; } - /** - * Extracts the module prefix from a file path (e.g., "api" from "api/src/main/java/..."). - * Uses boundary-aware matching to avoid false positives when a source directory name - * appears as a substring of a module name (e.g., "someapi/src/main/java" won't match "api/src/main/java"). - * - * @param filePath the relative file path from git diff - * @return the module prefix, or empty string if no module prefix - */ - public String extractModule(String filePath) { - String normalized = filePath.replace('\\', '/'); - String module = extractModuleFromDirs(normalized, config.sourceDirs()); - if (!module.isEmpty()) { - return module; - } - return extractModuleFromDirs(normalized, config.testDirs()); - } - - private String extractModuleFromDirs(String normalizedPath, java.util.List dirs) { - for (String dir : dirs) { - String normalizedDir = dir.replace('\\', '/'); - // Ensure the source dir is preceded by a "/" boundary or is at the start - String withSlash = "/" + normalizedDir; - int idx = normalizedPath.indexOf(withSlash); - if (idx > 0) { - // Module is everything before the "/" that precedes the source dir - return normalizedPath.substring(0, idx); - } - // Also check if the path starts directly with the source dir (no module prefix) - if (normalizedPath.startsWith(normalizedDir)) { - return ""; - } + private static boolean hasTraversalSegment(String filePath) { + if (filePath == null || filePath.isEmpty()) { + return false; } - return ""; + String normalized = filePath.replace('\\', '/'); + // Match `..` as a standalone path segment only — `foo..bar/baz` + // (a legitimate, if ugly, filename) must not trip this guard. + return normalized.equals("..") + || normalized.startsWith("../") + || normalized.endsWith("/..") + || normalized.contains("/../"); } private boolean isIgnored(String filePath) { diff --git a/affected-tests-core/src/main/java/io/affectedtests/core/util/LogSanitizer.java b/affected-tests-core/src/main/java/io/affectedtests/core/util/LogSanitizer.java new file mode 100644 index 0000000..1790a2c --- /dev/null +++ b/affected-tests-core/src/main/java/io/affectedtests/core/util/LogSanitizer.java @@ -0,0 +1,106 @@ +package io.affectedtests.core.util; + +import java.util.Collection; +import java.util.Locale; +import java.util.stream.Collectors; + +/** + * Escapes control characters in strings that originate from untrusted + * input (filenames from the git diff, FQNs from a malicious source tree, + * etc.) before they flow into a logger. + * + *

Background: POSIX filesystems and git tree entries permit newlines + * ({@code \n} / {@code \r}), ASCII ESC ({@code \u001b}), and other + * control characters in path components. When the plugin runs on a + * merge-gate pipeline against an attacker-controlled MR branch and + * writes a filename verbatim to {@code lifecycle()} or {@code warn()}, + * the attacker can: + *

+ * + *

All sanitised output is printable ASCII — control characters are + * replaced with a {@code \xHH} escape (for {@code \n} etc. we use the + * familiar {@code \\n} / {@code \\r} / {@code \\t} forms instead, since + * operators reading the log are more likely to recognise those). + * Non-control Unicode is left untouched because valid non-ASCII + * filenames are legitimate and already round-trip through every other + * log pipeline cleanly. + * + *

This class is intentionally scoped to the logging channel only. + * Path-shaped callers that need filesystem safety (e.g. rejecting + * {@code ..} traversal) should use dedicated validators on the + * {@code mapping} package instead. + */ +public final class LogSanitizer { + + private LogSanitizer() { + // utility class + } + + /** + * Returns {@code value} with all C0 control characters + * ({@code NUL..US}, including {@code \n}, {@code \r}, ESC), + * DEL ({@code 0x7F}), and the C1 control range + * ({@code 0x80..0x9F}, which includes single-byte CSI) replaced + * with printable escape sequences. Returns {@code "(null)"} for + * a {@code null} input so callers don't have to null-check + * before formatting. + */ + public static String sanitize(String value) { + if (value == null) { + return "(null)"; + } + StringBuilder out = new StringBuilder(value.length()); + for (int i = 0; i < value.length(); i++) { + char c = value.charAt(i); + switch (c) { + case '\n': out.append("\\n"); break; + case '\r': out.append("\\r"); break; + case '\t': out.append("\\t"); break; + default: + // Escape the C0 range (NUL..US), DEL, and the C1 range + // (0x80..0x9F). C1 includes single-byte CSI (0x9B); + // on the narrow population of legacy 8-bit terminals + // that interpret bare C1 bytes as escape sequences, an + // unescaped C1 byte is equivalent to a multi-byte ANSI + // escape. Practical risk is low on UTF-8 log pipelines + // (SLF4J emits 0x9B as the 2-byte sequence C2 9B which + // modern terminals don't interpret), but closing the + // edge keeps the "no control characters reach the log" + // contract honest. Printable ASCII and valid non-C1 + // non-ASCII pass through unchanged so legitimate + // filenames round-trip cleanly. + if (c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F)) { + out.append(String.format(Locale.ROOT, "\\x%02X", (int) c)); + } else { + out.append(c); + } + } + } + return out.toString(); + } + + /** + * Convenience: sanitises every element of {@code values} and joins + * them with {@code ", "} — the most common shape this helper is + * called from (bucket sample lines, unmapped-file examples). + * Returns an empty string for a {@code null} input, mirroring the + * null-tolerant shape of {@link #sanitize(String)}. + */ + public static String joinSanitized(Collection values) { + if (values == null) { + return ""; + } + return values.stream() + .map(LogSanitizer::sanitize) + .collect(Collectors.joining(", ")); + } +} diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java index 08b5da9..b262488 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/config/AffectedTestsConfigTest.java @@ -249,6 +249,27 @@ void baseRefAllowsValidRefs() { assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("abc123def456").build()); assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("refs/heads/feature-branch").build()); assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("HEAD~3").build()); + assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("HEAD").build()); + assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("HEAD^").build()); + assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("master~2").build()); + assertDoesNotThrow(() -> AffectedTestsConfig.builder().baseRef("a".repeat(40)).build()); + } + + @Test + void baseRefRejectsRefnameSlurGarbage() { + // Regression: the v1.x check accepted anything without a `/..` in + // it. JGit's refname rules reject a far richer set of garbage + // (control chars, @{} outside rev-expressions, trailing `.lock`, + // leading `/`, etc.). Lock in that those shapes are refused at + // the builder boundary so downstream JGit calls never see them. + assertThrows(IllegalArgumentException.class, () -> + AffectedTestsConfig.builder().baseRef("/leading-slash").build()); + assertThrows(IllegalArgumentException.class, () -> + AffectedTestsConfig.builder().baseRef("trailing.lock").build()); + assertThrows(IllegalArgumentException.class, () -> + AffectedTestsConfig.builder().baseRef("contains\nnewline").build()); + assertThrows(IllegalArgumentException.class, () -> + AffectedTestsConfig.builder().baseRef("..").build()); } @Test @@ -391,4 +412,24 @@ private static String singleWarning(AffectedTestsConfig config, String message) assertEquals(1, config.deprecationWarnings().size(), message); return config.deprecationWarnings().get(0); } + + @Test + void effectiveModeIsAlwaysConcreteForZeroConfigCallers() { + // Regression: pre-fix, a zero-config caller got mode()=AUTO and + // effectiveMode()=null, despite the javadoc's "Always one of + // LOCAL, CI or STRICT" contract. Any downstream code following + // the documented shape (`switch(config.effectiveMode())`) would + // NPE. The fix resolves null to the AUTO-detected mode at the + // getter. + AffectedTestsConfig config = AffectedTestsConfig.builder() + .baseRef("origin/master") + .build(); + assertNotNull(config.effectiveMode(), + "effectiveMode() must never return null — it is the documented " + + "input to operator-facing switches on the resolved mode"); + assertTrue(config.effectiveMode() == Mode.LOCAL + || config.effectiveMode() == Mode.CI + || config.effectiveMode() == Mode.STRICT, + "effectiveMode() must be one of the three concrete modes"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/ImplementationStrategyTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/ImplementationStrategyTest.java index 63cdc58..fdaa8c4 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/ImplementationStrategyTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/ImplementationStrategyTest.java @@ -131,4 +131,37 @@ void returnsEmptyWhenNoImplementations() throws IOException { assertTrue(result.isEmpty()); } + + @Test + void findsGrandchildImplementationThroughMultiLevelHierarchy() throws IOException { + // Hierarchy: interface A <-- abstract class B implements A + // <-- class C extends B + // Only C has a direct test (CTest). Pre-fix the strategy found B + // on a single pass and stopped — CTest (the only real coverage + // of A's behaviour through an actual implementation) was + // silently dropped. The fixpoint loop re-runs with B as a target + // so C is found on the second pass, and naming-strategy then + // picks up CTest. + Path prodDir = tempDir.resolve("src/main/java/com/example"); + Files.createDirectories(prodDir); + Files.writeString(prodDir.resolve("A.java"), + "package com.example;\npublic interface A {}"); + Files.writeString(prodDir.resolve("B.java"), + "package com.example;\npublic abstract class B implements A {}"); + Files.writeString(prodDir.resolve("C.java"), + "package com.example;\npublic class C extends B {}"); + + Path testDir = tempDir.resolve("src/test/java/com/example"); + Files.createDirectories(testDir); + Files.writeString(testDir.resolve("CTest.java"), + "package com.example;\npublic class CTest {}"); + + Set result = strategy.discoverTests( + Set.of("com.example.A"), tempDir); + + assertTrue(result.contains("com.example.CTest"), + "Grandchild's test must be discovered through multi-level " + + "hierarchy — otherwise interface-level changes silently " + + "drop the tests of the only concrete implementor"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java index 8d8a7ea..64534b2 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/SourceFileScannerTest.java @@ -4,6 +4,7 @@ import org.junit.jupiter.api.io.TempDir; import java.io.IOException; +import java.nio.file.FileSystemException; import java.nio.file.Files; import java.nio.file.Path; import java.util.List; @@ -103,6 +104,65 @@ void collectsTestFilesFromDeeplyNestedModules() throws IOException { assertTrue(files.get(0).toString().endsWith("FooTest.java")); } + @Test + void rejectsSymlinkEscapingProjectRoot(@TempDir Path outside) throws IOException { + // Threat model: merge-gate CI runs against an attacker-controlled + // MR branch. An attacker commits `src/main/java` as a symlink to a + // location outside the project root — e.g. the CI runner's + // `/` or `$HOME` — and expects the scanner to walk that target, + // either (a) blowing out the walk budget (DoS) or (b) leaking the + // runner's directory structure into the `--explain` output via + // discovered .java filenames. + // + // The fix in SourceFileScanner.stayInsideProjectRoot canonicalises + // the candidate with toRealPath() and rejects anything whose real + // path doesn't live under the project's real root. + Path attackTarget = outside.resolve("runner-secrets"); + Files.createDirectories(attackTarget); + Files.writeString(attackTarget.resolve("Secret.java"), + "package secret;\npublic class Secret {}"); + + Path srcLink = tempDir.resolve("src/main/java"); + Files.createDirectories(srcLink.getParent()); + try { + Files.createSymbolicLink(srcLink, attackTarget); + } catch (UnsupportedOperationException | FileSystemException e) { + // Platform doesn't support symlinks (Windows without privilege, + // some containerised filesystems). Skip rather than fail — the + // attack surface does not exist on those platforms. + return; + } + + List matches = SourceFileScanner.findAllMatchingDirs(tempDir, "src/main/java"); + + assertTrue(matches.isEmpty(), + "Symlinked source directory escaping the project root must not be " + + "returned as a match — an attacker's MR could otherwise redirect " + + "the scanner at arbitrary filesystem locations. Matches: " + matches); + } + + @Test + void acceptsSymlinkThatStaysInsideProjectRoot() throws IOException { + // Legitimate use case: a module's src/main/java is a symlink to + // another directory that is still under the project root (some + // monorepo tooling does this). Must NOT be rejected. + Path actualSrc = tempDir.resolve("actual-source/src/main/java"); + Files.createDirectories(actualSrc); + + Path moduleSrc = tempDir.resolve("module/src/main/java"); + Files.createDirectories(moduleSrc.getParent()); + try { + Files.createSymbolicLink(moduleSrc, actualSrc); + } catch (UnsupportedOperationException | FileSystemException e) { + return; + } + + List matches = SourceFileScanner.findAllMatchingDirs(tempDir, "src/main/java"); + + assertFalse(matches.isEmpty(), + "Symlink resolving to a path under the project root must still be accepted"); + } + @Test void scansTestFqnsFromDeeplyNestedModules() throws IOException { Path depth1 = tempDir.resolve("api/src/test/java/com/example"); diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/TransitiveStrategyTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/TransitiveStrategyTest.java index 8d4b5c1..3dd3fdc 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/TransitiveStrategyTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/TransitiveStrategyTest.java @@ -130,4 +130,42 @@ public class C { assertTrue(depth2Result.contains("com.example.CTest"), "Depth 2 should reach CTest"); } + + @Test + void discoversConsumerTestsForDeletedProductionClass() throws IOException { + // The fixture mirrors a `git rm FooService.java` MR: FooService is in + // the changed set (surfaced by GitChangeDetector via the old path) + // but the file itself is no longer on disk. BarService still + // references FooService; BarServiceTest exercises BarService and is + // therefore the test most likely to fail on the broken consumer. + // Pre-fix, the reverse-dependency map filtered out FooService because + // the on-disk scan could not derive an FQN for the deleted file, and + // the transitive walk returned empty — silently dropping the test on + // the MR. The fix unions `changedProductionClasses` into + // `allKnownFqns` inside buildReverseDependencyMap so edges to + // deleted FQNs survive. + Path prodDir = tempDir.resolve("src/main/java/com/example"); + Files.createDirectories(prodDir); + // FooService deliberately NOT written — simulating the deletion. + Files.writeString(prodDir.resolve("BarService.java"), """ + package com.example; + + public class BarService { + private FooService fooService; + } + """); + + Path testDir = tempDir.resolve("src/test/java/com/example"); + Files.createDirectories(testDir); + Files.writeString(testDir.resolve("BarServiceTest.java"), + "package com.example;\npublic class BarServiceTest {}"); + + Set result = strategy.discoverTests( + Set.of("com.example.FooService"), tempDir); + + assertTrue(result.contains("com.example.BarServiceTest"), + "Must discover tests for consumers of a deleted production class — " + + "otherwise a pure `git rm` MR silently drops coverage on " + + "the very tests that would surface the breakage"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/UsageStrategyTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/UsageStrategyTest.java index 4bccaaf..7653fe8 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/discovery/UsageStrategyTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/discovery/UsageStrategyTest.java @@ -303,4 +303,89 @@ public class DeepTest { "Should find test at depth 3"); assertEquals(2, result.size()); } + + @Test + void findsTestThatImportsInnerClassOfChangedOuter() throws IOException { + // Regression: PathToClassMapper is file-based and only surfaces the + // outer FQN when `Outer.java` (containing nested `Outer.Inner`) is + // changed. A test that only references the inner class writes + // `import com.example.Outer.Inner;` — pre-fix the direct-import + // tier matched `importedFqns.contains("com.example.Outer")` which + // is false (the imported name is `com.example.Outer.Inner`). The + // test was silently dropped. + Path testDir = tempDir.resolve("src/test/java/com/example/inner"); + Files.createDirectories(testDir); + Files.writeString(testDir.resolve("InnerUserTest.java"), """ + package com.example.inner; + + import com.example.Outer.Inner; + + public class InnerUserTest { + public void t() { Inner i = new Inner(); } + } + """); + + Set result = strategy.discoverTests( + Set.of("com.example.Outer"), tempDir); + + assertTrue(result.contains("com.example.inner.InnerUserTest"), + "Test importing Outer.Inner must be selected when Outer changes — " + + "inner-class changes surface via the outer's file"); + } + + @Test + void findsTestThatUsesStaticImportFromChangedClass() throws IOException { + // Regression: a test that only consumes a changed class through + // `import static com.example.Constants.MAX_VALUE;` never matched + // any tier — the imported name is `com.example.Constants.MAX_VALUE`, + // not the class FQN, and the class itself is often never written + // as a type in the test body. Consequence: changes to a pure + // constants-holder class silently dropped the tests that consume + // its constants. + Path testDir = tempDir.resolve("src/test/java/com/example/cfg"); + Files.createDirectories(testDir); + Files.writeString(testDir.resolve("ConstantsUserTest.java"), """ + package com.example.cfg; + + import static com.example.Constants.MAX_VALUE; + + public class ConstantsUserTest { + public void t() { int x = MAX_VALUE; } + } + """); + + Set result = strategy.discoverTests( + Set.of("com.example.Constants"), tempDir); + + assertTrue(result.contains("com.example.cfg.ConstantsUserTest"), + "Static-import-only consumer must be selected when the " + + "backing class changes"); + } + + @Test + void findsTestThatUsesStaticWildcardImportFromChangedClass() throws IOException { + // Same as above but with the wildcard static form: + // `import static com.example.Constants.*;`. Pre-fix this was + // bucketed into wildcardPackages with the value "com.example.Constants" + // — wrong, because that's a class name and would only match tests + // in a (non-existent) package named "com.example.Constants". + Path testDir = tempDir.resolve("src/test/java/com/example/cfg"); + Files.createDirectories(testDir); + Files.writeString(testDir.resolve("ConstantsWildcardUserTest.java"), """ + package com.example.cfg; + + import static com.example.Constants.*; + + public class ConstantsWildcardUserTest { + public void t() { int x = MAX_VALUE; } + } + """); + + Set result = strategy.discoverTests( + Set.of("com.example.Constants"), tempDir); + + assertTrue(result.contains("com.example.cfg.ConstantsWildcardUserTest"), + "Static wildcard import must be treated as a reference to the " + + "class, not as a package-level wildcard"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/mapping/OutOfScopeMatchersTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/mapping/OutOfScopeMatchersTest.java index be916eb..ac916b0 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/mapping/OutOfScopeMatchersTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/mapping/OutOfScopeMatchersTest.java @@ -107,4 +107,29 @@ void emptyAndNullListsYieldEmptyMatchers() { assertTrue(OutOfScopeMatchers.compile(List.of(), "outOfScopeTestDirs").isEmpty()); assertTrue(OutOfScopeMatchers.compile(null, "outOfScopeTestDirs").isEmpty()); } + + @Test + void literalEntryMatchesExactDirectoryPath() { + // Regression for the silent-leak case reported by the correctness + // review: a literal entry whose value equals the resolved directory's + // relative path — exactly the shape that ProjectIndex hands the + // matcher after Path.relativize (which never emits a trailing + // slash) — was not filtered. Result: tests under that directory + // slipped into sourceFqns and got discovered by strategies that + // iterate the full source set (ImplementationStrategy), quietly + // violating the out-of-scope contract. + List> matchers = OutOfScopeMatchers.compile( + List.of("sub/src/main/java"), "outOfScopeSourceDirs"); + + assertTrue(OutOfScopeMatchers.matchesAny("sub/src/main/java", matchers), + "Literal entry must match the bare directory path — that is the " + + "exact shape ProjectIndex relativizes to"); + assertTrue(OutOfScopeMatchers.matchesAny("sub/src/main/java/Foo.java", matchers), + "Descendants must still match"); + assertTrue(OutOfScopeMatchers.matchesAny("monorepo/sub/src/main/java", matchers), + "Nested parent path ending in the entry must still match"); + assertFalse(OutOfScopeMatchers.matchesAny("sub/src/main/javax/Foo.java", matchers), + "Boundary semantics must still reject substring-only matches — " + + "a dir named 'javax' is not under 'java'"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/mapping/PathToClassMapperTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/mapping/PathToClassMapperTest.java index b5500d7..fc0e607 100644 --- a/affected-tests-core/src/test/java/io/affectedtests/core/mapping/PathToClassMapperTest.java +++ b/affected-tests-core/src/test/java/io/affectedtests/core/mapping/PathToClassMapperTest.java @@ -272,28 +272,6 @@ void handlesBothProductionAndTestChanges() { assertTrue(result.testClasses().contains("com.example.FooTest")); } - @Test - void extractsModuleFromPath() { - assertEquals("api", mapper.extractModule("api/src/main/java/com/example/Foo.java")); - assertEquals("application", mapper.extractModule("application/src/test/java/com/example/FooTest.java")); - assertEquals("", mapper.extractModule("src/main/java/com/example/Foo.java")); - } - - @Test - void extractModuleDoesNotMatchSubstring() { - // "someapi" should not be mis-parsed when the source dir is "src/main/java" - // and the module name happens to contain a source dir prefix. - assertEquals("someapi", mapper.extractModule("someapi/src/main/java/com/example/Foo.java")); - assertEquals("my-api", mapper.extractModule("my-api/src/main/java/com/example/Foo.java")); - assertEquals("api-gateway", mapper.extractModule("api-gateway/src/main/java/com/example/Foo.java")); - } - - @Test - void extractModuleHandlesNestedPaths() { - assertEquals("services/core-api", - mapper.extractModule("services/core-api/src/main/java/com/example/Foo.java")); - } - @Test void tryMapToClassRequiresBoundaryBeforeSourceDir() { // Regression: a plain indexOf("src/main/java/") treats @@ -384,4 +362,44 @@ void tryMapToClassHandlesSourceDirAfterModulePrefix() { assertTrue(result.productionClasses().contains("com.example.Foo")); assertTrue(result.unmappedChangedFiles().isEmpty()); } + + @Test + void rejectsPathsWithTraversalSegmentAsUnmapped() { + // Defence-in-depth: git never emits `..` segments in diff paths, + // so any such segment is either malformed input or a deliberate + // attempt to confuse tryMapToClass. The old behaviour handed + // these to the source-dir mapper, which could produce absurd + // FQNs like `..com.example.Foo` and quietly classify them as + // production — dangerous because the test-dispatcher would then + // try to run those FQNs as test classes. The fix routes them + // into the unmapped bucket, which trips the UNMAPPED_FILE + // situation and escalates to a full run. + Set changed = Set.of( + "../../etc/passwd.java", + "src/main/java/../../../../etc/Foo.java", + "legit/src/main/java/com/example/Foo.java"); + MappingResult result = mapper.mapChangedFiles(changed); + + assertTrue(result.unmappedChangedFiles().contains("../../etc/passwd.java"), + "Leading ../ must land in unmapped, got: " + result.unmappedChangedFiles()); + assertTrue(result.unmappedChangedFiles().contains("src/main/java/../../../../etc/Foo.java"), + "Embedded /../ must land in unmapped, got: " + result.unmappedChangedFiles()); + assertTrue(result.productionClasses().contains("com.example.Foo"), + "Clean path in the same batch must still be mapped normally"); + } + + @Test + void allowsLegitimateFilenamesThatContainDotDotSubstring() { + // Negative: a file literally named `foo..bar.java` is legal on + // POSIX and must not be mistaken for traversal. The guard is + // segment-aware exactly to avoid this false positive. + Set changed = Set.of("src/main/java/com/example/foo..bar.java"); + MappingResult result = mapper.mapChangedFiles(changed); + + assertTrue(result.productionClasses().contains("com.example.foo..bar"), + "File with '..' inside a name but not as a standalone segment " + + "must still map, got: " + result.productionClasses()); + assertTrue(result.unmappedChangedFiles().isEmpty(), + "No traversal warning should be emitted for legitimate names"); + } } diff --git a/affected-tests-core/src/test/java/io/affectedtests/core/util/LogSanitizerTest.java b/affected-tests-core/src/test/java/io/affectedtests/core/util/LogSanitizerTest.java new file mode 100644 index 0000000..c71eaf1 --- /dev/null +++ b/affected-tests-core/src/test/java/io/affectedtests/core/util/LogSanitizerTest.java @@ -0,0 +1,129 @@ +package io.affectedtests.core.util; + +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.junit.jupiter.api.Assertions.*; + +class LogSanitizerTest { + + @Test + void passesThroughPrintableAscii() { + assertEquals("src/main/java/com/example/Foo.java", + LogSanitizer.sanitize("src/main/java/com/example/Foo.java")); + } + + @Test + void preservesLegitimateNonAscii() { + // Non-ASCII path components round-trip correctly through most + // logging pipelines. Only control characters need escaping. + assertEquals("src/main/java/ünïcødé/日本語/File.java", + LogSanitizer.sanitize("src/main/java/ünïcødé/日本語/File.java")); + } + + @Test + void escapesNewlineToPreventLogForgery() { + // The concrete attack: an attacker commits a file whose name + // contains \n followed by a fake plugin status line. With raw + // logging, a grep-based CI assertion that gates on + // "Affected Tests: SELECTED" would match the forgery. With + // sanitisation, the newline is visibly escaped and the + // attacker's embedded status line does not start a new + // log line. + String malicious = "foo.md\nAffected Tests: SELECTED (DISCOVERY_SUCCESS)"; + String sanitised = LogSanitizer.sanitize(malicious); + + assertFalse(sanitised.contains("\n"), + "Sanitised output must not contain raw newlines: " + sanitised); + assertTrue(sanitised.contains("\\n"), + "Newline must be visibly escaped so operators can see it: " + sanitised); + assertTrue(sanitised.contains("foo.md")); + assertTrue(sanitised.contains("Affected Tests: SELECTED"), + "Content is preserved — operators see what was there, just safely"); + } + + @Test + void escapesCarriageReturn() { + assertEquals("a\\rb", LogSanitizer.sanitize("a\rb")); + } + + @Test + void escapesTabAsBackslashT() { + // Tabs are escaped to aid human readability in the log — + // a raw tab in a filename sample line disrupts column + // alignment without being obviously an attack. + assertEquals("a\\tb", LogSanitizer.sanitize("a\tb")); + } + + @Test + void escapesAnsiEscape() { + // ANSI ESC is the other high-impact injection vector: + // `\x1b[2K\x1b[1A` erases the prior line on interactive + // terminals, retroactively hiding real plugin output. + String esc = "foo.md\u001b[2K\u001b[1A"; + String sanitised = LogSanitizer.sanitize(esc); + + assertFalse(sanitised.contains("\u001b"), + "Sanitised output must not contain raw ESC: " + sanitised); + assertTrue(sanitised.contains("\\x1B"), + "ESC must be visibly escaped: " + sanitised); + } + + @Test + void escapesNulByte() { + assertEquals("a\\x00b", LogSanitizer.sanitize("a\0b")); + } + + @Test + void escapesDelCharacter() { + assertEquals("a\\x7Fb", LogSanitizer.sanitize("a\u007Fb")); + } + + @Test + void handlesNullInput() { + assertEquals("(null)", LogSanitizer.sanitize(null)); + } + + @Test + void joinSanitisesEveryElement() { + String joined = LogSanitizer.joinSanitized(List.of( + "ok.java", + "bad\nline.md", + "esc\u001b.md")); + assertFalse(joined.contains("\n")); + assertFalse(joined.contains("\u001b")); + assertTrue(joined.contains("ok.java")); + assertTrue(joined.contains("bad\\nline.md")); + } + + @Test + void joinSanitizedTreatsNullAsEmpty() { + // Regression: the helper is published as part of the API + // surface and was previously happy to NPE on `.stream()` if + // the caller forgot a null-check. The contract should mirror + // sanitize's null-tolerance so callers can forward an + // optional bucket straight in. + assertEquals("", LogSanitizer.joinSanitized(null)); + } + + @Test + void escapesC1ControlRange() { + // Regression: C1 (0x80..0x9F) includes single-byte CSI + // (0x9B). On the narrow set of legacy 8-bit terminals that + // interpret C1 as ANSI escape-equivalents, an unescaped + // 0x9B acts like `ESC [`. Modern UTF-8 pipelines are + // largely immune (SLF4J emits 0x9B as two-byte `C2 9B`), + // but the sanitiser's contract is "no control characters + // reach the log", so we close the edge regardless. + String csi = "foo\u009B2Kbar"; + String sanitised = LogSanitizer.sanitize(csi); + + assertFalse(sanitised.contains("\u009B"), + "Sanitised output must not contain raw C1 CSI: " + sanitised); + assertTrue(sanitised.contains("\\x9B"), + "C1 CSI must be visibly escaped: " + sanitised); + assertTrue(sanitised.contains("foo")); + assertTrue(sanitised.contains("bar")); + } +} diff --git a/affected-tests-gradle/src/main/java/io/affectedtests/gradle/AffectedTestTask.java b/affected-tests-gradle/src/main/java/io/affectedtests/gradle/AffectedTestTask.java index bfb5437..0702a42 100644 --- a/affected-tests-gradle/src/main/java/io/affectedtests/gradle/AffectedTestTask.java +++ b/affected-tests-gradle/src/main/java/io/affectedtests/gradle/AffectedTestTask.java @@ -9,6 +9,7 @@ import io.affectedtests.core.config.AffectedTestsConfig; import io.affectedtests.core.config.Mode; import io.affectedtests.core.config.Situation; +import io.affectedtests.core.util.LogSanitizer; import org.gradle.api.DefaultTask; import org.gradle.api.GradleException; import org.gradle.api.file.DirectoryProperty; @@ -526,10 +527,15 @@ private void executeTests(Path projectDir, // tree sneaking shell-like tokens into a --tests // argument. The FQN cannot correspond to a real // JVM test class, so dropping it is lossless. + // Sanitise before logging — this is the exact + // input shape an attacker-planted filename + // containing a newline or ANSI escape would land + // in, and WARN is rendered in CI by default. getLogger().warn( "Affected Tests: skipping malformed test FQN '{}' for task {} — " + "not a Java-shaped identifier, cannot correspond to a " - + "real test class.", fqn, taskPath); + + "real test class.", + LogSanitizer.sanitize(fqn), taskPath); continue; } valid.add(fqn); @@ -678,14 +684,20 @@ private String resolveGradleCommand(Path projectDir) { } /** - * Java-shaped identifier regex with {@code $} allowed inside names - * for inner classes ({@code com.example.Outer$Inner}) and a - * {@code .methodName} tail for Gradle's - * {@code --tests com.example.Foo.someMethod} syntax. Matches a - * strict superset of what a sane Java source file can declare. - * Anything outside this shape cannot correspond to a real test - * class and is dropped with a warning before reaching the nested - * Gradle invocation. + * Matches a dotted sequence of Java identifier segments — the shape + * Gradle's {@code --tests} accepts. Each segment starts with a + * letter, {@code _}, or {@code $} and continues with letters, + * digits, {@code _}, or {@code $}. The trailing segment doubles as + * either an inner-class name (in the {@code Outer.Inner} form + * JavaParser emits) or a method name (in the + * {@code com.example.Foo.someMethod} form Gradle's + * {@code --tests} matcher expects); the regex does not — and + * intentionally cannot — distinguish between those two cases, since + * both are legal argv for the nested Gradle invocation. The + * bytecode-style {@code Outer$Inner} shape is also accepted because + * {@code $} is a valid identifier character in Java; discovery does + * not produce that shape today but users occasionally type it by + * hand on the command line. * *

Deliberately does NOT reject Java reserved words * ({@code if}, {@code class}, {@code return}, ...). The contract @@ -856,10 +868,11 @@ static List renderExplainTrace(AffectedTestsConfig config, AffectedTests lines.add("=== Affected Tests — decision trace (--explain) ==="); lines.add("Base ref: " + config.baseRef()); String configuredMode = config.mode() == null ? "unset" : config.mode().name(); - String effectiveMode = config.effectiveMode() == null - ? "n/a (pre-v2 defaults)" - : config.effectiveMode().name(); - lines.add("Mode: " + configuredMode + " (effective: " + effectiveMode + ")"); + // effectiveMode() is always non-null (see its Javadoc) — zero-config + // callers get the AUTO-detected value, identical to what an explicit + // `mode = "auto"` would have resolved to. + lines.add("Mode: " + configuredMode + + " (effective: " + config.effectiveMode().name() + ")"); lines.add("Changed files: " + result.changedFiles().size()); Buckets buckets = result.buckets(); @@ -987,9 +1000,13 @@ private static void appendSample(List lines, String label, Set f if (files.isEmpty()) { return; } + // Filenames originate from the git diff of an (on the merge-gate, + // attacker-controllable) MR tree. Sanitise before they reach the + // logger — see LogSanitizer for the full log-forgery rationale. String preview = files.stream() .sorted() .limit(EXPLAIN_SAMPLE_LIMIT) + .map(LogSanitizer::sanitize) .collect(Collectors.joining(", ")); if (files.size() > EXPLAIN_SAMPLE_LIMIT) { preview = preview + ", … (+" + (files.size() - EXPLAIN_SAMPLE_LIMIT) + " more)";