Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,62 @@ adheres to [Semantic Versioning](https://semver.org/).
malformed-glob error path is now the single source of truth for
`Affected Tests: invalid glob at outOfScope*Dirs[N]` messages.

### Fixed — post-v1.9.17 sanity-test pass

- The `Hint:` line on `affectedTest --explain` no longer fires on
situations where an out-of-scope pattern could not have mattered.
Running v1.9.17 through nine representative MR shapes on the
security-service pilot (empty diff, api-test-only, performance-test-
only, production Java only, markdown only, mixed api-test + prod,
test-only, `git rm` markdown, gradle + prod) showed the hint firing
on five of them — including a bare markdown-only MR where the diff
contained nothing a source-tree matcher could have bitten. That
5-of-9 false-positive rate trains reviewers to ignore the line,
defeating its purpose.

The hint now requires the situation to be `DISCOVERY_SUCCESS` or
`DISCOVERY_EMPTY` (the only two branches where a correctly-configured
out-of-scope pattern could have changed the outcome) in addition to
the existing "diff non-empty AND out-of-scope bucket empty AND at
least one out-of-scope dir configured" guard. `EMPTY_DIFF`,
`ALL_FILES_IGNORED`, `ALL_FILES_OUT_OF_SCOPE`, and `UNMAPPED_FILE`
all suppress the hint — none of them is a case where an operator
could act on it.

- Lifecycle output for `SELECTED` dispatches now previews the first
five FQNs per module with a "… and N more (use --info for full list)"
tail on larger dispatches. Pre-v1.9.18 the task printed only the
module summary at lifecycle level and demoted every FQN to info,
leaving a reviewer scrolling the default CI log unable to see *which*
tests were selected without either rerunning with `--info` or opening
the JUnit report after the fact. The info-level per-FQN log is
retained for `--info` users, and the bounded preview keeps total
lifecycle output well under the 4 MiB GitHub Actions step cap that
forced the demotion in the first place.

- Dispatch-side FQN validation was hoisted out of the per-module log
loop so the "Running N affected test classes across M module(s)"
header now reports the post-validation count and stays arithmetically
consistent with the per-module previews underneath it. If any FQNs
were skipped (an `isValidFqn` WARN was already logged) the header
gains a ` (K malformed FQN skipped — see WARN above)` suffix so the
mismatch between "discovered" and "dispatched" is visible at a
glance rather than a puzzle only the WARN log can solve. Two latent
side-effects fall out of the refactor:
- A module whose entire discovered FQN set fails validation is now
dropped from dispatch. Pre-v1.9.18 the task appended that module's
bare `taskPath` with no `--tests` filter, which silently degraded
into a full module test suite run — the exact safety posture
`isValidFqn` was added to prevent. Operators relying on this
accidental fallback should configure `runAllIfNoMatches = true` /
`Action.FULL_SUITE` explicitly instead.
- A dispatch where zero FQNs survive validation across ALL modules
now throws `GradleException` instead of invoking Gradle with an
empty task list (environment-dependent behaviour, ranging from
"fail with No tasks specified" to "silently run the default task").
The WARN logs above the failure name every rejected FQN, so the
mis-discovery is diagnosable from the same console output.

### Fixed — post-v1.9.16 review batch

- `GitChangeDetector.uncommittedChanges` and `stagedChanges` now
Expand Down
23 changes: 22 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ plugins {

Prints the full decision trace — bucket counts, situation, action, and the tier of the priority ladder (explicit / legacy / mode / hardcoded) that picked each action — without running a single test. Useful when a CI run escalated to the full suite and the operator needs to know *why* before filing a bug.

When `outOfScopeTestDirs` / `outOfScopeSourceDirs` are configured but zero files in the diff land in the out-of-scope bucket, the trace emits a one-line `Hint:` pointing at the configured knob. That's the silent-failure trap a real adopter hit: a perfectly valid-looking glob that never bit anything, which the plugin would otherwise only surface after a 30-minute full-suite CI run.
When `outOfScopeTestDirs` / `outOfScopeSourceDirs` are configured but zero files in the diff land in the out-of-scope bucket *and* the situation is `DISCOVERY_SUCCESS` or `DISCOVERY_EMPTY`, the trace emits a one-line `Hint:` pointing at the configured knob. That's the silent-failure trap a real adopter hit: a perfectly valid-looking glob that never bit anything, which the plugin would otherwise only surface after a 30-minute full-suite CI run. The hint is suppressed on `EMPTY_DIFF`, `ALL_FILES_IGNORED`, `ALL_FILES_OUT_OF_SCOPE`, and `UNMAPPED_FILE` because those branches ran the way they did for reasons an out-of-scope pattern could not have influenced.

Sample output:

Expand Down Expand Up @@ -102,6 +102,27 @@ Affected Tests: SKIPPED (ALL_FILES_IGNORED) — 1 changed file(s); every changed

The outcome (`SELECTED` / `FULL_SUITE` / `SKIPPED`) and the situation that produced it are first-class fields on every branch, so CI dashboards can `grep -E '^Affected Tests: (SELECTED|FULL_SUITE|SKIPPED)'` and bucket runs without parsing the tail. Every pre-v2 phrase (`running full suite`, `runAllIfNoMatches=true`, `runAllOnNonJavaChange=true`, `no affected tests discovered`) still appears verbatim in the reason segment, so existing greps keep working.

On a `SELECTED` outcome, the task also prints the first few FQNs per module at lifecycle level so a reviewer can sanity-check the dispatch from the default CI log without rerunning with `--info`:

```
Running 17 affected test classes across 2 module(s):
application:test (12 test classes)
com.example.auth.LoginControllerTest
com.example.auth.TokenServiceTest
com.example.auth.PasswordHasherTest
com.example.auth.SessionRepositoryTest
com.example.auth.RoleMapperTest
… and 7 more (use --info for full list)
api:test (5 test classes)
com.example.api.PublicEndpointsTest
com.example.api.RateLimitTest
com.example.api.VersionHeaderTest
com.example.api.ErrorFormatTest
com.example.api.HealthProbeTest
```

The preview caps at five FQNs per module; `--info` still surfaces the full per-FQN list. The cap exists so a utility change that ripples into hundreds of test classes can't blow past the 4 MiB GitHub Actions per-step log cap before the nested test output even starts.

### Situations (what the engine saw)

| Situation | Fires when |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -505,43 +505,82 @@ private void executeTests(Path projectDir,
// happen to contain the FQN).
Map<String, List<String>> grouped = groupFqnsByModule(projectDir, testFqns, fqnToPath);

getLogger().lifecycle("Running {} affected test classes across {} module(s):",
testFqns.size(), grouped.size());

// Validate every discovered FQN up front so the header,
// per-module preview, and argv-append stay arithmetically
// consistent. Splitting validation from dispatch also
// means a module whose entire discovered set is malformed
// (vanishingly unlikely in practice, but possible from a
// buggy custom strategy) is dropped cleanly instead of
// silently falling through to "run the whole module's
// test suite" once the bare `taskPath` is appended to the
// argv with no `--tests` filter.
Map<String, List<String>> validatedGroups = new LinkedHashMap<>(grouped.size());
int totalValid = 0;
for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
String modulePath = entry.getKey();
List<String> fqnsForModule = entry.getValue();

String taskPath = modulePath.isEmpty() ? "test" : modulePath + ":test";
args.add(taskPath);
// One module-level lifecycle line; the per-FQN entries
// are info-level because they can legitimately run into
// the thousands on a widely-used utility change and
// spamming lifecycle can blow past CI log-size caps
// (GitHub Actions truncates at 4 MiB/step) before the
// nested test output even starts.
getLogger().lifecycle(" {} ({} test class{})",
taskPath, fqnsForModule.size(), fqnsForModule.size() == 1 ? "" : "es");
for (String fqn : fqnsForModule) {
List<String> valid = new ArrayList<>(entry.getValue().size());
for (String fqn : entry.getValue()) {
if (!isValidFqn(fqn)) {
// Defence in depth against a compromised source tree
// sneaking shell-like tokens into a --tests argument.
// Gradle's CLI parser currently treats the next argv
// element as the value of --tests regardless of
// content, but that's an undocumented contract and a
// future parser change would turn this into real
// argument injection. Non-Java-shaped FQNs cannot
// correspond to a real JVM test class anyway, so
// dropping them costs nothing and forces the bad
// input to surface visibly.
// Defence in depth against a compromised source
// tree sneaking shell-like tokens into a --tests
// argument. The FQN cannot correspond to a real
// JVM test class, so dropping it is lossless.
getLogger().warn(
"Affected Tests: skipping malformed test FQN '{}' for task {} — "
+ "not a Java-shaped identifier, cannot correspond to a "
+ "real test class.", fqn, taskPath);
continue;
}
valid.add(fqn);
}
if (!valid.isEmpty()) {
validatedGroups.put(taskPath, valid);
totalValid += valid.size();
}
}

int skipped = testFqns.size() - totalValid;

// Belt-and-braces: a dispatch with zero surviving FQNs
// across ALL modules would otherwise produce an argv of
// `[gradlew, -x, compileJava]` with no task specified —
// Gradle's behavior there (fail vs. default-task) is
// environment-dependent and definitely not the safety
// posture `runAll` promises. If we get here the WARN logs
// above already explain which FQNs were dropped; a hard
// fail surfaces the mis-discovery instead of silently
// running nothing.
if (validatedGroups.isEmpty()) {
throw new GradleException(
"Affected Tests: every discovered FQN (" + testFqns.size()
+ ") was malformed and skipped — refusing to dispatch a taskless "
+ "Gradle invocation. See WARN logs above for the rejected FQNs.");
}

String skippedSuffix = skipped == 0
? ""
: " (" + skipped + (skipped == 1 ? " malformed FQN skipped" : " malformed FQNs skipped")
+ " — see WARN above)";
getLogger().lifecycle("Running {} affected test classes across {} module(s):{}",
totalValid, validatedGroups.size(), skippedSuffix);

// Dispatch-side emission. Lifecycle per module is bounded
// by LIFECYCLE_FQN_PREVIEW_LIMIT; the full list stays at
// info level (see that constant's Javadoc for why).
for (Map.Entry<String, List<String>> entry : validatedGroups.entrySet()) {
String taskPath = entry.getKey();
List<String> validFqns = entry.getValue();

args.add(taskPath);
for (String fqn : validFqns) {
args.add("--tests");
args.add(fqn);
}

renderLifecycleDispatchPreview(taskPath, validFqns)
.forEach(getLogger()::lifecycle);
for (String fqn : validFqns) {
getLogger().info(" {} -> {}", taskPath, fqn);
}
}
Expand Down Expand Up @@ -747,6 +786,56 @@ static String describeEscalation(EscalationReason reason) {
/** Cap on files listed per bucket in the {@code --explain} trace. */
static final int EXPLAIN_SAMPLE_LIMIT = 10;

/**
* Cap on FQNs listed at lifecycle level per module in the
* "Running N affected test classes" dispatch output. Chosen at 5
* to keep the preview tight enough that a reviewer can read it
* without scrolling yet large enough to sanity-check selection on
* most MRs, which dispatch single digits of classes per module.
* Larger dispatches still log every FQN at info level — this cap
* exists only to keep the default lifecycle log bounded and well
* under the 4 MiB GitHub Actions step cap that forced the
* per-FQN demotion in pre-v1.9.18 versions.
*/
static final int LIFECYCLE_FQN_PREVIEW_LIMIT = 5;

/**
* Renders the lifecycle-level dispatch preview for a single
* module: the summary line, then up to
* {@link #LIFECYCLE_FQN_PREVIEW_LIMIT} FQNs indented underneath,
* and — when the dispatch exceeds the preview limit — a single
* "… and N more (use --info for full list)" tail.
*
* <p>Package-private so
* {@code AffectedTestTaskDispatchPreviewTest} can pin the format
* without spinning up the Gradle runtime. The helper is pure over
* its inputs, so the test treats it as a pure function and the
* caller in {@link #executeTests} just pipes each returned line
* to {@link org.gradle.api.logging.Logger#lifecycle(String)}.
*
* @param taskPath the Gradle task path the dispatch targets (for
* example {@code "application:test"})
* @param fqns the validated FQNs being dispatched, in the
* order they will be passed to Gradle; preserving
* this order in the preview keeps the mental map
* from "what did I change" to "what is running"
* intact for the operator
*/
static List<String> renderLifecycleDispatchPreview(String taskPath, List<String> fqns) {
List<String> lines = new ArrayList<>(Math.min(fqns.size(), LIFECYCLE_FQN_PREVIEW_LIMIT) + 2);
int size = fqns.size();
String plural = size == 1 ? "" : "es";
lines.add(" " + taskPath + " (" + size + " test class" + plural + ")");
int preview = Math.min(size, LIFECYCLE_FQN_PREVIEW_LIMIT);
for (int i = 0; i < preview; i++) {
lines.add(" " + fqns.get(i));
}
if (size > preview) {
lines.add(" … and " + (size - preview) + " more (use --info for full list)");
}
return lines;
}

/**
* Renders the human-readable decision trace produced by
* {@code affectedTest --explain}. Returned as a list of lines so the
Expand Down Expand Up @@ -837,22 +926,33 @@ static List<String> renderExplainTrace(AffectedTestsConfig config, AffectedTests
* package-private so the explain-format tests can pin the exact
* conditions without spinning up Gradle.
*
* <p>The hint fires only when all three conditions hold:
* <ul>
* <li>at least one changed file exists (nothing to diagnose on
* an empty diff, and a re-run after every merge to master
* would otherwise spam the false alarm);</li>
* <li>at least one of {@code outOfScopeTestDirs} /
* {@code outOfScopeSourceDirs} is configured (zero-config
* installs never opted in, so the hint would just be noise);
* </li>
* <li>the out-of-scope bucket is empty (if the config IS biting,
* the bucket count already tells the story).</li>
* </ul>
* <p>Gate: only {@link Situation#DISCOVERY_SUCCESS} and
* {@link Situation#DISCOVERY_EMPTY} can have been influenced by an
* out-of-scope pattern. The other four situations reach their
* outcome for reasons the out-of-scope bucket cannot change:
* {@link Situation#EMPTY_DIFF} (no files to match),
* {@link Situation#ALL_FILES_IGNORED} (ignore wins before
* out-of-scope ever looks at the file),
* {@link Situation#ALL_FILES_OUT_OF_SCOPE} (bucket is non-empty, so
* the "silent" precondition is false — also caught by the
* bucket-empty guard below), and {@link Situation#UNMAPPED_FILE}
* (by construction the file missed every pattern, including
* out-of-scope). Firing here trains reviewers to ignore the hint
* on routine docs-only / gradle-only MRs — that's the v1.9.17
* sanity-test finding this gate exists to close.
*
* <p>Other preconditions: diff non-empty, at least one of
* {@code outOfScopeTestDirs} / {@code outOfScopeSourceDirs}
* configured, and the out-of-scope bucket empty.
*/
static void appendOutOfScopeHint(List<String> lines,
AffectedTestsConfig config,
AffectedTestsResult result) {
Situation situation = result.situation();
if (situation != Situation.DISCOVERY_SUCCESS
&& situation != Situation.DISCOVERY_EMPTY) {
return;
}
if (result.changedFiles().isEmpty()) {
return;
}
Expand Down
Loading