[build] File-level test target indexing for precise affected test detection #17033

titusfortner · 2026-01-31T16:03:12Z

Existing implementation is good for Java, and ok for Ruby, but bad for Python
Even after #17012, the current indexing would still run all python tests for any Python change
This PR will be good for Python and improve things for Ruby as well

💥 What does this PR do?

Refactors the Bazel test target indexing in rake_tasks/bazel.rake to work at the file level instead of the package level.
This enables more precise affected test detection when files change.

Key changes:

File-level indexing: Index now maps individual source files to test targets, not entire packages
High-impact directory detection: Changes to common/, rust/src/, javascript/atoms/, javascript/webdriver/atoms/ trigger all tests
Improved fallback for new files: When a file isn't in the index (e.g., newly added), queries Bazel to find the containing target and its dependent tests
Filter dotnet tests from java sources at index time: Reduces index size and speeds up lookups
Better Dotnet Test handling: Many .NET test files don't follow naming conventions, this matches off file directory instead of matching off file name
Filter root package targets: Skip //:* deps and srcs to avoid indexing generated files (node_modules, LICENSE, etc.)

The new approach:

For each test target, query deps(test) intersect //... except attr(testonly, 1, //...)
Filter out high-impact directories and root package targets
For each dependency, query its source files via labels(srcs, dep) (cached)
Build index: source_file → [test_targets]

For files not in the index (new files in PRs), the fallback:

Finds the target containing the file: attr(srcs, 'filename', //pkg:*)
Finds tests depending on that target: rdeps(//..., containing_target)

🔧 Implementation Notes

This is more queries during index building but results in a much more precise index for lookups.
Dependency source lookups are cached since many tests share the same dependencies (e.g., //py:common, //py:remote), so each dep is only queried once.

💡 Additional Considerations

We can add/remove to the HIGH_IMPACT_DIRS if we need

🔄 Types of changes

Cleanup (formatting, renaming)
Bug fix (backwards compatible)

qodo-code-review · 2026-01-31T16:03:25Z

PR Type

Enhancement, Bug fix

Description

Refactor test target indexing from package-level to file-level for precise affected test detection
Add high-impact directory detection to trigger all tests for critical shared code changes
Implement fallback query mechanism for newly added files not in index
Filter dotnet tests from java sources and exclude root package targets to reduce index size
Cache dependency source lookups to improve index building performance

File Walkthrough

Relevant files

Enhancement

bazel.rake `File-level test target indexing with caching and fallback` rake_tasks/bazel.rake Refactored indexing from package-level to file-level mapping (source files → test targets) Added `HIGH_IMPACT_DIRS` constant and pattern matching to detect critical directories that trigger all tests Implemented `query_test_deps()` to query test dependencies with filtering for high-impact dirs and root package targets Added `query_dep_srcs()` to query source files for dependencies with caching to avoid duplicate queries Implemented `add_test_to_index()` to build file-to-test mappings and filter dotnet tests from java sources Replaced `query_package_dep()` with `query_unindexed_file()` for fallback handling of new files using `rdeps()` queries Updated `affected_targets_with_index()` to perform direct file lookups and fallback to querying for unindexed files Renamed `affected_targets_fallback()` to `affected_targets_by_directory()` for clarity Updated test file pattern matching to include dotnet test directory detection Improved error handling with try-catch blocks in query functions	+86/-45

qodo-code-review · 2026-01-31T16:03:39Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪	Bazel query injection Description: New code builds Bazel query strings by interpolating file-derived values (e.g., `rel` in `attr(srcs, '#{rel}', ...)` and `containing.join(' + ')`), so a repository filename/label containing quotes or query operators could alter the Bazel query (and potentially the executed command, depending on how `Bazel.execute` shells out), leading to unintended target selection or command/query injection in CI. bazel.rake [170-193] Referred Code def query_unindexed_file(filepath) pkg = find_bazel_package(filepath) return [] unless pkg rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '') pkg = '' if pkg == '.' # Find targets that contain this file in their srcs containing = [] Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do \|out\| containing = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end return [] if containing.empty? # Find tests that depend on those targets targets = [] Bazel.execute('query', ['--output=label'], "kind(_test, rdeps(//..., #{containing.join(' + ')}))") do \|out\| targets = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end # dotnet tests depend on java server, but there are no remote tests, so safe to ignore ... (clipped 3 lines)
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
🔴	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Silent error handling: New Bazel query helpers rescue `StandardError` and return empty results (or log without actionable context), which can silently hide failures and produce incorrect affected-target outputs. Referred Code def query_test_deps(test) deps = [] Bazel.execute('query', ['--output=label'], "deps(#{test}) intersect //... except attr(testonly, 1, //...)") do \|out\| deps = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end deps.reject do \|d\| # Skip high-impact dirs and root package targets (generated files, LICENSE, etc) HIGH_IMPACT_DIRS.any? { \|dir\| d.start_with?("//#{dir}") } \|\| d.start_with?('//:') end rescue StandardError => e puts " Warning: Failed to query deps for #{test}: #{e.message}" [] end def add_test_to_index(index, test, srcs) srcs.each do \|src\| # Convert //pkg:file to pkg/file filepath = src.sub(%r{^//}, '').tr(':', '/') # Skip dotnet tests for java sources (dotnet depends on java server but has no remote tests) next if filepath.start_with?('java/') && test.start_with?('//dotnet/') ... (clipped 81 lines) Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Unescaped query input: File-derived strings (e.g., `rel`/`filepath`) are interpolated into Bazel query expressions without escaping quotes/metacharacters, which could enable query injection or unexpected behavior if unusual filenames exist. Referred Code def query_unindexed_file(filepath) pkg = find_bazel_package(filepath) return [] unless pkg rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '') pkg = '' if pkg == '.' # Find targets that contain this file in their srcs containing = [] Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do \|out\| containing = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end return [] if containing.empty? # Find tests that depend on those targets targets = [] Bazel.execute('query', ['--output=label'], "kind(_test, rdeps(//..., #{containing.join(' + ')}))") do \|out\| targets = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end # dotnet tests depend on java server, but there are no remote tests, so safe to ignore ... (clipped 1 lines) Learn more about managing compliance generic rules or creating your own custom rules
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2026-01-31T16:05:14Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fail build on dependency query error In `query_test_deps`, re-raise the `StandardError` instead of returning an empty array to ensure the index-building task fails on error, preventing the creation of a faulty test index. rake_tasks/bazel.rake [93-105] def query_test_deps(test) deps = [] Bazel.execute('query', ['--output=label'], "deps(#{test}) intersect //... except attr(testonly, 1, //...)") do \|out\| deps = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end deps.reject do \|d\| # Skip high-impact dirs and root package targets (generated files, LICENSE, etc) HIGH_IMPACT_DIRS.any? { \|dir\| d.start_with?("//#{dir}") } \|\| d.start_with?('//:') end rescue StandardError => e - puts " Warning: Failed to query deps for #{test}: #{e.message}" - [] + puts " Error: Failed to query deps for #{test}: #{e.message}" + raise end Apply / Chat Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies that silently handling errors during dependency querying can lead to an incomplete test index, causing tests to be skipped. Re-raising the exception makes the process fail-fast, which is more robust and prevents silent failures.	Medium
	Correctly handle Python package indexing In `add_test_to_index`, improve the Bazel label-to-filepath conversion to correctly handle Python `init.py` files by checking if the path is a directory and adjusting it accordingly. rake_tasks/bazel.rake [107-116] def add_test_to_index(index, test, srcs) srcs.each do \|src\| # Convert //pkg:file to pkg/file filepath = src.sub(%r{^//}, '').tr(':', '/') + + # Handle Python __init__.py files, where label is `//pkg/mod` for `pkg/mod/__init__.py` + if Dir.exist?(filepath) && File.exist?(File.join(filepath, '__init__.py')) + filepath = File.join(filepath, '__init__.py') + end + # Skip dotnet tests for java sources (dotnet depends on java server but has no remote tests) next if filepath.start_with?('java/') && test.start_with?('//dotnet/') index[filepath] << test end end Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a subtle bug in handling Python package source files (`__init__.py`), where the file path could be misinterpreted as a directory. The proposed fix improves the correctness of the test index for Python code.	Medium
	Quote test-kind pattern In the `Bazel.execute` call within `targets_from_tests`, wrap the `_test` pattern in single quotes to form a valid Bazel query string, changing it to `kind('_test',` `...)`. rake_tasks/bazel.rake [212-214] -Bazel.execute('query', ['--output=label'], "kind(_test, #{query})") do \|out\| +Bazel.execute('query', ['--output=label'], "kind('_test', #{query})") do \|out\| targets = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that the `_test` pattern in the Bazel query should be quoted to be a valid string literal. This fixes a syntax error in the query that would cause it to fail.	Medium
High-level	Consider performance of unindexed file fallback The current fallback for unindexed files runs multiple Bazel queries sequentially for each file, which can be slow. Consider batching these queries to improve performance when a PR contains many new files. Examples: rake_tasks/bazel.rake [163] affected.merge(query_unindexed_file(filepath)) rake_tasks/bazel.rake [170-194] def query_unindexed_file(filepath) pkg = find_bazel_package(filepath) return [] unless pkg rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '') pkg = '' if pkg == '.' # Find targets that contain this file in their srcs containing = [] Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do \|out\| ... (clipped 15 lines) Solution Walkthrough: Before: # in affected_targets_with_index lib_files.each do \|filepath\| tests = index[filepath] if tests affected.merge(tests) else # Fallback is called once per unindexed file affected.merge(query_unindexed_file(filepath)) end end def query_unindexed_file(filepath) # Query 1: Find owning target for ONE file containing = bazel_query("attr(srcs, '#{rel}', ...)") # Query 2: Find tests for that ONE target's deps targets = bazel_query("rdeps(..., #{containing.join(' + ')})") return targets end After: # in affected_targets_with_index unindexed_files = lib_files.select { \|f\| !index[f] } if unindexed_files.any? # Fallback is called once for ALL unindexed files affected.merge(query_unindexed_files_batched(unindexed_files)) end def query_unindexed_files_batched(filepaths) # Query 1 (Batched): Find all owning targets # (e.g., by joining `attr` queries with ' + ') all_containing = bazel_query("attr(...) + attr(...) + ...") # Query 2 (Batched): Find all tests for all owning targets all_targets = bazel_query("rdeps(..., #{all_containing.join(' + ')})") return all_targets end Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential performance bottleneck in the `query_unindexed_file` fallback logic, which executes multiple sequential Bazel queries for each new file, and proposes a valid optimization by batching them.	Medium
Learned best practice	Use safe argument-based command execution Replace backticks with an argument-based command execution (e.g., `Open3.capture2e`) to avoid injection via `base_rev/head_rev` and to check the exit status with a clear error when `git diff` fails. rake_tasks/bazel.rake [29] -changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?) +require 'open3' +... +out, status = Open3.capture2e('git', 'diff', '--name-only', base_rev.to_s, head_rev.to_s) +raise "git diff failed for #{base_rev}..#{head_rev}: #{out}" unless status.success? +changed_files = out.lines.map(&:strip).reject(&:empty?) + `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Avoid shell interpolation when executing commands derived from inputs; use argument-based execution and fail fast on command errors.	Low
Learned best practice	Escape query inputs before interpolation Escape `'` (and other special characters) in `rel` before embedding it into the Bazel query, and log/propagate query failures so missing results don’t silently turn into “no affected tests.” rake_tasks/bazel.rake [179-181] -Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:)") do \|out\| +rel_escaped = rel.gsub("'", %q(\\\')) +Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel_escaped}', //#{pkg}:)") do \|out\| containing = out.lines.map(&:strip).select { \|l\| l.start_with?('//') } end -... `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 5 __ Why: Relevant best practice - Sanitize/escape inputs embedded into query strings and avoid silently swallowing failures in automation logic.	Low
Update

Copilot

Pull request overview

This PR refactors the Bazel test target indexing to operate at the file level instead of the package level, enabling more precise affected test detection. The change addresses an issue where any Python file change would trigger all Python tests.

Changes:

Implements file-level indexing that maps individual source files to test targets
Adds high-impact directory detection to trigger all tests for changes in common/, rust/src/, javascript/atoms/, and javascript/webdriver/atoms/
Improves fallback mechanism for unindexed files (e.g., newly added files) by querying Bazel to find containing targets and their dependent tests

rake_tasks/bazel.rake

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

.github/workflows/ci.yml

rake_tasks/bazel.rake

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

.github/workflows/ci.yml

rake_tasks/bazel.rake

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings January 31, 2026 16:03

selenium-ci added the B-build Includes scripting, bazel and CI integrations label Jan 31, 2026

Copilot started reviewing on behalf of titusfortner January 31, 2026 16:03 View session

qodo-code-review bot added the Review effort 3/5 label Jan 31, 2026

Copilot AI reviewed Jan 31, 2026

View reviewed changes

rake_tasks/bazel.rake Show resolved Hide resolved

Copilot AI reviewed Jan 31, 2026

View reviewed changes

titusfortner force-pushed the index_files branch from c0fd7cf to 192a52a Compare January 31, 2026 17:40

Copilot AI review requested due to automatic review settings January 31, 2026 18:11

Copilot started reviewing on behalf of titusfortner January 31, 2026 18:11 View session

Copilot AI reviewed Jan 31, 2026

View reviewed changes

.github/workflows/ci.yml Outdated Show resolved Hide resolved

rake_tasks/bazel.rake Outdated Show resolved Hide resolved

rake_tasks/bazel.rake Show resolved Hide resolved

Copilot AI reviewed Jan 31, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings January 31, 2026 21:01

Copilot started reviewing on behalf of titusfortner January 31, 2026 21:01 View session

Copilot AI reviewed Jan 31, 2026

View reviewed changes

titusfortner added 7 commits January 31, 2026 15:50

[build] update target indexing to work by file not package

5275e5a

need to cache the dependency query for sources

62e6e67

need to filter out aspect rules and node modules

161449e

filter before querying sources

cd4d909

filter moar better

134c637

change name of index so new code is looking for the right index

7a33c06

fix lint

31155fc

titusfortner force-pushed the index_files branch from e43e353 to 31155fc Compare January 31, 2026 21:53

address PR feedback: improve logging and regex precision

2630838

Copilot AI reviewed Jan 31, 2026

View reviewed changes

Uh oh!

[build] File-level test target indexing for precise affected test detection #17033

Are you sure you want to change the base?

[build] File-level test target indexing for precise affected test detection #17033

Conversation

titusfortner commented Jan 31, 2026

💥 What does this PR do?

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

Uh oh!

qodo-code-review bot commented Jan 31, 2026

PR Type

Description

File Walkthrough

Uh oh!

qodo-code-review bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qodo-code-review bot commented Jan 31, 2026 •

edited

Loading

qodo-code-review bot commented Jan 31, 2026 •

edited

Loading