Skip to content

Conversation

@titusfortner
Copy link
Member

Existing implementation is good for Java, and ok for Ruby, but bad for Python
Even after #17012, the current indexing would still run all python tests for any Python change
This PR will be good for Python and improve things for Ruby as well

💥 What does this PR do?

Refactors the Bazel test target indexing in rake_tasks/bazel.rake to work at the file level instead of the package level.
This enables more precise affected test detection when files change.

Key changes:

  • File-level indexing: Index now maps individual source files to test targets, not entire packages
  • High-impact directory detection: Changes to common/, rust/src/, javascript/atoms/, javascript/webdriver/atoms/ trigger all tests
  • Improved fallback for new files: When a file isn't in the index (e.g., newly added), queries Bazel to find the containing target and its dependent tests
  • Filter dotnet tests from java sources at index time: Reduces index size and speeds up lookups
  • Better Dotnet Test handling: Many .NET test files don't follow naming conventions, this matches off file directory instead of matching off file name
  • Filter root package targets: Skip //:* deps and srcs to avoid indexing generated files (node_modules, LICENSE, etc.)

The new approach:

  1. For each test target, query deps(test) intersect //... except attr(testonly, 1, //...)
  2. Filter out high-impact directories and root package targets
  3. For each dependency, query its source files via labels(srcs, dep) (cached)
  4. Build index: source_file → [test_targets]

For files not in the index (new files in PRs), the fallback:

  1. Finds the target containing the file: attr(srcs, 'filename', //pkg:*)
  2. Finds tests depending on that target: rdeps(//..., containing_target)

🔧 Implementation Notes

This is more queries during index building but results in a much more precise index for lookups.
Dependency source lookups are cached since many tests share the same dependencies (e.g., //py:common, //py:remote), so each dep is only queried once.

💡 Additional Considerations

  • We can add/remove to the HIGH_IMPACT_DIRS if we need

🔄 Types of changes

  • Cleanup (formatting, renaming)
  • Bug fix (backwards compatible)

Copilot AI review requested due to automatic review settings January 31, 2026 16:03
@qodo-code-review
Copy link
Contributor

PR Type

Enhancement, Bug fix


Description

  • Refactor test target indexing from package-level to file-level for precise affected test detection

  • Add high-impact directory detection to trigger all tests for critical shared code changes

  • Implement fallback query mechanism for newly added files not in index

  • Filter dotnet tests from java sources and exclude root package targets to reduce index size

  • Cache dependency source lookups to improve index building performance


File Walkthrough

Relevant files
Enhancement
bazel.rake
File-level test target indexing with caching and fallback

rake_tasks/bazel.rake

  • Refactored indexing from package-level to file-level mapping (source
    files → test targets)
  • Added HIGH_IMPACT_DIRS constant and pattern matching to detect
    critical directories that trigger all tests
  • Implemented query_test_deps() to query test dependencies with
    filtering for high-impact dirs and root package targets
  • Added query_dep_srcs() to query source files for dependencies with
    caching to avoid duplicate queries
  • Implemented add_test_to_index() to build file-to-test mappings and
    filter dotnet tests from java sources
  • Replaced query_package_dep() with query_unindexed_file() for fallback
    handling of new files using rdeps() queries
  • Updated affected_targets_with_index() to perform direct file lookups
    and fallback to querying for unindexed files
  • Renamed affected_targets_fallback() to affected_targets_by_directory()
    for clarity
  • Updated test file pattern matching to include dotnet test directory
    detection
  • Improved error handling with try-catch blocks in query functions
+86/-45 

@selenium-ci selenium-ci added the B-build Includes scripting, bazel and CI integrations label Jan 31, 2026
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 31, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Bazel query injection

Description: New code builds Bazel query strings by interpolating file-derived values (e.g., rel in
attr(srcs, '#{rel}', ...) and containing.join(' + ')), so a repository filename/label
containing quotes or query operators could alter the Bazel query (and potentially the
executed command, depending on how Bazel.execute shells out), leading to unintended target
selection or command/query injection in CI.
bazel.rake [170-193]

Referred Code
def query_unindexed_file(filepath)
  pkg = find_bazel_package(filepath)
  return [] unless pkg

  rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '')
  pkg = '' if pkg == '.'

  # Find targets that contain this file in their srcs
  containing = []
  Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do |out|
    containing = out.lines.map(&:strip).select { |l| l.start_with?('//') }
  end
  return [] if containing.empty?

  # Find tests that depend on those targets
  targets = []
  Bazel.execute('query', ['--output=label'], "kind(_test, rdeps(//..., #{containing.join(' + ')}))") do |out|
    targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
  end

  # dotnet tests depend on java server, but there are no remote tests, so safe to ignore


 ... (clipped 3 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent error handling: New Bazel query helpers rescue StandardError and return empty results (or log without
actionable context), which can silently hide failures and produce incorrect
affected-target outputs.

Referred Code
def query_test_deps(test)
  deps = []
  Bazel.execute('query', ['--output=label'], "deps(#{test}) intersect //... except attr(testonly, 1, //...)") do |out|
    deps = out.lines.map(&:strip).select { |l| l.start_with?('//') }
  end
  deps.reject do |d|
    # Skip high-impact dirs and root package targets (generated files, LICENSE, etc)
    HIGH_IMPACT_DIRS.any? { |dir| d.start_with?("//#{dir}") } || d.start_with?('//:')
  end
rescue StandardError => e
  puts "  Warning: Failed to query deps for #{test}: #{e.message}"
  []
end

def add_test_to_index(index, test, srcs)
  srcs.each do |src|
    # Convert //pkg:file to pkg/file
    filepath = src.sub(%r{^//}, '').tr(':', '/')
    # Skip dotnet tests for java sources (dotnet depends on java server but has no remote tests)
    next if filepath.start_with?('java/') && test.start_with?('//dotnet/')



 ... (clipped 81 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unescaped query input: File-derived strings (e.g., rel/filepath) are interpolated into Bazel query expressions
without escaping quotes/metacharacters, which could enable query injection or unexpected
behavior if unusual filenames exist.

Referred Code
def query_unindexed_file(filepath)
  pkg = find_bazel_package(filepath)
  return [] unless pkg

  rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '')
  pkg = '' if pkg == '.'

  # Find targets that contain this file in their srcs
  containing = []
  Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do |out|
    containing = out.lines.map(&:strip).select { |l| l.start_with?('//') }
  end
  return [] if containing.empty?

  # Find tests that depend on those targets
  targets = []
  Bazel.execute('query', ['--output=label'], "kind(_test, rdeps(//..., #{containing.join(' + ')}))") do |out|
    targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
  end

  # dotnet tests depend on java server, but there are no remote tests, so safe to ignore


 ... (clipped 1 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 31, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fail build on dependency query error

In query_test_deps, re-raise the StandardError instead of returning an empty
array to ensure the index-building task fails on error, preventing the creation
of a faulty test index.

rake_tasks/bazel.rake [93-105]

 def query_test_deps(test)
   deps = []
   Bazel.execute('query', ['--output=label'], "deps(#{test}) intersect //... except attr(testonly, 1, //...)") do |out|
     deps = out.lines.map(&:strip).select { |l| l.start_with?('//') }
   end
   deps.reject do |d|
     # Skip high-impact dirs and root package targets (generated files, LICENSE, etc)
     HIGH_IMPACT_DIRS.any? { |dir| d.start_with?("//#{dir}") } || d.start_with?('//:')
   end
 rescue StandardError => e
-  puts "  Warning: Failed to query deps for #{test}: #{e.message}"
-  []
+  puts "  Error: Failed to query deps for #{test}: #{e.message}"
+  raise
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that silently handling errors during dependency querying can lead to an incomplete test index, causing tests to be skipped. Re-raising the exception makes the process fail-fast, which is more robust and prevents silent failures.

Medium
Correctly handle Python package indexing

In add_test_to_index, improve the Bazel label-to-filepath conversion to
correctly handle Python init.py files by checking if the path is a directory
and adjusting it accordingly.

rake_tasks/bazel.rake [107-116]

 def add_test_to_index(index, test, srcs)
   srcs.each do |src|
     # Convert //pkg:file to pkg/file
     filepath = src.sub(%r{^//}, '').tr(':', '/')
+
+    # Handle Python __init__.py files, where label is `//pkg/mod` for `pkg/mod/__init__.py`
+    if Dir.exist?(filepath) && File.exist?(File.join(filepath, '__init__.py'))
+      filepath = File.join(filepath, '__init__.py')
+    end
+
     # Skip dotnet tests for java sources (dotnet depends on java server but has no remote tests)
     next if filepath.start_with?('java/') && test.start_with?('//dotnet/')
 
     index[filepath] << test
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a subtle bug in handling Python package source files (__init__.py), where the file path could be misinterpreted as a directory. The proposed fix improves the correctness of the test index for Python code.

Medium
Quote test-kind pattern

In the Bazel.execute call within targets_from_tests, wrap the _test pattern in
single quotes to form a valid Bazel query string, changing it to kind('_test',
...).

rake_tasks/bazel.rake [212-214]

-Bazel.execute('query', ['--output=label'], "kind(_test, #{query})") do |out|
+Bazel.execute('query', ['--output=label'], "kind('_test', #{query})") do |out|
   targets = out.lines.map(&:strip).select { |l| l.start_with?('//') }
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the _test pattern in the Bazel query should be quoted to be a valid string literal. This fixes a syntax error in the query that would cause it to fail.

Medium
High-level
Consider performance of unindexed file fallback

The current fallback for unindexed files runs multiple Bazel queries
sequentially for each file, which can be slow. Consider batching these queries
to improve performance when a PR contains many new files.

Examples:

rake_tasks/bazel.rake [163]
      affected.merge(query_unindexed_file(filepath))
rake_tasks/bazel.rake [170-194]
def query_unindexed_file(filepath)
  pkg = find_bazel_package(filepath)
  return [] unless pkg

  rel = pkg == '.' ? filepath : filepath.sub(%r{^#{Regexp.escape(pkg)}/}, '')
  pkg = '' if pkg == '.'

  # Find targets that contain this file in their srcs
  containing = []
  Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do |out|

 ... (clipped 15 lines)

Solution Walkthrough:

Before:

# in affected_targets_with_index
lib_files.each do |filepath|
  tests = index[filepath]
  if tests
    affected.merge(tests)
  else
    # Fallback is called once per unindexed file
    affected.merge(query_unindexed_file(filepath))
  end
end

def query_unindexed_file(filepath)
  # Query 1: Find owning target for ONE file
  containing = bazel_query("attr(srcs, '#{rel}', ...)")
  
  # Query 2: Find tests for that ONE target's deps
  targets = bazel_query("rdeps(..., #{containing.join(' + ')})")
  return targets
end

After:

# in affected_targets_with_index
unindexed_files = lib_files.select { |f| !index[f] }
if unindexed_files.any?
  # Fallback is called once for ALL unindexed files
  affected.merge(query_unindexed_files_batched(unindexed_files))
end

def query_unindexed_files_batched(filepaths)
  # Query 1 (Batched): Find all owning targets
  # (e.g., by joining `attr` queries with ' + ')
  all_containing = bazel_query("attr(...) + attr(...) + ...")

  # Query 2 (Batched): Find all tests for all owning targets
  all_targets = bazel_query("rdeps(..., #{all_containing.join(' + ')})")
  return all_targets
end
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential performance bottleneck in the query_unindexed_file fallback logic, which executes multiple sequential Bazel queries for each new file, and proposes a valid optimization by batching them.

Medium
Learned
best practice
Use safe argument-based command execution

Replace backticks with an argument-based command execution (e.g.,
Open3.capture2e) to avoid injection via base_rev/head_rev and to check the exit
status with a clear error when git diff fails.

rake_tasks/bazel.rake [29]

-changed_files = `git diff --name-only #{base_rev} #{head_rev}`.split("\n").map(&:strip).reject(&:empty?)
+require 'open3'
+...
+out, status = Open3.capture2e('git', 'diff', '--name-only', base_rev.to_s, head_rev.to_s)
+raise "git diff failed for #{base_rev}..#{head_rev}: #{out}" unless status.success?
 
+changed_files = out.lines.map(&:strip).reject(&:empty?)
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Avoid shell interpolation when executing commands derived from inputs; use argument-based execution and fail fast on command errors.

Low
Escape query inputs before interpolation

Escape ' (and other special characters) in rel before embedding it into the
Bazel query, and log/propagate query failures so missing results don’t silently
turn into “no affected tests.”

rake_tasks/bazel.rake [179-181]

-Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel}', //#{pkg}:*)") do |out|
+rel_escaped = rel.gsub("'", %q(\\\'))
+Bazel.execute('query', ['--output=label'], "attr(srcs, '#{rel_escaped}', //#{pkg}:*)") do |out|
   containing = out.lines.map(&:strip).select { |l| l.start_with?('//') }
 end
-...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5

__

Why:
Relevant best practice - Sanitize/escape inputs embedded into query strings and avoid silently swallowing failures in automation logic.

Low
  • Update

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Bazel test target indexing to operate at the file level instead of the package level, enabling more precise affected test detection. The change addresses an issue where any Python file change would trigger all Python tests.

Changes:

  • Implements file-level indexing that maps individual source files to test targets
  • Adds high-impact directory detection to trigger all tests for changes in common/, rust/src/, javascript/atoms/, and javascript/webdriver/atoms/
  • Improves fallback mechanism for unindexed files (e.g., newly added files) by querying Bazel to find containing targets and their dependent tests

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings January 31, 2026 18:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings January 31, 2026 21:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations Review effort 3/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants