Skip to content

add required layer option to conditionally run runtimes queries#1866

Open
khansaad wants to merge 6 commits into
kruize:mvp_demofrom
khansaad:runtime-conditional-run
Open

add required layer option to conditionally run runtimes queries#1866
khansaad wants to merge 6 commits into
kruize:mvp_demofrom
khansaad:runtime-conditional-run

Conversation

@khansaad
Copy link
Copy Markdown
Contributor

@khansaad khansaad commented Apr 4, 2026

Description

This PR adds a new required_layer attribute in the aggregation_functions alongside the query to conditionally run the runtime queries based on the corresponding layer detection.

Fixes # (issue)

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

  • New Test X
  • Functional testsuite

Test Configuration

  • Kubernetes clusters tested on:

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Introduce conditional execution of runtime metrics based on detected layers to reduce unnecessary queries and improve efficiency.

New Features:

  • Add optional layers_required attribute to metrics to gate runtime Prometheus queries on detected runtime layers.
  • Extend metric filtering in the recommendation engine to consider experiment type, Kubernetes object, detected layers, and accelerator support before executing queries.

Enhancements:

  • Refactor recommendation engine metric filtering to pre-filter metrics for accelerators and layers, reducing redundant checks and query executions.
  • Update performance profile manifests and version to annotate JVM-related metrics with required runtime layers.

Documentation:

  • Add an ADR documenting the design and behavior of conditional runtime metric queries based on layer detection.

Signed-off-by: Saad Khan <saakhan@ibm.com>
@khansaad khansaad added this to the Kruize 0.11.0 Release milestone Apr 4, 2026
@khansaad khansaad requested a review from kusumachalasani April 4, 2026 08:19
@khansaad khansaad self-assigned this Apr 4, 2026
@khansaad khansaad added the enhancement New feature or request label Apr 4, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Apr 4, 2026

Reviewer's Guide

Implements conditional execution of runtime-related metric queries by introducing a layers_required attribute on metrics, wiring it into the recommendation engine’s metric filtering, and annotating JVM-related metrics in performance profiles so they only run when corresponding runtime layers are detected.

Flow diagram for metric filtering using layers_required

flowchart TD
    A[fetchContainerMetricsBasedOnDataSourceAndProfile] --> B[Call filterMetricsBasedOnExpTypeAndK8sObjectAndLayers]
    B --> C[Iterate metric from metricProfile getSloInfo getFunctionVariables]
    C --> D{metric.name == maxDateQuery?}
    D -->|Yes| SKIP1[Skip metric]
    D -->|No| E{ExperimentType matches namespace/container?}
    E -->|No| SKIP2[Skip metric]
    E -->|Yes| F{Accelerator metric and !fetchAcceleratorMetrics?}
    F -->|Yes| SKIP3[Skip metric]
    F -->|No| G{layersRequired is null or empty?}
    G -->|Yes| H[Include metric]
    G -->|No| I{Any layer in layersRequired exists in containerData.layerMap?}
    I -->|No| SKIP4[Skip metric<br>and log debug]
    I -->|Yes| H[Include metric]
    H --> J[Return filtered metricList to fetchContainerMetricsBasedOnDataSourceAndProfile]
    SKIP1 --> C
    SKIP2 --> C
    SKIP3 --> C
    SKIP4 --> C
Loading

File-Level Changes

Change Details Files
Pre-filter metrics in the RecommendationEngine based on experiment type, accelerator presence, and newly introduced required runtime layers before executing aggregation queries.
  • Replace the simple filterMetricsBasedOnExpTypeAndK8sObject call with a new filterMetricsBasedOnExpTypeAndK8sObjectAndLayers helper that also considers container layer detection and accelerator usage flags.
  • Compute a single fetchAcceleratorMetrics flag up front based on container device list and ROS instead of recomputing per metric, then use it in the new filter to exclude accelerator and accelerator-partition metrics when not applicable.
  • Add layer-based filtering inside the new helper so metrics with layers_required are only included if at least one required layer is present in the container’s layerMap, logging debug information when metrics are skipped.
src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
Extend the Metric data model to carry an optional list of layers_required to drive conditional runtime metric execution.
  • Introduce a layersRequired field annotated as layers_required with corresponding getter and setter on the Metric class.
  • Include layersRequired in the Metric.toString output for easier debugging and logging.
src/main/java/com/autotune/common/data/metrics/Metric.java
Annotate JVM-related metrics in performance profiles with required runtime layers and bump profile version to reflect the schema enhancement.
  • Update resource_optimization_local_monitoring YAML performance profile to version 2.0 and add a layers_required array (hotspot, semeru) to JVM-related container metrics.
  • Update resource_optimization_local_monitoring_norecordingrules YAML performance profile similarly with version bump and layers_required annotations for JVM metrics.
  • Adjust corresponding JSON performance profile files to align with the new schema (implicit in the PR).
manifests/autotune/performance-profiles/resource_optimization_local_monitoring.yaml
manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.yaml
manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json
manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.json
Document the design for conditional runtime metric queries and the layers_required field as an ADR-style design note.
  • Add a detailed design document describing motivation, schema changes, engine logic, alternatives considered, and example usage for conditional runtime metric queries based on layer detection.
design/ConditionalRuntimeMetricQueries.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@khansaad khansaad moved this to In Progress in Monitoring Apr 4, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In fetchContainerMetricsBasedOnDataSourceAndProfile, you repeatedly call containerData.getLayerMap() and null-check it inside the inner loop; consider pulling the map into a local variable and short-circuiting when it's null to simplify the logic and avoid repeated lookups.
  • The AggregationFunctions class now has a requiredLayer field but no constructor path to set it explicitly; if this type is ever instantiated programmatically (not just via deserialization), you may want to add an overloaded constructor or builder method that accepts requiredLayer to avoid partially initialized instances.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `fetchContainerMetricsBasedOnDataSourceAndProfile`, you repeatedly call `containerData.getLayerMap()` and null-check it inside the inner loop; consider pulling the map into a local variable and short-circuiting when it's null to simplify the logic and avoid repeated lookups.
- The `AggregationFunctions` class now has a `requiredLayer` field but no constructor path to set it explicitly; if this type is ever instantiated programmatically (not just via deserialization), you may want to add an overloaded constructor or builder method that accepts `requiredLayer` to avoid partially initialized instances.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Saad Khan <saakhan@ibm.com>
Signed-off-by: Saad Khan <saakhan@ibm.com>
Copy link
Copy Markdown
Contributor

@kusumachalasani kusumachalasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider updating the profile version ?

Signed-off-by: Saad Khan <saakhan@ibm.com>
@kusumachalasani
Copy link
Copy Markdown
Contributor

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The new filterMetricsBasedOnExpTypeAndK8sObjectAndLayers method largely duplicates the logic of filterMetricsBasedOnExpTypeAndK8sObject; consider refactoring to share the common experiment-type/kubernetes-object filtering to avoid future drift between the two.
  • The isROS parameter on filterMetricsBasedOnExpTypeAndK8sObjectAndLayers is currently unused; either remove it or wire it into the filtering logic to keep the method signature aligned with its behavior.
  • In the layer-based filtering, you access containerData.getLayerMap() and containerData.getContainer_name(); if there is any chance containerData can be null in this code path, consider a defensive null check to avoid potential NPEs when adding new call sites.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `filterMetricsBasedOnExpTypeAndK8sObjectAndLayers` method largely duplicates the logic of `filterMetricsBasedOnExpTypeAndK8sObject`; consider refactoring to share the common experiment-type/kubernetes-object filtering to avoid future drift between the two.
- The `isROS` parameter on `filterMetricsBasedOnExpTypeAndK8sObjectAndLayers` is currently unused; either remove it or wire it into the filtering logic to keep the method signature aligned with its behavior.
- In the layer-based filtering, you access `containerData.getLayerMap()` and `containerData.getContainer_name()`; if there is any chance `containerData` can be null in this code path, consider a defensive null check to avoid potential NPEs when adding new call sites.

## Individual Comments

### Comment 1
<location path="src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java" line_range="1929" />
<code_context>
+            String maxDateQuery,
+            AnalyzerConstants.ExperimentType experimentType,
+            ContainerData containerData,
+            boolean isROS,
+            boolean fetchAcceleratorMetrics,
+            List<String> acceleratorFunctions,
</code_context>
<issue_to_address>
**suggestion:** Remove or use the `isROS` parameter in the new filter method to avoid confusing API surface.

The `filterMetricsBasedOnExpTypeAndK8sObjectAndLayers` method takes an `isROS` parameter but doesn’t use it. Since this is on the recommendation path and likely to be reused, an unused parameter is misleading. Either remove `isROS` if ROS-specific behavior is no longer needed here, or integrate it into the filter logic so the method’s contract matches its signature.

Suggested implementation:

```java
            AnalyzerConstants.ExperimentType experimentType,
            ContainerData containerData,
            boolean fetchAcceleratorMetrics,
            List<String> acceleratorFunctions,
            List<String> acceleratorPartitionFunctions) {

```

1. Update all call sites of `filterMetricsBasedOnExpTypeAndK8sObjectAndLayers` to remove the `isROS` argument and shift the remaining arguments accordingly.
2. If there is Javadoc or interface declarations elsewhere (e.g., in an interface or superclass) that include the `isROS` parameter for this method, remove that parameter there as well so the signatures stay consistent.
</issue_to_address>

### Comment 2
<location path="src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java" line_range="1972-1973" />
<code_context>
+if (layersRequired != null && !layersRequired.isEmpty()) {
+    // Check if any of the required layers are detected
+    boolean layerDetected = false;
+    for (String layer : layersRequired) {
+        if (containerData.getLayerMap() != null &&
+                containerData.getLayerMap().containsKey(layer)) {
+            layerDetected = true;
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against `containerData` being null when checking `layerMap`, or document the non-null contract.

In `filterMetricsBasedOnExpTypeAndK8sObjectAndLayers`, `containerData.getLayerMap()` is called without first checking whether `containerData` itself is null. If `containerData` can be null at any call site, this will cause an NPE during filtering. If it is guaranteed non-null, consider enforcing that via a precondition at method entry; otherwise, handle a null `containerData` case explicitly (e.g., by short-circuiting the layer filtering).
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants