Skip to content

NETOBSERV-2608: TLS metrics#2707

Open
jotak wants to merge 7 commits intonetobserv:mainfrom
jotak:tls-metrics
Open

NETOBSERV-2608: TLS metrics#2707
jotak wants to merge 7 commits intonetobserv:mainfrom
jotak:tls-metrics

Conversation

@jotak
Copy link
Copy Markdown
Member

@jotak jotak commented May 5, 2026

Description

Create specific metrics for TLS, counting flow, labelled by:

  • TLSVersion
  • TLSCipherSuite
  • TLSGroup

Add related charts in the monitoring dashboard

Add related otel mapping

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • New Features

    • Added TLS flow metrics for nodes, namespaces, and workloads.
    • Added a TLS section with charts (TLS traffic, per-version and per-group breakdowns).
  • UI

    • Renamed TLS field/column from "TLS Curve" to "TLS Group" in the console.
  • Documentation

    • Updated FlowCollector CRD and docs to include new TLS metrics.
  • Chores

    • Deployment flow updated to support PF5 console-plugin image handling.

Create specific metrics for TLS, counting flow, labelled by:
- TLSVersion
- TLSCipherSuite
- TLSGroup
- TLSTypes
- Proto
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented May 5, 2026

@jotak: This pull request references NETOBSERV-2608 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Description

Create specific metrics for TLS, counting flow, labelled by:

  • TLSVersion
  • TLSCipherSuite
  • TLSGroup
  • TLSTypes
  • Proto

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mffiedler for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds TLS flow metrics and TLSTracking gating, extends CRD/type enums and defaults, updates core metric definitions and TLS charts, adds OTEL TLS mappings, renames frontend TLSCurve→TLSGroup, enables TLSTracking in test specs, and extends Makefile/CSV handling for a PF5 console-plugin image variant.

Changes

Plugin Image PF5 Variant Support

Layer / File(s) Summary
Build Configuration
.mk/development.mk
set-plugin-image now sets RELATED_IMAGE_CONSOLE_PLUGIN_PF5 when CSV is empty and performs CSV swaps for console-plugin-pf5 / RELATED_IMAGE_CONSOLE_PLUGIN_PF5 when CSV is provided; updates static plugin image on redeploy.
Console Frontend Config
internal/controller/consoleplugin/config/static-frontend-config.yaml
Replaces TLSCurve with TLSGroup for UI columns, filters, and fields documentation.

TLS Metrics Feature

Layer / File(s) Summary
Type Definitions & Validation / Feature-gate
api/flowcollector/v1beta2/flowcollector_types.go, api/flowcollector/v1beta2/flowcollector_defaults.go, api/flowcollector/v1beta2/flowcollector_alert_types.go
Default include lists extended with TLS metrics; FLPMetric kubebuilder enum updated; GetIncludeList filters out _tls_ metrics when TLSTracking is disabled.
CRD & Manifests
config/crd/bases/flows.netobserv.io_flowcollectors.yaml, bundle/manifests/flows.netobserv.io_flowcollectors.yaml, helm/crds/flows.netobserv.io_flowcollectors.yaml
Added node_tls_flows_total, namespace_tls_flows_total, and workload_tls_flows_total to includeList / additionalIncludeList enums.
API Documentation
docs/FlowCollector.md
additionalIncludeList and adjacent include-list enums updated with TLS metric names.
Metrics Core Implementation
internal/pkg/metrics/predefined_metrics.go
Removed TLSVersion from baseline labels; added per-group RTT, drop, DNS, netpol, TLS, and IPsec metrics; refactored metric construction and adjusted label-removal logic.
Metrics Charts
internal/pkg/metrics/predefined_charts.go
Added tlsStatusChart() returning three TLS charts: TLS traffic (SingleStat), flows rate per TLS version (Line), and flows rate per TLS group (Line).
Observability Integration
internal/pkg/helper/otel/otel-config.json
Added OTEL transform mappings for TLSVersion, TLSGroup, TLSTypes, and TLSCipherSuite.
Tests & Utilities
internal/controller/flp/flp_pipeline_builder_test.go, internal/pkg/dashboards/dashboard_test.go, internal/pkg/metrics/predefined_metrics_test.go, internal/pkg/test/util/utils.go, internal/pkg/helper/otel/otel_config_test.go
Test expectations updated for TLS metrics and dashboards; TLSTracking added to test FlowCollector spec; OTEL transform test updated; minor test formatting changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • netobserv/netobserv-operator#2598: Both PRs modify the operator deployment/image pipeline to add and handle a PF5 console-plugin variant (introducing/using RELATED_IMAGE_CONSOLE_PLUGIN_PF5 and adding CSV/Makefile swaps for console-plugin-pf5).
  • netobserv/netobserv-operator#2546: Modifies the same FlowCollectorSpec.GetIncludeList function with overlapping metric-filtering changes.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'NETOBSERV-2608: TLS metrics' clearly describes the main change: adding TLS-related metrics functionality referenced by the Jira ticket.
Description check ✅ Passed The description covers the main changes (TLS metrics, charts, otel mapping) and includes dependencies and QE requirements. All required template sections are present and substantively filled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.40%. Comparing base (d483a69) to head (322576c).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2707      +/-   ##
==========================================
- Coverage   72.48%   72.40%   -0.09%     
==========================================
  Files         107      107              
  Lines       11482    11479       -3     
==========================================
- Hits         8323     8311      -12     
- Misses       2659     2667       +8     
- Partials      500      501       +1     
Flag Coverage Δ
unittests 72.40% <100.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...flowcollector/v1beta2/flowcollector_alert_types.go 97.77% <100.00%> (+0.05%) ⬆️
api/flowcollector/v1beta2/flowcollector_types.go 100.00% <ø> (ø)
internal/pkg/metrics/predefined_charts.go 99.66% <100.00%> (+0.04%) ⬆️
internal/pkg/metrics/predefined_metrics.go 98.05% <100.00%> (-0.31%) ⬇️
internal/pkg/test/util/utils.go 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jotak jotak marked this pull request as ready for review May 6, 2026 15:11
@jotak jotak requested a review from leandroberetta May 6, 2026 15:11
@Amoghrd
Copy link
Copy Markdown
Member

Amoghrd commented May 6, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

New images:

quay.io/netobserv/network-observability-operator:cdee50e
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdee50e
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdee50e

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cdee50e make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdee50e

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdee50e
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

leandroberetta
leandroberetta previously approved these changes May 7, 2026
Copy link
Copy Markdown
Contributor

@leandroberetta leandroberetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 7, 2026

New changes are detected. LGTM label has been removed.

@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 7, 2026
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

New images:

quay.io/netobserv/network-observability-operator:7583751
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7583751
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7583751

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:7583751 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7583751

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7583751
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/pkg/dashboards/dashboard_test.go (1)

90-170: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an edge-case test for TLS chart generation.

The new TLS assertions cover happy-path titles only. Please add one test where TLS metrics are absent/empty and assert TLS panels are not generated (or are handled deterministically), so chart selection regressions are caught.

As per coding guidelines **/*_test.go: include tests for valid cases, invalid input, and edge cases (empty, nil values).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/pkg/dashboards/dashboard_test.go` around lines 90 - 170, The test
TestCreateFlowMetricsDashboard_DefaultList only asserts TLS happy-path titles;
add an edge-case unit test that simulates missing/empty TLS metrics by calling
CreateFlowMetricsDashboards with a metrics.Definitions set that has TLS metrics
removed (e.g., manipulate the result from
metrics.GetDefinitions(util.SpecForMetrics(), false) to clear TLS-related
entries) and then parse the dashboard JSON via FromBytes and assert that
d.FindRow("TLS") either returns nil/empty or that its Titles() is empty
(deterministic behavior), ensuring TLS panels are not generated when TLS metrics
are absent.
internal/pkg/metrics/predefined_metrics.go (1)

130-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use one family tag for both drop counters.

ignoreTags matching is exact. The packets metric uses "drops", but the bytes metric uses "drop", so disabling the drop family will still leave *_drop_bytes_total enabled.

Suggested fix
-			tags: []string{group, tagBytes, "drop"},
+			tags: []string{group, tagBytes, "drops"},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/pkg/metrics/predefined_metrics.go` around lines 130 - 143, The bytes
drop metric added to predefinedMetrics uses an inconsistent family tag ("drop")
compared to the packets drop metric ("drops"), so ignoreTags won't match; update
the taggedMetricDefinition for the bytes metric (the entry constructing
metricslatest.FlowMetricSpec within predefinedMetrics) to use the same family
tag as the packets counter (make the tags slice contain "drops" instead of
"drop" or otherwise align both metrics to the same family tag) so disabling the
drop family via ignoreTags correctly disables both *_drop_bytes_total and
*_drops_* counters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/pkg/metrics/predefined_charts.go`:
- Line 257: The TLS percentage denominator uses a hardcoded OR of
namespace/workload/node totals which yields the leftmost match in mixed-scope
environments; update tlsStatusChart() to accept an extra parameter (e.g.,
matchingTotalMetric string) and use that parameter in the PromQL denominator
instead of the OR expression, then update every call site that constructs TLS
metric charts to pass the appropriate matching total metric name
(namespace_flows_total, workload_flows_total or node_flows_total) so each
scope’s TLS numerator divides by its own total.

In `@internal/pkg/metrics/predefined_metrics.go`:
- Around line 24-26: The change removed the TLSVersion label from the shared
label slices (tagNodes, tagNamespaces, tagWorkloads) which silently mutates the
existing node_*, namespace_*, and workload_* metric schemas; revert or preserve
the legacy TLSVersion label in those slices to avoid breaking existing
PromQL/recording rules/dashboards, and if you need a TLS-specific breakdown
instead, add new metric names (or new dedicated label slices and metric
definitions) that include TLSVersion rather than removing it from
tagNodes/tagNamespaces/tagWorkloads so existing metrics remain backwards
compatible.

---

Outside diff comments:
In `@internal/pkg/dashboards/dashboard_test.go`:
- Around line 90-170: The test TestCreateFlowMetricsDashboard_DefaultList only
asserts TLS happy-path titles; add an edge-case unit test that simulates
missing/empty TLS metrics by calling CreateFlowMetricsDashboards with a
metrics.Definitions set that has TLS metrics removed (e.g., manipulate the
result from metrics.GetDefinitions(util.SpecForMetrics(), false) to clear
TLS-related entries) and then parse the dashboard JSON via FromBytes and assert
that d.FindRow("TLS") either returns nil/empty or that its Titles() is empty
(deterministic behavior), ensuring TLS panels are not generated when TLS metrics
are absent.

In `@internal/pkg/metrics/predefined_metrics.go`:
- Around line 130-143: The bytes drop metric added to predefinedMetrics uses an
inconsistent family tag ("drop") compared to the packets drop metric ("drops"),
so ignoreTags won't match; update the taggedMetricDefinition for the bytes
metric (the entry constructing metricslatest.FlowMetricSpec within
predefinedMetrics) to use the same family tag as the packets counter (make the
tags slice contain "drops" instead of "drop" or otherwise align both metrics to
the same family tag) so disabling the drop family via ignoreTags correctly
disables both *_drop_bytes_total and *_drops_* counters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56bc636b-716d-4e2b-b8b1-75834d8c073f

📥 Commits

Reviewing files that changed from the base of the PR and between d483a69 and 6327c8e.

📒 Files selected for processing (17)
  • .mk/development.mk
  • api/flowcollector/v1beta2/flowcollector_alert_types.go
  • api/flowcollector/v1beta2/flowcollector_defaults.go
  • api/flowcollector/v1beta2/flowcollector_types.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • docs/FlowCollector.md
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • internal/controller/consoleplugin/config/static-frontend-config.yaml
  • internal/controller/flp/flp_pipeline_builder_test.go
  • internal/pkg/dashboards/dashboard_test.go
  • internal/pkg/helper/otel/otel-config.json
  • internal/pkg/helper/otel/otel_config_test.go
  • internal/pkg/metrics/predefined_charts.go
  • internal/pkg/metrics/predefined_metrics.go
  • internal/pkg/metrics/predefined_metrics_test.go
  • internal/pkg/test/util/utils.go

Unit: metricslatest.UnitPercent,
Queries: []metricslatest.Query{
{
PromQL: `sum(rate($METRIC[2m])) / (sum(rate(netobserv_namespace_flows_total[2m])) OR sum(rate(netobserv_workload_flows_total[2m])) OR sum(rate(netobserv_node_flows_total[2m])))`,
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify tlsStatusChart call sites and check whether mixed totals can coexist.
set -euo pipefail

echo "== tlsStatusChart call sites =="
rg -n --type=go '\btlsStatusChart\s*\('

echo
echo "== default include lists containing flow totals =="
rg -n --type=go '"(namespace_flows_total|workload_flows_total|node_flows_total|namespace_tls_flows_total|workload_tls_flows_total|node_tls_flows_total)"'

Repository: netobserv/netobserv-operator

Length of output: 5095


🏁 Script executed:

#!/bin/bash
# Examine the tlsStatusChart function and its call site context
echo "== tlsStatusChart function definition (lines 240-290) =="
sed -n '240,290p' internal/pkg/metrics/predefined_charts.go

echo
echo "== Call site in predefined_metrics.go (lines 200-220) =="
sed -n '200,220p' internal/pkg/metrics/predefined_metrics.go

Repository: netobserv/netobserv-operator

Length of output: 2429


🏁 Script executed:

#!/bin/bash
# Check how TLS metrics relate to flow metrics - are they paired?
echo "== TLS metric definitions =="
rg -A 3 'tls.*_total' internal/pkg/metrics/predefined_metrics.go | head -30

echo
echo "== How namespace/workload/node metrics are structured =="
rg -B 2 -A 5 'tlsStatusChart\(\)' internal/pkg/metrics/predefined_metrics.go

Repository: netobserv/netobserv-operator

Length of output: 661


🏁 Script executed:

#!/bin/bash
# Find where $METRIC gets substituted/replaced in the code
echo "== $METRIC substitution =="
rg -n '\$METRIC' internal/pkg/metrics/

echo
echo "== Check if tests mix multiple metric scopes =="
rg -B 2 -A 2 'GetDefinitions.*node.*namespace.*workload' internal/pkg/metrics/predefined_metrics_test.go

Repository: netobserv/netobserv-operator

Length of output: 3225


🏁 Script executed:

#!/bin/bash
# Look at how charts are used - does replacement happen?
echo "== Chart Query usage and replacement =="
rg -n 'Query.*PromQL\|PromQL.*\$METRIC' internal/pkg/metrics/ | head -20

echo
echo "== Monitor or dashboard code that processes these charts =="
rg -l 'Chart\|PromQL' internal/controller/monitoring/ internal/pkg/ | head -5

Repository: netobserv/netobserv-operator

Length of output: 173


🏁 Script executed:

#!/bin/bash
# Search for where charts are processed - look in monitoring controller
echo "== monitoring_controller.go - full scan for Chart handling =="
rg -n 'Chart|PromQL' internal/controller/monitoring/monitoring_controller.go | head -30

echo
echo "== Search for $METRIC replacement in codebase =="
rg -n 'Replace.*\$METRIC\|strings.Replace.*METRIC' internal/

echo
echo "== Look for Sprintf or formatting of PromQL =="
rg -n 'fmt.Sprintf.*PromQL\|Sprintf.*%' internal/pkg/metrics/ | head -20

Repository: netobserv/netobserv-operator

Length of output: 225


🏁 Script executed:

#!/bin/bash
# Check if there's a dashboard/query builder that handles substitution
echo "== Files that import or use predefined_charts =="
rg -l 'predefined_charts\|tlsStatusChart' internal/

echo
echo "== Check FlowMetricSpec usage and how charts are stored =="
rg -B 3 -A 3 'Charts:.*\[\]' internal/pkg/metrics/predefined_metrics.go | head -40

Repository: netobserv/netobserv-operator

Length of output: 181


🏁 Script executed:

#!/bin/bash
# Read the full tlsStatusChart function and surrounding context
echo "== Full tlsStatusChart and context (lines 246-286) =="
sed -n '246,286p' internal/pkg/metrics/predefined_charts.go

echo
echo "== Examine how metrics are built and charts assigned (lines 195-225) =="
sed -n '195,225p' internal/pkg/metrics/predefined_metrics.go

echo
echo "== Check MetricName vs Chart relationship =="
sed -n '200,215p' internal/pkg/metrics/predefined_metrics.go | cat -A

Repository: netobserv/netobserv-operator

Length of output: 3433


🏁 Script executed:

#!/bin/bash
# Look for where charts might be processed - check for any replacement/substitution
echo "== All files importing FlowMetricSpec or Chart types =="
fd -e go -x grep -l 'FlowMetricSpec\|metricslatest\.Chart' {} \; 2>/dev/null | head -10

echo
echo "== Search for where metric definitions are consumed =="
rg 'GetDefinitions\|predefinedMetrics' --type go -l

Repository: netobserv/netobserv-operator

Length of output: 636


🏁 Script executed:

#!/bin/bash
# Check dashboard.go to see how charts are rendered/processed
echo "== dashboard.go - search for METRIC substitution or chart processing =="
grep -n 'METRIC\|PromQL\|Chart' internal/pkg/dashboards/dashboard.go | head -40

echo
echo "== helper.go - might have utilities for chart/metric processing =="
cat -n internal/pkg/metrics/helper.go | head -50

Repository: netobserv/netobserv-operator

Length of output: 1518


🏁 Script executed:

#!/bin/bash
# Check if MetricName from FlowMetricSpec connects to chart rendering
echo "== Examine dashboard.go structure =="
head -100 internal/pkg/dashboards/dashboard.go | tail -80

echo
echo "== Search for where $METRIC or placeholders are replaced =="
rg -C 3 'strings\.Replace|Sprintf.*METRIC|METRIC.*Replace' internal/pkg/dashboards/

Repository: netobserv/netobserv-operator

Length of output: 4098


TLS traffic percentage will be incorrect in mixed-scope configurations.

The denominator at line 257 is hardcoded to (netobserv_namespace_flows_total OR netobserv_workload_flows_total OR netobserv_node_flows_total), but the numerator is dynamically substituted per metric. When multiple scopes exist, PromQL OR returns the leftmost match, causing the workload and node TLS charts to divide by namespace totals instead of their own scope.

Pass the matching flow metric name to tlsStatusChart() and use it in the denominator instead of hardcoded OR logic:

Proposed direction
-func tlsStatusChart() []metricslatest.Chart {
+func tlsStatusChart(totalFlowsMetric string) []metricslatest.Chart {
 ...
-    PromQL: `sum(rate($METRIC[2m])) / (sum(rate(netobserv_namespace_flows_total[2m])) OR sum(rate(netobserv_workload_flows_total[2m])) OR sum(rate(netobserv_node_flows_total[2m])))`,
+    PromQL: fmt.Sprintf(`sum(rate($METRIC[2m])) / sum(rate(%s[2m]))`, totalFlowsMetric),
 ...
}

Then pass the matching total metric (namespace_flows_total, workload_flows_total, or node_flows_total) from each TLS metric definition call site.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/pkg/metrics/predefined_charts.go` at line 257, The TLS percentage
denominator uses a hardcoded OR of namespace/workload/node totals which yields
the leftmost match in mixed-scope environments; update tlsStatusChart() to
accept an extra parameter (e.g., matchingTotalMetric string) and use that
parameter in the PromQL denominator instead of the OR expression, then update
every call site that constructs TLS metric charts to pass the appropriate
matching total metric name (namespace_flows_total, workload_flows_total or
node_flows_total) so each scope’s TLS numerator divides by its own total.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the namespace/workload/node variants differ by their aggregation keys but they should all be identical when not aggregated by key. That's what we do here: sum(...) and not sum by (...)(...). Their sum is expected to be equal.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jotak, you're right. Since the query uses sum(...) with no by (...) clause, the result collapses to a global scalar — and all three metrics represent the same underlying flows just with different label dimensions, so their fully-aggregated sums are equal. The OR is just a safety net for whichever metric happens to be present. My concern was mistaken.

Comment thread internal/pkg/metrics/predefined_metrics.go
The metrics tagging system is a leftover from old API, and is dead code
since that API was removed.
@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 7, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/pkg/metrics/predefined_metrics.go`:
- Around line 162-176: The tlsStatusChart helper is missing the group argument
so all TLS metrics use identical chart variants; update the loop that builds
FlowMetricSpec (using group, groupTrimmed, labels, tlsLabels, predefinedMetrics)
to call tlsStatusChart(group) instead of tlsStatusChart() so chartVariantsFor
can generate group-specific variants for tagNodes/tagNamespaces/tagWorkloads
(ensuring namespace/workload "infra"/"app" and node-specific configs are
produced).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc3fb3b9-5707-4b36-88be-3fc21f6d257f

📥 Commits

Reviewing files that changed from the base of the PR and between 6327c8e and 322576c.

📒 Files selected for processing (2)
  • internal/pkg/metrics/predefined_metrics.go
  • internal/pkg/metrics/predefined_metrics_test.go
💤 Files with no reviewable changes (1)
  • internal/pkg/metrics/predefined_metrics_test.go

Comment on lines +162 to +176
// TLS
for _, group := range []string{tagNodes, tagNamespaces, tagWorkloads} {
groupTrimmed := strings.TrimSuffix(group, "s")
labels := mapLabels[group]
tlsLabels := labels
tlsLabels = append(tlsLabels, "TLSVersion", "TLSCipherSuite", "TLSGroup")
predefinedMetrics = append(predefinedMetrics, metricslatest.FlowMetricSpec{
MetricName: fmt.Sprintf("%s_tls_flows_total", groupTrimmed),
Type: metricslatest.CounterMetric,
Help: fmt.Sprintf("Total TLS flows per %s", groupTrimmed),
Filters: []metricslatest.MetricFilter{{Field: "TLSTypes", MatchType: metricslatest.MatchPresence}},
Labels: tlsLabels,
Charts: tlsStatusChart(),
})
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the tlsStatusChart and ipsecStatusChart definitions and compare signatures
rg -n "func tlsStatusChart\|func ipsecStatusChart" --type go -A 5

Repository: netobserv/netobserv-operator

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Try broader search patterns for chart functions
echo "=== Searching for tlsStatusChart definition ==="
rg -n "tlsStatusChart" --type go -A 3 -B 1

echo ""
echo "=== Searching for ipsecStatusChart definition ==="
rg -n "ipsecStatusChart" --type go -A 3 -B 1

echo ""
echo "=== Searching for other chart helpers ==="
rg -n "rttCharts|dropCharts|dnsCharts|netpolCharts" --type go -A 2 -B 1

Repository: netobserv/netobserv-operator

Length of output: 4157


🏁 Script executed:

#!/bin/bash
# Check the tlsStatusChart implementation
echo "=== tlsStatusChart implementation ==="
sed -n '246,280p' internal/pkg/metrics/predefined_charts.go

echo ""
echo "=== ipsecStatusChart implementation (for comparison) ==="
sed -n '218,245p' internal/pkg/metrics/predefined_charts.go

Repository: netobserv/netobserv-operator

Length of output: 2181


🏁 Script executed:

#!/bin/bash
# Check what chartVariantsFor does and how it uses the group parameter
echo "=== Search for chartVariantsFor definition ==="
rg -n "func chartVariantsFor" --type go -A 10

Repository: netobserv/netobserv-operator

Length of output: 1182


🏁 Script executed:

#!/bin/bash
# Get the full chartVariantsFor function
sed -n '288,330p' internal/pkg/metrics/predefined_charts.go

Repository: netobserv/netobserv-operator

Length of output: 1666


Pass group to tlsStatusChart() to generate proper chart variants for each metric type.

tlsStatusChart() is the only chart helper lacking a group argument. All others—rttCharts(group), dropCharts(group, ...), dnsCharts(group), netpolCharts(group), ipsecStatusChart(group)—receive it so chartVariantsFor() can generate group-specific variants (namespace/workload "infra" and "app" layers, node-specific configs). Without it, all three group types get identical charts, breaking dashboard configurations for namespace and workload metrics.

Diff
-		Charts:     tlsStatusChart(),
+		Charts:     tlsStatusChart(group),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/pkg/metrics/predefined_metrics.go` around lines 162 - 176, The
tlsStatusChart helper is missing the group argument so all TLS metrics use
identical chart variants; update the loop that builds FlowMetricSpec (using
group, groupTrimmed, labels, tlsLabels, predefinedMetrics) to call
tlsStatusChart(group) instead of tlsStatusChart() so chartVariantsFor can
generate group-specific variants for tagNodes/tagNamespaces/tagWorkloads
(ensuring namespace/workload "infra"/"app" and node-specific configs are
produced).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not doing so because the charts are not displayed per group in this case (yes, it's a new pattern)
Charts are deduplicated in createFlowMetricsDashboard so it won't create multiple instances of the same

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jotak Got it — the TLS charts are intentionally group-agnostic, relying on deduplication in createFlowMetricsDashboard rather than per-group variants. Makes sense as a new pattern.

( ˙ ɥ ˙ ) ~ learning noted


✏️ Learnings added
Learnt from: jotak
Repo: netobserv/netobserv-operator PR: 2707
File: internal/pkg/metrics/predefined_metrics.go:162-176
Timestamp: 2026-05-07T12:45:12.308Z
Learning: In `internal/pkg/metrics/predefined_metrics.go` (netobserv/netobserv-operator), `tlsStatusChart()` intentionally takes no `group` argument, unlike other chart helpers (e.g. `ipsecStatusChart(group)`, `rttCharts(group)`). TLS charts are not displayed per group; deduplication in `createFlowMetricsDashboard` prevents duplicate chart instances, so passing `group` is unnecessary and undesirable for this metric type.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

New images:

quay.io/netobserv/network-observability-operator:cfbe85d
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfbe85d
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfbe85d

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cfbe85d make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfbe85d

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfbe85d
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@Amoghrd Amoghrd added ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. and removed ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. labels May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants