Add network observability e2e tests #31342

Open
kapjain-rh wants to merge 5 commits into openshift:main from kapjain-rh:netobserv

Conversation

@kapjain-rh kapjain-rh commented Jun 25, 2026

Summary

  • Adds extended tests for Network Observability operator health and flow data verification
  • Includes a single-node guard test that validates NetObserv is not installed on SNO clusters
  • Adds a comprehensive health check test that verifies all NetObserv components (operator, FLP, eBPF agents, console plugin), monitoring resources (ServiceMonitors,
    PrometheusRules), and end-to-end flow data pipeline from FLP metrics to Prometheus ingestion

Test plan

  • [sig-network][Feature:NetObserv] should not be installed on single node clusters — skips on multi-node, validates no operator namespace/CRD on SNO
  • [sig-network][Feature:NetObserv] should have all components healthy and producing flow data — verified passing on a live cluster with NetObserv installed (all

Summary by CodeRabbit

Summary

  • Tests
    • Added new end-to-end coverage for network observability for both single-node and standard cluster setups.
    • Validates expected namespaces/resources, FlowCollector readiness, key workloads/agents, and monitoring resources.
    • Includes operator health checks via recent logs and verifies telemetry end-to-end by confirming flow metrics are produced and queryable in-cluster with non-zero samples.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci Bot requested a review from miheer June 25, 2026 22:57
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cdd03cfe-0c08-48c2-a40a-09564c7e9542

📥 Commits

Reviewing files that changed from the base of the PR and between 72d32c7 and 70db7d9.

📒 Files selected for processing (1)
  • test/extended/networking/network_observability.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/networking/network_observability.go

Walkthrough

Adds a new Ginkgo e2e suite for NetObserv that checks single-node absence cases and, on supported clusters, validates readiness, monitoring resources, logs, and metric exposure.

Changes

NetObserv observability test suite

| Layer / File(s) | Summary |
| --- | --- |
| Suite scaffold and single-node checks (test/extended/networking/network_observability.go) | Declares the package, shared constants, condition parsing, and the single-node absence checks for namespaces and the CRD. |
| Cluster health and readiness checks (test/extended/networking/network_observability.go) | Skips unsupported environments and validates FlowCollector status, operator pod state, FLP pod state, and eBPF DaemonSet and pod readiness. |
| Plugin, logs, and monitoring checks (test/extended/networking/network_observability.go) | Validates optional console plugin pods and resource naming, scans operator logs for error entries, and counts monitoring resources in the NetObserv namespaces. |
| Metrics validation (test/extended/networking/network_observability.go) | Reads the FLP metrics endpoint, queries Prometheus for netobserv_ingest_flows_processed, and requires non-zero values from both paths. |
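The first half of the metrics validation row (reading the FLP metrics endpoint and requiring a non-zero value) can be sketched in plain Go. This is a minimal illustration, not the code in the PR: it assumes the endpoint returns the Prometheus text exposition format, the helper name `sumMetric` is hypothetical, and the sample payload and its `stage` label are invented for the demonstration. Only the metric name `netobserv_ingest_flows_processed` comes from the summary above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric (hypothetical helper) adds up every sample of the metric `name`
// found in a Prometheus text-exposition payload. It returns the sum and
// whether at least one sample was seen.
func sumMetric(exposition, name string) (float64, bool) {
	var total float64
	found := false
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blanks and "# HELP" / "# TYPE" comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// The name must be followed by labels or whitespace, so that
		// "foo_total" is not mistaken for "foo".
		if rest == "" || (rest[0] != '{' && rest[0] != ' ' && rest[0] != '\t') {
			continue
		}
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue // malformed label set
			}
			rest = rest[end+1:]
		}
		// Remaining fields: value, then an optional timestamp.
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		total += v
		found = true
	}
	return total, found
}

func main() {
	// Invented sample payload; real output from FLP will differ.
	sample := `# TYPE netobserv_ingest_flows_processed counter
netobserv_ingest_flows_processed{stage="ingest"} 128
`
	total, found := sumMetric(sample, "netobserv_ingest_flows_processed")
	fmt.Printf("found=%v total=%v nonZero=%v\n", found, total, total > 0)
}
```

In a real test the payload would come from the FLP pod rather than a string literal, and the `total > 0` result would feed the assertion.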

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | The suite has one-shot readiness checks and treats console-plugin API errors as skip, so it can hide real failures and flake on transient cluster state. | Use Eventually with timeouts for FlowCollector/operator readiness, fail on console-plugin list errors, and add messages to the bare topology assertion. |
✅ Passed checks (13 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding network observability end-to-end tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All 12 Ginkgo titles in network_observability.go are static strings; none embed generated names, timestamps, nodes, namespaces, or other dynamic data. |
| Microshift Test Compatibility | ✅ Passed | All new tests are either core-API checks or are guarded by MicroShift skips/apigroup tags before using unsupported OpenShift/monitoring resources. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The new NetObserv tests explicitly skip single-node clusters via exutil.IsSingleNode() and only run health checks on non-SNO clusters. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Only an e2e test file was added; it checks cluster topology but introduces no manifests, controllers, or scheduling constraints. |
| Ote Binary Stdout Contract | ✅ Passed | The added file contains only Ginkgo test bodies and a harmless package-scope Describe; no main/init/TestMain/BeforeSuite/RunSpecs stdout writes or klog-to-stdout were found. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No hardcoded IPv4, IP-family parsing, or external/public connectivity found; the test uses cluster-internal APIs and localhost only. |
| No-Weak-Crypto | ✅ Passed | New file has no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB usage, no crypto imports, and no secret/token comparisons. |
| Container-Privileges | ✅ Passed | The added NetObserv test file contains no privileged/securityContext/host* fields or embedded pod manifests; only test logic was added. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The added test only logs generic resource names, phases, counts, and numeric metrics; it does not emit secrets, tokens, PII, or raw sensitive command output. |
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested a review from pperiyasamy June 25, 2026 22:57
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kapjain-rh
Once this PR has been reviewed and has the lgtm label, please assign kyrtapz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/networking/network_observability.go`:
- Around line 41-55: The SNO absence checks in the namespace lookups and
FlowCollector CRD check are too broad because they treat any error as success.
Update the assertions around the namespace Get calls and the
oc.AsAdmin().WithoutNamespace().Run("get").Args("crd",
"flowcollectors.flows.netobserv.io") path to only pass when the error is
specifically NotFound, using apierrors.IsNotFound(err), and fail for all other
errors so the test only confirms the resources are truly absent.
- Around line 58-63: Skip the health spec on single-node clusters by adding the
same IsSingleNode guard used in the earlier NetObserv test before the checks in
the flow data health case. In the test that starts with g.It("should have all
components healthy and producing flow data"), call g.Skip when IsSingleNode
returns true so the namespace and FlowCollector assertions are never run on SNO.
Use the existing IsSingleNode helper and the health spec block to locate the
change.
- Around line 277-290: The Prometheus check in the network observability test
only verifies that a time series exists, not that it contains a non-zero sample.
Update the parsing logic in the Prometheus response handling to extract the
returned sample value from the result payload, then assert that the value is
greater than zero instead of only checking len(result.Data.Result). Keep the
change localized to the promResult handling and the Should(o.BeTrue()) predicate
so the test proves flow ingestion actually produced data.
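The third finding (assert on the sample value, not just on the presence of a series) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: it assumes the response is a Prometheus HTTP API instant-query result of type vector, where each series carries `value` as a `[timestamp, "string value"]` pair. The type `promResponse` and the helper `hasNonZeroSample` are hypothetical names, and the JSON payload is invented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// promResponse models only the fields of an instant-query response that the
// check needs. Each sample is [unix timestamp, value encoded as a string].
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// hasNonZeroSample (hypothetical helper) reports whether any returned series
// has a sample value greater than zero.
func hasNonZeroSample(body []byte) (bool, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, fmt.Errorf("decoding Prometheus response: %w", err)
	}
	if resp.Status != "success" {
		return false, fmt.Errorf("unexpected query status %q", resp.Status)
	}
	for _, series := range resp.Data.Result {
		raw, ok := series.Value[1].(string)
		if !ok {
			continue // series without a usable sample
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return false, fmt.Errorf("parsing sample %q: %w", raw, err)
		}
		if v > 0 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Invented payload: one series exists, but its value is zero, so a
	// length-only check would pass while this check does not.
	body := []byte(`{"status":"success","data":{"resultType":"vector",
		"result":[{"metric":{"__name__":"netobserv_ingest_flows_processed"},
		"value":[1700000000,"0"]}]}}`)
	ok, err := hasNonZeroSample(body)
	fmt.Println("nonZero:", ok, "err:", err)
}
```

The boolean result is what a `Should(o.BeTrue())` predicate would consume, which keeps the change local to the response handling as the comment requests.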
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ba6bd409-331b-4935-b969-c585e33c4f4c

📥 Commits

Reviewing files that changed from the base of the PR and between 817fa8a and 972ddda.

📒 Files selected for processing (1)
  • test/extended/networking/network_observability.go

Comment thread test/extended/networking/network_observability.go Outdated
Comment thread test/extended/networking/network_observability.go Outdated
Comment thread test/extended/networking/network_observability.go Outdated
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 25, 2026
@kapjain-rh

Author

/test verify

@openshift-merge-bot

Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@kapjain-rh

Author

/retest

@openshift-ci

openshift-ci Bot commented Jun 27, 2026

Contributor

@kapjain-rh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-fips | 70db7d9 | link | true | /test e2e-aws-ovn-fips |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt Bot commented Jun 27, 2026


Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 70db7d9

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should complete all checks within 60 seconds" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should produce valid JSON that round-trips" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report PDB count matching actual PodDisruptionBudgets" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report cluster conditions matching ClusterVersion status" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report etcd member count matching actual etcd pods" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report network type matching actual Network config" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report node count matching the actual cluster" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report operator count matching actual ClusterOperators" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should run all checks without errors" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should complete all checks within 60 seconds" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should produce valid JSON that round-trips" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report PDB count matching actual PodDisruptionBudgets" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report cluster conditions matching ClusterVersion status" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report etcd member count matching actual etcd pods" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report network type matching actual Network config" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report node count matching the actual cluster" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report operator count matching actual ClusterOperators" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should run all checks without errors" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 70db7d9

  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should complete all checks within 60 seconds" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should produce valid JSON that round-trips" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report PDB count matching actual PodDisruptionBudgets" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report cluster conditions matching ClusterVersion status" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report etcd member count matching actual etcd pods" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report network type matching actual Network config" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report node count matching the actual cluster" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should report operator count matching actual ClusterOperators" [Total: 4, Pass: 4, Fail: 0, Flake: 0]
  • "[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks should run all checks without errors" [Total: 4, Pass: 4, Fail: 0, Flake: 0]

Labels

ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

1 participant