Skip to content

CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework#8674

Draft
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3271
Draft

CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework#8674
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3271

Conversation

@bryan-cox
Copy link
Copy Markdown
Member

@bryan-cox bryan-cox commented Jun 4, 2026

What this PR does / why we need it:

Adds automated e2e testing for External OIDC support on self-managed Azure HCP using the v2 test framework. The test deploys Keycloak on the management cluster as a platform-agnostic OIDC provider, creates a new external-oidc cluster variant, and validates the resulting control plane state.

Test areas (11 assertions across 4 contexts):

  • Cluster OIDC Configuration — HC has authentication.type: OIDC and is Available
  • OAuth Server Not Deployed — No oauth-openshift or openshift-oauth-apiserver deployments in the control plane namespace
  • KAS Authentication Configurationauth-config ConfigMap has JWT authenticator matching HC spec; no OAuth webhook
  • Keycloak Authentication & Claims — Token via password grant authenticates to KAS; SelfSubjectReview returns correct username, groups (with prefix), and UID claim mappings

Platform-agnostic design: Keycloak deployment utility and test assertions are reusable by any platform. Only the Azure lifecycle integration is platform-specific. Other platforms can add an external-oidc variant to their lifecycle config with zero test code changes.

Azure AD excluded from automation: Device code flow requires interactive browser sign-in. Manually validated in CNTRLPLANE-3462. Both providers exercise the same KAS JWT authenticator and CPO reconciliation code paths.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3271

Special notes for your reviewer:

  • All tests follow v2 AGENTS.md conventions (18 rules verified)
  • The *testing.Ttesting.TB signature widening in test/e2e/util/external_oidc.go is backward compatible — existing v1 callers are unaffected
  • Tests skip automatically on non-OIDC clusters via BeforeEach + Skip() on authentication type
  • The Keycloak deployment in PostCreate() is adapted from the CI step registry idp/external-oidc/keycloak/server/

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Support for External OIDC authentication in hosted clusters.
    • Keycloak deployment and management utilities for OIDC testing.
  • Tests

    • New E2E v2 test suite for External OIDC hosted clusters (validation of OIDC config, OAuth components absence, KAS auth settings, and Keycloak token auth).
    • Extended Azure E2E lifecycle to include an External OIDC cluster variant.
  • Chores

    • Registered env vars for External OIDC CA bundle and test users.
    • Broadened E2E helper interfaces to accept a more general testing context.
    • Registered additional API types for test tooling.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 4, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Jun 4, 2026

@bryan-cox: This pull request references CNTRLPLANE-3271 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds automated e2e testing for External OIDC support on self-managed Azure HCP using the v2 test framework. The test deploys Keycloak on the management cluster as a platform-agnostic OIDC provider, creates a new external-oidc cluster variant, and validates the resulting control plane state.

Test areas (11 assertions across 4 contexts):

  • Cluster OIDC Configuration — HC has authentication.type: OIDC and is Available
  • OAuth Server Not Deployed — No oauth-openshift or openshift-oauth-apiserver deployments in the control plane namespace
  • KAS Authentication Configurationauth-config ConfigMap has JWT authenticator matching HC spec; no OAuth webhook
  • Keycloak Authentication & Claims — Token via password grant authenticates to KAS; SelfSubjectReview returns correct username, groups (with prefix), and UID claim mappings

Platform-agnostic design: Keycloak deployment utility and test assertions are reusable by any platform. Only the Azure lifecycle integration is platform-specific. Other platforms can add an external-oidc variant to their lifecycle config with zero test code changes.

Azure AD excluded from automation: Device code flow requires interactive browser sign-in. Manually validated in CNTRLPLANE-3462. Both providers exercise the same KAS JWT authenticator and CPO reconciliation code paths.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3271

Special notes for your reviewer:

  • All tests follow v2 AGENTS.md conventions (18 rules verified)
  • The *testing.Ttesting.TB signature widening in test/e2e/util/external_oidc.go is backward compatible — existing v1 callers are unaffected
  • Tests skip automatically on non-OIDC clusters via BeforeEach + Skip() on authentication type
  • The Keycloak deployment in PostCreate() is adapted from the CI step registry idp/external-oidc/keycloak/server/

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end External OIDC E2E support: broadens four exported test helper signatures to testing.TB, registers two new env vars, adds Keycloak deploy/cleanup utilities (CA extraction and OIDC discovery verification), wires Azure lifecycle to deploy Keycloak and patch HostedCluster auth, and adds a new e2ev2 test suite validating KAS/auth plumbing and Keycloak-based authentication.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner
  participant AzureLifecycle
  participant KubernetesAPI
  participant Keycloak
  participant HostedClusterAPI

  TestRunner->>AzureLifecycle: request create external-oidc cluster
  AzureLifecycle->>KubernetesAPI: create Keycloak namespace + resources (ConfigMap, Service, StatefulSet, Route)
  KubernetesAPI->>Keycloak: start StatefulSet pods
  Keycloak->>AzureLifecycle: ready (StatefulSet ready)
  AzureLifecycle->>KubernetesAPI: create/update CA ConfigMap and console Secret
  AzureLifecycle->>HostedClusterAPI: patch HostedCluster authentication with ExtOIDCConfig
  TestRunner->>Keycloak: obtain test token (via issuer URL)
  TestRunner->>HostedClusterAPI: SelfSubjectReview using Keycloak token
  HostedClusterAPI->>TestRunner: authentication response (username, groups, uid)
Loading

Possibly related PRs

Suggested reviewers

  • enxebre
  • devguyio
  • clebs
🚥 Pre-merge checks | ✅ 9 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Cluster-scoped Keycloak resources created in PostCreate without cleanup; ~48% of assertions lack meaningful failure messages; some tests validate multiple unrelated behaviors. Add CleanupKeycloak call in PostCreate/teardown; add failure messages to all Expect statements matching patterns like "expected OAuth deployment to not exist in %s".
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning Test pulls Keycloak image from quay.io/keycloak/keycloak:26.0 and verifies OIDC endpoint via external HTTPS, incompatible with disconnected environments. Add [Skipped:Disconnected] label to tests or mirror Keycloak image to internal registry for disconnected/IPv6-only CI environments.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main objective: adding External OIDC end-to-end tests for the v2 testing framework, with a specific Jira ticket reference.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 11 test names in hosted_cluster_external_oidc_test.go are static and deterministic. No dynamic values, timestamps, UUIDs, or pod/namespace names appear in any test titles.
Topology-Aware Scheduling Compatibility ✅ Passed PR only modifies E2E test code. Keycloak deployment has 1 replica, no affinity/topology constraints, no nodeSelectors. Compatible with all OpenShift topologies (SNO, TNF, TNA, HA, HyperShift).
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms, custom crypto implementations, or insecure comparisons found. Code properly uses crypto/rand, x509 certificate pools, and TLS 1.2+ with verification enabled.
Container-Privileges ✅ Passed No privileged containers, host namespace access, SYS_ADMIN capabilities, or allowPrivilegeEscalation flags found in any container or Kubernetes manifest definitions across all modified files.
No-Sensitive-Data-In-Logs ✅ Passed PR contains no logging of sensitive data (passwords, tokens, API keys, PII, session IDs, credentials, or customer data). Only public OIDC endpoints, cluster names, and test usernames are logged.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Jun 4, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.43%. Comparing base (38f7b17) to head (6d5b904).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8674   +/-   ##
=======================================
  Coverage   41.43%   41.43%           
=======================================
  Files         756      756           
  Lines       93647    93647           
=======================================
  Hits        38802    38802           
  Misses      52124    52124           
  Partials     2721     2721           
Flag Coverage Δ
cmd-support 34.87% <ø> (ø)
cpo-hostedcontrolplane 43.50% <ø> (ø)
cpo-other 42.74% <ø> (ø)
hypershift-operator 51.57% <ø> (ø)
other 31.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/internal/env_vars.go`:
- Around line 222-225: The env var E2E_EXTERNAL_OIDC_TEST_USERS registered via
RegisterEnvVar currently holds raw user:password pairs and can be printed by
PrintEnvVarHelp()/SetupTestEnv() when E2E_SHOW_ENV_HELP is enabled; change this
to avoid exposing secrets by either (A) converting the registration to accept a
file path (e.g., "path to file containing user:password lines") and update any
consumers that read E2E_EXTERNAL_OIDC_TEST_USERS to read that file, or (B) mark
the variable as sensitive in the masking logic used by
PrintEnvVarHelp()/RegisterEnvVar so it is redacted (add
"external_oidc_test_users" or the exact var name to the
secret/password/token/key detection or a dedicated sensitive flag on
RegisterEnvVar) and ensure SetupTestEnv exports it masked. Reference
RegisterEnvVar, E2E_EXTERNAL_OIDC_TEST_USERS, PrintEnvVarHelp, and SetupTestEnv
when making the change.

In `@test/e2e/v2/lifecycle/azure.go`:
- Around line 377-384: The env setup silently ignores os.Setenv errors for the
external OIDC variables; in the SetupTestEnv flow check the return values of
os.Setenv when setting "E2E_EXTERNAL_OIDC_CA_BUNDLE_FILE" and
"E2E_EXTERNAL_OIDC_TEST_USERS" and fail fast (return an error or call t.Fatalf /
log.Fatalf as appropriate) with contextual messages including the filepath and
the original error; update the code around the caPath/file read blocks to handle
and propagate Setenv errors instead of ignoring them.

In `@test/e2e/v2/tests/hosted_cluster_external_oidc_test.go`:
- Around line 60-69: The skipIfNotOIDC helper only checks Authentication.Type
but doesn't guard that Authentication.OIDCProviders exists or has at least one
element, which causes later OIDCProviders[0] accesses to panic; update
skipIfNotOIDC (in test/e2e/v2/tests/hosted_cluster_external_oidc_test.go) to
also check hc.Spec.Configuration.Authentication.OIDCProviders is non-nil and
len(...) > 0 and, if the slice is empty, call Fail (or Failf) with a clear
diagnostic like "External OIDC tests require at least one OIDC provider
configured" instead of Skip so tests fail with a helpful message rather than
panicking. Ensure you reference the Authentication and OIDCProviders fields when
adding the guard.

In `@test/e2e/v2/util/keycloak.go`:
- Line 45: The keycloakImage constant is pinned to
"quay.io/keycloak/keycloak:latest", which makes e2e runs non-reproducible;
update the keycloakImage variable to a specific, tested Keycloak image tag
(e.g., a stable release like 21.x or the exact version you validated), replace
":latest" with that pinned tag in the keycloakImage declaration, and add a short
comment noting why it’s pinned so future changes are explicit (ensure any
related startup script expectations in setup.sh and the readiness probe against
/realms/master are compatible with the chosen version).
- Around line 460-475: The verifyOIDCEndpoint function currently disables TLS
verification by setting tls.Config{InsecureSkipVerify: true}; instead build a
tls.Config that validates the issuer certificate chain using the provided
caBundle: create a x509.NewCertPool(), call AppendCertsFromPEM(caBundle) and set
tls.Config.RootCAs to that pool (remove InsecureSkipVerify), then use that
tls.Config in the http.Transport; keep a fallback only if AppendCertsFromPEM
fails but prefer failing the check and surfacing the error so the OIDC discovery
validates against the router CA.
- Around line 417-440: The getKeycloakIssuerURL function reads the Route once
and can return before spec.host or status.ingress[].host is populated; change it
to poll until a host is available (or a timeout/context deadline is reached). In
getKeycloakIssuerURL use a retry loop (e.g., wait.PollImmediate or simple
time.Ticker) that repeatedly calls client.Get for the Route
(keycloakRouteName/keycloakNamespace), each iteration checking route.Spec.Host
and falling back to route.Status.Ingress[*].Host, and only return the formatted
issuer URL once a non-empty host is found; if the loop times out return a clear
error indicating the host was never assigned. Ensure you reference
getKeycloakIssuerURL, keycloakRouteName, keycloakNamespace and defaultRealm in
the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1439d0d5-16ee-429f-b78d-8f4b49691f60

📥 Commits

Reviewing files that changed from the base of the PR and between 38f7b17 and ecf40c1.

📒 Files selected for processing (5)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go

Comment thread test/e2e/v2/internal/env_vars.go
Comment thread test/e2e/v2/lifecycle/azure.go
Comment thread test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
Comment thread test/e2e/v2/util/keycloak.go Outdated
Comment thread test/e2e/v2/util/keycloak.go Outdated
Comment thread test/e2e/v2/util/keycloak.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/v2/util/keycloak.go (1)

157-177: 💤 Low value

Consider using json.Marshal for JSON construction.

The JSON strings are built via string concatenation, which could break if consoleRedirectURI contains characters requiring JSON escaping (e.g., quotes, backslashes). While the current caller passes a controlled URL, using json.Marshal would be more robust.

Example approach
type clientConfig struct {
    ClientID    string   `json:"clientId"`
    Enabled     bool     `json:"enabled"`
    // ... other fields
}
cfg := clientConfig{ClientID: defaultConsoleClientID, ...}
jsonBytes, _ := json.Marshal(cfg)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/util/keycloak.go` around lines 157 - 177, In buildSetupConfigMap
replace the hand-assembled JSON string concatenation for cliClientJSON and
consoleClientJSON with strongly-typed structs (e.g., clientConfig with fields
ClientID, Enabled, PublicClient, Secret, RedirectUris,
DirectAccessGrantsEnabled, DefaultClientScopes, Attributes) and use json.Marshal
to produce the JSON strings so values like consoleSecret and consoleRedirectURI
are properly escaped; populate the Attributes map for "access.token.lifespan"
and set RedirectUris as a []string, marshal to []byte and convert to string to
assign to the ConfigMap data entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/v2/util/keycloak.go`:
- Around line 157-177: In buildSetupConfigMap replace the hand-assembled JSON
string concatenation for cliClientJSON and consoleClientJSON with strongly-typed
structs (e.g., clientConfig with fields ClientID, Enabled, PublicClient, Secret,
RedirectUris, DirectAccessGrantsEnabled, DefaultClientScopes, Attributes) and
use json.Marshal to produce the JSON strings so values like consoleSecret and
consoleRedirectURI are properly escaped; populate the Attributes map for
"access.token.lifespan" and set RedirectUris as a []string, marshal to []byte
and convert to string to assign to the ConfigMap data entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 64c1f89f-ce0b-4d5e-9a08-23951ba1c475

📥 Commits

Reviewing files that changed from the base of the PR and between ecf40c1 and 136b816.

📒 Files selected for processing (5)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/lifecycle/azure.go

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/lifecycle/azure.go`:
- Around line 237-260: The ConfigMap and Secret names (caCM, consoleSecret) are
hardcoded and must be scoped to the HostedCluster: derive their Name using the
test's `name` (e.g. fmt.Sprintf("%s-oidc-ca", name) and "%s-console-secret") and
set those names on the corev1.ConfigMap (caCM) and corev1.Secret
(consoleSecret), pass those derived names into `ExtOIDCConfig`, and add an
OwnerReference to `hc` on both resources before calling
`createOrUpdateResource(ctx, cl, ...)` so they are unique per HostedCluster and
are garbage-collected with the cluster; apply the same changes to the analogous
resources created around lines 264-275.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d2f76272-3893-40c9-8531-ec2b9495ee63

📥 Commits

Reviewing files that changed from the base of the PR and between 4eaad18 and f1e32ba.

📒 Files selected for processing (6)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/cmd/create-guests/main.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/util/keycloak.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go

Comment on lines +237 to +260
caCM := &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: "oidc-ca",
Namespace: namespace,
},
Data: map[string]string{
"ca-bundle.crt": string(kcConfig.CABundle),
},
}
if err := createOrUpdateResource(ctx, cl, caCM); err != nil {
return fmt.Errorf("creating OIDC CA configmap: %w", err)
}

consoleSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: "console-secret",
Namespace: namespace,
},
Type: corev1.SecretTypeOpaque,
StringData: map[string]string{
"clientSecret": kcConfig.ConsoleClientSecret,
},
}
if err := createOrUpdateResource(ctx, cl, consoleSecret); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the OIDC ConfigMap/Secret to the HostedCluster.

These names are hardcoded inside the shared HostedCluster namespace, so another external-oidc run can upsert the same oidc-ca / console-secret objects and break an already-running cluster's auth wiring. Please derive the names from name and feed those names into ExtOIDCConfig; adding an owner reference to hc would also let destroy clean them up automatically.

Suggested change
+	caConfigMapName := fmt.Sprintf("%s-oidc-ca", name)
+	consoleSecretName := fmt.Sprintf("%s-console-secret", name)
+
 	caCM := &corev1.ConfigMap{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      "oidc-ca",
+			Name:      caConfigMapName,
 			Namespace: namespace,
 		},
 		Data: map[string]string{
 			"ca-bundle.crt": string(kcConfig.CABundle),
 		},
 	}
@@
 	consoleSecret := &corev1.Secret{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      "console-secret",
+			Name:      consoleSecretName,
 			Namespace: namespace,
 		},
@@
 	extOIDCConfig := &e2eutil.ExtOIDCConfig{
@@
-		ConsoleClientSecretName:  "console-secret",
+		ConsoleClientSecretName:  consoleSecretName,
@@
-		IssuerCAConfigmapName:    "oidc-ca",
+		IssuerCAConfigmapName:    caConfigMapName,
 		TestUsers:                kcConfig.TestUsers,
 	}

Also applies to: 264-275

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/lifecycle/azure.go` around lines 237 - 260, The ConfigMap and
Secret names (caCM, consoleSecret) are hardcoded and must be scoped to the
HostedCluster: derive their Name using the test's `name` (e.g.
fmt.Sprintf("%s-oidc-ca", name) and "%s-console-secret") and set those names on
the corev1.ConfigMap (caCM) and corev1.Secret (consoleSecret), pass those
derived names into `ExtOIDCConfig`, and add an OwnerReference to `hc` on both
resources before calling `createOrUpdateResource(ctx, cl, ...)` so they are
unique per HostedCluster and are garbage-collected with the cluster; apply the
same changes to the analogous resources created around lines 264-275.

@bryan-cox
Copy link
Copy Markdown
Member Author

/pj-rehearse pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox bryan-cox force-pushed the CNTRLPLANE-3271 branch 2 times, most recently from dd7dc57 to 7db41d2 Compare June 5, 2026 11:09
@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox bryan-cox force-pushed the CNTRLPLANE-3271 branch 2 times, most recently from aea8371 to 6cb9894 Compare June 5, 2026 11:32
@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

Add platform-agnostic External OIDC testing infrastructure and tests
for the v2 e2e framework, using Keycloak as the identity provider.

Changes:
- Widen OIDC utility function signatures from *testing.T to testing.TB
  for Ginkgo v2 compatibility (backward compatible)
- Add Keycloak deployment utility (test/e2e/v2/util/keycloak.go) that
  deploys Keycloak on any management cluster with OIDC clients, test
  users, group mappings, and CA bundle extraction
- Add 'external-oidc' cluster variant to the Azure v2 lifecycle that
  deploys Keycloak in PostCreate and patches the HostedCluster with
  OIDC authentication config
- Add External OIDC validation tests covering: cluster OIDC config,
  OAuth server not deployed, KAS JWT authenticator configuration, and
  Keycloak token authentication with claim mapping verification
- Register E2E_EXTERNAL_OIDC_CA_BUNDLE_FILE and
  E2E_EXTERNAL_OIDC_TEST_USERS environment variables

Tests are platform-agnostic and skip based on authentication type,
not platform. Other platforms can reuse the Keycloak utility and test
file by adding an external-oidc variant to their lifecycle config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryan-cox
Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@bryan-cox: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure-v2-self-managed 6d5b904 link false /test e2e-azure-v2-self-managed

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

hypershift-jira-solve-ci Bot commented Jun 6, 2026

Confirmed: Only the console operator is unhealthy. All other operators (including authentication) are Available=True, Degraded=False. This confirms the root cause chain:

  1. The create-guests test binary deploys Keycloak on the management cluster and patches the external-oidc HostedCluster with External OIDC config pointing to the management cluster's Keycloak Route
  2. The console pod (running on hosted cluster worker nodes) tries to initialize its OIDC authenticator by contacting the Keycloak OIDC discovery endpoint on the management cluster
  3. The console pod cannot reach the Keycloak server on the management cluster from the hosted cluster's worker nodes (different Azure VNet, no VNet peering configured for this test)
  4. The authenticator init times out after 5 minutes → console pod crashes → restartCount reaches 8
  5. Console operator reports Degraded=True → ClusterVersion remains at VersionState=Partial with Failing: cluster operator console is not available
  6. The create-guests binary times out waiting for the cluster to reach Completed → the CI step fails

Now I have all the evidence needed to write the report.

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed
  • Build ID: 2063007712131682304
  • Target: e2e-azure-v2-self-managed
  • PR: #8674CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework
  • Failed Step: e2e-azure-v2-self-managed-hypershift-azure-create-selfmanaged-guests (pre phase, 53m56s)

Test Failure Analysis

Error

ERROR: version rollout failed for external-oidc-4dfe0ebece (external-oidc): timed out waiting for clusters/external-oidc-4dfe0ebece: context deadline exceeded

Root cause in console pod (restartCount: 8, exitCode: 255):
F0605 22:45:49.118873  1 authoptions.go:332] Error initializing authenticator: failed to setup an async cache - caching func returned error: context deadline exceeded

Summary

The external-oidc hosted cluster's version rollout never completed because the console operator was the only cluster operator that remained degraded. The console pods (running on hosted cluster worker nodes) crash-loop because they cannot reach the Keycloak OIDC issuer URL (https://keycloak-keycloak.apps.4dfe0ebece-mgmt.hcp-sm-azure.azure.devcluster.openshift.com/realms/master) which is exposed as a Route on the management cluster. The hosted cluster worker nodes are in a separate Azure VNet without peering/connectivity to the management cluster's ingress, causing the authenticator initialization to time out after exactly 5 minutes on every attempt. All 5 other guest clusters (public, private, oauth-lb, upgrade, autoscaling) completed successfully — only the new external-oidc cluster failed, directly attributable to the new External OIDC e2e test code introduced by this PR.

Root Cause

The failure is a network connectivity issue between the hosted cluster worker nodes and the management cluster's Keycloak OIDC provider, introduced by PR #8674's new External OIDC e2e test.

Detailed chain of events:

  1. Keycloak deployed on management cluster (21:54:20): The create-guests binary deploys Keycloak in the keycloak namespace on the management cluster and creates a Route at https://keycloak-keycloak.apps.4dfe0ebece-mgmt.hcp-sm-azure.azure.devcluster.openshift.com/realms/master.

  2. HostedCluster patched with External OIDC (21:57:25): The HostedCluster external-oidc-4dfe0ebece is patched with spec.configuration.authentication containing an OIDC provider pointing to the management cluster's Keycloak Route, with issuerCertificateAuthority: oidc-ca configmap and clientSecret: console-secret.

  3. OIDC verification succeeds from management cluster: The verifyOIDCEndpoint() function in keycloak.go validates the Keycloak endpoint, but this check runs from the create-guests binary pod which is on the management cluster — it has direct access to the management cluster's Route.

  4. Console pods fail from hosted cluster (22:05:40 onward): The console pods are deployed on the hosted cluster's worker nodes in a different Azure VNet. When they try to initialize the OIDC authenticator by fetching the discovery document from the management cluster's Keycloak Route, the connection times out after 5 minutes because:

    • The hosted cluster VNet has no peering to the management cluster VNet
    • The management cluster's ingress Route may not be publicly routable from the hosted cluster's egress path
    • No Azure Private Link or Private Endpoint is configured between the two VNets for this traffic
  5. Console operator degraded → ClusterVersion stuck: With 0 console replicas ready and the Route returning 503, the console ClusterOperator reports Available=False, Degraded=True. The ClusterVersion operator cannot complete the rollout: "Unable to apply ... the cluster operator console is not available", keeping VersionState=Partial indefinitely.

  6. Timeout at 53m56s: The create-guests binary eventually times out waiting for all clusters to reach VersionState=Completed.

Key evidence that this is network-specific:

  • The kube-apiserver (which runs in the management cluster's control plane namespace clusters-external-oidc-4dfe0ebece) can reach Keycloak — the authentication ClusterOperator is Available=True, confirming the OIDC config itself is correct.
  • Only the console pods (which run on hosted cluster worker nodes in a separate VNet) fail.
  • All other cluster operators (31+) are healthy — only console is affected.
Recommendations
  1. Ensure network path from hosted cluster workers to Keycloak: The Keycloak Route on the management cluster must be reachable from the hosted cluster worker nodes. Options include:

    • Configure Azure VNet peering between the management cluster VNet and the hosted cluster VNet
    • Expose Keycloak via a LoadBalancer Service with a public IP instead of a Route (if the hosted cluster has public egress)
    • Use Azure Private DNS + Private Endpoint to make the management cluster's ingress accessible from the hosted VNet
  2. Deploy Keycloak on the hosted cluster instead: If the test architecture permits, deploy Keycloak inside the hosted cluster itself (or use the hosted cluster's ingress domain), so the console pods can reach it without cross-VNet connectivity.

  3. Add a connectivity pre-check from the hosted cluster: After patching the HostedCluster with OIDC config, verify that the Keycloak endpoint is reachable from a pod running on the hosted cluster worker nodes (not from the management cluster), and fail fast with a clear error rather than waiting for the full 54-minute timeout.

  4. Consider increasing the rollout timeout or treating console degraded as non-fatal during initial rollout: If the OIDC provider connectivity is expected to take time to establish (e.g., DNS propagation), the test framework could wait longer or handle the console operator's OIDC-related degradation as a separate post-rollout validation step.

Evidence
Evidence Detail
Failed step e2e-azure-v2-self-managed-hypershift-azure-create-selfmanaged-guests (pre phase, 53m56s)
Failed cluster external-oidc-4dfe0ebece — the only cluster that did not complete rollout
Successful clusters public, private, oauth-lb, upgrade, autoscaling — all 5 completed successfully
Console pod crash restartCount: 8, exitCode: 255, reason: Error on both console pods
Console fatal error authoptions.go:332: Error initializing authenticator: failed to setup an async cache - caching func returned error: context deadline exceeded
Timeout duration Exactly 5 minutes per crash cycle (22:40:49→22:45:49, 22:41:46→22:46:46)
Console operator status Available=False, Degraded=TrueRouteHealthDegraded: route not yet available, returns 503 Service Unavailable
ClusterVersion status Failing=TrueCluster operator console is not available; Progressing=True; state: Partial
OIDC issuer URL https://keycloak-keycloak.apps.4dfe0ebece-mgmt.hcp-sm-azure.azure.devcluster.openshift.com/realms/master (management cluster Route)
Authentication operator Available=True, Degraded=False — confirms OIDC config is valid, kube-apiserver (on mgmt cluster) can reach Keycloak
All other ClusterOperators Healthy (Available=True, Degraded=False) — only console is affected
Network architecture Hosted cluster workers in separate Azure VNet from management cluster; no VNet peering configured

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants