feat(azure): add spec-level validation for workload identity config by chdeshpa-hue · Pull Request #8657 · openshift/hypershift

chdeshpa-hue · 2026-06-03T10:36:26Z

Summary

Azure HCP clusters using WorkloadIdentities auth enter an opaque degraded state when hypershift create iam azure is skipped or fails silently — the operator reconciles with empty client IDs producing confusing errors
Adds validateWorkloadIdentityConfig to ReconcileCredentials that checks all 7 required clientID fields before proceeding, surfacing a descriptive error via PlatformCredentialsFound condition
Phase 2 will promote ValidAzureWorkloadIdentity to a first-class condition on HostedCluster status

Test plan

Unit tests: 5 new cases covering non-Azure (no-op), ManagedIdentities (no-op), nil WorkloadIdentities, partial clientIDs, complete clientIDs
Regression: existing TestReconcileCredentials suite passes (fixtures updated to include all required fields)
go vet / gofmt clean
CI: e2e on Azure HCP with valid WI config (no behavioral change expected)
CI: e2e on Azure HCP with intentionally missing clientID (verify early clear error)

Ref: RFE-9351

Made with Cursor

Summary by CodeRabbit

Release Notes

New Features
- Added validation for Azure Workload Identity configurations to ensure all required managed identity client IDs are properly configured before credential reconciliation.
Tests
- Added test coverage for Azure Workload Identity configuration validation across different scenarios.

Azure HCP clusters using WorkloadIdentities authentication can enter an opaque degraded state when `hypershift create iam azure` is skipped or fails silently. The operator reconciles credentials with empty client IDs, producing confusing errors. This adds validateWorkloadIdentityConfig to ReconcileCredentials that checks all 7 required clientID fields are present before proceeding. Surfaces a descriptive error via PlatformCredentialsFound condition listing exactly which fields are missing. Phase 2 will wire ValidAzureWorkloadIdentity as a first-class condition on HostedCluster status. Ref: RFE-9351 Co-authored-by: Cursor <cursoragent@cursor.com>

openshift-merge-bot · 2026-06-03T10:36:36Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

coderabbitai · 2026-06-03T10:36:42Z

📝 Walkthrough

Walkthrough

This pull request introduces validation for Azure Workload Identity configuration in self-managed Azure HostedClusters. It defines a new ValidAzureWorkloadIdentity condition type with associated reason constants in the API schema, implements a validateWorkloadIdentityConfig helper function that inspects whether required Workload Identity client IDs are present, integrates this validation into the ReconcileCredentials flow to fail early if configuration is incomplete, and provides comprehensive test coverage including updated test fixtures and a new table-driven test for the validation logic.

🚥 Pre-merge checks | ✅ 11

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main change: adding spec-level validation for Azure Workload Identity configuration.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR uses standard Go testing (table-driven tests with *testing.T), not Ginkgo. The custom check is specific to Ginkgo test declarations (It(), Describe(), etc.), which are not used in these changes.
Test Structure And Quality	✅ Passed	PR adds standard Go unit tests using testing.T, not Ginkgo tests. Custom check for Ginkgo test structure is not applicable to this PR.
Topology-Aware Scheduling Compatibility	✅ Passed	PR adds only validation logic for Azure workload identity config in controller code. No deployment manifests, scheduling constraints, affinity rules, or topology-dependent features are introduced.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests are added in this PR; only standard Go unit tests in azure_test.go. Check applies only to Ginkgo e2e tests.
No-Weak-Crypto	✅ Passed	The PR adds Azure workload identity validation and condition constants with no cryptographic operations, weak crypto patterns, custom crypto implementations, or non-constant-time secret comparisons.
Container-Privileges	✅ Passed	PR contains only Go source code for Azure WorkloadIdentity validation; no Kubernetes manifests or container security configurations are present to flag.
No-Sensitive-Data-In-Logs	✅ Passed	The PR adds Azure Workload Identity validation that reports missing configuration field names via conditions, not sensitive values like client IDs, tokens, or credentials.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-03T10:36:59Z

Hi @chdeshpa-hue. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2026-06-03T10:37:02Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chdeshpa-hue
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS
api/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go (2)

553-553: ⚡ Quick win

Use the "When ... it should ..." naming format for the new test cases.

The new subtest names use plain descriptions (e.g., non-Azure platform returns nil, WorkloadIdentities with missing clientIDs returns incomplete) rather than the required Gherkin style.

♻️ Suggested renames

-			name: "non-Azure platform returns nil",
+			name: "When platform is not Azure it should return nil",

-			name: "Azure with ManagedIdentities auth returns nil",
+			name: "When Azure auth is ManagedIdentities it should return nil",

-			name: "WorkloadIdentities auth with nil WorkloadIdentities returns not configured",
+			name: "When WorkloadIdentities auth has a nil WorkloadIdentities block it should return not-configured",

-			name: "WorkloadIdentities with missing clientIDs returns incomplete",
+			name: "When WorkloadIdentities is missing required clientIDs it should return incomplete",

-			name: "WorkloadIdentities with all clientIDs returns valid",
+			name: "When WorkloadIdentities has all clientIDs it should return valid",

As per coding guidelines: "Always use 'When ... it should ...' format for describing test cases when creating unit tests".

Also applies to: 564-564, 580-580, 600-600, 629-629

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`
at line 553, Rename the failing subtest titles to follow the Gherkin-style "When
... it should ..." pattern; specifically update the t.Run strings that currently
read like "non-Azure platform returns nil", "WorkloadIdentities with missing
clientIDs returns incomplete" and the other plain descriptions (also at the
other occurrences flagged around lines 564, 580, 600, 629) to descriptive
phrases such as "When platform is non-Azure it should return nil" and "When
WorkloadIdentities have missing clientIDs it should be incomplete" so the tests
in azure_test.go use the required "When ... it should ..." format; keep the
existing test bodies and only change the subtest name strings used in t.Run
calls (e.g., the subtests where the helper functions under test are invoked).

626-626: 💤 Low value

Brittle assertion on missing-field ordering.

expectContains: "ingress, file" couples the test to the exact iteration order of the missing-field message. If the helper ever reorders its checks (e.g., alphabetizes or changes struct traversal), this passes/fails for unrelated reasons. Consider asserting each missing field independently (e.g., ContainSubstring("ingress") and ContainSubstring("file")) so ordering changes don't break the test.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`
at line 626, The test is brittle because it asserts the exact ordering via
expectContains: "ingress, file"; update the assertion to check each missing
field independently instead of an ordered string. Locate the table case using
expectContains in azure_test.go and replace the single ordered expectation with
two independent substring checks (e.g., assert the error/message contains
"ingress" and contains "file") or change expectContains to be a slice of
expected substrings and loop to assert each with ContainSubstring; keep the rest
of the test and helper behavior unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go`:
- Around line 529-549: The validation currently checks equality to empty string
which permits whitespace-only client IDs; update each check (e.g.,
wi.CloudProvider.ClientID, wi.NodePoolManagement.ClientID, wi.Ingress.ClientID,
wi.ImageRegistry.ClientID, wi.Disk.ClientID, wi.File.ClientID,
wi.Network.ClientID) to reject values that are empty after trimming whitespace
(use strings.TrimSpace or equivalent) so that purely whitespace client IDs are
treated as missing and appended to the missing slice.

---

Nitpick comments:
In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`:
- Line 553: Rename the failing subtest titles to follow the Gherkin-style "When
... it should ..." pattern; specifically update the t.Run strings that currently
read like "non-Azure platform returns nil", "WorkloadIdentities with missing
clientIDs returns incomplete" and the other plain descriptions (also at the
other occurrences flagged around lines 564, 580, 600, 629) to descriptive
phrases such as "When platform is non-Azure it should return nil" and "When
WorkloadIdentities have missing clientIDs it should be incomplete" so the tests
in azure_test.go use the required "When ... it should ..." format; keep the
existing test bodies and only change the subtest name strings used in t.Run
calls (e.g., the subtests where the helper functions under test are invoked).
- Line 626: The test is brittle because it asserts the exact ordering via
expectContains: "ingress, file"; update the assertion to check each missing
field independently instead of an ordered string. Locate the table case using
expectContains in azure_test.go and replace the single ordered expectation with
two independent substring checks (e.g., assert the error/message contains
"ingress" and contains "file") or change expectContains to be a slice of
expected substrings and loop to assert each with ContainSubstring; keep the rest
of the test and helper behavior unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1874c375-7ab8-41a8-8bd6-c6ac03fff72e

📥 Commits

Reviewing files that changed from the base of the PR and between 860b695 and 315b47a.

⛔ Files ignored due to path filters (1)

vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**

📒 Files selected for processing (3)

api/hypershift/v1beta1/hostedcluster_conditions.go
hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go
hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go

coderabbitai · 2026-06-03T10:41:01Z

+	if wi.CloudProvider.ClientID == "" {
+		missing = append(missing, "cloudProvider")
+	}
+	if wi.NodePoolManagement.ClientID == "" {
+		missing = append(missing, "nodePoolManagement")
+	}
+	if wi.Ingress.ClientID == "" {
+		missing = append(missing, "ingress")
+	}
+	if wi.ImageRegistry.ClientID == "" {
+		missing = append(missing, "imageRegistry")
+	}
+	if wi.Disk.ClientID == "" {
+		missing = append(missing, "disk")
+	}
+	if wi.File.ClientID == "" {
+		missing = append(missing, "file")
+	}
+	if wi.Network.ClientID == "" {
+		missing = append(missing, "network")
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject whitespace-only client IDs during WI validation.

clientID == "" allows values like " ", so invalid config can pass this gate and fail later with less actionable errors.

Suggested fix

- if wi.CloudProvider.ClientID == "" { + if strings.TrimSpace(string(wi.CloudProvider.ClientID)) == "" { missing = append(missing, "cloudProvider") } - if wi.NodePoolManagement.ClientID == "" { + if strings.TrimSpace(string(wi.NodePoolManagement.ClientID)) == "" { missing = append(missing, "nodePoolManagement") } - if wi.Ingress.ClientID == "" { + if strings.TrimSpace(string(wi.Ingress.ClientID)) == "" { missing = append(missing, "ingress") } - if wi.ImageRegistry.ClientID == "" { + if strings.TrimSpace(string(wi.ImageRegistry.ClientID)) == "" { missing = append(missing, "imageRegistry") } - if wi.Disk.ClientID == "" { + if strings.TrimSpace(string(wi.Disk.ClientID)) == "" { missing = append(missing, "disk") } - if wi.File.ClientID == "" { + if strings.TrimSpace(string(wi.File.ClientID)) == "" { missing = append(missing, "file") } - if wi.Network.ClientID == "" { + if strings.TrimSpace(string(wi.Network.ClientID)) == "" { missing = append(missing, "network") }

As per coding guidelines, "`**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}`: Validate at trust boundaries with allow-lists, not deny-lists".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if wi.CloudProvider.ClientID == "" {

missing = append(missing, "cloudProvider")

}

if wi.NodePoolManagement.ClientID == "" {

missing = append(missing, "nodePoolManagement")

}

if wi.Ingress.ClientID == "" {

missing = append(missing, "ingress")

}

if wi.ImageRegistry.ClientID == "" {

missing = append(missing, "imageRegistry")

}

if wi.Disk.ClientID == "" {

missing = append(missing, "disk")

}

if wi.File.ClientID == "" {

missing = append(missing, "file")

}

if wi.Network.ClientID == "" {

missing = append(missing, "network")

}

if strings.TrimSpace(string(wi.CloudProvider.ClientID)) == "" {

missing = append(missing, "cloudProvider")

}

if strings.TrimSpace(string(wi.NodePoolManagement.ClientID)) == "" {

missing = append(missing, "nodePoolManagement")

}

if strings.TrimSpace(string(wi.Ingress.ClientID)) == "" {

missing = append(missing, "ingress")

}

if strings.TrimSpace(string(wi.ImageRegistry.ClientID)) == "" {

missing = append(missing, "imageRegistry")

}

if strings.TrimSpace(string(wi.Disk.ClientID)) == "" {

missing = append(missing, "disk")

}

if strings.TrimSpace(string(wi.File.ClientID)) == "" {

missing = append(missing, "file")

}

if strings.TrimSpace(string(wi.Network.ClientID)) == "" {

missing = append(missing, "network")

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go` around lines 529 - 549, The validation currently checks equality to empty string which permits whitespace-only client IDs; update each check (e.g., wi.CloudProvider.ClientID, wi.NodePoolManagement.ClientID, wi.Ingress.ClientID, wi.ImageRegistry.ClientID, wi.Disk.ClientID, wi.File.ClientID, wi.Network.ClientID) to reject values that are empty after trimming whitespace (use strings.TrimSpace or equivalent) so that purely whitespace client IDs are treated as missing and appended to the missing slice.

openshift-ci Bot added the do-not-merge/needs-area label Jun 3, 2026

openshift-ci Bot requested review from bryan-cox and enxebre June 3, 2026 10:36

openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 3, 2026

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(azure): add spec-level validation for workload identity config#8657

feat(azure): add spec-level validation for workload identity config#8657
chdeshpa-hue wants to merge 1 commit into
openshift:mainfrom
chdeshpa-hue:feat/rfe-9351-azure-wi-spec-validation

chdeshpa-hue commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

openshift-merge-bot Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Walkthrough

Uh oh!

openshift-ci Bot commented Jun 3, 2026

Uh oh!

openshift-ci Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chdeshpa-hue commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

openshift-merge-bot Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

openshift-ci Bot commented Jun 3, 2026

Uh oh!

openshift-ci Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

chdeshpa-hue commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading