Skip to content

feat(azure): add spec-level validation for workload identity config#8657

Open
chdeshpa-hue wants to merge 1 commit into
openshift:mainfrom
chdeshpa-hue:feat/rfe-9351-azure-wi-spec-validation
Open

feat(azure): add spec-level validation for workload identity config#8657
chdeshpa-hue wants to merge 1 commit into
openshift:mainfrom
chdeshpa-hue:feat/rfe-9351-azure-wi-spec-validation

Conversation

@chdeshpa-hue
Copy link
Copy Markdown

@chdeshpa-hue chdeshpa-hue commented Jun 3, 2026

Summary

  • Azure HCP clusters using WorkloadIdentities auth enter an opaque degraded state when hypershift create iam azure is skipped or fails silently — the operator reconciles with empty client IDs producing confusing errors
  • Adds validateWorkloadIdentityConfig to ReconcileCredentials that checks all 7 required clientID fields before proceeding, surfacing a descriptive error via PlatformCredentialsFound condition
  • Phase 2 will promote ValidAzureWorkloadIdentity to a first-class condition on HostedCluster status

Test plan

  • Unit tests: 5 new cases covering non-Azure (no-op), ManagedIdentities (no-op), nil WorkloadIdentities, partial clientIDs, complete clientIDs
  • Regression: existing TestReconcileCredentials suite passes (fixtures updated to include all required fields)
  • go vet / gofmt clean
  • CI: e2e on Azure HCP with valid WI config (no behavioral change expected)
  • CI: e2e on Azure HCP with intentionally missing clientID (verify early clear error)

Ref: RFE-9351

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features

    • Added validation for Azure Workload Identity configurations to ensure all required managed identity client IDs are properly configured before credential reconciliation.
  • Tests

    • Added test coverage for Azure Workload Identity configuration validation across different scenarios.

Azure HCP clusters using WorkloadIdentities authentication can enter
an opaque degraded state when `hypershift create iam azure` is skipped
or fails silently. The operator reconciles credentials with empty client
IDs, producing confusing errors.

This adds validateWorkloadIdentityConfig to ReconcileCredentials that
checks all 7 required clientID fields are present before proceeding.
Surfaces a descriptive error via PlatformCredentialsFound condition
listing exactly which fields are missing.

Phase 2 will wire ValidAzureWorkloadIdentity as a first-class condition
on HostedCluster status.

Ref: RFE-9351
Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

📝 Walkthrough

Walkthrough

This pull request introduces validation for Azure Workload Identity configuration in self-managed Azure HostedClusters. It defines a new ValidAzureWorkloadIdentity condition type with associated reason constants in the API schema, implements a validateWorkloadIdentityConfig helper function that inspects whether required Workload Identity client IDs are present, integrates this validation into the ReconcileCredentials flow to fail early if configuration is incomplete, and provides comprehensive test coverage including updated test fixtures and a new table-driven test for the validation logic.

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding spec-level validation for Azure Workload Identity configuration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR uses standard Go testing (table-driven tests with *testing.T), not Ginkgo. The custom check is specific to Ginkgo test declarations (It(), Describe(), etc.), which are not used in these changes.
Test Structure And Quality ✅ Passed PR adds standard Go unit tests using testing.T, not Ginkgo tests. Custom check for Ginkgo test structure is not applicable to this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds only validation logic for Azure workload identity config in controller code. No deployment manifests, scheduling constraints, affinity rules, or topology-dependent features are introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR; only standard Go unit tests in azure_test.go. Check applies only to Ginkgo e2e tests.
No-Weak-Crypto ✅ Passed The PR adds Azure workload identity validation and condition constants with no cryptographic operations, weak crypto patterns, custom crypto implementations, or non-constant-time secret comparisons.
Container-Privileges ✅ Passed PR contains only Go source code for Azure WorkloadIdentity validation; no Kubernetes manifests or container security configurations are present to flag.
No-Sensitive-Data-In-Logs ✅ Passed The PR adds Azure Workload Identity validation that reports missing configuration field names via conditions, not sensitive values like client IDs, tokens, or credentials.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from bryan-cox and enxebre June 3, 2026 10:36
@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 3, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 3, 2026

Hi @chdeshpa-hue. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chdeshpa-hue
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform and removed do-not-merge/needs-area labels Jun 3, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go (2)

553-553: ⚡ Quick win

Use the "When ... it should ..." naming format for the new test cases.

The new subtest names use plain descriptions (e.g., non-Azure platform returns nil, WorkloadIdentities with missing clientIDs returns incomplete) rather than the required Gherkin style.

♻️ Suggested renames
-			name: "non-Azure platform returns nil",
+			name: "When platform is not Azure it should return nil",
-			name: "Azure with ManagedIdentities auth returns nil",
+			name: "When Azure auth is ManagedIdentities it should return nil",
-			name: "WorkloadIdentities auth with nil WorkloadIdentities returns not configured",
+			name: "When WorkloadIdentities auth has a nil WorkloadIdentities block it should return not-configured",
-			name: "WorkloadIdentities with missing clientIDs returns incomplete",
+			name: "When WorkloadIdentities is missing required clientIDs it should return incomplete",
-			name: "WorkloadIdentities with all clientIDs returns valid",
+			name: "When WorkloadIdentities has all clientIDs it should return valid",

As per coding guidelines: "Always use 'When ... it should ...' format for describing test cases when creating unit tests".

Also applies to: 564-564, 580-580, 600-600, 629-629

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`
at line 553, Rename the failing subtest titles to follow the Gherkin-style "When
... it should ..." pattern; specifically update the t.Run strings that currently
read like "non-Azure platform returns nil", "WorkloadIdentities with missing
clientIDs returns incomplete" and the other plain descriptions (also at the
other occurrences flagged around lines 564, 580, 600, 629) to descriptive
phrases such as "When platform is non-Azure it should return nil" and "When
WorkloadIdentities have missing clientIDs it should be incomplete" so the tests
in azure_test.go use the required "When ... it should ..." format; keep the
existing test bodies and only change the subtest name strings used in t.Run
calls (e.g., the subtests where the helper functions under test are invoked).

626-626: 💤 Low value

Brittle assertion on missing-field ordering.

expectContains: "ingress, file" couples the test to the exact iteration order of the missing-field message. If the helper ever reorders its checks (e.g., alphabetizes or changes struct traversal), this passes/fails for unrelated reasons. Consider asserting each missing field independently (e.g., ContainSubstring("ingress") and ContainSubstring("file")) so ordering changes don't break the test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`
at line 626, The test is brittle because it asserts the exact ordering via
expectContains: "ingress, file"; update the assertion to check each missing
field independently instead of an ordered string. Locate the table case using
expectContains in azure_test.go and replace the single ordered expectation with
two independent substring checks (e.g., assert the error/message contains
"ingress" and contains "file") or change expectContains to be a slice of
expected substrings and loop to assert each with ContainSubstring; keep the rest
of the test and helper behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go`:
- Around line 529-549: The validation currently checks equality to empty string
which permits whitespace-only client IDs; update each check (e.g.,
wi.CloudProvider.ClientID, wi.NodePoolManagement.ClientID, wi.Ingress.ClientID,
wi.ImageRegistry.ClientID, wi.Disk.ClientID, wi.File.ClientID,
wi.Network.ClientID) to reject values that are empty after trimming whitespace
(use strings.TrimSpace or equivalent) so that purely whitespace client IDs are
treated as missing and appended to the missing slice.

---

Nitpick comments:
In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go`:
- Line 553: Rename the failing subtest titles to follow the Gherkin-style "When
... it should ..." pattern; specifically update the t.Run strings that currently
read like "non-Azure platform returns nil", "WorkloadIdentities with missing
clientIDs returns incomplete" and the other plain descriptions (also at the
other occurrences flagged around lines 564, 580, 600, 629) to descriptive
phrases such as "When platform is non-Azure it should return nil" and "When
WorkloadIdentities have missing clientIDs it should be incomplete" so the tests
in azure_test.go use the required "When ... it should ..." format; keep the
existing test bodies and only change the subtest name strings used in t.Run
calls (e.g., the subtests where the helper functions under test are invoked).
- Line 626: The test is brittle because it asserts the exact ordering via
expectContains: "ingress, file"; update the assertion to check each missing
field independently instead of an ordered string. Locate the table case using
expectContains in azure_test.go and replace the single ordered expectation with
two independent substring checks (e.g., assert the error/message contains
"ingress" and contains "file") or change expectContains to be a slice of
expected substrings and loop to assert each with ContainSubstring; keep the rest
of the test and helper behavior unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1874c375-7ab8-41a8-8bd6-c6ac03fff72e

📥 Commits

Reviewing files that changed from the base of the PR and between 860b695 and 315b47a.

⛔ Files ignored due to path filters (1)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (3)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go
  • hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure_test.go

Comment on lines +529 to +549
if wi.CloudProvider.ClientID == "" {
missing = append(missing, "cloudProvider")
}
if wi.NodePoolManagement.ClientID == "" {
missing = append(missing, "nodePoolManagement")
}
if wi.Ingress.ClientID == "" {
missing = append(missing, "ingress")
}
if wi.ImageRegistry.ClientID == "" {
missing = append(missing, "imageRegistry")
}
if wi.Disk.ClientID == "" {
missing = append(missing, "disk")
}
if wi.File.ClientID == "" {
missing = append(missing, "file")
}
if wi.Network.ClientID == "" {
missing = append(missing, "network")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject whitespace-only client IDs during WI validation.

clientID == "" allows values like " ", so invalid config can pass this gate and fail later with less actionable errors.

Suggested fix
-	if wi.CloudProvider.ClientID == "" {
+	if strings.TrimSpace(string(wi.CloudProvider.ClientID)) == "" {
 		missing = append(missing, "cloudProvider")
 	}
-	if wi.NodePoolManagement.ClientID == "" {
+	if strings.TrimSpace(string(wi.NodePoolManagement.ClientID)) == "" {
 		missing = append(missing, "nodePoolManagement")
 	}
-	if wi.Ingress.ClientID == "" {
+	if strings.TrimSpace(string(wi.Ingress.ClientID)) == "" {
 		missing = append(missing, "ingress")
 	}
-	if wi.ImageRegistry.ClientID == "" {
+	if strings.TrimSpace(string(wi.ImageRegistry.ClientID)) == "" {
 		missing = append(missing, "imageRegistry")
 	}
-	if wi.Disk.ClientID == "" {
+	if strings.TrimSpace(string(wi.Disk.ClientID)) == "" {
 		missing = append(missing, "disk")
 	}
-	if wi.File.ClientID == "" {
+	if strings.TrimSpace(string(wi.File.ClientID)) == "" {
 		missing = append(missing, "file")
 	}
-	if wi.Network.ClientID == "" {
+	if strings.TrimSpace(string(wi.Network.ClientID)) == "" {
 		missing = append(missing, "network")
 	}
As per coding guidelines, "`**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}`: Validate at trust boundaries with allow-lists, not deny-lists".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if wi.CloudProvider.ClientID == "" {
missing = append(missing, "cloudProvider")
}
if wi.NodePoolManagement.ClientID == "" {
missing = append(missing, "nodePoolManagement")
}
if wi.Ingress.ClientID == "" {
missing = append(missing, "ingress")
}
if wi.ImageRegistry.ClientID == "" {
missing = append(missing, "imageRegistry")
}
if wi.Disk.ClientID == "" {
missing = append(missing, "disk")
}
if wi.File.ClientID == "" {
missing = append(missing, "file")
}
if wi.Network.ClientID == "" {
missing = append(missing, "network")
}
if strings.TrimSpace(string(wi.CloudProvider.ClientID)) == "" {
missing = append(missing, "cloudProvider")
}
if strings.TrimSpace(string(wi.NodePoolManagement.ClientID)) == "" {
missing = append(missing, "nodePoolManagement")
}
if strings.TrimSpace(string(wi.Ingress.ClientID)) == "" {
missing = append(missing, "ingress")
}
if strings.TrimSpace(string(wi.ImageRegistry.ClientID)) == "" {
missing = append(missing, "imageRegistry")
}
if strings.TrimSpace(string(wi.Disk.ClientID)) == "" {
missing = append(missing, "disk")
}
if strings.TrimSpace(string(wi.File.ClientID)) == "" {
missing = append(missing, "file")
}
if strings.TrimSpace(string(wi.Network.ClientID)) == "" {
missing = append(missing, "network")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go`
around lines 529 - 549, The validation currently checks equality to empty string
which permits whitespace-only client IDs; update each check (e.g.,
wi.CloudProvider.ClientID, wi.NodePoolManagement.ClientID, wi.Ingress.ClientID,
wi.ImageRegistry.ClientID, wi.Disk.ClientID, wi.File.ClientID,
wi.Network.ClientID) to reject values that are empty after trimming whitespace
(use strings.TrimSpace or equivalent) so that purely whitespace client IDs are
treated as missing and appended to the missing slice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant