SSCSI-254: Configurable secret rotation and WIF support for SSCSI by chiragkyal · Pull Request #2012 · openshift/enhancements

chiragkyal · 2026-05-16T19:16:50Z

Summary

This enhancement proposal adds configurable secret rotation and workload identity federation (WIF) support to the OpenShift Secrets Store CSI Driver Operator via the ClusterCSIDriver CR.

Changes

Extends CSIDriverConfigSpec with a new SecretsStore discriminated union
variant containing secretRotation and tokenRequests fields.
The operator will dynamically propagate these settings to:
- The storage.k8s.io/v1 CSIDriver object (requiresRepublish, tokenRequests)
- The driver DaemonSet container args (--enable-secret-rotation, --rotation-poll-interval)
Aligns with upstream Secrets Store CSI Driver v1.6.0 which replaced the internal
rotation controller with kubelet-native requiresRepublish.
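
A minimal sketch of the DaemonSet-argument half of that propagation. Only the two flag names come from this PR; the `RotationConfig` type, its field names, and the choice to omit the interval flag when rotation is disabled are assumptions made for illustration, not the operator's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// RotationConfig is a hypothetical stand-in for the secretRotation settings
// read from ClusterCSIDriver; the real API type lives in openshift/api.
type RotationConfig struct {
	Enabled      bool
	PollInterval time.Duration
}

// defaultRotation mirrors the previously hardcoded values quoted in this PR
// (rotation enabled, 2m poll interval).
var defaultRotation = RotationConfig{Enabled: true, PollInterval: 2 * time.Minute}

// rotationArgs renders the driver container arguments for a rotation config.
// Skipping the interval flag when rotation is disabled is an assumption of
// this sketch.
func rotationArgs(cfg RotationConfig) []string {
	args := []string{fmt.Sprintf("--enable-secret-rotation=%t", cfg.Enabled)}
	if cfg.Enabled {
		args = append(args, fmt.Sprintf("--rotation-poll-interval=%s", cfg.PollInterval))
	}
	return args
}

func main() {
	fmt.Println(rotationArgs(defaultRotation))
	fmt.Println(rotationArgs(RotationConfig{Enabled: false}))
}
```

Note that Go renders the two-minute duration as `2m0s`, which is an equivalent spelling of `2m` for a duration flag.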

Tracking

Jira: https://redhat.atlassian.net/browse/SSCSI-254

/cc @mytreya-rh @dobsonj

openshift-ci · 2026-05-16T19:16:54Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2026-05-16T19:16:54Z

@chiragkyal: This pull request references SSCSI-254 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-05-16T19:17:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign suleymanakbas91 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chiragkyal · 2026-05-18T06:54:35Z

/cc @mytreya-rh @dobsonj

mytreya-rh · 2026-05-28T10:40:35Z

+  workloads that use static secrets, so that the driver does not make unnecessary
+  provider API calls that may count against rate limits.
+
+- As a cluster administrator, I want to configure the rotation polling interval to


I think the actual need is for a larger value, so the story is really about the ability to configure the rotation poll interval.
[out of scope for this PR] We could also explore, in the upstream ecosystem, an issuer/provider-driven refresh of the secret, which could be more efficient than frequent polling.

https://redhat.atlassian.net/browse/RFE-8422
The customer manages approximately 200 secrets per cluster. Continuous polling of the Azure Key Vault for secret updates results in a high number of transactions, leading to unnecessary costs and performance overhead.
Having the ability to control secret rotation behavior would provide better cost efficiency and operational flexibility.

> I think the actual need is for a larger value, so the story is really about the ability to configure the rotation poll interval.

Thanks for the suggestion. Updated the user story.

> [out of scope for this PR] We could also explore, in the upstream ecosystem, an issuer/provider-driven refresh of the secret, which could be more efficient than frequent polling.
>
> https://redhat.atlassian.net/browse/RFE-8422
> The customer manages approximately 200 secrets per cluster. Continuous polling of the Azure Key Vault for secret updates results in a high number of transactions, leading to unnecessary costs and performance overhead.
> Having the ability to control secret rotation behavior would provide better cost efficiency and operational flexibility.

Yeah, agreed! A push method is much more efficient than a pull method.

mytreya-rh · 2026-05-28T11:05:13Z

+**Upgrade**: Clusters upgrading to the new operator version will see no behavior
+change. The operator defaults match the previously hardcoded values
+(`requiresRepublish: true`, `--enable-secret-rotation=true`,
+`--rotation-poll-interval=2m`, no `tokenRequests`). 


We need to read the tokenRequests already set on the CSIDriver by users and merge them with the user settings from ClusterCSIDriver.*
We could also come up with a better way than the suggestion above to let users keep their already configured CSIDriver settings.

*This is needed because some users may already have set the audience for the Azure WIF integration.
The AWS provider was updated later, but the Azure provider has needed the audience for WIF flows for a long time.
Note that user changes to the CSIDriver would not have triggered a reconcile (back to the operator's static CSIDriver manifest), because the hash value in the annotation doesn't change.

Good point. I looked into how the CSIDriver object reconciliation works in library-go's ApplyCSIDriver. The operator's manifest never included tokenRequests or requiresRepublish. So when someone manually patched tokenRequests onto the CSIDriver (for Azure WIF), the operator's desired spec hadn't changed, the hash matched, and reconciliation was a no-op.

I thought about the merge approach ("if ClusterCSIDriver.tokenRequests is empty, preserve whatever's on the CSIDriver"), but it could cause a problem:

1. Before upgrade: the field doesn't exist on ClusterCSIDriver, so the operator preserves whatever is on the CSIDriver.
2. After upgrade, the user sets tokenRequests via ClusterCSIDriver (AWS WIF): these are merged with the tokenRequests already on the CSIDriver (AWS WIF + Azure WIF).
3. The user later removes tokenRequests from ClusterCSIDriver: the merge logic says "preserve existing", so there is no way to actually clear them.

I think we should keep ClusterCSIDriver as the sole source of truth. The operator fully owns and manages the CSIDriver object, and it is not supposed to be edited manually. For users who already have Azure WIF configured manually, we can add a release note telling them to move their tokenRequests into ClusterCSIDriver. This keeps things predictable: both adding and removing tokenRequests work as expected.
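
The pitfall in the scenario above can be reproduced with a small sketch. `mergePreserve` is a hypothetical model of the rejected "preserve when empty" rule, with audiences reduced to plain strings; it is not code from the operator or from library-go.

```go
package main

import "fmt"

// mergePreserve models the rejected rule: an empty configured list keeps
// whatever is already on the CSIDriver; otherwise the two lists are unioned,
// existing entries first.
func mergePreserve(configured, existing []string) []string {
	if len(configured) == 0 {
		return existing
	}
	seen := map[string]bool{}
	var out []string
	for _, list := range [][]string{existing, configured} {
		for _, aud := range list {
			if !seen[aud] {
				seen[aud] = true
				out = append(out, aud)
			}
		}
	}
	return out
}

func main() {
	// Manually patched onto the CSIDriver before the upgrade.
	onCluster := []string{"api://AzureADTokenExchange"}

	// The user adds an AWS audience through ClusterCSIDriver.
	onCluster = mergePreserve([]string{"sts.amazonaws.com"}, onCluster)
	fmt.Println(onCluster)

	// The user removes tokenRequests from ClusterCSIDriver again.
	onCluster = mergePreserve(nil, onCluster)
	fmt.Println(onCluster) // unchanged: nothing can be cleared
}
```

Both print statements show the same two audiences, which is the "no way to clear it" problem in miniature.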

Even if we notify users in the release notes, there could be temporary secret rotation or pull failures between the time the operator is upgraded and the time the ClusterCSIDriver is configured.
Another risk is that the release notes are overlooked, causing an outage because the ClusterCSIDriver was never configured (especially in auto-upgrade scenarios).

Can we add a parameter in ClusterCSIDriver in the secretsStore section which would control whether or not we will update the CSIDriver?
The default would be to keep existing tokenRequests on the CSIDriver.
The user can then update the ClusterCSIDriver with the needed audience, as well as enable the parameter.
When the parameter is set to not overwrite tokenRequests, and the tokenRequests is not empty, we could degrade or set the relevant status condition to alert the user.

> Can we add a parameter in ClusterCSIDriver in the secretsStore section which would control whether or not we will update the CSIDriver?

Whatever parameter we add will be part of the new OpenShift release. Operators deployed on older OpenShift releases won't have this parameter, so the apiserver will either ignore the field or reject it as unknown.

Do we have any data on how many clusters have manually patched the CSIDriver? TBH, users should not modify resources managed by an Operator, as such changes may be reverted.

> Whatever parameter we add will be part of the new OpenShift release. Operators deployed on older OpenShift releases won't have this parameter, so the apiserver will either ignore the field or reject it as unknown.

Sorry, but my suggestion is NOT about pre-upgrade. It is about what happens after the upgrade.
In the API, if we have a field, let's say syncTokenRequests, with a default value of "false", the operator can look at it and NOT overwrite the already configured audience.
We will document that, along with populating tokenRequests, syncTokenRequests should be set to "true".

> Do we have any data on how many clusters have manually patched the CSIDriver? TBH, users should not modify resources managed by an Operator, as such changes may be reverted.

We know of at least one user: https://redhat-internal.slack.com/archives/C08F8UBM0F7/p1758270464495719
We have not yet surveyed how many users have configured the audience directly on the CSIDriver.
But as you know, the operator does not immediately reconcile this change.
So I think that when we can provide a smooth integration, it is better to do so and avoid surprising our users.

Thanks for the suggestion. I've incorporated this into the proposal with the following design:

New struct: tokenRequests with policy and audiences

    secretsStore:
      tokenRequests:
        policy: Managed   # or "Unmanaged" (default)
        audiences:
          - audience: "sts.amazonaws.com"
            expirationSeconds: 3600
          - audience: "api://AzureADTokenExchange"

How it works:

policy: "Unmanaged" (default): The operator reads the existing CSIDriver.spec.tokenRequests from the cluster and includes them in the desired spec. This means any manually patched audiences are preserved.

policy: "Managed": The operator uses the audiences list from ClusterCSIDriver as the sole source of truth. This gives full add/remove control power to the user.
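
The two policies can be sketched as a single selection function. This is a simplified illustration: `TokenRequest` and `desiredTokenRequests` are hypothetical names, and the real operator would work with the `storage.k8s.io/v1` and openshift/api types rather than these local structs.

```go
package main

import "fmt"

// TokenRequest is a simplified stand-in for one tokenRequests entry.
type TokenRequest struct {
	Audience          string
	ExpirationSeconds *int64
}

// TokenRequestsPolicy matches the policy values described in this comment.
type TokenRequestsPolicy string

const (
	PolicyUnmanaged TokenRequestsPolicy = "Unmanaged"
	PolicyManaged   TokenRequestsPolicy = "Managed"
)

// desiredTokenRequests picks what the operator would write to the CSIDriver:
// the ClusterCSIDriver audiences when Managed, otherwise whatever is already
// present on the live CSIDriver object.
func desiredTokenRequests(policy TokenRequestsPolicy, audiences, existing []TokenRequest) []TokenRequest {
	if policy == PolicyManaged {
		return audiences
	}
	return existing
}

func main() {
	exp := int64(3600)
	audiences := []TokenRequest{{Audience: "sts.amazonaws.com", ExpirationSeconds: &exp}}
	existing := []TokenRequest{{Audience: "api://AzureADTokenExchange"}}

	for _, p := range []TokenRequestsPolicy{PolicyUnmanaged, PolicyManaged} {
		for _, tr := range desiredTokenRequests(p, audiences, existing) {
			fmt.Printf("%s -> %s\n", p, tr.Audience)
		}
	}
}
```

With "Managed" and an empty audiences list the function returns an empty list, which is what makes removal possible.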


mytreya-rh

/lgtm

with a minor comment

mytreya-rh · 2026-06-01T11:30:48Z

+    // list, replacing any previously configured values.
+    // +default="Unmanaged"
+    // +optional
+    Policy TokenRequestsPolicy `json:"policy,omitempty"`


I think we should make it immutable once set to "Managed". Would there be any issues with such a restriction?

Thanks for the suggestion; it has been incorporated. Also added some details about API state during upgrade. Please have a look.


openshift-ci · 2026-06-02T10:39:11Z

New changes are detected. LGTM label has been removed.

openshift-ci · 2026-06-02T10:52:03Z

@chiragkyal: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jsafrane

From the storage point of view, it looks solid. It follows existing ClusterCSIDriver API usage and also the common usage of the CSI interface in Kubernetes ephemeral volumes.

jsafrane · 2026-06-04T09:10:52Z

+- On upgrade, `tokenRequests.policy` defaults to `"Unmanaged"`, preserving any
+  existing tokenRequests on the CSIDriver.


Is this a behavior change? The current operator will overwrite any CSIDriver changes made by the cluster admin during cluster update. This enhancement suggests it won't be overwritten. I think it's a good step forward, but it should be explicitly called out and documented.

To clarify: there is no behavior change from the current operator's perspective.
Today the operator does not set tokenRequests at all (it's not in the static csidriver.yaml template), so any manually-patched tokenRequests on the CSIDriver object already survive reconciliation today.

What this enhancement does change is adding requiresRepublish and tokenRequests to the desired CSIDriver spec (which was previously unset). That changes the spec-hash and would trigger a delete+recreate, which would wipe manually patched tokenRequests if we do not explicitly preserve them.

So the "Unmanaged" default is not about any behavior change of the operator, it's about preventing the damage to user configured tokenRequests during the recreate.

See #2012 (comment) for a similar discussion.

jsafrane · 2026-06-04T09:23:33Z

+    // Only honored when policy is "Managed".
+    // +optional
+    // +listType=atomic
+    Audiences []SecretsStoreTokenRequest `json:"audiences,omitempty"`


How many audiences does Kubernetes support? Should there be an upper limit?

And in general, most (all?) new fields need some validation. Like explicit enum values for all Policy fields, lower boundary for <anything>Seconds (negative numbers are probably bad) etc.

BTW, the validation can wait for the API review.

Yes, that's true. I have a dedicated API PR: openshift/api#2846

It's better to finalize these validations there; then we can copy them over to the EP.
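
As a rough illustration of the kinds of checks raised in this thread (enum values for the policy, an upper bound on the list, a lower bound on the seconds field), here is a hedged sketch. The limits `maxAudiences` and `minExpirationSeconds` are placeholder numbers chosen for the example, not values agreed in the API review, and `validate` is a hypothetical helper rather than the actual validation.

```go
package main

import "fmt"

// TokenRequest is a simplified stand-in for one audiences entry.
type TokenRequest struct {
	Audience          string
	ExpirationSeconds *int64
}

// Placeholder bounds for this sketch only; the real limits are to be
// settled in the API review.
const (
	maxAudiences         = 10
	minExpirationSeconds = 600
)

// validate applies the three kinds of checks mentioned in the review.
func validate(policy string, audiences []TokenRequest) error {
	if policy != "Managed" && policy != "Unmanaged" {
		return fmt.Errorf("policy %q must be Managed or Unmanaged", policy)
	}
	if len(audiences) > maxAudiences {
		return fmt.Errorf("at most %d audiences allowed, got %d", maxAudiences, len(audiences))
	}
	for i, a := range audiences {
		if a.Audience == "" {
			return fmt.Errorf("audiences[%d]: audience must not be empty", i)
		}
		if a.ExpirationSeconds != nil && *a.ExpirationSeconds < minExpirationSeconds {
			return fmt.Errorf("audiences[%d]: expirationSeconds must be at least %d", i, minExpirationSeconds)
		}
	}
	return nil
}

func main() {
	neg := int64(-1)
	fmt.Println(validate("Managed", []TokenRequest{{Audience: "sts.amazonaws.com", ExpirationSeconds: &neg}}))
	fmt.Println(validate("Managed", []TokenRequest{{Audience: "sts.amazonaws.com"}}))
}
```

In the real API these rules would more likely be expressed as kubebuilder markers or CEL rules on the types than as a Go function.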

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on May 16, 2026

openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on May 16, 2026

chiragkyal force-pushed the configure-secret-rotation-and-wif branch from 00a6104 to a49110c on May 16, 2026 20:00

chiragkyal marked this pull request as ready for review May 18, 2026 06:46

openshift-ci Bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on May 18, 2026

openshift-ci Bot requested review from dobsonj and mytreya-rh May 18, 2026 06:54

mytreya-rh reviewed May 28, 2026

chiragkyal mentioned this pull request May 29, 2026

SSCSI-245: Add Secrets Store CSI driver configuration to ClusterCSIDriver API openshift/api#2846

Open

chiragkyal force-pushed the configure-secret-rotation-and-wif branch 2 times, most recently from 776cb54 to 4b00729 on May 29, 2026 11:34

Configurable secret rotation and WIF support for SSCSI

66f2f50

Signed-off-by: chiragkyal <ckyal@redhat.com>

chiragkyal force-pushed the configure-secret-rotation-and-wif branch from 4b00729 to 66f2f50 on May 29, 2026 12:43

Add field to control upgrade issue

666dc48

Signed-off-by: chiragkyal <ckyal@redhat.com>

mytreya-rh reviewed Jun 1, 2026

openshift-ci Bot assigned mytreya-rh Jun 1, 2026

openshift-ci Bot added the lgtm label (indicates that a PR is ready to be merged) on Jun 1, 2026

Add tokenRequests.policy: 'Managed' field immutable

c13aad6

Signed-off-by: chiragkyal <ckyal@redhat.com>

openshift-ci Bot removed the lgtm label (indicates that a PR is ready to be merged) on Jun 2, 2026

jsafrane reviewed Jun 4, 2026
