[cinder-csi-plugin] Wait for volume availability before attach #3124

Open
hemna wants to merge 1 commit into kubernetes:master from hemna:fix/wait-volume-available-before-attach

@hemna commented Jun 5, 2026

What this PR does / why we need it:

ControllerPublishVolume now waits for the volume to reach available or in-use status before calling the Cinder attachment API.

Previously, if the CO called ControllerPublishVolume immediately after CreateVolume, the volume could still be in creating state on the backend. This caused Cinder to reject the attachment with a 409 Conflict:

Volume X status must be available or downloading to reserve, but the current status is creating.

This forced the CO (external-attacher) to retry blindly, generating unnecessary API calls to Cinder and delaying volume attachment. The issue is most pronounced with storage backends where volume creation is asynchronous and takes several seconds (e.g., when volumes are backed by network-attached storage).

How the fix works:

A new WaitVolumeTargetStatusWithContext method uses wait.PollUntilContextCancel, which:

  1. Polls the volume status every 3 seconds
  2. Returns immediately if the volume is already available or in-use
  3. Returns an error if the volume enters an error state
  4. Respects the gRPC context deadline (no fixed step count — bounded by the CO's RPC timeout)
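
The four behaviors above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: it uses a time.Ticker loop as a stand-in for wait.PollUntilContextCancel, and the names statusGetter, errVolumeInError, waitVolumeReady, and demoWait are hypothetical. It also assumes "error" is the only failure status checked.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// statusGetter is a hypothetical stand-in for the Cinder client call that
// returns a volume's current status string.
type statusGetter func(ctx context.Context, volumeID string) (string, error)

// errVolumeInError is a hypothetical sentinel for a volume in an error state.
var errVolumeInError = errors.New("volume entered error state")

// waitVolumeReady checks the status immediately, then once per interval,
// until the volume is "available" or "in-use", reports "error", or ctx is done.
func waitVolumeReady(ctx context.Context, get statusGetter, volumeID string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		status, err := get(ctx, volumeID)
		if err != nil {
			return err
		}
		switch status {
		case "available", "in-use":
			return nil // target state reached
		case "error":
			return fmt.Errorf("volume %s: %w", volumeID, errVolumeInError)
		}
		select {
		case <-ctx.Done():
			// No fixed step count: the caller's deadline bounds the wait.
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoWait simulates a volume that leaves "creating" on the third status check.
func demoWait() error {
	calls := 0
	get := func(ctx context.Context, volumeID string) (string, error) {
		calls++
		if calls < 3 {
			return "creating", nil
		}
		return "available", nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	// A short interval keeps the demo fast; the PR polls every 3 seconds.
	return waitVolumeReady(ctx, get, "vol-1", 10*time.Millisecond)
}

func main() {
	fmt.Println("wait result:", demoWait())
}
```

In the sketch the deadline comes from context.WithTimeout; in the driver it would come from the gRPC request context supplied by the CO.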

If the context expires before the volume is ready, the driver returns FAILED_PRECONDITION, which tells the CO to retry with exponential backoff.
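
One way to picture the mapping from wait outcome to RPC result is the sketch below. It is illustrative only: rpcCode and its constants are hypothetical stand-ins for gRPC status codes, and the INTERNAL fallback for other errors is an assumption, since the description only specifies the expired-context case.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// rpcCode is a hypothetical stand-in for a gRPC status code.
type rpcCode string

const (
	codeOK                 rpcCode = "OK"
	codeFailedPrecondition rpcCode = "FAILED_PRECONDITION"
	codeInternal           rpcCode = "INTERNAL" // assumed fallback, not stated in the PR
)

// codeForWaitError classifies the error returned by the readiness wait.
func codeForWaitError(err error) rpcCode {
	switch {
	case err == nil:
		return codeOK
	case errors.Is(err, context.DeadlineExceeded):
		// The deadline passed before the volume became ready.
		return codeFailedPrecondition
	default:
		return codeInternal
	}
}

// demoDeadlineCode classifies a wrapped deadline error, as a poll loop might return.
func demoDeadlineCode() rpcCode {
	err := fmt.Errorf("waiting for volume: %w", context.DeadlineExceeded)
	return codeForWaitError(err)
}

func main() {
	fmt.Println(codeForWaitError(nil), demoDeadlineCode())
}
```

Using errors.Is rather than a direct comparison keeps the classification correct when the context error has been wrapped with extra detail.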

Which issue this PR fixes(if applicable):
fixes #

Special notes for reviewers:

The volumeReadyPollInterval is set to 3 seconds, which balances responsiveness against API load on the Cinder service. The total wait time is bounded by the gRPC context deadline (typically 15-30s depending on the external-attacher configuration), not a fixed backoff.
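
To make the API-load trade-off concrete, the following sketch estimates the upper bound on status checks per RPC. It assumes the first check happens immediately and that each status call takes negligible time; maxStatusChecks and checksForSeconds are hypothetical helpers, not part of the PR.

```go
package main

import (
	"fmt"
	"time"
)

// maxStatusChecks returns how many checks fit strictly before the deadline
// when checks run at t = 0, interval, 2*interval, and so on.
func maxStatusChecks(deadline, interval time.Duration) int {
	if deadline <= 0 || interval <= 0 {
		return 0
	}
	// Integer ceiling of deadline/interval.
	return int((deadline + interval - 1) / interval)
}

// checksForSeconds is a convenience wrapper taking whole seconds.
func checksForSeconds(deadlineSec, intervalSec int) int {
	return maxStatusChecks(time.Duration(deadlineSec)*time.Second, time.Duration(intervalSec)*time.Second)
}

func main() {
	for _, d := range []int{15, 30} {
		fmt.Printf("deadline %ds, interval 3s: at most %d checks\n", d, checksForSeconds(d, 3))
	}
}
```

Under these assumptions, a 15-second deadline allows at most 5 status checks and a 30-second deadline at most 10.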

The existing WaitVolumeTargetStatus (used in ControllerExpandVolume) is left unchanged to avoid disrupting existing behavior.

Release note:

[cinder-csi-plugin] ControllerPublishVolume now waits for volumes to become available before calling the Cinder attach API, eliminating 409 Conflict errors when volumes are still being provisioned.

@k8s-ci-robot added labels Jun 5, 2026:
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.

@k8s-ci-robot commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Code Review Process.

@k8s-ci-robot requested review from stephenfin and zetaab June 5, 2026 14:38
@k8s-ci-robot added labels Jun 5, 2026:
size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
ControllerPublishVolume now waits for the volume to reach 'available'
or 'in-use' status before calling the Cinder attachment API.

Previously, if the CO called ControllerPublishVolume immediately after
CreateVolume, the volume could still be in 'creating' state on the
backend. This caused Cinder to reject the attachment with a 409 Conflict
('status must be available or downloading to reserve, but the current
status is creating'), forcing the CO to retry blindly.

The new behavior uses a context-aware poll (WaitVolumeTargetStatusWithContext)
that respects the gRPC request deadline. The volume status is checked
every 3 seconds until it reaches a target state, enters an error state,
or the context expires. This eliminates unnecessary 409 errors against
Cinder and reduces time-to-attach for volumes still being provisioned.

Signed-off-by: Walter Boring <waboring@hemna.com>
@hemna force-pushed the fix/wait-volume-available-before-attach branch from 4e4519a to 35031dc June 5, 2026 14:45