
docs(known-issues): document BUG-048 concurrent late vd-create limitation #165

Merged
Andrei Kvapil (kvaps) merged 1 commit into main from docs/bug-048-known-issue
Jun 14, 2026

Conversation

@kvaps Andrei Kvapil (kvaps) commented Jun 14, 2026

What

Document the concurrent / rapid manual late volume-definition creation limitation (campaign tracking id BUG-048) as "Bug 50" in docs/known-issues.md, and add it to the recommended next-fix order.

Why

The release gate flagged that this known, operator-reachable availability limitation shipped with no entry in the product tree. Operators need to know not to issue rapid back-to-back linstor vd c against an existing multi-replica RD, and how to recover if they do.

Scope

Documentation only — no code, no test, no data-path change. The limitation is not reachable through the production CSI path (one PVC → one RD → one VD), is availability-only (no data loss, no node-reboot deadlock), and is recoverable by deleting and re-adding the volume-definition sequentially. A corrected code fix (superseding the deferred PR #164) is tracked separately.
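The sequential re-add described above can be sketched as a small helper. This is only an illustration under stated assumptions: `add_volumes_sequentially`, `run_cli`, and the caller-supplied `is_up_to_date` predicate are hypothetical names that do not appear in this PR, and how `UpToDate` is detected is deliberately left to the caller rather than guessed here. The only command issued is the `linstor vd c` form referred to in the description.

```python
import subprocess
import time
from typing import Callable, Iterable, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


def add_volumes_sequentially(
    rd: str,
    sizes: Iterable[str],
    is_up_to_date: Callable[[str], bool],  # hypothetical: caller decides how to check UpToDate
    run: Callable[[Sequence[str]], str] = run_cli,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Add late volume-definitions one at a time, never back-to-back."""
    for size in sizes:
        # Issue exactly one create, then block until the caller reports UpToDate.
        run(["linstor", "vd", "c", rd, size])
        deadline = clock() + timeout_seconds
        while not is_up_to_date(rd):
            if clock() >= deadline:
                raise TimeoutError(f"{rd}: volume of size {size} did not reach UpToDate")
            sleep(poll_seconds)
```

The runner, sleep, and clock are injectable so the ordering can be exercised without a cluster. The property that matters is that the next create is never issued while the previous volume is still settling.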

Summary by CodeRabbit

  • Documentation
    • Updated known issues documentation with a new entry describing an intermittent failure that can occur when issuing rapid consecutive volume definition creation commands, which may result in dropped volumes or volumes stuck in an Inconsistent state. Includes recovery steps and recommended workarounds.

…tion

The release gate flagged that this known operator-reachable availability
limitation shipped with no entry in the product tree (tracked only in
internal campaign state). Document it as Bug 50 so operators know not to
issue rapid back-to-back vd-create against an existing multi-replica RD
and how to recover. Documentation only; no code or data-path change.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
coderabbitai Bot commented Jun 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e142c07-c6fc-441f-8322-20265c6ee6a0

📥 Commits

Reviewing files that changed from the base of the PR and between f351504 and c611fbe.

📒 Files selected for processing (1)
  • docs/known-issues.md

📝 Walkthrough

Adds a new "Bug 50" entry to docs/known-issues.md documenting an intermittent failure when two linstor volume-definition create calls are issued back-to-back. The recommended fix priority list is updated to include Bug 50 and reorder subsequent entries.

Changes

Bug 50 Known-Issue Documentation

| Layer / File(s) | Summary |
|---|---|
| Bug 50 entry and fix priority update (`docs/known-issues.md`) | Adds Bug 50 documenting concurrent rapid `vd c` drop/wedge behavior, including reproduction conditions, scope limits, recovery procedure, and fix direction. Inserts Bug 50 into the recommended fix order list and shifts the ranking of Bug 34 Option B, Bug 38, and Bug 32. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • cozystack/blockstor#162: Modifies the late volume-add e2e test to detect the no-SyncSource wedge behavior that Bug 50 documents, tying both changes to the same underlying concurrent VD creation failure mode.

Poem

A rabbit found a race condition deep,
Two volumes created while the peers were asleep,
The second one wedged, no SyncSource in sight,
So hop-by-hop we document it right.
Sequential creation shall set things aright! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically summarizes the main change: documenting BUG-048 (a concurrent late volume-definition creation limitation) in the known-issues documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request documents Bug 50 in docs/known-issues.md, detailing a concurrency issue when rapidly creating volume definitions, and updates the recommended next-fix order. The review comments suggest minor stylistic and grammatical improvements to ensure a formal tone and consistent terminology throughout the documentation.


Comment thread docs/known-issues.md

**Recommended fix**: Serialise concurrent VolumeDefinition creation on a given ResourceDefinition (optimistic-concurrency precondition on the RD volume-minor allocation plus the auto-numbered VD create) so the second add observes the first, and seed the new volume's GI / current-UUID before peers attach so a deterministic SyncSource is elected. A first attempt (PR #164) reduced the silent-drop and fixed a day0 split-brain leg but regressed the CSI-reachable single-late-VD path, so it was not merged; a corrected fix that does not touch the single-VD path is tracked.

**Related commits / tests**: cli-matrix cell `tests/e2e/cli-matrix/multi-volume-late-vd-create.sh` honestly catches this (as of #162); PR #164 (deferred). No closing commit yet.


medium

The phrase honestly catches this is somewhat informal for technical documentation. Consider using a more professional term such as reliably catches this or consistently catches this to maintain a formal tone.

Comment thread docs/known-issues.md

**Recovery**: Delete the wedged or missing volume-definition (`linstor vd d <rd> <vnr>`) and re-add it sequentially — issue one `vd c`, wait for it to reach `UpToDate`, then issue the next. Always add late volume-definitions one at a time.

**Recommended fix**: Serialise concurrent VolumeDefinition creation on a given ResourceDefinition (optimistic-concurrency precondition on the RD volume-minor allocation plus the auto-numbered VD create) so the second add observes the first, and seed the new volume's GI / current-UUID before peers attach so a deterministic SyncSource is elected. A first attempt (PR #164) reduced the silent-drop and fixed a day0 split-brain leg but regressed the CSI-reachable single-late-VD path, so it was not merged; a corrected fix that does not touch the single-VD path is tracked.


medium

For grammatical consistency with the rest of the sentence (which uses "VolumeDefinition creation"), consider changing auto-numbered VD create to auto-numbered VD creation.
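The serialisation idea in the recommended fix quoted in this thread (an optimistic-concurrency precondition so that the second add observes the first) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the project's implementation: `ResourceDefinitionStore`, `RDSnapshot`, `ConflictError`, and `create_volume_definition` are hypothetical names, the internal lock merely stands in for whatever atomic compare-and-swap the real backing store provides, and GI / current-UUID seeding is not modelled at all.

```python
import threading
from dataclasses import dataclass
from typing import Tuple


class ConflictError(Exception):
    """The RD changed between read and write; the caller should re-read and retry."""


@dataclass(frozen=True)
class RDSnapshot:
    version: int                      # bumped on every successful write
    volume_numbers: Tuple[int, ...]   # volume numbers allocated so far


class ResourceDefinitionStore:
    """Toy store; the lock stands in for an atomic compare-and-swap in a real backend."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot = RDSnapshot(version=0, volume_numbers=())

    def read(self) -> RDSnapshot:
        with self._lock:
            return self._snapshot

    def compare_and_swap(self, expected_version: int, volume_numbers: Tuple[int, ...]) -> None:
        with self._lock:
            # Precondition: nobody else has written since the caller's read.
            if self._snapshot.version != expected_version:
                raise ConflictError(f"expected version {expected_version}")
            self._snapshot = RDSnapshot(expected_version + 1, volume_numbers)


def create_volume_definition(store: ResourceDefinitionStore, max_retries: int = 16) -> int:
    """Allocate the next free volume number; retry if a concurrent create won the race."""
    for _ in range(max_retries):
        snap = store.read()
        used = set(snap.volume_numbers)
        vnr = next(n for n in range(len(used) + 1) if n not in used)
        try:
            store.compare_and_swap(snap.version, snap.volume_numbers + (vnr,))
            return vnr
        except ConflictError:
            continue  # the second add now observes the first and picks a new number
    raise ConflictError("gave up after repeated conflicts")
```

A create that loses the race fails its precondition and retries against the updated state instead of silently overwriting it, so two concurrent callers end up with distinct volume numbers.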

Comment thread docs/known-issues.md
5. **Bug 34 Option B** (P1 follow-up) — wrap migrate-disk in the new state machine once 39/40 land.
6. **Bug 38** (P2, cosmetic STATE_INFO on shrink) — pure UX, do last.
7. **Bug 32** (P2, observation) — document only; no code fix needed.
5. **Bug 50** (P1, concurrent late `vd c`) — serialise VolumeDefinition create on the RD and seed GI before peer attach; operator-only path, recoverable, not data-loss. A corrected fix (superseding the deferred PR #164) is tracked.


medium

To improve grammatical correctness and readability, consider changing VolumeDefinition create to VolumeDefinition creation (or creating VolumeDefinitions). Additionally, adding a comma in concurrent late (making it concurrent, late) would improve punctuation.

@kvaps Andrei Kvapil (kvaps) merged commit 5db10bd into main Jun 14, 2026
15 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the docs/bug-048-known-issue branch June 14, 2026 23:14