docs(known-issues): document BUG-048 concurrent late vd-create limitation #165
…tion

The release gate flagged that this known operator-reachable availability limitation shipped with no entry in the product tree (tracked only in internal campaign state). Document it as Bug 50 so operators know not to issue rapid back-to-back vd-create against an existing multi-replica RD and how to recover. Documentation only; no code or data-path change.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Code Review
This pull request documents Bug 50 in docs/known-issues.md, detailing a concurrency issue when rapidly creating volume definitions, and updates the recommended next-fix order. The review comments suggest minor stylistic and grammatical improvements to ensure a formal tone and consistent terminology throughout the documentation.
**Recommended fix**: Serialise concurrent VolumeDefinition creation on a given ResourceDefinition (optimistic-concurrency precondition on the RD volume-minor allocation plus the auto-numbered VD create) so the second add observes the first, and seed the new volume's GI / current-UUID before peers attach so a deterministic SyncSource is elected. A first attempt (PR #164) reduced the silent-drop and fixed a day0 split-brain leg but regressed the CSI-reachable single-late-VD path, so it was not merged; a corrected fix that does not touch the single-VD path is tracked.
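The serialisation half of the recommended fix can be illustrated with a generic optimistic-concurrency loop. This is a minimal sketch and not the project's implementation: `InMemoryRDStore`, `write_if_version`, and `create_volume_definition` are hypothetical names, the store is an in-memory stand-in for whatever backs the ResourceDefinition, and GI / current-UUID seeding is not modelled at all.

```python
import itertools
import threading
from typing import Dict, Tuple


class ConflictError(Exception):
    """Raised when the RD changed between our read and our write."""


class InMemoryRDStore:
    """Hypothetical stand-in for a store that offers a versioned write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._volumes: Dict[int, str] = {}

    def read(self) -> Tuple[int, Dict[int, str]]:
        # Return a snapshot: the version plus a copy of the volume map.
        with self._lock:
            return self._version, dict(self._volumes)

    def write_if_version(self, expected: int, volumes: Dict[int, str]) -> None:
        # The precondition: only commit if nobody wrote since our read.
        with self._lock:
            if self._version != expected:
                raise ConflictError("RD was modified concurrently")
            self._volumes = dict(volumes)
            self._version += 1


def create_volume_definition(store: InMemoryRDStore, size: str,
                             max_retries: int = 16) -> int:
    """Allocate the lowest free volume number and commit it atomically."""
    for _ in range(max_retries):
        version, volumes = store.read()
        # Auto-number: pick the lowest number not present in the snapshot.
        vnr = next(n for n in itertools.count() if n not in volumes)
        volumes[vnr] = size
        try:
            store.write_if_version(version, volumes)
        except ConflictError:
            # Another create landed first; re-read so we observe it.
            continue
        return vnr
    raise ConflictError("gave up after repeated concurrent modifications")
```

With this shape, a second concurrent create either sees the first one in its snapshot or fails the version check and retries, so two callers cannot be handed the same volume number and neither add is silently dropped.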
**Related commits / tests**: cli-matrix cell `tests/e2e/cli-matrix/multi-volume-late-vd-create.sh` honestly catches this (as of #162); PR #164 (deferred). No closing commit yet.
**Recovery**: Delete the wedged or missing volume-definition (`linstor vd d <rd> <vnr>`) and re-add it sequentially: issue one `vd c`, wait for it to reach `UpToDate`, then issue the next. Always add late volume-definitions one at a time.
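The one-at-a-time rule above could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported tool: `add_vds_sequentially` and `linstor_vd_create` are hypothetical helper names, the `is_up_to_date` check is supplied by the caller because this page does not say how `UpToDate` should be queried, and the code assumes new volume numbers are handed out consecutively starting at `first_vnr`.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def linstor_vd_create(rd: str, size: str) -> None:
    """Run `linstor vd c <rd> <size>`; raises if the CLI exits non-zero."""
    subprocess.run(["linstor", "vd", "c", rd, size], check=True)


def add_vds_sequentially(
    rd: str,
    sizes: Iterable[str],
    is_up_to_date: Callable[[str, int], bool],
    *,
    first_vnr: int,
    create: Callable[[str, str], None] = linstor_vd_create,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Create volume-definitions one at a time, waiting between creates.

    Assumption: the n-th create receives volume number first_vnr + n.
    """
    created: List[int] = []
    for offset, size in enumerate(sizes):
        vnr = first_vnr + offset
        create(rd, size)
        deadline = time.monotonic() + timeout_s
        # Do not issue the next create until this volume is UpToDate.
        while not is_up_to_date(rd, vnr):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{rd} volume {vnr} did not reach UpToDate")
            sleep(poll_s)
        created.append(vnr)
    return created
```

Injecting `create`, `is_up_to_date`, and `sleep` keeps the waiting logic testable without a cluster; in real use the readiness check would wrap whatever status query the operator already trusts.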
5. **Bug 34 Option B** (P1 follow-up): wrap migrate-disk in the new state machine once 39/40 land.
6. **Bug 38** (P2, cosmetic STATE_INFO on shrink): pure UX, do last.
7. **Bug 32** (P2, observation): document only; no code fix needed.
5. **Bug 50** (P1, concurrent late `vd c`): serialise VolumeDefinition create on the RD and seed GI before peer attach; operator-only path, recoverable, not data-loss. A corrected fix (superseding the deferred PR #164) is tracked.
What

Document the concurrent / rapid manual late volume-definition creation limitation (campaign tracking id BUG-048) as "Bug 50" in `docs/known-issues.md`, and add it to the recommended next-fix order.

Why

The release gate flagged that this known, operator-reachable availability limitation shipped with no entry in the product tree. Operators need to know not to issue rapid back-to-back `linstor vd c` against an existing multi-replica RD, and how to recover if they do.

Scope

Documentation only: no code, no test, no data-path change. The limitation is not reachable through the production CSI path (one PVC → one RD → one VD), is availability-only (no data loss, no node-reboot deadlock), and is recoverable by deleting and re-adding the volume-definition sequentially. A corrected code fix (superseding the deferred PR #164) is tracked separately.