fix(snapshot): make consistency-group snapshot atomic via coordinated suspend barrier (BUG-046) by kvaps · Pull Request #160 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-14T00:26:08Z

Summary

linstor snapshot create-multiple (consistency-group snapshot across multiple RDs) was not point-in-time atomic. The apiserver stamped a shared GroupID + SuspendIO, but the controller never coordinated phase ENTRY across siblings, so they entered SuspendIO ~15s apart (budget ≤5s). A DB's data + WAL volumes on separate RDs were captured at different instants → group restore could be inconsistent.

Root cause (two staggering points, both per-sibling)

Suspend entry: store stamped SuspendIO at Create time; apiserver creates sibling CRDs sequentially → each satellite froze the instant its own CRD landed (seconds apart).
Take/resume: Phase 1→2 take and Phase 2→3 resume flipped only self per sibling → staggered take/resume reopened the write window before all snapshots were captured.

Fix — coordinated group barrier (non-empty GroupID)

Store no longer stamps SuspendIO at Create; apiserver stamps Spec.GroupSize on every sibling.
Controller holds the group at Phase 0 until every member is observable (groupAssembled), then opens SuspendIO on EVERY sibling in one reconcile pass (suspendGroup).
take/resume/abort fan out across the whole group in one pass (flipGroup) → take-all and resume-all atomic vs the writer.
Single-snapshot (empty GroupID) path unchanged.

No-stuck-suspended safety (deadlock-class)

All-suspended-before-any-snapshot (take still gates on every node's suspend ack); refuse-rather-than-freeze (non-UpToDate group resumes without freezing); bounded suspend (existing deadline force-aborts + resumes all); abort cascades resume-all on any per-node Failed; legacy GroupSize-missing falls back to observed-siblings count (no hang).

Tests

L1 snapshot_bug_046_test.go: barrier enters all together, holds until assembled, take/resume fan-out, abort/resume-all, legacy fallback + store/apiserver pins. go test ./..., -race, lint green.
L7 replay/snap-create-multiple-group-consistency.yaml PASS on stand big.
Stand big 3×: suspend slip 0.087/0.072/0.053s (≤5s), cross-RD counter diff=0 (point-in-time proven).

Harness note

cli-matrix/snap-create-multiple-lifecycle has pre-existing harness bugs (dd oflag=direct bs=8 on a block dev zeroes the marker; Phase 5 colon CLI form) addressed separately; the product behavior was verified directly + via a working buffered writer.

…oup snapshots (BUG-046) `snapshot create-multiple` (consistency-group snapshot across multiple RDs) was not point-in-time atomic. The store stamped Spec.SuspendIO=true at Create time, and the apiserver creates the sibling CRDs sequentially, so each sibling's satellite started `drbdsetup suspend-io` the instant its own CRD landed. On a busy stand the group's volumes froze ~15s apart — far over the <=5s consistency budget — so a DB's data and WAL volumes on separate RDs were captured at different instants and a group restore could be inconsistent. Add a controller-side suspend barrier for grouped snapshots: the store no longer stamps SuspendIO=true for a grouped Snapshot (GroupID set); instead the controller holds the whole group at Phase 0 until every member is observable (new Spec.GroupSize denominator stamped by the apiserver), then opens SuspendIO=true on every sibling in a single reconcile pass so they all enter suspend within one controller cycle. This bounds the suspend-entry slip far under budget while keeping the existing all-suspended-before-any-take and resume-on-failure guarantees. No-stuck-suspended safety is preserved: the barrier checks UpToDate before freezing anything (refuse rather than suspend for an impossible snapshot), the existing suspend/take deadline and abort cascade still clear SuspendIO across the whole batch on any failure, and a legacy grouped Snapshot with no GroupSize falls back to the observed-siblings count so it can never hang on a missing denominator. The single-snapshot (Bug-351) path is unchanged — empty GroupID keeps the Create-time SuspendIO stamp and the self-only phase walks. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

L1: assert handleSnapshotCreateMulti stamps a shared GroupID + GroupSize on every batch entry, and that the store defers SuspendIO for grouped snapshots (no Create-time freeze) while keeping it for single snapshots. L7: snap-create-multiple-group-consistency replay YAML codifies the operator `snapshot create-multiple` sequence + the no-stuck-suspended / no-orphans convergence contract for a two-RD consistency group. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

…end entry (BUG-046) Stand validation surfaced a second staggering point: with only the suspend entry group-coordinated, the Phase 1->2 take promotion and the Phase 2->3 resume still fired per-sibling at staggered reconcile times (~15s apart observed), because each sibling flipped only its own Spec.TakeSnapshot when the controller happened to reconcile it. A sibling that resumed I/O while another was still mid-take reopened the write window before every snapshot was captured. Fan every grouped phase transition out across the whole batch in one reconcile pass (flipGroup): take-all together, resume-all together. The suspend barrier's suspendGroup is now the SuspendIO=true special case of the same helper. Single-snapshot snapshots still flip self only. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-14T00:26:21Z

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 35 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a9a4d3d-f11c-48f9-af6d-f0b07da6da09

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9b648 and 4c4fc14.

📒 Files selected for processing (12)

api/v1alpha1/snapshot_types.go
config/crd/bases/blockstor.cozystack.io_snapshots.yaml
internal/controller/snapshot_bug_046_test.go
internal/controller/snapshot_controller.go
internal/controller/snapshot_timeout.go
pkg/api/v1/snapshot.go
pkg/rest/snapshot_multi.go
pkg/rest/snapshot_multi_test.go
pkg/store/k8s/snapshots.go
pkg/store/k8s/snapshots_internal_test.go
pkg/store/k8s/typed_field_carry_across_test.go
tests/operator-harness/replay/snap-create-multiple-group-consistency.yaml

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bug-046-consistency-group-snapshot-atomic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a coordinated suspend barrier for grouped snapshots to ensure point-in-time consistency across multiple volume resources, addressing Bug 046 and Bug-353. It adds a GroupSize field to track batch size and ensures that all siblings in a group enter the suspend, take, and resume phases together in a single reconcile pass. Feedback on these changes highlights two key areas for improvement: first, implementing an assembly timeout check to prevent an infinite requeue loop if a group is never fully assembled; second, optimizing the replica status check to avoid redundant API list calls by pre-fetching resources.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T00:27:51Z

+	if !groupAssembled(snap, siblings) {
+		logger.V(1).Info("snapshot group still assembling; holding suspend barrier",
+			"group_id", snap.Spec.GroupID,
+			"observed", len(siblings), "expected", snap.Spec.GroupSize)
+
+		return ctrl.Result{RequeueAfter: groupAssembleRequeueAfter}, nil
+	}


If one or more siblings in a grouped snapshot batch fail to be created (e.g., due to a validation error or store failure in the API server), the group will never be fully assembled. Because SuspendIO is false during assembly, the group is not considered to be in the suspend phase, meaning the 2-minute snapshotSuspendDeadline is never armed. This causes the remaining siblings to hang in Phase 0 forever, continuously requeuing every 1 second via groupAssembleRequeueAfter. Adding an assembly timeout check using suspendDeadlineExceeded prevents this infinite loop and safely aborts the incomplete group.

if !groupAssembled(snap, siblings) { if suspendDeadlineExceeded(snap, time.Now()) { logger.Info("aborting snapshot group: assembly deadline exceeded", "group_id", snap.Spec.GroupID, "observed", len(siblings), "expected", snap.Spec.GroupSize) return r.abortGroupWithReason(ctx, siblings, "snapshot aborted: group assembly timed out; not all siblings were created") } logger.V(1).Info("snapshot group still assembling; holding suspend barrier", "group_id", snap.Spec.GroupID, "observed", len(siblings), "expected", snap.Spec.GroupSize) return ctrl.Result{RequeueAfter: groupAssembleRequeueAfter}, nil }

gemini-code-assist · 2026-06-14T00:27:51Z

+		return ctrl.Result{RequeueAfter: groupAssembleRequeueAfter}, nil
+	}
+
+	ok, offending, err := r.allTargetedReplicasUpToDate(ctx, siblings)


Calling allTargetedReplicasUpToDate results in listing all Resource objects in the cluster for each sibling in the group. Since siblingReplicasUpToDate performs a r.List(ctx, &resList) call internally, this leads to $O(N)$ API list calls per reconcile pass (where $N$ is the number of siblings). To optimize this and avoid unnecessary API server load, consider listing the resources once in allTargetedReplicasUpToDate and passing the pre-fetched list down to siblingReplicasUpToDate.

Andrei Kvapil (kvaps) and others added 3 commits June 14, 2026 02:36

gemini-code-assist Bot reviewed Jun 14, 2026

View reviewed changes

Andrei Kvapil (kvaps) merged commit d1d583f into main Jun 14, 2026
15 checks passed

Andrei Kvapil (kvaps) deleted the fix/bug-046-consistency-group-snapshot-atomic branch June 14, 2026 01:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(snapshot): make consistency-group snapshot atomic via coordinated suspend barrier (BUG-046)#160

fix(snapshot): make consistency-group snapshot atomic via coordinated suspend barrier (BUG-046)#160
Andrei Kvapil (kvaps) merged 3 commits into
mainfrom
fix/bug-046-consistency-group-snapshot-atomic

Andrei Kvapil (kvaps) commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026

Review limit reached

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andrei Kvapil (kvaps) commented Jun 14, 2026

Summary

Root cause (two staggering points, both per-sibling)

Fix — coordinated group barrier (non-empty GroupID)

No-stuck-suspended safety (deadlock-class)

Tests

Harness note

Uh oh!

coderabbitai Bot commented Jun 14, 2026

Review limit reached

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant