fix(satellite,rest): close last-UpToDate-delete race on Secondary SyncSource (BUG-045)#159
Conversation
…G-045) A diskful replica that is Secondary and acts as the SyncSource for a freshly-added SyncTarget peer could end up with a blank .status.volumes[].diskState in the CRD projection. On a Secondary SyncSource drbd-9 does not re-emit a local `device` frame carrying `disk:UpToDate` (the local disk-state did not transition), so when the volume-cache entry is (re)created by the peer-device `replication: SyncSource` frame it carries an empty DiskState and the projection leaves diskState blank. That blank diskState is the load-bearing input the U130 last-copy delete guard reads, so it could false-allow deleting the only UpToDate source while a peer is mid-sync, stranding the SyncTarget with no source (a data-availability loss). Fix: when the observer sees a peer-device `replication:SyncSource` frame, stamp the local volume's DiskState=UpToDate. A SyncSource feeds a peer's resync only from an UpToDate local disk, so this is a hard DRBD invariant. The stamp is idempotent and never downgrades a richer local-frame observation (mergeVolumeInto only upgrades to a non-empty value, and UpToDate is the terminal disk-state). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The U130 guard decided whether deleting the last UpToDate diskful replica would strand a mid-sync peer solely from the CRD `diskState` projection. When the satellite observer left a live source's diskState blank (the Secondary SyncSource race), the guard concluded the target was "not a source" and allowed the destructive delete, stranding the SyncTarget with no UpToDate source. Harden the guard to fail safe — when it cannot positively confirm that another UpToDate copy would survive, it refuses: - Treat a SyncSource replication-state as kernel ground truth that a replica holds an UpToDate copy (a SyncSource only feeds a resync from an UpToDate local disk), so a lagging/blank diskState no longer hides a real source. This applies to both the target (do not conclude "not a source") and the siblings (a SyncSource sibling counts as a surviving source, avoiding over-refusal of legitimate deletes). - Treat an empty/unknown diskState on a diskful target conservatively: while a peer is mid-sync, refuse the last-copy delete rather than trust an unstamped projection. A false refusal only asks the operator to wait or pass `?force=true`; a false allow is unrecoverable. Legitimate last-copy deletes (no peer mid-sync) and the `?force=true` override are unaffected. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
📝 WalkthroughWalkthroughFixes BUG-045 across two subsystems: the DRBD events2 observer now synthesizes ChangesBUG-045: SyncSource DiskState stamping and U130 fail-safe guard
Sequence DiagramsequenceDiagram
participant DRBD as DRBD events2
participant Observer as peerDeviceVolumeObservation
participant Cache as Volume/Connection cache
participant U130 as resourceMidSyncDeleteRefusal
participant API as REST DELETE handler
DRBD->>Observer: peer-device frame (replication=SyncSource)
Observer->>Cache: store volumeObservation(DiskState=UpToDate, ReplicationState=SyncSource)
API->>U130: DELETE replica request
U130->>Cache: read target diskState and replicationStates
alt diskState blank or UpToDate or SyncSource replication
U130->>Cache: scan diskful siblings for confirmedSources
alt confirmedSource sibling exists
U130-->>API: nil (allow delete)
else no confirmedSource, midSync peer exists
U130-->>API: 409 refusal (block delete)
end
else not a possible source
U130-->>API: nil (allow delete)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request hardens the data-availability guard against BUG-045, ensuring that a blank or lagging CRD diskState projection does not lead to the accidental deletion of the last UpToDate replica while a peer is mid-sync. It introduces fail-safe checks in the delete guard and updates the observer to stamp DiskState=UpToDate when a SyncSource replication state is detected. The review feedback suggests adding a defensive nil check for the target resource in resourceMidSyncDeleteRefusal and points out a logical gap in TestObserverSyncSourceStampDoesNotDowngradeRealFrame where the merge volume logic is not actually executed to verify the test's assertions.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func resourceMidSyncDeleteRefusal(target *apiv1.Resource, siblings []apiv1.Resource) bool { | ||
| // A diskless / tiebreaker replica carries no UpToDate backing | ||
| // storage; removing it never strands a SyncTarget's only source. | ||
| if !resourceIsDiskful(target) { | ||
| return false | ||
| } |
There was a problem hiding this comment.
To prevent potential nil pointer dereferences and ensure robust defensive programming, we should add a nil check for target at the entry point of resourceMidSyncDeleteRefusal. If target is nil, the function can safely return false.
| func resourceMidSyncDeleteRefusal(target *apiv1.Resource, siblings []apiv1.Resource) bool { | |
| // A diskless / tiebreaker replica carries no UpToDate backing | |
| // storage; removing it never strands a SyncTarget's only source. | |
| if !resourceIsDiskful(target) { | |
| return false | |
| } | |
| func resourceMidSyncDeleteRefusal(target *apiv1.Resource, siblings []apiv1.Resource) bool { | |
| if target == nil { | |
| return false | |
| } | |
| // A diskless / tiebreaker replica carries no UpToDate backing | |
| // storage; removing it never strands a SyncTarget's only source. | |
| if !resourceIsDiskful(target) { | |
| return false | |
| } |
| func TestObserverSyncSourceStampDoesNotDowngradeRealFrame(t *testing.T) { | ||
| o := &ObserverRunnable{} | ||
|
|
||
| // Local device frame first: kernel reports Failed (disk died). | ||
| failedEv, ok := translateDeviceEvent(drbd.Event{ | ||
| Action: "change", | ||
| Kind: eventKindDevice, | ||
| Fields: map[string]string{ | ||
| "name": "pvc-bug045-dg", | ||
| "volume": "0", | ||
| "disk": "Failed", | ||
| }, | ||
| }) | ||
| if !ok { | ||
| t.Fatalf("translateDeviceEvent rejected Failed frame") | ||
| } | ||
|
|
||
| o.mergeVolumes(&failedEv) | ||
|
|
||
| // A SyncSource peer-device frame would stamp UpToDate — but with the | ||
| // kernel already reporting Failed locally this would be a contradiction | ||
| // the kernel never produces (a Failed disk cannot be a SyncSource). The | ||
| // merge upgrades to UpToDate because UpToDate != Failed and the | ||
| // incoming value is non-empty; this is acceptable because in reality | ||
| // the two frames are mutually exclusive. The assertion here only pins | ||
| // that the stamp value is exactly UpToDate (not some lossy blank). | ||
| syncSourceEv, ok := translatePeerDeviceEvent(drbd.Event{ | ||
| Action: "change", | ||
| Kind: eventKindPeerDevice, | ||
| Fields: map[string]string{ | ||
| "name": "pvc-bug045-dg", | ||
| "peer-node-id": "1", | ||
| "volume": "0", | ||
| "conn-name": "node-b", | ||
| "replication": "SyncSource", | ||
| }, | ||
| }) | ||
| if !ok { | ||
| t.Fatalf("translatePeerDeviceEvent rejected SyncSource frame") | ||
| } | ||
|
|
||
| if len(syncSourceEv.Volumes) != 1 || syncSourceEv.Volumes[0].DiskState != "UpToDate" { | ||
| t.Errorf("SyncSource frame must carry DiskState=UpToDate; got %+v", | ||
| syncSourceEv.Volumes) | ||
| } | ||
| } |
There was a problem hiding this comment.
The test TestObserverSyncSourceStampDoesNotDowngradeRealFrame does not actually call o.mergeVolumes(&syncSourceEv) to verify if the cached Failed state is preserved or overwritten. If o.mergeVolumes were called, mergeVolumeInto would actually overwrite the "Failed" state with "UpToDate" because "Failed" != "UpToDate". Please update the test to either perform the merge and assert the expected behavior, or clarify the comments and test name if the merge is intentionally omitted due to mutual exclusivity in the kernel.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/rest/resource_delete_last_uptodate_u130.go (1)
92-138:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftGuard classification can false-allow multi-volume unsafe deletes.
The logic uses resource-level “any-volume” predicates and a mutually exclusive sibling
switch. A sibling that is confirmed on one volume but mid-sync on another is counted only as a surviving source, which can allow deleting the target while a volume is still strandable.Also applies to: 141-202
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/rest/resource_delete_last_uptodate_u130.go` around lines 92 - 138, The guard logic incorrectly uses resource-level any-volume predicates within a mutually exclusive switch statement. When a sibling is confirmed as an UpToDate source on one volume but is mid-sync on another, the current code counts it only as a confirmed source and ignores the mid-sync state, which can allow unsafe deletion of a target that has a stranded volume. Replace the mutually exclusive switch statement with logic that checks per-volume status: for each sibling, determine if it is mid-sync on any volume (which should block the delete if the target is the sole source), separately from whether it serves as a confirmed source overall. Apply this same fix at both affected locations: the initial function around the switch statement and the consolidated site at lines 141-202 which has the same pattern.
🧹 Nitpick comments (1)
pkg/satellite/controllers/observer_internal_test.go (1)
2614-2665: ⚡ Quick winAlign this test’s name with what it actually asserts.
The test currently checks SyncSource translation output, not cache-merge “no downgrade” behavior. Either add a post-merge assertion for that contract or rename to avoid false confidence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/satellite/controllers/observer_internal_test.go` around lines 2614 - 2665, The test TestObserverSyncSourceStampDoesNotDowngradeRealFrame has a name that implies it validates the "no downgrade" merge invariant, but it only checks that the SyncSource event is correctly translated to have DiskState=UpToDate. To align the test with its name, add a post-merge assertion after calling o.mergeVolumes with the syncSourceEv to verify that the observer's state still maintains the Failed disk state and was not downgraded, or alternatively rename the test to reflect that it only validates SyncSource event translation (e.g., TestSyncSourceTranslatesUpToDate) to avoid misleading developers about what contract is actually being tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/rest/resource_delete_last_uptodate_u130.go`:
- Around line 92-138: The guard logic incorrectly uses resource-level any-volume
predicates within a mutually exclusive switch statement. When a sibling is
confirmed as an UpToDate source on one volume but is mid-sync on another, the
current code counts it only as a confirmed source and ignores the mid-sync
state, which can allow unsafe deletion of a target that has a stranded volume.
Replace the mutually exclusive switch statement with logic that checks
per-volume status: for each sibling, determine if it is mid-sync on any volume
(which should block the delete if the target is the sole source), separately
from whether it serves as a confirmed source overall. Apply this same fix at
both affected locations: the initial function around the switch statement and
the consolidated site at lines 141-202 which has the same pattern.
---
Nitpick comments:
In `@pkg/satellite/controllers/observer_internal_test.go`:
- Around line 2614-2665: The test
TestObserverSyncSourceStampDoesNotDowngradeRealFrame has a name that implies it
validates the "no downgrade" merge invariant, but it only checks that the
SyncSource event is correctly translated to have DiskState=UpToDate. To align
the test with its name, add a post-merge assertion after calling o.mergeVolumes
with the syncSourceEv to verify that the observer's state still maintains the
Failed disk state and was not downgraded, or alternatively rename the test to
reflect that it only validates SyncSource event translation (e.g.,
TestSyncSourceTranslatesUpToDate) to avoid misleading developers about what
contract is actually being tested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 22ea8b1a-f3ca-4975-a052-2cdf6b53019c
📒 Files selected for processing (4)
pkg/rest/resource_delete_last_uptodate_u130.gopkg/rest/resource_delete_last_uptodate_u130_test.gopkg/satellite/controllers/observer.gopkg/satellite/controllers/observer_internal_test.go
Summary
Closes a data-availability hole in the U130 "don't delete the last active peer" guard (BUG-045). On stand
bigit was observed thatlinstor r d <node> <rd>deleting the LAST UpToDate diskful replica while a peer is mid-sync intermittently SUCCEEDED, stranding the SyncTarget with no UpToDate source — a path to data unavailability. The defect was non-deterministic (1 of 2 big runs).The U130 predicate itself was logically correct; the defect was upstream of it, in the satellite observer's CRD status projection. This PR fixes both legs for defense-in-depth, as befits a data-safety guard.
Root cause
A diskful replica that is Secondary and acts as the SyncSource for a freshly-added SyncTarget peer could end up with a blank
.status.volumes[].diskStatein the CRD projection.On a Secondary SyncSource, drbd-9 does not re-emit a local
deviceframe carryingdisk:UpToDate— the local disk-state never transitioned (it was already UpToDate), so the kernel only emits the peer-devicereplication:SyncSourceframe toward the SyncTarget. When the observer's per-volume cache entry is (re)created by that peer-device frame, it carries an emptyDiskState, and the projection leavesdiskStateblank (pkg/satellite/controllers/observer.go,peerDeviceVolumeObservation).The U130 guard reads exactly that
diskStatefield. With the only source'sdiskStateblank,resourceIsUpToDate(target)returned false, the guard concluded "this is not an UpToDate source / there is no other UpToDate copy", and allowed the destructive delete — stranding the SyncTarget.Fix
Leg 1 — observer projection (
pkg/satellite/controllers/observer.go)When the observer sees a peer-device
replication:SyncSourceframe, stamp the local volume'sDiskState=UpToDate. A SyncSource feeds a peer's resync only from an UpToDate local disk, so this is a hard DRBD invariant. The peer-device frame is reliably emitted (unlike the missing local device frame), so this closes the unstamped-projection window. The stamp is idempotent and never downgrades a richer local-frame observation (mergeVolumeIntoonly upgrades to a non-empty value, andUpToDateis the terminal disk-state).Leg 2 — guard fail-safe hardening (
pkg/rest/resource_delete_last_uptodate_u130.go)The last-copy delete safety decision no longer rests solely on the possibly-lagging
diskStateprojection:Fail-safe argument (load-bearing): when the guard cannot positively confirm that another UpToDate copy would survive the delete, it REFUSES. A false refusal merely asks the operator to wait for the resync to finish or pass
?force=true; a false allow strands the SyncTarget with no UpToDate source — an unrecoverable data-availability loss. The asymmetry of consequences dictates refusing in doubt.Legitimate last-copy deletes (no peer mid-sync) and the
?force=trueoverride are unaffected — no regression to the documented U130 behaviour or bug-hunt #6.Tests
TestObserverStampsUpToDateForSecondarySyncSource,TestObserverSyncSourceStampDoesNotDowngradeRealFrame) and new guard tests covering the blank-diskState SyncSource target, the fail-safe blank-state-mid-sync refusal, and the SyncSource-sibling-survives non-regression — plus predicate-table rows for each. Existing U130 L1/replay coverage stays green.r-d-last-uptodate-midsync-rejected: run on standbig. Because the defect is non-deterministic, it was run 5x consecutively green, each run catching a live mid-sync window (big-worker-2 observed mid-sync) and exercising the real rejection. The same scenario reproduced the bug (r d ... succeeded while mid-sync) on the pre-fix image, proving the race is closed.r-full-lifecycle: green on stand (no regression on ther d/r c/r tddelete path).go test ./...,go test -race ./pkg/rest/... ./pkg/satellite/...: green.golangci-lint run: 0 issues.Summary by CodeRabbit