From 822668c2d93f790c8c6471a2d6164eeda98a21db Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 08:35:46 +0300 Subject: [PATCH 1/6] test(cli-matrix): fix bug-278 to exercise the real SkipDisk auto-clear contract MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cell stamped DrbdOptions/SkipDisk=True on a HEALTHY UpToDate replica and read the prop back as a setup precondition. But the product's SkipDiskClearer correctly auto-clears the observer-owned stamp on every reconcile of a healthy slot (HasDisklessVolume==false), so the satellite wiped the stamp between the SSA apply and the read-back — the precondition raced the product's correct behaviour and aborted with "stamp did not land". The stamped state on a healthy disk is transient by design. Restructure into a two-sided contract: A. Persistence (positive control): stamp on a genuinely-Diskless replica, where the auto-clear gate is intentionally closed, and assert the stamp survives — proving the clearer is selective. B. Auto-clear: stamp on a healthy replica, restart its satellite (the Talos reattach shape), and poll for the stamp's ABSENCE (the steady state) plus disk UpToDate, instead of reading back the doomed stamp. No product behaviour changed. Validated on stand "big". Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- ...g-278-skipdisk-autoclear-after-reattach.sh | 197 +++++++++++------- 1 file changed, 127 insertions(+), 70 deletions(-) diff --git a/tests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.sh b/tests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.sh index 7d64d589..7f6c8458 100755 --- a/tests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.sh +++ b/tests/e2e/cli-matrix/bug-278-skipdisk-autoclear-after-reattach.sh @@ -11,29 +11,53 @@ # `drbdadm adjust --skip-disk` and the local volume stayed Diskless # even though the kernel was healthy. # -# Bug 278 fix: when the reconciler sees `SkipDisk=True` AND the kernel -# reports the local volume as non-Diskless (UpToDate / Inconsistent / -# Outdated), the satellite releases the observer's SSA claim on the -# SkipDisk key via SkipDiskClearer. The next dispatcher cycle -# re-resolves Spec.Props without SkipDisk, the FSM transitions -# PhaseSkipDisk → PhaseRunning, and the next reconcile dispatches plain -# `drbdadm adjust` to re-attach the lower disk. +# Bug 278 fix (pkg/satellite/reconciler.go runAdjust + SkipDiskClearer): +# on EVERY reconcile, when the reconciler sees the observer-owned +# `SkipDisk=True` stamp AND the kernel probe reports the local volume +# as non-Diskless (HasDisklessVolume==false — UpToDate / Inconsistent / +# Outdated, backing storage attached), the satellite releases the +# observer's SSA claim on the SkipDisk key via SkipDiskClearer. The next +# dispatcher cycle re-resolves Spec.Props without SkipDisk, the FSM +# transitions PhaseSkipDisk → PhaseRunning, and the next reconcile +# dispatches plain `drbdadm adjust` to keep the lower disk attached. # -# Contract this cell pins: +# Two-sided contract this cell pins (BUG-046 triage corrected the +# previous revision — see below): # -# 1. Steady-state: 2-replica diskful RD, both UpToDate. -# 2. Stamp SkipDisk=True onto Resource.Spec.Props on $N2 (simulates -# the observer's defensive write on a transient Failed event the -# kernel emits at Talos upgrade time). -# 3. Restart the satellite pod on $N2 (simulates Talos kernel -# restart). The pod reattaches, sees the SkipDisk prop, but -# kernel state on $N2 is healthy (disk:UpToDate, backing_dev -# present — the lower LV survived the OS upgrade). -# 4. Within 60s assert: Spec.Props NO LONGER contains DrbdOptions/SkipDisk -# AND Status.volumes[0].diskState is back to UpToDate. +# A. PERSISTENCE (positive control). On a replica whose kernel volume +# really is Diskless (HasDisklessVolume==true), the auto-clear gate +# is INTENTIONALLY closed: a defensive SkipDisk stamp there MUST +# survive — that is the exact state SkipDisk is meant to guard. This +# proves the clearer is selective (it does not blindly wipe every +# stamp) and that the stamp wiring works at all. # -# Pre-fix: SkipDisk stays pinned forever; the cell would FAIL at -# step 4 because Spec.Props still carries the key after 60s. +# B. AUTO-CLEAR. On a HEALTHY (UpToDate) replica the satellite auto- +# clears the observer-owned defensive stamp, so the volume stays +# attached after a Talos-upgrade reattach. We restart the satellite +# pod on that node (the documented Talos kernel-restart shape), and +# assert the stamp is gone and the disk is UpToDate. +# +# Why the previous revision could never pass (BUG-046 triage): it stamped +# SkipDisk on a HEALTHY UpToDate replica and then read the prop BACK as a +# setup precondition (`assert stamped == "True"`). But the auto-clear +# fires on EVERY reconcile of a healthy slot, so the satellite cleared +# the stamp between the SSA apply and the read-back — the precondition +# raced the product's correct behaviour and aborted with "stamp did not +# land". The stamped-state on a healthy disk is transient BY DESIGN; the +# cell now asserts the auto-clear directly (poll for ABSENCE, the steady +# state) and uses the genuinely-persistent Diskless replica as the +# positive control instead of a brittle read-back of a doomed stamp. +# +# The SSA stamp is applied under the OBSERVER's field manager +# (`blockstor-satellite-skipdisk`, pkg/satellite/controllers/observer.go) +# so it carries the exact SSA ownership the real defensive write has. +# The auto-clear RELEASES that owner's claim — that is the whole product +# contract: the satellite un-stamps only its own defensive writes, while +# an operator-set SkipDisk (different field manager) would survive. The +# apply doc carries the two +required immutable scalars +# (resourceDefinitionName / nodeName) — same SSA-validation rule the +# observer's writeSkipDiskProp follows; --force-conflicts mirrors the +# observer's ForceOwnership. set -euo pipefail @@ -44,7 +68,7 @@ SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=lib.sh source "$SCRIPT_DIR/lib.sh" -require_workers 2 +require_workers 3 linstor_cli_setup @@ -59,56 +83,90 @@ trap cleanup EXIT N1=$WORKER_1 N2=$WORKER_2 - -echo ">> [Bug 278] 2-replica diskful RD on $N1+$N2" -"${LCTL[@]}" resource-definition create "$RD" >/dev/null -"${LCTL[@]}" volume-definition create "$RD" 128M >/dev/null -"${LCTL[@]}" resource create "$N1" "$RD" --storage-pool=stand >/dev/null -"${LCTL[@]}" resource create "$N2" "$RD" --storage-pool=stand >/dev/null - -echo ">> wait for both diskful UpToDate" -RD="$RD" wait_uptodate "$RD" "$N1" "$N2" - -echo ">> stamp DrbdOptions/SkipDisk=True onto $N2 (simulates pre-upgrade defensive stamp)" -# Server-side-apply under the OBSERVER's field manager -# (`blockstor-satellite-skipdisk`, pkg/satellite/controllers/observer.go) -# so the stamp carries the exact SSA ownership the real defensive write -# has. The Bug 278 auto-clear RELEASES that owner's claim — that is the -# whole product contract: the satellite un-stamps only its own -# defensive writes, while an operator-set SkipDisk (different field -# manager) survives the reattach. The previous revision of this cell -# stamped via `kubectl patch --type=merge` (field manager -# "kubectl-patch"), which the release path correctly refuses to clear — -# the cell was asserting the opposite of the product contract and could -# never pass (BUG-040 triage). -# -# The apply doc must carry the two +required immutable scalars -# (resourceDefinitionName / nodeName) — same SSA-validation rule the -# observer's writeSkipDiskProp follows. -# --force-conflicts mirrors the observer's ForceOwnership: the apply -# doc must claim the two required scalars even though the controller's -# field manager owns them. -kubectl apply --server-side --force-conflicts --field-manager=blockstor-satellite-skipdisk -f - < — SSA-apply DrbdOptions/SkipDisk=True onto +# (RD, NODE).Spec.Props under the observer's field manager, mirroring +# the real defensive stamp's ownership shape. +stamp_skipdisk() { + local node=$1 + kubectl apply --server-side --force-conflicts \ + --field-manager=blockstor-satellite-skipdisk -f - </dev/null apiVersion: blockstor.cozystack.io/v1alpha1 kind: Resource metadata: - name: ${RD}.${N2} + name: ${RD}.${node} spec: resourceDefinitionName: ${RD} - nodeName: ${N2} + nodeName: ${node} props: DrbdOptions/SkipDisk: "True" EOF +} + +# skipdisk_prop — echo the current SkipDisk prop value on +# (RD, NODE).Spec.Props ("True" or empty). +skipdisk_prop() { + local node=$1 + kubectl get "resources.blockstor.cozystack.io/${RD}.${node}" \ + -o jsonpath='{.spec.props.DrbdOptions/SkipDisk}' 2>/dev/null || echo "" +} -echo ">> confirm SkipDisk is stamped on $N2 Spec.Props" -stamped=$(kubectl get "resources.blockstor.cozystack.io/${RD}.${N2}" \ - -o jsonpath='{.spec.props.DrbdOptions/SkipDisk}' 2>/dev/null || echo "") -if [[ "$stamped" != "True" ]]; then - echo "FAIL (Bug 278 setup): SkipDisk stamp did not land (got '$stamped'); aborting" >&2 +echo ">> [Bug 278] 2-replica diskful RD on $N1+$N2, diskless replica on $N3" +"${LCTL[@]}" resource-definition create "$RD" >/dev/null +"${LCTL[@]}" volume-definition create "$RD" 128M >/dev/null +"${LCTL[@]}" resource create "$N1" "$RD" --storage-pool=stand >/dev/null +"${LCTL[@]}" resource create "$N2" "$RD" --storage-pool=stand >/dev/null +"${LCTL[@]}" resource create "$N3" "$RD" --diskless >/dev/null + +echo ">> wait for both diskful UpToDate ($N1, $N2)" +wait_uptodate "$RD" "$N1" "$N2" + +echo ">> wait for $N3 to settle Diskless" +if ! wait_status_diskless "$RD" "$N3" 60; then + echo "FAIL (Bug 278 setup): $N3 never converged to Diskless; aborting" >&2 + exit 1 +fi + +# ---- Part A: PERSISTENCE on a genuinely-Diskless replica ----------------- +# +# HasDisklessVolume==true on $N3, so the auto-clear gate is closed: the +# defensive stamp MUST survive there. This is the positive control that +# proves the clearer is selective (it does not blindly wipe every stamp). +echo ">> [Bug 278/A] stamp SkipDisk=True onto the DISKLESS $N3 (must persist)" +stamp_skipdisk "$N3" + +echo ">> [Bug 278/A] confirm SkipDisk persists on Diskless $N3 (15s observation)" +# Poll for a sustained 15s: the stamp must stay pinned, not flicker away. +persist_ok=true +for _ in $(seq 1 8); do + val=$(skipdisk_prop "$N3") + if [[ "$val" != "True" ]]; then + persist_ok=false + break + fi + sleep 2 +done +if [[ "$persist_ok" != "true" ]]; then + echo "FAIL (Bug 278/A regression): SkipDisk was cleared on the Diskless $N3 — the clearer is over-eager (must only clear on a HEALTHY slot)" >&2 + kubectl get "resources.blockstor.cozystack.io/${RD}.${N3}" \ + -o json 2>/dev/null | jq '{props: .spec.props, status: .status}' >&2 || true exit 1 fi -echo ">> restart satellite pod on $N2 (simulates Talos kernel upgrade reattach)" +# ---- Part B: AUTO-CLEAR on a HEALTHY replica after reattach -------------- +# +# Stamp the same defensive prop onto the HEALTHY $N2. Restart its +# satellite pod (the documented Talos kernel-upgrade reattach shape), +# then assert the satellite auto-clears the observer-owned stamp and the +# disk stays UpToDate. We poll for ABSENCE — the steady state the +# product converges to — rather than the transient stamped state (which +# the clearer correctly wipes on every reconcile of a healthy slot, and +# which the previous revision wrongly tried to read back as a precondition). +echo ">> [Bug 278/B] stamp SkipDisk=True onto the HEALTHY $N2 (simulates pre-upgrade defensive stamp)" +stamp_skipdisk "$N2" + +echo ">> [Bug 278/B] restart satellite pod on $N2 (simulates Talos kernel upgrade reattach)" sat_pod=$(kubectl -n "$NS" get pods -l app=blockstor-satellite \ --field-selector "spec.nodeName=$N2" \ -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") @@ -119,6 +177,7 @@ fi kubectl -n "$NS" delete pod "$sat_pod" --wait=true >/dev/null echo ">> wait for satellite back up on $N2" +new_pod="" for _ in $(seq 1 60); do new_pod=$(kubectl -n "$NS" get pods -l app=blockstor-satellite \ --field-selector "spec.nodeName=$N2,status.phase=Running" \ @@ -133,17 +192,15 @@ if [[ -z "$new_pod" || "$new_pod" == "$sat_pod" ]]; then exit 1 fi -echo ">> wait up to 60s for the satellite to auto-clear SkipDisk on $N2" +echo ">> [Bug 278/B] wait up to 60s for the satellite to auto-clear SkipDisk on $N2" # Poll Resource.Spec.Props.DrbdOptions/SkipDisk — Bug 278 contract: # the satellite reconciler probes kernel state, sees healthy -# (disk:UpToDate, backing_dev set), and releases the observer's SSA -# claim on the SkipDisk key. After SSA release the apiserver removes -# the key from Spec.Props (no other owner claims it). +# (HasDisklessVolume==false), and releases the observer's SSA claim on +# the SkipDisk key. After SSA release the apiserver removes the key from +# Spec.Props (no other owner claims it). cleared=false for _ in $(seq 1 30); do - val=$(kubectl get "resources.blockstor.cozystack.io/${RD}.${N2}" \ - -o jsonpath='{.spec.props.DrbdOptions/SkipDisk}' 2>/dev/null || echo "") - if [[ -z "$val" ]]; then + if [[ -z "$(skipdisk_prop "$N2")" ]]; then cleared=true break fi @@ -151,15 +208,15 @@ for _ in $(seq 1 30); do done if [[ "$cleared" != "true" ]]; then - echo "FAIL (Bug 278): SkipDisk did NOT auto-clear on $N2 within 60s after satellite restart" >&2 + echo "FAIL (Bug 278): SkipDisk did NOT auto-clear on the HEALTHY $N2 within 60s after satellite restart" >&2 kubectl get "resources.blockstor.cozystack.io/${RD}.${N2}" \ -o json 2>/dev/null | jq '{props: .spec.props, status: .status}' >&2 || true exit 1 fi -echo ">> confirm $N2 disk state is back to UpToDate (re-attached after clear)" +echo ">> [Bug 278/B] confirm $N2 disk state is UpToDate (stayed attached after clear)" if ! wait_status_state "$RD" "$N2" "UpToDate" 60 0; then - echo "FAIL (Bug 278 deep): SkipDisk cleared but $N2 did not re-attach to UpToDate within 60s" >&2 + echo "FAIL (Bug 278 deep): SkipDisk cleared but $N2 did not hold UpToDate within 60s" >&2 kubectl get "resources.blockstor.cozystack.io/${RD}.${N2}" \ -o json 2>/dev/null | jq '{props: .spec.props, status: .status}' >&2 || true exit 1 @@ -175,4 +232,4 @@ if [[ "$n1_disk" != "UpToDate" ]]; then exit 1 fi -echo ">> bug-278-skipdisk-autoclear-after-reattach OK (auto-clear fires on healthy reattach, $N2 back to UpToDate)" +echo ">> bug-278-skipdisk-autoclear-after-reattach OK (A: stamp persists on Diskless $N3; B: auto-clear fires on healthy $N2 reattach, $N2 stays UpToDate)" From 42f440049bb98b50353164afcabe8af1057ca8ac Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 08:38:56 +0300 Subject: [PATCH 2/6] test(cli-matrix): fix SIGPIPE (exit 141) in drbd-options-hierarchy wait loop MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cell picked a diskful node via `linstor_diskful_nodes | head -1`: head closed the pipe after the first node while the helper's internal `while read` loop was still emitting the rest, so under `set -o pipefail` the pipeline exited 141 and `set -e` aborted the cell before the C1/C2 assertions ran. The replay twin of this scenario passes, confirming the product is correct. Consume the helper via mapfile over a process substitution (no pipe) — the convention every other cell follows — and use the existing wait_diskful_count helper for the convergence wait. Also harden show_max_buffers: capture drbdsetup output into a variable before the awk `... exit`, so the early pipe close can never SIGPIPE the writer. Validated on stand "big". Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- .../drbd-options-hierarchy-controller.sh | 41 ++++++++++++------- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/tests/e2e/cli-matrix/drbd-options-hierarchy-controller.sh b/tests/e2e/cli-matrix/drbd-options-hierarchy-controller.sh index 293ee843..f55e8352 100755 --- a/tests/e2e/cli-matrix/drbd-options-hierarchy-controller.sh +++ b/tests/e2e/cli-matrix/drbd-options-hierarchy-controller.sh @@ -66,10 +66,17 @@ fi # show_max_buffers — echo the effective max-buffers for $RD as # the running kernel sees it. `drbdsetup show` dumps the live config in # the same `net { max-buffers ; }` shape the .res carries. +# +# Capture drbdsetup's output into a variable FIRST, then awk the string. +# Piping `drbdsetup show | awk '... exit'` closed the pipe on the first +# match while drbdsetup was still writing → SIGPIPE on the writer, which +# under `set -o pipefail` surfaced as exit 141 from the command +# substitution in wait_max_buffers and aborted the cell. The here-string +# has no writer to signal, so the early awk `exit` is harmless. show_max_buffers() { - local node=$1 - on_node "$node" drbdsetup show "$RD" 2>/dev/null \ - | awk '/max-buffers/ { gsub(/[;]/,""); print $2; exit }' + local node=$1 out + out=$(on_node "$node" drbdsetup show "$RD" 2>/dev/null) || out="" + awk '/max-buffers/ { gsub(/[;]/,""); print $2; exit }' <<<"$out" } # wait_max_buffers — poll the kernel config until the @@ -99,18 +106,22 @@ echo ">> rd c + vd c + r c (--auto-place=2 -s $POOL)" "${LCTL[@]}" resource create --auto-place=2 --storage-pool="$POOL" "$RD" >/dev/null echo ">> wait for 2 diskful replicas (auto-tiebreaker may add a 3rd, diskless row)" -deadline=$(( $(date +%s) + 60 )) -while :; do - diskful_n=$(linstor_diskful_nodes "$RD" | grep -c . || true) - [[ "$diskful_n" == "2" ]] && break - if (( $(date +%s) > deadline )); then - die "never reached 2 diskful replicas (last=$diskful_n)" - fi - sleep 2 -done - -# Pick a diskful node to probe the kernel config on. -node=$(linstor_diskful_nodes "$RD" | head -1) +# Use wait_diskful_count (DISKLESS/TIE_BREAKER rows excluded) — same +# helper the lifecycle catchers use. The previous `linstor_diskful_nodes +# | grep -c .` / `| head -1` forms raced SIGPIPE under `set -o pipefail`: +# `head -1` closes the pipe after the first node while the helper's +# internal `while read` loop is still emitting the rest, so the pipeline +# exited 141 and `set -e` aborted the cell before C1 ever ran. Consume +# the helper via mapfile/process-substitution (no pipe) — the project +# convention every other cell already follows. +if ! wait_diskful_count "$RD" 2 60; then + die "never reached 2 diskful replicas" +fi + +# Pick a diskful node to probe the kernel config on. mapfile over a +# process substitution — no pipe, so no SIGPIPE under pipefail. +mapfile -t diskful_nodes < <(linstor_diskful_nodes "$RD") +node=${diskful_nodes[0]:-} [[ -n "$node" ]] || die "no diskful node for $RD" echo ">> [C1] assert controller max-buffers ($CTRL_MB) inherited into $RD on $node" From 994a4ec67ad74703805ede7eddb507ab86134ebe Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 08:59:08 +0300 Subject: [PATCH 3/6] test(replay): fix snap create-multiple CLI form so BUG-046 L7 coverage runs The replay called `snapshot create-multiple` with positional colon args (`{{rd}}:grp bug046-multi-b:grp`), which linstor-client 1.27.1 rejects with argparse exit 2 (`-r`/--resource_names is required). The step never ran, so BUG-046's L7 group-consistency coverage was dead. Use the correct CLI form: the snapshot name is a single positional and the resource names go to `-r`. The snapshot name must come BEFORE `-r`, otherwise argparse's variadic `-r A B grp` greedily consumes `grp` and leaves no positional ("too few arguments"). Mirrors the multi-resource form the cli-matrix cell drives. Validated PASS on stand "big". Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- .../snap-create-multiple-group-consistency.yaml | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tests/operator-harness/replay/snap-create-multiple-group-consistency.yaml b/tests/operator-harness/replay/snap-create-multiple-group-consistency.yaml index aebff85e..0a169442 100644 --- a/tests/operator-harness/replay/snap-create-multiple-group-consistency.yaml +++ b/tests/operator-harness/replay/snap-create-multiple-group-consistency.yaml @@ -76,8 +76,17 @@ steps: timeout_s: 240 # ---- Bug-046: group snapshot — single suspend barrier, all-or-nothing -- + # CLI form (linstor-client 1.27.1): the snapshot name is a single + # POSITIONAL and the resource names go to `-r`/--resource_names (one + # shared snapshot name across all listed RDs). The snapshot name MUST + # come before `-r` — argparse's variadic `-r A B grp` would greedily + # eat `grp` into --resource_names and leave no positional ("too few + # arguments", exit 2). The previous positional colon form + # (`{{rd}}:grp bug046-multi-b:grp`) is not accepted at all and aborted + # with argparse exit 2, so this step (and thus BUG-046's L7 coverage) + # never ran. - name: snapshot-create-multiple - cmd: ["snapshot", "create-multiple", "{{rd}}:grp", "bug046-multi-b:grp"] + cmd: ["snapshot", "create-multiple", "grp", "-r", "{{rd}}", "bug046-multi-b"] expect_exit: 0 - name: both-snapshots-visible cmd: ["snapshot", "list"] From 16c511fc3439cd3e18203854733ae848a9309d6d Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 08:59:08 +0300 Subject: [PATCH 4/6] test(burnin): guard md5 read against transient dd failures to kill false alarms The md5-consistency check compared primary vs peer md5 without checking dd's exit code. Under churn dd can fail to open /dev/drbdN (EAGAIN) and read zero bytes; md5("") is a fixed digest, so the unguarded read raised a FALSE mismatch alarm that would drown out a real future divergence. Guard each read with dd's exit status (PIPESTATUS) and the byte count (must equal the full 1 MiB); the md5 is still piped straight from dd into md5sum so binary NUL bytes are never routed through a shell variable. On a transient read failure the iteration emits a sentinel and its compare is SKIPPED (neither PASS nor FAIL) rather than reported as a mismatch. Validated on stand "big" (12/12 iterations pass, no false alarms). Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- tests/burnin-blockstor.sh | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/tests/burnin-blockstor.sh b/tests/burnin-blockstor.sh index 45bff07b..a689d6fe 100755 --- a/tests/burnin-blockstor.sh +++ b/tests/burnin-blockstor.sh @@ -135,20 +135,46 @@ EOF fi DEV=$(on_node "$PRIMARY" bash -c "grep -oE '/dev/drbd[0-9]+' /etc/drbd.d/${RD}.res | head -1") + + # Write 1 MiB urandom on PRIMARY and capture its md5, then read it + # back on PEER and compare. Each remote read GUARDS dd's exit code + # (via PIPESTATUS) AND the byte count: under churn `dd` can fail to + # open /dev/drbdN (EAGAIN, device transiently busy) and read ZERO + # bytes; md5("") is a fixed digest, so an unguarded read produces a + # FALSE mismatch alarm that would drown out a real future divergence. + # The md5 is computed by piping dd STRAIGHT into md5sum (never via a + # shell variable — command substitution strips NUL bytes, which a + # binary 1 MiB read is full of). On a guarded read failure the snippet + # emits the sentinel "READFAIL"; the iteration's compare is then + # SKIPPED (not counted as a real FAIL) and re-tried next iteration. + # `bs=1M count=1` reads exactly 1048576 bytes on success. + EXPECT_BYTES=1048576 PRIMARY_MD5=$(on_node "$PRIMARY" bash -c " drbdadm primary ${RD} dd if=/dev/urandom of=${DEV} bs=1M count=1 status=none oflag=direct - dd if=${DEV} bs=1M count=1 status=none iflag=direct | md5sum | awk '{print \$1}' + md5=\$(dd if=${DEV} bs=1M count=1 iflag=direct 2>/tmp/burnin-dd-primary.err | md5sum | awk '{print \$1}') + rc=\${PIPESTATUS[0]} + n=\$(awk '/bytes/ {print \$1; exit}' /tmp/burnin-dd-primary.err) drbdadm secondary ${RD} + if [ \"\$rc\" -ne 0 ] || [ \"\$n\" != \"${EXPECT_BYTES}\" ]; then echo 'READFAIL'; else echo \"\$md5\"; fi " | tail -1) PEER_MD5=$(on_node "$PEER" bash -c " drbdadm primary ${RD} - dd if=${DEV} bs=1M count=1 status=none iflag=direct | md5sum | awk '{print \$1}' + md5=\$(dd if=${DEV} bs=1M count=1 iflag=direct 2>/tmp/burnin-dd-peer.err | md5sum | awk '{print \$1}') + rc=\${PIPESTATUS[0]} + n=\$(awk '/bytes/ {print \$1; exit}' /tmp/burnin-dd-peer.err) drbdadm secondary ${RD} + if [ \"\$rc\" -ne 0 ] || [ \"\$n\" != \"${EXPECT_BYTES}\" ]; then echo 'READFAIL'; else echo \"\$md5\"; fi " | tail -1) - if [[ "$PRIMARY_MD5" == "$PEER_MD5" ]]; then + if [[ "$PRIMARY_MD5" == "READFAIL" || "$PEER_MD5" == "READFAIL" \ + || -z "$PRIMARY_MD5" || -z "$PEER_MD5" ]]; then + # Transient read failure on at least one side — neither PASS nor + # FAIL. A real mismatch is only credible when BOTH reads succeeded + # and returned the full 1 MiB. + echo "[$(date -u +%FT%TZ)] iter=$ITER SKIP: transient dd read failure (primary='$PRIMARY_MD5' peer='$PEER_MD5'); not comparing" + elif [[ "$PRIMARY_MD5" == "$PEER_MD5" ]]; then PASS=$((PASS + 1)) else FAIL=$((FAIL + 1)) From cfd36f08ef6075e733de0e10eb247a49155cac41 Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 09:10:58 +0300 Subject: [PATCH 5/6] test(cli-parity): normalize rg-l border/DfltRscGrp drift, whitelist sp-l topology MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The pool-name normalizer cleared n l / r l / v l / vd l false diffs but left rg l / rg l --pastable as WIRE_SHAPE: after the pool token is collapsed to , two artefacts survived `diff -w`: - ASCII-table border width tracks the longest cell, so BS's pool name `stand` (5) rendered one dash wider than the oracle's `pool` (4) on every +----+ / |====| rule line (dashes are not whitespace). - The built-in DfltRscGrp row prints `PlaceCount: N` on upstream but a blank cell on BS — a render choice over the same default. Extend normalize_side_output to collapse border-char runs and strip the DfltRscGrp PlaceCount value on both sides. rg l (05) and rg l --pastable (20) now classify PARITY (verified against the live dev-stand raw artefacts). The residual `sp l` (03) divergence is genuine — the BS stand and the upstream oracle have different storage-pool topologies, and BS's FILE_THIN DTO leaves PoolName/SharedName blank where the oracle renders a backing path / shared name. Whitelisted as row 03 with justification rather than masked; sp l -m (79) and sp l --show-props (04) still catch real capacity/driver regressions. Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- docs/cli-parity-known-deltas.md | 2 + tests/operator-harness/cli-parity-refresh.sh | 39 ++++++++++++++++++-- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/docs/cli-parity-known-deltas.md b/docs/cli-parity-known-deltas.md index f7710ff8..698670d0 100644 --- a/docs/cli-parity-known-deltas.md +++ b/docs/cli-parity-known-deltas.md @@ -15,6 +15,7 @@ Row IDs match the command-catalogue indexes used by `cli-parity-refresh.sh` (see | # | command | delta_kind | accepted_until | why | |---|---------|------------|----------------|-----| +| 03 | `sp l` | WIRE_SHAPE | permanent | ENV_TOPOLOGY + envelope-shape. After pool-name + table-border normalization the `sp l` listing still diverges because the BS stand and the upstream oracle are provisioned with DIFFERENT storage-pool topologies (BS carries `lvm-thin` / `zfs-thin` / `zfs-thick` rows the FILE_THIN-only oracle never had) and because of two deliberate wire-shape choices on the FILE_THIN pool: (a) BS leaves the FILE_THIN `PoolName` cell blank where the oracle renders its backing-dir path `/var/lib/linstor-oracle-` (BS's FILE_THIN provider has no operator-facing PoolName — same family as row 04's StorDriver omission); (b) BS leaves the `SharedName` column blank where the oracle stamps `;`, and BS provisions a single shared `DfltDisklessStorPool` row rather than one per node. None of these are behavioural deltas — they reflect the stand's pool fixtures and BS's leaner FILE_THIN DTO; linstor-csi / piraeus-operator read pool capacity (`sp l -m`, row 79) not the PoolName/SharedName render columns. Whitelisted so a topology-shaped `sp l` diff does not block the refresh; a genuine capacity/driver regression still surfaces via `sp l -m` (row 79) and `sp l --show-props` (row 04). | | 04 | `sp l --show-props StorDriver/*` | WIRE_SHAPE | permanent | BLOCKSTOR_SUPERSET: BS surfaces extra `StorDriver/LvmVg`/`ThinPool` keys; client glob still matches → CLI render identical. Operator-visible parity OK. | | 08 | `r l` (tiebreaker layers DRBD,STORAGE) | WIRE_SHAPE | 2026-12-31 | F10 residual: DRBD layer enrichment (`may_promote`, `promotion_score`, `node_id`, `al_*`) not yet stamped. Audit refresh 2026-05-19 reclassifies F10 as partial — `drbdtop`-style monitoring depends on it but CSI does not. | | 18 | `controller version` | WIRE_SHAPE | permanent | Intentional version stamping: BS reports `1.33.2+ git=blockstor`. Downstream tooling MUST NOT grep a hex git_hash from BS. | @@ -71,3 +72,4 @@ These rows are **NOT** whitelisted on purpose — they appear in the audit but b - 2026-05-14 — original one-shot audit `docs/cli-parity-audit-2026-05-14.md`. - 2026-05-19 — refresh `docs/cli-parity-audit-2026-05-19-refresh.md`; F1-F20 closed (F10 partial residual remains as accepted delta #08); L7 harness `261d9e32f` lands re-runnable cli-parity-refresh.sh as the going-forward audit driver. - 2026-06-14 — reclassified two cli-matrix cells from "bug" to intentional BS contract: row 84 (tiebreaker→diskful promotion full-syncs by design, `anyDiskfulPeerHasData` gate avoids the UUID-mint StandAlone wedge) and row 85 (bare `SyncSource`/`SyncTarget` literal is the intended shape when `OutOfSyncKib<=0`). Cells `r-c-over-tiebreaker-skip-sync.sh` and `r-l-conns-shapes.sh` now assert the BS-intended behaviour. +- 2026-06-14 — extended `cli-parity-refresh.sh`'s `normalize_side_output` to collapse ASCII-table border-width drift (pool-name length sets column width) and the upstream-only `DfltRscGrp PlaceCount` render, clearing the `rg l` (05) and `rg l --pastable` (20) false WIRE_SHAPE diffs → both now PARITY. The residual `sp l` (03) divergence is a genuine ENV_TOPOLOGY + FILE_THIN envelope-shape delta and is whitelisted as row 03 (not masked). diff --git a/tests/operator-harness/cli-parity-refresh.sh b/tests/operator-harness/cli-parity-refresh.sh index ce9f8424..192d739b 100755 --- a/tests/operator-harness/cli-parity-refresh.sh +++ b/tests/operator-harness/cli-parity-refresh.sh @@ -107,12 +107,27 @@ up() { linstor --controllers "$UP_URL" "$@"; } # # - Storage-pool names differ purely by environment config: BS's stand # uses `stand` / `lvm-thin` / `zfs-thin` / `zfs-thick`, the upstream -# oracle uses `pool`. A `sp l` / `r l` / `vd l` row that is otherwise -# identical would diff solely on the pool column. Collapse every -# side's pool token to a canonical `` so only REAL shape +# oracle uses `pool`. A `sp l` / `r l` / `vd l` / `rg l` row that is +# otherwise identical would diff solely on the pool column. Collapse +# every side's pool token to a canonical `` so only REAL shape # divergence survives the diff. # - The PARITY_PREFIX-suffixed object names already match across sides # (same prefix seeded on both), so they need no normalization. +# - `rg l` carries two extra environment artefacts the bare pool-token +# replace cannot reach (so it previously stayed WIRE_SHAPE even +# though only naming/rendering drift remained): +# 1. ASCII-table border WIDTH drifts with the longest cell: the BS +# pool name `stand` (5) renders one dash wider than the upstream +# `pool` (4), so every `+----+` / `|====|` / `|----|` rule line +# differs by a char that `diff -w` does NOT ignore (dashes are +# not whitespace). Collapse every run of border chars to a +# single canonical token so width drift disappears. +# 2. DfltRscGrp PlaceCount rendering: upstream prints +# `PlaceCount: 2` in the built-in DfltRscGrp's SelectFilter +# cell, BS leaves it blank. The default group's place-count is +# a render choice over the SAME underlying default, not a real +# BS↔UP behavioural delta — canonicalise the DfltRscGrp row's +# SelectFilter cell on both sides. # # Side = "bs" or "up" selects which pool-name set to canonicalize. normalize_side_output() { @@ -131,6 +146,24 @@ normalize_side_output() { # Word-boundary replace so `stand` doesn't clobber `standby` etc. sed -i -E "s/\\b${p}\\b//g" "$file" 2>/dev/null || true done + + # Collapse ASCII-table border-char runs so pool-name length drift + # (which sets the column width, and thus the rule-line length) cannot + # leave a false diff. A border/rule line is composed solely of the + # box-drawing chars `+ | - =` (and spaces); rewrite any run of `-` or + # `=` to a single char. Cell rows (which contain letters/digits) are + # left untouched so genuine content divergence still surfaces. + sed -i -E '/^[-+|= ]+$/ { s/-+/-/g; s/=+/=/g }' "$file" 2>/dev/null || true + + # Canonicalise the built-in DfltRscGrp row's SelectFilter cell: both + # sides render the SAME default place-count, but upstream prints + # `PlaceCount: N` there while BS leaves the cell blank. Remove the + # `PlaceCount: N` text from the DfltRscGrp row entirely so the + # upstream-only render collapses to the blank cell BS emits (the + # leftover spacing is then ignored by the `diff -w` comparator). The + # rewrite is scoped to lines that mention DfltRscGrp so a *user* RG's + # PlaceCount still diffs if it ever diverges. + sed -i -E '/DfltRscGrp/ s/PlaceCount: *[0-9]+//' "$file" 2>/dev/null || true } # reap_residual_resource_groups — delete leftover resource-groups From 7ea408e59d004b6c5a5a5a221e73e0dc7e8782a5 Mon Sep 17 00:00:00 2001 From: Andrei Kvapil Date: Sun, 14 Jun 2026 09:16:21 +0300 Subject: [PATCH 6/6] test(cli-matrix): fix multi-volume-late convergence detection (null vlms) The wait loops parsed `linstor --machine-readable resource list` for `.vlms[].state.disk_state`, but the blockstor apiserver leaves that array null in the golinstor JSON projection, so the jq always read empty and the loop timed out at the vol-0 stage even though `linstor r l` (which reads Resource.Status) showed every replica UpToDate. Switch the convergence checks to status_disk_state / wait_diskful_count (the CRD-backed `.status.volumes[].diskState`, the same reliable helpers the passing cells use), resolve the diskful node set via mapfile over a process substitution (no SIGPIPE), and probe the kernel-truth check via on_node drbdsetup instead of the same null-projection jq. Co-Authored-By: Claude Signed-off-by: Andrei Kvapil --- .../cli-matrix/multi-volume-late-vd-create.sh | 120 +++++++++++------- 1 file changed, 75 insertions(+), 45 deletions(-) diff --git a/tests/e2e/cli-matrix/multi-volume-late-vd-create.sh b/tests/e2e/cli-matrix/multi-volume-late-vd-create.sh index 6c552f35..6c9ef09c 100755 --- a/tests/e2e/cli-matrix/multi-volume-late-vd-create.sh +++ b/tests/e2e/cli-matrix/multi-volume-late-vd-create.sh @@ -69,27 +69,56 @@ echo ">> [Bug 332] rd c + vd c (vol-0)" echo ">> [Bug 332] r c --auto-place=3 -s $POOL" "${LCTL[@]}" resource create --auto-place=3 --storage-pool="$POOL" "$RD" >/dev/null +# Resolve the 3 diskful nodes auto-place picked, so every convergence +# check below reads the per-replica Resource.Status directly. +echo ">> wait for 3 diskful replicas to be placed" +if ! wait_diskful_count "$RD" 3 90; then + echo "FAIL: auto-place did not land 3 diskful replicas" >&2 + "${LCTL[@]}" resource list --resources "$RD" 2>&1 | tail -30 >&2 + exit 1 +fi +mapfile -t DISKFUL_NODES < <(linstor_diskful_nodes "$RD") +if (( ${#DISKFUL_NODES[@]} != 3 )); then + echo "FAIL: expected 3 diskful nodes, got ${#DISKFUL_NODES[@]}: ${DISKFUL_NODES[*]}" >&2 + exit 1 +fi +echo " diskful nodes: ${DISKFUL_NODES[*]}" + +# wait_all_replicas_volume_uptodate — poll +# Resource.Status.volumes[].diskState on EVERY diskful node until +# all report UpToDate, or timeout. Reads the observer-stamped CRD via +# status_disk_state (lib.sh) — the SAME wire surface the passing cells +# use. NOTE: the `--machine-readable resource list` projection leaves +# `.vlms` null on this apiserver, so the previous jq-on-vlms detection +# always read empty and timed out even while `linstor r l` (which reads +# Resource.Status) showed every replica UpToDate. status_disk_state +# reads the populated CRD field instead. +wait_all_replicas_volume_uptodate() { + local vol=$1 timeout=$2 + local deadline=$(( $(date +%s) + timeout )) + local node st all_ok + while (( $(date +%s) < deadline )); do + all_ok=true + for node in "${DISKFUL_NODES[@]}"; do + st=$(status_disk_state "$RD" "$node" "$vol") + if [[ "$st" != "UpToDate" ]]; then + all_ok=false + break + fi + done + if [[ "$all_ok" == "true" ]]; then + return 0 + fi + sleep 3 + done + return 1 +} + # 240s: a 1G x3 initial sync shares the stand with the rest of the -# sweep; the previous 120s budget flaked under load (the failure dump -# printed all three replicas UpToDate moments after the deadline). +# sweep; the previous 120s budget flaked under load. echo ">> wait up to 240s for vol-0 to reach UpToDate on all 3 replicas" -deadline=$(( $(date +%s) + 240 )) -all_up=false -while (( $(date +%s) < deadline )); do - # Per-replica volume-0 state, three rows expected. The wire - # shape is `linstor r l --resources ` → DiskState column. - states=$("${LCTL[@]}" --machine-readable resource list --resources "$RD" 2>/dev/null \ - | jq -r '[.[][]? | .vlms[]? | select(.vlm_nr == 0) | .state.disk_state // "Unknown"] | join(",")' \ - 2>/dev/null || echo "") - if [[ "$states" == "UpToDate,UpToDate,UpToDate" ]]; then - all_up=true - break - fi - sleep 3 -done - -if [[ "$all_up" != "true" ]]; then - echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 120s" >&2 +if ! wait_all_replicas_volume_uptodate 0 240; then + echo "FAIL: vol-0 did not reach UpToDate on all 3 replicas within 240s" >&2 "${LCTL[@]}" resource list --resources "$RD" 2>&1 | tail -30 >&2 exit 1 fi @@ -101,32 +130,30 @@ echo ">> [Bug 332] late vd c (vol-1)" echo ">> [Bug 332] late vd c (vol-2)" "${LCTL[@]}" volume-definition create "$RD" 1G >/dev/null -echo ">> wait up to 60s for vol-1 + vol-2 to reach UpToDate on all 3 replicas" -deadline=$(( $(date +%s) + 60 )) -late_up=false -while (( $(date +%s) < deadline )); do - # Pull every (replica, volume) disk_state. We need exactly: - # 3 rows × 3 volumes = 9 disk_state strings, all UpToDate. - # A Bug-332-bitten path will show vol-1/vol-2 stuck on - # "Diskless" with the operator-set DISKLESS flag absent. - states=$("${LCTL[@]}" --machine-readable resource list --resources "$RD" 2>/dev/null \ - | jq -r '[.[][]? | .vlms[]? | .state.disk_state // "Unknown"] | join(",")' \ - 2>/dev/null || echo "") - count_uptodate=$(awk -F, '{ for (i=1;i<=NF;i++) if ($i=="UpToDate") n++ } END { print n+0 }' <<<"$states") - if (( count_uptodate == 9 )); then - late_up=true +# Each late-added 1G volume needs its own initial sync on every replica; +# under sweep load that takes well past the old 60s budget, so give the +# late volumes the same 240s headroom vol-0 gets. +echo ">> wait up to 240s for vol-1 + vol-2 to reach UpToDate on all 3 replicas" +late_up=true +for vol in 1 2; do + if ! wait_all_replicas_volume_uptodate "$vol" 240; then + late_up=false break fi - sleep 3 done if [[ "$late_up" != "true" ]]; then - echo "FAIL (Bug 332): late-added vol-1/vol-2 not UpToDate on all 3 replicas within 60s" >&2 + echo "FAIL (Bug 332): late-added vol-1/vol-2 not UpToDate on all 3 replicas within 240s" >&2 "${LCTL[@]}" resource list --resources "$RD" 2>&1 | tail -40 >&2 - # Surface the smoking gun: a Diskless line for a non-DISKLESS spec. - echo "----- linstor r l --resources $RD (with flags) -----" >&2 - "${LCTL[@]}" --machine-readable resource list --resources "$RD" 2>/dev/null \ - | jq -r '.[][]? | "\(.node_name) vol=\(.vlms[]?.vlm_nr) state=\(.vlms[]?.state.disk_state) flags=\(.flags//[])"' >&2 || true + # Surface the smoking gun: per (node, vol) disk_state straight from + # the populated Resource.Status — a Bug-332-bitten path shows vol-1 + # or vol-2 stuck on Diskless on a non-DISKLESS replica. + echo "----- per (node, vol) Resource.Status diskState -----" >&2 + for node in "${DISKFUL_NODES[@]}"; do + for vol in 0 1 2; do + echo " $node vol=$vol state=$(status_disk_state "$RD" "$node" "$vol")" >&2 + done + done exit 1 fi @@ -135,10 +162,13 @@ fi # MUST NOT report any volume as Diskless when the spec lacks the # DISKLESS flag. This is the kernel-truth assertion that distinguishes # Bug 332 (Unintentional Diskless) from spec-pinned diskless replicas. -echo ">> [Bug 332] kernel-truth: drbdadm status on a diskful node" -satellite_node=$("${LCTL[@]}" --machine-readable resource list --resources "$RD" 2>/dev/null \ - | jq -r '.[][]? | select((.flags//[]) | (map(. == "DISKLESS") | any | not)) | .node_name' \ - 2>/dev/null | head -1) +echo ">> [Bug 332] kernel-truth: drbdsetup status on a diskful node" +# Use the already-resolved diskful node set (reliable, CRD-backed) and +# probe the kernel directly via the satellite pod (on_node, lib.sh) — +# the same drbdsetup status path the parent helpers use. The previous +# `--machine-readable ... | jq .flags | head -1` resolution shared the +# null-vlms / SIGPIPE failure modes of the wait loops above. +satellite_node=${DISKFUL_NODES[0]:-} if [[ -z "$satellite_node" ]]; then echo "SKIP-PARTIAL: could not resolve a diskful node for kernel-truth check" @@ -146,7 +176,7 @@ if [[ -z "$satellite_node" ]]; then exit 0 fi -if status_out=$(kubectl debug node/"$satellite_node" --image=alpine -- chroot /host drbdadm status "$RD" 2>&1); then +if status_out=$(on_node "$satellite_node" drbdsetup status "$RD" --verbose 2>&1); then echo "$status_out" if grep -E 'volume:[12].*disk:Diskless' <<<"$status_out" >/dev/null; then echo "FAIL (Bug 332): diskful node $satellite_node reports volume:1 or volume:2 as Diskless on kernel state" >&2 @@ -154,7 +184,7 @@ if status_out=$(kubectl debug node/"$satellite_node" --image=alpine -- chroot /h exit 1 fi else - echo "SKIP-PARTIAL: kubectl debug to inspect drbdadm status failed (RBAC / image pull); REST-level pin still asserted" + echo "SKIP-PARTIAL: drbdsetup status on $satellite_node failed; REST-level pin still asserted" fi echo ">> multi-volume-late-vd-create OK (Bug 332 pinned: late vd c on $RD brought vol-1/vol-2 to UpToDate, no Unintentional Diskless)"