test(harness): release-gate cleanup — cli-matrix/replay/burnin/cli-parity (batch 4) + unmask BUG-048 #162
Changes from all commits
822668c
42f4400
994a4ec
16c511f
cfd36f0
7ea408e
@@ -135,20 +135,46 @@ EOF
 fi

 DEV=$(on_node "$PRIMARY" bash -c "grep -oE '/dev/drbd[0-9]+' /etc/drbd.d/${RD}.res | head -1")

+# Write 1 MiB urandom on PRIMARY and capture its md5, then read it
+# back on PEER and compare. Each remote read GUARDS dd's exit code
+# (via PIPESTATUS) AND the byte count: under churn `dd` can fail to
+# open /dev/drbdN (EAGAIN, device transiently busy) and read ZERO
+# bytes; md5("") is a fixed digest, so an unguarded read produces a
+# FALSE mismatch alarm that would drown out a real future divergence.
+# The md5 is computed by piping dd STRAIGHT into md5sum (never via a
+# shell variable — command substitution strips NUL bytes, which a
+# binary 1 MiB read is full of). On a guarded read failure the snippet
+# emits the sentinel "READFAIL"; the iteration's compare is then
+# SKIPPED (not counted as a real FAIL) and re-tried next iteration.
+# `bs=1M count=1` reads exactly 1048576 bytes on success.
+EXPECT_BYTES=1048576
 PRIMARY_MD5=$(on_node "$PRIMARY" bash -c "
 drbdadm primary ${RD}
 dd if=/dev/urandom of=${DEV} bs=1M count=1 status=none oflag=direct
-dd if=${DEV} bs=1M count=1 status=none iflag=direct | md5sum | awk '{print \$1}'
+md5=\$(dd if=${DEV} bs=1M count=1 iflag=direct 2>/tmp/burnin-dd-primary.err | md5sum | awk '{print \$1}')
+rc=\${PIPESTATUS[0]}
+n=\$(awk '/bytes/ {print \$1; exit}' /tmp/burnin-dd-primary.err)
 drbdadm secondary ${RD}
+if [ \"\$rc\" -ne 0 ] || [ \"\$n\" != \"${EXPECT_BYTES}\" ]; then echo 'READFAIL'; else echo \"\$md5\"; fi
 " | tail -1)

 PEER_MD5=$(on_node "$PEER" bash -c "
 drbdadm primary ${RD}
-dd if=${DEV} bs=1M count=1 status=none iflag=direct | md5sum | awk '{print \$1}'
+md5=\$(dd if=${DEV} bs=1M count=1 iflag=direct 2>/tmp/burnin-dd-peer.err | md5sum | awk '{print \$1}')
+rc=\${PIPESTATUS[0]}
Comment on lines +164 to +165 (the `md5=` and `rc=` lines above)
+n=\$(awk '/bytes/ {print \$1; exit}' /tmp/burnin-dd-peer.err)
 drbdadm secondary ${RD}
+if [ \"\$rc\" -ne 0 ] || [ \"\$n\" != \"${EXPECT_BYTES}\" ]; then echo 'READFAIL'; else echo \"\$md5\"; fi
 " | tail -1)

-if [[ "$PRIMARY_MD5" == "$PEER_MD5" ]]; then
+if [[ "$PRIMARY_MD5" == "READFAIL" || "$PEER_MD5" == "READFAIL" \
+|| -z "$PRIMARY_MD5" || -z "$PEER_MD5" ]]; then
+# Transient read failure on at least one side — neither PASS nor
+# FAIL. A real mismatch is only credible when BOTH reads succeeded
+# and returned the full 1 MiB.
+echo "[$(date -u +%FT%TZ)] iter=$ITER SKIP: transient dd read failure (primary='$PRIMARY_MD5' peer='$PEER_MD5'); not comparing"
+elif [[ "$PRIMARY_MD5" == "$PEER_MD5" ]]; then
Comment on lines +171 to +177
READFAIL-only runs can incorrectly end as success.
💡 Patch sketch
 PASS=0
 FAIL=0
+SKIP=0
+READFAIL_STREAK=0
 ITER=0
@@
 if [[ "$PRIMARY_MD5" == "READFAIL" || "$PEER_MD5" == "READFAIL" \
       || -z "$PRIMARY_MD5" || -z "$PEER_MD5" ]]; then
+  SKIP=$((SKIP + 1))
+  READFAIL_STREAK=$((READFAIL_STREAK + 1))
   echo "[$(date -u +%FT%TZ)] iter=$ITER SKIP: transient dd read failure (primary='$PRIMARY_MD5' peer='$PEER_MD5'); not comparing"
+  if (( READFAIL_STREAK >= 10 )); then
+    FAIL=$((FAIL + 1))
+    echo "[$(date -u +%FT%TZ)] iter=$ITER FAIL: persistent dd read failures ($READFAIL_STREAK in a row)"
+  fi
 elif [[ "$PRIMARY_MD5" == "$PEER_MD5" ]]; then
   PASS=$((PASS + 1))
+  READFAIL_STREAK=0
 else
   FAIL=$((FAIL + 1))
+  READFAIL_STREAK=0
   echo "[$(date -u +%FT%TZ)] iter=$ITER FAIL: md5 mismatch primary=$PRIMARY_MD5 peer=$PEER_MD5"
 fi
@@
-echo "[$(date -u +%FT%TZ)] DONE iter=$ITER pass=$PASS fail=$FAIL"
-[[ $FAIL -eq 0 ]] || exit 1
+echo "[$(date -u +%FT%TZ)] DONE iter=$ITER pass=$PASS fail=$FAIL skip=$SKIP"
+[[ $FAIL -eq 0 && $PASS -gt 0 ]] || exit 1
 PASS=$((PASS + 1))
 else
 FAIL=$((FAIL + 1))

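The comment block added in this hunk names two hazards: a zero-byte read still yields a valid-looking md5, and command substitution drops NUL bytes from binary data. The following is a minimal local sketch of both, assuming Bash and GNU `md5sum` are available; the variable names (`empty_md5`, `raw_len`, `var_len`) are hypothetical and do not appear in the harness.

```shell
#!/usr/bin/env bash
# Illustration only: run locally, no DRBD device involved.

# Hazard 1: hashing zero bytes still produces a well-formed digest,
# so a read that returned nothing cannot be told apart by shape alone.
empty_md5=$(printf '' | md5sum | awk '{print $1}')
echo "md5 of empty input: $empty_md5"

# Hazard 2: command substitution drops NUL bytes.
# The 5-byte stream a\0b\0c arrives intact through a pipe ...
raw_len=$(printf 'a\0b\0c' | wc -c | tr -d ' ')

# ... but loses both NULs once it is stored in a shell variable.
# Newer Bash versions warn about the ignored NUL on stderr; the brace
# group silences that warning for this demo.
{ var=$(printf 'a\0b\0c'); } 2>/dev/null
var_len=${#var}

echo "bytes through pipe: $raw_len, bytes kept in variable: $var_len"
```

This is why the hunk pipes `dd` directly into `md5sum` and checks the byte count separately, rather than capturing the raw read in a variable first.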
The exit status of `dd` inside the command substitution `md5=\$(...)` is not captured by `\${PIPESTATUS[0]}` in the outer remote shell. In Bash, `PIPESTATUS` in the outer shell only reflects the exit status of the command substitution itself (which is the exit status of the last command in the pipeline, `awk`, which is `0` even if `dd` fails). Furthermore, `pipefail` is not enabled in the remote `bash -c` session, so the subshell itself exits with `0` even if `dd` fails.

To reliably capture the exit status of `dd` without relying on `PIPESTATUS` in the outer shell, enable `pipefail` inside the subshell and capture the exit status of the assignment using `\$?`.
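The behavior described in this comment can be reproduced without a DRBD device. The sketch below is an illustration under stated assumptions, not the harness code: `false` stands in for a failing `dd`, `cat` stands in for `md5sum | awk`, and the variable names are hypothetical. It is written unescaped for a local shell, so the `\$` escaping needed inside the `on_node ... bash -c "..."` string is omitted.

```shell
#!/usr/bin/env bash
# Illustration only: `false` plays the failing dd, `cat` plays md5sum | awk.

# 1. The pattern under review: the pipeline runs inside $(...), so the
#    outer PIPESTATUS describes the assignment, whose status is that of
#    the last pipeline stage (cat), not the failing first stage.
out=$(false | cat)
rc_pipestatus=${PIPESTATUS[0]}

# 2. The suggested fix: enable pipefail inside the substitution and read
#    $? immediately after the assignment.
out=$(set -o pipefail; false | cat)
rc_pipefail=$?

# 3. For comparison: PIPESTATUS does work when the pipeline runs directly
#    in the current shell rather than inside a command substitution.
false | cat >/dev/null
rc_direct=${PIPESTATUS[0]}

echo "PIPESTATUS after assignment: $rc_pipestatus"
echo "\$? with pipefail inside \$(...): $rc_pipefail"
echo "PIPESTATUS after direct pipeline: $rc_direct"
```

With the first pattern the guard `[ "$rc" -ne 0 ]` can never fire on a `dd` failure, so only the byte-count check protects the read; the second pattern restores the exit-code guard. Note that this sketch must not run under `set -e`, since the second assignment returns non-zero by design.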