fix: handle NULL pg_stat_wal.stats_reset #1
Draft
honi-at-simspace wants to merge 1 commit into
The stats_reset column of pg_stat_wal can be NULL on instances that have never had WAL stats initialized -- most commonly replicas (which do not write WAL stats locally) and primaries promoted from a replica that was bootstrapped via pg_basebackup. The exporter scanned this column into a plain string, so on such instances every collection failed with "sql: Scan error on column index 4, name stats_reset: converting NULL to string is unsupported", dropping all pg_stat_wal metrics for the pod and logging an error at every scrape interval.

Scan stats_reset into sql.NullString and use the empty string as the metric label when the column is NULL. The actual WAL counters are then emitted normally.

The pg_stat_wal scan logic is extracted into a package-private getPgStatWAL helper so the NULL behaviour can be exercised directly with sqlmock.

Closes cloudnative-pg#9106

Signed-off-by: Honi Sanders <honi.sanders@simspace.com>
This was referenced May 5, 2026
Description
`pg_stat_wal.stats_reset` is per-instance and can legitimately be NULL on:
- replicas, which do not write WAL stats locally;
- primaries promoted from a replica that was bootstrapped via `pg_basebackup`: the WAL stats subsystem on a standby never initialises `stats_reset`, and promotion does not reset it.

After ≥1 failover (rolling minor upgrades, node maintenance, etc.) every pod in the cluster reaches this state.
The exporter scanned the column into a `string`, so on such instances every
collection failed with
`sql: Scan error on column index 4, name "stats_reset": converting NULL to string is unsupported`.
All `pg_stat_wal` counters were dropped for the affected pod, and the error was logged at every scrape.
This change scans `stats_reset` into a `sql.NullString` and uses the empty
string as the metric label when NULL. The actual WAL counters
(`wal_records`, `wal_fpi`, `wal_bytes`, `wal_buffers_full`, plus the PG<18
write/sync ones) are then collected normally.
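A minimal sketch of the NULL handling described above, not the code in this PR. `scanStatsResetLabel` is a hypothetical helper name, and it calls `sql.NullString.Scan` directly in place of a real `rows.Scan` so that it runs without a database connection.

```go
package main

import (
	"database/sql"
	"fmt"
)

// scanStatsResetLabel is a hypothetical helper. It scans one raw column
// value into a sql.NullString and returns the value to use as the metric
// label: the empty string for NULL, the scanned text otherwise.
func scanStatsResetLabel(raw any) (string, error) {
	var reset sql.NullString
	if err := reset.Scan(raw); err != nil {
		return "", err
	}
	// For a NULL input, Valid is false and String is left empty,
	// so String can be used as the label in both cases.
	return reset.String, nil
}

func main() {
	// nil stands in for a NULL stats_reset column.
	nullLabel, err := scanStatsResetLabel(nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("NULL stats_reset      -> label %q\n", nullLabel)

	// An illustrative timestamp stands in for a populated column.
	setLabel, err := scanStatsResetLabel("2024-01-01 00:00:00+00")
	if err != nil {
		panic(err)
	}
	fmt.Printf("populated stats_reset -> label %q\n", setLabel)
}
```

Scanning into a plain `string` destination is what rejects NULL; `sql.NullString` accepts it and records the absence in its `Valid` field.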
Relation to cloudnative-pg#9788
cloudnative-pg#9788 takes the orthogonal approach of skipping `pg_stat_wal`
collection on replicas entirely. That is correct on its own merits, but does
not cover the primary-with-NULL-`stats_reset` case (clusters that have failed
over at least once), which is what we observe in our environments. The two
changes are complementary: cloudnative-pg#9788 stops collecting data that
isn't meaningful on replicas; this change makes the scan robust to NULL on the
primary.
Diff scope — two options for reviewers
Happy to ship either of the following; reviewer preference decides.
Option A (current PR, ~110 LOC): type-safe scan + a small refactor that
extracts the scan logic into a package-private `getPgStatWAL(db, version)`
helper, plus three sqlmock-backed unit tests covering populated / NULL on
PG<18 / NULL on PG≥18. Pro: regression coverage. Con: introduces a small
refactor that is not strictly required by the bug fix.
Option B (~9 LOC, no test): just change `StatsReset string` → `sql.NullString`
in `PgStatWal` and use `.String` at the metric-label callsites. No helper, no
new tests; relies on the existing E2E suite for
integration coverage. Pro: minimal review surface. Con: no unit test of the
NULL path.
If you prefer Option B I'll force-push the branch with the helper and the
new tests removed.
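To illustrate what extracting a helper buys in Option A, here is a sketch of exercising the NULL path without a live database. It is not the PR's `getPgStatWAL` and it does not use sqlmock: a hand-written fake row is swapped in so the example needs only the standard library. `rowScanner`, `walStats`, `readWALStats`, and `fakeRow` are hypothetical names, and the column set is trimmed to two of the counters for brevity.

```go
package main

import (
	"database/sql"
	"fmt"
)

// rowScanner is the one method the helper needs; *sql.Row satisfies it.
type rowScanner interface {
	Scan(dest ...any) error
}

// walStats is a trimmed, illustrative subset of the pg_stat_wal columns.
type walStats struct {
	WalRecords int64
	WalBytes   int64
	StatsReset sql.NullString // NULL-safe destination
}

// readWALStats (hypothetical) scans one row into walStats.
func readWALStats(row rowScanner) (walStats, error) {
	var s walStats
	err := row.Scan(&s.WalRecords, &s.WalBytes, &s.StatsReset)
	return s, err
}

// fakeRow stands in for a database row; a nil reset models SQL NULL.
type fakeRow struct {
	records, bytes int64
	reset          any
}

// Scan copies the fake values into the destinations passed by readWALStats.
func (f fakeRow) Scan(dest ...any) error {
	if len(dest) != 3 {
		return fmt.Errorf("expected 3 destinations, got %d", len(dest))
	}
	*dest[0].(*int64) = f.records
	*dest[1].(*int64) = f.bytes
	return dest[2].(*sql.NullString).Scan(f.reset)
}

func main() {
	s, err := readWALStats(fakeRow{records: 42, bytes: 4096, reset: nil})
	if err != nil {
		panic(err)
	}
	// The counters survive even though stats_reset was NULL.
	fmt.Println(s.WalRecords, s.WalBytes, s.StatsReset.Valid)
}
```

Because the helper takes the row as a parameter, each case from the PR (populated, NULL) can be driven by a different fake value; Option B gives up that seam in exchange for a smaller diff.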
Testing
- `go build ./...`
- Unit tests in `probes_test.go` cover three cases: populated `stats_reset` on PG<18, NULL `stats_reset` on PG<18, NULL `stats_reset` on PG≥18.
- `go test ./pkg/management/postgres/... ./pkg/management/postgres/webserver/metricserver/...` passes.

Closes cloudnative-pg#9106
Drafted with AI assistance (Claude). All code, tests, and design choices were
reviewed by the author before submission.