fix: handle NULL pg_stat_wal.stats_reset (minimal) #2
Draft
honi-at-simspace wants to merge 1 commit into
The stats_reset column of pg_stat_wal can be NULL on instances that have never had WAL stats initialized, most commonly replicas (which do not write WAL stats locally) and primaries promoted from a replica that was bootstrapped via pg_basebackup. The exporter scanned this column into a plain string, so on such instances every collection failed with "sql: Scan error on column index 4, name stats_reset: converting NULL to string is unsupported", dropping all pg_stat_wal metrics for the pod and logging an error at every scrape interval.
Scan stats_reset into sql.NullString and use the empty string as the metric label when the column is NULL. The actual WAL counters are then emitted normally.
Closes cloudnative-pg#9106
Signed-off-by: Honi Sanders <honi.sanders@simspace.com>
Description
`pg_stat_wal.stats_reset` is per-instance and can legitimately be NULL on replicas and on primaries promoted from a replica that was bootstrapped via `pg_basebackup`: the WAL stats subsystem on a standby never initialises `stats_reset`, and promotion does not reset it. After ≥1 failover (rolling minor upgrades, node maintenance, etc.) every pod in the cluster reaches this state.
The exporter scanned the column into a `string`, so on such instances every collection failed with `sql: Scan error on column index 4, name "stats_reset": converting NULL to string is unsupported`. All `pg_stat_wal` counters were dropped for the affected pod, and the error was logged at every scrape.
This change scans `stats_reset` into a `sql.NullString` and uses the empty string as the metric label when NULL. The actual WAL counters (`wal_records`, `wal_fpi`, `wal_bytes`, `wal_buffers_full`, plus the PG<18 write/sync ones) are then collected normally.
Relation to cloudnative-pg#9788
cloudnative-pg#9788 takes the orthogonal approach of skipping `pg_stat_wal` collection on replicas entirely. That is correct on its own merits, but does not cover the primary-with-NULL-`stats_reset` case (clusters that have failed over at least once), which is what we observe in our environments. The two changes are complementary: cloudnative-pg#9788 stops collecting data that isn't meaningful on replicas; this change makes the scan robust to NULL on the primary.
Diff scope — minimal version
This is the minimal version of the fix: just the `string` → `sql.NullString` type change in `PgStatWal` and the corresponding `.String` accessor at the metric-label call sites. No refactor, no new tests; relies on the existing E2E suite for integration coverage. Net diff: 2 files, 9 lines.
A larger version that also extracts a `getPgStatWAL(db, version)` helper and adds three sqlmock-backed unit tests (populated / NULL on PG<18 / NULL on PG≥18) is available on branch `dev/9106` (see fork PR #1). Reviewer preference decides which scope ships.
Testing
`go build ./...` and `go test ./pkg/management/postgres/... ./pkg/management/postgres/webserver/metricserver/...` pass.
Closes cloudnative-pg#9106
Drafted with AI assistance (Claude). All code and design choices were
reviewed by the author before submission.