benchmark: revisit NOTICE-vs-pgss instrumentation choice (observer effect on DO-wrapped queries) #127

@NikolayS

Summary

benchmark/ documentation (PR #66) explains:

"Why NOTICE rather than pgss: DO-wrappers hide per-statement rows from pg_stat_statements..."

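The choice is functional, but it has a real observer effect on the metric being measured: every RAISE NOTICE sends a server→client protocol message and, when log_min_messages is set to notice or a more verbose level (the default, warning, does not log notices), also triggers a server-side log write. Either cost adds latency to the very queries being instrumented.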

Why this matters

Two compounding costs:

  1. Server→client protocol overhead. RAISE NOTICE flushes a NoticeResponse on each invocation; the client (psql, pgbench, etc.) must read it. At high TPS, where per-query latency is small, this fixed cost is a non-trivial fraction of the measurement.
  2. Log subsystem write. If log_min_messages is notice or more verbose (the default is warning, which does not log notices), the notice also goes to whatever log destination is configured (related to #123, the "zero PG log I/O" claim). Even when log volume is "low," it is not zero.

Combined, RAISE NOTICE-based instrumentation has a higher per-query overhead floor than pg_stat_statements-based measurement.
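The effect of a fixed per-notice cost can be sketched with simple arithmetic. This is an illustrative additive model only: the function names and every microsecond figure below are hypothetical placeholders, not measurements from this bench.

```python
def observed_latency_us(true_us: float, notices: int, per_notice_us: float) -> float:
    """Latency a NOTICE-instrumented query would report under a simple additive model."""
    return true_us + notices * per_notice_us


def relative_inflation(true_us: float, notices: int, per_notice_us: float) -> float:
    """Fraction by which the instrumentation inflates the true latency."""
    observed = observed_latency_us(true_us, notices, per_notice_us)
    return (observed - true_us) / true_us


if __name__ == "__main__":
    # Placeholder values, chosen only to show the shape of the effect:
    # the same fixed cost is a larger share of a faster query.
    for true_us in (50.0, 500.0, 5000.0):
        pct = 100 * relative_inflation(true_us, notices=1, per_notice_us=5.0)
        print(f"true={true_us:>7.0f} us  inflation={pct:.1f}%")
```

Under this model the inflation is inversely proportional to the true latency, which is why the concern is largest for the fastest hot-path queries.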

Why DO blocks were used

DO blocks let arbitrary PL/pgSQL run without persisting a function definition. But with the default pg_stat_statements.track = top, pg_stat_statements records at most the outer DO statement (and only when pg_stat_statements.track_utility is on, since DO is a utility command), not the inner per-statement timings.

Alternatives for next bench cycle

  • Replace DO blocks with named functions for the hot-path measurement points. pg_stat_statements then captures per-call statistics with negligible observer effect at typical TPS.
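  • Set pg_stat_statements.track = all so that statements nested inside DO blocks are recorded; on PG 14+ the toplevel column distinguishes them from top-level statements. pg_stat_kcache can add OS-level CPU and I/O metrics on top of pg_stat_statements.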
  • Accept DO + use pg_ash 1Hz sampling (already in the pgque bench stack) which is lower-overhead than NOTICE for high-frequency events; reserve NOTICE for boundary events only.
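  • Switch to auto_explain with auto_explain.log_min_duration and auto_explain.sample_rate (plus auto_explain.log_nested_statements = on to see statements inside DO blocks): less per-query log volume, but still useful for outlier diagnosis.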
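As a rough sketch of the first alternative, the snippet below renders the same statement two ways: wrapped in a DO block that reports via RAISE NOTICE, and wrapped in a named function whose calls pg_stat_statements can aggregate. The template names, the `bench_step` function name, and the `render_variants` helper are hypothetical and are not taken from the existing benchmark/ scripts; the statement is assumed to be a valid PL/pgSQL statement such as `PERFORM ...`.

```python
# DO-wrapped variant: times the statement and reports it through a NOTICE.
DO_NOTICE_TEMPLATE = """\
DO $$
DECLARE
  t0 timestamptz := clock_timestamp();
BEGIN
  {statement};
  RAISE NOTICE 'elapsed_us=%',
    extract(epoch FROM clock_timestamp() - t0) * 1e6;
END
$$;
"""

# Named-function variant: no NOTICE; timing is left to pg_stat_statements.
FUNCTION_TEMPLATE = """\
CREATE OR REPLACE FUNCTION {name}() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  {statement};
END
$$;
"""

CALL_TEMPLATE = "SELECT {name}();\n"

# Reads per-call statistics for the function call (mean_exec_time exists in PG 13+).
PGSS_QUERY_TEMPLATE = """\
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE 'SELECT {name}()%';
"""


def render_variants(statement: str, name: str = "bench_step") -> dict[str, str]:
    """Return the SQL text for both instrumentation styles of one statement."""
    return {
        "do_notice": DO_NOTICE_TEMPLATE.format(statement=statement),
        "create_function": FUNCTION_TEMPLATE.format(name=name, statement=statement),
        "call_function": CALL_TEMPLATE.format(name=name),
        "pgss_query": PGSS_QUERY_TEMPLATE.format(name=name),
    }


if __name__ == "__main__":
    for label, sql in render_variants("PERFORM 1").items():
        print(f"-- {label}\n{sql}")
```

The `call_function` text is what a pgbench script would run on the hot path; the function is created once beforehand, so the measured statement carries no client-bound message.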

Action

  • Do not block PR #66 (docs: bench methodology + tooling under benchmark/); the bench numbers themselves stand at the current overhead level.
  • For the next bench cycle:
    • quantify the NOTICE overhead with a comparison run (NOTICE vs no-NOTICE for the same workload), then either move to functions+pgss or accept the overhead with a measured floor.
    • update the methodology doc with the comparison.
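One possible way to turn the comparison run into a measured floor is to diff the "latency average" line that pgbench prints for each run. The sketch below assumes both runs used the same workload and client settings; the helper names are hypothetical, and the sample strings in the usage section are synthetic placeholders rather than real results.

```python
import re

# Matches the summary line pgbench prints, e.g. "latency average = 1.234 ms".
LATENCY_RE = re.compile(r"^latency average = ([0-9.]+) ms", re.MULTILINE)


def latency_avg_ms(pgbench_output: str) -> float:
    """Extract the average latency in milliseconds from pgbench's summary."""
    match = LATENCY_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'latency average' line found in pgbench output")
    return float(match.group(1))


def notice_overhead(with_notice: str, without_notice: str) -> tuple[float, float]:
    """Return (absolute overhead in ms, overhead relative to the no-NOTICE run)."""
    instrumented = latency_avg_ms(with_notice)
    baseline = latency_avg_ms(without_notice)
    delta = instrumented - baseline
    return delta, delta / baseline


if __name__ == "__main__":
    # Synthetic placeholder outputs, not measurements.
    run_notice = "latency average = 1.250 ms\n"
    run_plain = "latency average = 1.000 ms\n"
    abs_ms, rel = notice_overhead(run_notice, run_plain)
    print(f"NOTICE overhead: {abs_ms:.3f} ms ({rel:.1%} of baseline)")
```

A single pair of runs will include run-to-run noise, so repeating each variant several times and comparing the distributions would give a more defensible floor for the methodology doc.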

Severity

LOW — methodology refinement, not a numerical correction. Joins the cluster with #123 (log I/O) and #124 (planner-cost framing) as bench-doc revisits for the next round.
