
Use stack for node instrumentation (v6)#32

Open: lfittl wants to merge 8 commits into master from instrument-usage-stack-v6


@lfittl lfittl commented Feb 18, 2026

No description provided.

@lfittl lfittl changed the title from "Use stack for node instrumentation (v5)" to "Use stack for node instrumentation (v6)" on Feb 18, 2026
@lfittl lfittl force-pushed the instrument-usage-stack-v6 branch 7 times, most recently from b089cf4 to 9971d3e on February 23, 2026 22:20
This was incorrectly named "LT" for "larger than" in e5a5e0a, but
that is against existing conventions, where "LT" means "less than".
Clarify by using "GT" for "greater than" in macro name, and add a missing
comment at the top of instr_time.h to note the macro's existence.

Reported by: Peter Smith <smithpb2250@gmail.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPut94CTpjQsqOJHdHkgJ2ZXq%2BqVSfMEcmDKLiWLW-hPfA%40mail.gmail.com#0690d99bebc6dd9b035724f66d9986c1
…ith INSTR_* macros

This encapsulates the ownership of these globals better, and will allow
a subsequent refactoring.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzZ3UotnRrrnXWAv%3DF4avRq9MQ8zU%2BbxoN9tpovEu6fGQ%40mail.gmail.com#fc7140e8af21e07a90a09d7e76b300c4
Previously different places (e.g. query "total time") were repurposing
the Instrumentation struct initially introduced for capturing per-node
statistics during execution. This overuse of the same struct is confusing,
e.g. by cluttering calls of InstrStartNode/InstrStopNode in unrelated
code paths, and prevents future refactorings.

Instead, simplify the Instrumentation struct to only track time and
WAL/buffer usage. Similarly, drop the use of InstrEndLoop outside of
per-node instrumentation - these calls were added without any apparent
benefit since the relevant fields were never read.

Introduce the NodeInstrumentation struct to carry forward the per-node
instrumentation information, and introduce TriggerInstrumentation to
capture trigger timing and firings (previously counted in "ntuples").
WorkerInstrumentation is renamed to WorkerNodeInstrumentation for clarity.

In passing, drop the "n" argument to InstrAlloc, as all remaining callers
need exactly one Instrumentation struct.
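
The split described above could look roughly like the following. This is a minimal sketch, not the layout used by the patch: all type names carry a `Sketch` suffix to mark them as hypothetical, and the field lists and the choice to embed the base struct are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical base struct: only time and buffer/WAL usage. */
typedef struct InstrumentationSketch
{
    double total_seconds;
    long   shared_blks_hit;
    long   wal_records;
} InstrumentationSketch;

/* Hypothetical per-node struct: base data plus executor-only counters. */
typedef struct NodeInstrumentationSketch
{
    InstrumentationSketch instr;
    double ntuples;
    double nloops;
} NodeInstrumentationSketch;

/* Hypothetical trigger struct: firings get their own field, not "ntuples". */
typedef struct TriggerInstrumentationSketch
{
    InstrumentationSketch instr;
    long firings;
} TriggerInstrumentationSketch;

/* Record one trigger firing that took the given number of seconds. */
static void
trigger_fired_sketch(TriggerInstrumentationSketch *ti, double seconds)
{
    assert(seconds >= 0.0);
    ti->instr.total_seconds += seconds;
    ti->firings++;
}

/* Small demo: two firings are counted as firings, not as tuples. */
static long
trigger_demo_sketch(void)
{
    TriggerInstrumentationSketch ti = {{0.0, 0, 0}, 0};

    trigger_fired_sketch(&ti, 0.001);
    trigger_fired_sketch(&ti, 0.002);
    return ti.firings;
}
```

The point of the sketch is only that code measuring total query time can use the base struct without carrying per-node or per-trigger fields.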

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
@lfittl lfittl force-pushed the instrument-usage-stack-v6 branch 2 times, most recently from 63c26a8 to 525abbd on February 23, 2026 23:12
This adds regression tests that cover some of the expected behaviour
around the buffer statistics reported in EXPLAIN ANALYZE, specifically
how they behave in parallel query, nested function calls and abort
situations.

Testing this is challenging because there can be different sources of
buffer activity, so we rely on temporary tables where we can to prove
that activity was captured and not lost. This supports a future commit
that will rework some of the instrumentation logic that could cause
areas covered by these tests to fail.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
@lfittl lfittl force-pushed the instrument-usage-stack-v6 branch from 525abbd to 07558c5 on February 23, 2026 23:48
Previously, in order to determine the buffer/WAL usage of a given code
section, we utilized continuously incrementing global counters that get
updated when the actual activity (e.g. shared block read) occurred, and
then calculated a diff when the code section ended. This resulted in a
bottleneck for executor node instrumentation specifically, with the
function BufferUsageAccumDiff showing up in profiles and in some cases
adding up to 10% overhead to an EXPLAIN (ANALYZE, BUFFERS) run.
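
To illustrate the previous approach, here is a simplified sketch of the snapshot-and-diff pattern. The struct and function names are hypothetical stand-ins (marked with a `Sketch` or `_sketch` suffix) and only two counters are modelled; the real counters and the real BufferUsageAccumDiff cover many more fields, which is part of why the diff is costly when run for every node call.

```c
#include <assert.h>

/* Hypothetical, reduced set of usage counters. */
typedef struct UsageSketch
{
    long shared_blks_hit;
    long shared_blks_read;
} UsageSketch;

/* Global counters that only ever grow as activity happens. */
static UsageSketch global_usage_sketch;

/* dst += (now - start), computed field by field. */
static void
usage_accum_diff_sketch(UsageSketch *dst,
                        const UsageSketch *now,
                        const UsageSketch *start)
{
    dst->shared_blks_hit += now->shared_blks_hit - start->shared_blks_hit;
    dst->shared_blks_read += now->shared_blks_read - start->shared_blks_read;
}

/* Demo: measure one code section by snapshotting and diffing. */
static long
diff_demo_sketch(void)
{
    UsageSketch node_total = {0, 0};
    UsageSketch start;

    global_usage_sketch.shared_blks_hit = 10;   /* earlier, unrelated activity */
    start = global_usage_sketch;                /* snapshot at section start */

    global_usage_sketch.shared_blks_hit += 4;   /* activity inside the section */
    global_usage_sketch.shared_blks_read += 1;

    usage_accum_diff_sketch(&node_total, &global_usage_sketch, &start);
    assert(node_total.shared_blks_read == 1);
    return node_total.shared_blks_hit;
}
```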

Instead, introduce a stack-based mechanism, where the actual activity
writes into the current stack entry. In the case of executor nodes, this
means that each node gets its own stack entry that is pushed at
InstrStartNode and popped at InstrStopNode. Stack entries are
zero-initialized (avoiding the diff mechanism) and get added to their parent
entry when they are finalized, i.e. when no more modifications can occur.
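
The stack-based idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this patch: every name ending in `Sketch` or `_sketch` is hypothetical, only two counters are tracked, and the parent is passed explicitly to the finalize step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of usage counters. */
typedef struct UsageSketch
{
    long shared_blks_hit;
    long shared_blks_read;
} UsageSketch;

typedef struct StackEntrySketch
{
    UsageSketch usage;
    struct StackEntrySketch *prev;  /* entry that was active before the push */
} StackEntrySketch;

static StackEntrySketch top_level_sketch;
static StackEntrySketch *current_sketch = &top_level_sketch;

/* Entries start at zero, so no start snapshot or diff is needed later. */
static void
entry_init_sketch(StackEntrySketch *e)
{
    memset(e, 0, sizeof(*e));
}

static void
entry_push_sketch(StackEntrySketch *e)
{
    e->prev = current_sketch;
    current_sketch = e;
}

static void
entry_pop_sketch(StackEntrySketch *e)
{
    assert(current_sketch == e);
    current_sketch = e->prev;
}

/* Once an entry can no longer change, fold it into its parent. */
static void
entry_finalize_sketch(const StackEntrySketch *e, StackEntrySketch *parent)
{
    parent->usage.shared_blks_hit += e->usage.shared_blks_hit;
    parent->usage.shared_blks_read += e->usage.shared_blks_read;
}

/* The activity itself writes directly into whichever entry is current. */
static void
count_hit_sketch(void)
{
    current_sketch->usage.shared_blks_hit++;
}

/* Demo: one outer node with one inner node. */
static long
stack_demo_sketch(void)
{
    StackEntrySketch outer, inner;

    entry_init_sketch(&top_level_sketch);
    current_sketch = &top_level_sketch;
    entry_init_sketch(&outer);
    entry_init_sketch(&inner);

    entry_push_sketch(&outer);
    count_hit_sketch();                 /* attributed to outer */
    entry_push_sketch(&inner);
    count_hit_sketch();                 /* attributed to inner */
    count_hit_sketch();
    entry_pop_sketch(&inner);
    entry_finalize_sketch(&inner, &outer);
    entry_pop_sketch(&outer);
    entry_finalize_sketch(&outer, &top_level_sketch);

    return top_level_sketch.usage.shared_blks_hit;
}
```

In this sketch the per-activity cost is a single increment through a pointer, and the addition into the parent happens once per entry rather than on every start/stop pair.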

To correctly handle abort situations, any use of instrumentation stacks
must involve either a top-level Instrumentation struct and its associated
InstrStart/InstrStop helpers (which use resource owners to handle aborts),
or dedicated PG_TRY/PG_FINALLY blocks that ensure the stack is in a
consistent state after an abort.
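
The abort-handling requirement can be illustrated with a standalone sketch. It uses plain `setjmp`/`longjmp` as a stand-in for PostgreSQL's PG_TRY/PG_FINALLY machinery, so it shows the shape of the problem rather than the actual macros; unlike PG_FINALLY it does not re-throw, and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct EntrySketch
{
    long blks;
    struct EntrySketch *prev;
} EntrySketch;

static EntrySketch *current_entry_sketch = NULL;
static jmp_buf abort_target_sketch;

/* Does some counted work and optionally "aborts" via longjmp. */
static void
risky_work_sketch(int fail)
{
    current_entry_sketch->blks++;
    if (fail)
        longjmp(abort_target_sketch, 1);
}

/* Push an entry, run the work, and always restore the stack afterwards. */
static int
run_guarded_sketch(EntrySketch *entry, int fail)
{
    EntrySketch *volatile saved = current_entry_sketch;
    int aborted = 0;

    entry->prev = saved;
    current_entry_sketch = entry;

    if (setjmp(abort_target_sketch) == 0)
        risky_work_sketch(fail);
    else
        aborted = 1;

    /* Cleanup that runs on both the normal and the abort path. */
    current_entry_sketch = saved;
    return aborted;
}

/* Demo: returns 1 if the stack is consistent after an abort. */
static int
abort_demo_sketch(void)
{
    EntrySketch root = {0, NULL};
    EntrySketch node = {0, NULL};
    int aborted;

    current_entry_sketch = &root;
    aborted = run_guarded_sketch(&node, 1);

    /* The stack points at root again and the node's activity is kept. */
    return aborted == 1 && current_entry_sketch == &root && node.blks == 1;
}
```

Without the restore step, the current pointer would still reference `node` after the abort, and later activity would be attributed to an entry that is no longer in use.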

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
This sets up a separate instrumentation stack that is used while an Index Scan
accesses the table, for example because additional data is needed.

EXPLAIN ANALYZE will now show "Table Buffers" that represent such activity. The
activity is also included in regular "Buffers" together with index activity and
that of any child nodes.
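
How a separate entry for table access can feed both numbers is sketched below. This is an assumption-laden illustration, not the patch's code: the names are hypothetical, a single block counter stands in for the full buffer usage, and child nodes are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BufEntrySketch
{
    long blks;
    struct BufEntrySketch *prev;
} BufEntrySketch;

typedef struct IndexScanReportSketch
{
    long buffers;        /* index plus table activity */
    long table_buffers;  /* table activity only */
} IndexScanReportSketch;

static BufEntrySketch *active_sketch = NULL;

static void
push_sketch(BufEntrySketch *e)
{
    e->prev = active_sketch;
    active_sketch = e;
}

static void
pop_sketch(BufEntrySketch *e)
{
    assert(active_sketch == e);
    active_sketch = e->prev;
}

/* Any block access is charged to whichever entry is active. */
static void
touch_block_sketch(void)
{
    assert(active_sketch != NULL);
    active_sketch->blks++;
}

static IndexScanReportSketch
index_scan_demo_sketch(void)
{
    BufEntrySketch node = {0, NULL};
    BufEntrySketch table = {0, NULL};
    IndexScanReportSketch report;

    push_sketch(&node);
    touch_block_sketch();           /* two index pages */
    touch_block_sketch();

    push_sketch(&table);            /* table access gets its own entry */
    touch_block_sketch();           /* three table pages */
    touch_block_sketch();
    touch_block_sketch();
    pop_sketch(&table);

    pop_sketch(&node);

    report.table_buffers = table.blks;
    node.blks += table.blks;        /* fold table activity into the node */
    report.buffers = node.blks;
    return report;
}
```

Because the table entry is added to the node entry when it is finalized, the node-level total still covers all activity, while the table-only figure remains available separately.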

Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by:
Discussion:
This is intended for testing instrumentation-related logic as it pertains
to the top-level stack that is maintained as a running total. There is
currently no in-core user that utilizes the top-level values in this
manner. Especially in abort situations, this helps ensure we don't
lose activity due to incorrect handling of unfinalized node stacks.