feat: emit parent_query_id to link nested SPI queries#95
Open
JoshDreamland wants to merge 10 commits into
Open
feat: emit parent_query_id to link nested SPI queries#95JoshDreamland wants to merge 10 commits into
JoshDreamland wants to merge 10 commits into
Conversation
Each event now carries the queryid of its calling query (0 for top-level). Lets aggregations filter to top-level work with WHERE parent_query_id = 0, avoiding the double-counting that happens when plpgsql functions issue SPI statements that themselves emit events. The Event payload's prior `bool top_level` is replaced in-place by `uint64 parent_query_id`; static asserts in ring_entry.h continue to verify layout equivalence with the on-wire ring slot. Single statics for rusage_start, query_start_ts, and current_query_is_top_level are consolidated into a fixed-size query_stack[16] indexed by nesting_level. The previous single rusage_start was clobbered on nested SPI getrusage() calls, so nested CPU deltas were measuring just the inner portion and the outer's CPU was wrong; per-frame baselines fix that alongside parent linkage. The PG_TRY/nesting_level pattern in ExecutorRun/Finish/ProcessUtility is untouched. Schema gains a parent_query_id UInt64 column; migrations/001_add_parent_query_id.sql covers existing deployments. This supersedes #61 (which mixed the feature with broader refactor churn — vector→array→array+counter iterations, helper extraction, depth-cap retunes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> EOF )
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds parent_query_id to emitted events so nested SPI queries can be linked to their caller and excluded from “top-level only” aggregations, while also fixing nested CPU timing by switching from single baselines to a per-nesting frame stack.
Changes:
- Replaced
top_levelwithparent_query_idin the in-memory event/ring-entry payload. - Introduced a fixed-size
query_stackto track per-nesting queryid, CPU baseline, and start timestamp. - Exported and persisted
parent_query_idvia ClickHouse schema changes plus a migration.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/queue/ring_entry.h | Replaces top_level with parent_query_id in the ring buffer entry while preserving the block-copy prefix layout checks. |
| src/queue/event.h | Replaces top_level with parent_query_id in the public event struct. |
| src/hooks/hooks.c | Implements per-nesting query frames and populates parent_query_id during executor/utility/log event emission. |
| src/export/stats_exporter.cc | Adds a parent_query_id column to exported event rows. |
| docker/init/00-schema.sql | Adds parent_query_id column to the canonical ClickHouse schema. |
| migrations/001_add_parent_query_id.sql | Adds a migration for adding parent_query_id to existing ClickHouse tables. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+11
to
+12
| ALTER TABLE pg_stat_ch.events_raw | ||
| ADD COLUMN IF NOT EXISTS parent_query_id UInt64 DEFAULT 0; |
Comment on lines
888
to
892
| static void CaptureLogEvent(ErrorData* edata) { | ||
| PschEvent event; | ||
| InitBaseEvent(&event, GetCurrentTimestamp(), (nesting_level == 0), PSCH_CMD_UNKNOWN); | ||
| InitBaseEvent(&event, GetCurrentTimestamp(), GetParentQueryId(), PSCH_CMD_UNKNOWN); | ||
|
|
||
| UnpackSqlState(edata->sqlerrcode, event.err_sqlstate); |
clang-format wants the column-aligned trailing comments above parent_query_id to share its (wider) alignment column. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review on #95: 1. parent_query_id was UInt64 while query_id is Int64; comparisons/joins between the two would require explicit casts in ClickHouse. Switch the ClickHouse column, migration, and exporter to Int64. Keep the in-memory struct field as uint64 (matches PG's queryId type) and cast at append time the same way query_id already does. 2. At nesting_level >= PSCH_MAX_NESTING_DEPTH, the PschExecutorEnd frame pointer is NULL, leaving start_ts at 0 (the PG epoch). Any path that computed GetCurrentTimestamp() - start_ts would yield ~25 years of us. Fall back to GetCurrentTimestamp() so deltas are ~0 instead. In practice query_desc->totaltime supplies a real duration via instrumentation, so the fallback subtraction is only used when totaltime wasn't allocated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
028_parent_query_id.pl exercises the three semantic invariants by
inspecting actual exported rows in ClickHouse:
1. Top-level queries report parent_query_id = 0.
2. Nested SPI queries report parent_query_id = outer caller's query_id
(verified by self-join on the events_raw table).
3. A log/error event captured during nested execution reports
parent_query_id = outer caller (not the running query itself) and a
non-zero query_id (the running statement). This catches the
CaptureLogEvent off-by-one: nesting_level is bumped inside
ExecutorRun, so slot nesting_level - 1 holds the running query and
its caller lives at nesting_level - 2.
Filters key off distinctive table/function names rather than constants
or comments — those survive query normalization, where literals would
be replaced with $N placeholders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CaptureLogEvent off-by-one came from nesting_level moving in places
where the frame stack didn't — Run/Finish bumped/decremented it around
the body, while the frame slot was written in Start and only read in
End. Inside the body (where CaptureLogEvent fires) nesting_level was
one higher than it was at End, so the same "slot[nesting_level - 1]"
expression meant different things in each place.
Bind the two together: nesting_level moves exactly where a frame is
pushed or popped, and parent_query_id is captured into the frame at
push time. Now slot[nesting_level - 1] is the currently-active query
at every emit site — End, ProcessUtility, CaptureLogEvent — and parent
is read directly from frame->parent_query_id without offset arithmetic.
Mechanics:
- PushQueryFrame in PschExecutorStart bumps nesting_level after
writing the slot. PopQueryFrame in PschExecutorEnd decrements.
- PschExecutorRun and PschExecutorFinish no longer touch nesting_level
on the success path; each just wraps its chain in PG_TRY/PG_CATCH
so a longjmp out of the body pops the frame on the unwind and
keeps depth balanced. Each PG_CATCH on the unwind path decrements
once per level — three deep nesting with an error at the bottom
cascades through three PG_CATCH blocks for three pops, exactly
matching three pushes. No subxact callback needed.
- PschProcessUtility brackets its body with PG_TRY/PG_FINALLY in the
same function, so push at the top and PopQueryFrame in PG_FINALLY
balance regardless of how the body exits. ExecuteUtilityWithNesting
is gone.
- CaptureLogEvent now reads TopQueryFrame() and uses both its queryid
(the running query, attached as event.queryid for attribution) and
its parent_query_id (the running query's caller, attached as
event.parent_query_id with strict semantics — previously this slot
was misread as the running query's own id).
PeekQueryStack and its offset parameter are gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The totaltime allocation runs after standard_ExecutorStart returns, between our PG_END_TRY and the close of PschExecutorStart. InstrAlloc goes through MemoryContextAllocZero, which can ereport(ERROR) on OOM — that longjmp would unwind past our PG_END_TRY without firing the PG_CATCH, leaving the pushed frame unpopped. Move the allocation inside the existing PG_TRY block so the same PG_CATCH cleans up the frame. No behavior change on the happy path; new robustness against any future code added between the push and the function's return. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The push had been doing more conditional work than necessary to figure
out whether a parent existed. Convert nesting_level so that:
- -1 is the resting state (no active query),
- n in [0, PSCH_MAX_NESTING_DEPTH-1] is the slot index of the
currently-active frame on top of the stack,
- >= PSCH_MAX_NESTING_DEPTH is the overflow region.
Push now increments first and bails on overflow, so the cap check
collapses to a single comparison. The parent_query_id lookup still
needs one conditional ("do we have a parent at all"), but the previous
double-condition (positive AND within cap) is gone. TopQueryFrame /
PopQueryFrame benefit too: < 0 doubles as a nullity check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups around the runaway-nesting case:
- PushQueryFrame now emits a WARNING when it returns NULL because the
cap was exceeded. Previously the overflow was silent — telemetry
just went missing for the frame. WARNING is loud enough to notice
if it happens in practice but doesn't fail the query.
- PschProcessUtility's start_ts fallback was still 0 (PG epoch) when
frame was NULL. Match PschExecutorEnd by falling back to
GetCurrentTimestamp() so the emitted event carries an approximately
correct ts_start rather than 1970.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3232e8e to
c632cdc
Compare
3 tasks
Earlier commits on this branch added parent_query_id to PschEvent and to the ClickHouse-native exporter, plus the ClickHouse schema migration, but arrow_batch.cc keeps its own hard-coded column list and was missed. The result: the production export path (OTel + Arrow IPC) silently dropped parent_query_id on every event, so downstream consumers saw 0 across the board. Caught via the OTel/Arrow quickstart-validate harness (see PR #96). Local script flips from 4/8 passing to all-green once this lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
serprex
approved these changes
May 14, 2026
31eadfe to
18a6eae
Compare
18a6eae to
85266f0
Compare
t/028 already exercises parent_query_id linkage through the ClickHouse-native exporter. Add t/029 to exercise the same invariants through the Arrow/OTel export path (the production pathway), using pg_stat_ch.debug_arrow_dump_dir to capture each Arrow IPC batch to disk before the gRPC send (which we point at a non-existent collector so it fails harmlessly). Same trick as t/026_arrow_dump. t/029 guards specifically against the earlier surgical-PR regression where arrow_batch.cc was missing parent_query_id entirely (the ClickHouse-native exporter had it, so t/028 alone would have let the gap through). Test 3 in both files uses RAISE WARNING inside a nested SPI call to exercise CaptureLogEvent's queryid/parent_query_id assignment. We cannot use a caught ERROR here: errfinish PG_RE_THROWs at elog.c:539 for ERROR-level events without calling EmitErrorReport, so emit_log_hook only fires later from PostgresMain's top-level catch (after all frames have been popped via PG_CATCH unwinding) — or never at all, for errors caught by a plpgsql EXCEPTION block. WARNING-level events go through EmitErrorReport directly, so our hook fires while the inner SPI's frame is still on the stack, which is the only scenario where CaptureLogEvent's slot choice is observable. The assertions distinguish "inner SPI queryid" from "outer caller queryid" so a regression that attributes the log event to the wrong slot (the off-by-one we're guarding against) would be caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
85266f0 to
f52d76c
Compare
Contributor
Author
|
I missed stuff, because Arrow is a different code path with different implications. I think that's addressed but I'm feeling less sure of myself. |
3 tasks
JoshDreamland
added a commit
that referenced
this pull request
Jun 8, 2026
In-place patch of docker/init/00-schema.sql, the CH-native exporter, and
the TAP tests so the docker quickstart schema aligns with what prod
actually writes to (datagres_otel.query_logs_arrow in clickgres-platform).
This is the pre-cutover unification: pg_stat_ch's CH-native path was
previously isolated from prod, and the two schemas had drifted apart on
both column naming and types.
Column renames (prod-side naming wins; closer to OTel semantic
conventions and minimizes downstream churn):
ts_start -> ts
db -> db_name
username -> db_user
cmd_type -> db_operation
query -> query_text
Type fix:
err_sqlstate FixedString(5) -> LowCardinality(String)
FixedString does not round-trip through Arrow IPC cleanly, and ~270
SQLSTATE codes are dictionary-friendly. The CH-native exporter is
updated to write the column via TagString (clickhouse-cpp's
ColumnString -> CH LowCardinality(String) is fine on the wire).
Envelope columns added (with DEFAULT '' so the CH-native exporter, which
does not yet emit these, continues to insert successfully):
instance_ubid, server_ubid, server_role, region, cell,
service_version, host_id, pod_name
Engine/partitioning aligned with prod:
ORDER BY ts -> ORDER BY (instance_ubid, ts) (tenant locality)
TTL added: toDate(ts) + INTERVAL 180 DAY
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
Materialized views (events_recent_1h, query_stats_5m, db_app_user_1m,
errors_recent) updated to reference the new column names and to include
instance_ubid in their ORDER BY / GROUP BY / SELECT projections so they
remain consistent with the events_raw partitioning strategy.
Test fixtures updated to query the new column names:
t/010_clickhouse_export.pl, t/012_timing_accuracy.pl,
t/021_cmd_type_counts.pl, t/027_query_normalization.pl,
t/031_normalize_cache.pl
parent_query_id is intentionally NOT included here — it's the subject of
PR #95 (parent-query-id-surgical) and lands as its own follow-up
migration after this PR.
Validated end-to-end: docker/init/00-schema.sql applies cleanly on
clickhouse/clickhouse-server:26.1 (the version pinned in
docker/docker-compose.test.yml); INSERTs that omit the envelope columns
fill them via DEFAULT ''; all 4 MVs build. CI will run the TAP suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 8, 2026
In-place patch of docker/init/00-schema.sql, the CH-native exporter, and
the TAP tests so the docker quickstart schema aligns with what prod
actually writes to (datagres_otel.query_logs_arrow in clickgres-platform).
This is the pre-cutover unification: pg_stat_ch's CH-native path was
previously isolated from prod, and the two schemas had drifted apart on
both column naming and types.
Column renames (prod-side naming wins; closer to OTel semantic
conventions and minimizes downstream churn):
ts_start -> ts
db -> db_name
username -> db_user
cmd_type -> db_operation
query -> query_text
Type fix:
err_sqlstate FixedString(5) -> LowCardinality(String)
FixedString does not round-trip through Arrow IPC cleanly, and ~270
SQLSTATE codes are dictionary-friendly. The CH-native exporter is
updated to write the column via TagString (clickhouse-cpp's
ColumnString -> CH LowCardinality(String) is fine on the wire).
Envelope columns added (with DEFAULT '' so the CH-native exporter, which
does not yet emit these, continues to insert successfully):
instance_ubid, server_ubid, server_role, region, cell,
service_version, host_id, pod_name
Engine/partitioning aligned with prod:
ORDER BY ts -> ORDER BY (instance_ubid, ts) (tenant locality)
TTL added: toDate(ts) + INTERVAL 180 DAY
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
Materialized views (events_recent_1h, query_stats_5m, db_app_user_1m,
errors_recent) updated to reference the new column names and to include
instance_ubid in their ORDER BY / GROUP BY / SELECT projections so they
remain consistent with the events_raw partitioning strategy.
Test fixtures updated to query the new column names:
t/010_clickhouse_export.pl, t/012_timing_accuracy.pl,
t/021_cmd_type_counts.pl, t/027_query_normalization.pl,
t/031_normalize_cache.pl
parent_query_id is intentionally NOT included here — it's the subject of
PR #95 (parent-query-id-surgical) and lands as its own follow-up
migration after this PR.
Validated end-to-end: docker/init/00-schema.sql applies cleanly on
clickhouse/clickhouse-server:26.1 (the version pinned in
docker/docker-compose.test.yml); INSERTs that omit the envelope columns
fill them via DEFAULT ''; all 4 MVs build. CI will run the TAP suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on +
debug_arrow_dump_dir set, an OTel endpoint that doesn't resolve so
gRPC send fails — MaybeDumpArrowBatch fires BEFORE send so IPC
files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
curl -X POST --data-binary @$f \
'http://localhost:18123/?query=INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test purely a
producer⇄CH wire-format check. The clickgres-platform Go receiver is
not exercised here, since for verifying that the bytes match the
schema, a curl invocation is the simplest possible expression of "POST
this Arrow IPC body to CH" — the receiver's only added value over
curl in prod is OTel-collector pipeline integration, which we don't
care about for wire-format correctness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…to-end Introduce a new exporter that builds Arrow IPC RecordBatches through the typed StatsExporter column-factory interface (StatLC/StatHC/StatTimestamp) instead of the open-coded ArrowBatchBuilder used by arrow_batch.cc. Composition over inheritance: the new exporter holds an OTelExporter for gRPC transport (SendArrowBatch) but doesn't extend it, so the per-row LogRecord state machine in OTelExporter — which is unused on this path post-PR-#72 — stays out of scope. Wire shape targets events_raw (the unified schema authored in PR #99), not the legacy query_logs_arrow: * query_id, parent_query_id: Int64 (no sprintf decimal-string encoding) * pid: Int32 * err_elevel: UInt8 * buffer counters (shared/local/temp_blks_*, *_blk_*_time_us, wal_*, cpu_*_time_us): Int64 * parallel_workers_planned/launched: Int16 * jit_*: Int32 * LC strings (db_*, err_sqlstate, app, server_role, region, cell, service_version, read_replica_type) -> DictionaryUtf8 * HC strings (query_text, err_message, client_addr, instance_ubid, server_ubid, host_id, pod_name) -> plain utf8 * ts: arrow::timestamp(MICRO, "UTC") matching DateTime64(6, 'UTC') Column<T> wrappers are nested private types inside OTelArrowExporter (not at namespace scope) so they can inherit from the protected Column<T> base — same convention OTelExporter and ClickHouseExporter use for their own column types. Columns the caller doesn't explicitly populate are synthesized in BeginRow by the exporter itself, so stats_exporter.cc's column-emission loop stays unchanged: * parent_query_id (hardcoded 0 until PR #95 lands and PschEvent carries the field — events_raw requires the column on every insert, no DEFAULT) * 8 envelope columns from pg_stat_ch.extra_attributes (instance_ubid, server_ubid, server_role, region, cell, host_id, pod_name) plus read_replica_type (default 'none' if extra_attributes didn't supply) * service_version pinned to the compile-time PG_STAT_CH_VERSION macro This commit only adds the exporter file (no dispatcher wiring yet) — the next commit adds the GUC and routes batches through it when on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on +
debug_arrow_dump_dir set, an OTel endpoint that doesn't resolve so
gRPC send fails — MaybeDumpArrowBatch fires BEFORE send so IPC
files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
curl -X POST --data-binary @$f \
'http://localhost:18123/?query=INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test purely a
producer⇄CH wire-format check. The clickgres-platform Go receiver is
not exercised here, since for verifying that the bytes match the
schema, a curl invocation is the simplest possible expression of "POST
this Arrow IPC body to CH" — the receiver's only added value over
curl in prod is OTel-collector pipeline integration, which we don't
care about for wire-format correctness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on +
debug_arrow_dump_dir set, an OTel endpoint that doesn't resolve so
gRPC send fails — MaybeDumpArrowBatch fires BEFORE send so IPC
files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
curl -X POST --data-binary @$f \
'http://localhost:18123/?query=INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test purely a
producer⇄CH wire-format check. The clickgres-platform Go receiver is
not exercised here, since for verifying that the bytes match the
schema, a curl invocation is the simplest possible expression of "POST
this Arrow IPC body to CH" — the receiver's only added value over
curl in prod is OTel-collector pipeline integration, which we don't
care about for wire-format correctness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…to-end Introduce a new exporter that builds Arrow IPC RecordBatches through the typed StatsExporter column-factory interface (StatLC/StatHC/StatTimestamp) instead of the open-coded ArrowBatchBuilder used by arrow_batch.cc. Composition over inheritance: the new exporter holds an OTelExporter for gRPC transport (SendArrowBatch) but doesn't extend it, so the per-row LogRecord state machine in OTelExporter — which is unused on this path post-PR-#72 — stays out of scope. Wire shape targets events_raw (the unified schema authored in PR #99), not the legacy query_logs_arrow: * query_id, parent_query_id: Int64 (no sprintf decimal-string encoding) * pid: Int32 * err_elevel: UInt8 * buffer counters (shared/local/temp_blks_*, *_blk_*_time_us, wal_*, cpu_*_time_us): Int64 * parallel_workers_planned/launched: Int16 * jit_*: Int32 * LC strings (db_*, err_sqlstate, app, server_role, region, cell, service_version, read_replica_type) -> DictionaryUtf8 * HC strings (query_text, err_message, client_addr, instance_ubid, server_ubid, host_id, pod_name) -> plain utf8 * ts: arrow::timestamp(MICRO, "UTC") matching DateTime64(6, 'UTC') Column<T> wrappers are nested private types inside OTelArrowExporter (not at namespace scope) so they can inherit from the protected Column<T> base — same convention OTelExporter and ClickHouseExporter use for their own column types. Columns the caller doesn't explicitly populate are synthesized in BeginRow by the exporter itself, so stats_exporter.cc's column-emission loop stays unchanged: * parent_query_id (hardcoded 0 until PR #95 lands and PschEvent carries the field — events_raw requires the column on every insert, no DEFAULT) * 8 envelope columns from pg_stat_ch.extra_attributes (instance_ubid, server_ubid, server_role, region, cell, host_id, pod_name) plus read_replica_type (default 'none' if extra_attributes didn't supply) * service_version pinned to the compile-time PG_STAT_CH_VERSION macro This commit only adds the exporter file (no dispatcher wiring yet) — the next commit adds the GUC and routes batches through it when on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JoshDreamland
added a commit
that referenced
this pull request
Jun 18, 2026
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on +
debug_arrow_dump_dir set, an OTel endpoint that doesn't resolve so
gRPC send fails — MaybeDumpArrowBatch fires BEFORE send so IPC
files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
curl -X POST --data-binary @$f \
'http://localhost:18123/?query=INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test purely a
producer⇄CH wire-format check. The clickgres-platform Go receiver is
not exercised here, since for verifying that the bytes match the
schema, a curl invocation is the simplest possible expression of "POST
this Arrow IPC body to CH" — the receiver's only added value over
curl in prod is OTel-collector pipeline integration, which we don't
care about for wire-format correctness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Assigns a
parent_query_idto events to avoid polluting metrics for nested queries (when plpgsql functions issue SPI statements that emit events within events).Changes
PschEvent::top_level→PschEvent::parent_query_id(same slot in event/ring_entry; static asserts still hold).rusage_start,query_start_ts,current_query_is_top_level) consolidated into one fixed-sizequery_stack[16]of{queryid, rusage_start, query_start_ts}indexed bynesting_level. The existing PG_TRY/nesting_levelpattern in ExecutorRun/Finish/ProcessUtility is unchanged.RecordUInt64+Appendfor the new column instats_exporter.cc.parent_query_id UInt64column indocker/init/00-schema.sql;migrations/001_add_parent_query_id.sqlfor existing deployments.Bug fix
The prior single static
rusage_startwas overwritten by nestedgetrusage()calls, so the outer query's CPU delta only measured the post-SPI portion. Per-frame baselines fix that alongside parent linkage.Test plan
mise run test:regressmise run test:isolation