schema: unify pg_stat_ch events_raw with prod Arrow path, move into Goose layout #99
Does this need a corresponding change in clickgres-platform, e.g. for query_id being Int64 vs String? Can clickgres-platform submodule pg_stat_ch for the schema? The description talks about a cutover, but that seems unnecessary since the CH exporter isn't used in prod. Can we add a test that exercises both the CH and Arrow paths, potentially one after the other, to test schema compatibility?
Force-pushed ca96ef5 to e462b17.
In-place patch of docker/init/00-schema.sql, the CH-native exporter, and
the TAP tests so the docker quickstart schema aligns with what prod
actually writes to (datagres_otel.query_logs_arrow in clickgres-platform).
This is the pre-cutover unification: pg_stat_ch's CH-native path was
previously isolated from prod, and the two schemas had drifted apart on
both column naming and types.
Column renames (prod-side naming wins; closer to OTel semantic
conventions and minimizes downstream churn):
ts_start -> ts
db -> db_name
username -> db_user
cmd_type -> db_operation
query -> query_text
Type fix:
err_sqlstate FixedString(5) -> LowCardinality(String)
FixedString does not round-trip through Arrow IPC cleanly, and ~270
SQLSTATE codes are dictionary-friendly. The CH-native exporter is
updated to write the column via TagString (clickhouse-cpp's
ColumnString -> CH LowCardinality(String) is fine on the wire).
Envelope columns added (with DEFAULT '' so the CH-native exporter, which
does not yet emit these, continues to insert successfully):
instance_ubid, server_ubid, server_role, region, cell,
service_version, host_id, pod_name
Engine/partitioning aligned with prod:
ORDER BY ts -> ORDER BY (instance_ubid, ts) (tenant locality)
TTL added: toDate(ts) + INTERVAL 180 DAY
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
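A rough C++ model of the retention rule, assuming ts is carried as Unix-epoch microseconds and comparing at day granularity: a row expires once the current date reaches toDate(ts) + 180 days, and with ttl_only_drop_parts = 1 ClickHouse drops a part only when every row in it has expired, instead of rewriting parts to delete individual rows. The helper names are hypothetical, and since ClickHouse applies TTL during merges, real removal happens some time after the boundary rather than exactly at it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int64_t kUsPerDay = 86400LL * 1000000LL;
constexpr int64_t kTtlDays = 180;  // INTERVAL 180 DAY

// toDate(ts) in UTC, as a day number since 1970-01-01 (floor division).
int64_t DayNumber(int64_t ts_us) {
  int64_t day = ts_us / kUsPerDay;
  if (ts_us % kUsPerDay < 0) --day;  // round toward -inf for pre-1970 values
  return day;
}

// A row is expired once today >= toDate(ts) + 180 days.
bool RowExpired(int64_t ts_us, int64_t today) {
  return today >= DayNumber(ts_us) + kTtlDays;
}

// ttl_only_drop_parts = 1: a part is removed whole, and only once every row in it has expired.
bool PartDroppable(const std::vector<int64_t>& part_ts_us, int64_t today) {
  return std::all_of(part_ts_us.begin(), part_ts_us.end(),
                     [today](int64_t ts_us) { return RowExpired(ts_us, today); });
}
```

One consequence of the setting: a part holding rows from day 0 and day 10 survives until day 190, even though its oldest row expired on day 180.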
Materialized views (events_recent_1h, query_stats_5m, db_app_user_1m,
errors_recent) updated to reference the new column names and to include
instance_ubid in their ORDER BY / GROUP BY / SELECT projections so they
remain consistent with the events_raw partitioning strategy.
Test fixtures updated to query the new column names:
t/010_clickhouse_export.pl, t/012_timing_accuracy.pl,
t/021_cmd_type_counts.pl, t/027_query_normalization.pl,
t/031_normalize_cache.pl
parent_query_id is intentionally NOT included here — it's the subject of
PR #95 (parent-query-id-surgical) and lands as its own follow-up
migration after this PR.
Validated end-to-end: docker/init/00-schema.sql applies cleanly on
clickhouse/clickhouse-server:26.1 (the version pinned in
docker/docker-compose.test.yml); INSERTs that omit the envelope columns
fill them via DEFAULT ''; all 4 MVs build. CI will run the TAP suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical move + goose annotations. The pg_stat_ch ClickHouse schema
that was previously the docker quickstart init script becomes the first
real Goose migration under schema/migrations/, matching
clickgres-platform's runner layout (pressly/goose v3,
DialectClickHouse, embed.FS).
Changes to the content of the moved file:
* Header banner rewritten from "CANONICAL SCHEMA REFERENCE / single
source of truth / dual role as docker init" to "initial migration"
framing.
* Added -- +goose Up / -- +goose Down section markers.
* Each CREATE DATABASE / CREATE TABLE / CREATE MATERIALIZED VIEW
wrapped in -- +goose StatementBegin / StatementEnd so goose's
parser handles the multi-statement bodies correctly.
* Removed the pre-CREATE "DROP TABLE IF EXISTS X" idioms — those
existed to make the docker init script idempotent on container
restart, but goose tracks state via goose_db_version. Drops now
live exclusively in the -- +goose Down section in reverse
dependency order.
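To illustrate what the annotations do, here is a deliberately simplified C++ model of how they partition a migration file. Everything after -- +goose Up belongs to `up`, everything after -- +goose Down belongs to `down`, and a StatementBegin/StatementEnd pair keeps a multi-line body (including any semicolons inside it) together as one statement. This is a hypothetical sketch, not goose's parser; goose supports more annotations than handled here. To a plain SQL client every -- +goose line is just a comment, so it would run both sections in file order.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Statements found in each section of a goose-annotated migration file.
struct GooseSections {
  std::vector<std::string> up;
  std::vector<std::string> down;
};

// Hypothetical, simplified model of how goose's annotations split a file.
GooseSections SplitGooseSql(const std::string& sql) {
  GooseSections out;
  std::vector<std::string>* section = nullptr;  // text before the first marker is dropped
  bool in_block = false;                        // inside StatementBegin/StatementEnd
  std::string current;

  // Emit the accumulated statement (if non-blank) into the active section.
  auto flush = [&]() {
    if (section != nullptr && current.find_first_not_of(" \t\n") != std::string::npos) {
      section->push_back(current);
    }
    current.clear();
  };

  std::istringstream in(sql);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("-- +goose Up", 0) == 0) { flush(); section = &out.up; continue; }
    if (line.rfind("-- +goose Down", 0) == 0) { flush(); section = &out.down; continue; }
    if (line.rfind("-- +goose StatementBegin", 0) == 0) { flush(); in_block = true; continue; }
    if (line.rfind("-- +goose StatementEnd", 0) == 0) { flush(); in_block = false; continue; }
    current += line;
    current += '\n';
    // Outside an explicit block, a line ending in ';' terminates the statement;
    // inside one, semicolons are kept and the whole body stays a single statement.
    if (!in_block && !line.empty() && line.back() == ';') flush();
  }
  flush();
  return out;
}
```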
The schema content itself (column names, types, MV definitions,
ORDER BY / TTL / SETTINGS) is unchanged from the previous commit.
Git rename detection should follow docker/init/00-schema.sql ->
schema/migrations/20260519000001_create_initial_schema.sql.
Also adds schema/migrations/00000000000001_bootstrap.sql, a no-op
SELECT 1 migration required by goose to seed the goose_db_version
table (copied verbatim from clickgres-platform's bootstrap).
Validated end-to-end against clickhouse/clickhouse-server:26.1:
pressly goose v3.27.1 `up` and `reset` round-trip cleanly. All 51
columns and 4 MVs land with the expected types.
Note: this leaves docker/init/ empty. The docker-compose mounts will
need updating in a follow-on PR to point at schema/migrations/ (which
requires a small shim to invoke goose-up at container start, since
clickhouse-server's docker entrypoint cannot parse goose's
StatementBegin/End directives directly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…client

The previous "Initialize ClickHouse schema" step ran `clickhouse-client --multiquery < docker/init/00-schema.sql`. That file moved in the previous commit. Pointing the step at the new location without further changes would not work: clickhouse-client cannot parse goose's -- +goose Up/Down/StatementBegin/End directives (to it they are plain SQL comments), so it would execute the Down section's DROP statements right after the Up section's CREATEs.

Switch the step to install pressly/goose v3.27.1 (~5 s on Ubuntu CI runners, which have Go preinstalled) and apply the migrations from schema/migrations/ via `goose ... up` against the running CH container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed e462b17 to 857400c.
Pre-add the column being introduced on both sides of the Arrow pipeline, to keep the unified events_raw schema wire-compatible once the cutover happens:

* pg_stat_ch PR #107 adds read_replica_type to the Arrow IPC output in arrow_batch.cc (dictionary-encoded, populated from the pg_stat_ch.extra_attributes GUC).
* clickgres-platform PR #448 adds the matching ALTER TABLE on query_logs_arrow (LowCardinality(String) DEFAULT 'none' AFTER server_role) and promotes read-replica traffic into query_logs via a widened MV filter.

Mirroring the same type, default, and position here means the eventual cutover from query_logs_arrow to events_raw needs zero further schema changes, and lets PR #107 rebase onto unified_schema without having to amend the schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR unifies pg_stat_ch’s ClickHouse events_raw schema with the production Arrow receiver shape (column names/types, ordering/TTL, and MV definitions), and makes schema/migrations/ (pressly/goose layout) the canonical, CI-applied schema source. It also updates the ClickHouse exporter + TAP suite queries to use the unified column names.
Changes:
- Moved the canonical ClickHouse schema into Goose migrations (`schema/migrations/`), including a bootstrap migration and an initial schema migration (events_raw + 4 MVs).
- Updated exporters/tests to the unified column names (`ts`, `db_name`, `db_user`, `db_operation`, `query_text`) and changed `err_sqlstate` to `LowCardinality(String)` (exporter uses `TagString`).
- Updated TAP CI to install and run goose migrations against the test ClickHouse container.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| t/034_query_intern_oom_export.pl | Updates ClickHouse queries to query_text and related fields for OOM export assertions. |
| t/031_normalize_cache.pl | Updates ClickHouse query_text usage and ordering by ts. |
| t/027_query_normalization.pl | Updates ClickHouse queries to query_text and ordering by ts. |
| t/021_cmd_type_counts.pl | Switches aggregation queries to group by db_operation. |
| t/013_clickhouse_tls.pl | Updates ClickHouse query checks to query_text. |
| t/012_timing_accuracy.pl | Updates ClickHouse filters/order-by to query_text/ts. |
| t/010_clickhouse_export.pl | Updates ClickHouse validation queries to query_text, db_name, db_operation. |
| src/export/stats_exporter.cc | Emits ts and uses TagString("err_sqlstate") with string conversion. |
| src/export/exporter_interface.h | Updates semantic column comments/examples to unified names. |
| src/export/clickhouse_exporter.cc | Maps semantic columns to db_name, db_user, db_operation, query_text. |
| schema/migrations/20260519000001_create_initial_schema.sql | Adds Goose “initial schema” migration with unified table/MVs and prod-aligned ordering/TTL/settings. |
| schema/migrations/00000000000001_bootstrap.sql | Adds Goose bootstrap migration (SELECT 1). |
| docker/init/.gitkeep | Leaves docker/init/ as an empty placeholder for legacy compose bind mounts. |
| .github/workflows/ci-tap.yml | Installs goose and applies migrations to initialize ClickHouse schema in CI. |

    + - name: Install goose
    +   run: go install github.com/pressly/goose/v3/cmd/goose@v3.27.1
    +
      - name: Initialize ClickHouse schema
        run: |
    -     docker exec psch-clickhouse clickhouse-client --multiquery < docker/init/00-schema.sql
    +     docker exec psch-clickhouse clickhouse-client -q "CREATE DATABASE IF NOT EXISTS pg_stat_ch"
    +     "$HOME/go/bin/goose" -dir schema/migrations \
    +       clickhouse "tcp://localhost:19000?database=pg_stat_ch" up

    # Reserved for legacy docker/quickstart bind mounts. The canonical CH schema
    # now lives in schema/migrations/ and is applied via goose (see CI workflow).
    # This directory is intentionally empty; the .gitkeep keeps the bind mount
    # in docker/docker-compose.test.yml and docker/quickstart/docker-compose.yml
    # from failing on a missing host path.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
… $HOME/go/bin

Per Copilot review on #99: $HOME/go/bin assumes GOPATH=$HOME/go, which is the Go default but not guaranteed (CI runners can configure GOPATH elsewhere, and GOBIN can override the install location entirely). Resolve the actual path via `go env GOPATH` and add it to $GITHUB_PATH so subsequent steps can just call `goose` without a path prefix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
schema/migrations/20260519000001_create_initial_schema.sql:196
- The comment suggests `DEFAULT ''` is required for ClickHouse inserts to succeed, but the ClickHouse exporter builds `INSERT INTO events_raw (<col list>)` (see `src/export/clickhouse_exporter.cc`), so omitted envelope columns will already use their DEFAULT/type defaults. Updating the comment avoids implying a stricter requirement than actually exists.

      # Helper: parse db_operation counts from ClickHouse
      sub get_cmd_type_counts {
          my $result = psch_query_clickhouse(
    -         "SELECT cmd_type, count() FROM pg_stat_ch.events_raw GROUP BY cmd_type FORMAT TabSeparated"
    +         "SELECT db_operation, count() FROM pg_stat_ch.events_raw GROUP BY db_operation FORMAT TabSeparated"
          );
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Per Copilot review on #99: identifiers carrying the old column names through Perl helpers and C++ locals were misleading after the column renames in commit 1. Update them so failure messages and stack traces reflect what the code now actually does.

t/021_cmd_type_counts.pl:
  sub get_cmd_type_counts -> sub get_db_operation_counts (+ 4 call sites)
  node name 'cmd_type_counts' -> 'db_operation_counts'
src/export/stats_exporter.cc:
  col_db -> col_db_name
  col_username -> col_db_user
  col_cmd_type -> col_db_operation
  col_query -> col_query_text

The file name t/021_cmd_type_counts.pl is left as-is (per an earlier decision: a file rename is more churn than it's worth pre-GA, and the TAP summary report reads fine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b8282dc.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
schema/migrations/20260519000001_create_initial_schema.sql:196
- The envelope header comment says these resource-attribute columns default to `''`, but `read_replica_type` actually defaults to `'none'`. Tweaking the comment avoids documenting a default that the schema doesn't implement.
| ); | ||
|
|
||
| # Helper: parse cmd_type counts from ClickHouse | ||
| # Helper: parse db_operation counts from ClickHouse |

    # - every intern_failed row still carries duration_us, db_name, and
    #   db_operation — the numeric/identity telemetry the customer relies on
    #   for slow-query analysis even when SQL text is unavailable.
Copilot's autofix in 12ab775 shifted the line break from before `std::string(` to after the opening paren while changing `strnlen(..., sizeof(...) - 1)`. The new wrap position trips clang-format's column-limit rule. Move the break back to where it was originally (before `std::string`) and keep the `sizeof - 1` fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
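Our reading of why the `sizeof(...) - 1` bound matters, sketched in C++ against a hypothetical 6-byte SQLSTATE buffer (5 characters plus room for a terminator; the struct and function names are illustrative, not pg_stat_ch's). Capping strnlen at 5 means a buffer whose terminator was never written still yields at most the five SQLSTATE characters, instead of also picking up the sixth byte.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the event's SQLSTATE field: 5 chars plus room for a NUL.
struct EventSqlstate {
  char err_sqlstate[6];
};

// Length is capped at sizeof - 1 (= 5) and stops early at a NUL, so an
// unterminated buffer never contributes more than the 5 SQLSTATE characters.
std::string SqlstateToString(const EventSqlstate& ev) {
  return std::string(ev.err_sqlstate,
                     strnlen(ev.err_sqlstate, sizeof(ev.err_sqlstate) - 1));
}
```

The resulting std::string is what a LowCardinality(String) column such as err_sqlstate would receive.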
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
schema/migrations/20260519000001_create_initial_schema.sql:196
- The envelope section comment says attributes default to `''` so the CH-native exporter can omit them, but `read_replica_type` defaults to `'none'`. This makes the comment inaccurate/misleading for anyone editing defaults later.

    - name: Install goose
      run: |
        go install github.com/pressly/goose/v3/cmd/goose@v3.27.1
        echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"
…FixedStringCol class

`MetricFixedString` was added in Feb 2026 (commit 9f89f4) when err_sqlstate was the only column in events_raw of type FixedString(5). It had exactly one caller and exactly one width, and was never going to have more.

PR #99 (merged 2026-06-10) changed err_sqlstate from FixedString(5) to LowCardinality(String) and updated the call site from `exporter->MetricFixedString(5, "err_sqlstate")` to `exporter->TagString("err_sqlstate")`. The interface method became unreferenced.

PR #104 (merged 2026-06-04, before #99) had to reproduce MetricFixedString in clickhouse-c by hand-rolling a `FixedStringCol` class since, unlike clickhouse-cpp, clickhouse-c has no built-in FixedString column type. That work was correct when authored (the column was still live) but became orphaned about a week later when #99 landed. Nobody noticed.

Deletes the now-dead virtual and both impls. The CH-native side loses the 27-line FixedStringCol class entirely; the OTel side loses a 3-line forward to MakeSvCol; the interface loses one virtual declaration. stats_exporter.cc already doesn't reference it. No behavior change: this was unreachable code. -35 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
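For context on what a hand-rolled FixedString column has to provide, a hypothetical C++ sketch of a FixedString(N) column buffer: every row occupies exactly N bytes, shorter values are right-padded with NUL bytes, and longer values are rejected (ClickHouse likewise refuses values longer than N on insert). This is not the removed FixedStringCol, only an illustration of the semantics such a class has to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical fixed-width string column: each row is exactly width_ bytes.
class FixedStringColumnSketch {
 public:
  explicit FixedStringColumnSketch(std::size_t width) : width_(width) {
    if (width == 0) throw std::invalid_argument("FixedString width must be > 0");
  }

  // Append one value, right-padding with NUL bytes up to the column width.
  void Append(std::string_view v) {
    if (v.size() > width_) throw std::length_error("value longer than FixedString width");
    data_.insert(data_.end(), v.begin(), v.end());
    data_.insert(data_.end(), width_ - v.size(), '\0');
  }

  std::size_t Rows() const { return data_.size() / width_; }
  const std::vector<char>& Data() const { return data_; }

 private:
  std::size_t width_;
  std::vector<char> data_;  // rows laid out back to back, width_ bytes each
};
```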
…LC/StatHC
Consolidate the StatsExporter column-factory methods around cardinality intent.
The previous Tag/Metric/Record split was a vestige of pre-OTel naming and didn't
actually map to anything semantically meaningful — every backend treated
MetricInt64 and RecordInt64 identically. Replace with StatLC* (low-cardinality,
dimension-eligible) and StatHC* (high-cardinality, value-only) variants per
type, plus a domain-specific StatTimestamp for the event timestamp.
Cardinality intent matters per-backend in ways the old naming concealed:
ClickHouse: LC -> may be stored as LowCardinality(<Type>); HC -> plain.
Schema-declared encoding wins on write; the LC hint helps
the producer pick the cheapest column representation.
Arrow IPC: LC -> DictBuilder (dictionary-encoded array); HC -> plain
typed builder. Required for batch-rate efficiency on
low-cardinality dimensions. (Honored by the upcoming
unified Arrow exporter; not yet exercised here.)
OTel: LC -> eligible as histogram dimension or metric label;
HC -> log attribute only, *never* a metric dimension
(cardinality explosion).
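A toy C++ illustration of the Arrow IPC trade-off above, assuming int32 indices for simplicity (Arrow also allows narrower index types). Dictionary encoding stores each distinct value once plus a small index per row. That pays off for an LC column such as db_operation and backfires for an HC column such as query_text, where nearly every value is distinct. The names are hypothetical, and the byte counts ignore Arrow's offset and validity buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct DictEncoded {
  std::vector<std::string> dictionary;  // each distinct value stored once
  std::vector<int32_t> indices;         // one index per row into `dictionary`
};

DictEncoded DictionaryEncode(const std::vector<std::string>& values) {
  DictEncoded out;
  std::unordered_map<std::string, int32_t> slot_of;
  for (const std::string& v : values) {
    auto it = slot_of.find(v);
    if (it == slot_of.end()) {  // first sighting: append to the dictionary
      const auto idx = static_cast<int32_t>(out.dictionary.size());
      out.dictionary.push_back(v);
      it = slot_of.emplace(v, idx).first;
    }
    out.indices.push_back(it->second);
  }
  return out;
}

// String payload bytes of a plain utf8 column.
std::size_t PlainPayloadBytes(const std::vector<std::string>& values) {
  std::size_t n = 0;
  for (const auto& v : values) n += v.size();
  return n;
}

// Index bytes plus dictionary payload bytes.
std::size_t DictPayloadBytes(const DictEncoded& d) {
  std::size_t n = d.indices.size() * sizeof(int32_t);
  for (const auto& v : d.dictionary) n += v.size();
  return n;
}
```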
Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* +
RecordDateTime + RecordString + MetricFixedString) to 8 (4 LC + 3 HC + Timestamp),
keeping only the wire types stats_exporter.cc actually instantiates. Future
column types (Int8, UInt16, UInt32) can be added when their first caller appears.
Removed:
* MetricFixedString — dead since PR #99 retired FixedString(5) for err_sqlstate
in favor of LowCardinality(String).
* FixedStringCol class in clickhouse_exporter.cc — only used by MetricFixedString.
Rename map applied at the only call site (ExportEventStatsInternal in
stats_exporter.cc), with cardinality chosen per column based on observed data
shape rather than just type width: err_elevel (UInt8) is LC; query_id (Int64)
is HC; parallel_workers_* (Int16) is LC; duration_us (UInt64) is HC. The Db*
semantic shortcuts remain.
Net diff: -30 LOC, no behavior change. Two doc-comments updated in t/024 and
t/psch.pm. The OTel column-emission machinery itself stays alive for now — it
gets retired in a later commit alongside the new unified Arrow exporter that
will satisfy ExportEventStats for the OTel path going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LC/StatHC (#115) Consolidate the StatsExporter column-factory methods around cardinality intent. The previous Tag/Metric/Record split was a vestige of pre-OTel naming and didn't actually map to anything semantically meaningful — every backend treated MetricInt64 and RecordInt64 identically. Replace with StatLC* (low-cardinality, dimension-eligible) and StatHC* (high-cardinality, value-only) variants per type, plus a domain-specific StatTimestamp for the event timestamp. Cardinality intent matters per-backend in ways the old naming concealed: ClickHouse: LC -> may be stored as LowCardinality(<Type>); HC -> plain. Schema-declared encoding wins on write; the LC hint helps the producer pick the cheapest column representation. Arrow IPC: LC -> DictBuilder (dictionary-encoded array); HC -> plain typed builder. Required for batch-rate efficiency on low-cardinality dimensions. (Honored by the upcoming unified Arrow exporter; not yet exercised here.) OTel: LC -> eligible as histogram dimension or metric label; HC -> log attribute only, *never* a metric dimension (cardinality explosion). Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* + RecordDateTime + RecordString + MetricFixedString) to 8 (4 LC + 3 HC + Timestamp), keeping only the wire types stats_exporter.cc actually instantiates. Future column types (Int8, UInt16, UInt32) can be added when their first caller appears. Removed: * MetricFixedString — dead since PR #99 retired FixedString(5) for err_sqlstate in favor of LowCardinality(String). * FixedStringCol class in clickhouse_exporter.cc — only used by MetricFixedString. Rename map applied at the only call site (ExportEventStatsInternal in stats_exporter.cc), with cardinality chosen per column based on observed data shape rather than just type width: err_elevel (UInt8) is LC; query_id (Int64) is HC; parallel_workers_* (Int16) is LC; duration_us (UInt64) is HC. The Db* semantic shortcuts remain. Net diff: -30 LOC, no behavior change. 
Two doc-comments updated in t/024 and t/psch.pm. The OTel column-emission machinery itself stays alive for now; it gets retired in a later commit alongside the new unified Arrow exporter that will satisfy ExportEventStats for the OTel path going forward.

Note that the `Tag...` methods were also unceremoniously renamed in this commit to look like everything else. That's because they silently started behaving like everything else as of #72.

---

Two unrelated doc fixes flagged by Copilot's review on #115:

exporter_interface.h: StatTimestamp's doc said "Postgres-epoch microsecond timestamp", but all current callers convert to Unix-epoch by adding kPostgresEpochOffsetUs before append. CH DateTime64(6) and OTel time_unix_nano both interpret the wire value as Unix-epoch. Clarify the contract to describe what the column wants on the wire, not what shape the input data happens to be in. This defends against a future caller passing raw PG-epoch values and getting timestamps that are wrong by 30 years.

t/024_otel_export.pl: the comment on subtest 'metric labels populated' described producer-side metric promotion that hasn't existed since PR #72 ripped the OTel SDK out. The producer-side OTel exporter emits OTLP log attributes only; the test's Prometheus assertions succeed because the downstream OTel collector's log-to-metric processor promotes specific log attributes to histogram labels. Reword to describe the actual flow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
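A small C++ sketch of the conversion the StatTimestamp contract describes. kPostgresEpochOffsetUs is the name used above, but its definition here is our own. Postgres timestamps count microseconds from 2000-01-01 00:00:00 UTC and Unix time counts from 1970-01-01; the 10957 days between them (30 years including 7 leap days) are exactly the wrong-by-30-years failure mode. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 2000-01-01T00:00:00Z expressed as microseconds since 1970-01-01T00:00:00Z:
// 10957 days * 86400 s/day = 946684800 s.
constexpr int64_t kPostgresEpochOffsetUs = 946684800LL * 1000000LL;

// Postgres-epoch microseconds -> Unix-epoch microseconds, the value that
// DateTime64(6, 'UTC') and an Arrow timestamp(MICRO, "UTC") column expect.
constexpr int64_t PgToUnixMicros(int64_t pg_us) { return pg_us + kPostgresEpochOffsetUs; }

// Inverse, e.g. for round-trip checks.
constexpr int64_t UnixMicrosToPg(int64_t unix_us) { return unix_us - kPostgresEpochOffsetUs; }
```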
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on,
debug_arrow_dump_dir set, and an OTel endpoint that doesn't resolve,
so the gRPC send fails. MaybeDumpArrowBatch fires BEFORE the send, so
IPC files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
     curl -X POST --data-binary @"$f" \
       'http://localhost:18123/?query=INSERT%20INTO%20pg_stat_ch.events_raw%20FORMAT%20ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
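Step 6 reduces to comparing two name-to-type maps. The test itself is Perl; this is a language-neutral sketch in C++ with a hypothetical helper name, assuming the rows from system.columns have already been fetched into a map. The expected types in the usage (query_id Int64, pid Int32, err_elevel UInt8) follow the events_raw wire shape listed later in this PR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Column name -> declared type, e.g. as read back from system.columns.
using ColumnTypes = std::map<std::string, std::string>;

// One message per expected column that is missing or has a different type.
// Extra columns in `actual` are ignored.
std::vector<std::string> FindTypeMismatches(const ColumnTypes& expected,
                                            const ColumnTypes& actual) {
  std::vector<std::string> problems;
  for (const auto& [name, want] : expected) {
    auto it = actual.find(name);
    if (it == actual.end()) {
      problems.push_back(name + ": missing");
    } else if (it->second != want) {
      problems.push_back(name + ": expected " + want + ", got " + it->second);
    }
  }
  return problems;
}
```

A producer regressing query_id to String would surface as a message like "query_id: expected Int64, got String" rather than passing silently.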
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test a pure
producer⇄CH wire-format check. The clickgres-platform Go receiver is
deliberately not exercised: to verify that the bytes match the schema,
a curl invocation is the simplest possible expression of "POST this
Arrow IPC body to CH". The receiver's only added value over curl in
prod is OTel-collector pipeline integration, which is irrelevant to
wire-format correctness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…to-end
Introduce a new exporter that builds Arrow IPC RecordBatches through the typed StatsExporter column-factory interface (StatLC/StatHC/StatTimestamp) instead of the open-coded ArrowBatchBuilder used by arrow_batch.cc.
Composition over inheritance: the new exporter holds an OTelExporter for gRPC transport (SendArrowBatch) but doesn't extend it, so the per-row LogRecord state machine in OTelExporter, which is unused on this path post-PR-#72, stays out of scope.
Wire shape targets events_raw (the unified schema authored in PR #99), not the legacy query_logs_arrow:
* query_id, parent_query_id: Int64 (no sprintf decimal-string encoding)
* pid: Int32
* err_elevel: UInt8
* buffer counters (shared/local/temp_blks_*, *_blk_*_time_us, wal_*, cpu_*_time_us): Int64
* parallel_workers_planned/launched: Int16
* jit_*: Int32
* LC strings (db_*, err_sqlstate, app, server_role, region, cell, service_version, read_replica_type) -> DictionaryUtf8
* HC strings (query_text, err_message, client_addr, instance_ubid, server_ubid, host_id, pod_name) -> plain utf8
* ts: arrow::timestamp(MICRO, "UTC"), matching DateTime64(6, 'UTC')
Column<T> wrappers are nested private types inside OTelArrowExporter (not at namespace scope) so they can inherit from the protected Column<T> base. This is the same convention OTelExporter and ClickHouseExporter use for their own column types.
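To make the wire contract above easier to check at a glance, here is a minimal Python sketch that encodes the commit's type list as ordered glob patterns. The `WIRE_TYPES` table, the `wire_type` helper, and the type labels (plain strings, not pyarrow objects) are illustrative assumptions and are not code from the exporter.

```python
from fnmatch import fnmatchcase

# Arrow wire types for events_raw, transcribed from the commit message.
# Labels are descriptive strings, not pyarrow type objects. First match wins.
_LC = "dictionary<utf8>"  # low-cardinality strings: dictionary-encoded
_HC = "utf8"              # high-cardinality strings: plain utf8

WIRE_TYPES = [
    ("query_id", "int64"),
    ("parent_query_id", "int64"),
    ("pid", "int32"),
    ("err_elevel", "uint8"),
    ("ts", "timestamp[us, tz=UTC]"),
    # Buffer, WAL and CPU counters.
    ("shared_blks_*", "int64"),
    ("local_blks_*", "int64"),
    ("temp_blks_*", "int64"),
    ("*_blk_*_time_us", "int64"),
    ("wal_*", "int64"),
    ("cpu_*_time_us", "int64"),
    ("parallel_workers_planned", "int16"),
    ("parallel_workers_launched", "int16"),
    ("jit_*", "int32"),
    *[(name, _LC) for name in (
        "db_*", "err_sqlstate", "app", "server_role", "region", "cell",
        "service_version", "read_replica_type",
    )],
    *[(name, _HC) for name in (
        "query_text", "err_message", "client_addr", "instance_ubid",
        "server_ubid", "host_id", "pod_name",
    )],
]


def wire_type(column: str) -> str:
    """Return the Arrow wire type label for an events_raw column name."""
    for pattern, arrow_type in WIRE_TYPES:
        if fnmatchcase(column, pattern):
            return arrow_type
    raise KeyError(f"no wire type declared for column {column!r}")
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform. Any column name not covered by the list raises instead of silently defaulting to a string type.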
Columns the caller doesn't explicitly populate are synthesized in BeginRow by the exporter itself, so stats_exporter.cc's column-emission loop stays unchanged:
* parent_query_id (hardcoded 0 until PR #95 lands and PschEvent carries the field; events_raw requires the column on every insert, with no DEFAULT)
* 8 envelope columns from pg_stat_ch.extra_attributes: instance_ubid, server_ubid, server_role, region, cell, host_id, pod_name, and read_replica_type (which defaults to 'none' if extra_attributes doesn't supply it)
* service_version, pinned to the compile-time PG_STAT_CH_VERSION macro
This commit only adds the exporter file (no dispatcher wiring yet); the next commit adds the GUC and routes batches through the new exporter when it is on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
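The per-row synthesis described above can be sketched in Python as follows. This sketch assumes that extra_attributes has already been parsed into a key/value mapping, because the GUC's string format isn't described here. The function name, the empty-string fallback for absent envelope keys, and passing the version in as an argument are all assumptions. The real logic lives in the C++ BeginRow.

```python
from typing import Dict, Mapping, Union

# Envelope keys sourced from pg_stat_ch.extra_attributes, per the commit message.
ENVELOPE_KEYS = (
    "instance_ubid", "server_ubid", "server_role", "region",
    "cell", "host_id", "pod_name", "read_replica_type",
)


def synthesize_row_columns(
    extra_attributes: Mapping[str, str], service_version: str
) -> Dict[str, Union[int, str]]:
    """Columns the exporter fills in itself at the start of every row."""
    row: Dict[str, Union[int, str]] = {
        # events_raw has no DEFAULT for this column, so it is always sent;
        # 0 until PschEvent carries a real parent id (PR #95).
        "parent_query_id": 0,
    }
    for key in ENVELOPE_KEYS:
        # Assumption: an absent key is sent as an empty string.
        row[key] = extra_attributes.get(key, "")
    # read_replica_type is the one envelope column with a non-empty default.
    row["read_replica_type"] = extra_attributes.get("read_replica_type", "none")
    # Stand-in for the compile-time PG_STAT_CH_VERSION macro.
    row["service_version"] = service_version
    return row
```

Because all of these columns are produced in one place, the caller's column-emission loop never needs to know about them. That is the point the commit makes about stats_exporter.cc staying unchanged.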
…_raw
Closes the test gap that's existed since the Arrow path went live: no
existing test proves that pg_stat_ch's Arrow IPC output can actually be
ingested by ClickHouse against the unified events_raw schema. t/026
asserts on the IPC schema shape via pyarrow but never pushes the bytes
into CH; t/010 etc. exercise the CH-native Block path, not Arrow.
The new test wires the full producer-to-CH chain locally, bypassing
the OTel collector + receiver service entirely:
1. Spin up a node with use_unified_arrow_exporter=on,
debug_arrow_dump_dir set, and an OTel endpoint that doesn't resolve,
so the gRPC send fails. MaybeDumpArrowBatch fires BEFORE the send, so
IPC files land on disk regardless.
2. Run a deliberately-shaped workload (SELECT, CREATE, INSERT,
SELECT count, DROP — five distinct statements).
3. Force pg_stat_ch_flush(), wait for IPC files in $dump_dir.
4. TRUNCATE pg_stat_ch.events_raw, then for each IPC file:
curl -X POST --data-binary @$f \
'http://localhost:18123/?query=INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream'
A type mismatch on the wire (e.g. if the producer regressed to
writing query_id as String) would surface here as a 4xx with a
clear error rather than silently corrupting data.
5. SELECT count() FROM events_raw, assert >= 5 rows.
6. Pull system.columns and assert each id/counter column has the
declared type from PR #99's schema (no silent string-typed regressions).
7. Pinpoint the marker SELECT row and assert db_name/db_operation/
query_text values match what we sent.
8. Assert envelope columns (instance_ubid, server_role, region, cell,
read_replica_type) carry the values from pg_stat_ch.extra_attributes.
9. Assert parent_query_id is 0 across all rows (synthesized by the
exporter until PR #95 lands).
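Step 6's column-type assertion boils down to a dictionary comparison. The actual check lives in the TAP suite. What follows is an illustrative Python rendering of that logic. The expected types are assumptions: Int64 and DateTime64(6, 'UTC') are inferred from the exporter's wire types, and LowCardinality(String) comes from the err_sqlstate change. The real test's column list may be longer.

```python
from typing import Dict, Iterable, List, Tuple

# Rows would come from something like:
#   SELECT name, type FROM system.columns
#   WHERE database = 'pg_stat_ch' AND table = 'events_raw'
EXPECTED_TYPES: Dict[str, str] = {
    "query_id": "Int64",
    "ts": "DateTime64(6, 'UTC')",
    "err_sqlstate": "LowCardinality(String)",
}


def type_mismatches(
    system_columns: Iterable[Tuple[str, str]],
    expected: Dict[str, str] = EXPECTED_TYPES,
) -> List[str]:
    """Compare (name, type) rows from system.columns against expected types."""
    actual = dict(system_columns)
    problems = []
    for name, want in expected.items():
        got = actual.get(name)  # None if the column is missing entirely
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems
```

An empty list means the schema matches. A regression such as query_id becoming String shows up as a readable message rather than a bare boolean.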
Skips cleanly when Docker / the test CH container / the events_raw
schema aren't available — same patterns as t/010, t/013, t/021.
The "no OTel collector required" property makes this test purely a
producer⇄CH wire-format check. The clickgres-platform Go receiver is
not exercised here. To verify that the bytes match the schema, a curl
POST of the Arrow IPC body to CH is the simplest possible check. In
prod, the receiver's only added value over curl is OTel-collector
pipeline integration, which doesn't matter for wire-format correctness.
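For readers who would rather script step 4 than shell out to curl, here is a rough Python equivalent that uses only the standard library. The endpoint, port, and SQL are taken from step 4. The helper names are hypothetical, and the sketch does not reproduce the test's TRUNCATE or skip logic.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

# Test ClickHouse HTTP endpoint and INSERT statement from step 4.
CH_HTTP = "http://localhost:18123/"
INSERT_SQL = "INSERT INTO pg_stat_ch.events_raw FORMAT ArrowStream"


def build_insert_request(ipc_bytes: bytes, base_url: str = CH_HTTP) -> Request:
    """POST one Arrow IPC stream as the request body, like curl --data-binary."""
    # Percent-encode the SQL so spaces become %20 in the query string.
    url = f"{base_url}?query={quote(INSERT_SQL, safe='')}"
    return Request(url, data=ipc_bytes, method="POST")


def post_dump_dir(dump_dir: str) -> int:
    """Send every file in dump_dir to ClickHouse; return how many were sent."""
    sent = 0
    for path in sorted(p for p in Path(dump_dir).iterdir() if p.is_file()):
        # urlopen raises urllib.error.HTTPError on a 4xx/5xx, so a type
        # mismatch stops the loop; ClickHouse's message is in the error body.
        with urlopen(build_insert_request(path.read_bytes())) as resp:
            resp.read()
        sent += 1
    return sent
```

Percent-encoding the query parameter sidesteps any ambiguity about literal spaces in the URL, which the curl one-liner leaves to curl.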
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Summary
Pre-GA unification of pg_stat_ch's ClickHouse schema. The docker quickstart schema (docker/init/00-schema.sql) and the production Arrow receiver schema (datagres_otel.query_logs_arrow in clickgres-platform) had drifted apart on both column naming and column types. This PR makes pg_stat_ch the source of truth for the unified shape. It also moves the canonical schema into a Goose migrations layout under schema/migrations/, matching the runner clickgres-platform already uses (pressly/goose v3, DialectClickHouse, embed.FS).
Two commits, structured so git rename detection can follow the file's evolution cleanly:
Commit 1 —
9c39ecd: in-place unification + tests pass
In-place edits to docker/init/00-schema.sql so the docker quickstart schema matches what prod actually writes to, plus the CH-native exporter and TAP tests updated to use the new column names.
Column renames (prod-side wins: closer to OTel semantic conventions, minimizes downstream churn):
  ts_start → ts
  db → db_name
  username → db_user
  cmd_type → db_operation
  query → query_text
Type fix:
  err_sqlstate FixedString(5) → LowCardinality(String). FixedString doesn't round-trip through Arrow IPC cleanly, and the ~270 SQLSTATE codes are dictionary-friendly. The CH-native exporter switches from MetricFixedString(5, …) to TagString(…) (clickhouse-cpp ColumnString → CH LowCardinality(String) is fine on the wire).
Envelope columns added with DEFAULT '' so the CH-native exporter (which doesn't yet emit these) still inserts successfully: instance_ubid, server_ubid, server_role, region, cell, service_version, host_id, pod_name.
Engine/partitioning aligned with prod:
  ORDER BY ts → ORDER BY (instance_ubid, ts) (tenant locality)
  TTL toDate(ts) + INTERVAL 180 DAY
  SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
All four MVs (events_recent_1h, query_stats_5m, db_app_user_1m, errors_recent) updated to reference the new column names and to include instance_ubid in their ORDER BY / GROUP BY / SELECT projections.
parent_query_id is intentionally not included here. It belongs to PR #95 (parent-query-id-surgical) and will land as its own follow-up migration in schema/migrations/ after this PR.
Commit 2 —
299115b: rename + goose annotations
git mv docker/init/00-schema.sql schema/migrations/20260519000001_create_initial_schema.sql (96% similarity per git's rename detection), with:
  -- +goose Up / -- +goose Down section markers added.
  Each CREATE wrapped in -- +goose StatementBegin / StatementEnd.
  The DROP TABLE IF EXISTS X idioms before each CREATE removed. Those existed for docker init idempotency on restart; goose tracks state via goose_db_version. Drops live exclusively in the -- +goose Down section, in reverse dependency order.
Also adds schema/migrations/00000000000001_bootstrap.sql, a no-op SELECT 1 migration required by goose to seed its version table (copied verbatim from clickgres-platform's bootstrap).
What's been validated locally
  docker/init/00-schema.sql (commit 1) applies cleanly on clickhouse/clickhouse-server:26.1 (the version pinned in docker/docker-compose.test.yml); all 51 columns and 4 MVs land with the expected types.
  The envelope columns' DEFAULT '' lets INSERTs that omit them succeed.
  schema/migrations/ (commit 2) round-trips cleanly via goose v3.27.1 up and goose reset against CH 26.1.
Out of scope (follow-on PRs)
  docker/init/ and docker-compose.test.yml to run goose-up at container start. Currently docker/init/00-schema.sql is gone, so the docker quickstart and the test compose need a small shim to apply migrations from schema/migrations/ (clickhouse-server's docker entrypoint can't parse -- +goose Up/Down directly).
  query_logs_arrow to the unified events_raw, including historical backfill via INSERT INTO events_raw SELECT … FROM query_logs_arrow with explicit casts for the renamed/retyped columns.
  schema/migrations/<ts>_add_parent_query_id.sql for the column it introduces.
Test plan
🤖 Generated with Claude Code
Note
High Risk
Breaking ClickHouse schema and INSERT column contract for any deployment still on the old docker/init SQL; existing data and dashboards need coordinated migration. Core export path and all CH integration tests depend on the new shape landing correctly.
Overview
Moves the canonical ClickHouse schema into Goose migrations under schema/migrations/ (bootstrap + initial migration) and changes CI TAP setup to install goose and apply migrations instead of piping docker/init/00-schema.sql. docker/init/ is kept as an empty bind-mount placeholder via .gitkeep.
The initial migration unifies events_raw with the production Arrow shape: renames core columns (ts_start→ts, db→db_name, username→db_user, cmd_type→db_operation, query→query_text), changes err_sqlstate to LowCardinality(String), adds OTel envelope columns with DEFAULT '', and aligns ORDER BY to (instance_ubid, ts) plus TTL on raw data. All four materialized views are updated for the new names and tenant locality.
The ClickHouse native exporter and TAP tests are updated to insert/query the new column names; err_sqlstate is exported via TagString instead of fixed-width metrics.
Reviewed by Cursor Bugbot for commit 5934589. Bugbot is set up for automated code reviews on this repo.
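The out-of-scope docker shim is needed presumably because clickhouse-server's entrypoint would treat goose annotations as ordinary SQL comments and execute the Down section's DROPs along with the CREATEs. Below is a minimal Python sketch of one possible shim step. It keeps only the -- +goose Up section and drops annotation lines. The sketch assumes migrations use only the Up/Down and StatementBegin/StatementEnd annotations described above. It is a simplified stand-in, not goose's own parser. It records nothing in goose_db_version, so it suits only a throwaway quickstart container.

```python
def goose_up_sql(migration_text: str) -> str:
    """Return only the statements in a goose migration's Up section.

    Annotation lines (-- +goose Up/Down/StatementBegin/StatementEnd) are
    dropped; everything after -- +goose Down is skipped.
    """
    in_up = False
    kept = []
    for line in migration_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- +goose"):
            directive = stripped[len("-- +goose"):].strip()
            if directive == "Up":
                in_up = True
            elif directive == "Down":
                in_up = False
            # StatementBegin/StatementEnd only stop goose from splitting a
            # statement on inner semicolons; a plain SQL client doesn't need them.
            continue
        if in_up:
            kept.append(line)
    return "\n".join(kept).strip() + "\n"
```

A shim could feed the result of each file to clickhouse client in version order. Both filenames in this PR use a 14-digit numeric prefix, so a lexical sort also gives the version order.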