refactor(exporter): rename Tag/Metric/Record column interface to StatLC/StatHC#115
Conversation
212a7f0 to
50bca23
Compare
a145a5c to
73d41e0
Compare
There was a problem hiding this comment.
Pull request overview
Refactors the StatsExporter column-factory interface to replace the legacy Tag* / Metric* / Record* split with explicit low- vs high-cardinality factories (StatLC* / StatHC*) plus StatTimestamp, and updates the shared export call sites and exporter backends accordingly.
Changes:
- Renamed and collapsed the exporter column-factory interface to cardinality-typed methods (
StatLC*,StatHC*,StatTimestamp). - Updated
stats_exporter.ccto declare LC/HC intent per exported field at the call sites. - Updated ClickHouse and OTel exporter implementations (and TAP comments) to match the new interface naming.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| t/psch.pm | Updates TAP helper comment to refer to StatLCString. |
| t/024_otel_export.pl | Updates test comment describing LC string → metric label behavior. |
| src/export/stats_exporter.cc | Updates column factory calls to the new StatLC* / StatHC* / StatTimestamp API. |
| src/export/otel_exporter.cc | Replaces old factory methods with new cardinality-typed ones; adds rationale comment about LC/HC collapsing for log attributes. |
| src/export/exporter_interface.h | Defines the new cardinality-typed interface and updates semantic column documentation. |
| src/export/clickhouse_exporter.cc | Implements the new interface methods and updates semantic column helpers to call them. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| virtual shared_ptr<Column<uint64_t>> RecordUInt64(string_view name) = 0; | ||
| virtual shared_ptr<Column<int64_t>> RecordDateTime(string_view name) = 0; | ||
| virtual shared_ptr<Column<string_view>> RecordString(string_view name) = 0; | ||
| // Domain-specific. Caller appends a Postgres-epoch microsecond timestamp. |
73d41e0 to
b0af1b7
Compare
…LC/StatHC
Consolidate the StatsExporter column-factory methods around cardinality intent.
The previous Tag/Metric/Record split was a vestige of pre-OTel naming and didn't
actually map to anything semantically meaningful — every backend treated
MetricInt64 and RecordInt64 identically. Replace with StatLC* (low-cardinality,
dimension-eligible) and StatHC* (high-cardinality, value-only) variants per
type, plus a domain-specific StatTimestamp for the event timestamp.
Cardinality intent matters per-backend in ways the old naming concealed:
ClickHouse: LC -> may be stored as LowCardinality(<Type>); HC -> plain.
Schema-declared encoding wins on write; the LC hint helps
the producer pick the cheapest column representation.
Arrow IPC: LC -> DictBuilder (dictionary-encoded array); HC -> plain
typed builder. Required for batch-rate efficiency on
low-cardinality dimensions. (Honored by the upcoming
unified Arrow exporter; not yet exercised here.)
OTel: LC -> eligible as histogram dimension or metric label;
HC -> log attribute only, *never* a metric dimension
(cardinality explosion).
Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* +
RecordDateTime + RecordString + MetricFixedString) to 8 (4 LC + 3 HC + Timestamp),
keeping only the wire types stats_exporter.cc actually instantiates. Future
column types (Int8, UInt16, UInt32) can be added when their first caller appears.
Removed:
* MetricFixedString — dead since PR #99 retired FixedString(5) for err_sqlstate
in favor of LowCardinality(String).
* FixedStringCol class in clickhouse_exporter.cc — only used by MetricFixedString.
Rename map applied at the only call site (ExportEventStatsInternal in
stats_exporter.cc), with cardinality chosen per column based on observed data
shape rather than just type width: err_elevel (UInt8) is LC; query_id (Int64)
is HC; parallel_workers_* (Int16) is LC; duration_us (UInt64) is HC. The Db*
semantic shortcuts remain.
Net diff: -30 LOC, no behavior change. Two doc-comments updated in t/024 and
t/psch.pm. The OTel column-emission machinery itself stays alive for now — it
gets retired in a later commit alongside the new unified Arrow exporter that
will satisfy ExportEventStats for the OTel path going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b0af1b7 to
ba7a9f6
Compare
| # Test 4: Metric labels populated | ||
| # Tags (db, username) appear as Prometheus labels on histogram metrics. | ||
| # The OTel exporter maps TagString columns to both log attributes and metric tags. | ||
| # The OTel exporter maps StatLCString columns to both log attributes and metric tags. |
f668092 to
1a8a5ac
Compare
1a8a5ac to
ba7a9f6
Compare
Two unrelated doc fixes flagged by Copilot's review on #115: exporter_interface.h: StatTimestamp's doc said "Postgres-epoch microsecond timestamp" but all current callers convert to Unix-epoch by adding kPostgresEpochOffsetUs before append. CH DateTime64(6) and OTel time_unix_nano both interpret the wire value as Unix-epoch. Clarify the contract to describe what the column wants on the wire, not what shape the input data happens to be in — defends against a future caller passing raw PG-epoch values and getting wrong-by-30- years timestamps. t/024_otel_export.pl: comment on subtest 'metric labels populated' described producer-side metric promotion that hasn't existed since PR #72 ripped the OTel SDK out. The producer-side OTel exporter emits OTLP log attributes only; the test's Prometheus assertions succeed because the downstream OTel collector's log-to-metric processor promotes specific log attributes to histogram labels. Reword to describe the actual flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Forgive my floundering... noticed that tags had been renamed as if they act like everything else, which would be an error, except they DO act like everything else because the arrow migration broke the mechanism of treating them like OTel dimensions. Further comments on the matter reserved, with angst. Applied Copilot suggestions; merging. |
Summary
Rename the
StatsExportercolumn-factory interface fromTag*/Metric*/Record*(three categories, mostly vestigial) toStatLC*/StatHC*/StatTimestamp(two categories, principled). Cardinality intent (low vs high) becomes a first-class part of the interface, and each backend honors it appropriately.Stacked on #114 — that PR is the prerequisite FixedString cleanup. Retarget to
mainafter #114 merges.Net diff vs main (both commits): 6 files, -30 LOC. The rename itself is roughly net-zero; the savings come from collapsing the redundant Metric/Record duplication.
Why
The previous
Tag/Metric/Recordsplit was a vestige of pre-OTel naming that didn't actually map to anything semantically meaningful at runtime. The OTel exporter treatedMetricInt64andRecordInt64identically (both became log attributes). The CH-native exporter treated them identically as well (bothclickhouse::ColumnInt64). Same C++ type, three category names, indistinguishable behavior.The dimension that does matter — cardinality — wasn't represented in the interface at all. Authors had to guess via convention (e.g. "use TagString for things that look LC-ish; use RecordString for everything else"). Each backend then re-derived the cardinality intent (or failed to). PR #67 (
AppendLabels) hit this hard: the labels column ends up serialized to JSON on the CH side and fanned out to N OTel attributes on the OTel side, with no shared shape — because the interface couldn't express "this is a map whose values are high-cardinality."Replacing the trichotomy with
StatLC/StatHCmakes cardinality intent explicit at the interface boundary. Each backend interprets it according to its own type system:LowCardinality(<Type>)if schema declared it; otherwise plain. The hint helps the producer pick the cheapest column representation.DictBuilder(dictionary-encoded array). Required for batch-rate efficiency on low-cardinality dimensions. Honored by the upcoming unified Arrow exporter — not yet exercised in this PR.Interface shape
Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* + RecordDateTime + RecordString + MetricFixedString — the latter retired in #114) to 8 (4 LC + 3 HC + Timestamp), keeping only the wire types stats_exporter.cc actually instantiates:
The
Db*Columnsemantic shortcuts (DbNameColumn, DbUserColumn, DbDurationColumn, DbOperationColumn, DbQueryTextColumn) remain — they encode OTel semconv naming that differs between backends.Cardinality decisions per column
Cardinality is a property of the data, not the type width. The rename pass commits to per-column declarations at the call sites in
stats_exporter.cc, chosen by observed data shape:tspidquery_id,parent_query_id, all*blks_*,*blk_*time_us,wal_*(except wal_bytes),cpu_*_time_usrows,wal_bytesjit_*parallel_workers_*err_elevelerr_sqlstateapperr_message,client_addrquery_textdb_name,db_user,db_operationThe CH-native exporter implements LC and HC identically for ints (clickhouse-c speaks plain typed columns; the server-side
LowCardinality(<Type>)wrap in the schema is what applies dictionary encoding). The OTel exporter also collapses LC and HC at this layer — OTel logs are flat attribute bags, and the dimension-vs-attribute distinction is made downstream by metric processors. The new unified Arrow exporter (coming separately) is where the LC/HC distinction materially changes the wire shape (DictBuildervs plain builder). The rename is preparing the call sites for that.What stays untouched
AppendLabels(ParseResult)virtual is a separate problem to solve later (probably with a newAppendStringMapprimitive that's explicit about key cardinality).OTelExporter— IntColumn, SvColumn, StrColumn, DateTimeCol, DurationCol classes + the BeginBatch/BeginRow/FlushChunk loop. ~201 LOC of dead-in-prod code that gets retired in a follow-up PR alongside the new unified Arrow exporter.arrow_batch.ccandOTelExporter::SendArrowBatch— the production Arrow path. Stays alive until the GUC-gated cutover happens and prod has flipped.Test plan
🤖 Generated with Claude Code
Note
Low Risk
Mechanical API rename with no intended wire-format change for ClickHouse or OTel; risk is mainly missing call-site updates or future backends misinterpreting LC/HC hints.
Overview
Replaces the
StatsExporterTag / Metric / Record column factories with StatLC*, StatHC*, and StatTimestamp, so each field declares low vs high cardinality intent at the call site—not just a C++ wire type.exporter_interface.hdocuments how ClickHouse, Arrow IPC, and OTel are expected to honor LC vs HC;ClickHouseExporterandOTelExporterimplement the slimmer virtual surface (with comments that LC/HC is identical on the wire for those backends today).ExportEventStatsInternalinstats_exporter.ccis updated field-by-field (e.g.pid→StatLCInt32, block/WAL counters →StatHCInt64,err_sqlstate→StatLCString,query_text/client_addr→ HC strings).TAP comments in
t/024_otel_export.plandt/psch.pmnow describe StatLCString and clarify that Prometheus labels come from the collector’s log-to-metric path, not the producer exporter.Reviewed by Cursor Bugbot for commit cac0f0c. Bugbot is set up for automated code reviews on this repo. Configure here.