Skip to content

refactor(exporter): rename Tag/Metric/Record column interface to StatLC/StatHC#115

Merged
JoshDreamland merged 2 commits into
mainfrom
arrow_exporter_unification
Jun 18, 2026
Merged

refactor(exporter): rename Tag/Metric/Record column interface to StatLC/StatHC#115
JoshDreamland merged 2 commits into
mainfrom
arrow_exporter_unification

Conversation

@JoshDreamland

@JoshDreamland JoshDreamland commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Rename the StatsExporter column-factory interface from Tag*/Metric*/Record* (three categories, mostly vestigial) to StatLC*/StatHC*/StatTimestamp (two categories, principled). Cardinality intent (low vs high) becomes a first-class part of the interface, and each backend honors it appropriately.

Stacked on #114 — that PR is the prerequisite FixedString cleanup. Retarget to main after #114 merges.

Net diff vs main (both commits): 6 files, -30 LOC. The rename itself is roughly net-zero; the savings come from collapsing the redundant Metric/Record duplication.

Why

The previous Tag / Metric / Record split was a vestige of pre-OTel naming that didn't actually map to anything semantically meaningful at runtime. The OTel exporter treated MetricInt64 and RecordInt64 identically (both became log attributes). The CH-native exporter treated them identically as well (both clickhouse::ColumnInt64). Same C++ type, three category names, indistinguishable behavior.

The dimension that does matter — cardinality — wasn't represented in the interface at all. Authors had to guess via convention (e.g. "use TagString for things that look LC-ish; use RecordString for everything else"). Each backend then re-derived the cardinality intent (or failed to). PR #67 (AppendLabels) hit this hard: the labels column ends up serialized to JSON on the CH side and fanned out to N OTel attributes on the OTel side, with no shared shape — because the interface couldn't express "this is a map whose values are high-cardinality."

Replacing the trichotomy with StatLC / StatHC makes cardinality intent explicit at the interface boundary. Each backend interprets it according to its own type system:

Backend LC handling HC handling
ClickHouse LowCardinality(<Type>) if schema declared it; otherwise plain. The hint helps the producer pick the cheapest column representation. Plain typed column.
Arrow IPC DictBuilder (dictionary-encoded array). Required for batch-rate efficiency on low-cardinality dimensions. Honored by the upcoming unified Arrow exporter — not yet exercised in this PR. Plain typed builder.
OTel Eligible as histogram dimension / metric label / structured log attribute. Log attribute only; never a metric dimension (cardinality explosion).

Interface shape

Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* + RecordDateTime + RecordString + MetricFixedString — the latter retired in #114) to 8 (4 LC + 3 HC + Timestamp), keeping only the wire types stats_exporter.cc actually instantiates:

// Low-cardinality columns: dimensions
virtual shared_ptr<Column<string>>     StatLCString(string_view name) = 0;
virtual shared_ptr<Column<uint8_t>>    StatLCUInt8(string_view name) = 0;
virtual shared_ptr<Column<int16_t>>    StatLCInt16(string_view name) = 0;
virtual shared_ptr<Column<int32_t>>    StatLCInt32(string_view name) = 0;

// High-cardinality columns: observations
virtual shared_ptr<Column<string_view>> StatHCString(string_view name) = 0;
virtual shared_ptr<Column<int64_t>>     StatHCInt64(string_view name) = 0;
virtual shared_ptr<Column<uint64_t>>    StatHCUInt64(string_view name) = 0;

// Domain-specific
virtual shared_ptr<Column<int64_t>>     StatTimestamp(string_view name) = 0;

The Db*Column semantic shortcuts (DbNameColumn, DbUserColumn, DbDurationColumn, DbOperationColumn, DbQueryTextColumn) remain — they encode OTel semconv naming that differs between backends.

Cardinality decisions per column

Cardinality is a property of the data, not the type width. The rename pass commits to per-column declarations at the call sites in stats_exporter.cc, chosen by observed data shape:

Column Old New Rationale
ts RecordDateTime StatTimestamp domain-specific
pid RecordInt32 StatLCInt32 bounded per-host
query_id, parent_query_id, all *blks_*, *blk_*time_us, wal_* (except wal_bytes), cpu_*_time_us RecordInt64 / MetricInt64 StatHCInt64 wide measurements
rows, wal_bytes MetricUInt64 / RecordUInt64 StatHCUInt64 wide measurements
jit_* RecordInt32 StatLCInt32 small bounded values, repeats heavily
parallel_workers_* RecordInt16 StatLCInt16 0-100ish
err_elevel RecordUInt8 StatLCUInt8 ~5 values
err_sqlstate TagString StatLCString ~270 values
app RecordString StatLCString bounded per-instance
err_message, client_addr RecordString StatHCString high-cardinality values
query_text DbQueryTextColumn (HC implied) unchanged semantic shortcut
db_name, db_user, db_operation Db*Column (LC implied) unchanged semantic shortcut

The CH-native exporter implements LC and HC identically for ints (clickhouse-c speaks plain typed columns; the server-side LowCardinality(<Type>) wrap in the schema is what applies dictionary encoding). The OTel exporter also collapses LC and HC at this layer — OTel logs are flat attribute bags, and the dimension-vs-attribute distinction is made downstream by metric processors. The new unified Arrow exporter (coming separately) is where the LC/HC distinction materially changes the wire shape (DictBuilder vs plain builder). The rename is preparing the call sites for that.

What stays untouched

  • PR Add query labels via sqlcommenter comment parsing #67 (sqlcommenter labels) — open, unrelated to the rename, will rebase trivially over this. Its AppendLabels(ParseResult) virtual is a separate problem to solve later (probably with a new AppendStringMap primitive that's explicit about key cardinality).
  • OTel column-emission machinery in OTelExporter — IntColumn, SvColumn, StrColumn, DateTimeCol, DurationCol classes + the BeginBatch/BeginRow/FlushChunk loop. ~201 LOC of dead-in-prod code that gets retired in a follow-up PR alongside the new unified Arrow exporter.
  • arrow_batch.cc and OTelExporter::SendArrowBatch — the production Arrow path. Stays alive until the GUC-gated cutover happens and prod has flipped.

Test plan

  • All builds pass on PG 16/17/18 × amd64/arm64 (CI)
  • TAP suite passes
  • Code Style + Clang checks pass
  • No behavioral regressions — rename is mechanical at the C++ level

🤖 Generated with Claude Code


Note

Low Risk
Mechanical API rename with no intended wire-format change for ClickHouse or OTel; risk is mainly missing call-site updates or future backends misinterpreting LC/HC hints.

Overview
Replaces the StatsExporter Tag / Metric / Record column factories with StatLC*, StatHC*, and StatTimestamp, so each field declares low vs high cardinality intent at the call site—not just a C++ wire type.

exporter_interface.h documents how ClickHouse, Arrow IPC, and OTel are expected to honor LC vs HC; ClickHouseExporter and OTelExporter implement the slimmer virtual surface (with comments that LC/HC is identical on the wire for those backends today). ExportEventStatsInternal in stats_exporter.cc is updated field-by-field (e.g. pidStatLCInt32, block/WAL counters → StatHCInt64, err_sqlstateStatLCString, query_text / client_addr → HC strings).

TAP comments in t/024_otel_export.pl and t/psch.pm now describe StatLCString and clarify that Prometheus labels come from the collector’s log-to-metric path, not the producer exporter.

Reviewed by Cursor Bugbot for commit cac0f0c. Bugbot is set up for automated code reviews on this repo. Configure here.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the StatsExporter column-factory interface to replace the legacy Tag* / Metric* / Record* split with explicit low- vs high-cardinality factories (StatLC* / StatHC*) plus StatTimestamp, and updates the shared export call sites and exporter backends accordingly.

Changes:

  • Renamed and collapsed the exporter column-factory interface to cardinality-typed methods (StatLC*, StatHC*, StatTimestamp).
  • Updated stats_exporter.cc to declare LC/HC intent per exported field at the call sites.
  • Updated ClickHouse and OTel exporter implementations (and TAP comments) to match the new interface naming.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
t/psch.pm Updates TAP helper comment to refer to StatLCString.
t/024_otel_export.pl Updates test comment describing LC string → metric label behavior.
src/export/stats_exporter.cc Updates column factory calls to the new StatLC* / StatHC* / StatTimestamp API.
src/export/otel_exporter.cc Replaces old factory methods with new cardinality-typed ones; adds rationale comment about LC/HC collapsing for log attributes.
src/export/exporter_interface.h Defines the new cardinality-typed interface and updates semantic column documentation.
src/export/clickhouse_exporter.cc Implements the new interface methods and updates semantic column helpers to call them.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/export/exporter_interface.h Outdated
virtual shared_ptr<Column<uint64_t>> RecordUInt64(string_view name) = 0;
virtual shared_ptr<Column<int64_t>> RecordDateTime(string_view name) = 0;
virtual shared_ptr<Column<string_view>> RecordString(string_view name) = 0;
// Domain-specific. Caller appends a Postgres-epoch microsecond timestamp.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're the same picture

Base automatically changed from delete-fixed-string-col to main June 18, 2026 03:08
@JoshDreamland JoshDreamland force-pushed the arrow_exporter_unification branch from 73d41e0 to b0af1b7 Compare June 18, 2026 14:38
…LC/StatHC

Consolidate the StatsExporter column-factory methods around cardinality intent.
The previous Tag/Metric/Record split was a vestige of pre-OTel naming and didn't
actually map to anything semantically meaningful — every backend treated
MetricInt64 and RecordInt64 identically. Replace with StatLC* (low-cardinality,
dimension-eligible) and StatHC* (high-cardinality, value-only) variants per
type, plus a domain-specific StatTimestamp for the event timestamp.

Cardinality intent matters per-backend in ways the old naming concealed:

  ClickHouse: LC -> may be stored as LowCardinality(<Type>); HC -> plain.
              Schema-declared encoding wins on write; the LC hint helps
              the producer pick the cheapest column representation.
  Arrow IPC:  LC -> DictBuilder (dictionary-encoded array); HC -> plain
              typed builder. Required for batch-rate efficiency on
              low-cardinality dimensions. (Honored by the upcoming
              unified Arrow exporter; not yet exercised here.)
  OTel:       LC -> eligible as histogram dimension or metric label;
              HC -> log attribute only, *never* a metric dimension
              (cardinality explosion).

Interface shrinks from 14 column factories (TagString + 5 Metric* + 5 Record* +
RecordDateTime + RecordString + MetricFixedString) to 8 (4 LC + 3 HC + Timestamp),
keeping only the wire types stats_exporter.cc actually instantiates. Future
column types (Int8, UInt16, UInt32) can be added when their first caller appears.

Removed:
  * MetricFixedString — dead since PR #99 retired FixedString(5) for err_sqlstate
    in favor of LowCardinality(String).
  * FixedStringCol class in clickhouse_exporter.cc — only used by MetricFixedString.

Rename map applied at the only call site (ExportEventStatsInternal in
stats_exporter.cc), with cardinality chosen per column based on observed data
shape rather than just type width: err_elevel (UInt8) is LC; query_id (Int64)
is HC; parallel_workers_* (Int16) is LC; duration_us (UInt64) is HC. The Db*
semantic shortcuts remain.

Net diff: -30 LOC, no behavior change. Two doc-comments updated in t/024 and
t/psch.pm. The OTel column-emission machinery itself stays alive for now — it
gets retired in a later commit alongside the new unified Arrow exporter that
will satisfy ExportEventStats for the OTel path going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 18, 2026 14:41
@JoshDreamland JoshDreamland force-pushed the arrow_exporter_unification branch from b0af1b7 to ba7a9f6 Compare June 18, 2026 14:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread t/024_otel_export.pl Outdated
Comment on lines +88 to +90
# Test 4: Metric labels populated
# Tags (db, username) appear as Prometheus labels on histogram metrics.
# The OTel exporter maps TagString columns to both log attributes and metric tags.
# The OTel exporter maps StatLCString columns to both log attributes and metric tags.
Comment thread src/export/exporter_interface.h Outdated
@JoshDreamland JoshDreamland force-pushed the arrow_exporter_unification branch 2 times, most recently from f668092 to 1a8a5ac Compare June 18, 2026 14:54
@JoshDreamland JoshDreamland changed the title refactor(exporter): rename Tag/Metric/Record column interface to StatLC/StatHC refactor(exporter): collapse Metric/Record to StatLC/StatHC; preserve Tag Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 19:06
@JoshDreamland JoshDreamland force-pushed the arrow_exporter_unification branch from 1a8a5ac to ba7a9f6 Compare June 18, 2026 19:06
@JoshDreamland JoshDreamland changed the title refactor(exporter): collapse Metric/Record to StatLC/StatHC; preserve Tag refactor(exporter): rename Tag/Metric/Record column interface to StatLC/StatHC Jun 18, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Two unrelated doc fixes flagged by Copilot's review on #115:

  exporter_interface.h: StatTimestamp's doc said "Postgres-epoch
  microsecond timestamp" but all current callers convert to Unix-epoch
  by adding kPostgresEpochOffsetUs before append. CH DateTime64(6) and
  OTel time_unix_nano both interpret the wire value as Unix-epoch.
  Clarify the contract to describe what the column wants on the wire,
  not what shape the input data happens to be in — defends against a
  future caller passing raw PG-epoch values and getting wrong-by-30-
  years timestamps.

  t/024_otel_export.pl: comment on subtest 'metric labels populated'
  described producer-side metric promotion that hasn't existed since
  PR #72 ripped the OTel SDK out. The producer-side OTel exporter
  emits OTLP log attributes only; the test's Prometheus assertions
  succeed because the downstream OTel collector's log-to-metric
  processor promotes specific log attributes to histogram labels.
  Reword to describe the actual flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JoshDreamland

Copy link
Copy Markdown
Contributor Author

Forgive my floundering... noticed that tags had been renamed as if they act like everything else, which would be an error, except they DO act like everything else because the arrow migration broke the mechanism of treating them like OTel dimensions. Further comments on the matter reserved, with angst. Applied Copilot suggestions; merging.

@JoshDreamland JoshDreamland merged commit ff9e538 into main Jun 18, 2026
12 checks passed
@JoshDreamland JoshDreamland deleted the arrow_exporter_unification branch June 18, 2026 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants