Skip to content

feat(arrow): add read_replica_type resource attribute column#107

Merged
amogiska merged 4 commits into
mainfrom
amogiska/read-replica-arrow-column
Jun 8, 2026
Merged

feat(arrow): add read_replica_type resource attribute column#107
amogiska merged 4 commits into
mainfrom
amogiska/read-replica-arrow-column

Conversation

@amogiska

@amogiska amogiska commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a read_replica_type column to the exported Arrow IPC schema, populated from the pg_stat_ch.extra_attributes GUC. Dictionary-encoded like server_role (low-cardinality enum: regional/none).
  • Update the GUC doc example and the t/026_arrow_dump.pl schema assertion (presence + dictionary type).

Why

Ubicloud emits read_replica_type in extra_attributes; surfacing it as an Arrow column lets the ClickHouse side distinguish and promote read-replica query traffic, which otherwise reports server_role=standby. Unknown extra-attribute keys were parsed-and-ignored before this; this makes read_replica_type a first-class column.

Test plan

  • All six schema sites mirror server_role; schema-field order matches finish-array order
  • t/026_arrow_dump.pl asserts read_replica_type is present and dictionary-encoded
  • CI: builds (PG 16/17/18 x amd64/arm64) + TAP + clang-format green

@amogiska amogiska marked this pull request as ready for review June 8, 2026 15:24
Copilot AI review requested due to automatic review settings June 8, 2026 15:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new read_replica resource attribute column to the exported Arrow IPC schema so downstream consumers (e.g., ClickHouse) can distinguish read-replica traffic based on pg_stat_ch.extra_attributes.

Changes:

  • Export a new dictionary-encoded Arrow column read_replica, sourced from extra_attributes (mirrors the existing server_role plumbing).
  • Update the pg_stat_ch.extra_attributes GUC documentation example to include read_replica:false.
  • Extend the Arrow dump test to assert presence (and dict-encoding) of the new read_replica field.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/export/arrow_batch.cc Adds read_replica dictionary builder + schema field + append/finish/reset/size-estimate wiring.
src/config/guc.c Updates the extra_attributes documentation example string to include read_replica.
t/026_arrow_dump.pl Updates schema expectations and adds a type assertion for read_replica being dictionary-encoded UTF-8.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread t/026_arrow_dump.pl Outdated
'shared_blks_hit', 'shared_blks_read',
'wal_records', 'wal_bytes',
'service_version', 'region',
'service_version', 'region', 'read_replica',
Copilot AI review requested due to automatic review settings June 8, 2026 16:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/export/arrow_batch.cc
Comment on lines 261 to 265
arrow::field("instance_ubid", arrow::utf8()),
arrow::field("server_ubid", arrow::utf8()),
arrow::field("server_role", DictionaryUtf8Type()),
arrow::field("read_replica_type", DictionaryUtf8Type()),
arrow::field("region", DictionaryUtf8Type()),
Comment thread t/026_arrow_dump.pl
Comment on lines 184 to +210
expected = [
'ts', 'db_name', 'db_user', 'db_operation', 'query_text',
'duration_us', 'rows', 'pid', 'query_id',
'shared_blks_hit', 'shared_blks_read',
'wal_records', 'wal_bytes',
'service_version', 'region',
'service_version', 'region', 'read_replica_type',
]
missing = [c for c in expected if schema.get_field_index(c) == -1]
if missing:
print(f"MISSING_COLUMNS:{','.join(missing)}")
sys.exit(1)

# Check types for a few key columns
ts_field = schema.field('ts')
assert ts_field.type == pa.timestamp('ns', tz='UTC'), f"ts type wrong: {ts_field.type}"

dur_field = schema.field('duration_us')
assert dur_field.type == pa.uint64(), f"duration_us type wrong: {dur_field.type}"

rows_field = schema.field('rows')
assert rows_field.type == pa.uint64(), f"rows type wrong: {rows_field.type}"

# read_replica_type is a dictionary-encoded column (like server_role/region), not plain utf8.
rr_field = schema.field('read_replica_type')
assert pa.types.is_dictionary(rr_field.type), f"read_replica_type not dict-encoded: {rr_field.type}"
assert rr_field.type.value_type == pa.utf8(), f"read_replica_type dict value type wrong: {rr_field.type.value_type}"

Comment thread src/config/guc.c
Comment on lines 372 to 377
DefineCustomStringVariable(
"pg_stat_ch.extra_attributes",
"Key-value pairs appended to exported Arrow batches.",
"Semicolon-separated k:v pairs for resource columns: "
"'instance_ubid:abc;server_role:primary;region:us-east-1'.",
"'instance_ubid:abc;server_role:primary;read_replica_type:regional;region:us-east-1'.",
&psch_extra_attributes,
@amogiska amogiska changed the title feat(arrow): add read_replica resource attribute column feat(arrow): add read_replica_type resource attribute column Jun 8, 2026
@serprex serprex requested a review from JoshDreamland June 8, 2026 19:23

@serprex serprex left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should update clickhouse schema too

@amogiska

amogiska commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

should update clickhouse schema too

@serprex, There is already deviation between the arrow path and the ClickHouse schema. This PR is trying to unify it: #99. I think it is better to add the read replica there as well, because the PR 99 is defining the migration standard for the ClickHouse schema. @JoshDreamland let me know if you are okay with this idea or have any thoughts.

@amogiska amogiska requested a review from serprex June 8, 2026 20:01
@JoshDreamland

JoshDreamland commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@amogiska Chatted with Serp separately; if I merge #99 nowish, do you mind babysitting it through to production along with this change? You can tweak the "unified schema" contained within to be current with your change here, even though it is not, in fact, unified as of yet. We can fake it 'til we make it, in that regard.

To clarify, #99 should be a no-op on production; a breakage would be extremely interesting, and I am happy to debug if you observe one. I would just like to piggy-back its evaluation onto this change if possible. This should move us into a better position to do a migration later.

JoshDreamland added a commit that referenced this pull request Jun 8, 2026
Pre-add the column being introduced on both sides of the Arrow pipeline
to keep the unified events_raw schema wire-compatible once the cutover
happens:

  * pg_stat_ch PR #107 — adds read_replica_type to the Arrow IPC output
    in arrow_batch.cc (dictionary-encoded, populated from the
    pg_stat_ch.extra_attributes GUC).
  * clickgres-platform PR #448 — adds the matching ALTER TABLE on
    query_logs_arrow (LowCardinality(String) DEFAULT 'none' AFTER
    server_role) and promotes read-replica traffic into query_logs via
    a widened MV filter.

Mirroring the same type, default, and position here means the eventual
cutover from query_logs_arrow to events_raw needs zero further schema
changes — and lets PR #107 rebase onto unified_schema without having
to amend the schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@amogiska amogiska merged commit 2142634 into main Jun 8, 2026
13 checks passed
@amogiska amogiska deleted the amogiska/read-replica-arrow-column branch June 8, 2026 22:42

@JoshDreamland JoshDreamland left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed more offline—we'll merge this, I'll merge #99, we'll proceed to release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants