Memory-aware C exporter rewrite (no gRPC/Arrow-C++; OTLP/HTTP + nanoarrow) #108
Draft
iskakaushik wants to merge 11 commits into
Captures the design driving the C++->C exporter rewrite: the memory-aware OTLP/HTTP + nanoarrow architecture, the preallocated zero-allocation memory model, the OTLP/Arrow wire contracts, the failure semantics (peek/commit two-phase dequeue, backoff, poison-batch valve), the memory_limit GUC consolidation, and the SIGABRT backstop. Motivated by a production SIGABRT where a gRPC allocation aborted under strict overcommit and forced database-wide crash recovery.
Checked-in amalgamation (not a submodule) generated with bundle.py --with-ipc --with-flatcc --symbol-namespace PgStatCh, pinned in third_party/nanoarrow/VERSION. Provides Arrow array building plus the flatcc runtime for hand-built IPC messages. Built with NDEBUG + FLATCC_NO_ASSERT so every failure path returns an error code instead of abort()/assert() — the property the exporter rewrite depends on.
Pure clang-format reflow (K&R/PostgreSQL brace and wrapping style -> the project's Google-derived .clang-format). No logic changes. These files were never format-enforced before because the mise/CI globs only matched .cc.
Replace the C++ virtual exporter_interface.h with exporter.h: a function-pointer
ops table (connect/export_events/send_arrow/...), the PschExportStatus contract,
and the preallocated export-arena split shared by driver and backends.
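A function-pointer ops table of the kind described here can be sketched as follows. This is a minimal illustration, not the PR's exporter.h: the names ExporterOps, ExportStatus, CountingState, and DriverExport are hypothetical, and the real PschExportStatus contract and operation set may differ.

```c
#include <stddef.h>

/* Hypothetical status codes; the PR's PschExportStatus may differ. */
typedef enum {
  EXPORT_OK = 0,
  EXPORT_ERR_CONN,
  EXPORT_ERR_SEND
} ExportStatus;

/* One table of function pointers per backend stands in for a C++ vtable. */
typedef struct ExporterOps {
  const char *name;
  ExportStatus (*connect)(void *state);
  ExportStatus (*export_events)(void *state, const void *events, size_t n);
} ExporterOps;

/* Illustrative backend that only counts the events it is handed. */
typedef struct CountingState {
  int connected;
  size_t exported;
} CountingState;

static ExportStatus CountingConnect(void *state) {
  ((CountingState *)state)->connected = 1;
  return EXPORT_OK;
}

static ExportStatus CountingExport(void *state, const void *events, size_t n) {
  CountingState *s = (CountingState *)state;
  (void)events; /* payload is ignored in this sketch */
  if (!s->connected) return EXPORT_ERR_CONN;
  s->exported += n;
  return EXPORT_OK;
}

static const ExporterOps kCountingOps = {"counting", CountingConnect,
                                         CountingExport};

/* The driver sees only the ops table, never the concrete backend type. */
static ExportStatus DriverExport(const ExporterOps *ops, void *state,
                                 const void *events, size_t n) {
  return ops->export_events(state, events, n);
}
```

Because the table is a const struct, adding a backend means adding one more table; the driver code does not change.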
Add otlp_encode.{h,c}: a hand-rolled, zero-allocation protobuf wire encoder for
OTLP logs pinned to opentelemetry-proto v1.9.0. Single-pass nested messages via
fixed-width overlong varint length slots; overflow-safe (sticky flag, no partial
out-of-bounds writes); bounds-checked response/Status parsers. PschPbMsgEnd flags
overflow rather than Assert()ing on an unbalanced slot, so an encoder bug cannot
SIGABRT the bgworker.
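The single-pass technique can be sketched like this: reserve a fixed-width slot for the length, write the nested body, then back-patch the slot with an overlong (padded) varint. This is a sketch under assumptions, not the PR's otlp_encode.c: the 4-byte slot width and the names PbBuf, PbPutByte, PbPutVarint, PbMsgBegin, and PbMsgEnd are chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_SLOT_BYTES 4 /* assumed slot width: holds lengths below 2^28 */

typedef struct PbBuf {
  uint8_t *data;
  size_t cap;
  size_t len;
  int overflow; /* sticky: once set, every later write is ignored */
} PbBuf;

static void PbPutByte(PbBuf *b, uint8_t v) {
  if (b->overflow || b->len >= b->cap) {
    b->overflow = 1; /* never writes past cap */
    return;
  }
  b->data[b->len++] = v;
}

static void PbPutVarint(PbBuf *b, uint64_t v) {
  while (v >= 0x80) {
    PbPutByte(b, (uint8_t)(v | 0x80));
    v >>= 7;
  }
  PbPutByte(b, (uint8_t)v);
}

/* Writes the tag of a length-delimited field and reserves the length slot.
 * Returns the slot offset to hand back to PbMsgEnd. */
static size_t PbMsgBegin(PbBuf *b, uint32_t field) {
  PbPutVarint(b, ((uint64_t)field << 3) | 2); /* wire type 2 = LEN */
  size_t slot = b->len;
  for (int i = 0; i < LEN_SLOT_BYTES; i++) PbPutByte(b, 0);
  return slot;
}

/* Back-patches the slot with the body length as a padded varint. A bad slot
 * sets the overflow flag instead of asserting. */
static void PbMsgEnd(PbBuf *b, size_t slot) {
  if (b->overflow) return;
  if (slot + LEN_SLOT_BYTES > b->len) {
    b->overflow = 1;
    return;
  }
  size_t body = b->len - (slot + LEN_SLOT_BYTES);
  if (body >= ((size_t)1 << (7 * LEN_SLOT_BYTES))) {
    b->overflow = 1;
    return;
  }
  for (int i = 0; i < LEN_SLOT_BYTES; i++) {
    uint8_t byte = (uint8_t)(body & 0x7F);
    body >>= 7;
    if (i < LEN_SLOT_BYTES - 1) byte |= 0x80; /* continuation bit pads the slot */
    b->data[slot + i] = byte;
  }
}
```

The caller checks the overflow flag once after encoding a whole request rather than after every write, which keeps the encode path free of per-call error handling.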
Replace the Arrow C++ ArrowBatchBuilder with a C builder producing byte-compatible Arrow IPC streaming payloads (56-field schema, 13 dictionary batches, ZSTD BUFFER compression, validated against pyarrow). nanoarrow's encoder cannot emit dictionary batches or compressed buffers, so arrow_ipc_emit.c hand-builds the Message/Schema/RecordBatch/DictionaryBatch/BodyCompression flatbuffers via the flatcc runtime. All buffers, the ZSTD context, and dictionary memo tables are preallocated; steady-state Append/Finish/Reset perform zero heap allocation. flatcc_emitter_alloc routes flatcc's emitter pages through a fixed pre-reserved pool so flatcc's page-shrink heuristic cannot malloc/free on the hot path (force-included into flatcc.c via CMake). Length-clamp WARNINGs are rate-limited to 1/sec.
Replace the gRPC direct-proto exporter with a hand-rolled HTTP/1.1 client over blocking sockets on the bgworker thread (OpenSSL when the endpoint is https), emitting the identical OTLP ExportLogsServiceRequest payloads for both the per-record and Arrow-passthrough paths. Encode/network buffers and the constant request-head prefix are preallocated; a reconnect may allocate, but nothing allocates per batch. Removes gRPC entirely, and with it the gpr_malloc abort site and the background gRPC threads in the shmem-attached bgworker. Retries only on 429/502/503/504 and on connection drop (Retry-After honored); other 4xx/5xx are permanent. otel_headers values are rejected if they contain control characters (header-injection guard); db.name/db.user clamp to NAMEDATALEN-1, matching the other paths.
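The retry rule and the header guard stated above could look roughly like this. It is a sketch, not the PR's code: SendVerdict, ClassifyHttpStatus, and HeaderValueIsSafe are hypothetical names, treating 1xx/3xx responses as permanent is an assumption, and so is counting TAB and DEL as control characters.

```c
#include <stdbool.h>

typedef enum { SEND_OK, SEND_RETRY, SEND_PERMANENT } SendVerdict;

/* Maps an HTTP status to a verdict: 2xx succeeds, 429/502/503/504 retry,
 * everything else is permanent (1xx/3xx handling is an assumption). */
static SendVerdict ClassifyHttpStatus(int status) {
  if (status >= 200 && status < 300) return SEND_OK;
  switch (status) {
    case 429:
    case 502:
    case 503:
    case 504:
      return SEND_RETRY;
    default:
      return SEND_PERMANENT;
  }
}

/* Header-injection guard: reject any value containing a control character,
 * which covers the CR/LF needed to smuggle an extra header line. */
static bool HeaderValueIsSafe(const char *value) {
  for (const unsigned char *p = (const unsigned char *)value; *p; p++) {
    if (*p < 0x20 || *p == 0x7F) return false;
  }
  return true;
}
```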
C port of the clickhouse_exporter (clickhouse-c is already C underneath). Static column descriptor table replaces the C++ column-factory/registry; goto-cleanup replaces the RAII guards; preallocated per-column buffers replace std::vector. An in_flight flag forces a reconnect after a longjmp interrupts a wire exchange. The cancel callback now also fires on ProcSignalBarrierPending so a DROP DATABASE barrier is not blocked behind a flowing ClickHouse read. Backend failure stats are recorded by the driver, not here (avoids double-counting).
…ackstop
Port the export driver to C and harden the failure path (OTEL_REWRITE_DESIGN.md section 5a):
- shmem.c gains two-phase consume: PschPeekEvents resolves DSA strings without freeing or advancing the tail; PschConsumeEvents frees and advances. Slot dsa_pointers are cleared BEFORE freeing so a longjmp out of dsa_free leaks rather than double-frees on retry.
- Failure routing: ERR_CONN never consumes (events survive a collector outage); ERR_SEND requeues until a poison threshold; ERR_NOMEM/ERR_INTERNAL drop. A mid-chunk Arrow failure counts already-delivered events as exported and only drops/requeues the undelivered remainder. New export_dropped counter, distinct from enqueue overflow. Every failure mode increments the backoff counter.
- bgworker.c installs a SIGABRT backstop (async-signal-safe write + _exit(1)) so a residual abort costs a bgworker restart, not database-wide crash recovery; SIGSEGV/SIGBUS keep full crash semantics. Drain loop is bounded per cycle so procsignal barriers are processed promptly. Ring-sanity check at worker start.
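The peek/commit split can be illustrated on a plain ring of integers. This is a single-threaded sketch under assumptions: Ring, RingPush, RingPeek, and RingConsume are hypothetical names, and the real PschPeekEvents/PschConsumeEvents also deal with DSA strings and shared-memory synchronization, which are omitted here.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 8u /* power of two so index masking works */

typedef struct Ring {
  int slots[RING_CAP];
  uint32_t head; /* next write position, monotonically increasing */
  uint32_t tail; /* next read position, monotonically increasing */
} Ring;

/* Returns 1 on success, 0 when the ring is full. */
static int RingPush(Ring *r, int v) {
  if (r->head - r->tail == RING_CAP) return 0;
  r->slots[r->head & (RING_CAP - 1)] = v;
  r->head++;
  return 1;
}

/* Phase 1: copy out up to max events WITHOUT advancing the tail, so a failed
 * send leaves them in place for the next attempt. */
static size_t RingPeek(const Ring *r, int *out, size_t max) {
  size_t avail = r->head - r->tail;
  size_t n = avail < max ? avail : max;
  for (size_t i = 0; i < n; i++) {
    out[i] = r->slots[(r->tail + (uint32_t)i) & (RING_CAP - 1)];
  }
  return n;
}

/* Phase 2: release n events once delivery has been confirmed. */
static void RingConsume(Ring *r, size_t n) {
  size_t avail = r->head - r->tail;
  if (n > avail) n = avail;
  r->tail += (uint32_t)n;
}
```

A driver would call RingPeek, attempt the export, and call RingConsume only on success (or when deliberately dropping a poison batch), which is what lets events survive a collector outage.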
Collapse 8 interacting memory knobs into one operator knob plus three -1=auto expert overrides (OTEL_REWRITE_DESIGN.md section 6):
- pg_stat_ch.memory_limit (default 160 MB = verified equivalence point of the old defaults; -1 = opt-in auto from shared_buffers; min 32 MB so a configured value is honored rather than always auto-raised).
- queue_capacity/string_area_size/export_buffer_size default -1=auto; explicit queue_capacity is rounded up to a power of 2 and floored at the ring minimum.
- Overrides auto-raise the budget with a WARNING, never FATAL; resolved values are written back so SHOW reports effective sizes. The intern HTAB is charged to the budget. Deleted dead shims; legacy knobs (batch_max, otel_*) become hidden one-release compat bridges; otel_log_delay_ms -> export_timeout (1000 ms).
- pg_stat_ch_memory() view exposes per-component budget/source; control bumped to 0.4 with a migration; guc.out updated.
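The rounding rule for an explicit queue_capacity can be sketched as below. The function name ResolveQueueCapacity, the RING_MIN_CAPACITY value of 1024, and the clamp at 2^31 are assumptions made for illustration; the PR does not state its ring minimum here.

```c
#include <stdint.h>

#define RING_MIN_CAPACITY 1024u /* hypothetical floor, not taken from the PR */

/* Rounds an explicit capacity up to the next power of two, floored at the
 * ring minimum and clamped to the largest power of two that fits uint32. */
static uint32_t ResolveQueueCapacity(uint32_t requested) {
  if (requested < RING_MIN_CAPACITY) return RING_MIN_CAPACITY;
  if (requested > 0x80000000u) return 0x80000000u;
  uint32_t v = requested - 1;
  /* Smear the highest set bit into every lower bit, then add one. */
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}
```

A power-of-two capacity lets the ring compute slot indexes with a bit mask instead of a modulo.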
project(LANGUAGES C); any .cc/.cxx/.cpp under src/ is now a configure-time FATAL_ERROR. Drop opentelemetry-cpp/Arrow (and transitively gRPC/protobuf/ abseil) from vcpkg.json, leaving openssl/lz4/zstd; vcpkg is kept (3 deps) for static, pinned release artifacts. Remove the cxx_std_17 / -include libintl.h / -Wglobal-constructors C++ scaffolding; pin C_STANDARD 17 + C_EXTENSIONS. .clang-tidy drops google-*/modernize-* and adds bugprone-*; mise/CI format and lint globs now cover .c. CI drops g++/CXX settings. Update CLAUDE.md and README for the C/dependency story; retire the cpp-* skills.
otlp_encode_test.c: 541 known-answer/overflow/parser checks, clean under ASan/UBSan. arrow_batch_test.sh: builds the IPC payload for synthetic events and validates schema + decoded values against pyarrow (the byte-compat oracle).
serprex
reviewed
Jun 10, 2026
  return (int)max;
}

static uint32 HashBytes(const char* s, int len) {
Member
can use postgres's hash function, hash_bytes
Contributor
Will investigate pulling this into new unified exporter arm
Member
so what's the one that's not fixed?
Contributor
Holy never mind. NanoArrow brings its own baggage, and I misunderstood how massive this PR is. I think we should just secure our arena interactions and then convert all aborts that do not poison the arena into plain exits. I'll investigate that afterward. Can you get Claude to make those 14 crashes accessible somewhere? They're worth addressing without... all this at once.
Member
agreed, nanoarrow is why I abandoned my own foray into porting pg_stat_ch to C |
Summary
From-scratch C rewrite of the export path (the extension was C++), motivated by a production SIGABRT under memory exhaustion that forced 49s of database-wide crash recovery: gRPC's gpr_malloc called abort() on a NULL malloc under vm.overcommit_memory=2, and a shmem-attached bgworker dying by signal forces the postmaster to crash-recover the whole cluster.

The fix removes every abort()-reachable path from the process and makes telemetry-stack failure cost at most a bgworker restart. Design and rationale: OTEL_REWRITE_DESIGN.md.

What changed
- project(LANGUAGES C); any .cc under src/ is a configure-time error. Drops opentelemetry-cpp/gRPC/protobuf/abseil/Arrow-C++; keeps openssl/lz4/zstd.
- … gpr_malloc.
- … _exit(1), so any residual abort (e.g. a vendored-lib assert) costs a bgworker restart, not cluster crash recovery. SIGSEGV/SIGBUS keep full crash semantics.
- pg_stat_ch.memory_limit (default 160 MB = old-default equivalence point) + three -1=auto expert overrides; pg_stat_ch_memory() view; control 0.3 → 0.4.
- … export_dropped counter.

Verification
- … -Wstrict-prototypes/-Wmissing-prototypes); extension links.
- … abort()/assert() reachable from extension code or from nanoarrow/flatcc (built NDEBUG + FLATCC_NO_ASSERT); only _exit is the SIGABRT backstop.
- otlp_encode unit test: 541 checks, clean under ASan/UBSan.
- arrow_batch: pyarrow decoded byte-compat.

Known follow-ups
- t/015_guc_validation and t/023_drain_loop encode pre-consolidation GUC semantics and need updating; README GUC docs still describe old knobs.
- pg_stat_ch_memory() reports export_dropped (a count) in the budget_bytes column, a documented single-shape compromise; fixing it would ripple the SQL signature.
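For reference, the SIGABRT backstop described in this PR (an async-signal-safe write followed by _exit(1)) might be installed along these lines. This is a sketch under assumptions, not the code in bgworker.c: AbortBackstop, InstallAbortBackstop, and the message text are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for sigaction under strict ISO C modes */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler restricted to async-signal-safe calls: write() and _exit().
 * stdio and PostgreSQL logging must not be used here. */
static void AbortBackstop(int signo) {
  static const char msg[] = "bgworker: SIGABRT caught, exiting with status 1\n";
  (void)signo;
  ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
  (void)ignored;
  _exit(1); /* plain exit status, so the process does not die by signal */
}

/* Installs the backstop for SIGABRT only; SIGSEGV/SIGBUS are left alone so
 * they keep their normal crash semantics. Returns 0 on success, -1 on error. */
static int InstallAbortBackstop(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = AbortBackstop;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGABRT, &sa, NULL);
}
```

The point of exiting with a status rather than dying by signal is that, as the summary notes, a shmem-attached bgworker killed by a signal forces the postmaster into cluster-wide crash recovery.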