feat: Add Configurable Nested KeyValue Support for ClickHouse JSON Export #293
base: main
After ingesting some data, I can indeed see the JSON values as a deep structure, but I wasn't able to find an elegant way to, say, get the last item in an array of input messages (e.g. see below); however there didn't … Could be worth following up with the ClickHouse folks on the best way to implement this.
Hey @brightsparc, as we discussed I support increasing the depth we can support here. So a few things here as I start to look at this:
Are you seeing any error messages from the exporter? I'll look at this some more over the next few days, but I may be busy. I'd like to support this, just need to clean it up a bit and identify where the gaps are.
Summary
This PR adds support for properly converting nested OpenTelemetry KeyValueList and ArrayValue structures to native JSON objects/arrays in the ClickHouse exporter, with a configurable depth limit and backwards-compatible default behavior.
Motivation
When exporting GenAI span attributes (e.g., gen_ai.input.messages, gen_ai.output.messages) that contain nested KeyValueList structures, the previous implementation would serialize them as empty strings or raw protobuf JSON, losing the semantic structure. This made querying these fields in ClickHouse difficult.
Before (flat mode):
{"gen_ai.input.messages": ""}
After (nested mode enabled):
{"gen_ai.input.messages": [{"role": "user", "parts": [{"type": "text", "content": "Hello"}]}]}
Changes
1. Unified String Handling with Cow<str>
Before: two string variants (Str + StrOwned)
After: one string variant holding Cow<str>
Rationale:
The Cow (Clone-on-Write) smart pointer handles both borrowed and owned strings:
Cow::Borrowed(&str) - zero allocation, just stores pointer + length
Cow::Owned(String) - takes ownership, same as StrOwned before
2. New JsonType::Object Variant
Added support for JSON objects (ClickHouse named tuples).
The serialization format follows the ClickHouse RowBinary JSON spec.
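To make sections 1 and 2 concrete, here is a minimal sketch of what a value type with a single Cow<str> string variant and an Object variant might look like. The enum below is a simplified, hypothetical stand-in (the exporter's real JsonType has more variants and its exact field layout is not shown in this PR text), and the helper names borrowed_str, owned_str, and message are invented for illustration. It does not depict the RowBinary wire format.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for the exporter's JSON value type.
#[derive(Debug, PartialEq)]
enum JsonType<'a> {
    /// One string variant covers both borrowed and owned data.
    Str(Cow<'a, str>),
    Array(Vec<JsonType<'a>>),
    /// Key/value pairs, kept in insertion order.
    Object(Vec<(Cow<'a, str>, JsonType<'a>)>),
}

/// Wrap an existing string slice without allocating.
fn borrowed_str(s: &str) -> JsonType<'_> {
    JsonType::Str(Cow::Borrowed(s))
}

/// Take ownership of a freshly built string (the role StrOwned played before).
fn owned_str(s: String) -> JsonType<'static> {
    JsonType::Str(Cow::Owned(s))
}

/// Build a small object mixing a borrowed value and an owned, computed one.
fn message<'a>(role: &'a str, turn: u32) -> JsonType<'a> {
    JsonType::Object(vec![
        (Cow::Borrowed("role"), borrowed_str(role)),
        (Cow::Borrowed("id"), owned_str(format!("turn-{turn}"))),
    ])
}

fn main() {
    let role = String::from("user");
    let messages = JsonType::Array(vec![message(&role, 1)]);
    println!("{messages:?}");
}
```

Callers that only read the string can treat both cases uniformly through Deref to str, which is the main convenience of collapsing two variants into one.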
3. Configurable Nested Conversion
The depth setting takes one of three forms: None (flat mode, the default), Some(0), or Some(n).
4. Separate Code Paths for Performance
Rather than adding depth checks to every recursive call in the common case, we maintain two separate implementations:
Flat mode (anyvalue_to_jsontype_flat): the existing behavior, unchanged.
Nested mode (anyvalue_to_jsontype_nested):
KvlistValue → JsonType::Object
ArrayValue → JsonType::Array (recursive)
Performance Analysis
Why Separate Code Paths?
A single function with if depth_enabled { check_depth() } on every call would add a depth check to every recursive call, even in the common flat case. By separating, the flat path compiles to tight code with no depth-check branch.
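The two-path design described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: AnyValue here is a hand-written stand-in for the protobuf-generated OpenTelemetry type, the functions flat, nested, convert, quote, and to_json_text are hypothetical names (the PR's functions are anyvalue_to_jsontype_flat and anyvalue_to_jsontype_nested), and falling back to a JSON string once the depth budget is exhausted is an assumption. How the real exporter interprets Some(0) is not stated in this text; the sketch simply treats it as a budget of zero levels.

```rust
use std::borrow::Cow;

/// Simplified stand-in for OpenTelemetry's AnyValue (hypothetical).
enum AnyValue {
    Str(String),
    Int(i64),
    Array(Vec<AnyValue>),
    Kvlist(Vec<(String, AnyValue)>),
}

#[derive(Debug, PartialEq)]
enum JsonType<'a> {
    Int(i64),
    Str(Cow<'a, str>),
    Array(Vec<JsonType<'a>>),
    Object(Vec<(Cow<'a, str>, JsonType<'a>)>),
}

/// Crude JSON string quoting; only escapes backslashes and double quotes.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Render a value as JSON text, used wherever structure collapses to a string.
fn to_json_text(v: &AnyValue) -> String {
    match v {
        AnyValue::Str(s) => quote(s),
        AnyValue::Int(i) => i.to_string(),
        AnyValue::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json_text).collect();
            format!("[{}]", parts.join(","))
        }
        AnyValue::Kvlist(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, val)| format!("{}:{}", quote(k), to_json_text(val)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Flat path: no depth bookkeeping; nested containers become JSON strings.
fn flat(v: &AnyValue) -> JsonType<'_> {
    match v {
        AnyValue::Str(s) => JsonType::Str(Cow::Borrowed(s.as_str())),
        AnyValue::Int(i) => JsonType::Int(*i),
        AnyValue::Array(items) => JsonType::Array(
            items
                .iter()
                .map(|item| match item {
                    AnyValue::Str(_) | AnyValue::Int(_) => flat(item),
                    _ => JsonType::Str(Cow::Owned(to_json_text(item))),
                })
                .collect(),
        ),
        AnyValue::Kvlist(_) => JsonType::Str(Cow::Owned(to_json_text(v))),
    }
}

/// Nested path: containers keep their structure while depth budget remains.
fn nested(v: &AnyValue, depth_left: usize) -> JsonType<'_> {
    match v {
        AnyValue::Str(s) => JsonType::Str(Cow::Borrowed(s.as_str())),
        AnyValue::Int(i) => JsonType::Int(*i),
        // Assumption: past the limit, fall back to a JSON string.
        _ if depth_left == 0 => JsonType::Str(Cow::Owned(to_json_text(v))),
        AnyValue::Array(items) => {
            JsonType::Array(items.iter().map(|i| nested(i, depth_left - 1)).collect())
        }
        AnyValue::Kvlist(fields) => JsonType::Object(
            fields
                .iter()
                .map(|(k, val)| (Cow::Borrowed(k.as_str()), nested(val, depth_left - 1)))
                .collect(),
        ),
    }
}

/// The mode is chosen once per value, so the flat path never sees a depth check.
fn convert(v: &AnyValue, max_depth: Option<usize>) -> JsonType<'_> {
    match max_depth {
        None => flat(v),
        Some(limit) => nested(v, limit),
    }
}

fn main() {
    let msg = AnyValue::Kvlist(vec![("role".into(), AnyValue::Str("user".into()))]);
    let messages = AnyValue::Array(vec![msg]);
    println!("flat:   {:?}", convert(&messages, None));
    println!("nested: {:?}", convert(&messages, Some(2)));
}
```

The dispatch happens once in convert, outside the recursion, which is the point of keeping two implementations: the recursive flat function carries no depth parameter at all.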
Memory Layout
The discriminant for the JsonType variant + Cow tag fits in one cache line. Access pattern is identical to before.
Benchmarking Expectations
The nested mode slowdown is acceptable because:
CLI Configuration
New flag added to configure nested KV conversion:
Flag: --clickhouse-exporter-nested-kv-max-depth
Environment variable: ROTEL_CLICKHOUSE_EXPORTER_NESTED_KV_MAX_DEPTH
Transformer Configuration (Internal)
Wire Format
Flat Mode (unchanged)
Nested Mode (new)
Backwards Compatibility
None = flat mode
Test Plan
test_anyvalue_arrayvalue_flat_mode - KvlistValue → JSON string, nested Array → JSON string
test_anyvalue_arrayvalue_nested_mode - KvlistValue → Object, nested Array → Array
cargo test - 514 passed, 0 failed
Future Work