While implementing fluent/fluent-bit#10651 I discovered that histogram buckets with whole-number boundaries above a certain size lose precision in the output, because of the digit limit applied when formatting the double values the buckets are defined with.
For example, with this setup:

    struct cmt_histogram_buckets *input_record_buckets =
        cmt_histogram_buckets_create_size((double[]){ 100, 1024, 2048, 4096,
                                                      100 * 1024, 1024 * 1024, 4 * 1024 * 1024,
                                                      10 * 1024 * 1024 }, 8);

This is what comes out in the Prometheus scrape:

    # HELP fluentbit_input_record_sizes Histogram of the size of input records
    # TYPE fluentbit_input_record_sizes histogram
    fluentbit_input_record_sizes_bucket{le="0.0",name="tail.0"} 0
    fluentbit_input_record_sizes_bucket{le="100.0",name="tail.0"} 0
    fluentbit_input_record_sizes_bucket{le="1024.0",name="tail.0"} 1
    fluentbit_input_record_sizes_bucket{le="2048.0",name="tail.0"} 2
    fluentbit_input_record_sizes_bucket{le="4096.0",name="tail.0"} 3
    fluentbit_input_record_sizes_bucket{le="102400.0",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="1.04858e+06",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="4.1943e+06",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="+Inf",name="tail.0"} 0
    fluentbit_input_record_sizes_sum{name="tail.0"} 48412
    fluentbit_input_record_sizes_count{name="tail.0"} 5

As best I can tell, this stems from the following line (and presumably the default precision of the %g printf specifier), at cmetrics/src/cmt_encode_prometheus.c, line 311 in ab80dd0:

    len = snprintf(str, 64, "%g", val);

Extra info
In the Prometheus text format docs/spec, as best as I can see, there's no specific stipulation for type or formatting of the le labels: https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries . The only restrictions are the general ones placed on label values:
label_value can be any sequence of UTF-8 characters, but the backslash (\), double-quote ("), and line feed (\n) characters have to be escaped as \\, \", and \n, respectively
In general, from what I've seen so far, metric tools don't give you numerical or mathematical means of reasoning about these bucket values, since most metric querying seems to work with string-based searching and filtering anyway, but I could be wrong on this.
As another C library reference, the DigitalOcean Prometheus C library also uses sprintf with %g, so presumably it would suffer from the same issue:
https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/prom/src/prom_metric_sample_histogram.c#L502-L509
In the OpenTelemetry project, the buckets are similarly defined as double values:
https://github.com/open-telemetry/opentelemetry-proto/blob/8672494217bfc858e2a82a4e8c623d4a5530473a/opentelemetry/proto/metrics/v1/metrics.proto#L554-L568
There is/was an IntegerHistogram type, but this was for integer observation values. Ironically, it seems it has/had double bucket boundaries anyway, and they decided to deprecate it (see open-telemetry/opentelemetry-proto#257, open-telemetry/opentelemetry-proto#270).
Info on %g specifier: