fix: correct UCUM units for TiBy and kilobytes in unitMap#66
fix: correct UCUM units for TiBy and kilobytes in unitMap#66Naman-B-Parlecha wants to merge 6 commits into
Conversation
Signed-off-by: Naman-B-Parlecha <naman.parlecha@finalroundai.com> Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
|
PTAL @ArthurSens @dashpole |
aknuds1
left a comment
There was a problem hiding this comment.
@Naman-B-Parlecha @dashpole Some questions:
- Won't we need to update the tebibytes spelling simultaneously also in opentelemetry-collector-contrib?
- Won't renaming "tibibytes" metrics units to "tebibytes" be a backwards compatible change in Prometheus, that might break things for users?
- If we stop supporting "KBy" as an OTLP unit, won't that silently break support for producers sending the "KBy" spelling?
- Won't we need to update the "KBy" spelling simultaneously also in opentelemetry-collector-contrib?
|
Sure. I'm OK keeping KBy for backwards-compatibility. We can also coordinate the update to the collector. FYI @Naman-B-Parlecha that is here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/138fbe1878d1b8c8bd31e3c942ec0d34e08c0c87/pkg/translator/prometheus/unit_to_ucum.go#L24 |
|
Thanks @dashpole - WDYT about my concern that renaming "tibibytes" metrics units to "tebibytes" will be a backwards compatible change in Prometheus, potentially breaking things for users though? |
Worth adding a warning log where this conversion happens in prometheus/prometheus for user visibility? |
|
I'm wondering whether this backwards compatible change should be possible to disable, e.g. for configurability from Grafana Mimir. As for Prometheus itself, maybe the old behaviour should be enabled until v4? As backwards incompatible changes shouldn't be made until the next major version (of Prometheus). @dashpole @Naman-B-Parlecha. |
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
|
Hmmmm, while I understand the potential negative effect, I'm not sure the units being changed are popular enough for us to be THIS careful. I can see kylobytes being used somehow (and we're keeping the backward compatible behavior), but Tebibytes? 😅 I don't think this is used by any Prometheus exporter, and it's not part of any semantic conventions in OTel either. The only way this is used is when someone creates a custom metric and intentionally measures things in Terabytes. There are very few companies in the world that measure things on this scale. I'd be super happy if we could just do a patch release for this, but if you're feeling very strongly opinionated about this, we could do a 2.0. What do you think? |
|
@ArthurSens I'm not suggesting a major bump for otlptranslator, but potential configurability. I'm further suggesting that we might want to disable this new behaviour in Prometheus until v4, for backwards compatibility. It's a question from my side though, and I'd like feedback from other maintainers. As for Grafana Mimir, the configurability could be good to have to protect users from potential metric renames. |
|
I'm not that worried about this this breaking users. I think this is a case where it is clearly a bug fix, and unlikely to have a significant negative impact. If upstream projects want configurability, it should be pretty easy for them to add special handling for TiBy prior to calling into this library to preserve existing behavior. I would probably advocate for using a feature gate to allow users to temporarily revert back to the old name (and provide feedback, if necessary) for a period of time. |
|
@dashpole That's what I'm suggesting too. We already have precedence for such configuration parameters in otlptranslator, see:
@Naman-B-Parlecha The above two are examples of configuration parameters controlling backwards compatible translation behaviour. |
|
Sure. Lets do that then |
…suffixes Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
|
Hey @dashpole @aknuds1 @ArthurSens! one concern i have is naming currently added |
…capingWithUpdatedSuffixes Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
|
@dashpole PTAL added two new strategies |
|
We shouldn't need a new strategy, right? Just booleans on the Namer structs (they should probably be "legacyUnitMapping" or something like that so that the default value of false uses the new behavior). |
| - **Namespace Support**: Add configurable namespace prefixes | ||
| - **UTF-8 Support**: Choose between Prometheus legacy scheme compliant metric/label names (`[a-zA-Z0-9:_]`) or untranslated metric/label names | ||
| - **Translation Strategy Configuration**: Select a translation strategy with a standard set of strings. | ||
| - **Updated UCUM Unit Mappings**: Opt in to corrected UCUM unit suffixes (`TiBy`: `tebibytes`, `kBy`: `kilobytes`) by selecting the `UnderscoreEscapingWithUpdatedSuffixes` or `NoUTF8EscapingWithUpdatedSuffixes` translation strategy. |
There was a problem hiding this comment.
This currently reads like a changelog item. Please see my suggestion for line 10 instead.
| - **Updated UCUM Unit Mappings**: Opt in to corrected UCUM unit suffixes (`TiBy`: `tebibytes`, `kBy`: `kilobytes`) by selecting the `UnderscoreEscapingWithUpdatedSuffixes` or `NoUTF8EscapingWithUpdatedSuffixes` translation strategy. |
| ## Features | ||
|
|
||
| - **Metric Name and Label Translation**: Convert OTLP metric names and attributes to Prometheus-compliant format | ||
| - **Unit Handling**: Translate OTLP units to Prometheus unit conventions |
There was a problem hiding this comment.
| - **Unit Handling**: Translate OTLP units to Prometheus unit conventions, using spec-correct UCUM mappings by default with an opt-out (`LegacyUnitMapping`) for callers pinned to pre-correction names | |
| - ``` |
| // Use UnderscoreEscapingWithUpdatedSuffixes to opt into corrected UCUM unit mappings: | ||
| // TiBy -> "tebibytes" instead of "tibibytes" and kBy -> "kilobytes". | ||
| strategy := otlptranslator.UnderscoreEscapingWithUpdatedSuffixes |
| unitNamer.Build("By") // bytes | ||
| unitNamer.Build("requests/s") // requests_per_second | ||
| unitNamer.Build("1") // "" (dimensionless) | ||
| ``` |
There was a problem hiding this comment.
I suggest adding a corresponding migration sub-section:
| ### Migration: Pre-correction unit suffixes | |
| This library now uses the spec-correct UCUM unit suffixes by default. Two mappings changed compared to earlier releases: | |
| - `TiBy` → `tebibytes` (previously `tibibytes` — `tebi-` is the SI binary prefix; the older spelling was a misspelling). | |
| - `kBy` → `kilobytes` is **newly** recognised (previously not mapped). `KBy` continues to map to `kilobytes` as before, for backwards compatibility. | |
| **Behavior change for existing consumers**: dashboards, alerts, and recording rules that reference `*_tibibytes` will silently stop receiving new datapoints once your Prometheus / OTel Collector deployment picks up this library version. | |
| To preserve the pre-correction names while you migrate, opt in to the legacy mapping (combine with `UTF8Allowed: true` if your callers use UTF-8 metric names): | |
| \\\go | |
| namer := otlptranslator.MetricNamer{ | |
| WithMetricSuffixes: true, | |
| LegacyUnitMapping: true, | |
| } | |
| \\\ | |
| "KiBy": "kibibytes", | ||
| "MiBy": "mebibytes", | ||
| "GiBy": "gibibytes", | ||
| "TiBy": "tibibytes", |
There was a problem hiding this comment.
I think the current units should live in the canonical unitMap:
| "TiBy": "tebibytes", | |
| "kBy": "kilobytes", |
| }, | ||
| wantMetricName: "requêsts_per_second", | ||
| wantUnitName: "per_second", | ||
| }, |
There was a problem hiding this comment.
I suggest adding more test cases:
| }, | |
| { | |
| name: "utf8 metric with TiBy under default unit mapping", | |
| namer: MetricNamer{ | |
| UTF8Allowed: true, | |
| WithMetricSuffixes: true, | |
| }, | |
| metric: Metric{ | |
| Name: "capacity", | |
| Unit: "TiBy", | |
| Type: MetricTypeGauge, | |
| }, | |
| wantMetricName: "capacity_tebibytes", | |
| wantUnitName: "tebibytes", | |
| }, | |
| { | |
| name: "utf8 metric with TiBy under legacy unit mapping", | |
| namer: MetricNamer{ | |
| UTF8Allowed: true, | |
| WithMetricSuffixes: true, | |
| LegacyUnitMapping: true, | |
| }, | |
| metric: Metric{ | |
| Name: "capacity", | |
| Unit: "TiBy", | |
| Type: MetricTypeGauge, | |
| }, | |
| wantMetricName: "capacity_tibibytes", | |
| wantUnitName: "tibibytes", | |
| }, | |
| { | |
| name: "utf8 metric with kBy under legacy unit mapping passes through unmapped", | |
| namer: MetricNamer{ | |
| UTF8Allowed: true, | |
| WithMetricSuffixes: true, | |
| LegacyUnitMapping: true, | |
| }, | |
| metric: Metric{ | |
| Name: "transfer", | |
| Unit: "kBy", | |
| Type: MetricTypeGauge, | |
| }, | |
| wantMetricName: "transfer_kBy", | |
| wantUnitName: "kBy", | |
| }, | |
| { |
| unitNamer := UnitNamer{ | ||
| UTF8Allowed: tt.namer.UTF8Allowed, | ||
| UTF8Allowed: tt.namer.UTF8Allowed, | ||
| UpdatedMetricMapping: tt.namer.UpdatedMetricMapping, |
There was a problem hiding this comment.
| UpdatedMetricMapping: tt.namer.UpdatedMetricMapping, | |
| LegacyUnitMapping: tt.namer.LegacyUnitMapping, |
| // NoUTF8EscapingWithUpdatedSuffixes will accept metric/label names as they are. Unit | ||
| // and type suffixes may be added to metric names, according to certain rules. | ||
| // This option includes an updated mapping for the UCUM metrics suffixes tebibyte and kBy. | ||
| NoUTF8EscapingWithUpdatedSuffixes TranslationStrategyOption = "NoUTF8EscapingWithUpdatedSuffixes" | ||
| // UnderscoreEscapingWithSuffixes is the default option for translating OTLP | ||
| // to Prometheus. This option will translate metric name characters that are | ||
| // not alphanumerics/underscores/colons to underscores, and label name | ||
| // characters that are not alphanumerics/underscores to underscores. Unit and | ||
| // type suffixes may be appended to metric names, according to certain rules. | ||
| // This option includes an updated mapping for the UCUM metrics suffixes tebibyte and kBy. | ||
| UnderscoreEscapingWithUpdatedSuffixes TranslationStrategyOption = "UnderscoreEscapingWithUpdatedSuffixes" |
There was a problem hiding this comment.
New strategies is the wrong approach:
| // NoUTF8EscapingWithUpdatedSuffixes will accept metric/label names as they are. Unit | |
| // and type suffixes may be added to metric names, according to certain rules. | |
| // This option includes an updated mapping for the UCUM metrics suffixes tebibyte and kBy. | |
| NoUTF8EscapingWithUpdatedSuffixes TranslationStrategyOption = "NoUTF8EscapingWithUpdatedSuffixes" | |
| // UnderscoreEscapingWithSuffixes is the default option for translating OTLP | |
| // to Prometheus. This option will translate metric name characters that are | |
| // not alphanumerics/underscores/colons to underscores, and label name | |
| // characters that are not alphanumerics/underscores to underscores. Unit and | |
| // type suffixes may be appended to metric names, according to certain rules. | |
| // This option includes an updated mapping for the UCUM metrics suffixes tebibyte and kBy. | |
| UnderscoreEscapingWithUpdatedSuffixes TranslationStrategyOption = "UnderscoreEscapingWithUpdatedSuffixes" |
| type UnitNamer struct { | ||
| UTF8Allowed bool | ||
| UTF8Allowed bool | ||
| UpdatedMetricMapping bool |
There was a problem hiding this comment.
| UpdatedMetricMapping bool | |
| // LegacyUnitMapping selects the pre-correction UCUM unit mappings (e.g. | |
| // "TiBy" -> "tibibytes" instead of the spec-correct "tebibytes"). The | |
| // default value (false) uses the spec-correct mappings. Set to true to | |
| // preserve metric names produced by older versions of this library. | |
| LegacyUnitMapping bool |
| if updatedSuffix { | ||
| // checks for updatd values, TiBy <-> tebibytes and kBy <-> kilobytes. | ||
| if promUnit, ok := updatedUnitMap[unit]; ok { | ||
| return promUnit | ||
| } | ||
| // doesnt return cause what if the unit is not in the updated map but is | ||
| // in the original map, we want to check that as well. | ||
| } |
There was a problem hiding this comment.
| if updatedSuffix { | |
| // checks for updatd values, TiBy <-> tebibytes and kBy <-> kilobytes. | |
| if promUnit, ok := updatedUnitMap[unit]; ok { | |
| return promUnit | |
| } | |
| // doesnt return cause what if the unit is not in the updated map but is | |
| // in the original map, we want to check that as well. | |
| } | |
| if legacy { | |
| if promUnit, ok := legacyUnitMap[unit]; ok { | |
| return promUnit | |
| } | |
| if _, excluded := legacyUnitExclusions[unit]; excluded { | |
| return unit | |
| } | |
| } |
Fixes two incorrect UCUM unit mappings as mentioned in #65
TiByshould map totebibytesand nottibibyteskilobytesshould map tokByand notKBy