Skip to content

Prometheus: preserve job/instance when translating Prometheus -> OTLP#4956

Open
dashpole wants to merge 10 commits into
open-telemetry:mainfrom
dashpole:prom_stabilize_resource
Open

Prometheus: preserve job/instance when translating Prometheus -> OTLP#4956
dashpole wants to merge 10 commits into
open-telemetry:mainfrom
dashpole:prom_stabilize_resource

Conversation

@dashpole

@dashpole dashpole commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Part of #4753

Fixes #4577

Related to open-telemetry/opentelemetry-collector-contrib#45982

Issues with the current state

  • The receiver can get both job (from scrape config) and service.name from target info. Currently, we let one win, but ideally we would keep both.
  • Users that push OTLP to prometheus still expect to see service.name and service.instance.id, and are confused when it is missing. We should consider keeping those and having a job + instance. We can add configuration to disable it, if necessary.

Changes

There are a bunch of changes to the formatting and wording included in this PR. If needed, I can split those out.

  • service.name MAY default to job, and service.instance.id MAY default to instance.
  • job and instance are now preserved when translating Prometheus -> OTLP.

Alternatives

  • Use prometheus.job or prometheus.instance instead of job and instance.
  • Default to not defaulting service.name and service.instance.id attributes to job or instance. This would be breaking for users.

@open-telemetry/prometheus-interoperability @aknuds1 @kln21002 @cyrille-leclerc

@ArthurSens

Copy link
Copy Markdown
Member

Hmmmm, to make quicker progress here, would it make sense to focus on the Prometheus -> OTLP transformation and leave OTLP -> Prometheus in a separate PR?

I feel like OTLP -> Prometheus is a lot more complicated, and will require deeper discussions with other parts of OTel, like Infrastructure semantic conventions, OBI, etc 🤔

@dashpole

Copy link
Copy Markdown
Contributor Author

My main goal is to make sure that everything still round-trips properly. If I make changes to one direction, I usually have to make the inverse change to the other direction to ensure that. I can remove some of the editorial changes to the OTLP -> Prometheus section if that helps. But I would like aggregated exporters to accept prometheus.job and prometheus.instance if we add those to the receiver.

@aknuds1 aknuds1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments :)

Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
@dashpole dashpole moved this to Discussion Needed in Prometheus SIG Mar 18, 2026
@github-actions

github-actions Bot commented Apr 3, 2026

Copy link
Copy Markdown

This PR was marked stale. It will be closed in 14 days without additional activity.

@github-actions github-actions Bot added the Stale label Apr 3, 2026
@dashpole

dashpole commented Apr 6, 2026

Copy link
Copy Markdown
Contributor Author

@github-actions github-actions Bot removed the Stale label Apr 7, 2026
@dashpole dashpole changed the title [WIP] Prometheus: update resource translation Prometheus: update resource translation Apr 10, 2026
@dashpole dashpole marked this pull request as ready for review April 10, 2026 15:41
@dashpole dashpole requested review from a team as code owners April 10, 2026 15:41
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated

@jmacd jmacd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a very minor question.

@ArthurSens ArthurSens left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass through the PR, and so far, I have just one comment.

I feel like I haven't fully grasped the potential consequences of this change, so I'm taking some extra time to reflect, I hope that's ok :)

Comment thread specification/compatibility/prometheus_and_openmetrics.md
@ArthurSens ArthurSens moved this from Discussion Needed to In progress in Prometheus SIG Apr 15, 2026
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md
@dashpole dashpole force-pushed the prom_stabilize_resource branch from 7248acd to 4dda69a Compare May 4, 2026 20:15

@ArthurSens ArthurSens left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior seems very very similar to the configuration option called keep_identifying_attributes from the Prometheus Server configuration.

I know we're only modifying the data model spec here, not the exporter configuration spec, but I wonder whether we'd like to make adjustments to the exporter spec or the Prometheus configuration so both OTel and Prometheus are aligned.

@aknuds1, I think this is somewhat close to work you've done with @cyrille-leclerc in the past. Do you have anything extra to add?

@dashpole

dashpole commented May 6, 2026

Copy link
Copy Markdown
Contributor Author

This behavior seems very very similar to the configuration option called keep_identifying_attributes from the Prometheus Server configuration.

Yeah. If we went with this change, it would make sense to eventually switch that to be true by default (e.g. in the next major version, if we consider it breaking). It would be less likely to be duplicate information, so it would make more sense for users to keep the original ones.

@aknuds1

aknuds1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

I'll have a look.

@aknuds1 aknuds1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md
@aknuds1

aknuds1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

I think this is somewhat close to work you've done with @cyrille-leclerc in the past. Do you have anything extra to add?

@ArthurSens @dashpole AFAICT, the following implies keep_identifying_resource_attributes=true, without it Prometheus' OTLP endpoint will avoid including service_instance_id and service_name as target_info labels?

The Resource attributes MUST NOT be copied to labels of exported metric families
by default

@dashpole

dashpole commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

The Resource attributes MUST NOT be copied to labels of exported metric families
by default

This is meant to say that promote_resource_attributes is empty by default, and promote_all_resource_attributes is false by default (which is the case, currently). LMK if there is wording that can make that clearer.

@aknuds1 aknuds1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM sans remaining two suggestions.

Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md Outdated
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md
@dashpole dashpole requested a review from a team as a code owner June 8, 2026 19:48
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 8, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

@dashpole dashpole force-pushed the prom_stabilize_resource branch from c5d9afc to 3f74736 Compare June 8, 2026 19:53
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md
Comment thread specification/compatibility/prometheus_and_openmetrics.md
@jack-berg

jack-berg commented Jun 9, 2026

Copy link
Copy Markdown
Member

@bogdandrutu, @jmacd, @reyang can you take another look? You're previous approvals were reset.

@aknuds1 aknuds1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have doubts regarding this proposed spec change. Please see comment.

Comment on lines +385 to +386
contains all metrics with that `job` and `instance`. `job` and `instance` labels
MUST be added as resource attributes, and not as metric attributes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's a good idea to add job and instance labels as resource attributes? OTLP -> Prometheus translation considers (according to spec and implementation) instead service.instance.id and service.name to be the identfiying resource attributes. Also, I think that job and instance resource attributes would be dropped in favour of corresponding labels synthesized by Prometheus when generating target_info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this discussion above: #4956 (comment). While our specification stated that service.* attributes were identifying, that was based on a misread of the service attributes specification.

This is now proposing that job and instance are respected if they already exist as resource attributes when ingesting OTLP (since it means that the metric was originally scraped from a prometheus endpoint.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dashpole What exactly is based on a misread? That Prometheus' OTLP endpoint treats service.instance.id and service.name resource attributes as identifying? If so, how precisely? Changing it would be backwards incompatible, and I would argue outside of this PR's scope.

What about the second problem I raised, i.e. that job and instance resource attributes would be dropped by the Prometheus OTLP endpoint in favour of corresponding labels synthesized from identifying resource attributes when generating the target_info metric?

This is now proposing that job and instance are respected if they already exist as resource attributes when ingesting OTLP

Where is that stated in the PR's proposed text?

@aknuds1 aknuds1 Jun 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it means that the metric was originally scraped from a prometheus endpoint

@dashpole We can't make that assumption. It's but one possible source of job and instance OTel resource attributes.

@aknuds1 aknuds1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment.

Comment on lines +753 to +759
If `job` is not present as a resource attribute and `service.name` is present,
the `service.name` and `service.namespace` resource attributes MUST be combined
as `<service.namespace>/<service.name>`, or `<service.name>` if namespace is
empty, to form the `job` metric label; otherwise, `job` SHOULD be added with an
empty value. If `instance` is not present as a resource attribute and
`service.instance.id` is present, `service.instance.id` MUST be converted to the
`instance` label; otherwise, `instance` SHOULD be added with an empty value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On re-reading the PR, I think this must be where you intend to define that job and instance resource attributes respectively should be identifying iff present, with service.name and service.instance.id as respective fallbacks. That should be clearly spelled out, as the intent is effectively ambiguous as currently written.

There are some issues with such a radical change however:

  1. Users whose resource attributes include job and/or instance (doesn't sound entirely unlikely) will experience that their series identities change, when said attributes differ from (service.namespace/ +)service.name/service.instance.id - or when the latter are absent.
  2. The effect of the keep_identifying_resource_attributes config parameter becomes data-dependent: It can only meaningfully apply to the service.name and service.instance.id resource attributes synthesized into job and instance. When job and/or instance resource attributes take precedence, the former presumably have to be kept on target_info regardless of the parameter.

@aknuds1 aknuds1 Jun 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this some more in the meantime - I think that projecting the Prometheus meaning of job and instance onto OTel resource attributes is a flawed design. They don't have this intrinsic meaning within the OTel scope, so I do think that the previous proposal within this PR to prefix them with "prometheus." was an objectively better choice. See also my comment above regarding the assumption that job and instance OTel resource attributes would be proof of Prometheus provenance.

@krajorama krajorama left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @aknuds1 that this is a breaking change for some people.

On Prometheus->OTLP side new labels showing up can in general lead to different results if the query language allows without construct, that is generating series by ignoring some attributes, but allowing everything else.

On OTLP->Prometheus side: user might have knowingly or unknowingly masked job, instance resource attributes by overwriting from service.name, service.id. That would break now as we'd keep job and instance.

Let's discuss at the next WG meeting. See also #4753 (comment)

@jmacd

jmacd commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

The (resolved) comment thread #4956 (comment) and the comments above lead me to agree that this will be a breaking change, likely in more than one way. See particularly the comment by @ArthurSens #4956 (comment): we're going to need Collector documentation updates for the users that have become used to the service.* resource attributes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[OTel to Prometheus] Promote resource attr service.name, service.namespace, and service.instance.id as Prometheus metric labels

10 participants