
Add control plane Istio scrape config and settings #1393

Merged
rashmichandrashekar merged 16 commits into main from shalier/add-controlplaneistio on Mar 23, 2026

Conversation

@shalier
Contributor

@shalier shalier commented Jan 15, 2026

PR Description

This PR adds support for scraping Prometheus metrics from Istio control plane (MCP, Managed Control Plane) components in AKS clusters that have the Azure-managed Istio service mesh enabled. The following metrics are scraped:

| Metric Pattern | Description |
| --- | --- |
| `citadel_server_csr_count` | Certificate signing requests count |
| `galley_validation_failed` | Config validation failures |
| `galley_validation_passed` | Config validation passes |
| `pilot_conflict_inbound_listener` | Inbound listener conflicts detected |
| `pilot_conflict_outbound_listener_http_over_current_tcp` | HTTP over TCP listener conflicts |
| `pilot_conflict_outbound_listener_tcp_over_current_http` | TCP over HTTP listener conflicts |
| `pilot_conflict_outbound_listener_tcp_over_current_tcp` | TCP over TCP listener conflicts |
| `pilot_info` | Pilot version and build info |
| `pilot_proxy_convergence_time*` | Proxy push convergence time (matches `_sum`, `_count`, `_bucket`) |
| `pilot_services` | Total number of services |
| `pilot_virt_services` | Number of virtual services |
| `pilot_xds` | Active xDS connections |
| `pilot_xds_push_context_errors` | Errors during push context creation |
| `pilot_xds_pushes` | Total number of xDS pushes to proxies |
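
The list above reads like a keep-list for the scrape job. As a rough illustration only (this is not the contents of `controlplane_istio.yml`), the sketch below turns those patterns into one alternation of the kind a `keep` action in Prometheus `metric_relabel_configs` would use. The helper names `build_keep_regex` and `is_kept` are hypothetical, and the trailing `*` is assumed to mean "any suffix".

```python
import re

# Metric patterns copied from the table in the PR description.
# Assumption: a trailing "*" means "any suffix".
PATTERNS = [
    "citadel_server_csr_count",
    "galley_validation_failed",
    "galley_validation_passed",
    "pilot_conflict_inbound_listener",
    "pilot_conflict_outbound_listener_http_over_current_tcp",
    "pilot_conflict_outbound_listener_tcp_over_current_http",
    "pilot_conflict_outbound_listener_tcp_over_current_tcp",
    "pilot_info",
    "pilot_proxy_convergence_time*",
    "pilot_services",
    "pilot_virt_services",
    "pilot_xds",
    "pilot_xds_push_context_errors",
    "pilot_xds_pushes",
]


def build_keep_regex(patterns):
    """Join the patterns into one alternation; a trailing "*" becomes ".*"."""
    parts = []
    for pattern in patterns:
        if pattern.endswith("*"):
            parts.append(re.escape(pattern[:-1]) + ".*")
        else:
            parts.append(re.escape(pattern))
    return "|".join(parts)


def is_kept(metric_name, keep_regex):
    """Prometheus relabel regexes are fully anchored, so fullmatch is the closest analogue."""
    return re.fullmatch(keep_regex, metric_name) is not None


KEEP_REGEX = build_keep_regex(PATTERNS)
```

Because the match is anchored, `pilot_xds` keeps only the series with exactly that name and does not pull in other metrics that merely share the prefix.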

New Feature Checklist

  • List telemetry added about the feature.
  • Link to the one-pager about the feature.
  • List any tasks necessary for release (3P docs, AKS RP chart changes, etc.) after merging the PR.
  • Attach results of scale and perf testing.

Tests Checklist

  • Have end-to-end Ginkgo tests been run on your cluster and passed? To bootstrap your cluster to run the tests, follow these instructions.
    • Labels used when running the tests on your cluster:
      • operator
      • windows
      • arm64
      • arc-extension
      • fips
  • Have new tests been added? For features, have tests been added for this feature? For fixes, is there a test that could have caught this issue and could validate that the fix works?

@shalier shalier marked this pull request as ready for review January 21, 2026 20:31
@shalier shalier requested a review from a team as a code owner January 21, 2026 20:31
@shalier shalier force-pushed the shalier/add-controlplaneistio branch from 6e08df7 to f9bbe8b Compare January 22, 2026 22:38
@rashmichandrashekar
Contributor

@shalier - Could you pls add description of what you are trying to do and what metrics are being scraped with these jobs?

@shalier shalier force-pushed the shalier/add-controlplaneistio branch from 164446f to 5b49aa2 Compare January 27, 2026 19:18
@rashmichandrashekar
Contributor

> @shalier - Could you pls add description of what you are trying to do and what metrics are being scraped with these jobs?

Thanks @shalier - could you pls also add labels that are being collected with each metric? And has this been tested with an image?

@github-actions
Contributor

github-actions Bot commented Feb 4, 2026

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions
Contributor

github-actions Bot commented Feb 9, 2026

This PR was closed because it has been stalled for 12 days with no activity.

@shalier
Contributor Author

shalier commented Feb 19, 2026

These are the labels collected:

| Metric | Type | # Labels | Labels |
| --- | --- | --- | --- |
| `pilot_xds_pushes` | counter | 6 | app, instance, job, pod_name, revision, type |
| `pilot_xds_push_context_errors` | counter | 5 | app, instance, job, pod_name, revision |
| `pilot_conflict_inbound_listener` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_conflict_outbound_listener_http_over_current_tcp` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_conflict_outbound_listener_tcp_over_current_tcp` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_conflict_outbound_listener_tcp_over_current_http` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_virt_services` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_services` | gauge | 5 | app, instance, job, pod_name, revision |
| `pilot_proxy_convergence_time_bucket` | histogram | 6 | app, instance, job, pod_name, revision, le |
| `pilot_proxy_convergence_time_count` | histogram | 5 | app, instance, job, pod_name, revision |
| `pilot_proxy_convergence_time_sum` | histogram | 5 | app, instance, job, pod_name, revision |
| `pilot_xds` | gauge | 6 | app, instance, job, pod_name, revision, version |
| `citadel_server_csr_count` | counter | 5 | app, instance, job, pod_name, revision |
| `galley_validation_passed` | counter | 8 | app, instance, job, pod_name, revision, group, resource, version |
| `galley_validation_failed` | counter | 9 | app, instance, job, pod_name, revision, group, resource, version, reason |
| `pilot_info` | gauge | 6 | app, instance, job, pod_name, revision, version |
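
One way to use the table above is as an expectation when checking scraped series for unexpected labels. The following is a minimal sketch under that assumption; the names `COMMON`, `EXTRA`, `labels_for`, and `unexpected_labels` are hypothetical and not part of the repository.

```python
# Labels that appear on every metric in the table.
COMMON = {"app", "instance", "job", "pod_name", "revision"}

# Additional labels per metric, copied from the table.
# Metrics not listed here carry only the common labels.
EXTRA = {
    "pilot_xds_pushes": {"type"},
    "pilot_proxy_convergence_time_bucket": {"le"},
    "pilot_xds": {"version"},
    "galley_validation_passed": {"group", "resource", "version"},
    "galley_validation_failed": {"group", "resource", "version", "reason"},
    "pilot_info": {"version"},
}


def labels_for(metric):
    """Return the full label set the table lists for a metric."""
    return COMMON | EXTRA.get(metric, set())


def unexpected_labels(metric, observed):
    """Return labels seen on a series that the table does not list for that metric."""
    return set(observed) - labels_for(metric)
```

For example, `unexpected_labels("pilot_xds", {"app", "version", "cluster"})` would flag `cluster`, since the table does not list it for that metric.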

@github-actions
Contributor

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Comment thread otelcollector/configmapparser/default-prom-configs/controlplane_istio.yml Outdated
@shalier shalier force-pushed the shalier/add-controlplaneistio branch from 88ce6ca to 959d0ce Compare March 5, 2026 00:27
@github-actions github-actions Bot added size/XL and removed size/XXL labels Mar 5, 2026
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@shalier shalier force-pushed the shalier/add-controlplaneistio branch from 6fe8935 to e788ace Compare March 19, 2026 18:20
@rashmichandrashekar
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

gracewehner added a commit that referenced this pull request Mar 20, 2026
Analyzed review comments across PRs #1393, #1397, #1403, #1408, #1452
to extract team conventions and common pitfalls:
- ME CLI arg formatting (leading dash)
- Keep vs drop list preference for scrape configs
- Helm values defaults (features off by default)
- CCP vs addon chart parity
- Extension migration coordination
- Build flag consistency with dalec-build-defs
- Test coverage requirements (Ginkgo E2E, TestKube)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@shalier
Contributor Author

shalier commented Mar 20, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

shalier and others added 16 commits March 20, 2026 15:35
[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)

[comment]: # (The below checklist is for PRs adding new features. If a
box is not checked, add a reason why it's not needed.)

This is merging into another feature branch

- [ ] List telemetry added about the feature.
- [ ] Link to the one-pager about the feature.
- [ ] List any tasks necessary for release (3P docs, AKS RP chart
changes, etc.) after merging the PR.
- [ ] Attach results of scale and perf testing.

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)

- [ ] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [ ] `operator`
    - [ ] `windows`
    - [ ] `arm64`
    - [ ] `arc-extension`
    - [ ] `fips`
- [ ] Have new tests been added? For features, have tests been added for
this feature? For fixes, is there a test that could have caught this
issue and could validate that the fix works?
  - [ ] Is a new scrape job needed?
    - [ ] The scrape job was added to the folder
[test-cluster-yamls](/otelcollector/test/test-cluster-yamls/) in the
correct configmap or as a CR.
  - [ ] Was a new test label added?
    - [ ] A string constant for the label was added to
[constants.go](/otelcollector/test/utils/constants.go).
    - [ ] The label and description was added to the [test
README](/otelcollector/test/README.md).
    - [ ] The label was added to this [PR
checklist](/.github/pull_request_template).
    - [ ] The label was added as needed to
[testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
  - [ ] Are additional API server permissions needed for the new tests?
    - [ ] These permissions have been added to
[api-server-permissions.yaml](/otelcollector/test/testkube/api-server-permissions.yaml).
  - [ ] Was a new test suite (a new folder under `/tests`) added?
    - [ ] The new test suite is included in
[testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
The preceding test steps (e.g. errorprone global ext labels) leave behind
ama-metrics-prometheus-config configmaps with invalid config (e.g.
scrape_interval as integer instead of string). When the controlplane-istio
test runs without cleaning these up, the config merger takes the custom
config path, validation fails, and the fallback produces only 10 default
scrape configs instead of the expected 11 (10 default + controlplane-istio).

Add kubectl delete --ignore-not-found for the three custom prometheus
configmaps before applying the controlplane-istio settings configmap.
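
The failure mode described in this commit message can be illustrated with a toy model. The sketch below is not the config merger's actual code: `scrape_interval_errors` and `merged_job_count` are hypothetical helpers, the duration pattern is a simplification of what Prometheus accepts, and the job counts (10 defaults plus 1 for controlplane-istio) are taken from the message above.

```python
import re

# Simplified duration pattern (an assumption; Prometheus accepts a richer grammar).
DURATION = re.compile(r"(\d+(ms|s|m|h|d|w|y))+")


def scrape_interval_errors(config):
    """List scrape jobs whose scrape_interval is not a duration string."""
    errors = []
    for job in config.get("scrape_configs", []):
        value = job.get("scrape_interval")
        if value is None:
            continue
        # An integer such as 30 is rejected; a string such as "30s" is accepted.
        if not isinstance(value, str) or not DURATION.fullmatch(value):
            name = job.get("job_name", "<unnamed>")
            errors.append(f"{name}: invalid scrape_interval {value!r}")
    return errors


def merged_job_count(custom_config, default_count=10, settings_count=1):
    """Toy model of the fallback described above.

    If a leftover custom config exists and fails validation, only the
    default jobs survive. Otherwise the jobs enabled through settings are
    added. Jobs from a valid custom config are ignored in this model.
    """
    if custom_config is not None and scrape_interval_errors(custom_config):
        return default_count
    return default_count + settings_count
```

With a stale configmap that sets `scrape_interval: 30`, the model yields 10 jobs; once the configmap is deleted (`custom_config` is `None`), it yields the expected 11.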
@shalier shalier force-pushed the shalier/add-controlplaneistio branch from e788ace to be01076 Compare March 20, 2026 22:35
@shalier
Contributor Author

shalier commented Mar 20, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread .pipelines/azure-pipeline-config-tests.yml
@rashmichandrashekar rashmichandrashekar merged commit 8b94c4f into main Mar 23, 2026
30 checks passed
4 participants