observability: backend panic and maestro alerts#4869

Open
geoberle wants to merge 1 commit into main from maestro-alerts


@geoberle
Collaborator

What

  • added the BackendControllerPanic alert to detect panics caught by the runtime panic handler
  • added Maestro alerts: MaestroGRPCSourceClientExcessConnections, MaestroRESTAPIErrorRate, MaestroGRPCServerErrorRate, MaestroSpecControllerReconcileErrors
  • removed the mise alert from the non-msft alerts
  • increased the PrometheusOperatorRejectedResources "for" duration from 5m to 20m to reduce noise during infra provisioning
  • known issues now take up less screen real estate in the gather-observability rendering
  • registered known issues for the backend external auth controllers
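
For readers unfamiliar with the "for" setting, the change to PrometheusOperatorRejectedResources can be sketched roughly as follows. This is a minimal illustration, not the rule from this PR: the group name, the severity label, and the expression (modeled on the upstream kube-prometheus rule) are assumptions.

```yaml
# Illustrative sketch only. The expression is modeled on the upstream
# kube-prometheus rule and is an assumption, not this PR's actual rule.
groups:
  - name: prometheus-operator   # hypothetical group name
    rules:
      - alert: PrometheusOperatorRejectedResources
        expr: |
          min_over_time(prometheus_operator_managed_resources{state="rejected"}[5m]) > 0
        # Previously 5m. The condition must now hold for 20 minutes before
        # the alert fires, so transient rejections during infra provisioning
        # can clear without paging anyone.
        for: 20m
        labels:
          severity: warning     # assumed severity
```

The trade-off is that a genuine, persistent rejection is also reported up to 15 minutes later than before.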

Why

Testing

Special notes for your reviewer

openshift-ci bot requested review from janboll and miquelsi on April 14, 2026 10:50
@openshift-ci

openshift-ci bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: geoberle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

geoberle force-pushed the maestro-alerts branch 2 times, most recently from 1b9471b to 6277628 on April 14, 2026 13:53
@openshift-ci

openshift-ci bot commented Apr 14, 2026

@geoberle: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-parallel | 2ef9aad | link | true | /test e2e-parallel |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@geoberle
Collaborator Author

/test e2e-parallel
