openshift · lahinson · Jun 26, 2026
diff --git a/hosted_control_planes/hcp-observability.adoc b/hosted_control_planes/hcp-observability.adoc
@@ -9,10 +9,33 @@ toc::[]
 [role="_abstract"]
 You can gather metrics for {hcp} by configuring metrics sets. Monitoring dashboards are created in the management cluster for each hosted cluster that it manages.
 
+// Metrics sets
 include::modules/hosted-control-planes-metrics-sets.adoc[leveloffset=+1]
 
 include::modules/hosted-control-planes-monitoring-dashboard.adoc[leveloffset=+1]
 
+// Monitoring CP metrics from the hosted cluster
+include::modules/hcp-cp-metrics-overview.adoc[leveloffset=+1]
+
+include::modules/hcp-cp-metrics-enable.adoc[leveloffset=+2]
+
+include::modules/hcp-cp-query-metrics.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../operators/understanding/olm/olm-understanding-metrics.adoc#olm-metrics_olm-understanding-metrics[Exposed metrics]
+
+include::modules/hcp-cp-query-metrics-console.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../operators/understanding/olm/olm-understanding-metrics.adoc#olm-metrics_olm-understanding-metrics[Exposed metrics]
+
+include::modules/hcp-cp-metrics-dashboards.adoc[leveloffset=+2]
+
+// Connectivity metrics
 include::modules/hcp-connectivity-metrics.adoc[leveloffset=+1]
 
 include::modules/hcp-connect-data-plane.adoc[leveloffset=+2]

diff --git a/modules/hcp-cp-metrics-dashboards.adoc b/modules/hcp-cp-metrics-dashboards.adoc
@@ -0,0 +1,119 @@
+// Module included in the following assemblies:
+//
+// * hosted_control_planes/hcp-observability.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="hcp-cp-metrics-dashboards_{context}"]
+= Importing control plane health dashboards
+
+[role="_abstract"]
+You can import a sample Grafana dashboard that visualizes propagated control plane metrics in the hosted cluster web console. The dashboard covers API server, etcd, cluster Operators, scheduler, controller manager, and OLM health panels.
+
+.Prerequisites
+
+* Metrics forwarding is enabled and verified.
+
+* The HyperShift Operator uses `METRICS_SET=All` or `METRICS_SET=SRE` with a matching `sre-metric-set` `ConfigMap` object in the hosted control plane namespace. The default `Telemetry` metrics set forwards only a small metric subset and leaves most dashboard panels empty.
+
+* You have `cluster-admin` access to the hosted cluster.
+
+.Procedure
+
+. Download the sample dashboard JSON file by entering the following command:
++
+[source,terminal]
+----
+$ curl -LO https://raw.githubusercontent.com/openshift/hypershift/main/contrib/metrics/guest-control-plane-dashboard.json
+----
++
+[NOTE]
+====
+If you deploy user-workload Grafana through the Grafana Operator, import the dashboard JSON as a `GrafanaDashboard` custom resource instead of using a console `ConfigMap` object.
+====
+
+. Create a `ConfigMap` object from the dashboard file in the `openshift-config-managed` namespace by entering the following command:
++
+[source,terminal]
+----
+$ oc create configmap guest-control-plane-dashboard \
+  --from-file=guest-control-plane-dashboard.json=guest-control-plane-dashboard.json \
+  -n openshift-config-managed
+----
+
+. Label the `ConfigMap` object so the console discovers it as a dashboard by entering the following command:
++
+[source,terminal]
+----
+$ oc label configmap guest-control-plane-dashboard \
+  console.openshift.io/dashboard=true \
+  -n openshift-config-managed
+----
+
+. Log in to the web console and click *Observe* -> *Dashboards*.
+
+. Select the *Hosted Cluster Control Plane* dashboard.
+
+. Optional: If you use `METRICS_SET=SRE` on the HyperShift Operator, configure the Operator and create or update the `sre-metric-set` `ConfigMap` object in the hosted control plane namespace with relabel configurations that forward the dashboard metric names.
++
+.. Log in to the management cluster and set the metrics set on the HyperShift Operator by entering the following command:
++
+[source,terminal]
+----
+$ oc set env -n hypershift deployment/operator METRICS_SET=SRE
+----
+
+.. Create a file named `sre-metric-set.yaml` that defines the `ConfigMap` object. Replace `<hcp_namespace>` with your hosted control plane namespace:
++
+[source,yaml]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: sre-metric-set
+  namespace: <hcp_namespace>
+data:
+  config: |
+    kubeAPIServer:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(apiserver_request_total|apiserver_request_duration_seconds_bucket|apiserver_current_inflight_requests|apiserver_storage_objects)"
+    etcd:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(etcd_mvcc_db_total_size_in_bytes|etcd_mvcc_db_total_size_in_use_in_bytes|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_disk_backend_commit_duration_seconds_bucket|etcd_network_peer_round_trip_time_seconds_bucket|etcd_server_leader_changes_seen_total|etcd_server_has_leader)"
+    kubeControllerManager:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(workqueue_depth|workqueue_adds_total)"
+    kubeScheduler:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(scheduler_e2e_scheduling_duration_seconds_count|scheduler_schedule_attempts_total|scheduler_pending_pods)"
+    cvo:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(cluster_version|cluster_operator_up|cluster_operator_conditions)"
+    olm:
+      - action: keep
+        sourceLabels: ["__name__"]
+        regex: "(csv_succeeded)"
+----
++
+This configuration forwards the 20 metric names, across six components, that the dashboard uses.
++
+For the full `SRE` metrics set configuration, see "Configuring the SRE metrics set".
+
+.. Apply the `ConfigMap` object on the management cluster:
++
+[source,terminal]
+----
+$ oc apply -f sre-metric-set.yaml
+----
++
+The Control Plane Operator detects the `ConfigMap` object change and updates the `metrics-proxy` configuration.
+
+.Verification
+
+* The dashboard is displayed under *Observe* -> *Dashboards* in the web console.
+* Panels display data when the configured metrics set includes the required metric names.
+* The etcd database size panels show current use relative to the 8 GB limit.
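+
+If the dashboard does not appear, one way to check the `ConfigMap` object from the CLI is sketched below. The command assumes that you used the object name and namespace from this procedure, and it only lists the object with its labels.
+
+[source,terminal]
+----
+$ oc get configmap guest-control-plane-dashboard \
+  -n openshift-config-managed --show-labels
+----
+
+The labels in the output are expected to include `console.openshift.io/dashboard=true`. If the label is missing, repeat the labeling step.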
diff --git a/modules/hcp-cp-metrics-enable.adoc b/modules/hcp-cp-metrics-enable.adoc
@@ -0,0 +1,37 @@
+[#hcp-cp-metrics-enablement_{context}]
+// Module included in the following assemblies:
+//
+// * hosted_control_planes/hcp-observability.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="hcp-cp-metrics-enable_{context}"]
+= Enabling metrics forwarding
+
+[role="_abstract"]
+Enable metrics forwarding so that you can observe hosted control plane health from the hosted cluster monitoring stack.
+
+If you are a hosted cluster administrator without management cluster access, ask a platform administrator to enable metrics forwarding on your `HostedCluster` resource.
+
+.Prerequisites
+
+* You are logged in to the management cluster. Alternatively, you can use a `kubeconfig` file with access to the namespace that contains the `HostedCluster` resource. The `HostedCluster` object exists on the management cluster; annotating it from a hosted cluster `kubeconfig` file fails or targets the wrong resource.
+
+.Procedure
+
+. Add the `hypershift.openshift.io/enable-metrics-forwarding=true` annotation to the `HostedCluster` resource on the management cluster by entering the following command:
++
+[source,terminal]
+----
+$ oc annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
+  hypershift.openshift.io/enable-metrics-forwarding=true
+----
++
+Replace `<hosted_cluster_namespace>` with the namespace of the hosted cluster and `<hosted_cluster_name>` with the name of the hosted cluster.
+
+. To disable metrics forwarding, remove the annotation by entering the following command:
++
+[source,terminal]
+----
+$ oc annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
+  hypershift.openshift.io/enable-metrics-forwarding-
+----
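+
+As an optional check, and not a required step, you can print the annotations on the `HostedCluster` resource to see whether the forwarding annotation is present. This sketch uses the same placeholders as the preceding commands and reads all annotations rather than a single key.
+
+[source,terminal]
+----
+$ oc get hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
+  -o jsonpath='{.metadata.annotations}'
+----
+
+When forwarding is enabled, the output is expected to include the `hypershift.openshift.io/enable-metrics-forwarding` key with the value `true`. After you remove the annotation, the key is no longer listed.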
diff --git a/modules/hcp-cp-metrics-overview.adoc b/modules/hcp-cp-metrics-overview.adoc
@@ -0,0 +1,41 @@
+// Module included in the following assemblies:
+//
+// * hosted_control_planes/hcp-observability.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="hcp-cp-metrics-overview_{context}"]
+= Control plane metrics for {hcp}
+
+[role="_abstract"]
+You can observe hosted control plane health from the hosted cluster monitoring stack when metrics forwarding is enabled.
+
+With propagated metrics, you can diagnose API server, etcd, Operator, and scheduling issues from the hosted cluster web console and CLI without management cluster credentials.
+
+This capability is available in {product-title} 4.22 and later.
+
+Before {product-title} 4.22, control plane components for {hcp} ran on the management cluster and were invisible to the Cluster Monitoring Operator stack in the hosted cluster. Hosted cluster administrators could not query metrics such as `apiserver_request_total`, `etcd_mvcc_db_total_size_in_bytes`, or `csv_succeeded` from the hosted cluster Prometheus.
+
+With metrics forwarding, selected control plane metrics are propagated from the management cluster into the hosted cluster platform Prometheus.
+
+After you enable forwarding on the `HostedCluster` resource, you can use familiar PromQL queries, alerts, and dashboards.
+
+[#hcp-cp-metrics-architecture_{context}]
+== Metrics forwarding architecture
+
+When you enable metrics forwarding, {hcp} deploys components on both the management cluster and the hosted cluster.
+
+On the management cluster, in the hosted control plane namespace, the following components are deployed:
+
+* The `endpoint-resolver` deployment discovers pod IP addresses for control plane components.
+* The `metrics-proxy` deployment scrapes control plane pods, applies per-component metric filters, injects {product-title}-compatible labels, and serves aggregated metrics at paths, such as `/metrics/kube-apiserver` and `/metrics/etcd`, behind a TLS-passthrough Route.
+
+On the hosted cluster, in the `openshift-monitoring` namespace, the following components are deployed:
+
+* The `control-plane-metrics-forwarder` deployment runs HAProxy and TCP-proxies scrape requests to the management cluster `metrics-proxy` Route.
+* A `PodMonitor` named `control-plane-metrics-forwarder` configures platform Prometheus to scrape the forwarder using mutual TLS (mTLS).
+
+The data path is as follows:
+
+. Platform Prometheus in the hosted cluster discovers the `PodMonitor` and scrapes the `control-plane-metrics-forwarder` deployment.
+. The `control-plane-metrics-forwarder` deployment forwards the scrape over mTLS to the management cluster `metrics-proxy` Route.
+. The `metrics-proxy` deployment scrapes control plane pods through the `endpoint-resolver` deployment and returns filtered, relabeled metrics.
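+
+To see whether the components in this architecture exist in your environment, you can list them with commands such as the following. This is a minimal sketch: the resource names are taken from the preceding description, `<hcp_namespace>` is a placeholder for the hosted control plane namespace, and the names might differ between releases.
+
+On the management cluster:
+
+[source,terminal]
+----
+$ oc get deployment endpoint-resolver metrics-proxy -n <hcp_namespace>
+----
+
+On the hosted cluster:
+
+[source,terminal]
+----
+$ oc get deployment control-plane-metrics-forwarder -n openshift-monitoring
+$ oc get podmonitor control-plane-metrics-forwarder -n openshift-monitoring
+----
+
+The first command requires access to the management cluster. Hosted cluster administrators can run only the commands for the `openshift-monitoring` namespace.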
diff --git a/modules/hcp-cp-query-metrics-console.adoc b/modules/hcp-cp-query-metrics-console.adoc
@@ -0,0 +1,81 @@
+// Module included in the following assemblies:
+//
+// * hosted_control_planes/hcp-observability.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="hcp-cp-query-metrics-console_{context}"]
+= Querying control plane metrics in hosted clusters by using the web console
+
+[role="_abstract"]
+After you enable metrics forwarding, you can verify that control plane metrics are ingested and query them from the web console.
+
+Use the same PromQL patterns as standalone {product-title} clusters because the metrics-proxy injects compatible labels.
+
+.Prerequisites
+
+* Metrics forwarding is enabled on the `HostedCluster` resource.
+For enablement steps, see "Enabling metrics forwarding".
+
+* You have `cluster-admin` access to the hosted cluster.
+
+* At least two minutes have elapsed since you enabled forwarding so Prometheus can complete initial scrapes.
+
+.Procedure
+
+. Log in to the {product-title} web console for the hosted cluster.
+
+. Click *Observe* -> *Metrics*.
+
+. In the query field, enter a PromQL expression and run the query.
++
+Use the following examples:
++
+*Operator health*: list CSVs that are not in the `Succeeded` state:
++
+[source,plaintext]
+----
+csv_succeeded{job="olm-operator-metrics"} == 0
+----
++
+*API server request rate*:
++
+[source,plaintext]
+----
+sum(rate(apiserver_request_total{job="apiserver"}[5m])) by (verb, code)
+----
++
+*Scheduler activity* ({product-title} 4.22 and later with metrics forwarding enabled):
++
+[source,plaintext]
+----
+sum(rate(scheduler_schedule_attempts_total[5m])) by (result)
+----
++
+*Workload-oriented API saturation*:
++
+[source,plaintext]
+----
+apiserver_current_inflight_requests{job="apiserver"}
+----
++
+*Scheduling backlog*:
++
+[source,plaintext]
+----
+scheduler_pending_pods
+----
++
+*Controller workqueue depth*:
++
+[source,plaintext]
+----
+workqueue_depth{job="kube-controller-manager"}
+----
++
+For `csv_succeeded` and other OLM metrics, see "Exposed metrics".
+
+.Verification
+
+* Prometheus targets for `control-plane-metrics-forwarder` scrape pools report the `health: up` status.
+* PromQL queries for `apiserver_request_total{job="apiserver"}` return nonzero results.
+* Example queries in the web console return time series for enabled components.