Use rhobs/obs-mcp toolset for Prometheus/Alertmanager (replaces observability mcp) #124
base: main
Commits: 56bc77d, d9b8dca, 15f8c13, 1f038b4, 91b21b5, 7a04e1c, 172b93b, 91fd9a1, dbba4b8
```diff
@@ -260,15 +260,15 @@ The following sets of tools are available (toolsets marked with ✓ in the Defau
 
 <!-- AVAILABLE-TOOLSETS-START -->
 
-| Toolset | Description | Default |
-|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
-| config | View and manage the current local Kubernetes configuration (kubeconfig) | ✓ |
-| core | Most common tools for Kubernetes management (Pods, Generic Resources, Events, etc.) | ✓ |
-| helm | Tools for managing Helm charts and releases | |
-| kcp | Manage kcp workspaces and multi-tenancy features | |
-| kubevirt | KubeVirt virtual machine management tools, check the [KubeVirt documentation](https://github.com/containers/kubernetes-mcp-server/blob/main/docs/kubevirt.md) for more details. | |
-| observability | Cluster observability tools for querying Prometheus metrics and Alertmanager alerts | |
-| ossm | Most common tools for managing OSSM, check the [OSSM documentation](https://github.com/openshift/openshift-mcp-server/blob/main/docs/OSSM.md) for more details. | |
+| Toolset | Description | Default |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
+| config | View and manage the current local Kubernetes configuration (kubeconfig) | ✓ |
+| core | Most common tools for Kubernetes management (Pods, Generic Resources, Events, etc.) | ✓ |
+| helm | Tools for managing Helm charts and releases | |
+| kcp | Manage kcp workspaces and multi-tenancy features | |
+| kubevirt | KubeVirt virtual machine management tools, check the [KubeVirt documentation](https://github.com/containers/kubernetes-mcp-server/blob/main/docs/kubevirt.md) for more details. | |
+| obs-mcp | Toolset for querying Prometheus and Alertmanager endpoints in efficient ways. | |
+| ossm | Most common tools for managing OSSM, check the [OSSM documentation](https://github.com/openshift/openshift-mcp-server/blob/main/docs/OSSM.md) for more details. | |
 
 <!-- AVAILABLE-TOOLSETS-END -->
 
```
```diff
@@ -453,48 +453,137 @@ In case multi-cluster support is enabled (default) and you have access to multip
 
 <details>
 
-<summary>observability</summary>
-
-- **prometheus_query** - Execute an instant PromQL query against the cluster's Thanos Querier.
-Returns current metric values at the specified time (or current time if not specified).
-Use this for point-in-time metric values.
-
-Common queries:
-- up{job="apiserver"} - Check if API server is up
-- sum by(namespace) (container_memory_usage_bytes) - Memory usage by namespace
-- rate(container_cpu_usage_seconds_total[5m]) - CPU usage rate
-- kube_pod_status_phase{phase="Running"} - Running pods count
-- `query` (`string`) **(required)** - PromQL query string (e.g., 'up{job="apiserver"}', 'sum by(namespace) (container_memory_usage_bytes)')
-- `time` (`string`) - Optional evaluation timestamp. Accepts RFC3339 format (e.g., '2024-01-01T12:00:00Z') or Unix timestamp. If not provided, uses current time.
-
-- **prometheus_query_range** - Execute a range PromQL query against the cluster's Thanos Querier.
-Returns metric values over a time range with specified resolution.
-Use this for time-series data, trends, and historical analysis.
-
-Supports relative times:
-- 'now' for current time
-- '-10m', '-1h', '-1d' for relative past times
-
-Example: Get CPU usage over the last hour with 1-minute resolution.
-- `end` (`string`) **(required)** - End time. Accepts RFC3339 timestamp, Unix timestamp, 'now', or relative time
-- `query` (`string`) **(required)** - PromQL query string (e.g., 'rate(container_cpu_usage_seconds_total[5m])')
-- `start` (`string`) **(required)** - Start time. Accepts RFC3339 timestamp (e.g., '2024-01-01T12:00:00Z'), Unix timestamp, or relative time (e.g., '-1h', '-30m', '-1d')
-- `step` (`string`) - Query resolution step width (e.g., '15s', '1m', '5m'). Determines the granularity of returned data points. Default: '1m'
-
-- **alertmanager_alerts** - Query active and pending alerts from the cluster's Alertmanager.
-Useful for monitoring cluster health, detecting issues, and incident response.
-
-Returns alerts with their labels, annotations, status, and timing information.
-Can filter by active/silenced/inhibited state.
-
-Common use cases:
-- Check for critical alerts affecting the cluster
-- Monitor for specific alert types (e.g., high CPU, disk pressure)
-- Verify alert silences are working correctly
-- `active` (`boolean`) - Filter for active (firing) alerts. Default: true
-- `filter` (`string`) - Optional filter using Alertmanager filter syntax. Examples: 'alertname=Watchdog', 'severity=critical', 'namespace=openshift-monitoring'
-- `inhibited` (`boolean`) - Include inhibited alerts in the results. Default: false
-- `silenced` (`boolean`) - Include silenced alerts in the results. Default: false
```
```diff
+<summary>obs-mcp</summary>
+
+- **list_metrics** - MANDATORY FIRST STEP: List all available metric names in Prometheus.
+
+YOU MUST CALL THIS TOOL BEFORE ANY OTHER QUERY TOOL
+
+This tool MUST be called first for EVERY observability question to:
+1. Discover what metrics actually exist in this environment
+2. Find the EXACT metric name to use in queries
+3. Avoid querying non-existent metrics
+4. The 'name_regex' parameter should always be provided, and be a best guess of what the metric would be named like.
+5. Do not use a blanket regex like .* or .+ in the 'name_regex' parameter. Use specific ones like kube.*, node.*, etc.
+
+NEVER skip this step. NEVER guess metric names. Metric names vary between environments.
+
+After calling this tool:
+1. Search the returned list for relevant metrics
+2. Use the EXACT metric name found in subsequent queries
+3. If no relevant metric exists, inform the user
+- `name_regex` (`string`) **(required)** - Regex pattern to filter metric names (e.g., 'http_.*', 'node_.*', 'kube.*'). This parameter is required. Don't pass in blanket regex.
+
```
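The `list_metrics` description forbids blanket patterns such as `.*` and `.+` for `name_regex`. A client could enforce that before calling the tool; the Go sketch below is a hypothetical guard (`validateNameRegex`, `filterMetricNames` and the sample names are illustrative, not obs-mcp code). It only rejects the literal catch-all patterns named in the description, so variants like `^.*$` would still pass. Anchoring the pattern as `^(?:...)$` mirrors how Prometheus applies `=~` matchers; whether obs-mcp anchors the same way is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blanketPatterns are the catch-all regexes the list_metrics description rules out.
var blanketPatterns = map[string]bool{"": true, ".*": true, ".+": true}

// validateNameRegex rejects blanket patterns and anything that is not valid RE2.
func validateNameRegex(pattern string) error {
	p := strings.TrimSpace(pattern)
	if blanketPatterns[p] {
		return fmt.Errorf("name_regex %q matches every metric; use a specific pattern such as kube.* or node.*", pattern)
	}
	if _, err := regexp.Compile(p); err != nil {
		return fmt.Errorf("name_regex %q does not compile: %w", pattern, err)
	}
	return nil
}

// filterMetricNames keeps the names fully matched by pattern (anchored, like PromQL =~).
func filterMetricNames(names []string, pattern string) ([]string, error) {
	if err := validateNameRegex(pattern); err != nil {
		return nil, err
	}
	re := regexp.MustCompile("^(?:" + strings.TrimSpace(pattern) + ")$")
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	names := []string{"kube_pod_info", "kube_node_status_condition", "node_cpu_seconds_total", "up"}
	for _, p := range []string{"kube_pod.*", "node_.*", ".*"} {
		matched, err := filterMetricNames(names, p)
		fmt.Println(p, matched, err)
	}
}
```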
```diff
+- **execute_instant_query** - Execute a PromQL instant query to get current/point-in-time values.
+
+PREREQUISITE: You MUST call list_metrics first to verify the metric exists
+
+WHEN TO USE:
+- Current state questions: "What is the current error rate?"
+- Point-in-time snapshots: "How many pods are running?"
+- Latest values: "Which pods are in Pending state?"
+
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+- `query` (`string`) **(required)** - PromQL query string using metric names verified via list_metrics
+- `time` (`string`) - Evaluation time as RFC3339 or Unix timestamp. Omit or use 'NOW' for current time.
+
```
```diff
+- **execute_range_query** - Execute a PromQL range query to get time-series data over a period.
+
+PREREQUISITE: You MUST call list_metrics first to verify the metric exists
+
+WHEN TO USE:
+- Trends over time: "What was CPU usage over the last hour?"
+- Rate calculations: "How many requests per second?"
+- Historical analysis: "Were there any restarts in the last 5 minutes?"
+
+TIME PARAMETERS:
+- 'duration': Look back from now (e.g., "5m", "1h", "24h")
+- 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
+
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+- `duration` (`string`) - Duration to look back from now (e.g., '1h', '30m', '1d', '2w') (optional)
+- `end` (`string`) - End time as RFC3339 or Unix timestamp (optional). Use `NOW` for current time.
+- `query` (`string`) **(required)** - PromQL query string using metric names verified via list_metrics
+- `start` (`string`) - Start time as RFC3339 or Unix timestamp (optional)
+- `step` (`string`) **(required)** - Query resolution step width (e.g., '15s', '1m', '1h'). Choose based on time range: shorter ranges use smaller steps.
+
```
```diff
+- **get_label_names** - Get all label names (dimensions) available for filtering a metric.
+
+WHEN TO USE (after calling list_metrics):
+- To discover how to filter metrics (by namespace, pod, service, etc.)
+- Before constructing label matchers in PromQL queries
+
+The 'metric' parameter should use a metric name from list_metrics output.
+- `end` (`string`) - End time for label discovery as RFC3339 or Unix timestamp (optional, defaults to now)
+- `metric` (`string`) - Metric name (from list_metrics) to get label names for. Leave empty for all metrics.
+- `start` (`string`) - Start time for label discovery as RFC3339 or Unix timestamp (optional, defaults to 1 hour ago)
+
```
```diff
+- **get_label_values** - Get all unique values for a specific label.
+
+WHEN TO USE (after calling list_metrics and get_label_names):
+- To find exact label values for filtering (namespace names, pod names, etc.)
+- To see what values exist before constructing queries
+
+The 'metric' parameter should use a metric name from list_metrics output.
+- `end` (`string`) - End time for label value discovery as RFC3339 or Unix timestamp (optional, defaults to now)
+- `label` (`string`) **(required)** - Label name (from get_label_names) to get values for
+- `metric` (`string`) - Metric name (from list_metrics) to scope the label values to. Leave empty for all metrics.
+- `start` (`string`) - Start time for label value discovery as RFC3339 or Unix timestamp (optional, defaults to 1 hour ago)
+
```
```diff
+- **get_series** - Get time series matching selectors and preview cardinality.
+
+WHEN TO USE (optional, after calling list_metrics):
+- To verify label filters match expected series before querying
+- To check cardinality and avoid slow queries
+
+CARDINALITY GUIDANCE:
+- <100 series: Safe
+- 100-1000: Usually fine
+- >1000: Add more label filters
+
+The selector should use metric names from list_metrics output.
+- `end` (`string`) - End time for series discovery as RFC3339 or Unix timestamp (optional, defaults to now)
+- `matches` (`string`) **(required)** - PromQL series selector using metric names from list_metrics
+- `start` (`string`) - Start time for series discovery as RFC3339 or Unix timestamp (optional, defaults to 1 hour ago)
+
```
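The cardinality guidance for `get_series` can be applied mechanically to a series response. Prometheus's `GET /api/v1/series` returns `data` as a list of label sets, one per series, so the series count is simply its length. The Go sketch below assumes a response in that shape (obs-mcp's own output format is not shown in this diff) and maps the count onto the three thresholds; treating exactly 1000 series as "usually fine" is our reading of "100-1000". `cardinalityAdvice`, `adviseFromSeries` and the sample body are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// seriesResponse is the /api/v1/series envelope: data holds one label set per series.
type seriesResponse struct {
	Status string              `json:"status"`
	Data   []map[string]string `json:"data"`
}

// cardinalityAdvice applies the thresholds from the get_series description.
func cardinalityAdvice(n int) string {
	switch {
	case n < 100:
		return "safe"
	case n <= 1000:
		return "usually fine"
	default:
		return "add more label filters"
	}
}

// adviseFromSeries counts the series in a response body and classifies the count.
func adviseFromSeries(body []byte) (int, string, error) {
	var r seriesResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, "", err
	}
	if r.Status != "success" {
		return 0, "", fmt.Errorf("unexpected status %q", r.Status)
	}
	return len(r.Data), cardinalityAdvice(len(r.Data)), nil
}

func main() {
	// Made-up sample response for a selector such as up{job="apiserver"}.
	body := []byte(`{"status":"success","data":[
		{"__name__":"up","job":"apiserver","instance":"10.0.0.1:6443"},
		{"__name__":"up","job":"apiserver","instance":"10.0.0.2:6443"}]}`)
	n, advice, err := adviseFromSeries(body)
	fmt.Println(n, advice, err)
	fmt.Println(cardinalityAdvice(5000))
}
```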
```diff
+- **get_alerts** - Get alerts from Alertmanager.
+
+WHEN TO USE:
+- START HERE when investigating issues: if the user asks about things breaking, errors, failures, outages, services being down, or anything going wrong in the cluster
+- When the user mentions a specific alert name - use this tool to get the alert's full labels (namespace, pod, service, etc.) which are essential for further investigation with other tools
+- To see currently firing alerts in the cluster
+- To check which alerts are active, silenced, or inhibited
+- To understand what's happening before diving into metrics or logs
+
+INVESTIGATION TIP: Alert labels often contain the exact identifiers (pod names, namespaces, job names) needed for targeted queries with prometheus tools.
+
+FILTERING:
+- Use 'active' to filter for only active alerts (not resolved)
+- Use 'silenced' to filter for silenced alerts
+- Use 'inhibited' to filter for inhibited alerts
+- Use 'filter' to apply label matchers (e.g., "alertname=HighCPU")
+- Use 'receiver' to filter alerts by receiver name
+
+All filter parameters are optional. Without filters, all alerts are returned.
+- `active` (`boolean`) - Filter for active alerts only (true/false, optional)
+- `filter` (`string`) - Label matchers to filter alerts (e.g., 'alertname=HighCPU', optional)
+- `inhibited` (`boolean`) - Filter for inhibited alerts only (true/false, optional)
+- `receiver` (`string`) - Receiver name to filter alerts (optional)
+- `silenced` (`boolean`) - Filter for silenced alerts only (true/false, optional)
+- `unprocessed` (`boolean`) - Filter for unprocessed alerts only (true/false, optional)
+
```
```diff
+- **get_silences** - Get silences from Alertmanager.
+
+WHEN TO USE:
+- To see which alerts are currently silenced
+- To check active, pending, or expired silences
+- To investigate why certain alerts are not firing notifications
+
+FILTERING:
+- Use 'filter' to apply label matchers to find specific silences
+
+Silences are used to temporarily mute alerts based on label matchers. This tool helps you understand what is currently silenced in your environment.
+- `filter` (`string`) - Label matchers to filter silences (e.g., 'alertname=HighCPU', optional)
```
```diff
 
 </details>
 
```

Comment on lines +456 to 587

Fix markdownlint warnings in this block. Static analysis flags it for MD005 (list indentation) and MD049 (emphasis style). Please normalize list indentation and emphasis style here to keep the docs lint-clean.

markdownlint-cli2 (0.21.0):

- [warning] 467-467: Emphasis style (MD049, emphasis-style)
- [warning] 467-467: Emphasis style (MD049, emphasis-style)
- [warning] 475-475: Emphasis style (MD049, emphasis-style)
- [warning] 475-475: Emphasis style (MD049, emphasis-style)
- [warning] 477-477: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 490-490: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 510-510: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 521-521: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 533-533: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 549-549: Inconsistent indentation for list items at the same level (MD005, list-indent)
- [warning] 575-575: Inconsistent indentation for list items at the same level (MD005, list-indent)
Update default indicators to match actual defaults.

This table currently leaves `helm` and `obs-mcp` unmarked as defaults, which conflicts with `pkg/config/config_test.go` expectations (line 1052 includes `obs-mcp` in defaults).