Conversation
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saswatamcode

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing
- YOU MUST CALL THIS TOOL BEFORE ANY OTHER QUERY TOOL
- This tool MUST be called first for EVERY observability question to:
  1. Discover what metrics actually exist in this environment
  2. Find the EXACT metric name to use in queries
  3. Avoid querying non-existent metrics
  4. The 'name_regex' parameter should always be provided, and should be a best guess of what the metric would be named.
  5. Do not use a blanket regex like `.*` or `.+` in the 'name_regex' parameter. Use specific ones like `kube.*`, `node.*`, etc.
- REGEX PATTERN GUIDANCE:
  - Prometheus metrics are typically prefixed (e.g., 'prometheus_tsdb_head_series', 'kube_pod_status_phase')
  - To match metrics CONTAINING a substring, use wildcards: '.*tsdb.*' matches 'prometheus_tsdb_head_series'
  - Without wildcards, the pattern matches EXACTLY: 'tsdb' only matches a metric literally named 'tsdb' (which rarely exists)
  - Common patterns: 'kube_pod.*' (pods), '.*memory.*' (memory-related), 'node_.*' (node metrics)
  - If you get empty results, try adding '.*' before/after your search term
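The anchored matching behaviour the guidance describes can be illustrated with a small sketch. This is a hypothetical stand-in for the tool, assuming the server anchors `name_regex` the way Prometheus's `=~` matcher does (the metric list and the `list_metrics` helper below are illustrative, not the PR's actual code):

```python
import re

METRICS = [
    "prometheus_tsdb_head_series",
    "kube_pod_status_phase",
    "node_memory_MemAvailable_bytes",
]

def list_metrics(name_regex: str) -> list[str]:
    # Prometheus anchors name regexes, so emulate that with fullmatch:
    # the pattern must cover the ENTIRE metric name, not a substring.
    pat = re.compile(name_regex)
    return [m for m in METRICS if pat.fullmatch(m)]

print(list_metrics("tsdb"))        # [] -- nothing is literally named "tsdb"
print(list_metrics(".*tsdb.*"))    # ['prometheus_tsdb_head_series']
print(list_metrics("kube_pod.*"))  # ['kube_pod_status_phase']
```

This is why a bare substring like `tsdb` returns empty results while `.*tsdb.*` does not.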
How common do you think it would be to want a prefix match vs. an inner-word match? We could also direct the LLM to use `^` to match a prefix, and do a "contains" search by default. Exact search would then be achieved with `^tsdb$`. It seems that would be hard to get wrong by mistake.
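The contains-by-default rule suggested here can be sketched in a few lines. This is only an illustration of the proposal, not the server's implementation; `matches` is a hypothetical helper:

```python
import re

def matches(name_regex: str, metric: str) -> bool:
    # re.search already gives "contains" semantics for un-anchored
    # patterns; explicit ^ and $ then opt back into prefix or exact
    # matching, as the review comment suggests.
    return re.search(name_regex, metric) is not None

assert matches("tsdb", "prometheus_tsdb_head_series")        # contains
assert matches("^kube_pod", "kube_pod_status_phase")         # prefix
assert not matches("^tsdb$", "prometheus_tsdb_head_series")  # exact only
```

Under this rule the failure mode flips: a bare substring finds too much rather than nothing, which is arguably easier for the model to recover from.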
Here is one scenario I tried on a kind cluster: *Are any containers in OOMKilled state?* There was no metric available for this in the cluster.
Tool Call History: "Are any containers in OOMKilled state?"
First Attempt
| # | Tool | Parameters | Result |
|---|---|---|---|
| 1 | list_metrics | name_regex: "oom\|OOM\|terminated" | [] |
| 2 | list_metrics | name_regex: "kube_pod_container_status" | [] |
| 3 | list_metrics | name_regex: "container_status\|container_state" | [] |
| 4 | list_metrics | name_regex: "kube_pod_container" | [] |
| 5 | list_metrics | name_regex: ".*reason.*" | kube_pod_container_status_waiting_reason, kube_pod_init_container_status_terminated_reason, kube_pod_status_reason |
| 6 | list_metrics | name_regex: ".*terminated.*" | kube_pod_container_status_terminated, kube_pod_init_container_status_terminated, kube_pod_init_container_status_terminated_reason, ... |
| 7 | list_metrics | name_regex: "kube_pod_container_status_last_terminated_reason" | [] |
| 8 | list_metrics | name_regex: "kube_pod_container_status_terminated" | kube_pod_container_status_terminated |
| 9 | get_label_names | metric: "kube_pod_container_status_terminated" | Error: Aborted |
| 10 | list_metrics | name_regex: "kube_pod_container_status" | [] |
| 11 | list_metrics | name_regex: ".*terminated_reason" | kube_pod_init_container_status_terminated_reason |
| 12 | get_label_names | metric: "kube_pod_init_container_status_terminated_reason" | __name__, container, instance, job, namespace, pod, reason, uid |
| 13 | get_label_names | metric: "kube_pod_container_status_terminated" | __name__, container, instance, job, namespace, pod, uid |
| 14 | list_metrics | name_regex: "last_terminated" | [] |
| 15 | get_label_values | metric: "kube_pod_init_container_status_terminated_reason", label: "reason" | ["Completed"] |
| 16 | execute_instant_query | query: kube_pod_init_container_status_terminated_reason{reason="OOMKilled"} | [] |
| 17 | execute_instant_query | query: kube_pod_container_status_terminated{namespace=~".+"} == 1 | [] |
| 18 | list_metrics | name_regex: "container.*memory\|memory.*container" | User rejected |
| 19 | list_metrics | name_regex: "node_vmstat_oom" | Error: Aborted |
Summary
- Total: 19 tool calls
- Outcome: No OOMKilled containers found
- Note: `kube_pod_container_status_last_terminated_reason` is not present in this cluster. Only `kube_pod_init_container_status_terminated_reason` exists (init containers), with `reason` values `["Completed"]`.
So for a scenario where the metric corresponding to a question doesn't exist, we may need to alert the user earlier rather than letting them keep trying and getting frustrated? Say:
**FAIL FAST:** If list_metrics returns no relevant metric after 2–3 attempts, STOP and inform the user immediately. Do not keep trying more patterns. Say something like: "The metric for [question] does not appear to exist in your cluster—it may not be scraped by your monitoring setup." This avoids frustrating the user with many failed tool calls.
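The fail-fast rule above could also be enforced on the client side rather than only in the prompt. A minimal sketch, assuming a hypothetical wrapper around the tool call (`list_metrics_tool` and the attempt budget are illustrative assumptions, not the PR's code):

```python
def query_with_fail_fast(patterns, list_metrics_tool, max_empty=3):
    """Try the model's ranked name_regex guesses; stop early on misses.

    `patterns` is the list of candidate regexes the LLM would try;
    `list_metrics_tool` stands in for the MCP list_metrics call.
    """
    for attempt, pattern in enumerate(patterns, start=1):
        result = list_metrics_tool(pattern)
        if result:
            return result  # found candidate metrics, proceed to query
        if attempt >= max_empty:
            break  # give up early instead of burning more tool calls
    return ("The metric for this question does not appear to exist in "
            "your cluster; it may not be scraped by your monitoring setup.")
```

Compared with the prompt-only version, this guarantees the cap is honored even when the model ignores the instruction.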
This can vary based on the model used, but maybe we should still have a description for fail-fast as well?
Another way could be to have a new parameter `attempt_number` that the server uses to return a "metric not found" hint after N attempts.
wdyt?
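The `attempt_number` idea above might look something like this on the server side. Everything here is hypothetical: the handler name, response shape, and threshold are assumptions for illustration only:

```python
def handle_list_metrics(name_regex, attempt_number=1, *, search=None, hint_after=3):
    # Hypothetical server-side handler: once the model reports it has
    # made `hint_after` attempts with no hits, attach an explicit hint
    # so it stops retrying and informs the user instead.
    search = search or (lambda regex: [])  # stand-in for the real metric search
    results = search(name_regex)
    response = {"metrics": results}
    if not results and attempt_number >= hint_after:
        response["hint"] = (
            f"No metric matched after {attempt_number} attempts; the metric "
            "you are looking for is probably not scraped in this cluster. "
            "Tell the user instead of trying more patterns."
        )
    return response
```

One caveat with this design: it relies on the model passing an honest `attempt_number`, so the server may additionally want to track attempts per session itself.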
LLMs seem to get confused about what regex to pass to this tool. Guidance in the prompt helps improve this a bit.
This improves it a bit


