Add node debug tool with tests by harche · Pull Request #38 · openshift/openshift-mcp-server

harche · 2025-10-01T19:23:35Z

Tested with claude code,

> what is the version of cgroup the node harpatilcluster000a7-m9vgd-worker-a-6jz6f is using? 

⏺ I'll check the cgroup version for the specified node.

⏺ kubernetes-mcp-server - Nodes: Debug Exec (MCP)(node: "harpatilcluster000a7-m9vgd-worker-a-6jz6f",
                                                 command: ["stat","-fc","%T","/sys/fs/cgroup/"])
  ⎿  cgroup2fs                                                                                       

⏺ cgroup v2

Summary by CodeRabbit

New Features
- Added an OpenShift toolset with tools for running node-level debug commands and planning must-gather collection.
Documentation
- Updated README and configuration docs to document the OpenShift toolset and its tool usage.
Chores
- OpenShift toolset enabled by default in configuration.
Tests
- Added end-to-end and unit tests covering node debug execution, helpers, and tool input validation.

harche · 2025-10-01T19:34:48Z

/hold

harche · 2025-10-01T19:39:23Z

/hold cancel

harche · 2025-10-01T22:56:10Z

/hold for fixing CI issues.

ardaguclu · 2025-10-02T04:02:51Z

I'll defer the review to;

@manusa @Cali0707 @matzew

harche · 2025-10-02T11:26:41Z

Failiures in the linter and security jobs are not related to the changes in this PR. The linter failures are getting addressed in #39, while security failures fixes are in #40

Cali0707

Thanks for working on this @harche

The code looks good overall, left a few comments throughout

I'll test this on an OpenShift cluster tmrw

Cali0707 · 2025-10-06T19:49:18Z

pkg/toolsets/core/nodes.go

+	internalk8s "github.com/containers/kubernetes-mcp-server/pkg/kubernetes"
+)
+
+func initNodes(_ internalk8s.Openshift) []api.ServerTool {


Nit: maybe we can drop the internalk8s.Openshift parameter here since we don't need it? For the initXYZ methods, there's no requirement on this parameter existing - it seems to be present in only some of them

Suggested change

func initNodes(_ internalk8s.Openshift) []api.ServerTool {

func initNodes() []api.ServerTool {

Addressed, thanks.

Cali0707 · 2025-10-06T19:56:41Z

pkg/kubernetes/nodes.go

+//
+// When namespace is empty, the configured namespace (or "default" if none) is used. When image is empty the
+// default debug image is used. Timeout controls how long we wait for the pod to complete.
+func (k *Kubernetes) NodesDebugExec(


Would it be possible to split this function up a little bit? IMO it is getting quite large and is responsible for too much

Maybe we can create functions that:

create the debug pod

poll for debug completion

Retrieve the logs

Addressed, thanks.

Cali0707 · 2025-10-06T20:00:10Z

pkg/kubernetes/nodes.go

+	// nodeDebugContainerName is the name used for the debug container, matching oc debug defaults.
+	nodeDebugContainerName = "debug"
+	// defaultNodeDebugTimeout is the maximum time to wait for the debug pod to finish executing.
+	defaultNodeDebugTimeout = 5 * time.Minute


We may want to lower this timeout as by default this will significantly exceed the client tool call connection timeout: https://github.com/modelcontextprotocol/typescript-sdk/blob/e0de0829019a4eab7af29c05f9a7ec13364f121e/src/shared/protocol.ts#L60

We probably also want to add support for progress notifications to be sent to the client for longer running tool calls like this one (cc @mrunalp @ardaguclu @matzew @manusa )

Reduced to 1 min, thanks.

Cali0707 · 2025-10-06T20:01:00Z

pkg/kubernetes/nodes.go

+		grace := int64(0)
+		_ = podsClient.Delete(deleteCtx, created.Name, metav1.DeleteOptions{GracePeriodSeconds: &grace})


nit: let's use ptr.To here like we do elsewhere

The refactored file at pkg/ocp/nodes_debug.go, now uses ptr.To. Thanks.

matzew · 2025-10-07T05:45:11Z

pkg/ocp/nodes_debug.go

+	defaultNodeDebugTimeout = 5 * time.Minute
+)
+
+// NodesDebugExec mimics `oc debug node/<name> -- <command...>` by creating a privileged pod on the target


Since this is ocp specific;
would it make sense, to group this functionality into some pkg/ocp package?

File pkg/ocp/nodes_debug.go is part of ocp package.

Hold on, I will move everything this PR adds from pkg/kubernetes to pkg/ocp so future rebasing avoids the conflicts.

Cali0707

@harche would it be possible to improve the error messages we provide?

When running this server with claude code claude was able to call the tool, but it frequently got errors such as:

⏺ ocp-debug - Nodes: Debug Exec (MCP)(node: "ip-10-0-112-253.us-east-2.compute.internal", command: ["systemctl","status","kubelet"])
  ⎿  Error: command exited with code 1 (Error)

I'm not sure if there is a way to get more info about what went wrong, but with the current lack of error messages it was hard for the agent to figure out what went wrong and how to fix it

ardaguclu · 2025-10-08T04:06:16Z

pkg/kubernetes/kubernetes.go

 type Kubernetes struct {
-	manager *Manager
+	manager          *Manager
+	podClientFactory func(namespace string) (corev1client.PodInterface, error)


Wouldn't these changes in here cause the divergence from upstream?. I think, in this repository we shouldn't allow any changes touching these packages such as kubernetes/, mcp/, etc.

My suggestion is to add first in upstream and simply use it in downstream. Downstream repository should touch only ocp/ directory.

yes, working on it, #38 (comment)

harche · 2025-10-15T19:42:55Z

/test test

Cali0707 · 2025-10-15T19:50:44Z

pkg/toolsets/core/toolset.go

 	return slices.Concat(
 		initEvents(),
 		initNamespaces(o),


IMO - let's not add any openshift-specific tools to the core toolset. I'm worried this may lead to conflicts in the future if we make any upstream changes.

Instead, let's create a new openshift specific toolset (maybe openshift-core) that will hold all the openshift only tools

@Cali0707 thanks for that feedback. Created new pkg/toolsets/openshift/ package with the openshift-core toolset, also moved nodes_debug_exec from core to openshift-core

harche · 2025-10-22T19:39:54Z

/retest

mrunalp · 2025-10-22T20:40:58Z

pkg/toolsets/openshift/toolset.go

+var _ api.Toolset = (*Toolset)(nil)
+
+func (t *Toolset) GetName() string {
+	return "openshift-core"


Let's rename this to be more specific like openshift-node-debug

how about openshift-debug? I was thinking of adding oc rsync kind of tool too, which may not be limited to just node related debugging.

That tool in turn can be used in future for copying pprofs from the nodes.

harche · 2025-10-22T20:50:36Z

/hold for containers#391

matzew · 2025-10-29T15:28:34Z

@harche the containers#391 is merged now

wanna rebase ?

harche · 2025-10-29T18:39:56Z

@harche the containers#391 is merged now

wanna rebase ?

Sure, I was planning to get back to this after openshift shiftweek (i.e. from upcoming Monday)

harche · 2025-11-04T16:56:11Z

I am looking into the rebasing it, need to resolve this merge conflict.

harche · 2025-11-04T18:56:12Z

@harche the containers#391 is merged now

wanna rebase ?

Done. Thanks.

cc @matzew @manusa

/hold cancel

manusa · 2025-11-25T16:12:35Z

pkg/config/config_test.go

+		s.Require().Lenf(config.Toolsets, 4, "Expected 4 toolsets, got %d", len(config.Toolsets))
+		for _, toolset := range []string{"core", "config", "helm", "openshift-core"} {


I'll need to look into this upstream

Do you think we should wait here for these upstream changes you are looking at?

I added a skip to this test since it relies on default values that might be overridden downstream.

containers#517

matzew · 2026-01-08T12:03:12Z

@harche Mind updating the PR? Back in Nov there were some comments - not sure if all go solved now /cc @Cali0707

openshift-merge-robot · 2026-01-08T12:03:21Z

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

manusa · 2026-01-08T15:14:17Z

@harche Mind updating the PR? Back in Nov there were some comments - not sure if all go solved now /cc @Cali0707

The core/upstream tests should not need updating now.
If any upstream file (besides those designed to be modified downstream) needs updating, we should tackle that before adding the changes here.

matzew · 2026-03-16T15:26:36Z

pkg/config/config_default_overrides.go

 		// IMPORTANT: this file is used to override default config values in downstream builds.
-		// This is intentionally left blank.
+		// OpenShift-specific defaults: add openshift-core toolset
+		Toolsets: []string{"core", "config", "helm", "openshift-core"},


we should perhaps now place that into the openshift toolset: https://github.com/openshift/openshift-mcp-server/blob/main/pkg/toolsets/openshift/toolset.go

thanks @matzew for the feedback.

coderabbitai · 2026-03-19T13:19:30Z

📝 Walkthrough

Walkthrough

Added an OpenShift toolset (enabled by default) with nodes_debug_exec and plan_mustgather; implemented node-debug execution (privileged debug pod lifecycle, polling, logs, error handling), tests and test helpers; updated documentation and default configuration to register the toolset.

Changes

Cohort / File(s)	Summary
Documentation `README.md`, `docs/configuration.md`	Added `openshift` toolset to "Available Toolsets" and documented `nodes_debug_exec` and `plan_mustgather` tool signatures and parameters.
Configuration & Registration `internal/tools/update-readme/main.go`, `pkg/config/config_default_overrides.go`	Blank-imported the openshift toolset for README generation and added `openshift` to default-enabled toolsets.
Node Debug Implementation `pkg/ocp/nodes_debug.go`, `pkg/ocp/nodes_debug_test.go`, `pkg/ocp/nodes_debug_testhelpers_test.go`	New implementation for node-level debug execution: pod creation (privileged, host mounts), polling for completion, image-pull failure detection, logs retrieval, deletion, result/error formatting, plus comprehensive unit tests and in-memory test helpers/fakes. Review for concurrency/timeouts, error wrapping, and pod spec security settings.
OpenShift Toolset Handler `pkg/toolsets/openshift/nodes.go`, `pkg/toolsets/openshift/nodes_test.go`, `pkg/toolsets/openshift/toolset.go`	Registered `nodes_debug_exec` tool with JSON schema and handler that validates/parses inputs (node, command, timeout), calls ocp NodesDebugExec, and returns tool-call results. Unit tests validate input validation behavior and handler responses.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as nodes_debug_exec Handler
    participant K8sClient as NodeDebugClient
    participant API as Kubernetes API
    participant Node as Target Node

    Client->>Handler: Invoke nodes_debug_exec(node, command, ...)
    Handler->>K8sClient: Validate inputs, resolve namespace/image/timeout
    Handler->>K8sClient: Create privileged debug pod on node
    K8sClient->>API: Create Pod
    API->>Node: Schedule debug pod

    loop Polling
        Handler->>K8sClient: Get pod status
        K8sClient->>API: Fetch pod
        API-->>Handler: Pod status
        alt Image Pull Error
            Handler->>Handler: Fail fast on ErrImagePull/ImagePullBackOff
        end
    end

    Handler->>K8sClient: Retrieve container logs (30s bounded)
    K8sClient->>API: Get logs
    API-->>Handler: Container output

    Handler->>K8sClient: Delete debug pod
    K8sClient->>API: Delete Pod

    alt Container exit code == 0
        Handler-->>Client: Return logs as output
    else Container exit code != 0
        Handler-->>Client: Return error with exit code + logs
    else Termination info missing
        Handler-->>Client: Return derived error from pod/container status
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I dug a little pod beneath the node so deep,
Commands hopped in and made the logs leap,
I fetched the traces, scrubbed them clean and bright,
Now clusters sing and debuggers delight — hop, delight! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding a node debug tool with accompanying tests. It is concise, clear, and directly reflects the primary content of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can generate a title for your PR based on the changes.

Add @coderabbitai placeholder anywhere in the title of your PR and CodeRabbit will replace it with a title based on the changes in the PR. You can change the placeholder by changing the reviews.auto_title_placeholder setting.

openshift-ci · 2026-03-19T13:21:15Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harche
Once this PR has been reviewed and has the lgtm label, please assign cali0707 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

pkg/ocp/nodes_debug.go (1)

111-113: ⚠️ Potential issue | 🟠 Major

Timeout path still drops logs and last pod status.

Line 112 returns on poll errors before any log collection, and Line 258 throws away lastPod / waitMsg on timeout. A command that produces output and then hangs—or a pod that stays pending because the node/scheduling is wrong—will currently collapse to a generic timeout with none of the useful diagnostics.

💡 Suggested direction

 	terminated, lastPod, waitMsg, err := pollForCompletion(ctx, k, ns, created.Name, timeout)
 	if err != nil {
-		return "", err
+		logs, _ := retrieveLogs(context.Background(), k, ns, created.Name)
+		if logs != "" {
+			return "", fmt.Errorf("%w\nOutput:\n%s", err, logs)
+		}
+		if waitMsg != "" {
+			return "", fmt.Errorf("%w\nLast status: %s", err, waitMsg)
+		}
+		return "", err
 	}

+		if current.Status.Message != "" {
+			waitMsg = current.Status.Message
+		} else if current.Status.Reason != "" {
+			waitMsg = current.Status.Reason
+		}
+
 		// Wait for the next tick interval before checking pod status again, or timeout if context is done.
 		select {
 		case <-pollCtx.Done():
-			return nil, nil, "", fmt.Errorf("timed out waiting for debug pod %s to complete: %w", podName, pollCtx.Err())
+			return nil, lastPod, waitMsg, fmt.Errorf("timed out waiting for debug pod %s to complete: %w", podName, pollCtx.Err())
 		case <-ticker.C:
 		}

Also applies to: 230-259

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/nodes_debug.go` around lines 111 - 113, The code currently returns
immediately on errors from pollForCompletion (variables terminated, lastPod,
waitMsg) which discards useful diagnostics; modify the error/timeout handling in
the functions using pollForCompletion so that when pollForCompletion returns an
error (including timeout) you first gather logs and status from lastPod and
include waitMsg and lastPod status/output in the returned error or in the logged
output before returning; update both the early-return site around
pollForCompletion and the timeout path around lines 230-259 to call the existing
log/status collection helpers (or add a small helper) to capture pod logs and
status and attach them to the error message instead of throwing them away.

🧹 Nitpick comments (1)

pkg/toolsets/openshift/nodes_test.go (1)

22-45: Consider adding test coverage for additional validation cases.

The current tests cover missing node and invalid command type. Consider adding tests for:

Missing command argument
Empty command array

These edge cases are handled in toStringSlice (lines 108-117 in nodes.go) but aren't exercised by these tests.

💡 Additional test cases

+	s.Run("missing command", func() {
+		params := api.ToolHandlerParams{
+			ToolCallRequest: staticRequest{args: map[string]any{
+				"node": "worker-0",
+			}},
+		}
+		result, err := nodesDebugExec(params)
+		s.Require().NoError(err)
+		s.Require().NotNil(result.Error)
+		s.Contains(result.Error.Error(), "command is required")
+	})
+
+	s.Run("empty command array", func() {
+		params := api.ToolHandlerParams{
+			ToolCallRequest: staticRequest{args: map[string]any{
+				"node":    "worker-0",
+				"command": []interface{}{},
+			}},
+		}
+		result, err := nodesDebugExec(params)
+		s.Require().NoError(err)
+		s.Require().NotNil(result.Error)
+		s.Contains(result.Error.Error(), "command array cannot be empty")
+	})

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/openshift/nodes_test.go` around lines 22 - 45, Add two subtests
inside NodesHandlerSuite.TestValidatesInput to exercise the toStringSlice
validation: one "missing command" where params.ToolCallRequest has "node" set
but no "command" key, and one "empty command array" where "command" is an empty
[]any{} or []string{}. Call nodesDebugExec(params) in each, require no call
error, require result.Error is not nil, and assert the returned error message
matches the toStringSlice error for missing command and for empty command array
respectively; reference the existing TestValidatesInput structure and the
toStringSlice behavior in nodes.go when naming/asserting the expected messages.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/ocp/nodes_debug.go`:
- Around line 152-156: The label value construction for
kubernetes.AppKubernetesName uses sanitizedNode which can end with a trailing
hyphen after truncation (via sanitizeForName/substring), producing an invalid
label; update the code that builds sanitizedNode (and the other occurrence
around lines 314-332) to re-trim any trailing hyphens (and whitespace) after
truncation—i.e., run a final trim that removes '-' characters from the end of
sanitizedNode before using it in fmt.Sprintf("node-debug-%s", sanitizedNode) so
the generated label never ends with a hyphen.
- Around line 267-275: The current retrieveLogs function calls strings.TrimSpace
which strips leading indentation and blank lines and corrupts exec-style debug
output; change the return to preserve verbatim logs by removing the
strings.TrimSpace call and return the raw logs string from PodsLog (i.e., in
retrieveLogs replace the trimmed return with returning logs unchanged), ensuring
NodeDebugContainerName and the PodsLog invocation remain unchanged.

---

Duplicate comments:
In `@pkg/ocp/nodes_debug.go`:
- Around line 111-113: The code currently returns immediately on errors from
pollForCompletion (variables terminated, lastPod, waitMsg) which discards useful
diagnostics; modify the error/timeout handling in the functions using
pollForCompletion so that when pollForCompletion returns an error (including
timeout) you first gather logs and status from lastPod and include waitMsg and
lastPod status/output in the returned error or in the logged output before
returning; update both the early-return site around pollForCompletion and the
timeout path around lines 230-259 to call the existing log/status collection
helpers (or add a small helper) to capture pod logs and status and attach them
to the error message instead of throwing them away.

---

Nitpick comments:
In `@pkg/toolsets/openshift/nodes_test.go`:
- Around line 22-45: Add two subtests inside
NodesHandlerSuite.TestValidatesInput to exercise the toStringSlice validation:
one "missing command" where params.ToolCallRequest has "node" set but no
"command" key, and one "empty command array" where "command" is an empty []any{}
or []string{}. Call nodesDebugExec(params) in each, require no call error,
require result.Error is not nil, and assert the returned error message matches
the toStringSlice error for missing command and for empty command array
respectively; reference the existing TestValidatesInput structure and the
toStringSlice behavior in nodes.go when naming/asserting the expected messages.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de697222-ba30-46fa-b7b6-7b17168b63ee

📥 Commits

Reviewing files that changed from the base of the PR and between 1103650 and 356f89f.

📒 Files selected for processing (10)

README.md
docs/configuration.md
internal/tools/update-readme/main.go
pkg/config/config_default_overrides.go
pkg/ocp/nodes_debug.go
pkg/ocp/nodes_debug_test.go
pkg/ocp/nodes_debug_testhelpers_test.go
pkg/toolsets/openshift/nodes.go
pkg/toolsets/openshift/nodes_test.go
pkg/toolsets/openshift/toolset.go

pkg/ocp/nodes_debug.go

coderabbitai · 2026-03-19T13:28:50Z

pkg/ocp/nodes_debug.go

+func retrieveLogs(ctx context.Context, k NodeDebugClient, namespace, podName string) (string, error) {
+	logCtx, logCancel := context.WithTimeout(ctx, 30*time.Second)
+	defer logCancel()
+	logs, logErr := k.PodsLog(logCtx, namespace, podName, NodeDebugContainerName, false, 0)
+	if logErr != nil {
+		return "", fmt.Errorf("failed to retrieve debug pod logs: %w", logErr)
+	}
+	return strings.TrimSpace(logs), nil
+}


⚠️ Potential issue | 🟡 Minor

Preserve command output verbatim here.

strings.TrimSpace removes leading indentation and blank lines, not just the trailing newline. For an exec-style debug tool, that's observable output corruption.

💡 Suggested fix

- return strings.TrimSpace(logs), nil + return strings.TrimRight(logs, "\r\n"), nil

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func retrieveLogs(ctx context.Context, k NodeDebugClient, namespace, podName string) (string, error) {

logCtx, logCancel := context.WithTimeout(ctx, 30*time.Second)

defer logCancel()

logs, logErr := k.PodsLog(logCtx, namespace, podName, NodeDebugContainerName, false, 0)

if logErr != nil {

return "", fmt.Errorf("failed to retrieve debug pod logs: %w", logErr)

}

return strings.TrimSpace(logs), nil

}

func retrieveLogs(ctx context.Context, k NodeDebugClient, namespace, podName string) (string, error) {

logCtx, logCancel := context.WithTimeout(ctx, 30*time.Second)

defer logCancel()

logs, logErr := k.PodsLog(logCtx, namespace, podName, NodeDebugContainerName, false, 0)

if logErr != nil {

return "", fmt.Errorf("failed to retrieve debug pod logs: %w", logErr)

}

return strings.TrimRight(logs, "\r\n"), nil

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/ocp/nodes_debug.go` around lines 267 - 275, The current retrieveLogs function calls strings.TrimSpace which strips leading indentation and blank lines and corrupts exec-style debug output; change the return to preserve verbatim logs by removing the strings.TrimSpace call and return the raw logs string from PodsLog (i.e., in retrieveLogs replace the trimmed return with returning logs unchanged), ensuring NodeDebugContainerName and the PodsLog invocation remain unchanged.

Add a privileged debug pod tool that mimics `oc debug node/<name>` for running commands on OpenShift nodes. The tool creates a temporary pod with the UBI9 toolbox image, executes the command, collects output, and cleans up automatically. Key design decisions based on PR review feedback: - All new code lives in pkg/ocp/ and pkg/toolsets/openshift/ only, no upstream packages (pkg/kubernetes/, pkg/mcp/) are modified - Tool is in the "openshift" toolset (not "core") to avoid conflicts - NodeDebugClient interface decouples from concrete k8s client for testability - Error messages include command output/logs for better diagnostics - Default timeout is 1 minute (within MCP client timeout limits) - Gated behind ReadOnly config (read_only=false required to use) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (1)

pkg/ocp/nodes_debug.go (1)
315-334: ⚠️ Potential issue | 🟡 Minor

Trailing hyphen may remain after truncation in sanitizeForName, causing invalid label values.

The sanitizeForName function trims leading/trailing hyphens at line 326, but then truncates at lines 330-331 without re-trimming. If the 40th character is a hyphen, the returned value will have a trailing hyphen. This creates invalid Kubernetes label values (used at line 156 for kubernetes.AppKubernetesName), since label values must match the pattern ([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9].

While the pod name is correctly handled (line 145 re-trims after its truncation), the label value uses sanitizedNode which may still contain a trailing hyphen from sanitizeForName.
🛠️ Proposed fix
 	if len(sanitized) > 40 {
-		sanitized = sanitized[:40]
+		sanitized = strings.TrimRight(sanitized[:40], "-")
+	}
+	if sanitized == "" {
+		sanitized = "node"
 	}
 	return sanitized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/nodes_debug.go` around lines 315 - 334, sanitizeForName can return a
trailing hyphen when truncation cuts at a hyphen, producing invalid Kubernetes
label values (used for kubernetes.AppKubernetesName). Fix sanitizeForName by
re-trimming/truncation-sanitizing after the length cut: after you apply
sanitized = sanitized[:40] ensure you trim leading/trailing hyphens (e.g.,
strings.Trim(sanitized, "-")) and if that yields an empty string reset to
"node"; additionally ensure the final character is alphanumeric (remove trailing
non-alnum like '-') so the returned value always matches label rules.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/ocp/nodes_debug.go`:
- Around line 315-334: sanitizeForName can return a trailing hyphen when
truncation cuts at a hyphen, producing invalid Kubernetes label values (used for
kubernetes.AppKubernetesName). Fix sanitizeForName by
re-trimming/truncation-sanitizing after the length cut: after you apply
sanitized = sanitized[:40] ensure you trim leading/trailing hyphens (e.g.,
strings.Trim(sanitized, "-")) and if that yields an empty string reset to
"node"; additionally ensure the final character is alphanumeric (remove trailing
non-alnum like '-') so the returned value always matches label rules.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7c9ff437-5500-46a9-bfe5-9726b9ff099c

📥 Commits

Reviewing files that changed from the base of the PR and between 356f89f and b54f2bc.

📒 Files selected for processing (10)

README.md
docs/configuration.md
internal/tools/update-readme/main.go
pkg/config/config_default_overrides.go
pkg/ocp/nodes_debug.go
pkg/ocp/nodes_debug_test.go
pkg/ocp/nodes_debug_testhelpers_test.go
pkg/toolsets/openshift/nodes.go
pkg/toolsets/openshift/nodes_test.go
pkg/toolsets/openshift/toolset.go

✅ Files skipped from review due to trivial changes (2)

pkg/config/config_default_overrides.go
docs/configuration.md

🚧 Files skipped from review as they are similar to previous changes (4)

pkg/toolsets/openshift/toolset.go
pkg/toolsets/openshift/nodes.go
pkg/toolsets/openshift/nodes_test.go
README.md

openshift-ci · 2026-03-19T14:03:09Z

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

harche mentioned this pull request Oct 1, 2025

[RFE] node-specific tools #36

Closed

openshift-ci bot requested a review from ardaguclu October 1, 2025 19:24

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2025

harche force-pushed the main branch from dacc89d to f8e4a08 Compare October 1, 2025 19:38

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2025

harche changed the title ~~Add node debug tool with tests~~ WIP: Add node debug tool with tests Oct 2, 2025

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2025

harche force-pushed the main branch from f8e4a08 to fb7eed5 Compare October 2, 2025 11:25

harche changed the title ~~WIP: Add node debug tool with tests~~ Add node debug tool with tests Oct 2, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2025

Cali0707 reviewed Oct 6, 2025

View reviewed changes

matzew reviewed Oct 7, 2025

View reviewed changes

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2025

harche force-pushed the main branch from fb7eed5 to fc61e2e Compare October 7, 2025 15:46

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2025

harche force-pushed the main branch from fc61e2e to 9aef094 Compare October 7, 2025 15:55

Cali0707 reviewed Oct 7, 2025

View reviewed changes

ardaguclu reviewed Oct 8, 2025

View reviewed changes

harche force-pushed the main branch from 9aef094 to 9183526 Compare October 8, 2025 12:40

swghosh mentioned this pull request Oct 10, 2025

MG-34: Add oc cli like must-gather collection with ServerPrompt #51

Merged

harche force-pushed the main branch 2 times, most recently from c99af16 to 694501c Compare October 15, 2025 19:35

Cali0707 reviewed Oct 15, 2025

View reviewed changes

harche force-pushed the main branch from cd26a4d to be94848 Compare October 22, 2025 19:47

mrunalp reviewed Oct 22, 2025

View reviewed changes

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 22, 2025

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 29, 2025

harche force-pushed the main branch from be94848 to dbe51ab Compare November 4, 2025 18:52

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2025

swghosh mentioned this pull request Nov 5, 2025

proof: Add openshift-core toolset with node and must-gather tools #69

Closed

2 tasks

manusa reviewed Nov 25, 2025

View reviewed changes

manusa mentioned this pull request Nov 27, 2025

test(config): skip test if default configuration overrides are present containers/kubernetes-mcp-server#517

Merged

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026

matzew reviewed Mar 16, 2026

View reviewed changes

harche force-pushed the main branch from dbe51ab to 356f89f Compare March 19, 2026 13:18

openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2026

coderabbitai bot reviewed Mar 19, 2026

View reviewed changes

harche force-pushed the main branch from 356f89f to b54f2bc Compare March 19, 2026 13:54

coderabbitai bot reviewed Mar 19, 2026

View reviewed changes

	func initNodes(_ internalk8s.Openshift) []api.ServerTool {
	func initNodes() []api.ServerTool {

		grace := int64(0)
		_ = podsClient.Delete(deleteCtx, created.Name, metav1.DeleteOptions{GracePeriodSeconds: &grace})

		s.Require().Lenf(config.Toolsets, 4, "Expected 4 toolsets, got %d", len(config.Toolsets))
		for _, toolset := range []string{"core", "config", "helm", "openshift-core"} {

Conversation

harche commented Oct 1, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

harche commented Oct 1, 2025

Uh oh!

harche commented Oct 1, 2025

Uh oh!

harche commented Oct 1, 2025

Uh oh!

ardaguclu commented Oct 2, 2025

Uh oh!

harche commented Oct 2, 2025

Uh oh!

Cali0707 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Cali0707 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche commented Oct 15, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche commented Oct 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche commented Oct 22, 2025

Uh oh!

matzew commented Oct 29, 2025

Uh oh!

harche commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

harche commented Nov 4, 2025

Uh oh!

harche commented Nov 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harche commented Oct 1, 2025 •

edited by coderabbitai bot

Loading

harche Oct 7, 2025 •

edited

Loading

harche Oct 22, 2025 •

edited

Loading

harche commented Oct 29, 2025 •

edited

Loading

manusa Nov 27, 2025 •

edited

Loading

coderabbitai bot commented Mar 19, 2026 •

edited

Loading