Problem after everything set up for autonomous-k8s-engineer

I tried to set up autonomous-k8s-engineer and followed all of the provided steps, but whatever I do, I receive the error below. Can you suggest what the problem might be and where to look for a solution? :)

root@master:/home/kubernetes# kubectl logs -f -n kagent -l app.kubernetes.io/name=self-healing-agent
2026-01-30 13:42:01,665 - google_adk.google.adk.runners - WARNING - Event from an unknown agent: system, event id: 38dc1bd3-da14-4cd1-9dc7-74be4f6ee333
2026-01-30 13:42:01,665 - google_adk.google.adk.runners - WARNING - Event from an unknown agent: system, event id: 38dc1bd3-da14-4cd1-9dc7-74be4f6ee333
2026-01-30 13:42:01,672 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
13:42:01 - LiteLLM:INFO: utils.py:3258 -
LiteLLM completion() model= llama3:latest; provider = ollama_chat
2026-01-30 13:42:01,688 - LiteLLM - INFO -
LiteLLM completion() model= llama3:latest; provider = ollama_chat
2026-01-30 13:42:01,697 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
2026-01-30 13:42:01,983 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:02,251 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:09,843 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/chat "HTTP/1.1 200 OK"
2026-01-30 13:42:10,130 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,437 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,723 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,736 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
2026-01-30 13:42:10,756 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/sessions/b346f1e0-e3d1-43c4-b081-9e8d4184071b/events?user_id=A2A_USER_b346f1e0-e3d1-43c4-b081-9e8d4184071b "HTTP/1.1 201 Created"
2026-01-30 13:42:10,758 - kagent_adk.kagent.adk._agent_executor - ERROR - Error handling A2A request: Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale

Possible causes:
  1. LLM hallucinated the function name - review agent instruction clarity
  2. Tool not registered - verify agent.tools list
  3. Name mismatch - check for typos

Suggested fixes:
  - Review agent instruction to ensure tool usage is clear
  - Verify tool is included in agent.tools list
  - Check for typos in function name
Traceback (most recent call last):
  File "/.kagent/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 146, in execute
    await self._handle_request(context, event_queue, runner, run_args)
  File "/.kagent/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 241, in _handle_request
    async for adk_event in agen:
    ...<7 lines>...
            await event_queue.enqueue_event(a2a_event)
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 505, in run_async
    async for event in agen:
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 493, in _run_with_trace
    async for event in agen:
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 722, in _exec_with_plugin
    async for event in agen:
    ...<54 lines>...
      yield (modified_event if modified_event else event)
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 482, in execute
    async for event in agen:
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/agents/base_agent.py", line 294, in run_async
    async for event in agen:
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/agents/llm_agent.py", line 460, in _run_async_impl
    async for event in agen:
    ...<5 lines>...
        should_pause = True
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 370, in run_async
    async for event in agen:
      last_event = event
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 457, in _run_one_step_async
    async for event in agen:
    ...<3 lines>...
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 569, in _postprocess_async
    async for event in agen:
      yield event
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 681, in _postprocess_handle_function_calls_async
    if function_response_event := await functions.handle_function_calls_async(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        invocation_context, function_call_event, llm_request.tools_dict
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 198, in handle_function_calls_async
    return await handle_function_call_list_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
    )
    ^
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 244, in handle_function_call_list_async
    function_response_events = await asyncio.gather(*tasks)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 338, in _execute_single_function_call_async
    raise tool_error
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 324, in _execute_single_function_call_async
    tool = _get_tool(function_call, tools_dict)
  File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 729, in _get_tool
    raise ValueError(error_msg)
ValueError: Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale

Possible causes:
  1. LLM hallucinated the function name - review agent instruction clarity
  2. Tool not registered - verify agent.tools list
  3. Name mismatch - check for typos

Suggested fixes:
  - Review agent instruction to ensure tool usage is clear
  - Verify tool is included in agent.tools list
  - Check for typos in function name
2026-01-30 13:42:10,765 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/tasks "HTTP/1.1 201 Created"
2026-01-30 13:42:10,770 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/tasks "HTTP/1.1 201 Created"
INFO:     10.42.2.56:42992 - "POST / HTTP/1.1" 200 OK
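
To narrow down which of the three "Possible causes" applies, here is a minimal sketch that parses the error text from the log above and checks the requested name against the available tools. It is not part of kagent or ADK. `parse_tool_error` and `AGENT_NAME` are hypothetical names, and the agent name is assumed from the pod label in the kubectl command.

```python
import re

# First two lines of the ValueError, copied from the log above.
ERROR_TEXT = """Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale"""

# Assumption: the agent name is the value of the app.kubernetes.io/name label.
AGENT_NAME = "self-healing-agent"


def parse_tool_error(text):
    """Return (requested_tool, available_tools) extracted from the error text."""
    requested = re.search(r"Tool '([^']+)' not found", text).group(1)
    available_line = re.search(r"Available tools: (.+)", text).group(1)
    available = [name.strip() for name in available_line.split(",")]
    return requested, available


requested, available = parse_tool_error(ERROR_TEXT)

# Is the requested name registered as a tool at all?
is_registered = requested in available

# Does it equal the agent's own name with hyphens turned into underscores?
matches_agent_name = requested == AGENT_NAME.replace("-", "_")

print("requested tool:", requested)
print("registered:", is_registered)
print("same as agent name:", matches_agent_name)
```

The requested name is not a registered tool but does match the agent's own name, so it looks like the model may be trying to call the agent itself as a tool (cause 1) rather than mistyping a real tool name.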


Agent instructions: 

    modelConfig: default-model-config
    systemMessage: |
      You are a Kubernetes Self-Healing Agent responsible for maintaining cluster health.

      IMPORTANT:
      - The words DETECT, DIAGNOSE, PLAN, EXECUTE, VERIFY are NOT tool/function names.
      - You MUST ONLY call tools from this list exactly as written:
        k8s_get_resources, k8s_get_pod_logs, k8s_get_events, k8s_describe_resource,
        k8s_scale, k8s_patch_resource, k8s_apply_manifest, k8s_delete_resource,
        k8s_get_available_api_resources
      - Never call tools named "detect" or "perform_health_check".

      ## Your Mission
      Monitor the cluster for issues and automatically remediate them without human intervention.

      ## Your Capabilities
      You have access to the following tools:
      - Kubernetes tools: Get pods, logs, events, apply/delete resources
      - Prometheus tools: Query metrics, check alerts, analyze trends

      ## Your Process
      When investigating an issue:
      1. DETECT: Check for firing alerts or anomalous metrics (conceptual step, NOT a tool)
      2. DIAGNOSE: Gather logs, events, and metrics to identify root cause (use k8s_get_* tools)
      3. PLAN: Determine the remediation action
      4. EXECUTE: Apply the fix using available Kubernetes tools
      5. VERIFY: Confirm the issue is resolved using k8s_get_resources and events

      ## Common Remediation Strategies

      ### CrashLoopBackOff
      - Use k8s_get_pod_logs and k8s_get_events
      - If caused by OOM: Increase memory limits using k8s_patch_resource

      ### Pod Not Ready
      - Check pod status and events
      - Verify service endpoints

      ### Scale to Zero
      - If a deployment has replicas=0 and should be running, use k8s_scale to restore it to 3

      ### Resource Exhaustion
      - Identify affected pods
      - Scale horizontally using k8s_scale

      ## Safety Rules
      - Never delete namespaces: kube-system, kagent, monitoring
      - Always verify changes after applying
      - Prefer scaling or patching over deleting resources
      - Log every action you take
      - When scaling, be precise and explicit
    tools:
    - mcpServer:
        kind: RemoteMCPServer
        name: kagent-tool-server
        toolNames:
        - k8s_get_resources
        - k8s_get_pod_logs
        - k8s_get_events
        - k8s_apply_manifest
        - k8s_delete_resource
        - k8s_patch_resource
        - k8s_describe_resource
        - k8s_get_available_api_resources
        - k8s_scale
        - prometheus_query
        - prometheus_get_alerts
      type: McpServer
  description: An AI agent that monitors cluster health and automatically remediates
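
One more thing I noticed while comparing the spec with the log: the `toolNames` list is longer than the "Available tools" list in the error. A minimal sketch of that comparison, with both lists copied by hand from this issue (the variable names are my own):

```python
# toolNames from the agent spec above.
configured_tools = [
    "k8s_get_resources",
    "k8s_get_pod_logs",
    "k8s_get_events",
    "k8s_apply_manifest",
    "k8s_delete_resource",
    "k8s_patch_resource",
    "k8s_describe_resource",
    "k8s_get_available_api_resources",
    "k8s_scale",
    "prometheus_query",
    "prometheus_get_alerts",
]

# "Available tools" from the ValueError in the log.
available_tools = [
    "k8s_apply_manifest",
    "k8s_delete_resource",
    "k8s_describe_resource",
    "k8s_get_available_api_resources",
    "k8s_get_events",
    "k8s_get_pod_logs",
    "k8s_get_resources",
    "k8s_patch_resource",
    "k8s_scale",
]

# Tools that are configured but were not offered to the model at runtime.
missing_at_runtime = sorted(set(configured_tools) - set(available_tools))

# Tools offered at runtime that are not in the spec.
not_in_spec = sorted(set(available_tools) - set(configured_tools))

print("configured but not available:", missing_at_runtime)
print("available but not configured:", not_in_spec)
```

The two Prometheus tools are configured but do not show up at runtime. I don't know whether that is related to the error; it may just mean the tool server does not expose them.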


