I tried to implement autonomous-k8s-engineer and followed all the provided steps, but whatever I do, I receive this error :) Can you suggest what the problem is and where to look for a solution? :)
root@master:/home/kubernetes# kubectl logs -f -n kagent -l app.kubernetes.io/name=self-healing-agent
2026-01-30 13:42:01,665 - google_adk.google.adk.runners - WARNING - Event from an unknown agent: system, event id: 38dc1bd3-da14-4cd1-9dc7-74be4f6ee333
2026-01-30 13:42:01,665 - google_adk.google.adk.runners - WARNING - Event from an unknown agent: system, event id: 38dc1bd3-da14-4cd1-9dc7-74be4f6ee333
2026-01-30 13:42:01,672 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
13:42:01 - LiteLLM:INFO: utils.py:3258 -
LiteLLM completion() model= llama3:latest; provider = ollama_chat
2026-01-30 13:42:01,688 - LiteLLM - INFO -
LiteLLM completion() model= llama3:latest; provider = ollama_chat
2026-01-30 13:42:01,697 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
2026-01-30 13:42:01,983 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:02,251 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:09,843 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/chat "HTTP/1.1 200 OK"
2026-01-30 13:42:10,130 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,437 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,723 - httpx - INFO - HTTP Request: POST http://ollama.ollama.svc.cluster.local/api/show "HTTP/1.1 200 OK"
2026-01-30 13:42:10,736 - httpx - INFO - HTTP Request: POST http://kagent-tools.kagent:8084/mcp "HTTP/1.1 200 OK"
2026-01-30 13:42:10,756 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/sessions/b346f1e0-e3d1-43c4-b081-9e8d4184071b/events?user_id=A2A_USER_b346f1e0-e3d1-43c4-b081-9e8d4184071b "HTTP/1.1 201 Created"
2026-01-30 13:42:10,758 - kagent_adk.kagent.adk._agent_executor - ERROR - Error handling A2A request: Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale
Possible causes:
- LLM hallucinated the function name - review agent instruction clarity
- Tool not registered - verify agent.tools list
- Name mismatch - check for typos
Suggested fixes:
- Review agent instruction to ensure tool usage is clear
- Verify tool is included in agent.tools list
- Check for typos in function name
Traceback (most recent call last):
File "/.kagent/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 146, in execute
await self._handle_request(context, event_queue, runner, run_args)
File "/.kagent/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 241, in _handle_request
async for adk_event in agen:
...<7 lines>...
await event_queue.enqueue_event(a2a_event)
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 505, in run_async
async for event in agen:
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 493, in _run_with_trace
async for event in agen:
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 722, in _exec_with_plugin
async for event in agen:
...<54 lines>...
yield (modified_event if modified_event else event)
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 482, in execute
async for event in agen:
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/agents/base_agent.py", line 294, in run_async
async for event in agen:
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/agents/llm_agent.py", line 460, in _run_async_impl
async for event in agen:
...<5 lines>...
should_pause = True
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 370, in run_async
async for event in agen:
last_event = event
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 457, in _run_one_step_async
async for event in agen:
...<3 lines>...
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 569, in _postprocess_async
async for event in agen:
yield event
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/base_llm_flow.py", line 681, in _postprocess_handle_function_calls_async
if function_response_event := await functions.handle_function_calls_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
invocation_context, function_call_event, llm_request.tools_dict
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 198, in handle_function_calls_async
return await handle_function_call_list_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<5 lines>...
)
^
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 244, in handle_function_call_list_async
function_response_events = await asyncio.gather(*tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 338, in _execute_single_function_call_async
raise tool_error
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 324, in _execute_single_function_call_async
tool = _get_tool(function_call, tools_dict)
File "/.kagent/.venv/lib/python3.13/site-packages/google/adk/flows/llm_flows/functions.py", line 729, in _get_tool
raise ValueError(error_msg)
ValueError: Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale
Possible causes:
- LLM hallucinated the function name - review agent instruction clarity
- Tool not registered - verify agent.tools list
- Name mismatch - check for typos
Suggested fixes:
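To make the error easier to read, here is a minimal Python sketch that pulls the requested tool name and the "Available tools" list out of the message above and checks whether the requested name is registered. It is only a helper for reading the log, not part of kagent or ADK; `ERROR_TEXT` and `parse_tool_error` are names made up for this illustration.

```python
import re
from difflib import get_close_matches

# Error text copied from the log above (only the two relevant lines).
ERROR_TEXT = """ValueError: Tool 'self_healing_agent' not found.
Available tools: k8s_apply_manifest, k8s_delete_resource, k8s_describe_resource, k8s_get_available_api_resources, k8s_get_events, k8s_get_pod_logs, k8s_get_resources, k8s_patch_resource, k8s_scale
"""


def parse_tool_error(text):
    """Return (requested_tool, available_tools) parsed from the error text.

    Hypothetical helper for this post; it only understands the message
    format shown in the log above.
    """
    requested = re.search(r"Tool '([^']+)' not found", text)
    available = re.search(r"^Available tools: (.+)$", text, re.MULTILINE)
    if requested is None or available is None:
        raise ValueError("text does not look like a 'Tool ... not found' error")
    tools = [name.strip() for name in available.group(1).split(",")]
    return requested.group(1), tools


requested, available = parse_tool_error(ERROR_TEXT)
print("requested tool:", requested)
print("is registered:", requested in available)
# Lists registered names that look similar, in case it is just a typo.
print("similar names:", get_close_matches(requested, available, n=3, cutoff=0.6))
```

The requested name is not one of the nine registered tools, and it looks like the agent's own name (compare the `app.kubernetes.io/name=self-healing-agent` label in the `kubectl logs` command) rather than any tool name.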
Agent instructions:
modelConfig: default-model-config
systemMessage: |
  You are a Kubernetes Self-Healing Agent responsible for maintaining cluster health.

  IMPORTANT:
  - The words DETECT, DIAGNOSE, PLAN, EXECUTE, VERIFY are NOT tool/function names.
  - You MUST ONLY call tools from this list exactly as written:
      k8s_get_resources, k8s_get_pod_logs, k8s_get_events, k8s_describe_resource,
      k8s_scale, k8s_patch_resource, k8s_apply_manifest, k8s_delete_resource,
      k8s_get_available_api_resources
  - Never call tools named "detect" or "perform_health_check".

  ## Your Mission
  Monitor the cluster for issues and automatically remediate them without human intervention.

  ## Your Capabilities
  You have access to the following tools:
  - Kubernetes tools: Get pods, logs, events, apply/delete resources
  - Prometheus tools: Query metrics, check alerts, analyze trends

  ## Your Process
  When investigating an issue:
  1. DETECT: Check for firing alerts or anomalous metrics (conceptual step, NOT a tool)
  2. DIAGNOSE: Gather logs, events, and metrics to identify root cause (use k8s_get_* tools)
  3. PLAN: Determine the remediation action
  4. EXECUTE: Apply the fix using available Kubernetes tools
  5. VERIFY: Confirm the issue is resolved using k8s_get_resources and events

  ## Common Remediation Strategies
  ### CrashLoopBackOff
  - Use k8s_get_pod_logs and k8s_get_events
  - If caused by OOM: Increase memory limits using k8s_patch_resource
  ### Pod Not Ready
  - Check pod status and events
  - Verify service endpoints
  ### Scale to Zero
  - If a deployment has replicas=0 and should be running, use k8s_scale to restore it to 3
  ### Resource Exhaustion
  - Identify affected pods
  - Scale horizontally using k8s_scale

  ## Safety Rules
  - Never delete namespaces: kube-system, kagent, monitoring
  - Always verify changes after applying
  - Prefer scaling or patching over deleting resources
  - Log every action you take
  - When scaling, be precise and explicit
tools:
- mcpServer:
    kind: RemoteMCPServer
    name: kagent-tool-server
    toolNames:
    - k8s_get_resources
    - k8s_get_pod_logs
    - k8s_get_events
    - k8s_apply_manifest
    - k8s_delete_resource
    - k8s_patch_resource
    - k8s_describe_resource
    - k8s_get_available_api_resources
    - k8s_scale
    - prometheus_query
    - prometheus_get_alerts
  type: McpServer
description: An AI agent that monitors cluster health and automatically remediates
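As a cross-check, the sketch below compares the `toolNames` configured in the agent spec above with the "Available tools" list from the error message. Both lists are copied by hand from this post; `tools_not_available` is a hypothetical helper name, and the script does not talk to the cluster or to kagent.

```python
# Tool names configured under tools[].mcpServer.toolNames in the agent spec.
CONFIGURED_TOOLS = [
    "k8s_get_resources",
    "k8s_get_pod_logs",
    "k8s_get_events",
    "k8s_apply_manifest",
    "k8s_delete_resource",
    "k8s_patch_resource",
    "k8s_describe_resource",
    "k8s_get_available_api_resources",
    "k8s_scale",
    "prometheus_query",
    "prometheus_get_alerts",
]

# Tool names listed after "Available tools:" in the error message.
AVAILABLE_TOOLS = [
    "k8s_apply_manifest",
    "k8s_delete_resource",
    "k8s_describe_resource",
    "k8s_get_available_api_resources",
    "k8s_get_events",
    "k8s_get_pod_logs",
    "k8s_get_resources",
    "k8s_patch_resource",
    "k8s_scale",
]


def tools_not_available(configured, available):
    """Return configured tool names that are missing from the available list."""
    return sorted(set(configured) - set(available))


missing = tools_not_available(CONFIGURED_TOOLS, AVAILABLE_TOOLS)
print("configured but not available:", missing)
print("'self_healing_agent' configured:", "self_healing_agent" in CONFIGURED_TOOLS)
```

With the lists from this post, the two Prometheus tools are configured but do not appear among the available tools, and `self_healing_agent` appears in neither list.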
After the traceback, the log continues with:
2026-01-30 13:42:10,765 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/tasks "HTTP/1.1 201 Created"
2026-01-30 13:42:10,770 - httpx - INFO - HTTP Request: POST http://kagent-controller.kagent:8083/api/tasks "HTTP/1.1 201 Created"
INFO: 10.42.2.56:42992 - "POST / HTTP/1.1" 200 OK