Merged
1 change: 1 addition & 0 deletions AGENTS.md
@@ -304,6 +304,7 @@ The `docs/` directory contains user-facing documentation:
- `docs/README.md` – Documentation index and navigation
- `docs/configuration.md` – **Complete TOML configuration reference** (all `StaticConfig` options, drop-in configuration, dynamic reload)
- `docs/prompts.md` – MCP Prompts configuration guide
- `docs/logging.md` – MCP Logging guide (automatic K8s error logging, secret redaction)
- `docs/OTEL.md` – OpenTelemetry observability setup
- `docs/KIALI.md` – Kiali toolset configuration
- `docs/GETTING_STARTED_KUBERNETES.md` – Kubernetes ServiceAccount setup
132 changes: 69 additions & 63 deletions README.md
@@ -232,70 +232,10 @@ See the **[Configuration Reference](docs/configuration.md)**.
## 📊 MCP Logging <a id="mcp-logging"></a>

The server supports the MCP logging capability, allowing clients to receive debugging information via structured log messages.
Kubernetes API errors are automatically categorized and logged to clients with appropriate severity levels.
Sensitive data (tokens, keys, passwords, cloud credentials) is automatically redacted before being sent to clients.

### For Clients

Clients can control log verbosity by sending a `logging/setLevel` request:

```json
{
  "method": "logging/setLevel",
  "params": { "level": "info" }
}
```

**Available log levels** (in order of increasing severity):
- `debug` - Detailed debugging information
- `info` - General informational messages (default)
- `notice` - Normal but significant events
- `warning` - Warning messages
- `error` - Error conditions
- `critical` - Critical conditions
- `alert` - Action must be taken immediately
- `emergency` - System is unusable

### For Developers

Toolsets can optionally send debug information to clients using helper functions from the `mcplog` package:

**Recommended approach for Kubernetes errors** (automatically categorizes errors and sends appropriate messages):

```go
import "github.com/containers/kubernetes-mcp-server/pkg/mcplog"

// In your tool handler:
ret, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
if err != nil {
	mcplog.HandleK8sError(ctx, err, "pod access")
	return api.NewToolCallResult("", fmt.Errorf("failed to get pod: %v", err)), nil
}
```

**Manual logging** (for custom messages):

```go
import "github.com/containers/kubernetes-mcp-server/pkg/mcplog"

// In your tool handler:
if err != nil {
	mcplog.SendMCPLog(ctx, "error", "Operation failed - check permissions")
	return api.NewToolCallResult("", err), nil
}
```

**Key Points:**
- Logging is **optional** - toolsets work fine without sending MCP logs
- Uses a dedicated named logger (`logger="mcp"`) for complete separation from server logs
- Server logs (klog) remain detailed and unaffected
- Client logs are high-level, helpful hints for debugging
- Authentication failures send generic messages to clients (no security info leaked)
- Sensitive data is automatically redacted with 28 pattern types:
- Generic fields (password, token, secret, api_key, etc.)
- Authorization headers (Bearer, Basic)
- Cloud credentials (AWS, GCP, Azure)
- API tokens (GitHub, GitLab, OpenAI, Anthropic)
- Cryptographic keys (JWT, SSH, PGP, RSA)
- Database connection strings (PostgreSQL, MySQL, MongoDB)
See the **[MCP Logging Guide](docs/logging.md)**.

## 🛠️ Tools and Functionalities <a id="tools-and-functionalities"></a>

@@ -329,6 +269,72 @@ In case multi-cluster support is enabled (default) and you have access to multip

<details>

<summary>kiali</summary>

- **kiali_mesh_graph** - Returns the topology of the specified namespaces, together with the health and status of the mesh and its namespaces. Includes a mesh health summary with aggregated counts of healthy, degraded, and failing apps, workloads, and services. Use this for high-level overviews
- `graphType` (`string`) - Optional type of graph to return: 'versionedApp', 'app', 'service', 'workload', 'mesh'
- `namespace` (`string`) - Optional single namespace to include in the graph (alternative to namespaces)
- `namespaces` (`string`) - Optional comma-separated list of namespaces to include in the graph
- `rateInterval` (`string`) - Optional rate interval for fetching (e.g., '10m', '5m', '1h').

- **kiali_manage_istio_config_read** - Lists or gets Istio configuration objects (Gateways, VirtualServices, etc.)
- `action` (`string`) **(required)** - Action to perform: list or get
- `group` (`string`) - API group of the Istio object (e.g., 'networking.istio.io', 'gateway.networking.k8s.io')
- `kind` (`string`) - Kind of the Istio object (e.g., 'DestinationRule', 'VirtualService', 'HTTPRoute', 'Gateway')
- `name` (`string`) - Name of the Istio object
- `namespace` (`string`) - Namespace containing the Istio object
- `version` (`string`) - API version of the Istio object (e.g., 'v1', 'v1beta1')

- **kiali_manage_istio_config** - Creates, patches, or deletes Istio configuration objects (Gateways, VirtualServices, etc.)
- `action` (`string`) **(required)** - Action to perform: create, patch, or delete
- `group` (`string`) - API group of the Istio object (e.g., 'networking.istio.io', 'gateway.networking.k8s.io')
- `json_data` (`string`) - JSON data to apply or create the object
- `kind` (`string`) - Kind of the Istio object (e.g., 'DestinationRule', 'VirtualService', 'HTTPRoute', 'Gateway')
- `name` (`string`) - Name of the Istio object
- `namespace` (`string`) - Namespace containing the Istio object
- `version` (`string`) - API version of the Istio object (e.g., 'v1', 'v1beta1')

- **kiali_get_resource_details** - Lists Kubernetes resources (services, workloads) within the mesh, or gets detailed info for a single resource
- `namespaces` (`string`) - Comma-separated list of namespaces to get services from (e.g. 'bookinfo' or 'bookinfo,default'). If not provided, will list services from all accessible namespaces
- `resource_name` (`string`) - Name of the resource to get details for (optional string - if provided, gets details; if empty, lists all).
- `resource_type` (`string`) - Type of resource to get details for (service, workload)

- **kiali_get_metrics** - Gets metrics for a Kubernetes resource (service, workload) within the mesh
- `byLabels` (`string`) - Comma-separated list of labels to group metrics by (e.g., 'source_workload,destination_service'). Optional
- `direction` (`string`) - Traffic direction: 'inbound' or 'outbound'. Optional, defaults to 'outbound'
- `duration` (`string`) - Time range to get metrics for (e.g., '1m', '5m', '1h'). Optional, defaults to '30m'
- `namespace` (`string`) **(required)** - Namespace to get resources from
- `quantiles` (`string`) - Comma-separated list of quantiles for histogram metrics (e.g., '0.5,0.95,0.99'). Optional
- `rateInterval` (`string`) - Rate interval for metrics (e.g., '1m', '5m'). Optional, defaults to '10m'
- `reporter` (`string`) - Metrics reporter: 'source', 'destination', or 'both'. Optional, defaults to 'source'
- `requestProtocol` (`string`) - Filter by request protocol (e.g., 'http', 'grpc', 'tcp'). Optional
- `resource_name` (`string`) **(required)** - Name of the resource to get metrics for
- `resource_type` (`string`) **(required)** - Type of resource to get metrics for (service, workload)
- `step` (`string`) - Step between data points in seconds (e.g., '15'). Optional, defaults to 15 seconds

- **kiali_workload_logs** - Get logs for a specific workload's pods in a namespace. Only requires namespace and workload name - automatically discovers pods and containers. Optionally filter by container name, time range, and other parameters. Container is auto-detected if not specified.
- `container` (`string`) - Optional container name to filter logs. If not provided, automatically detects and uses the main application container (excludes istio-proxy and istio-init)
- `namespace` (`string`) **(required)** - Namespace containing the workload
- `since` (`string`) - Time duration to fetch logs from (e.g., '5m', '1h', '30s'). If not provided, returns recent logs
- `tail` (`integer`) - Number of lines to retrieve from the end of logs (default: 100)
- `workload` (`string`) **(required)** - Name of the workload to get logs for

- **kiali_get_traces** - Gets traces for a specific resource (app, service, workload) in a namespace, or gets detailed information for a specific trace by its ID. If traceId is provided, it returns detailed trace information and other parameters are not required.
- `clusterName` (`string`) - Cluster name for multi-cluster environments (optional, only used when traceId is not provided)
- `endMicros` (`string`) - End time for traces in microseconds since epoch (optional, defaults to 10 minutes after startMicros if not provided, only used when traceId is not provided)
- `limit` (`integer`) - Maximum number of traces to return (default: 100, only used when traceId is not provided)
- `minDuration` (`integer`) - Minimum trace duration in microseconds (optional, only used when traceId is not provided)
- `namespace` (`string`) - Namespace to get resources from. Required if traceId is not provided.
- `resource_name` (`string`) - Name of the resource to get traces for. Required if traceId is not provided.
- `resource_type` (`string`) - Type of resource to get traces for (app, service, workload). Required if traceId is not provided.
- `startMicros` (`string`) - Start time for traces in microseconds since epoch (optional, defaults to 10 minutes before current time if not provided, only used when traceId is not provided)
- `tags` (`string`) - JSON string of tags to filter traces (optional, only used when traceId is not provided)
- `traceId` (`string`) - Unique identifier of the trace to retrieve detailed information for. If provided, this will return detailed trace information and other parameters (resource_type, namespace, resource_name) are not required.

</details>

<details>

<summary>config</summary>

- **configuration_contexts_list** - List all available context names and associated server urls from the kubeconfig file
1 change: 1 addition & 0 deletions docs/README.md
@@ -26,6 +26,7 @@ Choose the guide that matches your needs:

## Advanced Topics

- **[MCP Logging](logging.md)** - Structured logging to MCP clients with automatic K8s error categorization and secret redaction
- **[OpenTelemetry Observability](OTEL.md)** - Distributed tracing and metrics configuration
- **[MCP Prompts](prompts.md)** - Custom workflow templates for AI assistants
- **[Keycloak OIDC Setup](KEYCLOAK_OIDC_SETUP.md)** - Developer guide for local Keycloak environment and testing with MCP Inspector
120 changes: 120 additions & 0 deletions docs/VALIDATION.md
@@ -0,0 +1,120 @@
# Pre-Execution Validation

The kubernetes-mcp-server includes a validation layer that catches errors before they reach the Kubernetes API. This prevents AI hallucinations (like typos in resource names) and permission issues from causing confusing failures.

## Why Validation?

When an AI assistant makes a Kubernetes API call with errors, the raw Kubernetes error messages can be cryptic:

```
the server doesn't have a resource type "Deploymnt"
```

With validation enabled, you get clearer feedback:

```
Resource apps/v1/Deploymnt does not exist in the cluster
```

The validation layer catches these types of issues:

1. **Resource Existence** - Catches typos like "Deploymnt" instead of "Deployment" (checked in access control)
2. **Schema Validation** - Catches invalid fields like "spec.replcias" instead of "spec.replicas"
3. **RBAC Validation** - Pre-checks permissions before attempting operations

## Configuration

Validation is **disabled by default**. Schema and RBAC validators run together when enabled. Resource existence is always checked as part of access control.

```toml
# Enable schema and RBAC validation (default: false)
validation_enabled = true
```

### Configuration Reference

| TOML Field | Default | Description |
|------------|---------|-------------|
| `validation_enabled` | `false` | Enable/disable the schema and RBAC validators (resource existence is always checked) |

**Note:** The schema validator caches the OpenAPI schema for 15 minutes internally.

## How It Works

### Validation Flow

Validation happens at the HTTP RoundTripper level, intercepting all Kubernetes API calls:

```
MCP Tool Call → Kubernetes Client → HTTP RoundTripper → Kubernetes API
                                          ↓
                                   Access Control
                                   - Check deny list
                                   - Check resource exists
                                          ↓
                              Schema Validator (if enabled)
                              "Are the fields valid?"
                                          ↓
                               RBAC Validator (if enabled)
                               "Does the user have permission?"
                                          ↓
                                  Forward to K8s API
```

This HTTP-layer approach ensures **all** Kubernetes API calls are validated, including those from plugins (KubeVirt, Kiali, Helm, etc.) - not just the core tools.

If any validator fails, the request is rejected with a clear error message before reaching the Kubernetes API.

### 1. Resource Existence (Access Control)

The access control layer validates that the requested resource type exists in the cluster. This check runs regardless of whether validation is enabled.

**What it catches:**
- Typos in Kind names: "Deploymnt" → should be "Deployment"
- Wrong API versions: "apps/v2" → should be "apps/v1"
- Non-existent custom resources

**Example error:**
```
RESOURCE_NOT_FOUND: Resource deployments.apps does not exist in the cluster
```

### 2. Schema Validation

Validates resource manifests against the cluster's OpenAPI schema for create/update operations.

**What it catches:**
- Invalid field names: "spec.replcias" → should be "spec.replicas"
- Wrong field types: string where integer expected
- Missing required fields

**Example error:**
```
INVALID_FIELD: unknown field "spec.replcias"
```

**Note:** Schema validation uses kubectl's validation library and caches the OpenAPI schema for 15 minutes.

### 3. RBAC Validation

Pre-checks permissions using Kubernetes `SelfSubjectAccessReview` before attempting operations.

**What it catches:**
- Missing permissions: can't create Deployments in namespace X
- Cluster-scoped vs namespace-scoped mismatches
- Read-only access attempting writes

**Example error:**
```
PERMISSION_DENIED: Cannot create deployments.apps in namespace "production"
```

**Note:** RBAC validation uses the same credentials as the actual operation - either the server's service account or the user's token (when OAuth is enabled).

## Error Codes

| Code | Description |
|------|-------------|
| `RESOURCE_NOT_FOUND` | The requested resource type doesn't exist in the cluster |
| `INVALID_FIELD` | A field in the manifest doesn't exist or has the wrong type |
| `PERMISSION_DENIED` | RBAC denies the requested operation |
83 changes: 83 additions & 0 deletions docs/logging.md
@@ -0,0 +1,83 @@
# MCP Logging

The server supports the MCP logging capability, allowing clients to receive debugging information via structured log messages.

## For Clients

Clients can control log verbosity by sending a `logging/setLevel` request:

```json
{
  "method": "logging/setLevel",
  "params": { "level": "info" }
}
```

**Available log levels** (in order of increasing severity):
- `debug` - Detailed debugging information
- `info` - General informational messages (default)
- `notice` - Normal but significant events
- `warning` - Warning messages
- `error` - Error conditions
- `critical` - Critical conditions
- `alert` - Action must be taken immediately
- `emergency` - System is unusable
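
The severity ordering determines which messages a client receives after setting a level. The sketch below is illustrative only: `severity`, `shouldSend`, and `newSetLevelRequest` are hypothetical helpers, and the `jsonrpc` and `id` fields are added on the assumption that the request travels in a standard JSON-RPC 2.0 envelope.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// levels lists the log levels in order of increasing severity.
var levels = []string{"debug", "info", "notice", "warning", "error", "critical", "alert", "emergency"}

// severity returns the rank of a level, or -1 if the level is unknown.
func severity(level string) int {
	for i, l := range levels {
		if l == level {
			return i
		}
	}
	return -1
}

// shouldSend reports whether a message at msgLevel passes the client's minimum level.
func shouldSend(minLevel, msgLevel string) bool {
	m, s := severity(minLevel), severity(msgLevel)
	return m >= 0 && s >= m
}

type setLevelParams struct {
	Level string `json:"level"`
}

type setLevelRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  setLevelParams `json:"params"`
}

// newSetLevelRequest encodes a logging/setLevel request for a known level.
func newSetLevelRequest(id int, level string) ([]byte, error) {
	if severity(level) < 0 {
		return nil, fmt.Errorf("unknown log level %q", level)
	}
	return json.Marshal(setLevelRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "logging/setLevel",
		Params:  setLevelParams{Level: level},
	})
}

func main() {
	req, err := newSetLevelRequest(1, "warning")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
	fmt.Println("error delivered:", shouldSend("warning", "error"))
	fmt.Println("info delivered:", shouldSend("warning", "info"))
}
```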

## For Developers

### Automatic Kubernetes Error Logging

Kubernetes API errors returned by tool handlers are **automatically logged** to MCP clients.
When a tool handler returns a `ToolCallResult` with a non-nil error that is a Kubernetes API error (`StatusError`), the server categorizes it and sends an appropriate log message.

This means toolset authors **do not need to call any logging functions** for standard K8s error handling.
Simply return the error in the `ToolCallResult` and the server handles the rest:

```go
ret, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get pod: %w", err)), nil
}
```

The following Kubernetes error types are automatically categorized:

| Error Type | Log Level | Message |
|-----------|-----------|---------|
| Not Found | `info` | Resource not found - it may not exist or may have been deleted |
| Forbidden | `error` | Permission denied - check RBAC permissions for {tool} |
| Unauthorized | `error` | Authentication failed - check cluster credentials |
| Already Exists | `warning` | Resource already exists |
| Invalid | `error` | Invalid resource specification - check resource definition |
| Bad Request | `error` | Invalid request - check parameters |
| Conflict | `error` | Resource conflict - resource may have been modified |
| Timeout | `error` | Request timeout - cluster may be slow or overloaded |
| Server Timeout | `error` | Server timeout - cluster may be slow or overloaded |
| Service Unavailable | `error` | Service unavailable - cluster may be unreachable |
| Too Many Requests | `warning` | Rate limited - too many requests to the cluster |
| Other K8s API errors | `error` | Operation failed - cluster may be unreachable or experiencing issues |

Non-Kubernetes errors (e.g., input validation errors) are **not** logged to MCP clients.
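
The table above amounts to a lookup from error category to log level and message. The following sketch shows that mapping for a few rows. It assumes the error has already been identified as a Kubernetes API error and keys on its status reason string; `categorize` and `logEntry` are hypothetical names, not the `mcplog` API.

```go
package main

import "fmt"

// logEntry is the level and message sent to the MCP client.
type logEntry struct {
	Level   string
	Message string
}

// byReason maps Kubernetes status reasons to client-facing log entries.
// Only a subset of the table is reproduced here.
var byReason = map[string]logEntry{
	"NotFound":        {"info", "Resource not found - it may not exist or may have been deleted"},
	"Unauthorized":    {"error", "Authentication failed - check cluster credentials"},
	"AlreadyExists":   {"warning", "Resource already exists"},
	"Conflict":        {"error", "Resource conflict - resource may have been modified"},
	"TooManyRequests": {"warning", "Rate limited - too many requests to the cluster"},
}

// categorize returns the log entry for a status reason, falling back to a
// generic error entry for reasons that have no dedicated row.
func categorize(reason, tool string) logEntry {
	if reason == "Forbidden" {
		// The Forbidden message names the tool whose permissions should be checked.
		return logEntry{"error", fmt.Sprintf("Permission denied - check RBAC permissions for %s", tool)}
	}
	if e, ok := byReason[reason]; ok {
		return e
	}
	return logEntry{"error", "Operation failed - cluster may be unreachable or experiencing issues"}
}

func main() {
	for _, reason := range []string{"NotFound", "Forbidden", "Gone"} {
		e := categorize(reason, "pods_get")
		fmt.Printf("%-10s %-8s %s\n", reason, e.Level, e.Message)
	}
}
```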

### Manual Logging

For custom messages beyond automatic K8s error handling, use `SendMCPLog` directly:

```go
import "github.com/containers/kubernetes-mcp-server/pkg/mcplog"

mcplog.SendMCPLog(ctx, mcplog.LevelError, "Operation failed - check permissions")
```

## Security

- Authentication failures send generic messages to clients (no security info leaked)
- Sensitive data is automatically redacted before being sent to clients, covering:
- Generic fields (password, token, secret, api_key, etc.)
- Authorization headers (Bearer, Basic)
- Cloud credentials (AWS, GCP, Azure)
- API tokens (GitHub, GitLab, OpenAI, Anthropic)
- Cryptographic keys (JWT, SSH, PGP, RSA)
- Database connection strings (PostgreSQL, MySQL, MongoDB)
- Uses a dedicated named logger (`logger="mcp"`) for complete separation from server logs
- Server logs (klog) remain detailed and unaffected
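
To give a feel for how pattern-based redaction works, here is a minimal sketch covering two of the categories listed above. The regular expressions and the `redact` function are assumptions made for illustration; the server's actual patterns are not shown in this guide and are likely more thorough.

```go
package main

import (
	"fmt"
	"regexp"
)

// patterns holds illustrative redaction rules. Each rule captures the part to
// keep (field name or scheme plus separator) and drops the secret that follows.
var patterns = []*regexp.Regexp{
	// Generic key/value fields such as password=..., token: ...
	regexp.MustCompile(`(?i)\b(password|token|secret|api_key)(\s*[=:]\s*)\S+`),
	// Authorization header credentials.
	regexp.MustCompile(`(?i)\b(Bearer|Basic)(\s+)[A-Za-z0-9._~+/=-]+`),
}

// redact replaces every secret matched by a pattern with a fixed placeholder.
func redact(msg string) string {
	for _, p := range patterns {
		msg = p.ReplaceAllString(msg, "${1}${2}[REDACTED]")
	}
	return msg
}

func main() {
	fmt.Println(redact("login failed: password=hunter2"))
	fmt.Println(redact("Authorization: Bearer abc.def.ghi"))
}
```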
27 changes: 27 additions & 0 deletions internal/test/mcp.go
@@ -243,3 +243,30 @@ func (c *NotificationCapture) RequireLogNotification(t *testing.T, timeout time.
	require.NotNil(t, logNotification, "failed to parse log notification")
	return logNotification
}

// RequireNoLogNotification asserts that no logging notification is received within the given timeout.
// Use this to verify that non-Kubernetes errors do not produce MCP log notifications.
func (c *NotificationCapture) RequireNoLogNotification(t *testing.T, timeout time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	for {
		c.mu.RLock()
		for _, n := range c.notifications {
			if n.Method == "notifications/message" {
				c.mu.RUnlock()
				require.Fail(t, "unexpected log notification received", "notification: %v", n)
				return
			}
		}
		c.mu.RUnlock()

		select {
		case <-c.signal:
			// New notification arrived, check it
		case <-ctx.Done():
			// Timeout with no log notification: success
			return
		}
	}
}