[Detail Bug] Extensions: User-supplied CRD manifest URLs allow blind SSRF from controller

# Summary
- **Context**: `Extension` is a user-owned CRD that allows users to specify CRD manifest URLs via the `CRDs` field, which are fetched by the controller and installed into nested control planes.
- **Bug**: The `CRDs` field accepts arbitrary strings with no URL validation (no format check, no scheme restriction, no pattern enforcement).
- **Actual vs. expected**: Any string is accepted, including dangerous URLs like `http://169.254.169.254/` (AWS metadata); expected behavior would enforce HTTPS-only valid URLs.
- **Impact**: Enables blind Server-Side Request Forgery (SSRF) attacks, allowing attackers to trigger HTTP requests from the controller to internal services and cloud metadata endpoints. The severity is limited by: (1) only HTTP/HTTPS schemes are supported by `http.DefaultClient`, (2) response body is not directly exposed to attackers, and (3) metadata access depends on deployment environment.

# Code with bug
```go
// CRDs is a list of URLs to CRD manifests to install into nested control
// planes when the extension is enabled.
// +optional
CRDs []string `json:"crds,omitempty"` // <-- BUG 🔴 No URL validation markers
```

# Evidence

## Evidence 1: User-controlled resource with no validation

The `Extension` type is explicitly documented as "fully user-owned" (`api/v1alpha1/extension_types.go:7-8`):
```go
// Extension is a user-defined extension that can be enabled for control planes
// within a project. Unlike PlatformExtension it is fully user-owned.
```

Users create `Extension` resources directly in their project VCP namespace. The `CRDs` field has no kubebuilder validation markers, meaning Kubernetes accepts any string values.

## Evidence 2: Controller fetches URLs without SSRF protection

The `fetchURL` function in `internal/controller/enabledextension_controller.go:225-239` fetches URLs directly with `http.DefaultClient`:
```go
func fetchURL(ctx context.Context, url string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	// ...
}
```

No validation exists for:
- URL scheme (accepts `http://` and `https://` - `file://` and other schemes fail with "unsupported protocol scheme" from Go's net/http)
- Private and link-local IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
- Cloud metadata endpoints
- Localhost/loopback addresses

## Evidence 3: Data exfiltration analysis

The response body exposure is LIMITED:

1. **Status conditions** (`enabledextension_controller.go:91-92`):
   ```go
   r.setProgrammed(..., "InstallError", fmt.Sprintf("control plane %s: %v", cpName, err))
   ```
   The error propagates from:
   - Line 205: `fmt.Errorf("fetch CRD %s: %w", url, err)` - includes URL + wrapped error
   - Line 236: `fmt.Errorf("HTTP %d", resp.StatusCode)` - includes HTTP status only, NOT body
   - Line 248: `fmt.Errorf("yaml to json: %w", err)` - YAML parsing error
   - Line 251: `fmt.Errorf("unmarshal: %w", err)` - JSON unmarshal error

2. **YAML/JSON error analysis**: The `yaml.YAMLToJSON` function (line 246) accepts any valid YAML:
   - Plain text responses (e.g., AWS metadata) are valid YAML scalars
   - YAML parsing SUCCEEDS for plain text, converting it to a JSON string
   - The error occurs at `UnmarshalJSON` (line 250), which expects a Kubernetes CRD object (map)
   - JSON unmarshal errors are type mismatches like `"json: cannot unmarshal string into Go value of type map[string]interface {}"`
   - These errors do NOT expose the actual response content

3. **Conclusion**: Attackers can observe:
   - The URL they specified (already known)
   - The HTTP status code when the status check rejects a response (e.g., "HTTP 404"); any status-coded error still confirms the endpoint is reachable
   - Type mismatch errors (confirm response received but not CRD format)
   
   They CANNOT observe response body content directly. This is **blind SSRF**.

## Evidence 4: No defense-in-depth URL validation

The codebase has no URL validation at any layer:
- No kubebuilder validation markers on the `CRDs` field (schema-level)
- No validating webhooks for Extension resources (admission-level)
- No CEL validation rules in CRD schemas
- No URL parsing/validation in the `fetchURL` function (controller-level)
- No NetworkPolicy restricting egress (network-level)

The only validation in the entire codebase is on `EnabledExtension` (`+kubebuilder:validation:MinItems=1` and `+kubebuilder:validation:Enum`). This shows that the project already uses kubebuilder validation markers but does not apply any of them to URL fields.

**Note on RBAC**: The controller's ClusterRole (`config/rbac/role.yaml`) does not grant `create` permissions for `Extension` or `EnabledExtension` resources. Users must receive explicit namespace-level RBAC grants from cluster administrators. The README (lines 91-116) shows example YAML for creating Extensions and EnabledExtensions, implying users are expected to have these permissions in project VCPs. RBAC is a valid compensating control but does not eliminate the design flaw—defense-in-depth requires URL validation regardless of who can trigger the fetch.

## Evidence 5: CRD schema confirms no constraints

The generated CRD manifest (`config/crd/extensions.kplane.dev_extensions.yaml`) shows the `crds` field has only basic type information:
```yaml
crds:
  description: CRDs is a list of URLs to CRD manifests...
  items:
    type: string
  type: array
```

No `format: uri`, no `pattern` regex, no CEL rules.

# Exploit scenario

**Prerequisites**: 
- RBAC permissions to create `Extension` and `EnabledExtension` resources in a namespace
- These permissions are NOT granted by default by the controller's ClusterRole

The README (lines 91-116) documents the "Bringing your own extension" workflow with example YAML, implying users are expected to have these permissions in project VCPs. In a multi-tenant kplane deployment, project owners would reasonably have namespace-level create permissions for Extensions to define their own CRDs.

**Attack steps**:

1. User with namespace-level permissions creates an `Extension` resource:
```yaml
apiVersion: extensions.kplane.dev/v1alpha1
kind: Extension
metadata:
  name: malicious-ext
  namespace: user-project
spec:
  displayName: "Malicious Extension"
  description: "..."
  crds:
    - "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
```

2. User creates an `EnabledExtension` referencing their extension.

3. Controller fetches the URL. On GKE with Workload Identity or EKS with IRSA, the metadata endpoint may be blocked. On clusters without these protections, the request reaches the metadata service.

4. Response is parsed as YAML. AWS metadata returns plain text role names, which is valid YAML (scalar). YAML parsing succeeds, JSON unmarshal fails with type error.

5. Status condition shows: `control plane test: fetch CRD http://169.254.169.254/...: apply CRD ...: unmarshal: json: cannot unmarshal string into Go value of type map[string]interface {}`

6. Attacker confirms:
   - Endpoint is reachable (no connection error)
   - The status check passed (otherwise the condition would show an error such as "HTTP 403" or "HTTP 404")
   - Response is plain text (unmarshal error confirms it's not a valid CRD)

**What can be learned**:
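- **Port scanning**: Create Extensions with `http://10.0.0.N:PORT/` and compare the outcomes:
  - An HTTP status or parse error means something answered on that port.
  - A connection error (refused or timed out) means the port is closed or filtered.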
- **Service enumeration**: Probe internal services to map network topology
- **Timing attacks**: Measure reconciliation time to infer endpoint behavior
- **Limited data leakage**: Error messages confirm response type but don't expose content

**What cannot be learned**: Response body content is not exposed through status conditions or logs.

# Why has this bug gone undetected?

1. **Documentation suggests HTTPS by convention**: README examples (lines 60, 103) use HTTPS URLs, creating an assumption that users follow this pattern.

2. **Trust boundary confusion**: PlatformExtensions are admin-controlled via catalog (README lines 36-68), which has a PR review process. Extension resources are user-owned (README lines 91-116) but use the same URL-fetching mechanism. The security model for admin-controlled resources was applied to user-controlled resources.

3. **RBAC creates a barrier**: Users must have namespace-level create permissions. This is not granted by default, so the bug doesn't manifest in a fresh install. However, the README documents the "Bringing your own extension" workflow, implying users should have these permissions in project VCPs.

4. **Blind SSRF is subtle**: Unlike direct data exfiltration, blind SSRF doesn't immediately expose data. The attacker can only infer endpoint reachability from error messages, making the impact harder to recognize.

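5. **Cloud metadata protections mask the issue**: Some managed Kubernetes configurations restrict pod access to node metadata. Examples are GKE Workload Identity (enabled by default on Autopilot) and EKS nodes that require IMDSv2 with a hop limit of 1. In those configurations the most obvious SSRF target fails in testing.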

6. **No explicit security review**: The README's catalog PR requirements (lines 63-68) mention URL stability but not URL validation. No threat modeling documentation exists for the Extension workflow.

# Recommended fix

## CRD-level validation (recommended)

Add kubebuilder validation markers to enforce HTTPS-only valid URLs. Because `CRDs` is a slice, the markers use the `items:` prefix. That way they constrain each URL rather than the array itself:

```go
// CRDs is a list of URLs to CRD manifests to install into nested control
// planes when the extension is enabled.
// +optional
// +kubebuilder:validation:items:Format=uri
// +kubebuilder:validation:items:Pattern=`^https://`
CRDs []string `json:"crds,omitempty"`
```

**Migration path for HTTP deployments**: This is a breaking change for any existing HTTP URLs. Organizations using internal HTTP registries should:
1. Set up TLS for internal registries (recommended for security regardless)
2. Use the configuration-based approach below during migration
3. Reference internal registries by HTTPS with self-signed certs (configure controller's CA bundle)

This is the correct default because:
- HTTPS is the baseline expectation for fetching software artifacts in a secure supply chain
- HTTP URLs are vulnerable to MITM attacks on CRD content
- All README examples use HTTPS

## Alternative: Controller-level validation (non-breaking)

For organizations needing HTTP support during migration, add runtime validation:

```go
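import (
	"context"
	"fmt"
	"net/url"
	"strings"
)

// The parameter is named rawURL (not url) so it does not shadow net/url.
func fetchURL(ctx context.Context, rawURL string) ([]byte, error) {
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("parse URL %q: %w", rawURL, err)
	}

	// Block dangerous schemes (net/http rejects most of them anyway,
	// but failing here gives a clearer error).
	if parsed.Scheme != "https" && parsed.Scheme != "http" {
		return nil, fmt.Errorf("unsupported scheme: %q", parsed.Scheme)
	}

	// Block well-known metadata endpoints. Hostname() strips the port and
	// IPv6 brackets; exact matching avoids prefix false positives.
	blockedHosts := map[string]bool{
		"169.254.169.254":          true, // AWS, GCP, and Azure metadata IP
		"metadata.google.internal": true, // GCP metadata hostname
	}
	host := strings.TrimSuffix(strings.ToLower(parsed.Hostname()), ".")
	if blockedHosts[host] {
		return nil, fmt.Errorf("metadata endpoints not allowed")
	}

	// Optional: Block private IP ranges. A host-string check is bypassable
	// (DNS names resolving to internal IPs, redirects), so this belongs at
	// dial time, after DNS resolution.
	// ... implementation

	// Continue with fetch...
}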
```

New command-line flags could make the policy configurable:
- `--allow-http-urls=false` (default: false, enforce HTTPS-only)
- `--blocked-url-patterns=...` (additional patterns to block)

Deployments that still reference HTTP URLs can set `--allow-http-urls=true` while they migrate, so existing installations keep working.
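The host blocklist in `fetchURL` inspects only the URL string. It can be bypassed by a DNS name that resolves to an internal address, by an IPv4-mapped IPv6 literal, or by a redirect (`http.DefaultClient` follows up to 10).

A sturdier sketch of the "block private IP ranges" step checks the resolved address at dial time through the `Control` hook on `net.Dialer`. That hook runs after DNS resolution for every connection, including connections opened while following redirects. The controller would then use this client instead of `http.DefaultClient`.

The names `checkDialAddress` and `newGuardedClient` are hypothetical. The blocked set relies on `netip.Addr` predicates. `IsPrivate` covers only RFC 1918 and fc00::/7, so ranges such as 100.64.0.0/10 would need explicit checks if they matter in a given environment.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
	"time"
)

// checkDialAddress (hypothetical) rejects loopback, private, link-local,
// unspecified, and multicast destinations. net.Dialer calls Control after
// DNS resolution, so address is a literal IP:port.
func checkDialAddress(network, address string, _ syscall.RawConn) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil {
		return fmt.Errorf("unexpected dial address %q: %w", address, err)
	}
	ip := ap.Addr().Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsUnspecified() || ip.IsMulticast() {
		return fmt.Errorf("dial to non-public address %s blocked", ip)
	}
	return nil
}

// newGuardedClient (hypothetical) returns a client whose every connection,
// including ones made while following redirects, passes checkDialAddress.
func newGuardedClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, Control: checkDialAddress}
	transport := &http.Transport{
		Proxy:               nil, // a proxy would make the dialed IP the proxy's, hiding the real target
		DialContext:         dialer.DialContext,
		TLSHandshakeTimeout: 10 * time.Second,
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}

func main() {
	for _, addr := range []string{"169.254.169.254:80", "10.0.0.5:443", "[::1]:8080", "8.8.8.8:443"} {
		fmt.Println(addr, checkDialAddress("tcp", addr, nil))
	}
	_ = newGuardedClient()
}
```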

# Severity assessment

**CVSS 3.1 Score: 4.3 (MEDIUM)** - `AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N`

**Justification**:
- **Attack Vector (Network)**: Attack executed remotely via Extension CRD creation
- **Attack Complexity (Low)**: No special conditions required; attacker creates standard Kubernetes resources
- **Privileges Required (Low)**: Requires namespace-level `create` permissions on `Extension` and `EnabledExtension` resources (not default, but documented workflow)
- **User Interaction (None)**: Controller automatically processes the malicious Extension
- **Scope (Unchanged)**: Attack stays within the controller's security context
- **Confidentiality (Low)**: Blind SSRF allows endpoint reachability detection but not direct data exfiltration
- **Integrity (None)**: Cannot modify data via GET requests
- **Availability (None)**: No direct availability impact

**Environmental factors that reduce severity**:
- Managed Kubernetes configurations that restrict pod access to node metadata (GKE Workload Identity; EKS nodes requiring IMDSv2 with a hop limit of 1) block the metadata target
- RBAC gating requires explicit permission grants
- Response body not exposed through error messages

**Environmental factors that increase severity**:
- On-premises deployments without cloud metadata protections
- Multi-tenant environments where users legitimately have Extension creation permissions
- Internal networks with sensitive HTTP services (e.g., unauthenticated admin panels)

**Final assessment**: MEDIUM severity blind SSRF vulnerability. Not critical due to limited data exfiltration and environmental mitigations, but represents a defense-in-depth failure that should be fixed.

# Related bugs

The same issue exists in `api/v1alpha1/platformextension_types.go` for the `CRDs` field. However, PlatformExtensions are admin-controlled via the catalog PR workflow (README lines 36-68), which includes:
- Requirement for stable, versioned refs
- PR review by maintainers
- No cluster-admin requirement on project VCP

This creates a security boundary: PlatformExtensions require maintainer approval, while Extensions are user-controlled. The same fix should be applied to PlatformExtension for consistency, but the primary concern is the user-controlled Extension resource.

# Environment-specific mitigations

Cloud metadata SSRF is mitigated in modern managed Kubernetes deployments:

**GKE with Workload Identity** (default for Autopilot, recommended for Standard): Pods use dedicated GSA credentials, node metadata is blocked/filtered. Risk: LOW for metadata SSRF, MEDIUM for internal service enumeration.

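**EKS with IRSA** (recommended pattern): Pods assume IAM roles through a webhook-injected service account token. IRSA does not by itself block node metadata; that requires restricting IMDS access (e.g., IMDSv2 with a hop limit of 1). Risk: LOW for metadata SSRF when IMDS is restricted, MEDIUM for internal service enumeration.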

**AKS with Managed Identity** (standard pattern): Pods use AAD pod identity. Risk: LOW for metadata SSRF, MEDIUM for internal service enumeration.

**On-premises / bare-metal / legacy cloud**: No cloud metadata protection. If controller runs on cloud VMs without these protections, metadata SSRF may be exploitable. Risk: MEDIUM-HIGH depending on deployment.

**Network Policy**: The default deployment has no NetworkPolicy. If cluster administrators add egress restrictions limiting controller destinations, SSRF impact is reduced.

**Important**: Blind SSRF for internal service enumeration remains exploitable in all environments. The attack allows mapping internal network topology, identifying internal services, and probing internal endpoints—even without direct data exfiltration.

# History
This bug was introduced in commit dd8066f (@zachsmith1, 2026-04-23). The initial "feat: initial extensions operator" commit created both the user-controlled `Extension.CRDs` field without URL validation markers and the `fetchURL()` function that fetches arbitrary URLs via `http.DefaultClient` without SSRF protection. The bug slipped in because the security model for admin-controlled PlatformExtensions (catalog PR workflow with maintainer review) was inadvertently applied to user-owned Extensions, which any user with namespace-level RBAC permissions can create.

