[Detail Bug] Multicluster: Internal cross-cluster capability can rewrite per-cluster mutation keys due to scope elevation after mutation check #25

@detail-app

Description

Summary

  • Bug: In pkg/multicluster/storage.go, scope elevation for internal users happens AFTER mutation rejection check, allowing internal users with cross-cluster capability to have mutation keys incorrectly rewritten.
  • Code Defect: rejectAllClustersMutation() checks scope before storeAndKey() can elevate it, and storeAndKey() elevates scope for ALL operations (reads AND mutations), not just reads.
  • Production Impact: LIMITED - the allClustersLoopbackConfig is only used by the shared runtime's loopback client, and the shared runtime's controllers (reconcileCRDStatusKey, deleteCRInstances) use ensureClusterClient which uses the BASE loopback config WITHOUT the capability.
  • Correctness Issue: The code is architecturally incorrect - scope elevation should only happen for read operations, or mutation rejection should check capability. The current design violates the documented intent that ResourceScopeCrossClusterRead is "intended for internal readers only."
  • Confirmed by Test: A runnable test demonstrates the bug - key written to /registry/configmaps/clusters instead of /registry/configmaps/clusters/c1/default/cm.

Code with Bug

Location: pkg/multicluster/storage.go

Lines 181-189: Mutation methods check scope BEFORE elevation

func (c *clusteredStorage) Create(ctx context.Context, key string, obj, out runtime.Object, ttl uint64) error {
    if err := c.rejectAllClustersMutation(ctx); err != nil {  // <-- Checks ORIGINAL scope
        return err
    }
    store, key, err := c.storeAndKey(ctx, key)  // <-- May ELEVATE scope inside
    if err != nil {
        return err
    }
    return store.Create(ctx, key, obj, out, ttl)
}

Lines 344-349: Mutation rejection only checks scope, not capability

func (c *clusteredStorage) rejectAllClustersMutation(ctx context.Context) error {
    _, scope, _ := FromContextScope(ctx)  // <-- Gets ORIGINAL scope, not elevated
    if scope == ResourceScopeCrossClusterRead {
        return ErrAllClustersMutationForbidden
    }
    return nil  // <-- PASSES if scope is Cluster, even if user has capability
}

Lines 308-342: Scope elevation happens for ALL operations

func (c *clusteredStorage) storeAndKey(ctx context.Context, key string) (storage.Interface, string, error) {
    cid, scope, _ := FromContextScope(ctx)
    if scope != ResourceScopeCrossClusterRead && HasInternalCrossClusterCapability(ctx) {
        scope = ResourceScopeCrossClusterRead  // <-- ELEVATES scope for mutations too
        ctx = internalcap.WithAllClustersCapability(ctx)
    }
    // ...
    if scope == ResourceScopeCrossClusterRead {
        rewritten := c.kindRootPrefix()  // Returns "/registry/<resource>/clusters"
        // ...
        return store, rewritten, nil  // <-- WRONG KEY for per-cluster mutations!
    }
    // ...
}

Evidence

Evidence 1: Complete Runnable Test (VERIFIED)

The following test can be added to storage_test.go to demonstrate the bug. This test has been executed and confirmed to fail with the current code:

type keyRecordingStorage struct {
    fakeStorage
    lastKey string
}

func (s *keyRecordingStorage) Create(_ context.Context, key string, _, _ runtime.Object, _ uint64) error {
    s.lastKey = key
    return nil
}

func TestScopeElevationAfterMutationRejection(t *testing.T) {
    record := &keyRecordingStorage{}
    cs, destroy, err := newClusteredStorage(
        func(*storagebackend.ConfigForResource, string, func(runtime.Object) (string, error), func() runtime.Object, func() runtime.Object, storage.AttrFunc, storage.IndexerFuncs, *cache.Indexers) (storage.Interface, factory.DestroyFunc, error) {
            return record, func() {}, nil
        },
        &storagebackend.ConfigForResource{},
        "/registry/configmaps",
        func(obj runtime.Object) (string, error) { return "/registry/configmaps/default/cm", nil },
        func() runtime.Object { return &corev1.ConfigMap{} },
        func() runtime.Object { return &corev1.ConfigMapList{} },
        nil, nil, nil,
        Options{DefaultCluster: "root"},
    )
    if err != nil {
        t.Fatalf("newClusteredStorage: %v", err)
    }
    defer destroy()

    ctx := context.Background()
    ctx = apirequest.WithUser(ctx, &user.DefaultInfo{
        Name:  DefaultInternalCrossClusterUser,
        Extra: map[string][]string{DefaultInternalCrossClusterCapability: {"true"}},
    })
    ctx = WithCluster(ctx, "c1", false)

    obj := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "cm"}}
    err = cs.Create(ctx, "/registry/configmaps/default/cm", obj, &corev1.ConfigMap{}, 0)
    if err != nil {
        t.Fatalf("Create failed: %v", err)
    }

    expected := "/registry/configmaps/clusters/c1/default/cm"
    if record.lastKey != expected {
        t.Fatalf("BUG: expected key %q, got %q", expected, record.lastKey)
    }
}

Test Output (confirmed bug):

=== RUN   TestScopeElevationAfterMutationRejection
    evidence_test.go:59: Context state: cid="c1" scope="cluster" hasInternalCap=true hasAllClustersCap=false
    evidence_test.go:82: BUG CONFIRMED: expected key "/registry/configmaps/clusters/c1/default/cm", got "/registry/configmaps/clusters" - scope elevation happened AFTER mutation rejection
--- FAIL: TestScopeElevationAfterMutationRejection (0.00s)

The test proves:

  1. Context has cid="c1" and scope="cluster" (correct per-cluster context)
  2. hasInternalCap=true (user has capability)
  3. hasAllClustersCap=false (capability not set by handler for per-cluster requests)
  4. Key written to wrong location: /registry/configmaps/clusters instead of /registry/configmaps/clusters/c1/default/cm

Evidence 2: Architectural Defect Analysis

The code has a clear architectural defect:

  1. Documented Intent: ResourceScopeCrossClusterRead is documented as "intended for internal readers only" (options.go:46-47)

  2. Actual Behavior: The scope elevation in storeAndKey applies to ALL operations including mutations

  3. Violation of Intent: If an internal user with capability makes a per-cluster mutation request:

    • rejectAllClustersMutation passes (scope is Cluster)
    • storeAndKey elevates scope to CrossClusterRead
    • Key is rewritten to kindRootPrefix() instead of rewriteKey(cid, key)

This is incorrect regardless of whether the production code path exists.

Evidence 3: Production Code Path Analysis

Where allClustersLoopbackConfig is used:

File: pkg/multicluster/bootstrap/crd_runtime_manager_wrapped.go

  • Line 183: baseGeneric.LoopbackClientConfig = allClustersLoopbackConfig(baseGeneric.LoopbackClientConfig)
  • This sets the loopback config for the SHARED runtime's API server

Where the shared runtime's loopback client would be used:

  1. Admission webhooks - CRD validation/defaulting webhooks calling back to the API server
  2. Internal authorization checks - Self-subject access reviews
  3. Built-in controllers - The apiextensions-apiserver has internal controllers that may use the loopback client

Why the production impact is LIMITED:

File: pkg/multicluster/bootstrap/crd_runtime_manager_wrapped.go

The shared runtime's controllers use the BASE config:

// Line 722-725: reconcileCRDStatusKey uses ensureClusterClient
cs, err := m.ensureClusterClient(clusterID)

// Line 850-852: deleteCRInstances uses dynamicClientForCluster  
dc, err := m.dynamicClientForCluster(clusterID)

// Line 237-242: ensureClusterClient uses baseLoopbackConfig
func (m *CRDRuntimeManager) ensureClusterClient(clusterID string) (...) {
    if m.apiClientPool == nil {
        base := m.baseLoopbackConfig()  // <-- BASE config, NOT allClustersLoopbackConfig
        m.apiClientPool = mc.NewAPIExtensionsClientPool(base, ...)
    }
}

// Line 921-922: dynamicClientForCluster uses baseLoopbackConfig
func (m *CRDRuntimeManager) dynamicClientForCluster(clusterID string) (...) {
    base := m.baseLoopbackConfig()  // <-- BASE config
}

The capability is set via impersonation:

File: pkg/multicluster/bootstrap/crd_runtime_manager_wrapped.go

// Lines 631-645
func allClustersLoopbackConfig(base *rest.Config) *rest.Config {
    cfg := rest.CopyConfig(base)
    cfg.Impersonate.UserName = mc.DefaultInternalCrossClusterUser  // <-- IMPERSONATION
    cfg.Impersonate.Extra[mc.DefaultInternalCrossClusterCapability] = []string{"true"}
    return cfg
}

When the shared runtime's loopback client makes a request, it impersonates system:apiserver with the capability. The request context would then have:

  • User: system:apiserver with capability extra
  • Scope: Determined by WithClusterRouting based on the request path

Evidence 4: The Bug Would Trigger IF the Scenario Occurred

If a request with:

  • User = system:apiserver with capability (from allClustersLoopbackConfig impersonation)
  • Scope = ResourceScopeCluster (from WithClusterRouting for a per-cluster path)
  • Operation = Mutation (Create/Delete/GuaranteedUpdate)

Were to hit clusteredStorage, the bug would trigger:

  1. rejectAllClustersMutation checks scope == CrossClusterRead → FALSE (scope is Cluster)
  2. Mutation rejection PASSES
  3. storeAndKey checks HasInternalCrossClusterCapability → TRUE
  4. Scope is ELEVATED to CrossClusterRead
  5. Key is rewritten to kindRootPrefix() instead of rewriteKey(cid, key)

Why Existing Tests Don't Catch This

The test TestWithClusterRoutingDoesNotTrustUserForScope (handler_test.go:29-63) verifies:

  • Per-cluster requests get ResourceScopeCluster scope
  • internalcap.HasAllClustersCapability(ctx) returns FALSE

But it doesn't test the storage layer where scope elevation happens.

The test TestStoreAndKey_AllClustersMutationRejected tests rejection when scope is ALREADY CrossClusterRead, not the elevation scenario.

Recommended Fix

Option 1 - Check capability in mutation rejection:

func (c *clusteredStorage) rejectAllClustersMutation(ctx context.Context) error {
    _, scope, _ := FromContextScope(ctx)
    if scope == ResourceScopeCrossClusterRead {
        return ErrAllClustersMutationForbidden
    }
    // Also reject if internal capability present (would cause scope elevation)
    if HasInternalCrossClusterCapability(ctx) {
        return ErrAllClustersMutationForbidden
    }
    return nil
}

Option 2 - Move elevation to read-only methods only:

// Remove elevation from storeAndKey and add to Watch, Get, GetList only
func (c *clusteredStorage) Get(ctx context.Context, key string, opts storage.GetOptions, objPtr runtime.Object) error {
    ctx = c.elevateScopeForRead(ctx)  // Only elevate for reads
    store, rewrittenKey, err := c.storeAndKey(ctx, key)
    // ...
}

Conclusion

The code has a clear architectural defect: scope elevation happens after mutation rejection, which violates the documented intent that ResourceScopeCrossClusterRead is for readers only. While the production impact is currently limited (the shared runtime's controllers use the base config without capability), the code is incorrect and should be fixed for correctness and to prevent future bugs if the code paths change.

History

This bug was introduced in commit efcfa44 (@zachsmith1, 2026-02-16, PR #10), which added shared-informer support with an internal cross-cluster read capability. The commit made storeAndKey elevate scope for internal users but left rejectAllClustersMutation checking only the original scope, not the capability, creating the order-of-checks bug: mutation rejection passes before the scope gets elevated.

Metadata

  • Assignees: none
  • Labels: bug (Something isn't working), detail
  • Milestone: none
  • Development: no branches or pull requests