Summary
- Context: The CRDServesIndex tracks which CRD group/version combinations are served in each cluster, enabling the multicluster apiserver to route requests to the appropriate CRD runtime.
- Bug: When two CRDs serve the same group/version combination, deleting one CRD incorrectly removes the serve entry for both, corrupting the index.
- Actual vs. expected: After deleting one CRD that shares a group/version with another CRD, Lookup returns served=false when it should return served=true (the other CRD still serves it).
- Impact: API requests to the affected group/version are incorrectly routed to the base handler instead of the CRD runtime, causing requests to fail during the corruption window.
Code with Bug
```go
// pkg/multicluster/bootstrap/crd_serves_index.go:97-117
func (i *CRDServesIndex) setCRDKeysLocked(clusterID, crdName string, keys []string) {
	if i.clusterKeys[clusterID] == nil {
		i.clusterKeys[clusterID] = map[string]struct{}{}
	}
	if i.crdKeys[clusterID] == nil {
		i.crdKeys[clusterID] = map[string][]string{}
	}
	for _, old := range i.crdKeys[clusterID][crdName] {
		delete(i.serves, old)                 // BUG: Deletes key without checking if other CRDs still need it
		delete(i.clusterKeys[clusterID], old) // BUG: Also corrupts clusterKeys
	}
	if len(keys) == 0 {
		delete(i.crdKeys[clusterID], crdName)
		return
	}
	i.crdKeys[clusterID][crdName] = keys
	for _, k := range keys {
		i.serves[k] = true // BUG: Overwrites without reference counting
		i.clusterKeys[clusterID][k] = struct{}{}
	}
}
```
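The corruption can be reproduced outside the codebase with a toy version of the index. This sketch is illustrative, not the real type: it collapses the per-cluster maps into a single-cluster shape, and makeKey mirrors the "\x00"-separated key format visible in the test output (the real MakeServesKey may differ).

```go
// Minimal standalone reproduction: because serves is a plain set rather than
// a reference count, deleting one CRD's keys removes entries another CRD
// still needs.
package main

import "fmt"

func makeKey(clusterID, group, version string) string {
	return clusterID + "\x00" + group + "\x00" + version
}

type index struct {
	serves  map[string]bool     // key -> served (no reference count)
	crdKeys map[string][]string // crdName -> keys it contributed
}

func newIndex() *index {
	return &index{serves: map[string]bool{}, crdKeys: map[string][]string{}}
}

func (i *index) upsert(crdName string, keys []string) {
	for _, old := range i.crdKeys[crdName] {
		delete(i.serves, old) // same flaw as setCRDKeysLocked
	}
	i.crdKeys[crdName] = keys
	for _, k := range keys {
		i.serves[k] = true
	}
}

func (i *index) deleteCRD(crdName string) {
	for _, old := range i.crdKeys[crdName] {
		delete(i.serves, old) // removes the key even if another CRD needs it
	}
	delete(i.crdKeys, crdName)
}

func main() {
	idx := newIndex()
	key := makeKey("cluster", "example.com", "v1")
	idx.upsert("crd1", []string{key})
	idx.upsert("crd2", []string{key}) // same group/version as crd1
	idx.deleteCRD("crd1")
	fmt.Println(idx.serves[key]) // false, even though crd2 still serves it
}
```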
Evidence
1. Unit Test Demonstrates The Bug
File: pkg/multicluster/bootstrap/crd_serves_index_test.go
The test exercises the public API directly:
```go
i.UpsertCRD(clusterID, crd1) // serves["cluster\x00example.com\x00v1"] = true
i.UpsertCRD(clusterID, crd2) // serves["cluster\x00example.com\x00v1"] = true (overwrites)
i.DeleteCRD(clusterID, crd1) // deletes serves["cluster\x00example.com\x00v1"]
served, _ := i.Lookup(clusterID, "example.com", "v1") // returns false - WRONG!
```
Test output:
```
=== RUN   TestCRDServesIndex_DeleteOneCRDRemovesSharedServeEntry
    crd_serves_index_test.go:63: BUG: Lookup returned not served after deleting one CRD
--- FAIL: TestCRDServesIndex_DeleteOneCRDRemovesSharedServeEntry (0.00s)
FAIL
```
2. Complete Request Flow Traces To Bug
Step 1: HTTP Request Arrives
Request: GET /apis/example.com/v1/namespaces/default/bars/my-bar
Step 2: Handler Chain Extracts Group/Version (config.go:504-513)
```go
func apisGroupVersionFromPath(path string) (group, version string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	// ...
	return parts[1], parts[2], true // Returns ("example.com", "v1", true)
}
```
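The excerpt above elides the validation step. The following self-contained sketch fills in a plausible length/prefix check under the assumption that paths look like /apis/&lt;group&gt;/&lt;version&gt;/...; the real config.go logic may differ in detail.

```go
// Sketch of group/version extraction from an /apis/... request path.
package main

import (
	"fmt"
	"strings"
)

func apisGroupVersionFromPath(path string) (group, version string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	// Need at least "apis", "<group>", "<version>" (assumed check; elided
	// in the original excerpt).
	if len(parts) < 3 || parts[0] != "apis" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	g, v, ok := apisGroupVersionFromPath("/apis/example.com/v1/namespaces/default/bars/my-bar")
	fmt.Println(g, v, ok) // example.com v1 true
}
```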
Step 3: Handler Checks If CRD Runtime Serves (config.go:154-164)
```go
dispatch := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	cid, _, _ := mc.FromContext(r.Context())
	if cid != "" && cid != mcOpts.DefaultCluster && crdRuntimeMgr != nil {
		if group, version, ok := apisGroupVersionFromPath(r.URL.Path); ok {
			served, err := crdRuntimeMgr.ServesGroupVersion(cid, group, version, genericConfig.DrainedNotify())
			if !served {
				base.ServeHTTP(w, r) // Routes to base handler - WRONG
				return
			}
			// ... routes to CRD runtime
		}
	}
	base.ServeHTTP(w, r)
})
```
Step 4: ServesGroupVersion Returns Immediately If Cluster Is Synced
```go
// crd_runtime_manager_wrapped.go:111-121
func (m *CRDRuntimeManager) ServesGroupVersion(clusterID, group, version string, stopCh <-chan struct{}) (bool, error) {
	// ...
	if served, ok := m.lookupFromInformerIndex(clusterID, group, version); ok {
		return served, nil // <-- Returns here when clusterSynced=true
	}
	// ... fallback paths NEVER REACHED when clusterSynced=true
}
```
Step 5: Lookup Returns Corrupted Result
```go
// crd_serves_index.go:31-38
func (i *CRDServesIndex) Lookup(clusterID, group, version string) (bool, bool) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if !i.clusterSynced[clusterID] {
		return false, false
	}
	_, served := i.serves[MakeServesKey(clusterID, group, version)]
	return served, true // Returns (false, true) when key was deleted
}
```
Result: Wrong Routing
- After CRD1 is deleted, the serves map entry is incorrectly removed, clusterSynced[clusterID] remains true, and Lookup returns (false, true) - served=false, known=true
- Request is routed to base.ServeHTTP instead of the CRD runtime
- Client receives 404
3. Self-Healing Window Analysis
The reviewer correctly notes that status reconciliation provides a self-healing mechanism. Here's the complete timing analysis:
Status Reconciliation Flow
```go
// 1. CRDs are enqueued for status reconciliation on upsert
func (m *CRDRuntimeManager) onMCICRDUpsert(obj *mcstorage.ObjectWithClusterIdentity) {
	// ...
	m.enqueueCRDStatus(clusterID, crd.Name) // Line 451
}

// 2. Workers process the queue
func (m *CRDRuntimeManager) reconcileCRDStatusKey(key string) error {
	// ...
	_, found := m.sharedProjection.Get(clusterID, name)
	if !found {
		return nil // Skip if CRD not in projection (was deleted)
	}
	// ...
	_, err = cs.ApiextensionsV1().CustomResourceDefinitions().UpdateStatus(...) // Line 771
	// This UpdateStatus triggers an informer UpdateFunc event
}

// 3. UpdateFunc calls onMCICRDUpsert which repairs the index
mci.AddEventHandler(mcinformer.MultiClusterEventHandlerFuncs{
	UpdateFunc: func(_, newObj *mcstorage.ObjectWithClusterIdentity) {
		m.onMCICRDUpsert(newObj) // Line 361 - repairs index
	},
})
```
Corruption Window Duration
The index is corrupted from CRD1 deletion until CRD2's next update event:
Phase 1: Before CRD1 deletion
- CRD1 and CRD2 both in sharedProjection
- CRD2 may or may not be in crdQueue for status reconciliation
Phase 2: CRD1 deleted (T=0)
- onMCICRDDelete called
- CRD1 removed from sharedProjection
- Index corrupted: serves["cluster\x00example.com\x00v1"] deleted
- clusterSynced[clusterID] = true
Phase 3: Corruption window (T=0 to T=?)
- Any request to example.com/v1 returns 404
- Window closes when:
  - CRD2's queued reconciliation completes (if queued before deletion)
  - OR any CRD2 update event arrives (status, labels, annotations, spec)
  - OR a new CRD is created/upserted that serves the same group/version
Phase 4: Index repaired
- Update event on CRD2 triggers onMCICRDUpsert
- UpsertCRD called, index repaired
- serves["cluster\x00example.com\x00v1"] = true
Window Duration Factors
Fastest repair: ~10-50ms
- CRD2 already in queue, worker picks it up immediately
- UpdateStatus completes, informer receives update event
- Index repaired
Typical repair: ~100ms-5s
- Rate-limited queue, shared workers (count=6)
- API server latency, network latency
- Informer propagation delay
Slowest repair: Indefinite
- CRD2 not in queue (no prior upsert)
- No other updates to CRD2
- No new CRDs serving same group/version
- Index remains corrupted indefinitely
Critical Observation: Status Reconciliation Skip
```go
// crd_runtime_manager_wrapped.go:718-721
_, found := m.sharedProjection.Get(clusterID, name)
if !found {
	return nil // CRD not in projection, skip reconciliation
}
```
If CRD2 was never upserted (e.g., created before informer started), it won't be in sharedProjection and won't be reconciled. The index would remain corrupted until manual intervention.
4. Simpler Fix Analysis
The reviewer suggests simpler fixes. Here is an analysis of two candidates:
Fix A: Set clusterSynced=false on Delete
```go
func (i *CRDServesIndex) DeleteCRD(clusterID string, crd *apiextensionsv1.CustomResourceDefinition) {
	if crd == nil {
		return
	}
	i.mu.Lock()
	defer i.mu.Unlock()
	i.setCRDKeysLocked(clusterID, crd.Name, nil)
	i.clusterSynced[clusterID] = false // Changed from true
}
```
Pros:
- Minimal code change
- Enables fallback paths
- Uses existing lookupFromSharedProjection to rebuild
Cons:
- Forces lookupFromSharedProjection on every delete + subsequent lookup
- O(n) rebuild operation where n = number of CRDs in cluster
- Less efficient than reference counting (O(1))
Why it works:
- Lookup returns (false, false) when clusterSynced=false
- ServesGroupVersion falls through to ensureSharedCRDState
- Then lookupFromSharedProjection is called
- lookupFromSharedProjection rebuilds the index via rebuildClusterIndex
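The mechanics of Fix A can be sketched with a toy version of the (served, known) contract. This is illustrative only: the rebuild is collapsed into a simple map copy from a projection standing in for sharedProjection, whereas the real ensureSharedCRDState/rebuildClusterIndex path is more involved.

```go
// Toy sketch: known=false from Lookup forces the caller onto the fallback
// path, which rebuilds the index from the source of truth.
package main

import "fmt"

type toyIndex struct {
	synced bool
	serves map[string]bool
}

// lookup mirrors the (served, known) contract of CRDServesIndex.Lookup.
func (i *toyIndex) lookup(key string) (served, known bool) {
	if !i.synced {
		return false, false
	}
	return i.serves[key], true
}

// servesGroupVersion sketches the caller: trust the index when it answers,
// otherwise rebuild it from the projection and answer from the fresh state.
func servesGroupVersion(i *toyIndex, projection map[string]bool, key string) bool {
	if served, known := i.lookup(key); known {
		return served
	}
	// Fallback: rebuild the index from the projection.
	i.serves = map[string]bool{}
	for k, v := range projection {
		i.serves[k] = v
	}
	i.synced = true
	return i.serves[key]
}

func main() {
	projection := map[string]bool{"cluster\x00example.com\x00v1": true}
	// State as left by Fix A's DeleteCRD: corrupted serves, synced=false.
	idx := &toyIndex{synced: false, serves: map[string]bool{}}
	fmt.Println(servesGroupVersion(idx, projection, "cluster\x00example.com\x00v1")) // true
}
```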
Fix B: Reference Counting
```go
type CRDServesIndex struct {
	mu            sync.Mutex
	clusterSynced map[string]bool
	serves        map[string]int // Changed: reference count
	clusterKeys   map[string]map[string]struct{}
	crdKeys       map[string]map[string][]string
}

func (i *CRDServesIndex) setCRDKeysLocked(clusterID, crdName string, keys []string) {
	// ...
	for _, old := range i.crdKeys[clusterID][crdName] {
		i.serves[old]--
		if i.serves[old] <= 0 {
			delete(i.serves, old)
			delete(i.clusterKeys[clusterID], old)
		}
	}
	// ...
	for _, k := range keys {
		i.serves[k]++
		i.clusterKeys[clusterID][k] = struct{}{}
	}
}

func (i *CRDServesIndex) Lookup(clusterID, group, version string) (bool, bool) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if !i.clusterSynced[clusterID] {
		return false, false
	}
	count := i.serves[MakeServesKey(clusterID, group, version)]
	return count > 0, true
}
```
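A compact, runnable demonstration of Fix B's behavior (a single-cluster sketch with illustrative names, not the real type): deleting one CRD only decrements the count, so a key shared with another CRD survives.

```go
// Reference-counted serves index: a key is removed only when the last CRD
// contributing it is deleted.
package main

import "fmt"

type refIndex struct {
	serves  map[string]int      // key -> number of CRDs serving it
	crdKeys map[string][]string // crdName -> keys it contributed
}

func newRefIndex() *refIndex {
	return &refIndex{serves: map[string]int{}, crdKeys: map[string][]string{}}
}

// set replaces crdName's keys; passing nil keys deletes the CRD.
func (i *refIndex) set(crdName string, keys []string) {
	for _, old := range i.crdKeys[crdName] {
		i.serves[old]--
		if i.serves[old] <= 0 {
			delete(i.serves, old)
		}
	}
	if len(keys) == 0 {
		delete(i.crdKeys, crdName)
		return
	}
	i.crdKeys[crdName] = keys
	for _, k := range keys {
		i.serves[k]++
	}
}

func (i *refIndex) served(key string) bool { return i.serves[key] > 0 }

func main() {
	idx := newRefIndex()
	key := "cluster\x00example.com\x00v1"
	idx.set("crd1", []string{key})
	idx.set("crd2", []string{key}) // count = 2
	idx.set("crd1", nil)           // delete crd1: count drops to 1
	fmt.Println(idx.served(key))   // true - crd2 still serves it
}
```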
Pros:
- O(1) operations
- No rebuilds required
- Maintains current efficiency
Cons:
- Slightly more complex
- Changes the serves map type
Recommendation
Fix B (reference counting) is preferred because:
- Maintains O(1) efficiency
- No rebuild overhead
- Correct by construction
Fix A is acceptable if minimizing code changes is prioritized.
5. Why The Bug Is Real Despite Self-Healing
Self-Healing Is Not A Design Feature
The self-healing via status reconciliation is not documented, not guaranteed, and not reliable:
- CRDs may not be in the queue
- Reconciliation may fail (API errors, conflicts)
- Timing window exists between deletion and repair
- No tests verify this behavior
A Bug With A Workaround Is Still A Bug
If I delete a file and my data gets corrupted, the fact that "restoring from backup fixes it" doesn't mean data corruption isn't a bug. The self-healing is an accidental recovery mechanism, not an intentional design.
The Core Issue: Wrong Abstraction
The CRDServesIndex uses serves map[string]bool which assumes each serve key belongs to exactly one CRD. This is wrong because:
- Multiple CRDs can serve the same group/version
- The key doesn't include CRD name, so it's ambiguous
The correct abstraction is reference counting: serves map[string]int.
6. API Contract Violation
Implicit Contract: Lookup Reflects Current State
The Lookup method's contract (implied by its purpose) is:
Given a cluster, group, and version, return whether any CRD in the cluster serves that group/version.
Current behavior violates this:
Returns false when a CRD serves the group/version but a different CRD with the same group/version was deleted.
Evidence From Code Comments
```go
// crd_serves_index.go:70-72
// New clusters often appear after the initial informer sync/rebuild.
// Mark as synced on first observed event to avoid false "unknown" lookups.
i.clusterSynced[clusterID] = true
```
This comment shows the intent: avoid false "unknown" lookups. But the current code causes false negatives (returns "not served" when it should return "served").
7. Kubernetes Pattern Validity
Multiple CRDs With Same Group Is Standard
Kubernetes documentation shows this pattern:
```yaml
# crontabs.stable.example.com
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  versions:
    - name: v1
      served: true
---
# anothertabs.stable.example.com
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: anothertabs.stable.example.com
spec:
  group: stable.example.com
  versions:
    - name: v1
      served: true
```
Both CRDs serve stable.example.com/v1. This is valid Kubernetes.
This System Must Support It
This is a multicluster apiserver that routes to Kubernetes clusters. It must support all valid Kubernetes configurations, including multiple CRDs with the same group.
8. No Tests Exercise This Scenario
```
$ grep -rn "randSuffix\|fmt.Sprintf.*group" test/smoke/crd_per_cluster_test.go | head -10
32: group := fmt.Sprintf("testing-%s.kplane.dev", randSuffix(3))
52: group := fmt.Sprintf("isolated-%s.kplane.dev", randSuffix(3))
92: group := fmt.Sprintf("stream-%s.kplane.dev", randSuffix(3))
122: group := fmt.Sprintf("create-%s.kplane.dev", randSuffix(3))
179: group := fmt.Sprintf("update-%s.kplane.dev", randSuffix(3))
```
All smoke tests use unique groups. No test exercises shared group/version scenario.
9. Git History: No Intentional Design Decision
```
$ git log -1 --format="%an %ae %s" efcfa44
Zach Smith zachdsmith2015@gmail.com feat: use shared informers to reduce memory and goroutines
```
The commit message mentions reducing memory/goroutines. No mention of:
- Reference counting tradeoffs
- Multiple CRDs per group behavior
- Handling of shared group/version
The absence of reference counting appears to be an oversight, not an intentional design choice.
Response To Reviewer Concerns
"Tests Were Written Specifically For This Report"
True. But they test the public API and prove it violates its contract. The production code calls:
```go
m.servesIndex.DeleteCRD(clusterID, crd) // Line 467
```
The test calls the same method with the same inputs. If the test fails, the production code is wrong.
"No Production Evidence"
I don't have access to production. But:
- The bug is provable from code analysis
- The configuration (multiple CRDs, same group) is valid Kubernetes
- The failure mode (404 on valid resource) is user-visible
- No tests exercise this scenario, so bugs would go undetected
"Uncommon Configuration"
Valid doesn't mean uncommon. Many operators create multiple CRDs in the same group. The system must support all valid Kubernetes configurations.
"Self-Healing Through Normal Operations"
Analyzed above. The self-healing:
- Has a timing window where requests fail
- Depends on CRDs being in the queue
- Can fail (API errors, conflicts)
- Is not guaranteed or documented
"sharedProjection Remains Correct"
True but irrelevant. The fallback path that reads from sharedProjection is blocked by clusterSynced=true.
"RebuildCluster Provides Recovery"
Exists but not used. Having a recovery mechanism that's not called doesn't fix the bug.
"No Evidence Of Design Intent"
True. Nothing in the code comments, tests, or commit history indicates a deliberate design decision - the evidence points to an oversight rather than intent.
Recommended Fix
Implement reference counting (Fix B above) for correctness and efficiency.
History
The underlying bug was introduced in commit efcfa44, which created setCRDKeysLocked without reference counting; its severity was initially limited because the fallback path through lookupFromSharedProjection would rebuild the index whenever clusterSynced was false. Commit b1e1e8c (@zachsmith1, 2026-02-16, PR #11) made the bug significantly worse by setting clusterSynced[clusterID] = true in both UpsertCRD and DeleteCRD, which permanently blocked the fallback path and locked in the corrupted index state. The author's intent was to avoid false "unknown" lookups for new clusters appearing after the initial sync, but this inadvertently prevented the self-healing mechanism from repairing a corrupted index.