fix(controller): reset stale REST mapper instead of restarting pod#15
Conversation
Self-cancel helper that fires the manager's context when an error chain carries a NoKindMatchError. Mirrors ChangeWatcher's restart-on-drift pattern so the operator recovers from stale remote-cluster discovery without operator intervention. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
A target cluster's REST mapper caches discovery results for the lifetime of cluster.Cluster. When a freshly bootstrapped tenant installs its Peer CRD only after the operator's first reconcile of its ClusterMesh CR, the mapper holds a NoKindMatchError forever and every subsequent peer push fails until the pod restarts. Plumb the manager's cancel through ClusterMeshReconciler and invoke TriggerOnStaleDiscovery on every error return so kubelet rebuilds the registry against current discovery. Same recovery shape as the existing ChangeWatcher fingerprint-drift restart. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
📝 WalkthroughWalkthroughThe PR adds automatic pod restart recovery when the Kubernetes REST mapper becomes stale. A new ChangesPod Restart on Stale Discovery
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a self-restart mechanism to handle stale discovery caches (specifically NoKindMatchError) on remote clusters by propagating the context's CancelFunc to the ClusterMeshReconciler. A new restart package is added with helper functions and unit tests to trigger this cancellation. The review feedback suggests checking the return value of TriggerOnStaleDiscovery in the reconciliation loop to return ctrl.Result{}, nil when a restart is triggered, preventing false-positive error logs and redundant requeues in controller-runtime during pod shutdown.
| restart.TriggerOnStaleDiscovery(err, r.Cancel, log) | ||
|
|
||
| return ctrl.Result{}, err |
There was a problem hiding this comment.
While TriggerOnStaleDiscovery returns a boolean indicating whether a self-restart was triggered, the return value is currently ignored here.
If a self-restart is triggered, the context is cancelled and the manager is shutting down. Returning the raw NoKindMatchError to controller-runtime will cause it to log a false-positive reconciler error and attempt to requeue the request, which is redundant and can trigger false alarms in log-monitoring systems.
Instead, we should check the return value of TriggerOnStaleDiscovery and return ctrl.Result{}, nil if a restart was triggered.
if restart.TriggerOnStaleDiscovery(err, r.Cancel, log) {
return ctrl.Result{}, nil
}
return ctrl.Result{}, err…n stale discovery When a reconcile error carries a NoKindMatchError, reset every remote cluster's REST mapper so the next reconcile re-discovers the missing kind. The previous self-cancel approach killed the leader lease and risked CrashLoopBackOff under repeated stale-CRD events. Mapper reset is scoped to cache invalidation, costs one discovery RPC on the next List, and keeps the rest of the operator running. - multicluster.ClusterRegistry gains Mappers() returning every remote cluster's REST mapper; directCluster.GetRESTMapper() returns nil so envtest-driven integration tests stay safe. - restart.RefreshMapperOnNoMatch replaces TriggerOnStaleDiscovery; accepts a single mapper, type-asserts to meta.ResettableRESTMapper. - ClusterMeshReconciler.Reconcile loops over Mappers() on every return. Cancel field removed from the reconciler; ChangeWatcher still cancels on kubeconfig drift where that is the correct shape. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
…fire controller-runtime's apiutil.NewDynamicRESTMapper returns an unexported *mapper that has no Reset() method, so the meta ResettableRESTMapper type assertion inside restart.RefreshMapperOnNoMatch was failing in production and the recovery path was silently no-opping. The previous commit shipped a placebo: unit tests passed against a hand-rolled spy but the prod mapper never got reset, so the original mesh1 deadlock would not self-heal as advertised. Introduce resettableDynamicMapper, a thin proxy around apiutil.NewDynamicRESTMapper that implements meta.ResettableRESTMapper. Reset() atomically swaps the underlying mapper for a freshly constructed one; NewDynamicRESTMapper is lazy about discovery so the rebuild is cheap. On rebuild failure the old mapper is preserved so callers still have a working mapper for kinds already in cache. The wrapper is wired into multicluster.Build via cluster.Options.MapperProvider so every remote cluster.Cluster ends up with a resettable mapper. Tests: - TestResettableDynamicMapperImplementsResettable: load-bearing invariant — guards against the regression that just happened. - TestRawDynamicMapperIsNotResettable: documents the upstream gap; flips green and tells us to delete the wrapper if controller-runtime ever adds Reset to its dynamic mapper. - TestResettableDynamicMapperResetReplacesUnderlying: Reset is not a no-op stub. Also: log message in restart.RefreshMapperOnNoMatch said "target cluster"; on the reconciler's fan-out path it is one of N remote clusters, not a single target. Reword to "remote-cluster". Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
There was a problem hiding this comment.
Pull request overview
Replaces the prior "self-cancel the manager and let kubelet restart the pod" recovery from NoKindMatchError on remote clusters with a surgical mapper-reset: every remote cluster.Cluster is now built with a resettable wrapper around apiutil.NewDynamicRESTMapper, and the reconciler invokes Reset() on every registered mapper after a reconcile that surfaces a NoKindMatchError. This avoids leader-lease churn and CrashLoopBackOff caused by repeated pod restarts when tenant CRDs land after the operator's first reconcile.
Changes:
- Add
multicluster.resettableDynamicMapper, ameta.ResettableRESTMapperwrappingapiutil.NewDynamicRESTMapper, wired in viacluster.Options.MapperProviderinmulticluster.Build. - Add
restart.RefreshMapperOnNoMatchplusClusterRegistry.Mappers(), and call them from the refactoredClusterMeshReconciler.Reconcileso aNoKindMatchErrortriggers a discovery-cache invalidation instead of a pod restart. - Add unit coverage for the wrapper (incl. a regression guard on
meta.ResettableRESTMappersatisfaction and an upstream-gap marker) and for the trigger helper across nil/wrapped/non-resettable inputs.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/restart/trigger.go | New helper that detects NoKindMatchError in an error chain and resets a meta.ResettableRESTMapper. |
| internal/restart/trigger_test.go | Unit tests covering nil/unrelated/plain/wrapped/non-resettable/nil-logger paths. |
| internal/multicluster/mapper.go | Resettable wrapper around apiutil.NewDynamicRESTMapper with atomic swap on Reset(). |
| internal/multicluster/mapper_test.go | Tests for the wrapper, including regression guard for meta.ResettableRESTMapper. |
| internal/multicluster/registry.go | Wires the wrapper into cluster.New, adds Mappers(), and makes directCluster.GetRESTMapper return nil. |
| internal/controller/clustermesh_controller.go | Splits Reconcile into inner reconcile; on return, fans RefreshMapperOnNoMatch across all registered mappers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // handful of restarts), lets the manager keep its leader lease, and | ||
| // scopes recovery to the one cluster whose CRD state actually drifted. | ||
| // The next reconcile picks up the fresh mapping via Discovery and the | ||
| // peer push succeeds without further intervention. | ||
| // | ||
| // mapper may be nil — the call is then a no-op. If the mapper does not | ||
| // implement meta.ResettableRESTMapper (an in-memory test fake, say) the | ||
| // call also no-ops; that is acceptable because the production code path | ||
| // always builds clusters through controller-runtime's dynamic mapper. | ||
| // log may be nil — logging is then skipped. | ||
| // Returns true when Reset() was actually invoked, so callers can avoid | ||
| // double-emitting the recovery event. |
| func (r *resettableDynamicMapper) Reset() { | ||
| fresh, err := apiutil.NewDynamicRESTMapper(r.cfg, r.httpClient) | ||
| if err != nil { | ||
| return | ||
| } | ||
|
|
||
| r.mu.Lock() | ||
| r.mapper = fresh | ||
| r.mu.Unlock() | ||
| } |
…17) The reconciler watches only ClusterMesh CRs, not the Nodes of remote clusters. If the first reconcile runs against a source cluster whose apiserver answers but whose kubelet has not joined yet (or whose kubeconfig secret is not in the registry at all), reconcile completes without error: ensureNodeEndpoints sees zero nodes, pushPeersToTargets emits an empty desired peer set, no useful state changes. controller runtime then has nothing to requeue on, and the next attempt waits for an external event on the CR — which can take many minutes (in the wild we saw a 17-minute mesh-up gap on a freshly bootstrapped tenant whose VM landed a few minutes after the operator's first reconcile). Flag the run as "incomplete" when any source is missing from the registry or returned zero nodes, and translate that into a RequeueAfter at the Reconcile boundary. The interval is a short constant (30s) so freshly bootstrapped tenants converge promptly without waiting on a stray watch event. Errors are unaffected: controller-runtime keeps its own exponential backoff for those. This is the source-side dual of the PR #15 fix (which addressed the target-side NoMatchError on the Peer CRD via mapper Reset). Together they close the two halves of the freshly-bootstrapped-tenant race. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Summary
A freshly bootstrapped tenant cluster installs its
PeerCRD only afterthe operator has reconciled the parent
ClusterMeshCR for the firsttime. The reconcile's
List(PeerList{})against that target then hitsNoKindMatchError, controller-runtime's REST mapper caches the negativediscovery entry for the lifetime of
cluster.Cluster, and everysubsequent peer push to that target fails forever — until the operator
pod is restarted by hand. This is a permanent stall, not a transient
error: requeue-with-backoff never refreshes discovery.
Observed in the wild: a freshly-recreated tenant
mesh1deadlocked atceph.peers=0for ~13 minutes while a sibling tenantmesh2, whose CRDlanded before the operator's first reconcile, kept reconciling
successfully. Both tenants shared the same operator pod.
Fix
Replace the every-target REST mapper with a resettable wrapper, and
call
Reset()on it from the reconciler when aNoKindMatchErrorsurfaces. The next reconcile re-discovers the missing kind over one
RPC; no pod restart, no leader-lease churn.
Why a wrapper
cluster.Newbuilds its mapper throughapiutil.NewDynamicRESTMapper, which returns an unexported*apiutil.mapper. That type does not implementmeta.ResettableRESTMapper— the upstream interface guard does notexist on any controller-runtime release through v0.24.0. Without a
wrapper,
mapper.(meta.ResettableRESTMapper)insiderestart.RefreshMapperOnNoMatchwould silently fail the typeassertion and Reset would never fire — the previous draft of this PR
did exactly that and shipped as a placebo, which the new test-guard
catches.
multicluster.resettableDynamicMapperis a thin proxy:meta.RESTMappermethod to a current dynamicmapper under an
RWMutex,Reset()by constructing a freshapiutil.NewDynamicRESTMapperand atomically swapping theunderlying mapper.
NewDynamicRESTMapperis lazy about discovery,so the rebuild is cheap (no RPC at construction). The next
List()against an uncached kind triggers one discoveryround-trip against the target apiserver,
have a working mapper for kinds already in cache; the next Reset
retries.
The wrapper is plugged in via
cluster.Options.MapperProviderinmulticluster.Build, so every remotecluster.Clusterthe operatorowns ends up with a resettable mapper. The reconciler reaches all of
them through
r.Registry.Mappers()and resets every mapper on aNoKindMatchError: the wrapped error chain does not carry thespecific failed-target name at the
Reconcileboundary, andReset()is cheap enough that fan-out cache invalidation is anacceptable tradeoff.
Why not the original self-cancel approach
Earlier drafts of this PR ended the manager's root context so kubelet
would restart the pod. It worked end-to-end (verified in the wild),
but the cure was disproportionate to the disease: every NoMatch event
dropped the leader lease, killed every in-flight reconcile across
every tenant, and after a handful of consecutive stale-CRD events the
pod inflated to CrashLoopBackOff with a multi-minute backoff. Mapper
reset is surgical: it scopes recovery to discovery-cache invalidation
and leaves the rest of the operator running.
Other changes
directCluster.GetRESTMapper()returns nil so envtest-drivenintegration tests stay safe;
ClusterRegistry.Mappers()skips nilentries.
Cancelfield removed fromClusterMeshReconciler. The cancelsignal stays wired through
ChangeWatcherwhere the pod-restartshape is correct (kubeconfig fingerprint drift legitimately
requires rebuilding the registry).
Test plan
go test ./...(excluding integration which requires envtestassets) passes, including:
resettableDynamicMapper:TestResettableDynamicMapperImplementsResettableis theload-bearing regression guard — if the wrapper ever stops
satisfying
meta.ResettableRESTMapper, CI fails immediately;TestRawDynamicMapperIsNotResettabledocuments the upstreamgap and will flip when controller-runtime closes it, telling us
to delete the wrapper;
TestResettableDynamicMapperResetReplacesUnderlyingproves Reset is not a stub.
RefreshMapperOnNoMatch(nil err, nilmapper, unrelated err, plain
NoKindMatchError, double-wrappedNoKindMatchError, non-resettable mapper, nil logger).go build ./...clean.golangci-lint run ./...zero issues.whose
PeerCRD lands after first reconcile should recover on thenext reconcile tick with no operator pod restart and no leader-lease
churn observed.