Skip to content

fix(controller): reset stale REST mapper instead of restarting pod#15

Merged
IvanHunters merged 4 commits into
mainfrom
fix/self-restart-on-stale-discovery-cache
May 28, 2026
Merged

fix(controller): reset stale REST mapper instead of restarting pod#15
IvanHunters merged 4 commits into
mainfrom
fix/self-restart-on-stale-discovery-cache

Conversation

@IvanHunters

@IvanHunters IvanHunters commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

A freshly bootstrapped tenant cluster installs its Peer CRD only after
the operator has reconciled the parent ClusterMesh CR for the first
time. The reconcile's List(PeerList{}) against that target then hits
NoKindMatchError, controller-runtime's REST mapper caches the negative
discovery entry for the lifetime of cluster.Cluster, and every
subsequent peer push to that target fails forever — until the operator
pod is restarted by hand. This is a permanent stall, not a transient
error: requeue-with-backoff never refreshes discovery.

Observed in the wild: a freshly-recreated tenant mesh1 deadlocked at
ceph.peers=0 for ~13 minutes while a sibling tenant mesh2, whose CRD
landed before the operator's first reconcile, kept reconciling
successfully. Both tenants shared the same operator pod.

Fix

Replace the every-target REST mapper with a resettable wrapper, and
call Reset() on it from the reconciler when a NoKindMatchError
surfaces. The next reconcile re-discovers the missing kind over one
RPC; no pod restart, no leader-lease churn.

Why a wrapper

cluster.New builds its mapper through
apiutil.NewDynamicRESTMapper, which returns an unexported
*apiutil.mapper. That type does not implement
meta.ResettableRESTMapper — the upstream interface guard does not
exist on any controller-runtime release through v0.24.0. Without a
wrapper, mapper.(meta.ResettableRESTMapper) inside
restart.RefreshMapperOnNoMatch would silently fail the type
assertion and Reset would never fire — the previous draft of this PR
did exactly that and shipped as a placebo, which the new test-guard
catches.

multicluster.resettableDynamicMapper is a thin proxy:

  • delegates every meta.RESTMapper method to a current dynamic
    mapper under an RWMutex,
  • implements Reset() by constructing a fresh
    apiutil.NewDynamicRESTMapper and atomically swapping the
    underlying mapper. NewDynamicRESTMapper is lazy about discovery,
    so the rebuild is cheap (no RPC at construction). The next
    List() against an uncached kind triggers one discovery
    round-trip against the target apiserver,
  • preserves the previous mapper on rebuild failure so callers still
    have a working mapper for kinds already in cache; the next Reset
    retries.

The wrapper is plugged in via cluster.Options.MapperProvider in
multicluster.Build, so every remote cluster.Cluster the operator
owns ends up with a resettable mapper. The reconciler reaches all of
them through r.Registry.Mappers() and resets every mapper on a
NoKindMatchError: the wrapped error chain does not carry the
specific failed-target name at the Reconcile boundary, and
Reset() is cheap enough that fan-out cache invalidation is an
acceptable tradeoff.

Why not the original self-cancel approach

Earlier drafts of this PR ended the manager's root context so kubelet
would restart the pod. It worked end-to-end (verified in the wild),
but the cure was disproportionate to the disease: every NoMatch event
dropped the leader lease, killed every in-flight reconcile across
every tenant, and after a handful of consecutive stale-CRD events the
pod inflated to CrashLoopBackOff with a multi-minute backoff. Mapper
reset is surgical: it scopes recovery to discovery-cache invalidation
and leaves the rest of the operator running.

Other changes

  • directCluster.GetRESTMapper() returns nil so envtest-driven
    integration tests stay safe; ClusterRegistry.Mappers() skips nil
    entries.
  • Cancel field removed from ClusterMeshReconciler. The cancel
    signal stays wired through ChangeWatcher where the pod-restart
    shape is correct (kubeconfig fingerprint drift legitimately
    requires rebuilding the registry).

Test plan

  • go test ./... (excluding integration which requires envtest
    assets) passes, including:
    • Three new tests on resettableDynamicMapper:
      TestResettableDynamicMapperImplementsResettable is the
      load-bearing regression guard — if the wrapper ever stops
      satisfying meta.ResettableRESTMapper, CI fails immediately;
      TestRawDynamicMapperIsNotResettable documents the upstream
      gap and will flip when controller-runtime closes it, telling us
      to delete the wrapper; TestResettableDynamicMapperResetReplacesUnderlying
      proves Reset is not a stub.
    • Seven unit cases on RefreshMapperOnNoMatch (nil err, nil
      mapper, unrelated err, plain NoKindMatchError, double-wrapped
      NoKindMatchError, non-resettable mapper, nil logger).
  • go build ./... clean.
  • golangci-lint run ./... zero issues.
  • Manual reproduction on a Cozystack hosting cluster: tenant
    whose Peer CRD lands after first reconcile should recover on the
    next reconcile tick with no operator pod restart and no leader-lease
    churn observed.

Self-cancel helper that fires the manager's context when an error chain
carries a NoKindMatchError. Mirrors ChangeWatcher's restart-on-drift
pattern so the operator recovers from stale remote-cluster discovery
without operator intervention.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
A target cluster's REST mapper caches discovery results for the lifetime
of cluster.Cluster. When a freshly bootstrapped tenant installs its Peer
CRD only after the operator's first reconcile of its ClusterMesh CR, the
mapper holds a NoKindMatchError forever and every subsequent peer push
fails until the pod restarts.

Plumb the manager's cancel through ClusterMeshReconciler and invoke
TriggerOnStaleDiscovery on every error return so kubelet rebuilds the
registry against current discovery. Same recovery shape as the existing
ChangeWatcher fingerprint-drift restart.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The PR adds automatic pod restart recovery when the Kubernetes REST mapper becomes stale. A new TriggerOnStaleDiscovery helper detects meta.NoKindMatchError, logs context, and invokes a cancel function to restart the pod. The reconciler is refactored to call this helper on errors, and the cancel function is wired from main.go through the dependency injection layer.

Changes

Pod Restart on Stale Discovery

Layer / File(s) Summary
Stale discovery detection helper
internal/restart/trigger.go, internal/restart/trigger_test.go
New TriggerOnStaleDiscovery function detects meta.NoKindMatchError, optionally logs the GroupKind, and invokes the provided cancel function. Tests verify behavior across nil inputs, plain and wrapped errors, and logger tolerance.
Reconciler stale discovery integration
internal/controller/clustermesh_controller.go
ClusterMeshReconciler adds Cancel context.CancelFunc field, imports restart package, and refactors Reconcile to invoke TriggerOnStaleDiscovery on errors. Reconciliation logic is extracted to a reconcile helper with simplified error propagation.
Cancel function wiring
cmd/main.go
The run() function passes the operator shutdown cancel function to wireReconciler, which assigns it to the reconciler's Cancel field.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • Arsolitt

Poem

🐰 A pod restart on stale discovery appears,
When the mapper grows dusty through kubernetes tears,
The cancel bell rings, and the pod takes a bow,
Refreshed and reborn—a fresh start is now! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately reflects the main change: adding a stale REST mapper detection mechanism that triggers pod restart instead of manual intervention.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/self-restart-on-stale-discovery-cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a self-restart mechanism to handle stale discovery caches (specifically NoKindMatchError) on remote clusters by propagating the context's CancelFunc to the ClusterMeshReconciler. A new restart package is added with helper functions and unit tests to trigger this cancellation. The review feedback suggests checking the return value of TriggerOnStaleDiscovery in the reconciliation loop to return ctrl.Result{}, nil when a restart is triggered, preventing false-positive error logs and redundant requeues in controller-runtime during pod shutdown.

Comment on lines +85 to +87
restart.TriggerOnStaleDiscovery(err, r.Cancel, log)

return ctrl.Result{}, err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While TriggerOnStaleDiscovery returns a boolean indicating whether a self-restart was triggered, the return value is currently ignored here.

If a self-restart is triggered, the context is cancelled and the manager is shutting down. Returning the raw NoKindMatchError to controller-runtime will cause it to log a false-positive reconciler error and attempt to requeue the request, which is redundant and can trigger false alarms in log-monitoring systems.

Instead, we should check the return value of TriggerOnStaleDiscovery and return ctrl.Result{}, nil if a restart was triggered.

	if restart.TriggerOnStaleDiscovery(err, r.Cancel, log) {
		return ctrl.Result{}, nil
	}

	return ctrl.Result{}, err

…n stale discovery

When a reconcile error carries a NoKindMatchError, reset every remote
cluster's REST mapper so the next reconcile re-discovers the missing
kind. The previous self-cancel approach killed the leader lease and
risked CrashLoopBackOff under repeated stale-CRD events. Mapper reset
is scoped to cache invalidation, costs one discovery RPC on the next
List, and keeps the rest of the operator running.

- multicluster.ClusterRegistry gains Mappers() returning every remote
  cluster's REST mapper; directCluster.GetRESTMapper() returns nil so
  envtest-driven integration tests stay safe.
- restart.RefreshMapperOnNoMatch replaces TriggerOnStaleDiscovery;
  accepts a single mapper, type-asserts to meta.ResettableRESTMapper.
- ClusterMeshReconciler.Reconcile loops over Mappers() on every
  return. Cancel field removed from the reconciler; ChangeWatcher
  still cancels on kubeconfig drift where that is the correct shape.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
@IvanHunters IvanHunters changed the title fix(controller): self-restart on stale remote-cluster discovery cache fix(controller): reset stale REST mapper instead of restarting pod May 28, 2026
…fire

controller-runtime's apiutil.NewDynamicRESTMapper returns an
unexported *mapper that has no Reset() method, so the meta
ResettableRESTMapper type assertion inside
restart.RefreshMapperOnNoMatch was failing in production and the
recovery path was silently no-opping. The previous commit shipped a
placebo: unit tests passed against a hand-rolled spy but the prod
mapper never got reset, so the original mesh1 deadlock would not
self-heal as advertised.

Introduce resettableDynamicMapper, a thin proxy around
apiutil.NewDynamicRESTMapper that implements
meta.ResettableRESTMapper. Reset() atomically swaps the underlying
mapper for a freshly constructed one; NewDynamicRESTMapper is lazy
about discovery so the rebuild is cheap. On rebuild failure the old
mapper is preserved so callers still have a working mapper for kinds
already in cache. The wrapper is wired into multicluster.Build via
cluster.Options.MapperProvider so every remote cluster.Cluster ends
up with a resettable mapper.

Tests:
- TestResettableDynamicMapperImplementsResettable: load-bearing
  invariant — guards against the regression that just happened.
- TestRawDynamicMapperIsNotResettable: documents the upstream gap;
  flips green and tells us to delete the wrapper if controller-runtime
  ever adds Reset to its dynamic mapper.
- TestResettableDynamicMapperResetReplacesUnderlying: Reset is not a
  no-op stub.

Also: log message in restart.RefreshMapperOnNoMatch said "target
cluster"; on the reconciler's fan-out path it is one of N remote
clusters, not a single target. Reword to "remote-cluster".

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces the prior "self-cancel the manager and let kubelet restart the pod" recovery from NoKindMatchError on remote clusters with a surgical mapper-reset: every remote cluster.Cluster is now built with a resettable wrapper around apiutil.NewDynamicRESTMapper, and the reconciler invokes Reset() on every registered mapper after a reconcile that surfaces a NoKindMatchError. This avoids leader-lease churn and CrashLoopBackOff caused by repeated pod restarts when tenant CRDs land after the operator's first reconcile.

Changes:

  • Add multicluster.resettableDynamicMapper, a meta.ResettableRESTMapper wrapping apiutil.NewDynamicRESTMapper, wired in via cluster.Options.MapperProvider in multicluster.Build.
  • Add restart.RefreshMapperOnNoMatch plus ClusterRegistry.Mappers(), and call them from the refactored ClusterMeshReconciler.Reconcile so a NoKindMatchError triggers a discovery-cache invalidation instead of a pod restart.
  • Add unit coverage for the wrapper (incl. a regression guard on meta.ResettableRESTMapper satisfaction and an upstream-gap marker) and for the trigger helper across nil/wrapped/non-resettable inputs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/restart/trigger.go New helper that detects NoKindMatchError in an error chain and resets a meta.ResettableRESTMapper.
internal/restart/trigger_test.go Unit tests covering nil/unrelated/plain/wrapped/non-resettable/nil-logger paths.
internal/multicluster/mapper.go Resettable wrapper around apiutil.NewDynamicRESTMapper with atomic swap on Reset().
internal/multicluster/mapper_test.go Tests for the wrapper, including regression guard for meta.ResettableRESTMapper.
internal/multicluster/registry.go Wires the wrapper into cluster.New, adds Mappers(), and makes directCluster.GetRESTMapper return nil.
internal/controller/clustermesh_controller.go Splits Reconcile into inner reconcile; on return, fans RefreshMapperOnNoMatch across all registered mappers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +35 to +46
// handful of restarts), lets the manager keep its leader lease, and
// scopes recovery to the one cluster whose CRD state actually drifted.
// The next reconcile picks up the fresh mapping via Discovery and the
// peer push succeeds without further intervention.
//
// mapper may be nil — the call is then a no-op. If the mapper does not
// implement meta.ResettableRESTMapper (an in-memory test fake, say) the
// call also no-ops; that is acceptable because the production code path
// always builds clusters through controller-runtime's dynamic mapper.
// log may be nil — logging is then skipped.
// Returns true when Reset() was actually invoked, so callers can avoid
// double-emitting the recovery event.
Comment on lines +85 to +94
func (r *resettableDynamicMapper) Reset() {
fresh, err := apiutil.NewDynamicRESTMapper(r.cfg, r.httpClient)
if err != nil {
return
}

r.mu.Lock()
r.mapper = fresh
r.mu.Unlock()
}
@IvanHunters IvanHunters merged commit 69faf59 into main May 28, 2026
10 checks passed
@IvanHunters IvanHunters deleted the fix/self-restart-on-stale-discovery-cache branch May 28, 2026 07:01
IvanHunters added a commit that referenced this pull request May 28, 2026
…17)

The reconciler watches only ClusterMesh CRs, not the Nodes of remote
clusters. If the first reconcile runs against a source cluster whose
apiserver answers but whose kubelet has not joined yet (or whose
kubeconfig secret is not in the registry at all), reconcile completes
without error: ensureNodeEndpoints sees zero nodes, pushPeersToTargets
emits an empty desired peer set, no useful state changes. controller
runtime then has nothing to requeue on, and the next attempt waits for
an external event on the CR — which can take many minutes (in the
wild we saw a 17-minute mesh-up gap on a freshly bootstrapped tenant
whose VM landed a few minutes after the operator's first reconcile).

Flag the run as "incomplete" when any source is missing from the
registry or returned zero nodes, and translate that into a
RequeueAfter at the Reconcile boundary. The interval is a short
constant (30s) so freshly bootstrapped tenants converge promptly
without waiting on a stray watch event. Errors are unaffected:
controller-runtime keeps its own exponential backoff for those.

This is the source-side dual of the PR #15 fix (which addressed the
target-side NoMatchError on the Peer CRD via mapper Reset). Together
they close the two halves of the freshly-bootstrapped-tenant race.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants