feat: InPlace update strategy for SeiNodeDeployment by bdchatham · Pull Request #80 · sei-protocol/sei-k8s-controller

bdchatham · 2026-04-13T20:19:28Z

Summary

Implements the InPlace update strategy as designed in #79 (.tide/designs/inplace-update-strategy.md).

What's new

- InPlace added to UpdateStrategyType enum, alongside BlueGreen and HardFork
- updateStrategy is now required: removes the implicit nil/fire-and-forget path. Zero-value migration treats empty type as InPlace with a logged warning.
- Unified RolloutStatus replaces DeploymentStatus: single type used by all strategies with optional incumbent/entrant fields for BlueGreen/HardFork
- RolloutInProgress condition: conditions-driven coordination between reconciler (detects diffs) and planner (actions them). Guards against concurrent mutations.
- InPlace deployment plan: UpdateNodeSpecs → AwaitSpecUpdate → MarkReady
  - UpdateNodeSpecs patches child SeiNode images via kube client
  - AwaitSpecUpdate polls node.status.currentImage == node.spec.image (reads SeiNode status only, never StatefulSet directly)
  - MarkReady builds sidecar clients, submits mark-ready, confirms Ready status
- status.currentImage on SeiNode: set by the node controller when StatefulSet rollout completes (currentRevision == updateRevision). Parent controllers compare against spec.image for convergence.
- Stalled rollout escalation: RolloutInProgress reason transitions to Stalled after 10 minutes, providing a durable alerting signal
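
The three-step plan above can be illustrated with a minimal, in-memory sketch. It is not the controller's code: the `node` and `step` types and the `inPlacePlan` and `advance` helpers are hypothetical stand-ins, and the real tasks patch SeiNode objects and talk to sidecars rather than mutating structs. It only shows the ordering and the fact that AwaitSpecUpdate holds the plan until `status.currentImage` catches up with `spec.image`.

```go
package main

import "fmt"

// node is a hypothetical in-memory stand-in for a child SeiNode.
type node struct {
	Name         string
	SpecImage    string // stands in for spec.image
	CurrentImage string // stands in for status.currentImage
	Ready        bool
}

// step reports whether it has completed; an incomplete step is retried later.
type step struct {
	Name string
	Run  func(nodes []*node, target string) bool
}

// inPlacePlan returns the steps in the order named in the PR description.
func inPlacePlan() []step {
	return []step{
		{"UpdateNodeSpecs", func(nodes []*node, target string) bool {
			for _, n := range nodes {
				if n.SpecImage != target { // skip nodes already on the target image
					n.SpecImage = target
				}
			}
			return true
		}},
		{"AwaitSpecUpdate", func(nodes []*node, _ string) bool {
			for _, n := range nodes {
				if n.CurrentImage != n.SpecImage {
					return false // rollout not yet observed by the node controller
				}
			}
			return true
		}},
		{"MarkReady", func(nodes []*node, _ string) bool {
			for _, n := range nodes {
				n.Ready = true // the real task submits mark-ready to a sidecar
			}
			return true
		}},
	}
}

// advance runs steps starting at index from and returns the index of the
// first step that is not yet complete, or len(plan) when the plan is done.
func advance(plan []step, from int, nodes []*node, target string) int {
	for i := from; i < len(plan); i++ {
		if !plan[i].Run(nodes, target) {
			return i
		}
	}
	return len(plan)
}

func main() {
	nodes := []*node{{Name: "a", SpecImage: "v1", CurrentImage: "v1"}}
	plan := inPlacePlan()

	at := advance(plan, 0, nodes, "v2")
	fmt.Println("waiting at:", plan[at].Name)

	nodes[0].CurrentImage = "v2" // simulate the node controller observing the rollout
	at = advance(plan, at, nodes, "v2")
	fmt.Println("done:", at == len(plan), "ready:", nodes[0].Ready)
}
```

Keeping each step a pure "am I done yet" check is what makes the failure diagnostics clear: the index returned by `advance` names the stage that is blocking.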

Architecture decisions (validated with Tide experts)

- Deployment controller owns the full InPlace lifecycle including sidecar mark-ready (existing precedent: awaitNodesCaughtUpExecution, genesis assembly)
- SeiNode controller is unaware of deployment strategy: it just converges StatefulSet/Service and surfaces currentImage
- Three single-purpose plan steps: mutate, verify rollout, signal readiness. Clear failure diagnostics at each stage.
- No auto-rollback: rolling back a blockchain binary after a chain upgrade leaves the node unable to process new blocks

Test plan

🤖 Generated with Claude Code

Replace unconditional ensureMarkReady-on-every-reconcile with a plan-based approach that preserves controller responsibility boundaries:
- SeiNodeDeployment controller (orchestrator) creates an InPlace plan: UpdateNodeSpecs → AwaitRunning
- UpdateNodeSpecs patches child SeiNode images and sets a ReadinessApproved condition on each SeiNode
- SeiNode controller (executor) reacts to ReadinessApproved by submitting mark-ready to its own sidecar, then clears the condition
- Standalone SeiNodes (no parent deployment) self-approve via owner reference check

This preserves the single-writer invariant: only the SeiNode controller talks to the sidecar. The Kubernetes resource is the communication channel between controllers, matching Cluster API and Crossplane patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…Place enum
- Add UpdateStrategyInPlace to the UpdateStrategyType enum
- Make updateStrategy a required field (remove pointer/optional)
- Replace DeploymentStatus with RolloutStatus containing: Strategy, TargetHash, StartedAt, per-node convergence tracking, plus the existing incumbent/entrant fields for BlueGreen/HardFork
- Add ConditionRolloutInProgress on SeiNodeDeployment
- Add ConditionReadinessApproved on SeiNode
- Migrate all references from Deployment→Rollout across controllers, planner, tasks, and tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…→ MarkReady)
- Add status.currentImage to SeiNode, populated by the node controller when StatefulSet rollout completes (currentRevision == updateRevision)
- Add UpdateNodeSpecs task: patches child SeiNode image via kube client
- Add AwaitSpecUpdate task: polls node.status.currentImage == spec.image to confirm the pod is running the new image (reads SeiNode status only, never StatefulSet directly)
- Add MarkNodesReady task: builds sidecar clients via sidecarClientForNode, submits mark-ready once reachable, completes when all sidecars report Ready
- Wire InPlace planner to generate the 3-step plan
- Register new task types in the task registry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
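
The revision check behind status.currentImage can be sketched as a small pure function. This is an assumption-laden illustration, not the PR's observeCurrentImage: `statefulSetStatus` is a local stand-in that borrows three field names from the Kubernetes apps/v1 StatefulSetStatus, and `observedImage` is a hypothetical helper. The two extra guards (non-empty CurrentRevision, ReadyReplicas >= 1) follow the hardening described in the later "Critical fixes" commit of this PR.

```go
package main

import "fmt"

// statefulSetStatus mirrors the few StatefulSet status fields the check reads.
type statefulSetStatus struct {
	CurrentRevision string
	UpdateRevision  string
	ReadyReplicas   int32
}

// observedImage returns the image to record in status.currentImage and
// whether it is safe to record it yet.
func observedImage(sts *statefulSetStatus, specImage string) (string, bool) {
	if sts == nil {
		return "", false // StatefulSet not found: nothing to observe
	}
	if sts.CurrentRevision == "" {
		return "", false // fresh StatefulSet, no revision reported yet
	}
	if sts.ReadyReplicas < 1 {
		return "", false // revisions may match before any pod is ready
	}
	if sts.CurrentRevision != sts.UpdateRevision {
		return "", false // rollout still in progress
	}
	return specImage, true
}

func main() {
	rolling := &statefulSetStatus{CurrentRevision: "rev-1", UpdateRevision: "rev-2", ReadyReplicas: 1}
	done := &statefulSetStatus{CurrentRevision: "rev-2", UpdateRevision: "rev-2", ReadyReplicas: 1}

	_, ok := observedImage(rolling, "seid:v2")
	fmt.Println("rolling observable:", ok)

	img, ok := observedImage(done, "seid:v2")
	fmt.Println("done observable:", ok, "image:", img)
}
```

Because parent controllers only compare status.currentImage with spec.image, any premature write here would let AwaitSpecUpdate complete early, which is why the guards err on the side of reporting nothing.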

…cking
- detectDeploymentNeeded sets RolloutInProgress condition and populates per-node RolloutNodeStatus on the RolloutStatus
- Add zero-value migration: empty updateStrategy.type treated as InPlace
- Add reconcileRolloutStatus for InPlace convergence tracking: polls child SeiNode phases, clears rollout when all nodes Running
- Add stalled rollout escalation: RolloutInProgress reason transitions to Stalled after 10 minutes with non-ready nodes
- computeGroupPhase returns Upgrading when RolloutInProgress is True

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
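
Two of the behaviours in this commit, the zero-value migration and the "all nodes Running" completion check, are simple enough to sketch in isolation. The helper names `effectiveStrategy` and `allRunning` are hypothetical, phases are modelled as plain strings, and the choice to treat an empty node list as "not complete" is an assumption rather than something the PR states.

```go
package main

import (
	"fmt"
	"log"
)

type UpdateStrategyType string

const UpdateStrategyInPlace UpdateStrategyType = "InPlace"

// effectiveStrategy applies the zero-value migration: an empty type is
// treated as InPlace, with a logged warning so operators can fix the spec.
func effectiveStrategy(t UpdateStrategyType) UpdateStrategyType {
	if t == "" {
		log.Printf("warning: updateStrategy.type is empty; treating as %s", UpdateStrategyInPlace)
		return UpdateStrategyInPlace
	}
	return t
}

// allRunning reports whether every child node phase is "Running".
// An empty list returns false (assumed here: no nodes means nothing converged).
func allRunning(phases []string) bool {
	if len(phases) == 0 {
		return false
	}
	for _, p := range phases {
		if p != "Running" {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("strategy:", effectiveStrategy(""))
	// "Initializing" is a made-up phase used only to show a non-Running node.
	fmt.Println("partial:", allRunning([]string{"Running", "Initializing"}))
	fmt.Println("complete:", allRunning([]string{"Running", "Running"}))
}
```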

- TestDetectDeploymentNeeded_InPlace_SetsRolloutInProgress
- TestDetectDeploymentNeeded_InPlace_AlreadyActive
- TestDetectDeploymentNeeded_EmptyType_TreatedAsInPlace
- TestReconcileRolloutStatus_InPlace_AllReady
- TestReconcileRolloutStatus_InPlace_Partial
- TestReconcileRolloutStatus_InPlace_Stalled
- TestComputeGroupPhase_RolloutInProgress
- TestObserveCurrentImage_UpdatesWhenConverged
- TestObserveCurrentImage_SkipsWhenRolling
- TestInPlacePlan_ThreeTasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

UpdateNodeSpecs → AwaitSpecUpdate → MarkReady plan, status.currentImage for convergence observation, and deployment-level sidecar interaction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

api/v1alpha1/seinode_types.go

internal/controller/node/controller.go

internal/controller/nodedeployment/nodes.go

internal/controller/nodedeployment/status.go

Critical fixes:
- Clear RolloutInProgress condition in completePlan/failPlan so future deployments are not permanently blocked
- Implement InPlace plan supersession: if the spec changes during an active rollout (targetHash != currentHash), the stale plan is replaced with a new one targeting the latest spec
- Harden observeCurrentImage: require ReadyReplicas >= 1 and non-empty CurrentRevision to prevent premature currentImage reporting

Cleanup:
- Remove dead ConditionReadinessApproved (replaced by plan-based approach)
- Move stallThreshold const to top of status.go
- Fix gofmt formatting in seinode_types.go
- Update tests: supersession test, converged-image test with ReadyReplicas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
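
Plan supersession reduces to a hash comparison, which the following sketch shows under stated assumptions. The `template` struct, its two fields, the SHA-256 based `templateHash`, and the `nextPlan` helper are all hypothetical; the PR only says that a plan whose targetHash differs from the current hash is replaced by one targeting the latest spec.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// template is a hypothetical subset of the fields a template hash might track.
type template struct {
	Image    string
	Replicas int
}

// templateHash derives a short, stable hash from the tracked fields.
func templateHash(t template) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d", t.Image, t.Replicas)))
	return hex.EncodeToString(sum[:8])
}

// plan is a stand-in for an active InPlace deployment plan.
type plan struct {
	TargetHash string
}

// nextPlan returns the plan that should be active and whether the previous
// one was superseded because the spec changed mid-rollout.
func nextPlan(active *plan, currentHash string) (*plan, bool) {
	if active != nil && active.TargetHash != currentHash {
		return &plan{TargetHash: currentHash}, true // stale plan replaced
	}
	return active, false
}

func main() {
	bad := template{Image: "seid:v2-broken", Replicas: 3}
	fixed := template{Image: "seid:v2", Replicas: 3}

	active := &plan{TargetHash: templateHash(bad)}
	next, superseded := nextPlan(active, templateHash(fixed))
	fmt.Println("superseded:", superseded, "retargeted:", next.TargetHash == templateHash(fixed))
}
```

This is the recovery path for a bad image push: correcting the spec changes the hash, so the stuck plan is dropped instead of waiting on an image that will never converge.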

- Add rollout lifecycle events: RolloutStarted, RolloutSuperseded, RolloutComplete emitted at the appropriate state transitions
- Add TODO for AwaitSpecUpdate pod failure detection (kubelet waiting reasons not exported as constants; mitigated by plan supersession)
- Clarify templateHash godoc: lists tracked fields and explains which changes trigger deployment plans vs in-place propagation
- Clean up unused imports from earlier rejected edit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Supersession handles the primary recovery case (bad image push). The Upgrading phase itself is a durable signal for infra-level issues. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

observeCurrentImage edge cases:
- SkipsWhenReadyReplicasZero (race condition guard)
- SkipsWhenEmptyRevision (fresh StatefulSet)
- NoopWhenAlreadyCurrent (idempotency)
- StatefulSetNotFound (graceful nil return)

Plan lifecycle:
- CompletePlan_ClearsRolloutInProgress
- FailPlan_ClearsRolloutInProgress
- DoesNotClearWhilePlanActive (PlanInProgress guard)

Task execution:
- UpdateNodeSpecs patches image, skips already-current
- AwaitSpecUpdate completes when converged, stays Running otherwise
- MarkNodesReady deserialization and initial state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bdchatham and others added 6 commits April 13, 2026 11:34

bdchatham and others added 4 commits April 13, 2026 13:52

refactor: remove stall threshold detection

cc393fa

bdchatham merged commit f6429c3 into main Apr 13, 2026
2 checks passed

bdchatham merged 10 commits into main from feat/inplace-update-strategy
