feat: InPlace update strategy for SeiNodeDeployment #80
Merged
Replace unconditional ensureMarkReady-on-every-reconcile with a plan-based approach that preserves controller responsibility boundaries:

- SeiNodeDeployment controller (orchestrator) creates an InPlace plan: UpdateNodeSpecs → AwaitRunning
- UpdateNodeSpecs patches child SeiNode images and sets a ReadinessApproved condition on each SeiNode
- SeiNode controller (executor) reacts to ReadinessApproved by submitting mark-ready to its own sidecar, then clears the condition
- Standalone SeiNodes (no parent deployment) self-approve via owner reference check

This preserves the single-writer invariant: only the SeiNode controller talks to the sidecar. The Kubernetes resource is the communication channel between controllers, matching Cluster API and Crossplane patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
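A minimal sketch of the condition handshake this commit describes, in plain Go with no Kubernetes dependencies. `Condition`, `OwnerRef`, and the trimmed `SeiNode` struct are hypothetical stand-ins for `metav1.Condition`, `metav1.OwnerReference`, and the real CRD type, and `markReady` is injected in place of the real sidecar client. (A later commit in this PR removes `ConditionReadinessApproved` in favor of the plan-based MarkReady step.)

```go
package main

import "fmt"

// Condition is a trimmed, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// OwnerRef is a trimmed, hypothetical stand-in for metav1.OwnerReference.
type OwnerRef struct {
	Kind string
	Name string
}

// SeiNode keeps only the fields the handshake touches.
type SeiNode struct {
	Name       string
	Owners     []OwnerRef
	Conditions []Condition
}

const ConditionReadinessApproved = "ReadinessApproved"

// setCondition inserts or replaces a condition by type.
func setCondition(n *SeiNode, c Condition) {
	for i := range n.Conditions {
		if n.Conditions[i].Type == c.Type {
			n.Conditions[i] = c
			return
		}
	}
	n.Conditions = append(n.Conditions, c)
}

// removeCondition drops every condition of the given type.
func removeCondition(n *SeiNode, condType string) {
	kept := n.Conditions[:0]
	for _, c := range n.Conditions {
		if c.Type != condType {
			kept = append(kept, c)
		}
	}
	n.Conditions = kept
}

// hasTrueCondition reports whether a condition of the given type is True.
func hasTrueCondition(n *SeiNode, condType string) bool {
	for _, c := range n.Conditions {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}

// isStandalone reports whether no SeiNodeDeployment owns the node.
func isStandalone(n *SeiNode) bool {
	for _, o := range n.Owners {
		if o.Kind == "SeiNodeDeployment" {
			return false
		}
	}
	return true
}

// approveReadiness is the orchestrator side: the SeiNodeDeployment controller
// only writes intent onto the child resource. The reason is a placeholder.
func approveReadiness(n *SeiNode) {
	setCondition(n, Condition{Type: ConditionReadinessApproved, Status: "True", Reason: "RolloutApproved"})
}

// reconcileReadiness is the executor side: the SeiNode controller is the only
// caller of the sidecar. It returns true when mark-ready was submitted.
func reconcileReadiness(n *SeiNode, markReady func(node string) error) (bool, error) {
	if !isStandalone(n) && !hasTrueCondition(n, ConditionReadinessApproved) {
		return false, nil // owned node: wait for the parent's approval
	}
	if err := markReady(n.Name); err != nil {
		return false, fmt.Errorf("mark-ready for %s: %w", n.Name, err)
	}
	removeCondition(n, ConditionReadinessApproved) // consume the approval
	return true, nil
}
```

The point of the split is that `approveReadiness` never touches the sidecar; it only mutates the SeiNode, and the SeiNode controller reacts on its next reconcile.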
…Place enum

- Add UpdateStrategyInPlace to the UpdateStrategyType enum
- Make updateStrategy a required field (remove pointer/optional)
- Replace DeploymentStatus with RolloutStatus containing: Strategy, TargetHash, StartedAt, per-node convergence tracking, plus the existing incumbent/entrant fields for BlueGreen/HardFork
- Add ConditionRolloutInProgress on SeiNodeDeployment
- Add ConditionReadinessApproved on SeiNode
- Migrate all references from Deployment→Rollout across controllers, planner, tasks, and tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…→ MarkReady)

- Add status.currentImage to SeiNode, populated by the node controller when StatefulSet rollout completes (currentRevision == updateRevision)
- Add UpdateNodeSpecs task: patches child SeiNode image via kube client
- Add AwaitSpecUpdate task: polls node.status.currentImage == spec.image to confirm the pod is running the new image (reads SeiNode status only, never StatefulSet directly)
- Add MarkNodesReady task: builds sidecar clients via sidecarClientForNode, submits mark-ready once reachable, completes when all sidecars report Ready
- Wire InPlace planner to generate the 3-step plan
- Register new task types in the task registry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
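The three-step plan and the `AwaitSpecUpdate` completion rule reduce to a small amount of logic. In this sketch `NodeView` is a hypothetical flattening of a SeiNode down to the two fields the task reads, `updateNodeSpecs` mutates a slice instead of patching through the kube client, and the `TaskType`/`TaskStatus` values are illustrative rather than the registry's real identifiers.

```go
package main

// TaskType names a plan step (values illustrative).
type TaskType string

const (
	TaskUpdateNodeSpecs TaskType = "UpdateNodeSpecs"
	TaskAwaitSpecUpdate TaskType = "AwaitSpecUpdate"
	TaskMarkNodesReady  TaskType = "MarkNodesReady"
)

// inPlacePlan returns the ordered InPlace steps.
func inPlacePlan() []TaskType {
	return []TaskType{TaskUpdateNodeSpecs, TaskAwaitSpecUpdate, TaskMarkNodesReady}
}

// NodeView holds the desired image (spec) and the observed image (status).
// AwaitSpecUpdate reads only these; it never looks at the StatefulSet.
type NodeView struct {
	Name         string
	SpecImage    string
	CurrentImage string
}

type TaskStatus string

const (
	TaskRunning  TaskStatus = "Running"
	TaskComplete TaskStatus = "Complete"
)

// updateNodeSpecs sets spec.image on nodes not already at the target image
// and returns the names it changed, so repeated runs are no-ops.
func updateNodeSpecs(nodes []NodeView, image string) []string {
	var patched []string
	for i := range nodes {
		if nodes[i].SpecImage == image {
			continue // already current
		}
		nodes[i].SpecImage = image
		patched = append(patched, nodes[i].Name)
	}
	return patched
}

// awaitSpecUpdate completes only when every node reports
// status.currentImage == spec.image.
func awaitSpecUpdate(nodes []NodeView) TaskStatus {
	for _, n := range nodes {
		if n.CurrentImage != n.SpecImage {
			return TaskRunning
		}
	}
	return TaskComplete
}
```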
…cking

- detectDeploymentNeeded sets RolloutInProgress condition and populates per-node RolloutNodeStatus on the RolloutStatus
- Add zero-value migration: empty updateStrategy.type treated as InPlace
- Add reconcileRolloutStatus for InPlace convergence tracking: polls child SeiNode phases, clears rollout when all nodes Running
- Add stalled rollout escalation: RolloutInProgress reason transitions to Stalled after 10 minutes with non-ready nodes
- computeGroupPhase returns Upgrading when RolloutInProgress is True

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestDetectDeploymentNeeded_InPlace_SetsRolloutInProgress
- TestDetectDeploymentNeeded_InPlace_AlreadyActive
- TestDetectDeploymentNeeded_EmptyType_TreatedAsInPlace
- TestReconcileRolloutStatus_InPlace_AllReady
- TestReconcileRolloutStatus_InPlace_Partial
- TestReconcileRolloutStatus_InPlace_Stalled
- TestComputeGroupPhase_RolloutInProgress
- TestObserveCurrentImage_UpdatesWhenConverged
- TestObserveCurrentImage_SkipsWhenRolling
- TestInPlacePlan_ThreeTasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UpdateNodeSpecs → AwaitSpecUpdate → MarkReady plan, status.currentImage for convergence observation, and deployment-level sidecar interaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
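The deployment-level sidecar interaction in the MarkReady step (the `MarkNodesReady` task) can be sketched as a reconcile-pass function that submits mark-ready at most once per reachable node and reports completion only when every sidecar is Ready. `SidecarClient`, `clientFor` (standing in for `sidecarClientForNode`), the persisted `Submitted` map, and `fakeSidecar` are all hypothetical; the real sidecar client API is not shown in this PR.

```go
package main

// SidecarClient is a hypothetical view of the per-node sidecar API.
type SidecarClient interface {
	Reachable() bool
	SubmitMarkReady() error
	IsReady() (bool, error)
}

// markNodesReady holds task state that would be persisted between
// reconciles, so mark-ready is submitted at most once per node.
type markNodesReady struct {
	Submitted map[string]bool
}

// step runs one reconcile pass. clientFor stands in for sidecarClientForNode.
// It returns true once every sidecar reports Ready.
func (t *markNodesReady) step(nodes []string, clientFor func(string) SidecarClient) (bool, error) {
	if t.Submitted == nil {
		t.Submitted = map[string]bool{}
	}
	done := true
	for _, name := range nodes {
		c := clientFor(name)
		if !c.Reachable() {
			done = false // sidecar not up yet; retry on the next pass
			continue
		}
		if !t.Submitted[name] {
			if err := c.SubmitMarkReady(); err != nil {
				return false, err
			}
			t.Submitted[name] = true
		}
		ready, err := c.IsReady()
		if err != nil {
			return false, err
		}
		if !ready {
			done = false
		}
	}
	return done, nil
}

// fakeSidecar is an in-memory SidecarClient used to exercise the sketch.
type fakeSidecar struct {
	up, ready bool
	submits   int
}

func (f *fakeSidecar) Reachable() bool        { return f.up }
func (f *fakeSidecar) SubmitMarkReady() error { f.submits++; f.ready = true; return nil }
func (f *fakeSidecar) IsReady() (bool, error) { return f.ready, nil }
```

Keeping the submitted set in task state is what makes the step safe to re-run on every reconcile without re-sending mark-ready.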
bdchatham commented Apr 13, 2026
Critical fixes:

- Clear RolloutInProgress condition in completePlan/failPlan so future deployments are not permanently blocked
- Implement InPlace plan supersession: if the spec changes during an active rollout (targetHash != currentHash), the stale plan is replaced with a new one targeting the latest spec
- Harden observeCurrentImage: require ReadyReplicas >= 1 and non-empty CurrentRevision to prevent premature currentImage reporting

Cleanup:

- Remove dead ConditionReadinessApproved (replaced by plan-based approach)
- Move stallThreshold const to top of status.go
- Fix gofmt formatting in seinode_types.go
- Update tests: supersession test, converged-image test with ReadyReplicas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
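The hardened `observeCurrentImage` guard and the supersession check can be sketched as pure functions. `StatefulSetStatus` copies three real fields of the apps/v1 `StatefulSetStatus` (`CurrentRevision`, `UpdateRevision`, `ReadyReplicas`); passing the pod template's image as `templateImage`, returning a value/flag pair instead of writing status, and the `shouldSupersede` helper are assumptions of this sketch.

```go
package main

// StatefulSetStatus mirrors the apps/v1 StatefulSetStatus fields the guard reads.
type StatefulSetStatus struct {
	CurrentRevision string
	UpdateRevision  string
	ReadyReplicas   int32
}

// observeCurrentImage returns the image to record in status.currentImage and
// whether a write is needed. sts is nil when the StatefulSet is not found.
func observeCurrentImage(sts *StatefulSetStatus, templateImage, currentImage string) (string, bool) {
	switch {
	case sts == nil:
		return "", false // StatefulSet not found: nothing to observe
	case sts.CurrentRevision == "":
		return "", false // fresh StatefulSet, no revision recorded yet
	case sts.CurrentRevision != sts.UpdateRevision:
		return "", false // rollout still in progress
	case sts.ReadyReplicas < 1:
		return "", false // revisions match but no pod is Ready yet
	case currentImage == templateImage:
		return "", false // already current (idempotent)
	}
	return templateImage, true
}

// shouldSupersede reports whether an active InPlace plan targets a stale
// spec and must be replaced by a plan for the latest spec.
func shouldSupersede(planActive bool, targetHash, currentHash string) bool {
	return planActive && targetHash != currentHash
}
```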
- Add rollout lifecycle events: RolloutStarted, RolloutSuperseded, RolloutComplete emitted at the appropriate state transitions
- Add TODO for AwaitSpecUpdate pod failure detection (kubelet waiting reasons not exported as constants; mitigated by plan supersession)
- Clarify templateHash godoc: lists tracked fields and explains which changes trigger deployment plans vs in-place propagation
- Clean up unused imports from earlier rejected edit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
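One way to decide which lifecycle event to emit is to compare the `RolloutInProgress` state and target hash before and after a reconcile. The reason strings come from the commit; the `rolloutEvent` function and its parameters are hypothetical. In a controller, a non-empty result would typically be passed to client-go's `record.EventRecorder.Event(obj, corev1.EventTypeNormal, reason, message)`.

```go
package main

// Event reasons named in the commit.
const (
	ReasonRolloutStarted    = "RolloutStarted"
	ReasonRolloutSuperseded = "RolloutSuperseded"
	ReasonRolloutComplete   = "RolloutComplete"
)

// rolloutEvent returns the reason to emit for one reconcile, or "" when no
// event applies. wasActive/isActive are RolloutInProgress before and after,
// prevTarget/newTarget are the rollout target hashes, and succeeded is false
// when the plan ended via failPlan (which also clears the condition).
func rolloutEvent(wasActive, isActive bool, prevTarget, newTarget string, succeeded bool) string {
	switch {
	case !wasActive && isActive:
		return ReasonRolloutStarted
	case wasActive && isActive && prevTarget != newTarget:
		return ReasonRolloutSuperseded
	case wasActive && !isActive && succeeded:
		return ReasonRolloutComplete
	}
	return ""
}
```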
Supersession handles the primary recovery case (bad image push). The Upgrading phase itself is a durable signal for infra-level issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
observeCurrentImage edge cases:

- SkipsWhenReadyReplicasZero (race condition guard)
- SkipsWhenEmptyRevision (fresh StatefulSet)
- NoopWhenAlreadyCurrent (idempotency)
- StatefulSetNotFound (graceful nil return)

Plan lifecycle:

- CompletePlan_ClearsRolloutInProgress
- FailPlan_ClearsRolloutInProgress
- DoesNotClearWhilePlanActive (PlanInProgress guard)

Task execution:

- UpdateNodeSpecs patches image, skips already-current
- AwaitSpecUpdate completes when converged, stays Running otherwise
- MarkNodesReady deserialization and initial state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements the InPlace update strategy as designed in #79 (`.tide/designs/inplace-update-strategy.md`).

What's new
- `updateStrategy` is now required: this removes the implicit nil/fire-and-forget path. Zero-value migration treats an empty type as InPlace with a logged warning.
- `RolloutStatus` replaces `DeploymentStatus`: a single type used by all strategies, with optional incumbent/entrant fields for BlueGreen/HardFork.
- `RolloutInProgress` condition: conditions-driven coordination between the reconciler (detects diffs) and the planner (acts on them). Guards against concurrent mutations.
- `UpdateNodeSpecs → AwaitSpecUpdate → MarkReady`
  - `UpdateNodeSpecs` patches child SeiNode images via the kube client
  - `AwaitSpecUpdate` polls `node.status.currentImage == node.spec.image` (reads SeiNode status only, never the StatefulSet directly)
  - `MarkReady` builds sidecar clients, submits mark-ready, and confirms Ready status
- `status.currentImage` on SeiNode: set by the node controller when the StatefulSet rollout completes (`currentRevision == updateRevision`). Parent controllers compare it against `spec.image` for convergence.
- `RolloutInProgress` reason transitions to `Stalled` after 10 minutes, providing a durable alerting signal.

Architecture decisions (validated with Tide experts)
- … (`awaitNodesCaughtUpExecution`, genesis assembly)
- … `currentImage`

Test plan
- `TestDetectDeploymentNeeded_InPlace_SetsRolloutInProgress`
- `TestDetectDeploymentNeeded_InPlace_AlreadyActive`
- `TestDetectDeploymentNeeded_EmptyType_TreatedAsInPlace`
- `TestReconcileRolloutStatus_InPlace_AllReady`
- `TestReconcileRolloutStatus_InPlace_Partial`
- `TestReconcileRolloutStatus_InPlace_Stalled`
- `TestComputeGroupPhase_RolloutInProgress`
- `TestObserveCurrentImage_UpdatesWhenConverged`
- `TestObserveCurrentImage_SkipsWhenRolling`
- `TestInPlacePlan_ThreeTasks`
- … (`make test`)
- … (`make lint`)

🤖 Generated with Claude Code