design: HardFork deployment simplification via bootstrap by bdchatham · Pull Request #67 · sei-protocol/sei-k8s-controller

bdchatham · 2026-04-09T15:03:01Z

Summary

LLD for simplifying the HardFork deployment plan from 6 tasks to 4 by leveraging the existing bootstrap Job mechanism.

Key Change

Instead of the controller orchestrating halt-height coordination between incumbent and entrant nodes via sidecar tasks (SubmitHaltSignal + AwaitNodesAtHeight), the entrant node's bootstrap spec handles it:

Bootstrap image = incumbent binary (e.g., sei:v6.3.0)
Target height = haltHeight - 1
Production image = new binary (e.g., sei:v6.4.0)

The bootstrap Job syncs to one block before the upgrade, exits cleanly, then the StatefulSet starts with the new binary which processes the upgrade height.

Plan Reduction

Before (6 tasks): Create → AwaitRunning → SubmitHaltSignal → AwaitHeight → Switch → Teardown
After  (4 tasks): Create → AwaitRunning → Switch → Teardown

Design Covers

CRD changes (IncumbentImage on DeploymentStatus)
Planner simplification (remove 2 tasks)
CreateEntrantNodes bootstrap injection
Entrant lifecycle trace (bootstrap → upgrade handler → Running)
Failure modes and recovery
BlueGreen convergence analysis
Traffic switch timing
File-by-file changes and implementation order

📄 Design doc: .tide/designs/hardfork-bootstrap-simplification.md

🤖 Generated with Claude Code

LLD for reducing the HardFork deployment plan from 6 tasks to 4 by leveraging the existing bootstrap Job system. The entrant node's bootstrap image (incumbent binary) syncs to haltHeight-1, then the production StatefulSet (new binary) handles the upgrade height. Eliminates SubmitHaltSignal and AwaitNodesAtHeight tasks — the bootstrap mechanism handles halt-height coordination internally. Incumbents keep running until traffic switch, removing the risk of premature SIGTERM. Covers: CRD changes, planner simplification, CreateEntrantNodes bootstrap injection, failure modes, BlueGreen convergence analysis, traffic switch timing, and implementation order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LLD for formalizing the InPlace update strategy as an explicit UpdateStrategyType alongside BlueGreen and HardFork. Covers CRD changes, rollout status tracking, upgrade height gating, the sidecar mark-ready restart fix, and failure modes. Key decisions: - No plan/task machinery — uses existing ensureSeiNode path - ensureMarkReady in reconcileRunning unblocks all pod restarts - Optional upgradeHeight gating prevents premature rollouts - No auto-rollback (unsafe for chain upgrades) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* design: InPlace update strategy for SeiNodeDeployment (#67 sibling) LLD for formalizing the InPlace update strategy as an explicit UpdateStrategyType alongside BlueGreen and HardFork. Covers CRD changes, rollout status tracking, upgrade height gating, the sidecar mark-ready restart fix, and failure modes. Key decisions: - No plan/task machinery — uses existing ensureSeiNode path - ensureMarkReady in reconcileRunning unblocks all pod restarts - Optional upgradeHeight gating prevents premature rollouts - No auto-rollback (unsafe for chain upgrades) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * design: revise InPlace strategy per PR review feedback - Remove nil/no-strategy path; updateStrategy is now required - Unify DeploymentStatus and RolloutStatus into single RolloutStatus - Drop upgradeHeight gating (no reliable block height source) - Introduce conditions-driven reconciliation pattern with RolloutInProgress as the coordination mechanism between reconciler (detects diffs) and planner (actions them) - Document block height sourcing as open problem - Migrate BlueGreen/HardFork to use unified RolloutStatus Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * design: incorporate Tide team review findings - Add zero-value migration handling for existing CRDs without updateStrategy (treat empty Type as InPlace, log warning) - Restore IncumbentRevision on RolloutStatus (needed by BlueGreen) - Add stalled rollout escalation: RolloutInProgress reason transitions to Stalled after 10min, providing durable alerting signal - Add design principles section: small condition vocabulary, RolloutStatus as real state machine, simultaneous rollout rationale Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bdchatham mentioned this pull request Apr 13, 2026

design: InPlace update strategy for SeiNodeDeployment #79

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

design: HardFork deployment simplification via bootstrap#67

design: HardFork deployment simplification via bootstrap#67
bdchatham wants to merge 1 commit intomainfrom
design/hardfork-bootstrap-simplification

bdchatham commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bdchatham commented Apr 9, 2026

Summary

Key Change

Plan Reduction

Design Covers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant