design: InPlace update strategy for SeiNodeDeployment #79

Merged
bdchatham merged 3 commits into main from design/inplace-update-strategy
Apr 13, 2026

@bdchatham
Collaborator

Summary

  • LLD for formalizing InPlace as an explicit UpdateStrategyType alongside BlueGreen and HardFork
  • Sibling design to #67 (design: HardFork deployment simplification via bootstrap), following the same structure and depth
  • Designed with input from kubernetes-specialist and platform-engineer agents

Key Design Decisions

  • No plan/task machinery — InPlace bypasses the deployment planner entirely, using the existing ensureSeiNode in-place propagation path with rollout status tracking layered on top
  • ensureMarkReady in reconcileRunning — solves the sidecar restart problem for ALL strategies (pod restarts cause sidecar to block seid indefinitely). This is the highest-priority standalone fix
  • Optional upgradeHeight gating — when set, the controller validates the chain has reached the upgrade height before patching StatefulSets, preventing premature rollouts
  • Simultaneous rollout — all nodes update at once (chain upgrades are coordinated halts)
  • No auto-rollback — rolling back a blockchain binary after a chain upgrade leaves the node unable to process new blocks; controller reports Degraded and lets the engineer decide
  • RolloutStatus tracking — per-node convergence status distinct from DeploymentStatus (which tracks entrant/incumbent for blue-green)
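
As an illustration of the decisions above, here is a minimal Go sketch of the strategy enum and of folding per-node convergence into a single rollout phase. Only the strategy names InPlace, BlueGreen, and HardFork and the Degraded outcome come from this description; `NodeRollout`, `RolloutPhase`, `summarize`, and all field names are hypothetical and are not taken from the design doc.

```go
package main

import "fmt"

// UpdateStrategyType names the rollout strategy. The three values appear in
// the PR summary; their string spellings here are assumed.
type UpdateStrategyType string

const (
	InPlace   UpdateStrategyType = "InPlace"
	BlueGreen UpdateStrategyType = "BlueGreen"
	HardFork  UpdateStrategyType = "HardFork"
)

// NodeRollout is a hypothetical per-node convergence record.
type NodeRollout struct {
	Name      string
	Converged bool // node runs the target revision and is ready
	Failed    bool // node could not come up on the target revision
}

// RolloutPhase is a hypothetical summary of the whole rollout.
type RolloutPhase string

const (
	PhaseInProgress RolloutPhase = "InProgress"
	PhaseComplete   RolloutPhase = "Complete"
	PhaseDegraded   RolloutPhase = "Degraded"
)

// summarize folds per-node status into one phase. A failed node yields
// Degraded; nothing here reverts the binary, which mirrors the
// no-auto-rollback decision.
func summarize(nodes []NodeRollout) RolloutPhase {
	allConverged := true
	for _, n := range nodes {
		if n.Failed {
			return PhaseDegraded
		}
		if !n.Converged {
			allConverged = false
		}
	}
	if allConverged {
		return PhaseComplete
	}
	return PhaseInProgress
}

func main() {
	// All nodes are updated at once, so the summary looks at every node
	// rather than at an ordered queue.
	nodes := []NodeRollout{
		{Name: "node-0", Converged: true},
		{Name: "node-1"},
	}
	fmt.Println(summarize(nodes)) // InProgress
}
```

Because the rollout is simultaneous, the summary needs no ordering or batch state: it is a pure function of the current per-node observations.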

Design Covers

  • CRD changes (enum, InPlaceStrategy, RolloutStatus types)
  • Controller changes (detectDeploymentNeeded branching, rollout status reconciliation)
  • Sidecar mark-ready resolution (primary: controller re-submit; future: sidecar self-detect)
  • Status reporting (conditions, events, phase transitions)
  • Failure modes and recovery
  • File-by-file changes and implementation order
  • Test plan

📄 Design doc: .tide/designs/inplace-update-strategy.md

🤖 Generated with Claude Code

LLD for formalizing the InPlace update strategy as an explicit
UpdateStrategyType alongside BlueGreen and HardFork. Covers CRD
changes, rollout status tracking, upgrade height gating, the sidecar
mark-ready restart fix, and failure modes.

Key decisions:
- No plan/task machinery — uses existing ensureSeiNode path
- ensureMarkReady in reconcileRunning unblocks all pod restarts
- Optional upgradeHeight gating prevents premature rollouts
- No auto-rollback (unsafe for chain upgrades)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bdchatham and others added 2 commits April 13, 2026 10:37
- Remove nil/no-strategy path; updateStrategy is now required
- Unify DeploymentStatus and RolloutStatus into single RolloutStatus
- Drop upgradeHeight gating (no reliable block height source)
- Introduce conditions-driven reconciliation pattern with
  RolloutInProgress as the coordination mechanism between
  reconciler (detects diffs) and planner (actions them)
- Document block height sourcing as open problem
- Migrate BlueGreen/HardFork to use unified RolloutStatus

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
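
The conditions-driven pattern from the commit above can be sketched roughly as follows. This is an assumption-laden illustration, not the controller's code: `Condition` is a simplified stand-in for a Kubernetes status condition, and `Deployment`, `detect`, `plan`, and the reason strings are hypothetical. The only name taken from the commit message is the `RolloutInProgress` condition.

```go
package main

import (
	"fmt"
	"sort"
)

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status bool
	Reason string
}

const RolloutInProgress = "RolloutInProgress"

// Deployment is a hypothetical, trimmed-down view of the resource.
type Deployment struct {
	DesiredImage string
	NodeImages   map[string]string // observed image per node
	Conditions   []Condition
}

// setCondition replaces the condition of the same type or appends it.
func setCondition(d *Deployment, c Condition) {
	for i := range d.Conditions {
		if d.Conditions[i].Type == c.Type {
			d.Conditions[i] = c
			return
		}
	}
	d.Conditions = append(d.Conditions, c)
}

// isTrue reports whether the named condition is present and true.
func isTrue(d *Deployment, condType string) bool {
	for _, c := range d.Conditions {
		if c.Type == condType {
			return c.Status
		}
	}
	return false
}

// detect is the reconciler half: it records whether a diff exists and
// takes no action on it.
func detect(d *Deployment) {
	for _, img := range d.NodeImages {
		if img != d.DesiredImage {
			setCondition(d, Condition{Type: RolloutInProgress, Status: true, Reason: "SpecChanged"})
			return
		}
	}
	setCondition(d, Condition{Type: RolloutInProgress, Status: false, Reason: "Converged"})
}

// plan is the planner half: it acts only while RolloutInProgress is true
// and returns the nodes that still need the desired image, sorted by name.
func plan(d *Deployment) []string {
	if !isTrue(d, RolloutInProgress) {
		return nil
	}
	var pending []string
	for name, img := range d.NodeImages {
		if img != d.DesiredImage {
			pending = append(pending, name)
		}
	}
	sort.Strings(pending)
	return pending
}

func main() {
	d := &Deployment{
		DesiredImage: "seid:v2",
		NodeImages:   map[string]string{"node-0": "seid:v1", "node-1": "seid:v2"},
	}
	detect(d)
	fmt.Println(plan(d)) // [node-0]
}
```

In this sketch the condition is the only channel between the two halves, so the planner never has to recompute why a rollout started.
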
- Add zero-value migration handling for existing CRDs without
  updateStrategy (treat empty Type as InPlace, log warning)
- Restore IncumbentRevision on RolloutStatus (needed by BlueGreen)
- Add stalled rollout escalation: RolloutInProgress reason transitions
  to Stalled after 10min, providing durable alerting signal
- Add design principles section: small condition vocabulary,
  RolloutStatus as real state machine, simultaneous rollout rationale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bdchatham merged commit 58f9437 into main Apr 13, 2026
2 checks passed