Conversation
- Remove Nodes field and RolloutNodeStatus type from RolloutStatus. The field was required by CRD validation but never populated, causing status writes to fail with "nodes: Required value". Rollout lifecycle is fully managed by completePlan/failPlan. - Remove reconcileRolloutStatus (was operating on the removed Nodes field; rollout clearing is handled by completePlan). - Prevent templateHash from advancing when RolloutInProgress is true. Without this, updateStatus races with detectDeploymentNeeded: the hash converges before the plan can be created. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uard - failPlan now returns ResultRequeueImmediate (matching completePlan) so the controller short-circuits before the final updateStatus call - Add defensive nil check on Rollout before accessing TargetHash in the supersession event message Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two critical fixes discovered during prod deployment of the InPlace strategy to pacific-1:
1. RolloutNodeStatus causes CRD validation failure
The
Nodesfield onRolloutStatuswas required by CRD validation butdetectDeploymentNeededstopped populating it after the simplification pass. Every status write failed withstatus.rollout.nodes: Required value, preventing any InPlace plan from being persisted.Fix: Remove
NodesandRolloutNodeStatusentirely. Rollout lifecycle is fully managed bycompletePlan/failPlan— no separate convergence tracking needed. Also removesreconcileRolloutStatuswhich operated on the deleted field.2. templateHash race between detectDeploymentNeeded and updateStatus
updateStatusadvancestemplateHashto match the current spec whenPlanInProgressis false. But it didn't checkRolloutInProgress. This caused a race:detectDeploymentNeededsetsRolloutInProgress, thenupdateStatusoverwritestemplateHashbeforereconcilePlancan build the plan. On the next reconcile, no divergence is detected.Fix: Guard
templateHashadvancement on bothPlanInProgressandRolloutInProgress.Test plan
reconcileRolloutStatusplan_test.goto removeNodesreferencesmake test)make lint)🤖 Generated with Claude Code