Skip to content

fix: retry policy counts consecutively instead of cumulatively#819

Merged
adityachoudhari26 merged 2 commits intomainfrom
retry-version-select-bug
Feb 26, 2026
Merged

fix: retry policy counts consecutively instead of cumulatively#819
adityachoudhari26 merged 2 commits intomainfrom
retry-version-select-bug

Conversation

@adityachoudhari26
Copy link
Member

@adityachoudhari26 adityachoudhari26 commented Feb 26, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved retry evaluation to correctly count consecutive retryable jobs when deployment versions change, ensuring redeployments after version flips work as intended.
  • Tests

    • Added comprehensive tests for version selector functionality and version-flip retry scenarios.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

The changes modify retry evaluation to sort jobs by creation time (latest first) and count only consecutive retryable attempts for the current release target, breaking the streak when different releases appear. Tests validate version-flip behavior where flipping between releases resets consecutive retry counts.

Changes

Cohort / File(s) Summary
Retry Evaluator Logic
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go
Added getJobsForReleaseTargetSortedLatestFirst() method to sort jobs by CreatedAt descending. Updated Evaluate flow to use sorted job list and added early-break logic when jobs don't belong to the exact release or aren't retryable. Clarified comments around consecutive retry attempt counting.
Retry Evaluator Tests
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry_test.go
Renamed test and added four new test functions covering version-flip scenarios: VersionFlip_AllowsRedeployAfterDifferentRelease, VersionFlip_CountsOnlyLatestConsecutiveJobs, VersionFlip_DeniesWhenConsecutiveExceedsLimit, and VersionFlip_MultipleFlips. Tests validate that consecutive counts reset when releases change.
Smart Defaults Tests
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/smart_defaults_test.go
Updated test expectations to reflect "consecutive retryable jobs" counting semantics. Reclassified test job statuses to distinguish retryable vs. non-retryable jobs and adjusted assertion messages accordingly.
Integration & E2E Support
apps/workspace-engine/test/integration/opts.go, apps/workspace-engine/test/e2e/engine_version_selector_test.go
Added WithRuleVersionSelector() helper function for policy rule configuration with CEL expressions. Added end-to-end test TestEngine_VersionSelector_FlipppingBetweenVersions validating version selector flipping between deployment versions with job routing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • chore: grad rollouts rule go #706: Modifies release-manager policy evaluators to use target-scoped Evaluate signatures and logic that fetches/sorts jobs for a release target, directly complementing this change.

Poem

🐰 A sorted hop through job arrays,
Consecutive counts in fleeting ways,
When versions flip, the slate runs clean,
Latest first ensures we've seen,
The truest tale of retries' tale!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing retry policy to count consecutively instead of cumulatively, which is the core objective reflected in the test renames, sorting logic, and evaluation flow updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch retry-version-select-bug

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
apps/workspace-engine/test/e2e/engine_version_selector_test.go (1)

16-16: Rename test to fix typo (FlipppingFlipping).

Minor naming cleanup for readability and discoverability in test output.

✏️ Suggested rename
-func TestEngine_VersionSelector_FlipppingBetweenVersions(t *testing.T) {
+func TestEngine_VersionSelector_FlippingBetweenVersions(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/workspace-engine/test/e2e/engine_version_selector_test.go` at line 16,
Rename the test function TestEngine_VersionSelector_FlipppingBetweenVersions to
correct the typo (e.g., TestEngine_VersionSelector_FlippingBetweenVersions) so
the function name and any references to it use the new spelling; update the
function declaration and any references/imports or test runners that refer to
TestEngine_VersionSelector_FlipppingBetweenVersions to the corrected name.
apps/workspace-engine/test/integration/opts.go (1)

886-895: Propagate CEL parsing failures in WithRuleVersionSelector.

FromCelSelector errors are currently ignored, which can silently build an invalid selector in tests.

♻️ Suggested refactor
 func WithRuleVersionSelector(cel string) PolicyRuleOption {
-	s := &oapi.Selector{}
-	_ = s.FromCelSelector(oapi.CelSelector{Cel: cel})
 	return func(_ *TestWorkspace, r *oapi.PolicyRule) error {
+		s := &oapi.Selector{}
+		if err := s.FromCelSelector(oapi.CelSelector{Cel: cel}); err != nil {
+			return err
+		}
 		r.VersionSelector = &oapi.VersionSelectorRule{
 			Selector: *s,
 		}
 		return nil
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/workspace-engine/test/integration/opts.go` around lines 886 - 895,
WithRuleVersionSelector currently ignores the error returned by
oapi.Selector.FromCelSelector; change it to capture that error and propagate it
via the PolicyRuleOption closure. Specifically, call
s.FromCelSelector(oapi.CelSelector{Cel: cel}) and store the returned error;
return a closure (the PolicyRuleOption) that, if that error is non-nil, returns
the error, otherwise sets r.VersionSelector =
&oapi.VersionSelectorRule{Selector: *s} and returns nil. This uses the existing
symbols WithRuleVersionSelector, oapi.Selector, FromCelSelector,
PolicyRuleOption, TestWorkspace, and oapi.PolicyRule to locate the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go`:
- Around line 79-87: getJobsForReleaseTargetSortedLatestFirst builds a slice
from a map and sorts only by CreatedAt, which leaves order undefined for equal
timestamps; modify the sort.Slice comparator in
getJobsForReleaseTargetSortedLatestFirst to add a deterministic tie-breaker when
jobs[i].CreatedAt.Equal(jobs[j].CreatedAt) — e.g., compare a unique field such
as jobs[i].Id (or jobs[i].Name if Id is unavailable) to return jobs[i].Id >
jobs[j].Id (or lexicographic compare) so equal-timestamp jobs get a stable,
deterministic ordering.

---

Nitpick comments:
In `@apps/workspace-engine/test/e2e/engine_version_selector_test.go`:
- Line 16: Rename the test function
TestEngine_VersionSelector_FlipppingBetweenVersions to correct the typo (e.g.,
TestEngine_VersionSelector_FlippingBetweenVersions) so the function name and any
references to it use the new spelling; update the function declaration and any
references/imports or test runners that refer to
TestEngine_VersionSelector_FlipppingBetweenVersions to the corrected name.

In `@apps/workspace-engine/test/integration/opts.go`:
- Around line 886-895: WithRuleVersionSelector currently ignores the error
returned by oapi.Selector.FromCelSelector; change it to capture that error and
propagate it via the PolicyRuleOption closure. Specifically, call
s.FromCelSelector(oapi.CelSelector{Cel: cel}) and store the returned error;
return a closure (the PolicyRuleOption) that, if that error is non-nil, returns
the error, otherwise sets r.VersionSelector =
&oapi.VersionSelectorRule{Selector: *s} and returns nil. This uses the existing
symbols WithRuleVersionSelector, oapi.Selector, FromCelSelector,
PolicyRuleOption, TestWorkspace, and oapi.PolicyRule to locate the fix.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a281caf and 326dbe1.

📒 Files selected for processing (5)
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry_test.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/smart_defaults_test.go
  • apps/workspace-engine/test/e2e/engine_version_selector_test.go
  • apps/workspace-engine/test/integration/opts.go

Comment on lines +79 to +87
func (e *RetryEvaluator) getJobsForReleaseTargetSortedLatestFirst(releaseTarget *oapi.ReleaseTarget) []*oapi.Job {
jobsMap := e.getters.GetJobsForReleaseTarget(releaseTarget)
jobs := make([]*oapi.Job, 0, len(jobsMap))
for _, job := range jobsMap {
jobs = append(jobs, job)
}
sort.Slice(jobs, func(i, j int) bool {
return jobs[i].CreatedAt.After(jobs[j].CreatedAt)
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, locate and examine the file
file="apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go"

if [ ! -f "$file" ]; then
  echo "File not found at: $file"
  exit 1
fi

echo "=== File content around lines 79-90 ==="
cat -n "$file" | sed -n '75,92p'

echo
echo "=== Check for imports at top of file ==="
head -30 "$file" | cat -n

echo
echo "=== Look for the full method and its context ==="
cat -n "$file" | sed -n '79,100p'

Repository: ctrlplanedev/ctrlplane

Length of output: 3033


🏁 Script executed:

#!/bin/bash

# Find the Job struct definition to verify it has an Id field
echo "=== Searching for Job struct definition ==="
find . -name "*.go" -type f -exec grep -l "type Job struct" {} \; | head -5

# Look for oapi package
echo
echo "=== Finding oapi package ==="
find . -path "*/oapi/*.go" -type f | head -10 | xargs ls -la 2>/dev/null | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 1213


🏁 Script executed:

#!/bin/bash

# Search for Job struct definition with Id field
echo "=== Searching for Job struct with fields ==="
rg -A 20 "type Job struct" --type go | head -50

echo
echo "=== Check CreatedAt field type ==="
rg "CreatedAt\s+(?:time\.)?Time" --type go | head -10

Repository: ctrlplanedev/ctrlplane

Length of output: 4976


🏁 Script executed:

#!/bin/bash

# Check how getJobsForReleaseTargetSortedLatestFirst is actually used
file="apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go"

echo "=== Usage of getJobsForReleaseTargetSortedLatestFirst ==="
rg "getJobsForReleaseTargetSortedLatestFirst" "$file" -B 2 -A 5

echo
echo "=== Full context of Evaluate method to understand retry counting ==="
cat -n "$file" | sed -n '40,130p'

Repository: ctrlplanedev/ctrlplane

Length of output: 4081


Add tie-breaking to deterministic sort order in newest-first sorting.

The slice is built from a map and sorted only by CreatedAt. When jobs share the same timestamp, sort.Slice produces undefined relative order, making consecutive retry counting nondeterministic if equal-time jobs for the same release have different retryable statuses.

Suggested fix
	sort.Slice(jobs, func(i, j int) bool {
+		if jobs[i].CreatedAt.Equal(jobs[j].CreatedAt) {
+			return jobs[i].Id > jobs[j].Id
+		}
		return jobs[i].CreatedAt.After(jobs[j].CreatedAt)
	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/retry/retry.go`
around lines 79 - 87, getJobsForReleaseTargetSortedLatestFirst builds a slice
from a map and sorts only by CreatedAt, which leaves order undefined for equal
timestamps; modify the sort.Slice comparator in
getJobsForReleaseTargetSortedLatestFirst to add a deterministic tie-breaker when
jobs[i].CreatedAt.Equal(jobs[j].CreatedAt) — e.g., compare a unique field such
as jobs[i].Id (or jobs[i].Name if Id is unavailable) to return jobs[i].Id >
jobs[j].Id (or lexicographic compare) so equal-timestamp jobs get a stable,
deterministic ordering.

@adityachoudhari26 adityachoudhari26 merged commit b8a4b48 into main Feb 26, 2026
8 of 9 checks passed
@adityachoudhari26 adityachoudhari26 deleted the retry-version-select-bug branch February 26, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant