Add full-jitter exponential backoff with max_delay cap to retry logic by Copilot · Pull Request #389 · abhimehro/ctrld-sync

Copilot · 2026-02-19T23:51:33Z

Retry storms occur when multiple clients fail simultaneously and retry at synchronized intervals. The previous ±50% jitter still allowed clustering around the base value and placed no ceiling on delay growth (at attempt #8 with delay=1 the base wait is already 256 s, and the jitter factor can stretch it to nearly 384 s).

Changes

MAX_RETRY_DELAY = 60.0 — new constant capping exponential growth; without it, 10 retries could accumulate thousands of seconds of wait time
retry_with_jitter(attempt, base_delay, max_delay) — standalone helper implementing full jitter (uniform draw from [0, min(base_delay * 2^attempt, max_delay))), which distributes retries evenly across the entire window rather than clustering them near the base value
_retry_request — now delegates to retry_with_jitter(); docstring updated to describe the new strategy
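
To make the growth figures above concrete, the following sketch tabulates the pre-jitter backoff ceiling per attempt, with and without the cap. It assumes delay=1 and ten attempts (0 through 9); the helper name `backoff_ceiling` is hypothetical and is not part of the PR.

```python
from typing import Optional

MAX_RETRY_DELAY = 60.0  # value taken from the PR description


def backoff_ceiling(attempt: int, base_delay: float = 1.0,
                    max_delay: Optional[float] = None) -> float:
    """Upper end of the wait window before jitter (hypothetical helper)."""
    uncapped = base_delay * (2 ** attempt)
    return uncapped if max_delay is None else min(uncapped, max_delay)


# Assumption: ten attempts, numbered 0..9, with delay=1.
uncapped = [backoff_ceiling(a) for a in range(10)]
capped = [backoff_ceiling(a, max_delay=MAX_RETRY_DELAY) for a in range(10)]

print(uncapped[8])    # 256.0
print(sum(uncapped))  # 1023.0
print(sum(capped))    # 303.0
```

Under these assumptions the cap takes effect from attempt 6 onward (64 s becomes 60 s), and the worst-case total wait drops from 1,023 s to 303 s before any jitter is applied.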

# Before: ±50% jitter, unbounded growth
base_wait = delay * (2 ** attempt)
jitter_factor = 0.5 + random.random()   # [0.5, 1.5), since random.random() is in [0.0, 1.0)
wait_time = base_wait * jitter_factor

# After: full jitter, capped at MAX_RETRY_DELAY
wait_time = retry_with_jitter(attempt, base_delay=delay)
# → min(delay * 2^attempt, 60.0) * random.random()  →  [0, cap)
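
A small simulation can illustrate how the two strategies spread their waits. This is only a sketch: `old_wait` and `new_wait` are hypothetical names that restate the before/after formulas above, and the seed, attempt number, and sample count are arbitrary choices.

```python
import random

MAX_RETRY_DELAY = 60.0  # cap from the PR description


def old_wait(attempt: int, delay: float, rng: random.Random) -> float:
    """Previous strategy: +/-50% jitter around an uncapped exponential base."""
    return delay * (2 ** attempt) * (0.5 + rng.random())


def new_wait(attempt: int, delay: float, rng: random.Random) -> float:
    """New strategy: full jitter over [0, min(base, cap))."""
    return min(delay * (2 ** attempt), MAX_RETRY_DELAY) * rng.random()


rng = random.Random(0)  # arbitrary seed, used only for repeatability
old_samples = [old_wait(3, 1.0, rng) for _ in range(10_000)]
new_samples = [new_wait(3, 1.0, rng) for _ in range(10_000)]

# Attempt 3 with delay=1 has a base of 8 s.
# Old waits stay in the upper band of roughly 4 s to 12 s.
print(min(old_samples), max(old_samples))
# New waits can land anywhere from 0 s up to (but not including) 8 s.
print(min(new_samples), max(new_samples))
```

With the old formula no client ever retries sooner than half the base delay, so simultaneous failures stay bunched in one band; full jitter uses the whole window, including very short waits.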

Tests

test_jitter_stays_within_bounds: bounds updated from [0.5x, 1.5x] → [0, min(base, MAX_RETRY_DELAY))
test_exponential_backoff_still_increases: expected delay with random.random()=0.5 updated from base * 1.0 → base * 0.5
test_429_rate_limit_retries_with_jitter: upper bound tightened to < 1.0
test_429_without_retry_after_uses_exponential_backoff (rate_limit suite): patch changed to random.random=1.0 so the real-sleep elapsed >= 3.0s assertion remains valid under the new formula
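
The changed mock values follow directly from the two formulas. The sketch below is an illustration, not the repository's test code: it assumes delay=1 and two retry waits (attempts 0 and 1), and `old_wait` and `retry_with_jitter` are local stand-ins defined here.

```python
import random
from unittest.mock import patch

MAX_RETRY_DELAY = 60.0  # cap from the PR description


def old_wait(attempt, delay=1.0):
    """Stand-in for the previous +/-50% jitter formula."""
    return delay * (2 ** attempt) * (0.5 + random.random())


def retry_with_jitter(attempt, base_delay=1.0, max_delay=MAX_RETRY_DELAY):
    """Stand-in for the full-jitter helper shown in the diff."""
    return min(base_delay * (2 ** attempt), max_delay) * random.random()


# Old formula: a mocked 0.5 gives a jitter factor of exactly 1.0.
with patch("random.random", return_value=0.5):
    old_total = old_wait(0) + old_wait(1)                          # 1.0 + 2.0
    new_total_half = retry_with_jitter(0) + retry_with_jitter(1)   # 0.5 + 1.0

# New formula: the mock must return 1.0 to reach the same total.
with patch("random.random", return_value=1.0):
    new_total_one = retry_with_jitter(0) + retry_with_jitter(1)    # 1.0 + 2.0

print(old_total, new_total_half, new_total_one)  # 3.0 1.5 3.0
```

Note that the real `random.random()` never returns 1.0, so a mocked 1.0 represents the upper limit of the window rather than a value seen in production.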

Original prompt

This section details the original issue you should resolve

<issue_title>[Code Quality] Add exponential backoff with jitter for API retry logic</issue_title>
<issue_description>### Description

The current retry logic lacks jitter in exponential backoff, which can cause retry storms when multiple requests fail simultaneously. Adding jitter improves API rate limit handling and reduces thundering herd problems.

Problem

Current Implementation:

Basic exponential backoff exists for retries

No randomization (jitter) in retry delays

Risk of synchronized retries causing additional API load

Impact:

Suboptimal API utilization during failures

Higher risk of hitting rate limits during retry bursts

Predictable retry patterns may be less effective

Suggested Changes

Implement exponential backoff with full jitter:
import random
import time

def retry_with_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Calculate retry delay with exponential backoff and full jitter.
    
    Args:
        attempt: Retry attempt number (0-indexed)
        base_delay: Base delay in seconds (default: 1.0)
        max_delay: Maximum delay in seconds (default: 60.0)
    
    Returns:
        Delay in seconds with jitter applied
    """
    exponential_delay = min(base_delay * (2 ** attempt), max_delay)
    jittered_delay = random.uniform(0, exponential_delay)
    return jittered_delay
Usage in retry logic:
for attempt in range(max_retries):
    try:
        response = make_api_request()
        return response
    except RateLimitError:
        if attempt == max_retries - 1:
            raise  # retries exhausted: surface the error instead of returning None
        delay = retry_with_jitter(attempt, base_delay=2.0, max_delay=60.0)
        time.sleep(delay)
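
For reference, the usage pattern above can be exercised end to end without real waiting. This is a minimal sketch under stated assumptions: `RateLimitError`, `call_with_retries`, and `flaky` are hypothetical names for illustration, and the injectable `sleep` parameter is an addition that is not in the issue.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the API client's rate-limit exception."""


def retry_with_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Full jitter: uniform draw up to the capped exponential delay."""
    return random.uniform(0, min(base_delay * (2 ** attempt), max_delay))


def call_with_retries(func: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, backing off with full jitter; re-raise once retries run out."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep(retry_with_jitter(attempt, base_delay=2.0, max_delay=60.0))
    raise ValueError("max_retries must be at least 1")


# Demo: fail twice, then succeed; record the delays instead of sleeping.
calls = {"n": 0}
delays = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"


result = call_with_retries(flaky, sleep=delays.append)
print(result, len(delays))  # ok 2
```

Passing `sleep` as a parameter is one way to keep such a loop testable without patching `time.sleep`; the PR itself may handle this differently.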
Files Affected

main.py - Retry logic in _retry_request() function (around line 950-985)

Consider adding to utility module if reused

Testing

Add tests in tests/test_retry_jitter.py:

Verify jitter produces values in expected range

Confirm exponential growth up to max_delay

Test distribution is uniform (not clustered)
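
One possible shape for the range, cap, and uniformity checks listed above is sketched here. It is not the content of tests/test_retry_jitter.py: the `rng` parameter, the `quartile_counts` helper, the seed, and the tolerance band are all assumptions made for illustration.

```python
import random
from collections import Counter


def retry_with_jitter(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Local copy of the helper; the rng parameter is an addition for seeding."""
    return rng.uniform(0, min(base_delay * (2 ** attempt), max_delay))


def quartile_counts(samples, ceiling):
    """Count how many samples fall in each quarter of [0, ceiling]."""
    counts = Counter(min(int(4 * s / ceiling), 3) for s in samples)
    return [counts[q] for q in range(4)]


rng = random.Random(42)  # arbitrary seed for repeatability
ceiling = 8.0            # attempt 3 with base_delay=1.0
samples = [retry_with_jitter(3, rng=rng) for _ in range(20_000)]
counts = quartile_counts(samples, ceiling)

# Range check: every sample stays inside the window.
assert all(0.0 <= s <= ceiling for s in samples)
# Uniformity check: each quarter holds roughly 25% of the samples.
assert all(0.22 < c / len(samples) < 0.28 for c in counts)
# Cap check: a late attempt never exceeds max_delay.
assert all(retry_with_jitter(20, rng=rng) <= 60.0 for _ in range(1_000))
```

A fixed seed and a generous tolerance keep a statistical check like this from becoming flaky in CI.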

Success Criteria

Retry delays include randomization

Delays respect max_delay ceiling

Tests verify jitter distribution

Existing retry behavior preserved (no regressions)

Source

Extracted from Daily Perf Improver discussion abhimehro/ctrld-sync#219 which identified this as a medium-priority reliability improvement.

Priority

Medium - Improves reliability but not blocking current functionality

References

AWS Architecture Blog: [Exponential Backoff and Jitter](aws.amazon.com/redacted)

Google Cloud: [Retry Strategy Best Practices](cloud.google.com/redacted)

🔍 Task mining by Discussion Task Miner - Code Quality Improvement Agent

To install this workflow, run gh aw add github/gh-aw/.github/workflows/discussion-task-miner.md@94662b1dee8ce96c876ba9f33b3ab8be32de82a4. View source at https://github.com/github/gh-aw/tree/94662b1dee8ce96c876ba9f33b3ab8be32de82a4/.github/workflows/discussion-task-miner.md.

expires on Feb 20, 2026, 1:49 PM UTC

Comments on the Issue (you are @copilot in this section)

Fixes [Code Quality] Add exponential backoff with jitter for API retry logic #379


trunk-io · 2026-02-19T23:51:39Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>

github-actions · 2026-02-20T00:12:24Z

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

main.py

+        Delay in seconds with full jitter applied
+    """
+    exponential_delay = min(base_delay * (2 ** attempt), max_delay)
+    return exponential_delay * random.random()


tests/test_retry_jitter.py

-            # First retry: base=1, range=[0.5, 1.5]
-            assert 0.5 <= wait_times[0] <= 1.5
+            # First retry: full jitter, base=1, range=[0, 1.0) since random.random() < 1.0
+            assert 0.0 <= wait_times[0] < 1.0


main.py

-    Retry request with exponential backoff.
+    Retry request with exponential backoff and full jitter.
+
+    RETRY STRATEGY:



main.py

+
+    Args:
+        attempt: Retry attempt number (0-indexed)
+        base_delay: Base delay in seconds (default: 1.0)


main.py

@@ -1219,13 +1242,9 @@ def _retry_request(request_func, max_retries=MAX_RETRIES, delay=RETRY_DELAY):
                    log.debug(f"Response content: {sanitize_for_log(e.response.text)}")
                raise



tests/test_retry_jitter.py

-        # Use a fixed random.random() so jitter multiplier is stable across attempts.
-        # Assuming jitter is implemented as: base_delay * (0.5 + random.random()),
-        # a fixed return_value of 0.5 yields a jitter_factor of 1.0.
+        # Full jitter is implemented as: min(base_delay * 2**attempt, MAX_RETRY_DELAY) * random.random()



Copilot

Pull request overview

This PR implements full-jitter exponential backoff with a maximum delay cap to improve retry reliability and prevent retry storms. The change replaces the previous ±50% jitter approach (which allowed clustering around base values and had unbounded growth) with AWS-recommended full jitter that uniformly distributes retry delays across the entire backoff window.

Changes:

Introduced MAX_RETRY_DELAY = 60.0 constant to cap exponential growth and prevent extremely long delays
Added retry_with_jitter() helper function implementing full jitter: draws delay uniformly from [0, min(base_delay * 2^attempt, max_delay))
Updated _retry_request() to use the new jitter strategy, replacing inline backoff calculation

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| main.py | Added MAX_RETRY_DELAY constant, retry_with_jitter() function, and updated _retry_request() to use full jitter |
| tests/test_retry_jitter.py | Updated test expectations from ±50% jitter bounds to full jitter bounds [0, cap); fixed random.random() mock values |
| tests/test_rate_limit.py | Changed random.random() patch from 0.5 to 1.0 to maintain correct elapsed time assertion under new formula |
| uv.lock | Removed empty metadata.requires-dev section (automatic cleanup) |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Initial plan

cb230f1

Copilot AI assigned Copilot and abhimehro Feb 19, 2026

Copilot started work on behalf of abhimehro February 19, 2026 23:51 View session

Copilot AI and others added 2 commits February 19, 2026 23:58

Initial plan

0271ff1

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>

Add retry_with_jitter helper with full jitter and max_delay cap

8fa87b3

Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add exponential backoff with jitter for API retry logic~~ Add full-jitter exponential backoff with max_delay cap to retry logic Feb 20, 2026

Copilot AI requested a review from abhimehro February 20, 2026 00:03

Copilot finished work on behalf of abhimehro February 20, 2026 00:03

abhimehro marked this pull request as ready for review February 20, 2026 00:12

Copilot AI review requested due to automatic review settings February 20, 2026 00:12

github-actions bot added the python label Feb 20, 2026

github-advanced-security bot found potential problems Feb 20, 2026

Copilot started reviewing on behalf of abhimehro February 20, 2026 00:12 View session

Copilot AI reviewed Feb 20, 2026

Update main.py

98236e1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

abhimehro approved these changes Feb 20, 2026

abhimehro merged commit 582a857 into main Feb 20, 2026
13 checks passed

abhimehro deleted the copilot/add-exponential-backoff-jitter branch February 20, 2026 00:20

abhimehro merged 4 commits into main from copilot/add-exponential-backoff-jitter
