
Add full-jitter exponential backoff with max_delay cap to retry logic#389

Merged
abhimehro merged 4 commits into main from copilot/add-exponential-backoff-jitter on Feb 20, 2026

Conversation


Copilot AI commented Feb 19, 2026

Retry storms occur when multiple clients fail simultaneously and retry at synchronized intervals. The previous ±50% jitter still allowed clustering, and there was no ceiling on delay growth (attempt 8 with delay=1 could wait 256s or more).

Changes

  • MAX_RETRY_DELAY = 60.0 — new constant capping exponential growth; without it, 10 retries could accumulate thousands of seconds of wait time

  • retry_with_jitter(attempt, base_delay, max_delay) — standalone helper implementing full jitter (uniform draw from [0, min(base_delay * 2^attempt, max_delay))), which distributes retries evenly across the entire window rather than clustering them near the base value

  • _retry_request — now delegates to retry_with_jitter(); docstring updated to describe the new strategy

# Before: ±50% jitter, unbounded growth
base_wait = delay * (2 ** attempt)
jitter_factor = 0.5 + random.random()   # [0.5, 1.5)
wait_time = base_wait * jitter_factor

# After: full jitter, capped at MAX_RETRY_DELAY
wait_time = retry_with_jitter(attempt, base_delay=delay)
# → min(delay * 2^attempt, 60.0) * random.random()  →  [0, cap)
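The "after" formula can be exercised directly. The sketch below re-creates `retry_with_jitter` and `MAX_RETRY_DELAY` from the formula shown above (a hypothetical re-creation, not the actual main.py source) and checks that delays stay inside [0, cap):

```python
import random

MAX_RETRY_DELAY = 60.0  # cap named in the PR; value taken from the description


def retry_with_jitter(attempt: int, base_delay: float = 1.0,
                      max_delay: float = MAX_RETRY_DELAY) -> float:
    """Full jitter: uniform draw from [0, min(base_delay * 2**attempt, max_delay))."""
    cap = min(base_delay * (2 ** attempt), max_delay)
    return cap * random.random()


# Even deep into the retry schedule, the cap holds: attempt 10 would be
# 1024s uncapped, but every draw lands strictly below 60s.
for attempt in range(12):
    assert 0.0 <= retry_with_jitter(attempt, base_delay=1.0) < 60.0
```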

Tests

  • test_jitter_stays_within_bounds: bounds updated from [0.5x, 1.5x] → [0, min(base, MAX_RETRY_DELAY))
  • test_exponential_backoff_still_increases: expected delay with random.random()=0.5 updated from base * 1.0 → base * 0.5
  • test_429_rate_limit_retries_with_jitter: upper bound tightened to < 1.0
  • test_429_without_retry_after_uses_exponential_backoff (rate_limit suite): patch changed to random.random=1.0 so the real-sleep elapsed >= 3.0s assertion remains valid under the new formula
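For illustration, the updated expectation in test_exponential_backoff_still_increases can be sketched as follows; the helper here is re-created from the PR description (the real tests live in tests/test_retry_jitter.py and may differ in detail):

```python
import random
from unittest import mock

MAX_RETRY_DELAY = 60.0  # assumed to match the PR's constant


def retry_with_jitter(attempt, base_delay=1.0, max_delay=MAX_RETRY_DELAY):
    # Hypothetical re-creation of the helper under test.
    return min(base_delay * (2 ** attempt), max_delay) * random.random()


def test_exponential_backoff_still_increases():
    # With random.random() pinned at 0.5, full jitter yields exactly half the
    # cap, so delays double per attempt until the 60s ceiling kicks in.
    with mock.patch("random.random", return_value=0.5):
        delays = [retry_with_jitter(a, base_delay=1.0) for a in range(8)]
    assert delays[:6] == [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    assert delays[6] == delays[7] == 30.0  # min(64, 60) * 0.5 and min(128, 60) * 0.5


test_exponential_backoff_still_increases()
```

Pinning the random draw keeps the test deterministic while still verifying both the exponential growth and the cap.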
Original prompt

This section details the original issue you should resolve

<issue_title>[Code Quality] Add exponential backoff with jitter for API retry logic</issue_title>
<issue_description>### Description

The current retry logic lacks jitter in exponential backoff, which can cause retry storms when multiple requests fail simultaneously. Adding jitter improves API rate limit handling and reduces thundering herd problems.

Problem

Current Implementation:

  • Basic exponential backoff exists for retries
  • No randomization (jitter) in retry delays
  • Risk of synchronized retries causing additional API load

Impact:

  • Suboptimal API utilization during failures
  • Higher risk of hitting rate limits during retry bursts
  • Predictable retry patterns may be less effective

Suggested Changes

Implement exponential backoff with full jitter:

import random
import time

def retry_with_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Calculate retry delay with exponential backoff and full jitter.
    
    Args:
        attempt: Retry attempt number (0-indexed)
        base_delay: Base delay in seconds (default: 1.0)
        max_delay: Maximum delay in seconds (default: 60.0)
    
    Returns:
        Delay in seconds with jitter applied
    """
    exponential_delay = min(base_delay * (2 ** attempt), max_delay)
    jittered_delay = random.uniform(0, exponential_delay)
    return jittered_delay

Usage in retry logic:

for attempt in range(max_retries):
    try:
        response = make_api_request()
        return response
    except RateLimitError:
        if attempt < max_retries - 1:
            delay = retry_with_jitter(attempt, base_delay=2.0, max_delay=60.0)
            time.sleep(delay)
        else:
            raise  # out of retries; re-raise so callers see the failure

Files Affected

  • main.py - Retry logic in _retry_request() function (around line 950-985)
  • Consider adding to utility module if reused

Testing

Add tests in tests/test_retry_jitter.py:

  • Verify jitter produces values in expected range
  • Confirm exponential growth up to max_delay
  • Test distribution is uniform (not clustered)
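The third bullet (uniformity) can be checked with a simple sampling sketch, using the retry_with_jitter implementation suggested above; the tolerance here is illustrative, not the project's actual test:

```python
import random


def retry_with_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    # Same implementation as suggested above.
    return random.uniform(0, min(base_delay * (2 ** attempt), max_delay))


random.seed(42)  # deterministic sampling for a repeatable check
cap = min(2.0 * (2 ** 3), 60.0)  # attempt 3, base_delay 2.0 -> cap = 16.0
samples = [retry_with_jitter(3, base_delay=2.0) for _ in range(10_000)]

# A uniform [0, cap) draw has mean cap / 2; clustering near the base value
# (as with ±50% jitter) would pull the sample mean well away from this.
mean = sum(samples) / len(samples)
assert abs(mean - cap / 2) < 0.5
```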

Success Criteria

  • Retry delays include randomization
  • Delays respect max_delay ceiling
  • Tests verify jitter distribution
  • Existing retry behavior preserved (no regressions)

Source

Extracted from Daily Perf Improver discussion abhimehro/ctrld-sync#219 which identified this as a medium-priority reliability improvement.

Priority

Medium - Improves reliability but not blocking current functionality

References

  • AWS Architecture Blog: Exponential Backoff and Jitter (aws.amazon.com/redacted)
  • Google Cloud: Retry Strategy Best Practices (cloud.google.com/redacted)

🔍 Task mining by Discussion Task Miner - Code Quality Improvement Agent


Comments on the Issue (you are @copilot in this section)



@trunk-io

trunk-io bot commented Feb 19, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

Copilot AI and others added 2 commits February 19, 2026 23:58
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add exponential backoff with jitter for API retry logic" to "Add full-jitter exponential backoff with max_delay cap to retry logic" Feb 20, 2026
Copilot AI requested a review from abhimehro February 20, 2026 00:03
@abhimehro abhimehro marked this pull request as ready for review February 20, 2026 00:12
Copilot AI review requested due to automatic review settings February 20, 2026 00:12
@github-actions

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

Delay in seconds with full jitter applied
"""
exponential_delay = min(base_delay * (2 ** attempt), max_delay)
return exponential_delay * random.random()

Check notice — Code scanning / Bandit
Standard pseudo-random generators are not suitable for security/cryptographic purposes.
# First retry: base=1, range=[0.5, 1.5]
assert 0.5 <= wait_times[0] <= 1.5
# First retry: full jitter, base=1, range=[0, 1.0) since random.random() < 1.0
assert 0.0 <= wait_times[0] < 1.0

Check notice — Code scanning / Bandit (test)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
Retry request with exponential backoff.
Retry request with exponential backoff and full jitter.

RETRY STRATEGY:


Args:
attempt: Retry attempt number (0-indexed)
base_delay: Base delay in seconds (default: 1.0)

Check warning — Code scanning / Pylint (python3, reported by Codacy)
Line too long (106/100)
@@ -1219,13 +1242,9 @@ def _retry_request(request_func, max_retries=MAX_RETRIES, delay=RETRY_DELAY):
log.debug(f"Response content: {sanitize_for_log(e.response.text)}")
raise

Check warning — Code scanning / Pylint (python3, reported by Codacy)
Line too long (107/100)
main.py Outdated

Check warning — Code scanning / Pylint (python3, reported by Codacy)
Line too long (108/100)
# Use a fixed random.random() so jitter multiplier is stable across attempts.
# Assuming jitter is implemented as: base_delay * (0.5 + random.random()),
# a fixed return_value of 0.5 yields a jitter_factor of 1.0.
# Full jitter is implemented as: min(base_delay * 2**attempt, MAX_RETRY_DELAY) * random.random()

Check warning — Code scanning / Pylint (python3, reported by Codacy, test)
Line too long (104/100)


Copilot AI left a comment


Pull request overview

This PR implements full-jitter exponential backoff with a maximum delay cap to improve retry reliability and prevent retry storms. The change replaces the previous ±50% jitter approach (which allowed clustering around base values and had unbounded growth) with AWS-recommended full jitter that uniformly distributes retry delays across the entire backoff window.

Changes:

  • Introduced MAX_RETRY_DELAY = 60.0 constant to cap exponential growth and prevent extremely long delays
  • Added retry_with_jitter() helper function implementing full jitter: draws delay uniformly from [0, min(base_delay * 2^attempt, max_delay))
  • Updated _retry_request() to use the new jitter strategy, replacing inline backoff calculation

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File | Description
main.py | Added MAX_RETRY_DELAY constant, retry_with_jitter() function, and updated _retry_request() to use full jitter
tests/test_retry_jitter.py | Updated test expectations from ±50% jitter bounds to full jitter bounds [0, cap); fixed random.random() mock values
tests/test_rate_limit.py | Changed random.random() patch from 0.5 to 1.0 to maintain the elapsed-time assertion under the new formula
uv.lock | Removed empty metadata.requires-dev section (automatic cleanup)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@abhimehro abhimehro merged commit 582a857 into main Feb 20, 2026
13 checks passed
@abhimehro abhimehro deleted the copilot/add-exponential-backoff-jitter branch February 20, 2026 00:20

Successfully merging this pull request may close these issues.

[Code Quality] Add exponential backoff with jitter for API retry logic
