Add full-jitter exponential backoff with max_delay cap to retry logic #389
Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
👋 Development Partner is reviewing this PR. Will provide feedback shortly.
        Delay in seconds with full jitter applied
    """
    exponential_delay = min(base_delay * (2 ** attempt), max_delay)
    return exponential_delay * random.random()
Check notice
Code scanning / Bandit
Standard pseudo-random generators are not suitable for security/cryptographic purposes. Note
-    # First retry: base=1, range=[0.5, 1.5]
-    assert 0.5 <= wait_times[0] <= 1.5
+    # First retry: full jitter, base=1, range=[0, 1.0) since random.random() < 1.0
+    assert 0.0 <= wait_times[0] < 1.0
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
-    Retry request with exponential backoff.
+    Retry request with exponential backoff and full jitter.

     RETRY STRATEGY:
Check notice
Code scanning / Bandit (reported by Codacy)
Standard pseudo-random generators are not suitable for security/cryptographic purposes. Note
-    # First retry: base=1, range=[0.5, 1.5]
-    assert 0.5 <= wait_times[0] <= 1.5
+    # First retry: full jitter, base=1, range=[0, 1.0) since random.random() < 1.0
+    assert 0.0 <= wait_times[0] < 1.0
Check notice
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test

    Args:
        attempt: Retry attempt number (0-indexed)
        base_delay: Base delay in seconds (default: 1.0)
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (106/100) Warning
@@ -1219,13 +1242,9 @@ def _retry_request(request_func, max_retries=MAX_RETRIES, delay=RETRY_DELAY):
                 log.debug(f"Response content: {sanitize_for_log(e.response.text)}")
             raise
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (107/100) Warning
main.py
Outdated
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (108/100) Warning
     # Use a fixed random.random() so jitter multiplier is stable across attempts.
-    # Assuming jitter is implemented as: base_delay * (0.5 + random.random()),
-    # a fixed return_value of 0.5 yields a jitter_factor of 1.0.
+    # Full jitter is implemented as: min(base_delay * 2**attempt, MAX_RETRY_DELAY) * random.random()
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Line too long (104/100) Warning test

    Args:
        attempt: Retry attempt number (0-indexed)
        base_delay: Base delay in seconds (default: 1.0)
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (106/100) Warning
@@ -1219,13 +1242,9 @@ def _retry_request(request_func, max_retries=MAX_RETRIES, delay=RETRY_DELAY):
                 log.debug(f"Response content: {sanitize_for_log(e.response.text)}")
             raise
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (107/100) Warning
main.py
Outdated
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (108/100) Warning
     # Use a fixed random.random() so jitter multiplier is stable across attempts.
-    # Assuming jitter is implemented as: base_delay * (0.5 + random.random()),
-    # a fixed return_value of 0.5 yields a jitter_factor of 1.0.
+    # Full jitter is implemented as: min(base_delay * 2**attempt, MAX_RETRY_DELAY) * random.random()
Check warning
Code scanning / Pylint (reported by Codacy)
Line too long (104/100) Warning test
Pull request overview
This PR implements full-jitter exponential backoff with a maximum delay cap to improve retry reliability and prevent retry storms. The change replaces the previous ±50% jitter approach (which allowed clustering around base values and had unbounded growth) with AWS-recommended full jitter that uniformly distributes retry delays across the entire backoff window.
Changes:
- Introduced `MAX_RETRY_DELAY = 60.0` constant to cap exponential growth and prevent extremely long delays
- Added `retry_with_jitter()` helper function implementing full jitter: draws delay uniformly from `[0, min(base_delay * 2^attempt, max_delay))`
- Updated `_retry_request()` to use the new jitter strategy, replacing inline backoff calculation
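
For readers who want to see the formula in one place, here is a minimal sketch of such a helper, pieced together from the two diff lines quoted earlier in this conversation and the interval given in the list above. The default argument values are assumptions (the docstring only confirms `base_delay` defaults to 1.0), and the actual function in `main.py` may differ in details.

```python
import random

MAX_RETRY_DELAY = 60.0  # cap in seconds, as named in this PR


def retry_with_jitter(attempt, base_delay=1.0, max_delay=MAX_RETRY_DELAY):
    """Return a delay drawn uniformly from [0, min(base_delay * 2**attempt, max_delay))."""
    # Cap the exponential window first so late attempts cannot exceed max_delay.
    exponential_delay = min(base_delay * (2 ** attempt), max_delay)
    # random.random() returns a float in [0.0, 1.0), so the delay stays below the cap.
    return exponential_delay * random.random()


if __name__ == "__main__":
    for attempt in range(8):
        window = min(1.0 * 2 ** attempt, MAX_RETRY_DELAY)
        print(f"attempt={attempt} window=[0, {window}) delay={retry_with_jitter(attempt):.3f}")
```

Because the multiplier is drawn from the whole unit interval rather than from a band around 1.0, two clients that fail at the same moment are unlikely to pick nearby delays, which is the property the PR is after.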
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| main.py | Added MAX_RETRY_DELAY constant, retry_with_jitter() function, and updated _retry_request() to use full jitter |
| tests/test_retry_jitter.py | Updated test expectations from ±50% jitter bounds to full jitter bounds [0, cap); fixed random.random() mock values |
| tests/test_rate_limit.py | Changed random.random() patch from 0.5 to 1.0 to maintain correct elapsed time assertion under new formula |
| uv.lock | Removed empty metadata.requires-dev section (automatic cleanup) |
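
The test changes in the table hinge on patching `random.random()` to a fixed value. The sketch below illustrates why the expected delays change under the new formula. `full_jitter_delay` and `delays_with_fixed_random` are hypothetical names used only for illustration; the real tests patch the function in `main.py` and are not reproduced here.

```python
import random
from unittest.mock import patch

MAX_RETRY_DELAY = 60.0  # assumed to match the constant added in main.py


def full_jitter_delay(attempt, base_delay=1.0, max_delay=MAX_RETRY_DELAY):
    # Hypothetical stand-in for retry_with_jitter(); same formula as quoted in the PR.
    return min(base_delay * (2 ** attempt), max_delay) * random.random()


def delays_with_fixed_random(value, attempts=4):
    # Patch random.random so every draw returns `value`, making delays deterministic.
    with patch("random.random", return_value=value):
        return [full_jitter_delay(a) for a in range(attempts)]


print(delays_with_fixed_random(0.5))  # [0.5, 1.0, 2.0, 4.0]
print(delays_with_fixed_random(1.0))  # [1.0, 2.0, 4.0, 8.0]
```

With the old `base_delay * (0.5 + random.random())` form, a patched value of 0.5 gave a multiplier of 1.0. Under full jitter the same patch halves the window, so a test that relies on a minimum elapsed time needs a patched value of 1.0 to keep the full window.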
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Retry storms occur when multiple clients fail simultaneously and retry at synchronized intervals. The previous ±50% jitter still allowed clustering and had no ceiling on delay growth (attempt #8 with `delay=1` could wait 256s+).

Changes
- `MAX_RETRY_DELAY = 60.0`: new constant capping exponential growth; without it, 10 retries could accumulate over a thousand seconds of wait time
- `retry_with_jitter(attempt, base_delay, max_delay)`: standalone helper implementing full jitter (uniform draw from `[0, min(base_delay * 2^attempt, max_delay))`), which distributes retries evenly across the entire window rather than clustering them near the base value
- `_retry_request`: now delegates to `retry_with_jitter()`; docstring updated to describe the new strategy

Tests
- `test_jitter_stays_within_bounds`: bounds updated from `[0.5x, 1.5x]` → `[0, min(base, MAX_RETRY_DELAY))`
- `test_exponential_backoff_still_increases`: expected delay with `random.random()=0.5` updated from `base * 1.0` → `base * 0.5`
- `test_429_rate_limit_retries_with_jitter`: upper bound tightened to `< 1.0`
- `test_429_without_retry_after_uses_exponential_backoff` (rate_limit suite): patch changed to `random.random=1.0` so the real-sleep `elapsed >= 3.0s` assertion remains valid under the new formula
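
To put numbers on the effect of the cap, the following sketch sums the upper bound of each backoff window over a run of retries, with and without a ceiling. `worst_case_total` is a hypothetical helper written for this illustration and is not part of the PR. It assumes `base_delay=1.0` and ignores any jitter multiplier.

```python
MAX_RETRY_DELAY = 60.0  # cap in seconds, as named in this PR


def worst_case_total(retries, base_delay=1.0, max_delay=None):
    """Sum the upper bound of each backoff window over `retries` attempts."""
    total = 0.0
    for attempt in range(retries):
        window = base_delay * (2 ** attempt)
        if max_delay is not None:
            # Apply the ceiling to each window, as the capped strategy does.
            window = min(window, max_delay)
        total += window
    return total


print(worst_case_total(10))                             # uncapped windows: 1023.0
print(worst_case_total(10, max_delay=MAX_RETRY_DELAY))  # capped windows: 303.0
```

Under full jitter the expected wait is half of each window, so the capped figure is an upper bound and not a typical total.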