BUILD-10777 Add retry with exponential backoff for OIDC and Cognito auth #55

Merged
mikolaj-matuszny-ext-sonarsource merged 12 commits into master from feat/mmatuszny/BUILD-10777-add-retry-logic-to-auth
Mar 30, 2026

Conversation

@mikolaj-matuszny-ext-sonarsource
Contributor

@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource commented Mar 30, 2026

Summary

  • Add generic retryWithBackoff() utility with exponential backoff and jitter
  • Wrap all three external calls in getCognitoCredentials with retry: GitHub OIDC token, Cognito GetId, Cognito GetCredentialsForIdentity
  • Each call retries independently — a Cognito failure doesn't re-fetch the OIDC token

Fixes transient Rate exceeded failures observed in CI (BUILD-10777).
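
A minimal sketch of what a utility like retryWithBackoff() could look like, based only on the behavior described above. The signature, the `label` parameter, the `RetryOptions` shape, and the `sleep` helper are assumptions for illustration, not the code merged in src/retry.ts.

```typescript
import { randomInt } from "node:crypto";

// Assumed option names; the PR only confirms maxAttempts and baseDelayMs.
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number; // assumed to be a whole number of milliseconds
}

// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it succeeds or `maxAttempts` attempts have failed.
async function retryWithBackoff<T>(
  label: string,
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 5000 }: RetryOptions = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt >= maxAttempts) throw err;
      // Exponential base delay: baseDelayMs, then 2x, 4x, ...
      const base = baseDelayMs * 2 ** (attempt - 1);
      // Jitter: pick an integer between 50% and 100% of the base delay.
      // randomInt(min, max) excludes max, hence the +1.
      const delay = randomInt(Math.ceil(base / 2), base + 1);
      const message = err instanceof Error ? err.message : String(err);
      console.warn(
        `${label} failed (attempt ${attempt}/${maxAttempts}): ${message}. Retrying in ${delay}ms...`,
      );
      await sleep(delay);
    }
  }
}
```

A call such as `retryWithBackoff("Cognito GetId", () => someCall())` would then retry only that one operation, which matches the "each call retries independently" point above.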

Backoff behavior

Defaults: 3 attempts, 5000ms base delay, exponential with jitter (50-100% of base).

Jitter prevents thundering herd when many concurrent runners retry simultaneously after a transient Cognito rate limit.

| Attempt | Base delay | With jitter (range) |
|---|---|---|
| 1st retry | 5000ms | 2.5s – 5s |
| 2nd retry | 10000ms | 5s – 10s |
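
The ranges in the table follow from doubling the base delay on each retry and taking 50-100% of it. The arithmetic can be checked with a small sketch; `jitterRangeMs` is a hypothetical helper written for this illustration and is not part of the PR.

```typescript
// Hypothetical helper: computes the base delay and jitter bounds for the
// n-th retry (1-based), assuming the delay doubles on every retry.
function jitterRangeMs(
  retryNumber: number,
  baseDelayMs = 5000,
): { base: number; min: number; max: number } {
  const base = baseDelayMs * 2 ** (retryNumber - 1);
  return { base, min: base / 2, max: base };
}

for (const retry of [1, 2]) {
  const { base, min, max } = jitterRangeMs(retry);
  console.log(`retry ${retry}: base ${base}ms, jittered ${min / 1000}s to ${max / 1000}s`);
}
// prints:
// retry 1: base 5000ms, jittered 2.5s to 5s
// retry 2: base 10000ms, jittered 5s to 10s
```

With the default of 3 attempts there are at most two waits, so a call that fails every time spends between 7.5s and 15s waiting before the final error is raised.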

Example: a workflow with 20 matrix jobs hits Cognito simultaneously. Without jitter, all 20 retry at exactly 5s, hitting the rate limit again. With jitter, retries spread across 2.5-5s, reducing contention.

Example log output on transient Cognito rate limit:

Warning: Cognito GetId failed (attempt 1/3): Rate exceeded. Retrying in 3842ms...
Info: Exchanging OIDC token for Cognito identity...
Info: Obtaining AWS credentials from Cognito...
Info: AWS credentials configured successfully

Defaults are configurable via AuthConfig.retryOptions (maxAttempts, baseDelayMs).
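
As an illustration of how such overrides might be resolved, here is a sketch that assumes `retryOptions` is an optional object on `AuthConfig` with optional `maxAttempts` and `baseDelayMs` fields. Only those three names come from the PR; `DEFAULT_RETRY` and `resolveRetryOptions` are hypothetical, and the other `AuthConfig` fields are omitted.

```typescript
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
}

// Only the retryOptions field is taken from the PR description.
interface AuthConfig {
  retryOptions?: RetryOptions;
}

// Defaults as stated in the PR: 3 attempts, 5000ms base delay.
const DEFAULT_RETRY: Required<RetryOptions> = { maxAttempts: 3, baseDelayMs: 5000 };

// Hypothetical helper: fills in any option the caller left unset.
function resolveRetryOptions(config: AuthConfig): Required<RetryOptions> {
  return {
    maxAttempts: config.retryOptions?.maxAttempts ?? DEFAULT_RETRY.maxAttempts,
    baseDelayMs: config.retryOptions?.baseDelayMs ?? DEFAULT_RETRY.baseDelayMs,
  };
}

// A test could shrink the delay while keeping the default attempt count.
const testConfig: AuthConfig = { retryOptions: { baseDelayMs: 1 } };
console.log(resolveRetryOptions(testConfig)); // { maxAttempts: 3, baseDelayMs: 1 }
```

Using `??` rather than an object spread means an explicitly `undefined` option still falls back to the default.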

Changes

| File | Description |
|---|---|
| src/retry.ts | New generic retry utility with exponential backoff + jitter |
| src/auth.ts | Wrap OIDC and Cognito calls with retry |
| __tests__/retry.test.ts | 4 tests for retry utility |
| __tests__/auth.test.ts | 4 tests for transient failure recovery |
| __tests__/credential-setup.test.ts | Mock retry delays for fast tests |

Test plan

  • 8 new tests covering retry success, retry exhaustion, and each call point
  • All 23 tests pass in ~560ms
  • Verify in CI that transient OIDC/Cognito failures are retried (check logs for retry warnings)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap the three external calls (GitHub OIDC token, Cognito GetId,
Cognito GetCredentials) with retryWithBackoff for transient failure
resilience. Add retryOptions to AuthConfig so tests can override delays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dential-setup tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource force-pushed the feat/mmatuszny/BUILD-10777-add-retry-logic-to-auth branch from 7393e09 to eb6d2d4 March 30, 2026 08:17
@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource marked this pull request as draft March 30, 2026 08:17
@sonar-review-alpha

sonar-review-alpha bot commented Mar 30, 2026

Summary

Adds exponential backoff retry logic to handle transient OIDC and Cognito rate-limit failures. Wraps three independent external calls (GitHub OIDC token fetch, Cognito GetId, Cognito GetCredentials) with a new retryWithBackoff() utility that uses crypto.randomInt() for jitter to prevent thundering herd. Each call retries independently with its own label—a Cognito failure doesn't re-attempt the OIDC token fetch. Includes 8 new tests; all 23 tests pass in ~560ms.

What reviewers should know

Start with src/retry.ts to understand the core backoff logic (exponential 2^n with 50-100% jitter applied on each attempt). Then review src/auth.ts to see how each of the three external calls is wrapped; note the retryOpts spread on lines 27, 39-40, and 53, which preserves caller-supplied retryOptions. Watch for the intentional independence: each wrapped call has its own attempt counter, so a failure in one call doesn't restart the attempts of another. Check the test mocks in __tests__/credential-setup.test.ts (lines 6-10): delays are mocked to 1ms globally to keep tests fast. The new auth tests (lines 83-172) validate retry success paths; the existing auth tests remain unchanged and still pass.
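
The independence described here can be illustrated with a small, runnable sketch. Everything in it is hypothetical: `retry` is a zero-delay stand-in for retryWithBackoff so the example finishes instantly, and the three stub functions only imitate the real OIDC and Cognito calls.

```typescript
// Zero-delay stand-in for retryWithBackoff (no backoff, for illustration only).
async function retry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const callCounts = { oidc: 0, getId: 0, getCredentials: 0 };

// Hypothetical stubs standing in for the three external calls.
async function fetchOidcToken(): Promise<string> {
  callCounts.oidc++;
  return "oidc-token";
}

async function cognitoGetId(token: string): Promise<string> {
  callCounts.getId++;
  // Simulate one transient rate-limit failure on the first call.
  if (callCounts.getId === 1) throw new Error("Rate exceeded");
  return `identity-for-${token}`;
}

async function cognitoGetCredentials(identityId: string): Promise<string> {
  callCounts.getCredentials++;
  return `credentials-for-${identityId}`;
}

// Each step is wrapped separately, so each has its own attempt counter.
async function getCredentialsSketch(): Promise<string> {
  const token = await retry(() => fetchOidcToken());
  const identityId = await retry(() => cognitoGetId(token));
  return retry(() => cognitoGetCredentials(identityId));
}
```

After `getCredentialsSketch()` resolves, `callCounts` is `{ oidc: 1, getId: 2, getCredentials: 1 }`: the failed GetId stub was retried, but the OIDC stub was not called again. Wrapping the whole sequence in a single retry would instead re-run the token fetch on every Cognito failure.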



sonar-review-alpha[bot]

This comment was marked as outdated.

…ring herd

Jitter range: 50-100% of base delay. Prevents concurrent runners
from retrying in lockstep after a transient outage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sonarqube-agent

sonarqube-agent bot commented Mar 30, 2026

SonarQube Remediation Agent

SonarQube found 6 issues in this PR that the agent can fix for you. Est. time saved: ~30 min

6 issues found
  • 🟠 No magic number: 3. (auth.test.ts:136)
  • 🟠 No magic number: 3. (auth.test.ts:162)
  • 🟠 No magic number: 3. (auth.test.ts:172)
  • 🟠 No magic number: 1100. (retry.test.ts:35)
  • 🟠 No magic number: 1100. (retry.test.ts:54)
  • 🟠 No magic number: 4000. (retry.test.ts:75)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… rate limiting

Cognito Rate exceeded errors need longer backoff windows.
New delays: ~5s first retry, ~10s second retry (with jitter).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pliance

Math.random() flagged as security hotspot (non-CSPRNG). Use
node:crypto randomInt instead — same jitter behavior, no SQ finding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
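
For context, the kind of swap this commit describes might look like the sketch below. Both function names are hypothetical and the exact expressions in src/retry.ts are not shown in this PR page; the sketch only contrasts a Math.random()-based jitter with one built on `randomInt` from node:crypto.

```typescript
import { randomInt } from "node:crypto";

// Before (the flagged pattern): Math.random() is not a CSPRNG.
// Yields a delay of roughly 50-100% of baseMs.
function jitterWithMathRandom(baseMs: number): number {
  return Math.floor(baseMs * (0.5 + Math.random() * 0.5));
}

// After: randomInt(min, max) returns an integer in [min, max),
// so max is baseMs + 1 to allow the full base delay.
// Assumes baseMs is a non-negative integer.
function jitterWithRandomInt(baseMs: number): number {
  return randomInt(Math.ceil(baseMs / 2), baseMs + 1);
}

console.log(jitterWithMathRandom(5000), jitterWithRandomInt(5000));
```

Jitter is not security-sensitive, so the change is about silencing the hotspot rather than hardening anything; the two versions differ only in whether the upper bound itself can be drawn.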
@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource marked this pull request as ready for review March 30, 2026 11:56

@sonar-review-alpha sonar-review-alpha bot left a comment


LGTM! ✅

Clean implementation with a well-designed retry abstraction, correct jitter math, and thorough test coverage. Good to merge.


@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource merged commit 55df12d into master Mar 30, 2026
23 checks passed
@mikolaj-matuszny-ext-sonarsource mikolaj-matuszny-ext-sonarsource deleted the feat/mmatuszny/BUILD-10777-add-retry-logic-to-auth branch March 30, 2026 12:18