Add exponential backoff to reactive consumer retries #680

Merged
cbartz merged 10 commits into main from copilot/add-exponential-backoff-retries on Dec 17, 2025

Contributor

Copilot AI commented Dec 16, 2025

Applicable spec: N/A

Overview

Replaces static 60-second retry sleep in reactive consumer with exponential backoff (60s base, 1800s cap) to reduce load on dependencies during sustained failures.

Also: Update CODEOWNERS to reflect current team.

Rationale

With a static delay, sustained failures hit dependencies at a fixed 60-second interval for as long as the outage lasts. Exponential backoff progressively increases the delay, reducing API load during prolonged outages while keeping the same initial retry delay. The backoff is capped at 1800 seconds (30 minutes) to prevent excessively long delays.
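
To make the load reduction concrete, here is a rough, idealized comparison of how many retries a single retry stream would fire during a two-hour outage. It is only an illustration: the function names are hypothetical, the two-hour window is an arbitrary assumption, and it deliberately ignores the per-message retry limit that the real consumer applies.

```python
from itertools import count


def static_delays(delay=60):
    """Yield the same delay forever (the old behaviour, simplified)."""
    while True:
        yield delay


def exponential_delays(base=60, cap=1800):
    """Yield base, 2*base, 4*base, ... with every value capped at `cap`."""
    for exponent in count():
        yield min(base * 2**exponent, cap)


def retry_attempts_within(outage_seconds, delays):
    """Count the retries that fire before the outage window has elapsed."""
    elapsed = 0
    attempts = 0
    for delay in delays:
        elapsed += delay
        if elapsed > outage_seconds:
            break
        attempts += 1
    return attempts


# Assumption: a two-hour outage, chosen only for illustration.
outage = 2 * 60 * 60
static_attempts = retry_attempts_within(outage, static_delays())
backoff_attempts = retry_attempts_within(outage, exponential_delays())
print(static_attempts, backoff_attempts)  # prints "120 7"
```

Under these assumptions the static schedule retries 120 times in the window, while the capped exponential schedule retries 7 times.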

Juju Events Changes

None.

Module Changes

github_runner_manager/reactive/consumer.py:

  • Added BACKOFF_BASE_SECONDS = 60 constant
  • Added BACKOFF_MAX_SECONDS = 1800 constant
  • Added _calculate_backoff_time(retry_count) helper implementing base × 2^(retry_count-1) capped at 1800s
  • Helper function placed below public function get_queue_size() following code organization conventions
  • Modified consume() to use exponential backoff for message retries (when msg_process_count > 1)
  • _spawn_runner() unchanged - continues using 60s polling interval
  • WAIT_TIME_IN_SEC constant retained for use in _spawn_runner()

Backoff schedule (capped at 1800s):

Retry   1     2      3      4      5      6+
Delay   60s   120s   240s   480s   960s   1800s

Note: With RETRY_LIMIT = 5, the maximum backoff experienced is 960s (16 minutes) before message rejection.

Tests:

  • Added 7 unit tests for _calculate_backoff_time() covering retry counts and 1800s cap
  • Updated 2 existing tests to verify exponential backoff behavior
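
As an illustration of what table-driven coverage of the schedule and the cap could look like, here is a sketch using the standard library's unittest so that it stays self-contained. The test class, the case list, and the stand-in helper are all hypothetical; the project's actual tests may be structured differently.

```python
import unittest

BACKOFF_BASE_SECONDS = 60
BACKOFF_MAX_SECONDS = 1800


def calculate_backoff_time(retry_count):
    """Stand-in for the helper under test (not the merged code)."""
    return min(BACKOFF_BASE_SECONDS * 2 ** (retry_count - 1), BACKOFF_MAX_SECONDS)


class BackoffScheduleTest(unittest.TestCase):
    # (retry_count, expected_delay_seconds); the last two rows exercise the cap.
    CASES = [(1, 60), (2, 120), (3, 240), (4, 480), (5, 960), (6, 1800), (10, 1800)]

    def test_schedule(self):
        for retry_count, expected in self.CASES:
            with self.subTest(retry_count=retry_count):
                self.assertEqual(calculate_backoff_time(retry_count), expected)


def run_suite():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BackoffScheduleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Using subTest keeps each retry count visible in the failure output while avoiding one test method per row.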

Documentation:

  • Updated docs/changelog.md with entry for 2025-12-16 documenting the exponential backoff feature
  • Bumped version from 0.9.0 to 0.10.0 in github-runner-manager/pyproject.toml

Library Changes

None.

Checklist

  • The charm style guide was applied.
  • The contributing guide was applied.
  • The changes are compliant with ISD054 - Managing Charm Complexity
  • The documentation for charmhub is updated.
  • The PR is tagged with appropriate label (urgent, trivial, complex).
  • The changelog is updated with changes that affect the users of the charm.
  • The application version number is updated in github-runner-manager/pyproject.toml.

Original prompt

This section details the original issue you should resolve

<issue_title>Add exponential backoff to reactive consumer</issue_title>
<issue_description>### Enhancement Proposal

The current retry/sleep logic in github-runner-manager/src/github_runner_manager/reactive/consumer.py uses a static sleep time (WAIT_TIME_IN_SEC = 60). This applies to message retries in the main consumer loop.

Proposal:

  • Implement an exponential backoff strategy for retries in the consume loop (when msg_process_count > 1).
  • Document and parameterize the backoff base and maximum/minimum values.
  • Ensure the _spawn_runner helper remains unchanged (it should continue using its existing polling strategy).

Rationale:
This will mitigate the risk of overwhelming dependencies when runner instantiation is temporarily failing, and create gentler failure/retry behavior under duress.</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Add exponential backoff to reactive consumer" to "Add exponential backoff to reactive consumer retries" on Dec 16, 2025
Copilot AI requested a review from cbartz December 16, 2025 07:21
Collaborator

@cbartz cbartz left a comment

@copilot Please also update the changelog and bump the application version in pyproject.toml

Contributor Author

Copilot AI commented Dec 16, 2025

@copilot Please also update the changelog and bump the application version in pyproject.toml

Updated changelog with entry for 2025-12-16 and bumped version from 0.9.0 to 0.10.0 in commit 663fed4

@cbartz cbartz marked this pull request as ready for review December 16, 2025 14:23
Copilot AI requested a review from cbartz December 16, 2025 14:23
Copilot AI and others added 7 commits December 17, 2025 09:42
Co-authored-by: cbartz <4182921+cbartz@users.noreply.github.com>
…o 60s

Co-authored-by: cbartz <4182921+cbartz@users.noreply.github.com>
@cbartz cbartz force-pushed the copilot/add-exponential-backoff-retries branch from 663fed4 to 55a0e29 December 17, 2025 08:42
@cbartz cbartz enabled auto-merge (squash) December 17, 2025 09:43
@github-actions
Contributor

TICS Quality Gate

✔️ Passed

github-runner-operator

See the results in the TICS Viewer

The following files have been checked for this project
  • github-runner-manager/src/github_runner_manager/reactive/consumer.py
  • github-runner-manager/tests/unit/reactive/test_consumer.py
  • github-runner-manager/tests/unit/test_pre_job_script.py

.github/workflows/tics.yaml / TICS / TICS GitHub Action

@cbartz cbartz merged commit 3f7863c into main Dec 17, 2025
76 of 82 checks passed
@cbartz cbartz deleted the copilot/add-exponential-backoff-retries branch December 17, 2025 18:32

Development

Successfully merging this pull request may close these issues.

Add exponential backoff to reactive consumer

6 participants