🔄 Fix: Implement exponential backoff in edge node [#5] #8
Jerrycyborg wants to merge 1 commit into main
- Add MAX_BACKOFF_MS constant (default 30s, configurable via env)
- Track backoff state (backoffMs, consecutiveErrors)
- Double backoff on each consecutive error (exponential)
- Add 0-1000ms jitter to prevent thundering herd
- Reset backoff to pollMs on successful task execution
- Log backoff duration and attempt count for observability

This prevents overwhelming the control plane when nodes encounter persistent errors, while allowing quick recovery when errors resolve.

Closes #5
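The steps in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `readMaxBackoffMs`, `onError`, and `onSuccess` are hypothetical, and two details are assumptions: that the backoff doubles before the node sleeps, and that jitter is added after the cap is applied.

```typescript
// Sketch only: helper names are hypothetical; backoffMs, consecutiveErrors,
// pollMs, MAX_BACKOFF_MS and EDGEMESH_MAX_BACKOFF_MS come from the PR text.
interface BackoffState {
  backoffMs: number;         // current backoff duration
  consecutiveErrors: number; // length of the current error streak
}

const DEFAULT_MAX_BACKOFF_MS = 30000; // 30s default stated in the PR
const MAX_JITTER_MS = 1000;           // jitter range is 0-1000ms

// Read the cap from an env-like object; fall back to the default when the
// variable is missing, non-numeric, or not positive.
function readMaxBackoffMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env["EDGEMESH_MAX_BACKOFF_MS"]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_BACKOFF_MS;
}

// On an error: double the backoff, cap it, then add random jitter.
// Assumption: jitter is added on top of the capped value.
function onError(
  state: BackoffState,
  maxBackoffMs: number,
  random: () => number = Math.random,
): { next: BackoffState; sleepMs: number } {
  const backoffMs = Math.min(state.backoffMs * 2, maxBackoffMs);
  const jitterMs = Math.floor(random() * (MAX_JITTER_MS + 1));
  return {
    next: { backoffMs, consecutiveErrors: state.consecutiveErrors + 1 },
    sleepMs: backoffMs + jitterMs,
  };
}

// On success: return to the normal polling interval.
function onSuccess(pollMs: number): BackoffState {
  return { backoffMs: pollMs, consecutiveErrors: 0 };
}
```

Passing `random` as a parameter is a design choice for this sketch only; it makes the jitter deterministic in tests while defaulting to `Math.random` in normal use.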
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb90e6b62a
    // Reset backoff on successful task execution
    backoffMs = pollMs;
    consecutiveErrors = 0;
Reset backoff after any successful poll
The new backoff state is only reset after a task is fully executed, so successful claimTask() polls that return no task do not clear backoffMs/consecutiveErrors. After a temporary outage, if the node spends time idling with empty queues, the next isolated error is still treated like a long error streak and can sleep up to MAX_BACKOFF_MS, which delays recovery and reduces polling responsiveness even though many successful polls occurred in between.
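One way to address this suggestion is to treat every poll that completes without an error as a success, whether or not it returned a task. The sketch below models that as a pure state transition; the `PollOutcome` type and `nextBackoffState` function are hypothetical names for illustration, and how the real loop around `claimTask()` is structured is an assumption.

```typescript
// Sketch only: PollOutcome and nextBackoffState are hypothetical names.
interface BackoffState {
  backoffMs: number;
  consecutiveErrors: number;
}

type PollOutcome = "task-executed" | "empty-poll" | "error";

function nextBackoffState(
  state: BackoffState,
  outcome: PollOutcome,
  pollMs: number,
  maxBackoffMs: number,
): BackoffState {
  if (outcome === "error") {
    // Extend the error streak: double the backoff, capped at the maximum.
    return {
      backoffMs: Math.min(state.backoffMs * 2, maxBackoffMs),
      consecutiveErrors: state.consecutiveErrors + 1,
    };
  }
  // Both a completed task and an empty queue mean the poll itself worked,
  // so the error streak is cleared in either case.
  return { backoffMs: pollMs, consecutiveErrors: 0 };
}
```

With this shape, an isolated error after a period of idle polling starts again from a short backoff rather than inheriting the earlier streak.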
Summary
Implements exponential backoff with jitter for edge node error handling (Issue #5), preventing thundering herd problems when multiple nodes encounter errors simultaneously.
Changes
Added constants:
- MAX_BACKOFF_MS (default: 30s, configurable via EDGEMESH_MAX_BACKOFF_MS)

Exponential backoff state tracking:
- backoffMs: Current backoff duration (starts at pollMs)
- consecutiveErrors: Counter for error tracking

Error handling logic:
- Cap backoff at MAX_BACKOFF_MS to prevent infinite delays
- Reset backoff to pollMs on successful task execution

Improved observability:
- Log the backoff duration and attempt count (backing off for Xms)

Before
After
Backoff Progression Example
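As an illustration of how the delays grow, the sketch below computes the pre-jitter backoff for a run of consecutive errors. The `pollMs` value of 1000ms is an assumed figure chosen for the example, not one stated in this PR, and `backoffProgression` is a hypothetical helper; the 30000ms cap matches the stated MAX_BACKOFF_MS default.

```typescript
// Sketch only: backoffProgression is a hypothetical helper, and the
// pollMs value passed below (1000) is an assumption for illustration.
function backoffProgression(
  pollMs: number,
  maxBackoffMs: number,
  errorCount: number,
): number[] {
  const delays: number[] = [];
  let backoffMs = pollMs;
  for (let i = 0; i < errorCount; i++) {
    // Double on each consecutive error, never exceeding the cap.
    backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
    delays.push(backoffMs);
  }
  return delays;
}

// Six consecutive errors: [2000, 4000, 8000, 16000, 30000, 30000]
const progression = backoffProgression(1000, 30000, 6);
```

Each actual sleep would additionally include 0-1000ms of jitter on top of these values.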
Benefits
Testing
Closes