
🔄 Fix: Implement exponential backoff in edge node [#5]#8

Open
Jerrycyborg wants to merge 1 commit into main from fix/issue-5-exponential-backoff

Conversation

@Jerrycyborg
Owner

Summary

Implements exponential backoff with jitter for edge node error handling (Issue #5), preventing thundering herd problems when multiple nodes encounter errors simultaneously.

Changes

  1. Added constants:

    • MAX_BACKOFF_MS (default: 30s, configurable via EDGEMESH_MAX_BACKOFF_MS)
  2. Exponential backoff state tracking:

    • backoffMs: Current backoff duration (starts at pollMs)
    • consecutiveErrors: Counter for error tracking
  3. Error handling logic:

    • Doubles backoff on each consecutive error (exponential growth)
    • Adds 0-1000ms random jitter to prevent synchronized retries
    • Caps max backoff at MAX_BACKOFF_MS to prevent infinite delays
    • Resets backoff to pollMs on successful task execution
  4. Improved observability:

    • Logs backoff duration: backing off for Xms
    • Tracks consecutive error count in log messages
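The constants and state from the change list above can be sketched as follows. This is illustrative, not the actual diff: the placement inside the edge node module and the env-var parsing shape are assumptions, only the names (`MAX_BACKOFF_MS`, `EDGEMESH_MAX_BACKOFF_MS`, `pollMs`, `backoffMs`, `consecutiveErrors`) come from the PR description.

```javascript
// Max backoff: 30s default, overridable via EDGEMESH_MAX_BACKOFF_MS.
// Number(undefined) is NaN, so an unset env var falls through to the default.
const DEFAULT_MAX_BACKOFF_MS = 30_000;
const MAX_BACKOFF_MS =
  Number(process.env.EDGEMESH_MAX_BACKOFF_MS) || DEFAULT_MAX_BACKOFF_MS;

// Per-node backoff state.
const pollMs = 1500;        // base polling interval (1.5s)
let backoffMs = pollMs;     // current backoff duration, starts at pollMs
let consecutiveErrors = 0;  // consecutive loop errors seen so far
```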

Before

catch (err) {
  console.error(`[edge-node:${nodeId}] loop error`, err);
  await sleep(pollMs); // Fixed 1.5s delay
}

After

catch (err) {
  consecutiveErrors++;
  const jitter = Math.random() * 1000;
  const sleepTime = Math.min(backoffMs + jitter, MAX_BACKOFF_MS);
  
  console.log(`backing off for ${Math.round(sleepTime)}ms`);
  await sleep(sleepTime);
  
  backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
}
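The diff excerpt shows only the catch branch; the reset described in the change list ("Resets backoff to pollMs on successful task execution") would sit on the success path. A minimal sketch of that reset, with the surrounding values stubbed for illustration (`onTaskSuccess` is a hypothetical name, not from the diff):

```javascript
const pollMs = 1500;
let backoffMs = 24_000;     // pretend the node is mid-backoff
let consecutiveErrors = 4;

// Hypothetical hook called after a task executes without error.
function onTaskSuccess() {
  // Reset backoff on successful task execution
  backoffMs = pollMs;
  consecutiveErrors = 0;
}

onTaskSuccess();
```

After a single successful execution the node is back to its normal 1.5s polling cadence, which is what gives the "quick recovery" claimed in the Benefits list.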

Backoff Progression Example

Error #   Backoff (without jitter)   With jitter range
0         1.5s                       1.5-2.5s
1         3s                         3-4s
2         6s                         6-7s
3         12s                        12-13s
4         24s                        24-25s
5+        30s (max)                  30-31s
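The "without jitter" column follows directly from the doubling-and-cap rule in the catch block above, and can be reproduced with a short loop (values per the PR description):

```javascript
const pollMs = 1500;
const MAX_BACKOFF_MS = 30_000;

let backoffMs = pollMs;
const progression = [];
for (let error = 0; error <= 5; error++) {
  progression.push(backoffMs);                      // delay slept on this error
  backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS); // double, capped at max
}
console.log(progression); // [ 1500, 3000, 6000, 12000, 24000, 30000 ]
```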

Benefits

  • ✅ Prevents control plane overload during outages
  • ✅ Distributed jitter reduces synchronized thundering herd
  • ✅ Quick recovery when errors resolve (resets to 1.5s)
  • ✅ Bounded worst-case delay (30s cap)
  • ✅ Configurable via environment variable

Testing

  • Existing tests should pass
  • Manual test: Stop control plane and observe edge node logs
  • Should see increasing backoff times with jitter
  • Restart control plane and verify quick recovery

Closes

#5

- Add MAX_BACKOFF_MS constant (default 30s, configurable via env)
- Track backoff state (backoffMs, consecutiveErrors)
- Double backoff on each consecutive error (exponential)
- Add 0-1000ms jitter to prevent thundering herd
- Reset backoff to pollMs on successful task execution
- Log backoff duration and attempt count for observability

This prevents overwhelming the control plane when nodes encounter
persistent errors, while allowing quick recovery when errors resolve.

Closes #5

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb90e6b62a


Comment on lines +177 to +179
// Reset backoff on successful task execution
backoffMs = pollMs;
consecutiveErrors = 0;


P2: Reset backoff after any successful poll

The new backoff state is only reset after a task is fully executed, so successful claimTask() polls that return no task do not clear backoffMs/consecutiveErrors. After a temporary outage, if the node spends time idling with empty queues, the next isolated error is still treated like a long error streak and can sleep up to MAX_BACKOFF_MS, which delays recovery and reduces polling responsiveness even though many successful polls occurred in between.
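The bot's suggested fix amounts to clearing the backoff state after any successful `claimTask()` poll, whether or not a task was returned. A sketch under that assumption (the stubbed `claimTask` is synchronous here for brevity; in the edge node it is async, and `executeTask` is left as a comment):

```javascript
const pollMs = 1500;
let backoffMs = 30_000;     // node just came out of an outage at max backoff
let consecutiveErrors = 8;

function claimTask() { return null; }  // stub: poll succeeds, queue is empty

function pollOnce() {
  const task = claimTask();
  // The poll itself succeeded, so recovery starts now rather than waiting
  // for the next task to actually execute.
  backoffMs = pollMs;
  consecutiveErrors = 0;
  if (task) { /* await executeTask(task) in the real loop */ }
}

pollOnce();
```

With this placement, an isolated error after a long idle stretch sleeps only `pollMs` plus jitter instead of up to `MAX_BACKOFF_MS`.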




Development

Successfully merging this pull request may close these issues.

🔴 CRITICAL: Implement exponential backoff in edge node error handling
