fix(tasks): prevent duplicate dispatch on all-runners-busy requeue #3988

Draft
cursor[bot] wants to merge 1 commit into develop from cursor/critical-bug-investigation-f6ce

cursor[bot] commented Jun 23, 2026

Bug and impact

When all remote runners are busy, TaskRunner.run() requeues the task on ErrAllRunnersBusy. The old path enqueued the task while it was still in the pool's running set and sent EventTypeRequeued only from a defer, after run() returned.

A periodic queue tick (EventTypeEmpty, every 5s) or any other queue event could fire in that window, claim the task via ClaimAndDequeue, and start a second dispatch on the same TaskRunner. The result can be duplicate runner assignments, corrupted pool state, and duplicate task execution.
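
The window can be reproduced in a toy model. The sketch below is illustrative only: `racyPool`, `enqueue`, `tick`, and `simulateOldOrdering` are hypothetical names that do not come from this repository, and the model assumes a queue event claims the head of the queue without consulting the running set.

```go
package main

import "fmt"

// racyPool is a hypothetical toy model of a task pool (not the project's
// real type): a FIFO queue plus a set of running task IDs.
type racyPool struct {
	queue        []int
	running      map[int]bool
	doubleClaims int // claims made while the task was still marked running
}

// enqueue appends a task without touching the running set.
func (p *racyPool) enqueue(id int) { p.queue = append(p.queue, id) }

// tick models one queue event: claim the head of the queue and dispatch it.
func (p *racyPool) tick() {
	if len(p.queue) == 0 {
		return
	}
	id := p.queue[0]
	p.queue = p.queue[1:]
	if p.running[id] {
		p.doubleClaims++
	}
	p.running[id] = true
}

// simulateOldOrdering replays the old sequence: enqueue, tick, late release.
func simulateOldOrdering() (doubleClaims int, stillRunning bool) {
	p := &racyPool{running: map[int]bool{}}
	p.enqueue(1)
	p.tick() // first dispatch: task 1 is now running

	p.enqueue(1) // all runners busy: requeued while still marked running
	p.tick()     // a queue event fires before run() returns
	// The deferred release arrives late and also clears the second claim.
	delete(p.running, 1)

	return p.doubleClaims, p.running[1]
}

func main() {
	d, r := simulateOldOrdering()
	fmt.Printf("double claims: %d, still marked running: %v\n", d, r)
}
```

In this model the task is claimed once while it is still marked running, and the late release then erases the running mark of the second dispatch, which mirrors the duplicate dispatch and corrupted state described above.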

Root cause

TaskRunner.run() called Enqueue() immediately but deferred sending EventTypeRequeued (which calls onTaskStop()). The reconciler's requeueTaskRunnerOffline() already documented the correct pattern but implemented only part of it: it sent the event synchronously, yet still enqueued before releasing the running state.

Fix

  1. Call onTaskStop() before Enqueue() so the task is never simultaneously queued and running.
  2. Send EventTypeRequeued synchronously (not from defer) so the current queue pass skips an immediate retry.
  3. Apply the same ordering to both reconciler requeue helpers for consistency.
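
The ordering in steps 1 and 2 can be sketched as follows. This is a minimal toy model, not the code in this PR: `orderedPool`, `requeue`, and `pass` are hypothetical names, and the model assumes a dispatch attempt runs synchronously inside a queue pass.

```go
package main

import "fmt"

// orderedPool is a hypothetical toy model, not the project's pool type.
type orderedPool struct {
	queue    []int
	running  map[int]bool
	requeued map[int]bool // tasks requeued during the current pass
	busy     bool         // true while every runner is at capacity
	attempts int          // dispatch attempts across all passes
}

func newOrderedPool(ids ...int) *orderedPool {
	return &orderedPool{queue: ids, running: map[int]bool{}, requeued: map[int]bool{}}
}

// requeue applies the ordering from the fix: release, enqueue, then notify.
func (p *orderedPool) requeue(id int) {
	delete(p.running, id)         // 1. release running state (onTaskStop step)
	p.queue = append(p.queue, id) // 2. enqueue only after the release
	p.requeued[id] = true         // 3. synchronous "requeued" notification
}

// pass models one queue pass. It fails if it ever claims a task that is
// still marked running, which is the invariant the fix protects.
func (p *orderedPool) pass() error {
	p.requeued = map[int]bool{} // skip marks last for one pass only
	for len(p.queue) > 0 {
		id := p.queue[0]
		if p.requeued[id] {
			// FIFO order means everything behind it was requeued too.
			return nil
		}
		p.queue = p.queue[1:]
		if p.running[id] {
			return fmt.Errorf("task %d claimed while still running", id)
		}
		p.running[id] = true
		p.attempts++
		if p.busy {
			p.requeue(id)
		}
	}
	return nil
}

func main() {
	p := newOrderedPool(1)
	p.busy = true
	fmt.Println(p.pass(), p.attempts, len(p.queue)) // busy pass: one attempt, task stays queued
	p.busy = false
	fmt.Println(p.pass(), p.attempts, p.running[1]) // next pass dispatches it once
}
```

Because the running mark is cleared before the task re-enters the queue, no claim can observe a task that is both queued and running, and the skip mark keeps the same pass from retrying it immediately.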

Validation

  • Added TestTaskRunner_ErrAllRunnersBusy_ReleasesRunningBeforeEnqueue
  • Existing TestTaskPool_RequeuedEventCleansRunningStateAndSkipsImmediateRetry and reconciler requeue tests pass

When every runner is at capacity, the ErrAllRunnersBusy path requeued
tasks by enqueueing them while they were still in the running set, then
sending EventTypeRequeued from a defer. A periodic queue tick could
ClaimAndDequeue the task in that window and start a second dispatch on
the same TaskRunner.

Release running/active bookkeeping before enqueueing and notify the pool
synchronously, matching the reconciler requeue paths.

Co-authored-by: Denis Gukov <fiftin@outlook.com>