Skip to content

fix: queue client cancellation/timeouts#55

Merged
markt-sublime merged 6 commits into
masterfrom
mt.queue-client-cancellation
Jun 9, 2026
Merged

fix: queue client cancellation/timeouts#55
markt-sublime merged 6 commits into
masterfrom
mt.queue-client-cancellation

Conversation

@markt-sublime

Copy link
Copy Markdown
Contributor

Updates ASQ/SQS clients to use 30-second timeouts for deletion/visibility extension/DLQ insertion (Azure only), and updates them to cancel the Context used for dequeueing messages on shutdown. Prevents a hanging call from occupying a work slot for the container's lifetime, and prevents the brokers from pulling new work while shutting down (wastes an attempt + pushes out the message's visibility, then interrupts processing after the shutdown grace period)

AI usage: written by Claude; reviewed by me

markt-sublime and others added 5 commits June 8, 2026 16:02
Cancel the receive request in StopConsuming so an in-flight ReceiveMessage/
DequeueMessage unblocks instead of holding the loop (SQS: the full 20s
WaitTimeSeconds long-poll) and so shutdown stops pulling work it can't finish
in the ~30s grace before SIGKILL.

Bound the post-receive lifecycle calls (delete, retry, Azure DLQ enqueue) with
a 30s timeout so a wedged request can't pin a worker's concurrency slot for the
container's lifetime and starve the pool. extend is left unbounded: it runs on
the heartbeat watcher goroutine (not a token), self-heals on the next beat, and
30s would collide with the 30s-visibility-timeout queues.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Production receiveMessage is reached only from StartConsuming (which sets a
cancellable consumeCtx first), so the prior nil fallback only ever covered
direct test calls. Default consumeCtx to context.Background() in New() — keeping
a pre-StartConsuming call safe on both SDKs (SQS panics on a nil context, azure
errors) — set it in the test broker constructors, and drop the receiveMessage
guard. Also note the cancel/error invariant where the receive loop branches on it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the same one-line note to the Azure V2 loop and both SQS loops so all four
receive branches document why a cancelled consumeCtx is handled by the error
arm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reword the delete/DLQ/retry comments so the leading clause is unambiguously the
service call ("If this call times out"), distinct from the message's visibility
timeout. For deleteOne, frame the safety as the concrete reliance — callers must
already tolerate reprocessing, since the queue isn't exactly-once — rather than
naming the delivery property.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror TestDeleteOne_UsesBoundedContext for the azure broker: an undecodable
message at the decode-failure threshold drives deleteOne, and the mock client
asserts the DeleteMessage call carries a deadline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cameron-dunn-sublime cameron-dunn-sublime left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments apply for the SQS side of the house as well

Comment thread v1/brokers/azure/storage_queue.go Outdated
Comment thread v1/brokers/azure/storage_queue.go Outdated
Comment thread v1/brokers/azure/storage_queue.go Outdated
- Exit each broker's receive-loop select on consumeCtx.Done() alone, so a
  cancelled context stops the loop without also relying on stopReceivingChan.
- Make stopReceiving a non-blocking select (send or consumeCtx.Done()) so it
  can't deadlock when the loop already exited via the new case.
- defer cancelConsume() in StartConsuming so the context is released on every
  return path, including the retryable-error path that previously leaked it.
- Bound the receive call (Azure dequeue, SQS receive) with receiveCallTimeout
  (30s, above the 20s SQS long-poll) so a wedged receive can't pin a slot.
- Create the cancellable context in New() and the test constructors so
  cancelConsume is always non-nil; drop the nil guard in StopConsuming.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@markt-sublime markt-sublime merged commit e29593f into master Jun 9, 2026
3 checks passed
@markt-sublime markt-sublime deleted the mt.queue-client-cancellation branch June 9, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants