fix: queue client cancellation/timeouts#55
Merged
Conversation
Cancel the receive request in StopConsuming so an in-flight ReceiveMessage/ DequeueMessage unblocks instead of holding the loop (SQS: the full 20s WaitTimeSeconds long-poll) and so shutdown stops pulling work it can't finish in the ~30s grace before SIGKILL. Bound the post-receive lifecycle calls (delete, retry, Azure DLQ enqueue) with a 30s timeout so a wedged request can't pin a worker's concurrency slot for the container's lifetime and starve the pool. extend is left unbounded: it runs on the heartbeat watcher goroutine (not a token), self-heals on the next beat, and 30s would collide with the 30s-visibility-timeout queues. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Production receiveMessage is reached only from StartConsuming (which sets a cancellable consumeCtx first), so the prior nil fallback only ever covered direct test calls. Default consumeCtx to context.Background() in New() — keeping a pre-StartConsuming call safe on both SDKs (SQS panics on a nil context, azure errors) — set it in the test broker constructors, and drop the receiveMessage guard. Also note the cancel/error invariant where the receive loop branches on it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the same one-line note to the Azure V2 loop and both SQS loops so all four receive branches document why a cancelled consumeCtx is handled by the error arm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reword the delete/DLQ/retry comments so the leading clause is unambiguously the
service call ("If this call times out"), distinct from the message's visibility
timeout. For deleteOne, frame the safety as the concrete reliance — callers must
already tolerate reprocessing, since the queue isn't exactly-once — rather than
naming the delivery property.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror TestDeleteOne_UsesBoundedContext for the azure broker: an undecodable message at the decode-failure threshold drives deleteOne, and the mock client asserts the DeleteMessage call carries a deadline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cameron-dunn-sublime
approved these changes
Jun 9, 2026
cameron-dunn-sublime
left a comment
Member
There was a problem hiding this comment.
All comments apply for the SQS side of the house as well
- Exit each broker's receive-loop select on consumeCtx.Done() alone, so a cancelled context stops the loop without also relying on stopReceivingChan. - Make stopReceiving a non-blocking select (send or consumeCtx.Done()) so it can't deadlock when the loop already exited via the new case. - defer cancelConsume() in StartConsuming so the context is released on every return path, including the retryable-error path that previously leaked it. - Bound the receive call (Azure dequeue, SQS receive) with receiveCallTimeout (30s, above the 20s SQS long-poll) so a wedged receive can't pin a slot. - Create the cancellable context in New() and the test constructors so cancelConsume is always non-nil; drop the nil guard in StopConsuming. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Updates ASQ/SQS clients to use 30-second timeouts for deletion/visibility extension/DLQ insertion (Azure only), and updates them to cancel the Context used for dequeueing messages on shutdown. Prevents a hanging call from occupying a work slot for the container's lifetime, and prevents the brokers from pulling new work while shutting down (wastes an attempt + pushes out the message's visibility, then interrupts processing after the shutdown grace period)
AI usage: written by Claude; reviewed by me