Fix leaking pending entires with IgnoreErrors#327
Conversation
…ingCount RouteResponse was exported but served only internal routing. Its return value was also misleading: it returned true for unmatched server-initiated IDs regardless of whether a pending entry existed, conflating delivery success with sequence-number classification. Replace it with unexported deliverPending, which returns true iff a matching pending entry was found and delivered. RouteMessage and RouteInboundMessage each duplicated the lock/lookup/delete/deliver block; both now delegate to deliverPending. Add PendingCount as a side-effect-free observer for router state, replacing the side-effectful RouteResponse probes used in tests. Rename test functions and update the sub-test that verified the now- removed server-initiated absorption behavior of RouteResponse.
WaitSendDone mixed intent (one-way vs two-way) with mechanism (block on completion). Oneway better expresses what the field controls. Add wantServerResponse() and wantSendConfirmation() predicates to Request so callers in sender() and DispatchLocalRequest read as natural language rather than boolean arithmetic. sender() now delivers send confirmation directly on ResponseChan instead of going through a now-removed RouteResponse round-trip. Send errors for one-way calls that are not in the router are also delivered directly. Update Enqueue panic message and all test references accordingly.
PendingCount delegates to MessageRouter.PendingCount and is nil-safe. It exposes router state for tests and diagnostics without requiring caller access to the internal router. Update BenchmarkNodeEnqueueSend to use the renamed Oneway field.
mustWaitSendDone embedded call-type knowledge (Unicast, Multicast) that belongs at the call site, not in the predicate. isIgnoreErrors simply exposes the flag that was set, leaving interpretation to callers. Rename TestCallOptionsMustWaitSendDone to TestCallOptionsIsIgnoreErrors and drop the two-way call cases that are no longer meaningful. Add TestCallOptionsIgnoreErrorsResourceLeak verifying that fire-and- forget multicast does not grow node pending counts.
Switch the fire-and-forget guard to isIgnoreErrors() and the blocking send request to Oneway: true, matching the renamed stream.Request field.
Fire-and-forget multicast (IgnoreErrors) was unconditionally allocating a replyChan and setting ResponseChan on every stream.Request, causing every sent message to register in the router's pending map with no server response ever arriving to drain it. Node pending counts grew without bound. Fix by introducing oneway and waitForSend flags on clientCtxOptions. createChannel() on clientCtxOptions encapsulates channel allocation: returns nil for fire-and-forget (oneway=true, waitForSend=false), which propagates to ResponseChan=nil on stream.Request, preventing router registration entirely. Add reportNodeError() on ClientCtx to consolidate the nil-guard pattern scattered across sendShared, sendWithPerNodeTransformation, and transformAndMarshal. Remove chanSize as a standalone function; its logic now lives in createChannel.
|
|
Overall Grade |
Security Reliability Complexity Hygiene |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| Go | Apr 14, 2026 8:22p.m. | Review ↗ | |
| Shell | Apr 14, 2026 8:22p.m. | Review ↗ |
Important
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
There was a problem hiding this comment.
Pull request overview
Fixes an unbounded growth (“pending entries” leak) in fire-and-forget Multicast/Unicast calls when IgnoreErrors() is used, by preventing router registration for calls that will never receive a server response.
Changes:
- Prevent router pending-map registration for fire-and-forget one-way calls by propagating
ResponseChan=nil(no reply channel allocation). - Refactor stream request semantics by replacing
WaitSendDonewithOnewayand adding clearer routing predicates (wantServerResponse,wantSendConfirmation), plusPendingCount()observers. - Update and extend tests to validate the leak is fixed and to use the new pending-count observer.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
unicast.go |
Switches IgnoreErrors handling to fire-and-forget; updates request fields to new Oneway semantics. |
multicast.go |
Uses new clientCtxOptions flags to avoid allocating/using reply channels for fire-and-forget multicast. |
client_interceptor.go |
Adds reply channel allocation logic (createChannel) and consolidates node error reporting (reportNodeError). |
internal/stream/channel.go |
Introduces Oneway on Request, updates sender routing behavior to bypass router for one-way confirmations. |
internal/stream/router.go |
Adds PendingCount(), replaces exported routing helper with deliverPending, simplifies route paths. |
node.go |
Exposes PendingCount() on Node (nil-safe), delegating to the router. |
internal/stream/router_test.go |
Updates tests for new router APIs/semantics. |
internal/stream/channel_test.go |
Updates tests to use Oneway instead of WaitSendDone. |
callopts.go |
Replaces mustWaitSendDone with a simpler isIgnoreErrors predicate. |
callopts_test.go |
Renames callopts test and adds regression test for IgnoreErrors pending leak. |
node_test.go |
Updates benchmark request field name. |
.vscode/gorums.txt |
Adds “Twoway” to local spellchecking dictionary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…l type
Split newClientCtx (streaming, oneway, waitForSend bool) into two focused
constructors:
- newQuorumCallClientCtx[Req, Resp]: always two-way; streaming selects the
response iterator type and channel buffer size; oneway is implicitly false.
- newMulticastClientCtx[Req]: always one-way; Resp is fixed as *emptypb.Empty
in the type signature; replyChan only created when waitForSend=true.
This eliminates the confusing triple-bool call site and makes misuse
(e.g. streaming=true on multicast) a compile-time error.
clientCtxOptions was a thin wrapper whose only job was to carry four values
into newClientCtx, which is now replaced by the two more focused constructors.
These constructors also moves the channel-allocation logic from
clientCtxOptions.createChannel inline in these constructors, which makes it
easier to understand the relationship between streaming and channel buffer size.
Add ClientCtx.enqueue to eliminate the duplicate stream.Request{} literals
in sendShared and sendWithPerNodeTransformation.
This removes the callType argument from getCallOptions since it is not used/necessary in the current logic. If we need it in the future, we can add it back when we see a concrete need for it. This also removes the isIgnoreErrors() method; call sites should instead use the ignoreErrors field directly, just like we already do for the interceptors field.
This adds TestWaitUntil test helper to avoid time.Sleep-based in TestCallOptionsIgnoreErrorsResourceLeak.
There was a problem hiding this comment.
Pull request overview
This PR fixes a router pending-entry leak caused by fire-and-forget multicast calls using IgnoreErrors(), where a response channel was still allocated and requests were registered as pending even though no server response would ever arrive.
Changes:
- Prevent pending-map growth for fire-and-forget one-way calls by propagating
ResponseChan=nilso the router never registers those requests. - Refactor stream request semantics by renaming
WaitSendDone→Oneway, introducingwantServerResponse()/wantSendConfirmation(), and routing send confirmations directly (no router round-trip). - Add pending-count observability (
MessageRouter.PendingCount(),Node.PendingCount()), update tests accordingly, and add a regression test for the leak.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| unicast.go | Uses ignoreErrors to decide fire-and-forget vs wait-for-send; enqueues one-way requests with correct Oneway/ResponseChan semantics. |
| multicast.go | Constructs multicast client context with optional reply channel (nil for fire-and-forget) to avoid router registration. |
| quorumcall.go | Switches to new quorum-call client context constructor and updated getCallOptions signature. |
| client_interceptor.go | Splits client context creation into quorum-call vs multicast paths; centralizes enqueue and error reporting with nil-safe reply channel handling. |
| callopts.go | Simplifies call option parsing by removing call-type coupling; getCallOptions now only reflects provided flags/interceptors. |
| callopts_test.go | Renames option tests and adds regression test verifying no pending growth with fire-and-forget multicast. |
| testing_shared.go | Adds TestWaitUntil helper for polling-based test assertions. |
| node.go | Adds PendingCount() observer (nil-safe) delegating to router. |
| node_test.go | Updates benchmark request field from WaitSendDone to Oneway. |
| internal/stream/channel.go | Implements Oneway semantics, predicates for routing/confirmation, and ensures only server-response calls are registered in router. |
| internal/stream/channel_test.go | Updates tests to use Oneway field and new semantics. |
| internal/stream/router.go | Replaces RouteResponse with deliverPending, adds PendingCount(), and updates routing logic accordingly. |
| internal/stream/router_test.go | Updates tests to match deliverPending and PendingCount() behavior. |
| .vscode/gorums.txt | Adds “Twoway” to workspace dictionary. |
| .gitignore | Ignores new review docs and .claude/agent-memory/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reuses the ctx for all 1000 Multicast calls. This also fixes the TestWaitUntil by defer cancel() if the predicate became true, rather than waiting for t.Cleanup.
Problem
Fire-and-forget Multicast calls (using the IgnoreErrors option) unconditionally allocated a replyChan and set ResponseChan on every stream.Request. This caused each sent message to register in the router's pending map, but no server response ever arrives to drain it. As a result, node pending counts grew without bound — a resource leak.
Changes
Fix (core bug):
Introduce oneway and waitForSend flags on clientCtxOptions. A new createChannel() method encapsulates reply channel allocation and returns nil for fire-and-forget calls (oneway=true, waitForSend=false). A nil channel propagates to ResponseChan=nil on stream.Request, preventing router registration entirely. A new reportNodeError() method on ClientCtx consolidates the nil-guard pattern that was previously scattered across sendShared, sendWithPerNodeTransformation, and transformAndMarshal.
Refactor (stream package):
Rename WaitSendDone to Oneway on stream.Request — the old name mixed intent with mechanism. Add wantServerResponse() and wantSendConfirmation() predicate methods so routing logic in sender() reads as natural language. The sender() now delivers send confirmation directly on ResponseChan rather than going through a RouteResponse round-trip.
Replace the exported RouteResponse with an unexported deliverPending, which returns true only when a matching pending entry was found and delivered. The old return value was misleading — it returned true for unmatched server-initiated IDs regardless of whether delivery occurred. Add PendingCount() as a side-effect-free observer for router state, replacing the side-effectful RouteResponse probes previously used in tests.
Refactor (callopts):
Replace mustWaitSendDone with isIgnoreErrors(). The old predicate embedded call-type knowledge (Unicast, Multicast) that belongs at the call site; the new one simply exposes the flag and leaves interpretation to callers.
Observable (node):
Add PendingCount() on Node, delegating to MessageRouter.PendingCount(), nil-safe. Used in the new test that verifies fire-and-forget multicast does not grow pending counts.
Testing
Fixes #326