Skip to content

Fix leaking pending entires with IgnoreErrors#327

Merged
meling merged 12 commits intomasterfrom
meling/326/fix-leaking-pending-entries-with-ignore-errors
Apr 14, 2026
Merged

Fix leaking pending entires with IgnoreErrors#327
meling merged 12 commits intomasterfrom
meling/326/fix-leaking-pending-entries-with-ignore-errors

Conversation

@meling
Copy link
Copy Markdown
Member

@meling meling commented Apr 14, 2026

Problem

Fire-and-forget Multicast calls (using the IgnoreErrors option) unconditionally allocated a replyChan and set ResponseChan on every stream.Request. This caused each sent message to register in the router's pending map, but no server response ever arrives to drain it. As a result, node pending counts grew without bound — a resource leak.

Changes

Fix (core bug):

Introduce oneway and waitForSend flags on clientCtxOptions. A new createChannel() method encapsulates reply channel allocation and returns nil for fire-and-forget calls (oneway=true, waitForSend=false). A nil channel propagates to ResponseChan=nil on stream.Request, preventing router registration entirely. A new reportNodeError() method on ClientCtx consolidates the nil-guard pattern that was previously scattered across sendShared, sendWithPerNodeTransformation, and transformAndMarshal.

Refactor (stream package):

Rename WaitSendDone to Oneway on stream.Request — the old name mixed intent with mechanism. Add wantServerResponse() and wantSendConfirmation() predicate methods so routing logic in sender() reads as natural language. The sender() now delivers send confirmation directly on ResponseChan rather than going through a RouteResponse round-trip.

Replace the exported RouteResponse with an unexported deliverPending, which returns true only when a matching pending entry was found and delivered. The old return value was misleading — it returned true for unmatched server-initiated IDs regardless of whether delivery occurred. Add PendingCount() as a side-effect-free observer for router state, replacing the side-effectful RouteResponse probes previously used in tests.

Refactor (callopts):

Replace mustWaitSendDone with isIgnoreErrors(). The old predicate embedded call-type knowledge (Unicast, Multicast) that belongs at the call site; the new one simply exposes the flag and leaves interpretation to callers.

Observable (node):

Add PendingCount() on Node, delegating to MessageRouter.PendingCount(), nil-safe. Used in the new test that verifies fire-and-forget multicast does not grow pending counts.

Testing

  • Renamed TestCallOptionsMustWaitSendDone → TestCallOptionsIsIgnoreErrors
  • Added TestCallOptionsIgnoreErrorsResourceLeak — verifies that fire-and-forget multicast does not grow node pending counts
  • Updated router tests to use PendingCount instead of RouteResponse probes

Fixes #326

meling added 6 commits April 14, 2026 16:21
…ingCount

RouteResponse was exported but served only internal routing. Its return
value was also misleading: it returned true for unmatched server-initiated
IDs regardless of whether a pending entry existed, conflating delivery
success with sequence-number classification.

Replace it with unexported deliverPending, which returns true iff a
matching pending entry was found and delivered. RouteMessage and
RouteInboundMessage each duplicated the lock/lookup/delete/deliver block;
both now delegate to deliverPending.

Add PendingCount as a side-effect-free observer for router state,
replacing the side-effectful RouteResponse probes used in tests.
Rename test functions and update the sub-test that verified the now-
removed server-initiated absorption behavior of RouteResponse.
WaitSendDone mixed intent (one-way vs two-way) with mechanism (block
on completion). Oneway better expresses what the field controls.

Add wantServerResponse() and wantSendConfirmation() predicates to
Request so callers in sender() and DispatchLocalRequest read as
natural language rather than boolean arithmetic. sender() now
delivers send confirmation directly on ResponseChan instead of going
through a now-removed RouteResponse round-trip. Send errors for
one-way calls that are not in the router are also delivered directly.

Update Enqueue panic message and all test references accordingly.
PendingCount delegates to MessageRouter.PendingCount and is nil-safe.
It exposes router state for tests and diagnostics without requiring
caller access to the internal router.

Update BenchmarkNodeEnqueueSend to use the renamed Oneway field.
mustWaitSendDone embedded call-type knowledge (Unicast, Multicast)
that belongs at the call site, not in the predicate. isIgnoreErrors
simply exposes the flag that was set, leaving interpretation to callers.

Rename TestCallOptionsMustWaitSendDone to TestCallOptionsIsIgnoreErrors
and drop the two-way call cases that are no longer meaningful.
Add TestCallOptionsIgnoreErrorsResourceLeak verifying that fire-and-
forget multicast does not grow node pending counts.
Switch the fire-and-forget guard to isIgnoreErrors() and the blocking
send request to Oneway: true, matching the renamed stream.Request field.
Fire-and-forget multicast (IgnoreErrors) was unconditionally allocating
a replyChan and setting ResponseChan on every stream.Request, causing
every sent message to register in the router's pending map with no
server response ever arriving to drain it. Node pending counts grew
without bound.

Fix by introducing oneway and waitForSend flags on clientCtxOptions.
createChannel() on clientCtxOptions encapsulates channel allocation:
returns nil for fire-and-forget (oneway=true, waitForSend=false),
which propagates to ResponseChan=nil on stream.Request, preventing
router registration entirely.

Add reportNodeError() on ClientCtx to consolidate the nil-guard
pattern scattered across sendShared, sendWithPerNodeTransformation,
and transformAndMarshal. Remove chanSize as a standalone function;
its logic now lives in createChannel.
Copilot AI review requested due to automatic review settings April 14, 2026 15:04
@deepsource-io
Copy link
Copy Markdown
Contributor

deepsource-io Bot commented Apr 14, 2026

DeepSource Code Review

We reviewed changes in 06356cc...fe425a5 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade   Security  

Reliability  

Complexity  

Hygiene  

Code Review Summary

Analyzer Status Updated (UTC) Details
Go Apr 14, 2026 8:22p.m. Review ↗
Shell Apr 14, 2026 8:22p.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an unbounded growth (“pending entries” leak) in fire-and-forget Multicast/Unicast calls when IgnoreErrors() is used, by preventing router registration for calls that will never receive a server response.

Changes:

  • Prevent router pending-map registration for fire-and-forget one-way calls by propagating ResponseChan=nil (no reply channel allocation).
  • Refactor stream request semantics by replacing WaitSendDone with Oneway and adding clearer routing predicates (wantServerResponse, wantSendConfirmation), plus PendingCount() observers.
  • Update and extend tests to validate the leak is fixed and to use the new pending-count observer.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
unicast.go Switches IgnoreErrors handling to fire-and-forget; updates request fields to new Oneway semantics.
multicast.go Uses new clientCtxOptions flags to avoid allocating/using reply channels for fire-and-forget multicast.
client_interceptor.go Adds reply channel allocation logic (createChannel) and consolidates node error reporting (reportNodeError).
internal/stream/channel.go Introduces Oneway on Request, updates sender routing behavior to bypass router for one-way confirmations.
internal/stream/router.go Adds PendingCount(), replaces exported routing helper with deliverPending, simplifies route paths.
node.go Exposes PendingCount() on Node (nil-safe), delegating to the router.
internal/stream/router_test.go Updates tests for new router APIs/semantics.
internal/stream/channel_test.go Updates tests to use Oneway instead of WaitSendDone.
callopts.go Replaces mustWaitSendDone with a simpler isIgnoreErrors predicate.
callopts_test.go Renames callopts test and adds regression test for IgnoreErrors pending leak.
node_test.go Updates benchmark request field name.
.vscode/gorums.txt Adds “Twoway” to local spellchecking dictionary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread unicast.go Outdated
Comment thread callopts_test.go Outdated
Comment thread internal/stream/router.go Outdated
meling added 5 commits April 14, 2026 20:19
…l type

Split newClientCtx (streaming, oneway, waitForSend bool) into two focused
constructors:

- newQuorumCallClientCtx[Req, Resp]: always two-way; streaming selects the
  response iterator type and channel buffer size; oneway is implicitly false.
- newMulticastClientCtx[Req]: always one-way; Resp is fixed as *emptypb.Empty
  in the type signature; replyChan only created when waitForSend=true.

This eliminates the confusing triple-bool call site and makes misuse
(e.g. streaming=true on multicast) a compile-time error.

clientCtxOptions was a thin wrapper whose only job was to carry four values
into newClientCtx, which is now replaced by the two more focused constructors.

These constructors also moves the channel-allocation logic from
clientCtxOptions.createChannel inline in these constructors, which makes it
easier to understand the relationship between streaming and channel buffer size.

Add ClientCtx.enqueue to eliminate the duplicate stream.Request{} literals
in sendShared and sendWithPerNodeTransformation.
This removes the callType argument from getCallOptions since it is not
used/necessary in the current logic. If we need it in the future, we can
add it back when we see a concrete need for it. This also removes the
isIgnoreErrors() method; call sites should instead use the ignoreErrors
field directly, just like we already do for the interceptors field.
This adds TestWaitUntil test helper to avoid time.Sleep-based in
TestCallOptionsIgnoreErrorsResourceLeak.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a router pending-entry leak caused by fire-and-forget multicast calls using IgnoreErrors(), where a response channel was still allocated and requests were registered as pending even though no server response would ever arrive.

Changes:

  • Prevent pending-map growth for fire-and-forget one-way calls by propagating ResponseChan=nil so the router never registers those requests.
  • Refactor stream request semantics by renaming WaitSendDoneOneway, introducing wantServerResponse() / wantSendConfirmation(), and routing send confirmations directly (no router round-trip).
  • Add pending-count observability (MessageRouter.PendingCount(), Node.PendingCount()), update tests accordingly, and add a regression test for the leak.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
unicast.go Uses ignoreErrors to decide fire-and-forget vs wait-for-send; enqueues one-way requests with correct Oneway/ResponseChan semantics.
multicast.go Constructs multicast client context with optional reply channel (nil for fire-and-forget) to avoid router registration.
quorumcall.go Switches to new quorum-call client context constructor and updated getCallOptions signature.
client_interceptor.go Splits client context creation into quorum-call vs multicast paths; centralizes enqueue and error reporting with nil-safe reply channel handling.
callopts.go Simplifies call option parsing by removing call-type coupling; getCallOptions now only reflects provided flags/interceptors.
callopts_test.go Renames option tests and adds regression test verifying no pending growth with fire-and-forget multicast.
testing_shared.go Adds TestWaitUntil helper for polling-based test assertions.
node.go Adds PendingCount() observer (nil-safe) delegating to router.
node_test.go Updates benchmark request field from WaitSendDone to Oneway.
internal/stream/channel.go Implements Oneway semantics, predicates for routing/confirmation, and ensures only server-response calls are registered in router.
internal/stream/channel_test.go Updates tests to use Oneway field and new semantics.
internal/stream/router.go Replaces RouteResponse with deliverPending, adds PendingCount(), and updates routing logic accordingly.
internal/stream/router_test.go Updates tests to match deliverPending and PendingCount() behavior.
.vscode/gorums.txt Adds “Twoway” to workspace dictionary.
.gitignore Ignores new review docs and .claude/agent-memory/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread testing_shared.go Outdated
Comment thread callopts_test.go Outdated
Comment thread callopts_test.go
This reuses the ctx for all 1000 Multicast calls. This also fixes the
TestWaitUntil by defer cancel() if the predicate became true,
rather than waiting for t.Cleanup.
@meling meling merged commit 3241d25 into master Apr 14, 2026
5 checks passed
@meling meling deleted the meling/326/fix-leaking-pending-entries-with-ignore-errors branch April 14, 2026 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Multicast with IgnoreErrors() leaks router pending entries

2 participants