
fix(probe): cancel streaming response to release concurrency slot #442

Merged
mcowger merged 2 commits into mcowger:main from sirn:fix-probe-leak
May 19, 2026
@sirn commented May 19, 2026

Fix an issue where runProbe never releases its concurrency slot.

Steps to reproduce:

  1. Click the probe (>) button either on the provider dialog or the model alias page.
  2. Observe that the concurrency count goes up by 1 and never comes back down.

Root cause:

The chat probe uses stream: true to measure TTFT/TPS, but ProbeService never consumed or cancelled the returned ReadableStream. Since doRelease() only fires when a stream is consumed, cancelled, or errored, each probe run permanently leaks one concurrency slot.
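
To make the leak concrete, here is a minimal sketch of a release-on-completion stream wrapper. It is not the project's dispatcher code: `SlotCounter` and `wrapWithRelease` are hypothetical names, and only `doRelease` echoes the description above. The wrapper pulls lazily (`highWaterMark: 0`), so a stream that nobody reads or cancels never reaches any of the three release paths.

```typescript
// Hypothetical sketch: SlotCounter and wrapWithRelease are illustrative names,
// not identifiers from the project's source.
class SlotCounter {
  active = 0;
  acquire(): void {
    this.active += 1;
  }
  release(): void {
    this.active -= 1;
  }
}

// Wrap an upstream stream so the slot is released exactly once: when the
// consumer reads to the end, cancels, or the upstream errors.
function wrapWithRelease(
  upstream: ReadableStream<string>,
  slots: SlotCounter,
): ReadableStream<string> {
  let released = false;
  const doRelease = (): void => {
    if (!released) {
      released = true;
      slots.release();
    }
  };
  const reader = upstream.getReader();
  return new ReadableStream<string>(
    {
      async pull(controller) {
        try {
          const result = await reader.read();
          if (result.done) {
            doRelease(); // path 1: stream fully consumed
            controller.close();
          } else {
            controller.enqueue(result.value);
          }
        } catch (err) {
          doRelease(); // path 2: upstream errored
          controller.error(err);
        }
      },
      async cancel(reason) {
        doRelease(); // path 3: consumer cancelled
        await reader.cancel(reason);
      },
    },
    // Pull only when a consumer is actually reading.
    { highWaterMark: 0 },
  );
}
```

Under these assumptions, a caller that acquires a slot, receives the wrapped stream, and then only inspects metadata leaves `slots.active` at 1 indefinitely, which matches the behavior in the reproduction steps.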

Fix:

Call response.stream.cancel() after each probe run so the slot is released.
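
A rough sketch of what the cancellation step could look like follows. The `ProbeResponse` and `Logger` shapes and the `releaseProbeStream` helper are assumptions made for illustration; only `response.stream`, `response.plexus`, and the use of `logger.warn` on failure come from this PR's description and commit message.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface ProbeResponse {
  stream?: ReadableStream<unknown>;
  plexus?: Record<string, unknown>;
}

interface Logger {
  warn(message: string): void;
}

// Cancel the unread stream once the probe has the metadata it needs.
// A failed cancel is logged rather than rethrown so the probe result
// is still reported to the caller.
async function releaseProbeStream(
  response: ProbeResponse,
  logger: Logger,
): Promise<void> {
  if (!response.stream) {
    return; // non-streaming probe: nothing to cancel
  }
  try {
    await response.stream.cancel();
  } catch (err) {
    logger.warn(`probe stream cancel failed: ${String(err)}`);
  }
}
```

One case where `cancel()` rejects is a stream that is already locked to a reader; in this sketch that surfaces as a warning instead of failing the probe.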

sirn and others added 2 commits May 19, 2026 19:54
Chat probes use stream: true to measure realistic TTFT/TPS, but the
ProbeService never consumed or cancelled the returned ReadableStream.
The dispatcher wraps streams so doRelease() only fires when the stream
is consumed, cancelled, or errored. Since the probe only reads
response.plexus metadata, the stream was silently abandoned, leaking
one concurrency slot per chat probe. Over time this exhausted
maxConcurrency and deadlocked the provider/model.

Cancel the stream after dispatch with error handling (logger.warn
on failure, since the slot is already released by the wrapper).
@mcowger merged commit bf7c3a4 into mcowger:main May 19, 2026
1 check passed
@sirn deleted the fix-probe-leak branch May 19, 2026 18:41
2 participants