ui: fix stop and reasoning skip in single-model mode #25084

Open. ServeurpersoCom wants to merge 3 commits into ggml-org:master from ServeurpersoCom:ui/fix-stop-continue-skip


@ServeurpersoCom ServeurpersoCom commented Jun 27, 2026

Contributor

Overview

Two fixes here:

Single-model: pressing Stop while the model is thinking now actually stops generation. Previously, generation kept running in the background until it reached the end, and a page refresh stayed stuck on "Connecting to Server" until it finished.

Single-model and router: the Skip reasoning button now keeps working after you Stop and then Continue. Previously, that sequence left the button dead: clicking it did nothing.

Additional information

Fixes #25055

Requirements

stop cancel (single-model only): stopGenerationForChat read the live model
dropdown instead of a frozen identity to build the DELETE key. in
single-model mode the POST opens the session under conv_id alone while the
stop appended ::model, so the key missed and generation ran to eos; router
mode matched by accident and already worked. now effectiveModel is frozen
on every write and the dropdown is consulted only in router mode, so the
key matches the POST in both modes.
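
A minimal sketch of the key mismatch described above, assuming the session key is either the bare conv_id or conv_id::model. The names `freezeIdentity`, `sessionKey`, and `legacyStopKey` are hypothetical and are not taken from the webui source; only `effectiveModel` and the `::model` suffix come from the commit message.

```typescript
// All function and type names below are hypothetical; this models the keying logic only.
type Mode = "single" | "router";

interface FrozenIdentity {
  convId: string;
  // Model captured when the request was written; null in single-model mode.
  effectiveModel: string | null;
}

// Freeze the identity once, at write time. The dropdown is read only in router mode.
function freezeIdentity(mode: Mode, convId: string, dropdownModel: string | null): FrozenIdentity {
  return { convId, effectiveModel: mode === "router" ? dropdownModel : null };
}

// Key shared by the POST and the later DELETE.
function sessionKey(id: FrozenIdentity): string {
  return id.effectiveModel === null ? id.convId : `${id.convId}::${id.effectiveModel}`;
}

// Pre-fix stop path: rebuild the key from whatever the dropdown shows right now.
function legacyStopKey(convId: string, dropdownModel: string | null): string {
  return dropdownModel === null ? convId : `${convId}::${dropdownModel}`;
}

// Single-model mode, with a model name still visible in the dropdown.
const frozen = freezeIdentity("single", "conv-1", "some-model");
const postKey = sessionKey(frozen);                        // bare conv_id
const oldStopKey = legacyStopKey("conv-1", "some-model");  // conv_id plus ::model suffix
const newStopKey = sessionKey(frozen);                     // same source as the POST
```

Because the fixed stop path derives its key from the same frozen value as the POST, the two cannot diverge, whatever the dropdown shows at stop time.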

reasoning skip (single-model and router): the continue flow never wired
onCompletionId, so after a stop and continue the message kept the dead
completion id and the skip control hit a slot that no longer existed.
record the fresh id on every continue so the skip targets the live slot.
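
The stale-id problem can be illustrated with a small model, assuming the server hands out a new completion id per request and that skip only works against an id that still owns a slot. `FakeServer`, `ChatMessage`, and `StreamCallbacks` are hypothetical stand-ins; only the `onCompletionId` callback name comes from the commit message.

```typescript
// Hypothetical model of the flow; apart from onCompletionId, no name here is from the webui.
interface ChatMessage {
  completionId: string | null;
}

interface StreamCallbacks {
  onCompletionId?: (id: string) => void;
}

// Stand-in for the server: tracks which completion ids still own a slot.
class FakeServer {
  private live = new Set<string>();
  private counter = 0;

  startCompletion(callbacks: StreamCallbacks): void {
    this.counter += 1;
    const id = `cmpl-${this.counter}`;
    this.live.add(id);
    if (callbacks.onCompletionId) callbacks.onCompletionId(id);
  }

  stop(id: string): void {
    this.live.delete(id);
  }

  // Returns true only when the id still maps to a live slot.
  skipReasoning(id: string | null): boolean {
    return id !== null && this.live.has(id);
  }
}

const server = new FakeServer();
const message: ChatMessage = { completionId: null };
const record = (id: string): void => { message.completionId = id; };

// Initial send: the first id is recorded on the message.
server.startCompletion({ onCompletionId: record });
// User presses Stop: the first slot goes away.
if (message.completionId !== null) server.stop(message.completionId);

// Continue without the callback (the bug): the message keeps the dead id.
server.startCompletion({});
const skipWithoutWiring = server.skipReasoning(message.completionId);

// Continue with the callback wired (the fix): the fresh id replaces the dead one.
server.startCompletion({ onCompletionId: record });
const skipWithWiring = server.skipReasoning(message.completionId);
```

In this model the skip fails in the first continue and succeeds in the second, which matches the before and after behaviour described in the commit message.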
@ServeurpersoCom ServeurpersoCom requested a review from a team as a code owner June 27, 2026 19:59
@ServeurpersoCom ServeurpersoCom requested a review from allozaur June 27, 2026 20:00

ggerganov commented Jun 28, 2026

Member

On my end, the server still keeps generating if I click on the stop button during reasoning.

Here is the log after clicking the stop button multiple times:

0.03.036.423 I slot   operator(): id  3 | task 0 | new prompt, n_ctx_slot = 262144, n_keep = 0, task.n_tokens = 14
0.03.036.431 I slot   operator(): id  3 | task 0 | cached n_tokens = 0, memory_seq_rm [0, end)
0.03.036.481 I srv  stream_sessi: stream_session_attach_pipe: conv_id=d01cb306-f2cc-47f9-ae6a-f6e7177db8bc::ggml-org/Qwen3.5-0.8B-GGUF:Q4_0 (empty=0)
0.03.039.204 I slot   operator(): id  3 | task 0 | cached n_tokens = 10, memory_seq_rm [10, end)
0.03.039.363 I slot init_sampler: id  3 | task 0 | init sampler, took 0.00 ms, tokens: text = 14, total = 14
0.03.060.041 I slot create_check: id  3 | task 0 | created context checkpoint 1 of 32 (pos_min = 9, pos_max = 9, n_tokens = 10, size = 19.266 MiB)
0.03.987.288 I srv         close: stream_pipe close: draining conv=d01cb306-f2cc-47f9-ae6a-f6e7177db8bc::ggml-org/Qwen3.5-0.8B-GGUF:Q4_0
0.05.114.513 I reasoning-budget: budget exhausted, forcing end sequence
0.05.118.580 I reasoning-budget: forced sequence complete, done
0.06.067.810 I slot print_timing: id  3 | task 0 | n_decoded =    760, tg = 253.26 t/s, tg_3s = 253.26 t/s
0.09.068.221 I slot print_timing: id  3 | task 0 | n_decoded =   1536, tg = 255.95 t/s, tg_3s = 258.63 t/s
0.12.069.759 I slot print_timing: id  3 | task 0 | n_decoded =   2306, tg = 256.14 t/s, tg_3s = 256.54 t/s
0.15.073.291 I slot print_timing: id  3 | task 0 | n_decoded =   3073, tg = 255.95 t/s, tg_3s = 255.37 t/s
0.16.299.739 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.18.074.116 I slot print_timing: id  3 | task 0 | n_decoded =   3838, tg = 255.74 t/s, tg_3s = 254.93 t/s
0.18.619.593 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.18.846.172 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.19.036.088 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.19.204.126 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.19.379.945 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel
0.21.075.663 I slot print_timing: id  3 | task 0 | n_decoded =   4600, tg = 255.43 t/s, tg_3s = 253.87 t/s
0.24.075.915 I slot print_timing: id  3 | task 0 | n_decoded =   5359, tg = 255.08 t/s, tg_3s = 252.98 t/s
0.27.077.798 I slot print_timing: id  3 | task 0 | n_decoded =   6119, tg = 254.84 t/s, tg_3s = 253.17 t/s

@ServeurpersoCom

Contributor Author

Thanks for testing. I'm checking the conversation IDs. Something is amiss. This PR needs to resolve all those cases.

stop and reasoning control silently no-op when the id does not match a
live session or completion, which makes a client side id mismatch
impossible to diagnose from the logs.

evict_and_cancel now warns with the requested conv_id and the list of
live session keys when nothing matches, and logs an info line when a stop
is accepted. the control task warns with the requested completion id and
the live processing cmpl_ids when no slot owns it, warns when reasoning
control was never armed, and logs an info line when reasoning_end is
accepted. the router stop proxy warns when no child owns the conv_id and
when the child reports no live session.

no behavior change, observability only.
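
The idea behind the new warnings can be sketched as follows. This is an illustrative TypeScript model only: the real `evict_and_cancel` lives in the server, and the function signature, the `LogLine` type, and the message wording below are assumptions, not the actual log format.

```typescript
// Illustrative model only; signature, types, and message text are assumptions.
interface LogLine {
  level: "info" | "warn";
  text: string;
}

// live maps a session key to the function that cancels that session.
function evictAndCancel(live: Map<string, () => void>, convId: string, log: LogLine[]): boolean {
  const cancel = live.get(convId);
  if (cancel === undefined) {
    // Nothing matched: report what was asked for and what actually exists.
    const keys = Array.from(live.keys()).join(", ");
    log.push({ level: "warn", text: `stop: no live session for conv_id=${convId}; live keys: [${keys}]` });
    return false;
  }
  live.delete(convId);
  cancel();
  log.push({ level: "info", text: `stop: accepted for conv_id=${convId}` });
  return true;
}

let cancelled = false;
const liveSessions = new Map<string, () => void>();
liveSessions.set("abc::model-x", () => { cancelled = true; });
const stopLog: LogLine[] = [];

// A bare id misses the suffixed key and now leaves a trace instead of a silent no-op.
const missed = evictAndCancel(liveSessions, "abc", stopLog);
// The full key matches and the session is cancelled.
const accepted = evictAndCancel(liveSessions, "abc::model-x", stopLog);
```

Printing the live keys next to the requested one makes a suffix mismatch visible in a single log line.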
@ServeurpersoCom ServeurpersoCom requested a review from a team as a code owner June 28, 2026 12:40

ServeurpersoCom commented Jun 28, 2026

Contributor Author

Backend instrumentation to debug the frontend. (It can be reverted or improved; with the right verbosity level it will be very useful in case of a UI regression.)

It's an edge case:
According to your log, the ::model suffix wasn't passed to the backend on stop, and the stop silently failed. The attach is conv_id=id::ggml-org/... but the DELETE is the bare id..., with no ::model, so the stop key misses the live session. The POST sends the full identity; the stop drops the suffix.
I'm still nailing down the exact sequence that triggers it. If you remember any specific action before the stop (switching conversation, model change, a continue, fresh chat vs existing), it would point me straight to it.
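
The mismatch can be checked mechanically against two lines copied from the log above. The helper names `attachedKey` and `deletedKey` are hypothetical, and the regular expressions assume the line layout shown in that log.

```typescript
// Two lines copied verbatim from the log posted above.
const attachLine =
  "0.03.036.481 I srv  stream_sessi: stream_session_attach_pipe: conv_id=d01cb306-f2cc-47f9-ae6a-f6e7177db8bc::ggml-org/Qwen3.5-0.8B-GGUF:Q4_0 (empty=0)";
const deleteLine =
  "0.16.299.739 I srv    operator(): DELETE /v1/stream/d01cb306-f2cc-47f9-ae6a-f6e7177db8bc -> evict_and_cancel";

// Hypothetical helper: key the session was attached under.
function attachedKey(line: string): string | null {
  const m = line.match(/conv_id=(\S+)/);
  return m ? m[1] : null;
}

// Hypothetical helper: key the stop request asked to cancel.
function deletedKey(line: string): string | null {
  const m = line.match(/DELETE \/v1\/stream\/(\S+)/);
  return m ? m[1] : null;
}

const attached = attachedKey(attachLine);
const deleted = deletedKey(deleteLine);

// True when the DELETE key is the attach key with its ::model suffix removed.
const suffixDropped =
  attached !== null &&
  deleted !== null &&
  attached !== deleted &&
  attached.indexOf(deleted + "::") === 0;
```

Here `suffixDropped` is true: the DELETE carries only the conversation id, while the session was attached under the id plus the model suffix.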

the stop path derived the model lazily and fell back to the live dropdown
when the frozen streaming state held none. when the dropdown was empty
while the POST had used a resolved model, the DELETE key dropped the
::model suffix, missed the live session, and generation drained to eos.

seed the streaming state with effectiveModel at t0, before the request,
so the frozen identity is present from the first millisecond. the stop
now reads that identity only, with no dropdown fallback that can diverge
from the POST. the DELETE key equals the X-Conversation-Id sent at POST
in both single-model and router mode.
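
A rough sketch of seeding the streaming state at t0, assuming the state is kept in a per-conversation map. `StreamingState`, `beginRequest`, `stopKey`, and `conversationHeader` are hypothetical names; `effectiveModel` and the `X-Conversation-Id` header are the only names taken from the commit message.

```typescript
// Hypothetical names throughout; only effectiveModel and X-Conversation-Id come from the source.
interface StreamingState {
  convId: string;
  model: string | null;
}

const streaming = new Map<string, StreamingState>();

// Single place where the identity string is built.
function conversationHeader(state: StreamingState): string {
  return state.model === null ? state.convId : `${state.convId}::${state.model}`;
}

// Seed the state before the request goes out, then return the value
// that would be sent as X-Conversation-Id on the POST.
function beginRequest(convId: string, effectiveModel: string | null): string {
  const state: StreamingState = { convId, model: effectiveModel };
  streaming.set(convId, state);
  return conversationHeader(state);
}

// The stop path reads the seeded state only. There is no dropdown parameter,
// so an empty or changed dropdown cannot alter the DELETE key.
function stopKey(convId: string): string | null {
  const state = streaming.get(convId);
  return state ? conversationHeader(state) : null;
}

const headerAtPost = beginRequest("conv-2", "resolved-model");
const keyAtStop = stopKey("conv-2");
const keyForUnknown = stopKey("conv-unknown");
```

Since both values come from `conversationHeader` applied to the same seeded state, `keyAtStop` equals `headerAtPost` by construction.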

Development

Successfully merging this pull request may close these issues.

Misc. bug: pressing stop while model is reasoning in webui breaks send button and doesn't stop generation
