Problem
When a client disconnects mid-generation (timeout, user cancel, agent restart), the
server detects the broken stream but continues the full prefill to completion before
acting on it. On long-context sessions this means several minutes of wasted GPU work
before the server is available again.
What happens
The progress callback (server_progress_cb) sets stream_failed = true when a
keepalive write fails, but this flag is only checked after ds4_session_sync()
returns — there's no path to signal the prefill loop to stop early. The loop in
metal_graph_prefill_chunked_range only bails on a Metal GPU error, not on client
state.
It gets worse when the disconnected client reconnects and retries with a slightly
different prompt (e.g. one extra token from a tool result): the token mismatch triggers
another full prefill from zero immediately after the first one finishes.
Example log of issue
The client disconnected partway through the prefill. The server finished the full
67580-token prefill (~257 seconds), then started over on the retry:
0602 19:15:59 ds4-server: chat ctx=0..67580:67580 TOOLS prompt done 256.984s
0602 19:15:59 ds4-server: chat ctx=0..67580:67580 TOOLS stream closed during prefill
0602 19:15:59 ds4-server: live kv cache miss live=67580 prompt=67609 common=67527 reason=token-mismatch
0602 19:15:59 ds4-server: chat ctx=0..67609:67609 TOOLS prompt start
0602 19:15:59 ds4-server: chat ctx=0..67609:67609 TOOLS prefill chunk 0/67609 (0.0%) ...
Cause
ds4_session_progress_fn is typedef void (*)(void*, const char*, int, int) — the
callback returns void, so there's no way to signal abort back to the prefill loop.
stream_failed lives in the server layer and is invisible to ds4.c.
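To make the cause concrete, here is a minimal standalone sketch of the current control flow. It is not the ds4 code. `demo_prefill_void`, `demo_void_cb`, `demo_server_state`, and the simulated `fail_at` threshold are hypothetical names made up for illustration. Only the callback shape follows the typedef quoted above. The loop has nothing to inspect after calling the callback, so it always runs to the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the typedef quoted above: the callback returns nothing. */
typedef void (*demo_void_progress_fn)(void *ud, const char *phase, int done, int total);

/* Hypothetical server-side state, invisible to the prefill loop. */
typedef struct {
    bool stream_failed;   /* set when the (simulated) keepalive write fails */
    int  fail_at;         /* simulation only: the write fails once done >= fail_at */
    int  failed_at_done;  /* progress value at which the failure was first seen */
} demo_server_state;

static void demo_void_cb(void *ud, const char *phase, int done, int total) {
    demo_server_state *st = (demo_server_state *)ud;
    (void)phase;
    (void)total;
    if (!st->stream_failed && done >= st->fail_at) {
        st->stream_failed = true;
        st->failed_at_done = done;
    }
}

/* Stand-in for the chunked prefill loop: it cannot learn about stream_failed. */
static int demo_prefill_void(demo_void_progress_fn cb, void *ud, int n_tokens, int chunk) {
    assert(chunk > 0);
    int done = 0;
    while (done < n_tokens) {
        int n = (n_tokens - done < chunk) ? (n_tokens - done) : chunk;
        /* the real loop would run the GPU work for this chunk here */
        done += n;
        if (cb) cb(ud, "prefill", done, n_tokens);  /* no return value to act on */
    }
    return done;
}
```

With 4096-token chunks, a 67580-token prompt, and a simulated failure at 8192 tokens, this loop still processes the remaining 59388 tokens before the caller gets a chance to look at `stream_failed`.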
Possible fix
Two approaches:
Option A — cancel flag pointer (smaller diff)
Add a volatile bool *cancel_flag to the session (via ds4_session_set_cancel_flag
or similar). The server sets it when stream_failed is detected in the progress
callback. The chunked prefill loop checks it between 4096-token chunks and returns
early if set.
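A rough sketch of how Option A could look, written as a standalone simulation rather than a patch. All `demo_*` names are hypothetical, including `demo_session_set_cancel_flag`, which only mirrors the setter name suggested above. The real session struct and prefill loop will differ. The sketch assumes the flag is set from the progress callback on the same thread as the loop. If another thread were to set it, a C11 `atomic_bool` would be the safer choice than `volatile bool`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the session: only the fields this sketch needs. */
typedef struct {
    volatile bool *cancel_flag;  /* owned by the caller; NULL means never cancel */
    void (*progress)(void *ud, const char *phase, int done, int total);
    void *progress_ud;
} demo_session;

static void demo_session_set_cancel_flag(demo_session *s, volatile bool *flag) {
    s->cancel_flag = flag;
}

/* Hypothetical per-request server state. */
typedef struct {
    bool stream_failed;
    volatile bool cancel;  /* the flag handed to the session */
    int fail_at;           /* simulation only: keepalive write fails once done >= fail_at */
} demo_request;

static void demo_progress_cb(void *ud, const char *phase, int done, int total) {
    demo_request *r = (demo_request *)ud;
    (void)phase;
    (void)total;
    if (done >= r->fail_at) {
        r->stream_failed = true;
        r->cancel = true;  /* signal the prefill loop without changing the callback type */
    }
}

/* Chunked prefill stand-in: checks the flag between chunks, returns tokens processed. */
static int demo_prefill_chunked(demo_session *s, int n_tokens, int chunk) {
    assert(chunk > 0);
    int done = 0;
    while (done < n_tokens) {
        if (s->cancel_flag != NULL && *s->cancel_flag) break;  /* early exit */
        int n = (n_tokens - done < chunk) ? (n_tokens - done) : chunk;
        /* the real loop would run the GPU work for this chunk here */
        done += n;
        if (s->progress) s->progress(s->progress_ud, "prefill", done, n_tokens);
    }
    return done;
}
```

In this simulation, a failure detected after the second 4096-token chunk stops the loop at 8192 tokens instead of 67580. The caller still has to decide what to do with the partially filled KV state, which this sketch does not cover.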
Option B — progress callback returns bool
Change ds4_session_progress_fn to return bool (false = cancel). The server
callback returns false when stream_failed. Every call site in the prefill loop
checks the return value. Cleaner long-term but touches more call sites including
ds4_distributed.c.
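For comparison, a sketch of Option B under the same caveats: a standalone simulation with hypothetical `demo_*` names, not the actual ds4 signatures. The only idea taken from the proposal is that the callback returns `bool` and `false` means cancel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Proposed shape: same arguments as before, but the callback returns bool. */
typedef bool (*demo_bool_progress_fn)(void *ud, const char *phase, int done, int total);

/* Hypothetical server-side state. */
typedef struct {
    bool stream_failed;
    int  fail_at;  /* simulation only: keepalive write fails once done >= fail_at */
} demo_stream_state;

/* Returns false to request cancellation once the stream has failed. */
static bool demo_bool_cb(void *ud, const char *phase, int done, int total) {
    demo_stream_state *st = (demo_stream_state *)ud;
    (void)phase;
    (void)total;
    if (done >= st->fail_at) st->stream_failed = true;
    return !st->stream_failed;
}

/* Returns true if no cancel was requested, false otherwise.
 * *out_done receives the number of tokens processed either way. */
static bool demo_prefill_bool(demo_bool_progress_fn cb, void *ud,
                              int n_tokens, int chunk, int *out_done) {
    assert(chunk > 0 && out_done != NULL);
    int done = 0;
    bool ok = true;
    while (done < n_tokens) {
        int n = (n_tokens - done < chunk) ? (n_tokens - done) : chunk;
        /* the real loop would run the GPU work for this chunk here */
        done += n;
        if (cb && !cb(ud, "prefill", done, n_tokens)) {
            ok = false;  /* every call site has to handle this return value */
            break;
        }
    }
    *out_done = done;
    return ok;
}
```

The trade-off described above shows up directly: the cancel path needs no extra session state, but every caller of the callback type must be updated to check the result.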
Option A has a smaller diff and doesn't change the public callback signature.
--
On a Mac Studio M4 Max 128GB, q2-imatrix, cli flags: --ctx 192000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 16384