
Commit ffe668f

unamedkr and claude committed
fix: load_context resets chat state + KV cache saves 57 tokens (#83)
Two fixes:

1. quant_load_context now resets cached_text/cached_tokens to prevent stale text-prefix matching in quant_chat after a context restore.
2. KV cache pre-build uses quant_chat() for prefill (saves 57 tokens, vs. 1 token with quant_generate).

Status: KV save/load works correctly (57-token round trip verified).
Speed: 4.5 s cached lookup vs. 15 s regular (3.3x faster).

Remaining: a loaded context plus a new question still produces inaccurate answers. Root cause: quant_chat's slow path re-prefills the entire new prompt, overwriting the loaded KV cache. This needs a new API (quant_continue_from_cache) that appends tokens starting at position n_ctx_tokens instead of 0.

Refs #83
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
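The "Remaining" item turns on where the next prefill starts writing: at position 0, which clobbers the restored context, or at n_ctx_tokens, which appends after it. The following is a minimal sketch of that difference. It assumes a toy cache in which each slot holds only the token id written at that position (real KV entries hold per-layer key/value tensors). All `toy_` names are hypothetical. quant_continue_from_cache is only proposed in this commit, and its real signature is not known.

```c
#include <assert.h>

/* Hypothetical toy model: each KV slot stores only the token id written at
 * that position. Real KV entries hold per-layer key/value tensors. */
#define TOY_MAX_POS 64

typedef struct {
    int kv[TOY_MAX_POS];  /* stand-in for the KV cache, indexed by position */
    int n_ctx_tokens;     /* number of valid positions in kv[] */
} toy_ctx;

/* Write n tokens into the cache starting at position start.
 * Returns the new number of valid positions. */
static int toy_prefill_at(toy_ctx* ctx, const int* tokens, int n, int start) {
    assert(start >= 0 && n >= 0 && start + n <= TOY_MAX_POS);
    for (int i = 0; i < n; i++) {
        ctx->kv[start + i] = tokens[i];
    }
    ctx->n_ctx_tokens = start + n;
    return ctx->n_ctx_tokens;
}

/* The behaviour described for the current slow path: prefill from position 0,
 * overwriting whatever a loaded context put in kv[0..n-1]. */
static int toy_prefill_from_zero(toy_ctx* ctx, const int* tokens, int n) {
    return toy_prefill_at(ctx, tokens, n, 0);
}

/* The behaviour proposed for quant_continue_from_cache: append after the
 * restored context, leaving kv[0..n_ctx_tokens-1] intact. */
static int toy_continue_from_cache(toy_ctx* ctx, const int* tokens, int n) {
    return toy_prefill_at(ctx, tokens, n, ctx->n_ctx_tokens);
}
```

Take a restored 57-token context. The append variant writes the new question at positions 57 onward and keeps the loaded entries. The from-zero variant overwrites positions 0 onward, which is the failure described in the Root cause note.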
1 parent 296886b commit ffe668f


1 file changed, +11 -0 lines changed


quant.h

Lines changed: 11 additions & 0 deletions
@@ -17362,6 +17362,17 @@ int quant_load_context(quant_ctx* ctx, const char* path) {
 
     /* Restore position */
     ctx->n_ctx_tokens = (int)nt;
+
+    /* Reset chat state so quant_chat() treats this as a fresh session
+     * with pre-filled KV cache. Without this, quant_chat's text-prefix
+     * matching sees stale cached_text and produces misaligned output.
+     * The next quant_chat() call will re-tokenize its prompt and prefill
+     * starting from position nt (where the loaded KV ends). */
+    if (ctx->cached_text) { free(ctx->cached_text); ctx->cached_text = NULL; }
+    if (ctx->cached_tokens) { free(ctx->cached_tokens); ctx->cached_tokens = NULL; }
+    ctx->n_cached = 0;
+    ctx->cached_capacity = 0;
+
     fclose(fp);
     fprintf(stderr, "quant_load_context: restored %u tokens (%u layers) from %s\n",
             nt, nl, path);
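The diff resets cached_text because quant_chat reuses work when a new prompt extends text it has already processed. quant_chat's internals are not part of this commit. The sketch below assumes a simple byte-prefix comparison against cached_text, and all `toy_` names and the struct are hypothetical. It shows how stale text left over from an earlier session can still "match" after a restore, and how clearing it forces a full prefill instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified chat state modelled on the cached_text field in
 * the diff; the real quant_ctx has more fields and different logic. */
typedef struct {
    char* cached_text;  /* prompt text assumed to already be in the KV cache */
} toy_chat_state;

/* Replace cached_text with a heap copy of text. */
static void toy_set_cached_text(toy_chat_state* st, const char* text) {
    free(st->cached_text);
    size_t len = strlen(text);
    st->cached_text = malloc(len + 1);
    assert(st->cached_text != NULL);
    memcpy(st->cached_text, text, len + 1);
}

/* Number of prompt bytes that can be skipped because the prompt starts with
 * cached_text (fast path), or 0 if the whole prompt must be prefilled
 * (slow path). */
static size_t toy_reusable_prefix(const toy_chat_state* st, const char* prompt) {
    if (st->cached_text == NULL) return 0;  /* nothing cached: slow path */
    size_t len = strlen(st->cached_text);
    if (strncmp(prompt, st->cached_text, len) != 0) return 0;
    return len;  /* prompt extends the cached text */
}

/* Mirrors the reset added to quant_load_context: drop the text cache so the
 * next call cannot match against text from a previous session. */
static void toy_reset_chat_state(toy_chat_state* st) {
    free(st->cached_text);  /* free(NULL) is a no-op */
    st->cached_text = NULL;
}
```

Without the reset, a prompt that happens to extend the old session's text takes the fast path and skips a prefix whose KV entries were just replaced by the loaded context. After the reset, the same prompt reports no reusable prefix.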
