Skip to content

feat(backend): chat context compaction#261

Open
MesoX wants to merge 23 commits into
willdady:mainfrom
MesoX:feature/context-compaction-clean
Open

feat(backend): chat context compaction#261
MesoX wants to merge 23 commits into
willdady:mainfrom
MesoX:feature/context-compaction-clean

Conversation

@MesoX

@MesoX MesoX commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Chat context compaction: keep chats alive past the model's context window

Design doc: docs/adr/0012-context-compaction.md (ADR-0012)

Problem

Chats hard-fail the moment their message history exceeds a model's context window. The AI SDK (ai@6) reports real token usage after each call, but it exposes no context-window metadata on the model interface and no pre-call tokenizer. Providers also disagree on whether the window is discoverable at all: Google, OpenRouter, and vLLM expose it via their APIs, while OpenAI, Anthropic, and Bedrock do not. On top of that, the existing error handling only covered auth, rate-limit, and 5xx failures, so a context-overflow rejection killed the turn outright.

What this PR does

This change adds a two-tier, view-not-delete compaction model. It is fed by a single token estimator, all of its durable state is mutated through a single versioned compare-and-swap (CAS) writer, an always-on recovery net catches the overflow errors that the proactive path misses, and a deterministic context-editing pass prunes stale, bulky tool results without ever calling a model. Because top-level chats and sub-agents both run through the shared agent-runner/ToolLoopAgent, one implementation covers both.

The full rationale — including every alternative we rejected and why — lives in ADR-0012. The key decisions, summarized from it, are below.

Load-bearing principles

  • View, not delete. The watermark and summary change what is sent to the model, never what is stored. Raw messages persist in the database untouched, so compaction is non-destructive in the data sense: a user can still read the full history, and a future "expand summary" UI comes for free. This reduces the "irreversible data loss" objection to a matter of UX courtesy rather than correctness. A summarized message is never hard-deleted.
  • One estimator. Token counting lives in exactly one function operating over one neutral structure (CountUnit[]). Both Tier 1 (UIMessages) and Tier 2 (ModelMessages) normalize into it, and both count only model-bound parts (text, tool-call, tool-result, file, image). Divergence between the two tiers is therefore impossible by construction, because a tier cannot fire on a number the other tier never sees.
  • One durable writer. Every mutation of compaction state (summaryWatermark, contextSummary, compactionDirty) goes through a single compare-and-swap keyed on a version column. Concurrency — for example a trigger run racing a user run — and the interaction between compaction and history-edit invalidation are both resolved by version rather than by comparing watermark values. That matters because a watermark can move backward on a history edit, and a value-based comparison would misread that as "not yet advanced" and write a stale summary over mutated history. On a CAS conflict the loser re-reads the row: if the winner already covered its prefix it skips, otherwise it retries once and then skips with a contended warning. There is no recompute loop and no livelock.
  • Recovery is the net. When a 400/413 overflow error is caught, the messages are aggressively trimmed in memory (through the same Tier 2 adapter, so there is no bespoke trimmer) and the call is retried exactly once. Recovery flags compactionDirty and leaves the durable compaction to the next turn. Crucially, recovery stays on even when proactive compaction is globally disabled — it is the last line of defense, not a risk surface.

Mechanisms

  • Window resolution. resolveContextWindow(provider, modelId) resolves a window per model, in order: a manual override (provider.modelMeta), then provider API auto-detection (Google / OpenRouter / vLLM), then the community-maintained litellm registry JSON (which covers OpenAI / Anthropic / Bedrock), and finally a conservative 8192 default. We deliberately do not maintain our own context-window table. Key normalization is boundary-safe, results are cached per provider+model with a TTL, the cache is evicted immediately on a modelMeta edit, default results use a short TTL, and API fetches use a 5 s timeout behind a single-flight guard.
  • Token estimation. The estimator uses char/4 over text parts only (it never runs char/4 over a base64 image) and a modality table to size non-text parts. It is used only on the first turn, before any provider usage exists; every later turn uses the real, provider-reported usage.inputTokens. We accept the first-turn imprecision — guarded by a 1.15 cold-start margin and the recovery net — rather than ship a per-provider tokenizer.
  • Tier 1 — cross-turn (durable). This runs in prepareChatTurn over the durable history. Its budget math works off the input budget (window − maxOutputReserve − safetyReserve), with a trigger ratio of 0.8 and a target of 0.5. The two ratios are deliberately distinct to give hysteresis, so compaction does not re-fire on the very next turn. The work is staged cheap-first: Stage 0 context-edits (pruning bulky old tool results with no model call), Stage 1 prunes the older prefix, and Stage 2 summarizes the prefix into a single synthetic message only if the conversation is still over target. Tool-call/result pairs stay atomic across the keep boundary, summarization is incremental (only the messages after the watermark), and a visible context-compacted event keeps the behavior fail-loud.
  • Tier 2 — intra-turn (in-memory). This handles a single heavy response whose own tool loop bloats the window mid-loop. It runs in the SDK prepareStep hook on both streamText and generateText, summarizing old completed tool results while keeping recent steps verbatim and preserving call/result pairing. It fires only when genuinely near the limit, and it is not persisted — the next turn's Tier 1 folds the result into the durable summary.
  • Sub-agents. Sub-agents start fresh on each invocation and carry no cross-turn history, so they use Tier 2 only. Each one resolves its own model's window and output limits, and recovery covers them as well through the shared runner.
  • Config and kill switch. Compaction behavior is global (DEFAULT_COMPACTION_CONFIG); only the window/output size is per model. Setting COMPACTION_ENABLED=false disables all proactive compaction in production without a deploy, and recovery ignores that flag. Per-agent tuning was shipped and then removed, because no surveyed tool exposes it and the ratios already self-normalize to the model window.

Frontend

  • Context-usage ring. A small ring next to the model selector fills as usedTokens / contextWindow, ramping from green to amber (≥ 0.7) to red (≥ 0.9), and rendering a neutral grey when the window is unknown or defaulted rather than guessing a ramp. The window comes from the currently selected model, and the numerator is the last response's peak contextTokens.
  • Per-message stats. An (i) popover under each assistant response shows input/output tokens, time-to-first-token, and total generation time, reusing the existing tool-call timing mechanism.
  • Force-compact on demand. The ring is clickable: POST /chats/:id/compact runs Tier 1 once regardless of the threshold and returns the post-compact usage so the ring refreshes immediately. A click is deferred while a response is streaming, and a confirmation dialog appears only when the drop would be significant.
  • Compaction trace. Compaction is surfaced as a synthetic compact_context tool-call/result pair that reuses the tool-call UI. The trace part is stripped before convertToModelMessages, so it never replays to the provider as a phantom tool call.

Notable rejected alternatives (full list in ADR §Considered Options)

  • Single-tier compaction (cross-turn only). Rejected because it cannot rescue a single response whose own tool loop overflows the window.
  • Hard-deleting or truncating old messages. Rejected because it is irreversible and reproduces the "silently drops the middle" failure mode seen in gateway truncation. View-not-delete keeps the data and makes the action auditable.
  • A homegrown context-window lookup table. Rejected because it is unmaintainable across providers; the litellm registry is the industry's "don't maintain your own table" answer.
  • A real pre-call tokenizer. Rejected for v1 because it is a heavy per-provider dependency for a number the provider returns accurately after the very first call.
  • Optimistic concurrency by comparing watermark values. Rejected because it breaks when invalidation moves the watermark backward; versioned CAS removes that monotonicity assumption.
  • Compacting only to the trigger threshold. Rejected because it re-fires every turn (the thrash failure mode); distinct trigger and target ratios are what provide hysteresis.
  • A token floor on the trigger (max(window × pct, 64000)). Rejected because it overflows sub-64k models, where the trigger would never fire. A floor belongs only on the window fallback, never on the trigger.

Schema changes (additive, nullable/defaulted)

  • provider.modelMeta (JSONB) — per-model window/output overrides.
  • chat/run gain contextSummary, summaryWatermark, compactionDirty, and version.
  • Migration: apps/backend/drizzle/0046_context_compaction.sql.

Rollout is lazy and there is no backfill: existing chats compact only on their next turn. An eager backfill would create a thundering herd of summarize calls.

Cross-tenant safety

The submit route verifies that the body id belongs to the caller's workspace before a run starts. The compaction store is keyed by chat id alone, so without that check one workspace could mutate another's summary and watermark.

Observability

The feature emits structured metric:-tagged log lines: compaction.fired, summarize.latency_ms, recovery.*, context_window.fell_to_default, litellm.key_miss, cas.conflict, and context_edited.

Config

New apps/backend/.env.example keys, all optional with built-in defaults: COMPACTION_ENABLED, COMPACTION_TRIGGER_RATIO, COMPACTION_TARGET_RATIO, COMPACTION_RESERVE_RATIO, COMPACTION_KEEP_RECENT, COMPACTION_MIN_PRUNABLE_CHARS, and COMPACTION_MIN_RECENT_PRUNABLE_CHARS.

Tests

New suites cover compaction.test.ts, context-window.test.ts, token-estimate.test.ts, and recovery.test.ts; existing suites for agent-runner.test.ts, chat.test.ts, chat-execution.test.ts, sub-agent.test.ts, and packages/schemas/index.test.ts were extended. Recovery's per-provider overflow regex is fixture-tested across OpenAI/vLLM, Anthropic, Google, and Bedrock.

Known limitations (ADR §Open/deferred)

  • A single oversized newest result. A tool result too large to fit even as the newest message still hard-errors; the fix is an ingestion cap at storage time, which is out of scope here.
  • A few items are consciously deferred and are not needed for the "no more hard fails" goal: a live Pending → Running compaction trace (it currently renders post-hoc only), the estimate-vs-real divergence metric (log-only for now), and content-type tool-result media accounting.

🤖 Generated with Claude Code

frantisek.spacek@morosystems.cz and others added 22 commits June 11, 2026 20:26
…paction

Adds proactive + reactive context-window management so chats no longer hard-fail
when history exceeds a model's window (docs/adr/0009).

- Context-window resolution (context-window.ts + litellm-registry.ts): manual
  override → provider API → vendored litellm registry → conservative default,
  cached with TTL and evicted on modelMeta change.
- Single token estimator over a neutral CountUnit structure (token-estimate.ts),
  shared by Tier 1 (UIMessages) and Tier 2 (ModelMessages).
- Tier 1 cross-turn compaction (compaction.ts): prune-then-summarize behind a
  single versioned CAS watermark writer; hysteresis trigger/target ratios.
- Tier 2 in-turn compaction via prepareStep; sub-agent wiring threads the window.
- Recovery middleware (recovery.ts): detect provider context-overflow, trim and
  retry once; always on, independent of the proactive kill switch.
- agent-runner wraps the model with recovery, threads RV1 prior-messages for
  C4 edit-detection, and stamps §H/§I run stats onto the assistant message.
- Schema: additive nullable provider.modelMeta + chat compaction state + per-agent
  compaction config; migration 0046. RV2 workspace-scope check on chat submit;
  POST /:chatId/compact for on-demand compaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ations

Surfaces the compaction work in the chat UI.

- Context-usage ring beside the model selector (§H): fill = last input tokens /
  resolved window for the selected model; neutral grey when the window is
  unknown/default. Clickable to compact on demand (§J).
- Per-message stats popover next to Regenerate (§I): input/output tokens, TTFT
  and total generation time from the server-stamped metadata.stats.
- Tool-call run durations in the tool header, reusing server start/complete
  timestamps with a client-observed fallback (useToolDuration). The hook is
  written to satisfy the react-hooks purity / set-state-in-effect / refs rules:
  render reads only state, and all writes are deferred into timer callbacks.
- Per-agent compaction config fields in the agent form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	apps/backend/src/services/chat-execution.ts
…nds in /v1

vLLM and most OpenAI-compatible providers set baseUrl to "{root}/v1" (the
OpenAI SDK needs it that way for chat calls). detectOpenAiCompatible appended
"/v1/models" to that, producing "{root}/v1/v1/models" → 404 → the window
silently fell to the 8192 default and the §H usage ring rendered "unknown".
Strip a trailing "/v1" before building the models URL so the probe hits
"{root}/v1/models" and reads max_model_len. Added a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The §I stats popover (i) icon was gated on metadata.stats, which was only
stamped post-stream and persisted to the DB — so it appeared a refetch
round-trip after the answer finished, lagging the copy/delete/regenerate icons.

Emit the stats via messageMetadata on the `finish` event so they ride the final
stream chunk and the (i) appears the instant the answer completes. firstTokenAt
is captured in streamText.onChunk (fires before finish) instead of the async
snapshot drain. A single buildMessageStats() helper feeds both the streamed copy
and the post-stream persist stamp (sharing one finishedAt) so live and reloaded
stats match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… trace

11a (U5): ring no longer jumps back to pre-compaction value on user Send.
Expiry now tracks `assistantMessageCount` instead of `messages.length` so
optimistic user-message pushes don't trigger early fallback.

11b: wrap the agent-info (i) button in a Tooltip so hover reveals its
purpose without opening the dialog. Shows agent description or "Agent info".

11c (§K): Tier 1 compaction is now visible in the chat timeline.
- Backend emits synthetic `compact_context` tool-call + tool-result chunks
  into the UIMessage stream (via `prependCompactionChunks`) immediately after
  the `start` event; the existing tool-call expander renders them for free.
- `CompactionTrace` threads through Tier1Output → ChatTurn.compactionTrace →
  agent-runner.stream() — only when a model summary was produced (prune-only
  turns produce no trace to avoid empty/confusing entries).
- `COMPACT_CONTEXT_TOOL_NAME` constant shared across stream producer, strip
  filter, §J message builder, and frontend display-name mapping.
- `stripCompactionTraceParts` removes trace parts before ModelMessage
  conversion so the provider never sees the phantom tool call on replay.
- `buildCompactionTraceMessage` builds a standalone assistant message for §J
  (forced compact endpoint — no live stream to inject into).
- `title: "Context compaction"` on the chunk + tool.tsx display-name mapping
  shows a friendly label instead of the raw underscore name.
- `context-usage-ring.tsx`: improve neutral-state tooltip copy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps the existing Popover trigger in a Tooltip using the
TooltipTrigger asChild > PopoverTrigger asChild composition pattern.
Hover shows compact stats (In/Out/TTFT/Total); click still opens
the full popover.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Chunk 12: drop the per-agent context-compaction override surface.
Compaction now runs from DEFAULT_COMPACTION_CONFIG (gated by the
COMPACTION_ENABLED kill switch); per-model config lands separately.

- schema.ts: drop 6 agent columns (compaction_enabled, trigger_ratio,
  target_ratio, reserve_ratio, keep_recent_messages, min_prunable_chars)
- schemas: drop matching base fields + the targetRatio<triggerRatio
  hysteresis refine; slim agentCreate/agentUpdate picks
- compaction.ts: delete CompactionConfigOverrides + resolveCompactionConfig
- chat-execution.ts: build config from DEFAULT_COMPACTION_CONFIG
- agent-form.tsx: remove the Context compaction form block + state
- tests: drop the per-agent config cases

Migration: rewrite 0046 to never add the agent columns (it has not
reached main) + trim its snapshot, rather than stacking a drop
migration. Already-migrated DBs (dev, test server) had the columns
dropped manually.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-agent compaction config was removed in 625ff96; the `agent` parameter
threaded into buildCompactionRuntime is now unused. Remove it from the args
type, the destructure, and all three call sites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After Chunk 12 compaction config is global (DEFAULT_COMPACTION_CONFIG +
COMPACTION_ENABLED kill switch). Add optional env overrides for the ceiling
ratios so a test deployment can lower the trigger and exercise auto-compaction
without filling a large context window.

Unset/blank/invalid env values fall back to the built-in defaults, so
production behavior is unchanged. Knobs: COMPACTION_TRIGGER_RATIO,
COMPACTION_TARGET_RATIO, COMPACTION_RESERVE_RATIO, COMPACTION_KEEP_RECENT,
COMPACTION_MIN_PRUNABLE_CHARS. Documented in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…to compaction plan

Survey of Hermes Agent, Codex CLI, Claude Code, Cline confirms the shipped
design already implements real-token feeding (C1), input-tokens-only window
(F1), reserve-carve trigger (C3 = Codex 90% cap), and two-layer proactive+
recovery (P4). Documents the Hermes token-floor (#14690) as a do-not-copy
anti-pattern and defers three optional adds (message-count valve, maxOutput
reserve floor, model-aware aggressiveness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture the 2026-06-12 live test-server findings: per-step-timeout bug that
kills pre-stream compaction (150s summarize > 120s watchdog), the 8,631-token
runaway, and the lost-turn root cause. Add the four fixes (heartbeat,
maxOutputTokens cap + concise prompt, stream-open-before-compaction with live
Running status, abortSignal), the prior-art summarization-prompt survey
(Claude Code / Codex / OpenCode / Hermes), the proposed replacement prompt,
and the decided-against provider-model-selector.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pt overhaul

Fix 1 (CRITICAL): thread `onActivity` into `buildCompactionRuntime` and tick it
every 10 s via `setInterval` while `generateText` runs inside the `summarize`
closure. The 120 s per-step stall watchdog now sees regular activity during a
slow summarize call instead of silence, and no longer kills the run before the
summary is committed.

Fix 2: add `maxOutputTokens: 2000` ceiling to the summarize `generateText` call
as a hard backstop against the runaway-expansion failure mode observed in the
live test (8,631-token output for a 6,178-token input). Log `finishReason` and
emit a `warn` if the ceiling is hit so the event is visible in prod logs.
Replace the unstructured one-liner system prompt with a structured four-section
handoff prompt (Intent / Decisions / Files / Next step) that survives repeated
re-compactions and includes an explicit "aim under ~1500 tokens" length target.

Fix 3 (stream-before-compaction) deferred — heartbeat addresses the immediate
timeout bug; the live-indicator refactor is a larger architectural change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Address review findings on the chunk 13 summarize changes:

- Reorder the handoff prompt sections most-critical-first (Intent →
  Current state & next step → Decisions → Files) so a truncation at the
  maxOutputTokens ceiling drops file/tool detail rather than the
  resume-critical intent and next step. Add a front-load instruction.
- Thread the run's AbortSignal through prepareChatTurn and
  buildCompactionRuntime into the summarizer generateText call. The Fix 1
  heartbeat suppresses the per-step stall watchdog during summarize, so
  without this a hung call would leak until the 10 min per-run timeout;
  now a cancel/timeout aborts it.
- Drop the redundant onActivity wrapper at the call site (the optional
  event param already satisfies the () => void heartbeat signature).
- Reconcile the conflicting token-count comments (~150-200 vs ~1500)
  around the output ceiling.
- Add tests for the summarize closure: abort signal + ceiling + ordered
  prompt threading, the finishReason === length warn path, and the
  heartbeat tick/clear lifecycle. Export buildCompactionRuntime for test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stage 2 compaction returned recent messages verbatim, so large tool
results (e.g. MCP dumps) in the kept window dominated tokensAfter and
prevented reaching targetTokens. Prune recent tool outputs with a
higher threshold (minRecentPrunableChars, default minPrunableChars*5)
and apply it across every over-target return path — including the
empty-prefix / null-watermark bail where the whole history fits within
keepRecentMessages. Warn when the post-compaction estimate still
exceeds 2x target. Raise summarizer output ceiling 2000 -> 4000 to
catch models that ignore the 1500-token prompt limit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three tasks: (1) Tier 1 recent-trim safety (64232d8 shipped; option D
overflow-gate + exempt-newest remaining), (2) context-editing-style
prune of kept tool results (needs review + bigger plan; session
iteration captured), (3) ingestion cap for oversized MCP/sub-agent
results (upstream-issue candidate, marked not filed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…newest

Tier 1 recent (kept) tool results are now trimmed only when the kept view
would breach the hard window wall (inputBudget), not on a soft targetTokens
(hysteresis) miss. A soft miss is left at full fidelity and re-compacts next
turn. The single newest message is always exempt.

- UICompactOptions.inputBudget threaded from applyTier1Compaction as
  max(0, budget.inputBudget - overheadTokens) (mirrors effectiveTarget).
- keepRecentWithinWall applied on both over-target paths (Stage 2 + empty
  prefix); omitted budget falls back to always-trim (pre-option-D guard).

Review fixes:
- Re-gate the over-target warning to afterEstimate > inputBudget; the old
  target*2 heuristic fired on every healthy compaction under a low target
  ratio. Falls back to target*2 when no wall is supplied.
- keepRecentWithinWall returns the recent estimate to avoid a double pass.

Tests: 57 pass (option-D verbatim/trim/empty-prefix/no-warn cases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… elision)

editToolResults + elidedToolPlaceholder: pure view transform eliding OLD
bulky tool-result bodies to a self-describing placeholder before the Tier 1
trigger, so a leaned view can skip summarization. Recency by tool-result
count, newest message exempt, size-gated; no durable state (P1).

Post-review hardening: grow-guard (never elide when placeholder >= output;
no prompt inflation / negative reclaim) + idempotency guard, both decided up
front into a Map so the rewrite runs only on real work and never copies on a
no-op. 3 config fields + 3 env overrides. +10 tests; runs/ suite 193 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ction-clean

# Conflicts:
#	apps/backend/src/runs/agent-runner.test.ts
#	apps/backend/src/runs/agent-runner.ts
#	apps/backend/src/services/chat-execution.ts
#	apps/backend/src/tools/sub-agent.ts
Defensive fixes to context compaction surfaced by chunk 14 review:

- A1: projectTier1Tokens treats lastInputTokens<=0 as no-baseline
  (usage-less gateways persist contextTokens=0); findLastInputTokens
  skips trace messages and zero-count turns symmetrically.
- A2: no-dimension image token fallback uses pessimistic per-provider
  ceilings (Anthropic 1600 / OpenAI-high 2000) instead of flat 1200.
- A6: cap output reserve at half the window so an input-scoped
  contextWindow minus a large max_output_tokens cannot collapse
  inputBudget and thrash.
- B-F1/C2: range-validate compaction env overrides (warn + default on
  out-of-range); restore target<trigger hysteresis clamp lost when
  chunk 12 removed resolveCompactionConfig.
- B-F3: wrap sub-agent models with overflow-recovery middleware (built
  always; recovery is the P4 net even under the kill switch), guarded
  on a model instance so a string id degrades instead of dropping the
  sub-agent.
- B-F7: report same-basis tokensBefore/After in compaction.fired.
- B-F8/A4: forceCompactChat appends the trace via atomic jsonb
  concatenation instead of overwriting the whole column.
- T10 deviation: version-mismatch skip no longer clears dirty (a
  concurrent invalidate also bumps version but leaves dirty on purpose).

Typecheck clean; compaction + token-estimate suites 92 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Map-reduce summarizer no longer re-overflows: summarizePrefix checks the
folded size (prior summary + framing) and recurses the reduce step; chunks
split on message boundaries via packSegments instead of arbitrary char slices.

Force-compact confirm now gates on significance (ADR-0012 §Force-compact):
/compact returns tokensBefore/messagesDropped/keepRecentMessages and the
context-window endpoint exposes keepRecentMessages so the client confirms only
when the drop is significant, else runs immediately.

Token estimator counts errorText on output-error UI parts, restoring the
UI/Model adapter count equality (§One estimator). Sub-agent recovery/Tier 2
subtract per-sub-agent overhead. Extracted resolveCompactionConfig so the
runtime and the context-window route share one config source.

Also: drop fabricated Qwen3-72B registry key, rename ADR 0009→0012 (resolves
collision with the invitation ADR-0009), refresh stale plan/0009 references,
ring amber via CSS var with hex fallback, add cross-tenant 404 submit test,
remove shipped internal plan doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MesoX MesoX changed the title feat(backend): chat context compaction (ADR-0012) feat(backend): chat context compaction Jun 15, 2026
CI `pnpm lint` failed on the compaction branch under type-aware rules:
- require-await: de-async test mocks + two prod helpers (loadBuiltinRegistry,
  the default loadRegistry slot) that never await; return Promise.resolve/reject
  to preserve behavior.
- no-unnecessary-type-assertion: drop redundant `as` casts (autofix).
- no-unsafe-* / no-explicit-any: type the captured mock-call args in
  chat-execution/sub-agent tests instead of leaking `any`.
- no-useless-assignment: drop the dead `agent` local in the force-compact path.

No behavior change. Lint, typecheck, and 203 affected tests all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant