feat(backend): chat context compaction by MesoX · Pull Request #261 · willdady/platypus

MesoX · 2026-06-15T12:19:24Z

Chat context compaction: keep chats alive past the model's context window

Design doc: docs/adr/0012-context-compaction.md (ADR-0012)

Problem

Chats hard-fail the moment their message history exceeds a model's context window. The AI SDK (ai@6) reports real token usage after each call, but it exposes no context-window metadata on the model interface and no pre-call tokenizer. Providers also disagree on whether the window is discoverable at all: Google, OpenRouter, and vLLM expose it via their APIs, while OpenAI, Anthropic, and Bedrock do not. On top of that, the existing error handling only covered auth, rate-limit, and 5xx failures, so a context-overflow rejection killed the turn outright.

What this PR does

This change adds a two-tier, view-not-delete compaction model. It is fed by a single token estimator, all of its durable state is mutated through a single versioned compare-and-swap (CAS) writer, an always-on recovery net catches the overflow errors that the proactive path misses, and a deterministic context-editing pass prunes stale, bulky tool results without ever calling a model. Because top-level chats and sub-agents both run through the shared agent-runner/ToolLoopAgent, one implementation covers both.

The full rationale — including every alternative we rejected and why — lives in ADR-0012. The key decisions, summarized from it, are below.

Load-bearing principles

View, not delete. The watermark and summary change what is sent to the model, never what is stored. Raw messages persist in the database untouched, so compaction is non-destructive in the data sense: a user can still read the full history, and a future "expand summary" UI comes for free. This reduces the "irreversible data loss" objection to a matter of UX courtesy rather than correctness. A summarized message is never hard-deleted.
One estimator. Token counting lives in exactly one function operating over one neutral structure (CountUnit[]). Both Tier 1 (UIMessages) and Tier 2 (ModelMessages) normalize into it, and both count only model-bound parts (text, tool-call, tool-result, file, image). Divergence between the two tiers is therefore impossible by construction, because a tier cannot fire on a number the other tier never sees.
One durable writer. Every mutation of compaction state (summaryWatermark, contextSummary, compactionDirty) goes through a single compare-and-swap keyed on a version column. Concurrency — for example a trigger run racing a user run — and the interaction between compaction and history-edit invalidation are both resolved by version rather than by comparing watermark values. That matters because a watermark can move backward on a history edit, and a value-based comparison would misread that as "not yet advanced" and write a stale summary over mutated history. On a CAS conflict the loser re-reads the row: if the winner already covered its prefix it skips, otherwise it retries once and then skips with a contended warning. There is no recompute loop and no livelock.
Recovery is the net. When a 400/413 overflow error is caught, the messages are aggressively trimmed in memory (through the same Tier 2 adapter, so there is no bespoke trimmer) and the call is retried exactly once. Recovery flags compactionDirty and leaves the durable compaction to the next turn. Crucially, recovery stays on even when proactive compaction is globally disabled — it is the last line of defense, not a risk surface.

Mechanisms

Window resolution. resolveContextWindow(provider, modelId) resolves a window per model, in order: a manual override (provider.modelMeta), then provider API auto-detection (Google / OpenRouter / vLLM), then the community-maintained litellm registry JSON (which covers OpenAI / Anthropic / Bedrock), and finally a conservative 8192 default. We deliberately do not maintain our own context-window table. Key normalization is boundary-safe, results are cached per provider+model with a TTL, the cache is evicted immediately on a modelMeta edit, default results use a short TTL, and API fetches use a 5 s timeout behind a single-flight guard.
Token estimation. The estimator uses char/4 over text parts only (it never runs char/4 over a base64 image) and a modality table to size non-text parts. It is used only on the first turn, before any provider usage exists; every later turn uses the real, provider-reported usage.inputTokens. We accept the first-turn imprecision — guarded by a 1.15 cold-start margin and the recovery net — rather than ship a per-provider tokenizer.
Tier 1 — cross-turn (durable). This runs in prepareChatTurn over the durable history. Its budget math works off the input budget (window − maxOutputReserve − safetyReserve), with a trigger ratio of 0.8 and a target of 0.5. The two ratios are deliberately distinct to give hysteresis, so compaction does not re-fire on the very next turn. The work is staged cheap-first: Stage 0 context-edits (pruning bulky old tool results with no model call), Stage 1 prunes the older prefix, and Stage 2 summarizes the prefix into a single synthetic message only if the conversation is still over target. Tool-call/result pairs stay atomic across the keep boundary, summarization is incremental (only the messages after the watermark), and a visible context-compacted event keeps the behavior fail-loud.
Tier 2 — intra-turn (in-memory). This handles a single heavy response whose own tool loop bloats the window mid-loop. It runs in the SDK prepareStep hook on both streamText and generateText, summarizing old completed tool results while keeping recent steps verbatim and preserving call/result pairing. It fires only when genuinely near the limit, and it is not persisted — the next turn's Tier 1 folds the result into the durable summary.
Sub-agents. Sub-agents start fresh on each invocation and carry no cross-turn history, so they use Tier 2 only. Each one resolves its own model's window and output limits, and recovery covers them as well through the shared runner.
Config and kill switch. Compaction behavior is global (DEFAULT_COMPACTION_CONFIG); only the window/output size is per model. Setting COMPACTION_ENABLED=false disables all proactive compaction in production without a deploy, and recovery ignores that flag. Per-agent tuning was shipped and then removed, because no surveyed tool exposes it and the ratios already self-normalize to the model window.

Frontend

Context-usage ring. A small ring next to the model selector fills as usedTokens / contextWindow, ramping from green to amber (≥ 0.7) to red (≥ 0.9), and rendering a neutral grey when the window is unknown or defaulted rather than guessing a ramp. The window comes from the currently selected model, and the numerator is the last response's peak contextTokens.
Per-message stats. An (i) popover under each assistant response shows input/output tokens, time-to-first-token, and total generation time, reusing the existing tool-call timing mechanism.
Force-compact on demand. The ring is clickable: POST /chats/:id/compact runs Tier 1 once regardless of the threshold and returns the post-compact usage so the ring refreshes immediately. A click is deferred while a response is streaming, and a confirmation dialog appears only when the drop would be significant.
Compaction trace. Compaction is surfaced as a synthetic compact_context tool-call/result pair that reuses the tool-call UI. The trace part is stripped before convertToModelMessages, so it never replays to the provider as a phantom tool call.

Notable rejected alternatives (full list in ADR §Considered Options)

Single-tier compaction (cross-turn only). Rejected because it cannot rescue a single response whose own tool loop overflows the window.
Hard-deleting or truncating old messages. Rejected because it is irreversible and reproduces the "silently drops the middle" failure mode seen in gateway truncation. View-not-delete keeps the data and makes the action auditable.
A homegrown context-window lookup table. Rejected because it is unmaintainable across providers; the litellm registry is the industry's "don't maintain your own table" answer.
A real pre-call tokenizer. Rejected for v1 because it is a heavy per-provider dependency for a number the provider returns accurately after the very first call.
Optimistic concurrency by comparing watermark values. Rejected because it breaks when invalidation moves the watermark backward; versioned CAS removes that monotonicity assumption.
Compacting only to the trigger threshold. Rejected because it re-fires every turn (the thrash failure mode); distinct trigger and target ratios are what provide hysteresis.
A token floor on the trigger (max(window × pct, 64000)). Rejected because it overflows sub-64k models, where the trigger would never fire. A floor belongs only on the window fallback, never on the trigger.

Schema changes (additive, nullable/defaulted)

provider.modelMeta (JSONB) — per-model window/output overrides.
chat/run gain contextSummary, summaryWatermark, compactionDirty, and version.
Migration: apps/backend/drizzle/0046_context_compaction.sql.

Rollout is lazy and there is no backfill: existing chats compact only on their next turn. An eager backfill would create a thundering herd of summarize calls.

Cross-tenant safety

The submit route verifies that the body id belongs to the caller's workspace before a run starts. The compaction store is keyed by chat id alone, so without that check one workspace could mutate another's summary and watermark.

Observability

The feature emits structured metric:-tagged log lines: compaction.fired, summarize.latency_ms, recovery.*, context_window.fell_to_default, litellm.key_miss, cas.conflict, and context_edited.

Config

New apps/backend/.env.example keys, all optional with built-in defaults: COMPACTION_ENABLED, COMPACTION_TRIGGER_RATIO, COMPACTION_TARGET_RATIO, COMPACTION_RESERVE_RATIO, COMPACTION_KEEP_RECENT, COMPACTION_MIN_PRUNABLE_CHARS, and COMPACTION_MIN_RECENT_PRUNABLE_CHARS.

Tests

New suites cover compaction.test.ts, context-window.test.ts, token-estimate.test.ts, and recovery.test.ts; existing suites for agent-runner.test.ts, chat.test.ts, chat-execution.test.ts, sub-agent.test.ts, and packages/schemas/index.test.ts were extended. Recovery's per-provider overflow regex is fixture-tested across OpenAI/vLLM, Anthropic, Google, and Bedrock.

Known limitations (ADR §Open/deferred)

A single oversized newest result. A tool result too large to fit even as the newest message still hard-errors; the fix is an ingestion cap at storage time, which is out of scope here.
A few items are consciously deferred and are not needed for the "no more hard fails" goal: a live Pending → Running compaction trace (it currently renders post-hoc only), the estimate-vs-real divergence metric (log-only for now), and content-type tool-result media accounting.

🤖 Generated with Claude Code

…paction Adds proactive + reactive context-window management so chats no longer hard-fail when history exceeds a model's window (docs/adr/0009). - Context-window resolution (context-window.ts + litellm-registry.ts): manual override → provider API → vendored litellm registry → conservative default, cached with TTL and evicted on modelMeta change. - Single token estimator over a neutral CountUnit structure (token-estimate.ts), shared by Tier 1 (UIMessages) and Tier 2 (ModelMessages). - Tier 1 cross-turn compaction (compaction.ts): prune-then-summarize behind a single versioned CAS watermark writer; hysteresis trigger/target ratios. - Tier 2 in-turn compaction via prepareStep; sub-agent wiring threads the window. - Recovery middleware (recovery.ts): detect provider context-overflow, trim and retry once; always on, independent of the proactive kill switch. - agent-runner wraps the model with recovery, threads RV1 prior-messages for C4 edit-detection, and stamps §H/§I run stats onto the assistant message. - Schema: additive nullable provider.modelMeta + chat compaction state + per-agent compaction config; migration 0046. RV2 workspace-scope check on chat submit; POST /:chatId/compact for on-demand compaction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ations Surfaces the compaction work in the chat UI. - Context-usage ring beside the model selector (§H): fill = last input tokens / resolved window for the selected model; neutral grey when the window is unknown/default. Clickable to compact on demand (§J). - Per-message stats popover next to Regenerate (§I): input/output tokens, TTFT and total generation time from the server-stamped metadata.stats. - Tool-call run durations in the tool header, reusing server start/complete timestamps with a client-observed fallback (useToolDuration). The hook is written to satisfy the react-hooks purity / set-state-in-effect / refs rules: render reads only state, and all writes are deferred into timer callbacks. - Per-agent compaction config fields in the agent form. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts: # apps/backend/src/services/chat-execution.ts

…nds in /v1 vLLM and most OpenAI-compatible providers set baseUrl to "{root}/v1" (the OpenAI SDK needs it that way for chat calls). detectOpenAiCompatible appended "/v1/models" to that, producing "{root}/v1/v1/models" → 404 → the window silently fell to the 8192 default and the §H usage ring rendered "unknown". Strip a trailing "/v1" before building the models URL so the probe hits "{root}/v1/models" and reads max_model_len. Added a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The §I stats popover (i) icon was gated on metadata.stats, which was only stamped post-stream and persisted to the DB — so it appeared a refetch round-trip after the answer finished, lagging the copy/delete/regenerate icons. Emit the stats via messageMetadata on the `finish` event so they ride the final stream chunk and the (i) appears the instant the answer completes. firstTokenAt is captured in streamText.onChunk (fires before finish) instead of the async snapshot drain. A single buildMessageStats() helper feeds both the streamed copy and the post-stream persist stamp (sharing one finishedAt) so live and reloaded stats match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… trace 11a (U5): ring no longer jumps back to pre-compaction value on user Send. Expiry now tracks `assistantMessageCount` instead of `messages.length` so optimistic user-message pushes don't trigger early fallback. 11b: wrap the agent-info (i) button in a Tooltip so hover reveals its purpose without opening the dialog. Shows agent description or "Agent info". 11c (§K): Tier 1 compaction is now visible in the chat timeline. - Backend emits synthetic `compact_context` tool-call + tool-result chunks into the UIMessage stream (via `prependCompactionChunks`) immediately after the `start` event; the existing tool-call expander renders them for free. - `CompactionTrace` threads through Tier1Output → ChatTurn.compactionTrace → agent-runner.stream() — only when a model summary was produced (prune-only turns produce no trace to avoid empty/confusing entries). - `COMPACT_CONTEXT_TOOL_NAME` constant shared across stream producer, strip filter, §J message builder, and frontend display-name mapping. - `stripCompactionTraceParts` removes trace parts before ModelMessage conversion so the provider never sees the phantom tool call on replay. - `buildCompactionTraceMessage` builds a standalone assistant message for §J (forced compact endpoint — no live stream to inject into). - `title: "Context compaction"` on the chunk + tool.tsx display-name mapping shows a friendly label instead of the raw underscore name. - `context-usage-ring.tsx`: improve neutral-state tooltip copy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wraps the existing Popover trigger in a Tooltip using the TooltipTrigger asChild > PopoverTrigger asChild composition pattern. Hover shows compact stats (In/Out/TTFT/Total); click still opens the full popover. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Chunk 12: drop the per-agent context-compaction override surface. Compaction now runs from DEFAULT_COMPACTION_CONFIG (gated by the COMPACTION_ENABLED kill switch); per-model config lands separately. - schema.ts: drop 6 agent columns (compaction_enabled, trigger_ratio, target_ratio, reserve_ratio, keep_recent_messages, min_prunable_chars) - schemas: drop matching base fields + the targetRatio<triggerRatio hysteresis refine; slim agentCreate/agentUpdate picks - compaction.ts: delete CompactionConfigOverrides + resolveCompactionConfig - chat-execution.ts: build config from DEFAULT_COMPACTION_CONFIG - agent-form.tsx: remove the Context compaction form block + state - tests: drop the per-agent config cases Migration: rewrite 0046 to never add the agent columns (it has not reached main) + trim its snapshot, rather than stacking a drop migration. Already-migrated DBs (dev, test server) had the columns dropped manually. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per-agent compaction config was removed in 625ff96; the `agent` parameter threaded into buildCompactionRuntime is now unused. Remove it from the args type, the destructure, and all three call sites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

After Chunk 12 compaction config is global (DEFAULT_COMPACTION_CONFIG + COMPACTION_ENABLED kill switch). Add optional env overrides for the ceiling ratios so a test deployment can lower the trigger and exercise auto-compaction without filling a large context window. Unset/blank/invalid env values fall back to the built-in defaults, so production behavior is unchanged. Knobs: COMPACTION_TRIGGER_RATIO, COMPACTION_TARGET_RATIO, COMPACTION_RESERVE_RATIO, COMPACTION_KEEP_RECENT, COMPACTION_MIN_PRUNABLE_CHARS. Documented in .env.example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…to compaction plan Survey of Hermes Agent, Codex CLI, Claude Code, Cline confirms the shipped design already implements real-token feeding (C1), input-tokens-only window (F1), reserve-carve trigger (C3 = Codex 90% cap), and two-layer proactive+ recovery (P4). Documents the Hermes token-floor (#14690) as a do-not-copy anti-pattern and defers three optional adds (message-count valve, maxOutput reserve floor, model-aware aggressiveness). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture the 2026-06-12 live test-server findings: per-step-timeout bug that kills pre-stream compaction (150s summarize > 120s watchdog), the 8,631-token runaway, and the lost-turn root cause. Add the four fixes (heartbeat, maxOutputTokens cap + concise prompt, stream-open-before-compaction with live Running status, abortSignal), the prior-art summarization-prompt survey (Claude Code / Codex / OpenCode / Hermes), the proposed replacement prompt, and the decided-against provider-model-selector. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…pt overhaul Fix 1 (CRITICAL): thread `onActivity` into `buildCompactionRuntime` and tick it every 10 s via `setInterval` while `generateText` runs inside the `summarize` closure. The 120 s per-step stall watchdog now sees regular activity during a slow summarize call instead of silence, and no longer kills the run before the summary is committed. Fix 2: add `maxOutputTokens: 2000` ceiling to the summarize `generateText` call as a hard backstop against the runaway-expansion failure mode observed in the live test (8,631-token output for a 6,178-token input). Log `finishReason` and emit a `warn` if the ceiling is hit so the event is visible in prod logs. Replace the unstructured one-liner system prompt with a structured four-section handoff prompt (Intent / Decisions / Files / Next step) that survives repeated re-compactions and includes an explicit "aim under ~1500 tokens" length target. Fix 3 (stream-before-compaction) deferred — heartbeat addresses the immediate timeout bug; the live-indicator refactor is a larger architectural change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Address review findings on the chunk 13 summarize changes: - Reorder the handoff prompt sections most-critical-first (Intent → Current state & next step → Decisions → Files) so a truncation at the maxOutputTokens ceiling drops file/tool detail rather than the resume-critical intent and next step. Add a front-load instruction. - Thread the run's AbortSignal through prepareChatTurn and buildCompactionRuntime into the summarizer generateText call. The Fix 1 heartbeat suppresses the per-step stall watchdog during summarize, so without this a hung call would leak until the 10 min per-run timeout; now a cancel/timeout aborts it. - Drop the redundant onActivity wrapper at the call site (the optional event param already satisfies the () => void heartbeat signature). - Reconcile the conflicting token-count comments (~150-200 vs ~1500) around the output ceiling. - Add tests for the summarize closure: abort signal + ceiling + ordered prompt threading, the finishReason === length warn path, and the heartbeat tick/clear lifecycle. Export buildCompactionRuntime for test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Stage 2 compaction returned recent messages verbatim, so large tool results (e.g. MCP dumps) in the kept window dominated tokensAfter and prevented reaching targetTokens. Prune recent tool outputs with a higher threshold (minRecentPrunableChars, default minPrunableChars*5) and apply it across every over-target return path — including the empty-prefix / null-watermark bail where the whole history fits within keepRecentMessages. Warn when the post-compaction estimate still exceeds 2x target. Raise summarizer output ceiling 2000 -> 4000 to catch models that ignore the 1500-token prompt limit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Three tasks: (1) Tier 1 recent-trim safety (64232d8 shipped; option D overflow-gate + exempt-newest remaining), (2) context-editing-style prune of kept tool results (needs review + bigger plan; session iteration captured), (3) ingestion cap for oversized MCP/sub-agent results (upstream-issue candidate, marked not filed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…newest Tier 1 recent (kept) tool results are now trimmed only when the kept view would breach the hard window wall (inputBudget), not on a soft targetTokens (hysteresis) miss. A soft miss is left at full fidelity and re-compacts next turn. The single newest message is always exempt. - UICompactOptions.inputBudget threaded from applyTier1Compaction as max(0, budget.inputBudget - overheadTokens) (mirrors effectiveTarget). - keepRecentWithinWall applied on both over-target paths (Stage 2 + empty prefix); omitted budget falls back to always-trim (pre-option-D guard). Review fixes: - Re-gate the over-target warning to afterEstimate > inputBudget; the old target*2 heuristic fired on every healthy compaction under a low target ratio. Falls back to target*2 when no wall is supplied. - keepRecentWithinWall returns the recent estimate to avoid a double pass. Tests: 57 pass (option-D verbatim/trim/empty-prefix/no-warn cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… elision) editToolResults + elidedToolPlaceholder: pure view transform eliding OLD bulky tool-result bodies to a self-describing placeholder before the Tier 1 trigger, so a leaned view can skip summarization. Recency by tool-result count, newest message exempt, size-gated; no durable state (P1). Post-review hardening: grow-guard (never elide when placeholder >= output; no prompt inflation / negative reclaim) + idempotency guard, both decided up front into a Map so the rewrite runs only on real work and never copies on a no-op. 3 config fields + 3 env overrides. +10 tests; runs/ suite 193 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ction-clean # Conflicts: # apps/backend/src/runs/agent-runner.test.ts # apps/backend/src/runs/agent-runner.ts # apps/backend/src/services/chat-execution.ts # apps/backend/src/tools/sub-agent.ts

Defensive fixes to context compaction surfaced by chunk 14 review: - A1: projectTier1Tokens treats lastInputTokens<=0 as no-baseline (usage-less gateways persist contextTokens=0); findLastInputTokens skips trace messages and zero-count turns symmetrically. - A2: no-dimension image token fallback uses pessimistic per-provider ceilings (Anthropic 1600 / OpenAI-high 2000) instead of flat 1200. - A6: cap output reserve at half the window so an input-scoped contextWindow minus a large max_output_tokens cannot collapse inputBudget and thrash. - B-F1/C2: range-validate compaction env overrides (warn + default on out-of-range); restore target<trigger hysteresis clamp lost when chunk 12 removed resolveCompactionConfig. - B-F3: wrap sub-agent models with overflow-recovery middleware (built always; recovery is the P4 net even under the kill switch), guarded on a model instance so a string id degrades instead of dropping the sub-agent. - B-F7: report same-basis tokensBefore/After in compaction.fired. - B-F8/A4: forceCompactChat appends the trace via atomic jsonb concatenation instead of overwriting the whole column. - T10 deviation: version-mismatch skip no longer clears dirty (a concurrent invalidate also bumps version but leaves dirty on purpose). Typecheck clean; compaction + token-estimate suites 92 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Map-reduce summarizer no longer re-overflows: summarizePrefix checks the folded size (prior summary + framing) and recurses the reduce step; chunks split on message boundaries via packSegments instead of arbitrary char slices. Force-compact confirm now gates on significance (ADR-0012 §Force-compact): /compact returns tokensBefore/messagesDropped/keepRecentMessages and the context-window endpoint exposes keepRecentMessages so the client confirms only when the drop is significant, else runs immediately. Token estimator counts errorText on output-error UI parts, restoring the UI/Model adapter count equality (§One estimator). Sub-agent recovery/Tier 2 subtract per-sub-agent overhead. Extracted resolveCompactionConfig so the runtime and the context-window route share one config source. Also: drop fabricated Qwen3-72B registry key, rename ADR 0009→0012 (resolves collision with the invitation ADR-0009), refresh stale plan/0009 references, ring amber via CSS var with hex fallback, add cross-tenant 404 submit test, remove shipped internal plan doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI `pnpm lint` failed on the compaction branch under type-aware rules: - require-await: de-async test mocks + two prod helpers (loadBuiltinRegistry, the default loadRegistry slot) that never await; return Promise.resolve/reject to preserve behavior. - no-unnecessary-type-assertion: drop redundant `as` casts (autofix). - no-unsafe-* / no-explicit-any: type the captured mock-call args in chat-execution/sub-agent tests instead of leaking `any`. - no-useless-assignment: drop the dead `agent` local in the force-compact path. No behavior change. Lint, typecheck, and 203 affected tests all pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

frantisek.spacek@morosystems.cz and others added 22 commits June 11, 2026 20:26

Merge branch 'main' into feature/context-compaction-clean

fa78e7a

# Conflicts: # apps/backend/src/services/chat-execution.ts

chore(backend): add compaction.check debug log at trigger decision point

55fc2f0

Merge remote-tracking branch 'origin/main' into feature/context-compa…

df533db

…ction-clean # Conflicts: # apps/backend/src/runs/agent-runner.test.ts # apps/backend/src/runs/agent-runner.ts # apps/backend/src/services/chat-execution.ts # apps/backend/src/tools/sub-agent.ts

MesoX changed the title ~~feat(backend): chat context compaction (ADR-0012)~~ feat(backend): chat context compaction Jun 15, 2026

This was referenced Jun 15, 2026

MCP tool-set bloat inflates agent context on every step — introduce optional MCP Profiles (tool subsets) #179

Open

feat(org): configurable agent-run timeouts with env-clamped ceiling #168

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(backend): chat context compaction#261

feat(backend): chat context compaction#261
MesoX wants to merge 23 commits into
willdady:mainfrom
MesoX:feature/context-compaction-clean

MesoX commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MesoX commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Chat context compaction: keep chats alive past the model's context window

Problem

What this PR does

Load-bearing principles

Mechanisms

Frontend

Notable rejected alternatives (full list in ADR §Considered Options)

Schema changes (additive, nullable/defaulted)

Cross-tenant safety

Observability

Config

Tests

Known limitations (ADR §Open/deferred)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MesoX commented Jun 15, 2026 •

edited

Loading