Problem
Sessions have no per-session token cap. A pathological session (long-running CC instance, runaway tool loop, agent stuck in a 50-turn correction cycle) can rack up arbitrary spend on agentmemory's background compress / summarize / consolidate calls. The existing AGENTMEMORY_LLM_TIMEOUT_MS only caps per-call duration. No safety net at the session level.
Cost-aware model selection (#613) covers per-token cost. This issue covers per-session-total cost.
Proposed shape
Per-session running budget with hard cap + soft warning threshold.
iii composition:
- New KV scope mem:session-budget keyed by sessionId: { tokenCap, tokensUsed, costEstimate, warnEmittedAt?, exhaustedAt? }
- iii function mem::session::budget::init({ sessionId, tokenCap? }) writes initial state on session-start
- iii function mem::session::budget::record({ sessionId, inputTokens, outputTokens, model }) increments after each LLM call
- Provider wrapper (in ResilientProvider) reads active sessionId via AsyncLocalStorage, increments budget after each call, blocks future calls if cap exceeded
- Cron trigger reaps budgets for sessions where endedAt + retentionDays passed
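A minimal sketch of the state shape and the init/record pair described above. It assumes an in-memory Map as a stand-in for the mem:session-budget KV scope; the names initBudget, recordUsage and isExhausted are hypothetical, and treating "used >= cap" as exhausted is an assumption, not something the proposal pins down.

```typescript
// Shape of one mem:session-budget entry, as listed in the proposal.
interface SessionBudget {
  tokenCap: number;
  tokensUsed: number;
  costEstimate: number; // left at 0 here; USD normalization happens at display time
  warnEmittedAt?: number;
  exhaustedAt?: number;
}

// Payload of mem::session::budget::record.
interface UsageRecord {
  sessionId: string;
  inputTokens: number;
  outputTokens: number;
  model: string;
}

const DEFAULT_TOKEN_CAP = 100_000;

// Hypothetical stand-in for the KV scope; the real store would be iii state.
const budgets = new Map<string, SessionBudget>();

// Writes the initial state on session start.
function initBudget(sessionId: string, tokenCap: number = DEFAULT_TOKEN_CAP): SessionBudget {
  const state: SessionBudget = { tokenCap, tokensUsed: 0, costEstimate: 0 };
  budgets.set(sessionId, state);
  return state;
}

// Adds the tokens of one LLM call and marks the budget exhausted once the cap is reached.
function recordUsage(usage: UsageRecord): SessionBudget {
  const state = budgets.get(usage.sessionId) ?? initBudget(usage.sessionId);
  state.tokensUsed += usage.inputTokens + usage.outputTokens;
  if (state.tokensUsed >= state.tokenCap && state.exhaustedAt === undefined) {
    state.exhaustedAt = Date.now();
  }
  return state;
}

// The provider wrapper would consult this before issuing another LLM call.
function isExhausted(sessionId: string): boolean {
  return budgets.get(sessionId)?.exhaustedAt !== undefined;
}
```

The Map mutation is only safe in a single process; the real implementation would rely on the atomic iii state update op instead.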
sessionId resolution via AsyncLocalStorage
Provider doesn't directly know sessionId today. New sessionContext AsyncLocalStorage scopes every iii function call. mem::observe, mem::compress, mem::summarize, mem::consolidate-pipeline enter the ALS scope with their sessionId at the top. Provider wrapper reads from ALS and falls back to an "unknown" sessionId for system-triggered calls (cron sweeps).
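The ALS scoping could look roughly like this. It uses Node's AsyncLocalStorage from node:async_hooks; withSession, currentSessionId and compressSketch are hypothetical helper names used for illustration only.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// One store per async call chain; holds the session the current work belongs to.
const sessionContext = new AsyncLocalStorage<{ sessionId: string }>();

// Hypothetical helper: an iii function would call this at the top of its handler.
function withSession<T>(sessionId: string, fn: () => T): T {
  return sessionContext.run({ sessionId }, fn);
}

// What the provider wrapper would call; "unknown" covers calls made outside any scope.
function currentSessionId(): string {
  return sessionContext.getStore()?.sessionId ?? "unknown";
}

// Stand-in for mem::compress: the sessionId survives the await without being passed down.
async function compressSketch(sessionId: string): Promise<string> {
  return withSession(sessionId, async () => {
    await Promise.resolve(); // simulate an async hop before the provider is reached
    return currentSessionId();
  });
}
```

The point of the design is that the provider signature stays unchanged: the sessionId travels with the async context rather than through every call site.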
Defaults
- tokenCap default: 100k tokens per session. Configurable globally via AGENTMEMORY_SESSION_TOKEN_CAP. Per-session override via mem::session::start payload.
- Soft warning at 80%: emits event::mem::budget::soft-warned for downstream subscribers (viewer alert).
- Hard cap blocks further LLM calls: emits event::mem::budget::exhausted. Subsequent compress/summarize calls return synthetic-only output (no LLM).
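As a sketch of how these defaults might resolve and when each event would fire: resolveTokenCap and eventsForUsage are hypothetical names, and both the precedence (per-session override beats the global env value) and the "fire once, on the call that crosses the threshold" rule are assumptions.

```typescript
type BudgetEvent = "event::mem::budget::soft-warned" | "event::mem::budget::exhausted";

const DEFAULT_CAP = 100_000;
const SOFT_WARN_RATIO = 0.8;

// Per-session override wins, then AGENTMEMORY_SESSION_TOKEN_CAP, then the 100k default.
function resolveTokenCap(envValue: string | undefined, perSessionOverride?: number): number {
  if (perSessionOverride !== undefined) return perSessionOverride;
  const parsed = Number(envValue);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_CAP;
}

// Returns the events to emit when usage moves from `before` to `after` tokens.
// Each event fires only on the call that crosses its threshold, so it is emitted once.
function eventsForUsage(before: number, after: number, tokenCap: number): BudgetEvent[] {
  const events: BudgetEvent[] = [];
  const warnAt = tokenCap * SOFT_WARN_RATIO;
  if (before < warnAt && after >= warnAt) events.push("event::mem::budget::soft-warned");
  if (before < tokenCap && after >= tokenCap) events.push("event::mem::budget::exhausted");
  return events;
}
```

A single large call can cross both thresholds at once, in which case both events are returned in order.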
Edge cases
- Concurrent calls for same session: atomic increment via iii state update op. Already supported.
- Failed calls: increment with 0 input/output tokens in finally block. Don't double-count partial calls.
- Counter never incremented: provider returns synthetically without LLM call → no increment. Correct behavior.
- Per-model cost normalization: record raw tokens, normalize to USD at display time using configurable rate table (defaults from cost-aware model selection table in README).
- Forked session inherits used count? No, fresh per fork. Each fork gets its own budget.
- Budget across server restart: KV-persisted, recovers on next iii-state boot.
- System-triggered calls (cron-fired consolidation): no active session. Tracked under a global system-budget sentinel scope with separate cap.
- Budget exhausted mid-summarize: abort BEFORE next chunk. Current in-flight call completes. Save partial state with truncated: true flag on the session summary.
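The mid-summarize case can be sketched as a chunk loop that checks the budget before each call. summarizeChunks, the callLlm callback and the SummaryResult shape are hypothetical; the only behaviour taken from the proposal is "abort before the next chunk, let the in-flight call finish, mark the result truncated".

```typescript
interface SummaryResult {
  parts: string[];
  truncated: boolean;
}

// Hypothetical chunked summarizer: `callLlm` stands in for the provider call and
// reports how many tokens it consumed; `budget` is the session's running state.
async function summarizeChunks(
  chunks: string[],
  callLlm: (chunk: string) => Promise<{ text: string; tokens: number }>,
  budget: { tokenCap: number; tokensUsed: number },
): Promise<SummaryResult> {
  const parts: string[] = [];
  for (const chunk of chunks) {
    // Check BEFORE starting the next chunk; a call already in flight is never cancelled.
    if (budget.tokensUsed >= budget.tokenCap) {
      return { parts, truncated: true };
    }
    const { text, tokens } = await callLlm(chunk);
    budget.tokensUsed += tokens;
    parts.push(text);
  }
  return { parts, truncated: false };
}
```

Because the check happens only between chunks, usage can overshoot the cap by at most one call's worth of tokens.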
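Acceptance
- ResilientProvider increments budget post-call (finally block)
- agentmemory status shows sessions: N active, M near-cap, K exhausted
- agentmemory.session.tokens_used histogram
- AGENTMEMORY_SESSION_TOKEN_CAP=N global override + per-session override on start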
Why it matters
Cost safety net. Pathological sessions can't burn down a user's monthly LLM budget. Soft warning gives early visibility. Audit log + OTEL metric make it queryable.