feat(budget): per-session token cap with soft warn + hard block

## Problem

Sessions have no token cap. A pathological session (long-running CC instance, runaway tool loop, agent stuck in a 50-turn correction cycle) can rack up arbitrary spend on agentmemory's background compress / summarize / consolidate calls. The existing `AGENTMEMORY_LLM_TIMEOUT_MS` caps only per-call duration, so there is no safety net at the session level.

Cost-aware model selection (#613) covers per-token cost. This issue covers total cost per session.

## Proposed shape

Per-session running budget with hard cap + soft warning threshold.

iii composition:
- New KV scope `mem:session-budget` keyed by `sessionId`: `{ tokenCap, tokensUsed, costEstimate, warnEmittedAt?, exhaustedAt? }`
- iii function `mem::session::budget::init({ sessionId, tokenCap? })` writes initial state on session-start
- iii function `mem::session::budget::record({ sessionId, inputTokens, outputTokens, model })` increments after each LLM call
- Provider wrapper (in `ResilientProvider`) reads the active sessionId via AsyncLocalStorage, increments the budget after each call, and blocks further calls once the cap is exceeded
- Cron trigger reaps budgets for sessions where `endedAt + retentionDays` has passed
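
A minimal sketch of the state shape and the two functions listed above, using an in-memory `Map` as a stand-in for the `mem:session-budget` KV scope. The helper names (`initBudget`, `recordUsage`, `isExhausted`), the `DEFAULT_TOKEN_CAP` constant, and the `>=` exhaustion check are assumptions for illustration. The real functions would go through iii state and its atomic `update` op rather than mutate a local object.

```typescript
// Shape of one entry in the `mem:session-budget` scope, as listed above.
interface SessionBudget {
  tokenCap: number;
  tokensUsed: number;
  costEstimate: number; // left at 0 in this sketch
  warnEmittedAt?: string;
  exhaustedAt?: string;
}

interface UsageRecord {
  sessionId: string;
  inputTokens: number;
  outputTokens: number;
  model: string;
}

// Hypothetical stand-in for the KV scope; the real store is iii state.
const budgets = new Map<string, SessionBudget>();

const DEFAULT_TOKEN_CAP = 100_000; // assumed to mirror the proposed default

// Sketch of mem::session::budget::init: write initial state on session start.
function initBudget(sessionId: string, tokenCap: number = DEFAULT_TOKEN_CAP): SessionBudget {
  const state: SessionBudget = { tokenCap, tokensUsed: 0, costEstimate: 0 };
  budgets.set(sessionId, state);
  return state;
}

// Sketch of mem::session::budget::record: add the tokens from one LLM call.
function recordUsage({ sessionId, inputTokens, outputTokens }: UsageRecord): SessionBudget {
  const state = budgets.get(sessionId) ?? initBudget(sessionId);
  state.tokensUsed += inputTokens + outputTokens;
  // Assumption: reaching the cap counts as exhausted; stamp it only once.
  if (state.tokensUsed >= state.tokenCap && state.exhaustedAt === undefined) {
    state.exhaustedAt = new Date().toISOString();
  }
  return state;
}

// The provider wrapper would consult this before making another LLM call.
function isExhausted(sessionId: string): boolean {
  return budgets.get(sessionId)?.exhaustedAt !== undefined;
}
```

With a cap of 100, two calls of 50 tokens each leave the session exhausted, and the wrapper would skip the third.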

## sessionId resolution via AsyncLocalStorage

The provider doesn't know the sessionId directly today. A new `sessionContext` AsyncLocalStorage scopes every iii function call: `mem::observe`, `mem::compress`, `mem::summarize`, and `mem::consolidate-pipeline` enter the ALS scope with their sessionId at the top. The provider wrapper reads from ALS and falls back to an "unknown" sessionId for system-triggered calls (cron sweeps).
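
The ALS threading could look roughly like this. `AsyncLocalStorage` is the real Node.js API from `node:async_hooks`; the helpers `withSession` and `currentSessionId`, the `SessionContext` shape, and the `compressHandler` example are hypothetical names for illustration.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface SessionContext {
  sessionId: string;
}

// One shared store; `sessionContext` is the name used in the proposal.
const sessionContext = new AsyncLocalStorage<SessionContext>();

const UNKNOWN_SESSION_ID = "unknown"; // fallback for system-triggered calls

// Hypothetical helper: run an iii function body inside the ALS scope.
function withSession<T>(sessionId: string, fn: () => T): T {
  return sessionContext.run({ sessionId }, fn);
}

// What the provider wrapper would call to find out whose budget to charge.
function currentSessionId(): string {
  return sessionContext.getStore()?.sessionId ?? UNKNOWN_SESSION_ID;
}

// Example: a handler enters the scope at the top; work after an await
// inside the scope still sees the same sessionId.
async function compressHandler(sessionId: string): Promise<string> {
  return withSession(sessionId, async () => {
    await Promise.resolve(); // simulate async work before the provider call
    return currentSessionId();
  });
}
```

Outside any `withSession` scope, `currentSessionId()` returns the fallback, which matches the cron-sweep case.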

## Defaults

- `tokenCap` default: 100k tokens per session. Configurable globally via `AGENTMEMORY_SESSION_TOKEN_CAP`. Per-session override via `mem::session::start` payload.
- Soft warning at 80% — emits `event::mem::budget::soft-warned` for downstream subscribers (viewer alert).
- Hard cap blocks further LLM calls — emits `event::mem::budget::exhausted`. Subsequent compress/summarize calls return synthetic-only output (no LLM).
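
A sketch of how the defaults above might be resolved and how threshold crossings map to events. The precedence order (per-session override, then env var, then default) and the function names `resolveTokenCap` and `eventsForIncrement` are assumptions; only the env var name, the 80% threshold, and the two event names come from this proposal.

```typescript
const DEFAULT_TOKEN_CAP = 100_000;
const SOFT_WARN_RATIO = 0.8;

type BudgetEvent =
  | "event::mem::budget::soft-warned"
  | "event::mem::budget::exhausted";

// Assumed precedence: per-session override, then env var, then default.
function resolveTokenCap(
  perSessionCap: number | undefined,
  env: Record<string, string | undefined>,
): number {
  if (perSessionCap !== undefined && perSessionCap > 0) return perSessionCap;
  const fromEnv = Number(env["AGENTMEMORY_SESSION_TOKEN_CAP"]);
  if (Number.isFinite(fromEnv) && fromEnv > 0) return fromEnv;
  return DEFAULT_TOKEN_CAP;
}

// Events to emit when usage moves from `before` to `after` tokens.
// Each event fires only on the increment that crosses its threshold,
// so subscribers see at most one warn and one exhausted per session.
function eventsForIncrement(before: number, after: number, tokenCap: number): BudgetEvent[] {
  const events: BudgetEvent[] = [];
  const warnAt = tokenCap * SOFT_WARN_RATIO;
  if (before < warnAt && after >= warnAt) events.push("event::mem::budget::soft-warned");
  if (before < tokenCap && after >= tokenCap) events.push("event::mem::budget::exhausted");
  return events;
}
```

A single large call can cross both thresholds at once, in which case both events are returned in order.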

## Edge cases

- **Concurrent calls for same session** — atomic increment via iii state `update` op. Already supported.
- **Failed calls** — increment with `0` input/output tokens in finally block. Don't double-count partial calls.
- **Counter never incremented** — provider returns synthetically without LLM call → no increment. Correct behavior.
- **Per-model cost normalization** — record raw tokens, normalize to USD at display time using configurable rate table (defaults from `cost-aware model selection` table in README).
- **Forked session inherits used count?** — fresh per fork. Each fork gets its own budget.
- **Budget across server restart** — KV-persisted, recovers on next iii-state boot.
- **System-triggered calls (cron-fired consolidation)** — no active session. Tracked under a global system-budget sentinel scope with separate cap.
- **Budget exhausted mid-summarize** — abort BEFORE next chunk. Current in-flight call completes. Save partial state with `truncated: true` flag on the session summary.
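
The failed-call and mid-summarize cases above can be sketched together. This is a simplified, synchronous illustration: `summarizeWithBudget`, the `Budget` and `SummaryResult` shapes, and the `SummarizeChunk` callback are hypothetical, and a real provider call would be async and charged through the atomic `update` op.

```typescript
interface SummaryResult {
  parts: string[];
  truncated: boolean;
}

interface Budget {
  tokenCap: number;
  tokensUsed: number;
}

// Hypothetical per-chunk LLM call: returns summary text and tokens consumed.
type SummarizeChunk = (chunk: string) => { text: string; tokens: number };

function summarizeWithBudget(
  chunks: string[],
  budget: Budget,
  summarizeChunk: SummarizeChunk,
): SummaryResult {
  const parts: string[] = [];
  for (const chunk of chunks) {
    // Check BEFORE starting the next chunk; a call already made always completes.
    if (budget.tokensUsed >= budget.tokenCap) {
      return { parts, truncated: true };
    }
    let tokens = 0;
    try {
      const result = summarizeChunk(chunk);
      tokens = result.tokens;
      parts.push(result.text);
    } finally {
      // A failed call reaches this point with tokens = 0, so nothing is counted for it.
      budget.tokensUsed += tokens;
    }
  }
  return { parts, truncated: false };
}
```

With a cap of 100 and chunks costing 60 tokens each, the second call is allowed to finish (usage reaches 120) and the third is skipped, yielding two parts and `truncated: true`.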

## Acceptance

- [ ] New KV scope + 2 functions + 1 cron trigger
- [ ] AsyncLocalStorage threads sessionId through iii function calls
- [ ] `ResilientProvider` increments budget post-call (finally block)
- [ ] Hard-cap blocks future LLM calls + emits exhausted event
- [ ] Soft-warn at 80% + emits warn event
- [ ] `agentmemory status` shows `sessions: N active, M near-cap, K exhausted`
- [ ] OTEL metric `agentmemory.session.tokens_used` histogram
- [ ] Tests: cap enforcement, soft-warn threshold, fork-fresh-budget, concurrent increment, system-sentinel scope
- [ ] `AGENTMEMORY_SESSION_TOKEN_CAP=N` global override + per-session override on start

## Why it matters

Cost safety net. Pathological sessions can't burn down a user's monthly LLM budget. Soft warning gives early visibility. Audit log + OTEL metric make it queryable.

