Persist per-call LLM token usage to DB (audit table for tokens/cost/TPM)

## Problem

We don't persist individual LLM call data. The metrics middleware (`pkg/agent/middleware/metrics/internal.go:12`) accumulates token counts in-memory keyed by story, and only the aggregated totals get written to `stories.tokens_used` / `stories.cost_usd` on story completion (`pkg/architect/request.go:645-670`).

Two gaps fall out of that:

1. **Retries destroy history.** A story that's retried overwrites its aggregate, so we can't see how much work a failed attempt cost or how often retries happen.
2. **No time series.** We can't compute TPM/cost-over-time, peak burst rates for rate-limit sizing, or per-agent / per-model cost breakdowns from the DB. Today the only way to get this is by grep'ing `logs/maestro.log` for `LLM call to model 'X': ... request tokens: N, response tokens: M ...` lines, which works for one session but isn't queryable, doesn't survive log rotation, and can't be joined to stories/sessions.

PM and architect calls also aren't always associated with a story (bootstrap, spec review, escalations), so per-story aggregation can't represent them at all.

## Proposal

Add an `llm_calls` audit table written from the metrics middleware on every call. One row per LLM invocation.

Suggested columns (everything in this list is already in-memory at log time per `pkg/agent/middleware/metrics/middleware.go:84-100`):

- `id` PK
- `session_id` (added by persistence layer, like other tables)
- `created_at`
- `model` (e.g. `claude-opus-4-5`, `claude-sonnet-4-5`, `gpt-5.2`)
- `agent_id`
- `story_id` (NULLABLE — PM/architect calls often have none)
- `state` (agent state at call time)
- `request_tokens`
- `response_tokens`
- `total_tokens`
- `cost_usd`
- `latency_ms`
- `success` (0/1)
- `temperature`

Indexes on `(session_id, created_at)`, `(model, created_at)`, `(agent_id)`, `(story_id)`.

The existing per-story aggregate columns (`stories.tokens_used`, `stories.cost_usd`) can stay as a denormalized rollup, or be derived from `llm_calls` — separate decision.

## Why now

Came up while estimating tokens-per-minute for rate-limiter sizing. Had to parse `maestro.log` because the DB couldn't answer the question. Numbers from one ~2.5h rc3 session (`maestro-work/rc3/.maestro/logs/maestro.log`, 1,532 LLM calls):

| Model | Calls | Total tokens | Active min | Avg TPM (active) | Peak TPM |
|---|---:|---:|---:|---:|---:|
| claude-opus-4-5 | 41 | 254K | 20 | 12,705 | 46,703 |
| claude-sonnet-4-5 | 593 | 3.8M | 44 | 86,622 | 251,048 |
| gpt-5.2 | 125 | 3.1M | 41 | 74,631 | 727,096 |

That gpt-5.2 727k-tokens-in-one-minute peak is exactly the kind of thing we'd want to drill into (single huge call? sustained burst?) and we can't, because the data isn't queryable.

## Out of scope

- Backfill from logs.
- Dashboard / WebUI changes (separate follow-up once the table exists).
- Changes to the cost calculation itself.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Persist per-call LLM token usage to DB (audit table for tokens/cost/TPM) #218

Problem

Proposal

Why now

Out of scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Model	Calls	Total tokens	Active min	Avg TPM (active)	Peak TPM
claude-opus-4-5	41	254K	20	12,705	46,703
claude-sonnet-4-5	593	3.8M	44	86,622	251,048
gpt-5.2	125	3.1M	41	74,631	727,096

Uh oh!

Persist per-call LLM token usage to DB (audit table for tokens/cost/TPM) #218

Description

Problem

Proposal

Why now

Out of scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions