feat(web): per-turn token breakdown with cost attribution (surpass ccusage)

## Summary

ccx web currently shows **session-level aggregate** token counts (Input / Output / Cache Read / Cache Write) in the info panel. That's not enough anymore. Users burning through Claude Code quota want to know *which turn* cost them, not "the whole conversation used 1.2M tokens."

Ship per-turn usage, sidechain-aware cache accounting, and cost attribution inside the web UI — and do it better than [ccusage](https://github.com/ryoppippi/ccusage) does from the CLI.

## Why now

- Quota is easier to burn in Claude Code than it used to be. Users are asking "where did my $40 go" and have no way to see it inside a conversation.
- ccx already parses the full tree per-message. The data is there — we just don't surface it.
- ccusage has proven the demand but is CLI-only, has no per-turn/per-tool granularity, and has accuracy bugs around sidechains and streaming partials ([#913](https://github.com/ryoppippi/ccusage/issues/913), [#938](https://github.com/ryoppippi/ccusage/issues/938)). We can leapfrog.

## Current state

- `internal/parser/types.go:63-78` — `SessionStats` holds only *aggregate* `InputTokens`, `OutputTokens`, `CacheReadTokens`, `CacheCreateTokens`. No per-message usage stored.
- `internal/parser/session.go:42-52` — full parser accumulates token stats but discards the per-message value after aggregation.
- `internal/parser/session.go:424-476` — quick parser does the same aggregate-only walk.
- `internal/web/templates.go:470-488` — info panel renders session-level totals only.
- `internal/web/templates.go:236-240` — session list shows totals per session.
- Cost calculation was explicitly removed in a prior pass (`CLAUDE.md` line 196: "We removed cost estimation and lines-changed tracking"). Time to put it back — better this time.

The raw `usage` field is in every assistant message: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`. We already unmarshal it. We just throw it away.

## What ccusage does (and where it falls short)

| Axis | ccusage |
|---|---|
| Daily / Weekly / Monthly | Yes |
| Model breakdown | Yes (`--breakdown`) |
| Session | Yes |
| Project | Yes (`--instances`) |
| 5-hour billing block | Yes |
| **Per user turn** | **No** — [#935](https://github.com/ryoppippi/ccusage/issues/935) open |
| **Per tool invocation** | **No** |
| Visual / web | **No** — tables only, [#933](https://github.com/ryoppippi/ccusage/issues/933) requests charts |

Cost model: `LiteLLMPricingFetcher` (runtime fetch or `--offline` snapshot). Cached reads billed at the discounted rate. Not hardcoded.

**Accuracy bugs ccx can avoid by construction:**
- [#913](https://github.com/ryoppippi/ccusage/issues/913) — sidechain `/btw` asides double-count `cache_read` because ccusage doesn't know what a sidechain is
- [#938](https://github.com/ryoppippi/ccusage/issues/938) — first-wins dedup keeps partial streaming `output_tokens`, undercounting ~2.7×
- [#934](https://github.com/ryoppippi/ccusage/issues/934) — `gpt-5.4-mini` mispriced as `gpt-5` via fuzzy match → 5× overcharge on Codex
- [#930](https://github.com/ryoppippi/ccusage/issues/930), [#916](https://github.com/ryoppippi/ccusage/issues/916), [#936](https://github.com/ryoppippi/ccusage/issues/936) — numbers don't match Anthropic console / Azure / CC's `/usage`

All four bug classes come from blind JSONL summation without tree semantics. ccx already has the tree model and per-message parser — we can be accurate on day one.

## What we ship

### Data layer

- [ ] Extend `MessageUsage` to persist per-message: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `model`, `timestamp`
- [ ] Store per-message usage on the `Message` tree node (not just session-level aggregate)
- [ ] Add `TurnStats` — aggregation per *user turn* (user message + all assistant responses/tool calls until the next user message)
- [ ] Sidechain-aware accounting: when a sidechain replays cache_read from parent context, do NOT double-count. Subtract overlap via tree structure.
- [ ] Embedded pricing table: Go map keyed by model name with `{input, output, cache_create, cache_read}` USD-per-million. Versioned, updatable, always-embedded — no runtime fetch. Start with Claude 3.5 Sonnet / 4 Sonnet / 4 Opus / Haiku 4.5.
- [ ] Flag Codex schema divergence up front — don't conflate until a second pass lands Codex pricing.

### Web UI

- [ ] Info panel: new "Per-turn breakdown" expandable section
  - Columns: turn index, first user-message snippet, model, input/output/cache tokens, USD cost
  - Sorted by cost desc so expensive turns bubble up
  - Clickable row → scrolls to that turn in the conversation (in-context drill-down, which ccusage structurally cannot do)
- [ ] Per-message cost badge next to assistant message header (collapsed by default; toggle via `u` key or toolbar button)
- [ ] Cache-vs-fresh ratio bar per turn (green cache-read, yellow input, orange output, red cache-create)
- [ ] Session summary: total spend, cache hit %, cost-per-message average
- [ ] Cross-provider: handle Claude Code + Codex in one unified ledger. If Codex schema differs, render the subset honestly — don't fake numbers.

### CLI

- [ ] `ccx usage` command mirroring ccusage's axes (daily/session/project/model) on top of ccx's tree-aware model
- [ ] `--format json|table`
- [ ] `--breakdown turn` — the thing ccusage doesn't have

### Docs

- [ ] README section: "Where did my tokens go?" with a screenshot of the per-turn breakdown
- [ ] CHANGELOG entry
- [ ] Accuracy notes: explicitly document what we count, how sidechains are handled, how we handle streaming partials
- [ ] License check on ccusage before borrowing any UX patterns directly

## Acceptance criteria

- [ ] Any session with ≥10 assistant turns shows N non-zero per-turn rows with cost
- [ ] Sidechain session: aggregate total matches `sum(turn costs)` within rounding; does NOT double-count cache_read
- [ ] Clicking a turn row jumps to the corresponding user message in the conversation view
- [ ] Session total matches Anthropic `/usage` for the same session within 5%; any gap documented under "Known deltas"
- [ ] Pricing embedded; works offline with `--no-network`
- [ ] Unit tests cover: per-message parse, per-turn aggregation, sidechain dedup, streaming partial handling, model-name → price lookup (exact match only, no fuzzy)
- [ ] 1000-message session renders the breakdown in <200ms after parse completes
- [ ] CHANGELOG entry present
- [ ] README screenshot updated

## Out of scope (explicit)

- Real-time live cost during an active session (needs streaming parse — separate issue)
- 5-hour billing block visualization (can live in `ccx usage blocks` later)
- MCP server mode

## Priority

**P0 — crown jewel of v0.next.** This is the one users will thank us for.

---

Prior art: https://github.com/ryoppippi/ccusage


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(web): per-turn token breakdown with cost attribution (surpass ccusage) #2

Summary

Why now

Current state

What ccusage does (and where it falls short)

What we ship

Data layer

Web UI

CLI

Docs

Acceptance criteria

Out of scope (explicit)

Priority

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Axis	ccusage
Daily / Weekly / Monthly	Yes
Model breakdown	Yes (`--breakdown`)
Session	Yes
Project	Yes (`--instances`)
5-hour billing block	Yes
Per user turn	No — #935 open
Per tool invocation	No
Visual / web	No — tables only, #933 requests charts

feat(web): per-turn token breakdown with cost attribution (surpass ccusage) #2

Description

Summary

Why now

Current state

What ccusage does (and where it falls short)

What we ship

Data layer

Web UI

CLI

Docs

Acceptance criteria

Out of scope (explicit)

Priority

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions