Trace generation requires two complementary data sources working together:
- `requests.db` — from claude-code-proxy, captures every API request and response
- JSONL files — from Claude Code's local storage (`~/.claude/projects/`), provide conversation structure and message linking
The proxy captures the raw API traffic. The JSONL files provide the conversation thread structure that links requests into coherent sessions. Together, they enable accurate trace generation.
Claude Code (the CLI tool) saves conversation history to local JSONL files at `~/.claude/projects/<project-hash>/<conversation-id>.jsonl`.
What they contain:
- Complete message history as the user experiences it
- Message IDs (`msg_01XYZ...`) from assistant responses
- Tool use/result pairs with content
- Sub-agent conversations in separate `agent-<id>.jsonl` files
- Metadata: timestamps, `isSidechain` flag, `toolUseResult` with agent linking info
What they capture:
- The canonical response for each turn (one entry per assistant message)
- Message IDs link directly to the corresponding request in `requests.db`
Example JSONL entry:
```json
{"type": "assistant", "message": {"id": "msg_01ABC...", "content": [...]}, "timestamp": "2025-01-15T14:30:22Z"}
```

The claude-code-proxy sits between Claude Code and the Anthropic API, capturing every request and response.
What it contains:
- SQLite database with a `requests` table
- Full request body (tools, system prompt, messages array)
- Full response body (content, usage metrics with cache breakdown)
- Timestamps for each request
What it captures:
- All API requests and responses
- Actual API cache metrics (`cache_read_input_tokens`, `cache_creation_input_tokens`)
- Token counts, model IDs, stop reasons
- Note: older proxy versions produced duplicate streaming/non-streaming pairs per turn due to a bug (see seifghazi/claude-code-proxy#33); the analysis tools handle both old and fixed proxy data
Schema:
```sql
CREATE TABLE requests (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    body TEXT,      -- Full JSON request body
    response TEXT   -- Full JSON response body
);
```

The key linking mechanism is the message ID (`msg_01XYZ...`):
```
JSONL file                              requests.db
─────────                               ───────────
{"type": "assistant",                   response.body.id = "msg_01ABC..."
 "message": {"id": "msg_01ABC..."}}
          │                                      │
          └──────────────── MATCH ───────────────┘
```
- `build_message_index.py` scans `requests.db` and extracts `response.body.id` from every response → creates `message_id_index.json`
- `build_conversation_index.py` scans JSONL files, extracts message IDs, and matches them against the index → creates the `conversation_index` table in the database
- `build_minimal_traces.py` uses both sources:
  - JSONL provides conversation structure and message ordering
  - `requests.db` provides full request bodies for tokenization and hashing
  - Message IDs link them together with 98-100% accuracy
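The indexing and matching steps can be sketched in a few lines of Python. This is a simplified illustration, not the actual scripts: it assumes the `response` column stores the raw response JSON with a top-level `id` field, and that JSONL assistant entries have the shape shown in the example above.

```python
import json
import sqlite3

def build_message_index(db_path):
    """Map each response's message ID (msg_01...) to its row in requests.db."""
    index = {}
    con = sqlite3.connect(db_path)
    for row_id, response in con.execute("SELECT id, response FROM requests"):
        try:
            msg_id = json.loads(response).get("id")
        except (TypeError, json.JSONDecodeError):
            continue  # skip malformed or empty response bodies
        if msg_id:
            index[msg_id] = row_id
    con.close()
    return index

def match_conversation(jsonl_path, index):
    """Pair each assistant entry in a JSONL file with its DB request row (or None)."""
    matches = []
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("type") == "assistant":
                msg_id = entry["message"]["id"]
                matches.append((msg_id, index.get(msg_id)))
    return matches
```

Because each message ID is globally unique, the lookup is an exact dictionary hit rather than a fuzzy search, which is where the 98-100% match rate comes from.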
Earlier approaches used timestamp matching (find the DB request closest in time to each JSONL entry). This only achieved 5-43% accuracy due to clock drift and ambiguous matches.
Message ID matching achieves 98-100% accuracy because each `msg_01XYZ...` ID is globally unique and appears in exactly one DB response.
The JSONL files record one response per turn. Each message ID maps to exactly one request in requests.db, which the trace builder uses directly. In older proxy data (with streaming duplicates), the JSONL message IDs point to the non-streaming requests, which have complete metadata (stop_reason, content types, usage).
Sometimes JSONL files are unavailable (deleted, different machine, lost). The recover_conversations.py script reconstructs conversation structure from requests.db alone.
- Classify each request as streaming or non-streaming by the `stream` field
- Group requests into turns — anchoring on streaming requests, with optional non-streaming partners matched by content hash within 30 seconds
- Chain turns into conversations by hash prefix overlap — if request B's `hash_ids` share a >97% prefix with request A's, they're in the same conversation
- Order by timestamp to reconstruct the conversation timeline
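The prefix-overlap chaining step can be illustrated with a small sketch. The data shapes here are assumptions, not the script's actual internals: each turn is a dict with a `timestamp` and an ordered `hash_ids` list, and overlap is measured against the shorter of the two lists.

```python
def prefix_overlap(a, b):
    """Fraction of the shorter hash list that matches as a common prefix."""
    if not a or not b:
        return 0.0
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / min(len(a), len(b))

def chain_turns(turns, threshold=0.97):
    """Greedily chain turns into conversations by prefix overlap.

    Turns are processed in timestamp order; a turn joins the first
    conversation whose latest turn shares a >threshold hash prefix,
    otherwise it starts a new conversation.
    """
    conversations = []
    for turn in sorted(turns, key=lambda t: t["timestamp"]):
        for conv in conversations:
            if prefix_overlap(conv[-1]["hash_ids"], turn["hash_ids"]) > threshold:
                conv.append(turn)
                break
        else:
            conversations.append([turn])
    return conversations
```

The intuition: in a growing conversation, each request's message history starts with the previous request's history, so the hash lists share a long common prefix; unrelated conversations diverge at the first hash.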
When generating traces from both real JSONL and recovered data:
```bash
python3 recover_conversations.py requests.db \
    --output-dir recovered_jsonl/ \
    --exclude-indexed jsonl/ \
    --workers 16
```

The `--exclude-indexed` flag ensures recovered conversations only contain requests not already covered by real JSONL files. It works by:
- Scanning all JSONL files in the specified directory
- Extracting message IDs from assistant entries
- Skipping any DB request whose response contains a matching message ID
This prevents duplicate requests from appearing in both real and recovered traces.
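A sketch of that filter, assuming the same entry and response shapes as above (the function names are illustrative, not from recover_conversations.py):

```python
import json

def indexed_message_ids(jsonl_paths):
    """Collect message IDs from assistant entries across the real JSONL files."""
    ids = set()
    for path in jsonl_paths:
        with open(path) as f:
            for line in f:
                entry = json.loads(line)
                if entry.get("type") == "assistant":
                    ids.add(entry["message"]["id"])
    return ids

def should_recover(response_json, indexed_ids):
    """True if this DB request is NOT already covered by a real JSONL file."""
    try:
        msg_id = json.loads(response_json).get("id")
    except (TypeError, json.JSONDecodeError):
        return True  # unparseable responses fall through to recovery
    return msg_id not in indexed_ids
```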
```
                Claude Code (CLI)
                       │
       ┌───────────────┼───────────────────┐
       │               │                   │
       ▼               │                   ▼
JSONL conversation     │         claude-code-proxy
files saved locally    │         (github.com/seifghazi/claude-code-proxy)
       │               │                   │
~/.claude/projects/    │                   ▼
<hash>/<conv-id>.jsonl │             requests.db
(one entry per turn)   │         (ALL requests + responses)
                       │
                       ▼
                 Anthropic API
```
The setup: Claude Code is configured to route API calls through claude-code-proxy. The proxy saves every request/response to requests.db and forwards them to the Anthropic API. Separately, Claude Code saves its conversation state to local JSONL files as it normally does.
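Routing Claude Code through the proxy is typically an environment-variable change. This is a hedged sketch: the port is illustrative and depends on how you run claude-code-proxy, and `ANTHROPIC_BASE_URL` is the variable Claude Code reads for an alternate API endpoint.

```shell
# Assumes claude-code-proxy is already running locally (port is illustrative).
export ANTHROPIC_BASE_URL="http://localhost:3001"

# Claude Code now sends its API traffic through the proxy, which logs each
# request/response pair to requests.db before forwarding it upstream.
claude
```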
```
requests.db + JSONL files
         │
         ├─── build_message_index.py ──────→ message_id_index.json
         ├─── build_conversation_index.py ─→ conversation_index table
         │
         ├─── complete_cache_visualizer.py → HTML visualizations
         ├─── complete_cache_analyzer.py ──→ text statistics
         │
         └─── build_minimal_traces.py ─────→ trace JSON files
                         │
                         └──→ kv-cache-tester (replay)
```
Currently, data collection requires running claude-code-proxy to capture requests, plus access to Claude Code's local JSONL files. A planned future enhancement will support ingesting data directly from the Claude Enterprise admin log console, which provides API request/response logs without needing a proxy.
The Enterprise log console provides similar data to requests.db (request bodies, response bodies, usage metrics) but through an admin API rather than a local proxy. The integration would:
- Fetch request logs from the Enterprise API
- Convert to the same internal format used by the analysis scripts
- Generate traces using the existing `build_minimal_traces.py` pipeline
This would eliminate the proxy requirement for Enterprise customers and provide access to logs from any team member's Claude Code sessions.
| Aspect | Current (Proxy) | Future (Enterprise) |
|---|---|---|
| Data capture | Local proxy intercepts | Enterprise admin API |
| Scope | Single user's sessions | All team sessions |
| JSONL files | From local `~/.claude/` | Not available (DB-only recovery) |
| Setup | Install + configure proxy | Enterprise admin access |
The trace format and analysis tools remain unchanged — only the data ingestion layer differs.