feat: usage spend analytics, repo graph overview + TUI tabs, read mod…#34
…es, filter wave 16

Pillar A, `tokenix usage`: absolute token spend + ≈USD cost from agent transcripts (daily/weekly/monthly/session/model/project, 5-hour blocks with burn rate, month-end forecast, --cost-mode, --statusline, --json). New src/usage.rs + shared src/transcripts.rs (conversation_audit refactored to reuse it); gain.rs ModelPrice extended with output/cache rates + price_for / usage_cost helpers.

Pillar B, `tokenix graph`: repo-wide hotspots (god nodes, bottlenecks, blast-radius leaders) + Graphviz DOT export (graph.rs repo_hotspots / format_repo_report / format_edges_dot). New Usage and Graph dashboard tabs.

Pillar C, `tokenix read --mode full|outline|signatures|diff|density:X` (entropy-filtered reads).

Filter wave 16: cargo tree, npm ls, kubectl explain, ip, ss, lsof, netstat, systemctl list-* (386 filters, 800 golden cases).

Docs: README.md + AGENTS.md updated. Tests: 263 passed, fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vw2xCqT8ozZKw5VtWgWAAn
Code Review
This pull request introduces comprehensive spend analytics via the new tokenix usage command and a repo-wide symbol-graph overview with tokenix graph, both integrated as new tabs in the interactive TUI. It also enhances the tokenix read command with new modes (such as entropy-based density filtering) and adds several bundled output filters. The review feedback highlights several key improvement opportunities: supporting standard JSON files alongside JSONL in transcript parsing, robustly extracting project names from Claude paths when cwd is missing, removing a redundant day == 0 check, properly handling execution errors for git diff, and optimizing the density filter by pre-calculating line entropy to avoid redundant computations during sorting.
fn parse_file(path: &Path, out: &mut Vec<Record>, seen: &mut HashSet<String>) {
    let Ok(raw) = std::fs::read_to_string(path) else {
        return;
    };
    let session_fallback = path
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("?")
        .to_string();
    for line in raw.lines() {
        let Ok(v) = serde_json::from_str::<Value>(line) else {
            continue;
        };
        if let Some(rec) = record_from_value(&v, &session_fallback, seen) {
            out.push(rec);
        }
    }
}
Currently, parse_file reads the transcript file line-by-line and attempts to parse each line as a JSON value. While this works perfectly for JSONL files, standard .json files (which are often pretty-printed or structured as a single JSON array/object) will fail to parse entirely. Trying to parse the entire file as a single JSON value first, and falling back to line-by-line parsing if that fails, ensures robust support for both formats.
fn parse_file(path: &Path, out: &mut Vec<Record>, seen: &mut HashSet<String>) {
    let Ok(raw) = std::fs::read_to_string(path) else {
        return;
    };
    let session_fallback = path
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("?")
        .to_string();
    // Try parsing the entire file as a single JSON value (e.g., pretty-printed JSON or a JSON array)
    if let Ok(v) = serde_json::from_str::<Value>(&raw) {
        if let Some(arr) = v.as_array() {
            for item in arr {
                if let Some(rec) = record_from_value(item, path, &session_fallback, seen) {
                    out.push(rec);
                }
            }
        } else if let Some(rec) = record_from_value(&v, path, &session_fallback, seen) {
            out.push(rec);
        }
        return;
    }
    // Fallback to line-by-line JSONL parsing
    for line in raw.lines() {
        let Ok(v) = serde_json::from_str::<Value>(line) else {
            continue;
        };
        if let Some(rec) = record_from_value(&v, path, &session_fallback, seen) {
            out.push(rec);
        }
    }
}

fn record_from_value(
    v: &Value,
    session_fallback: &str,
    seen: &mut HashSet<String>,
) -> Option<Record> {
    let message = v.get("message");
    let usage = message
        .and_then(|m| m.get("usage"))
        .or_else(|| v.get("usage"))?;

    let input = u64_at(usage, "input_tokens");
    let output = u64_at(usage, "output_tokens");
    let cache_read = u64_at(usage, "cache_read_input_tokens");
    let cache_write = u64_at(usage, "cache_creation_input_tokens");
    if input + output + cache_read + cache_write == 0 {
        return None;
    }

    // Dedup replayed lines by (message id, requestId) when both are present.
    let msg_id = message
        .and_then(|m| m.get("id"))
        .and_then(|x| x.as_str())
        .unwrap_or("");
    let req_id = v.get("requestId").and_then(|x| x.as_str()).unwrap_or("");
    if !msg_id.is_empty() && !req_id.is_empty() {
        let key = format!("{msg_id}|{req_id}");
        if !seen.insert(key) {
            return None;
        }
    }

    let ts = v
        .get("timestamp")
        .and_then(|x| x.as_str())
        .and_then(parse_ts)
        .unwrap_or_else(Local::now);

    let model = message
        .and_then(|m| m.get("model"))
        .or_else(|| v.get("model"))
        .and_then(|x| x.as_str())
        .unwrap_or("unknown")
        .to_string();

    let project = v
        .get("cwd")
        .and_then(|x| x.as_str())
        .map(basename)
        .unwrap_or_else(|| "?".to_string());

    let session = v
        .get("sessionId")
        .and_then(|x| x.as_str())
        .unwrap_or(session_fallback)
        .to_string();

    let logged_cost = v
        .get("costUSD")
        .or_else(|| v.get("cost_usd"))
        .and_then(|x| x.as_f64());

    Some(Record {
        ts,
        model,
        project,
        session,
        input,
        output,
        cache_read,
        cache_write,
        logged_cost,
    })
}
If cwd is missing or formatted differently in the transcripts, project defaults to "?", which causes these records to be silently ignored when scoped to the current project. We can improve this by passing the transcript file's path to record_from_value and extracting the project name from the Claude projects slug (e.g., ~/.claude/projects/<slug>/...) when cwd is missing. Additionally, we can defensively parse costUSD / cost_usd from a string if it is logged as a string instead of a float.
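The path-based fallback can be isolated into a small helper. The following is a minimal sketch, assuming transcripts live under a directory named `projects` followed by one slug directory, as in the `~/.claude/projects/<slug>/...` example above; `project_slug_from_path` is a hypothetical name, not a function in this PR.

```rust
use std::path::Path;

/// Hypothetical helper: returns the directory name that directly follows a
/// `projects` component, e.g. `<slug>` in `.../projects/<slug>/session.jsonl`.
/// Returns `None` when there is no `projects` component, or when the entry
/// after it is the last component (i.e. the transcript file itself).
fn project_slug_from_path(path: &Path) -> Option<String> {
    let mut components = path.components();
    // Advance the iterator until just past the `projects` directory.
    components.find(|c| c.as_os_str() == "projects")?;
    let slug = components.next()?;
    // Require at least one more component so `slug` is a directory.
    components.next()?;
    Some(slug.as_os_str().to_string_lossy().into_owned())
}
```

The sketch returns the whole slug rather than its last `-`-separated segment. If the slug is a working directory with path separators replaced by `-` (an assumption about the transcript layout), splitting on `-` cannot tell a separator from a hyphen in the directory name: `-home-u-my-app` would yield `app`, not `my-app`.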
fn record_from_value(
    v: &Value,
    path: &Path,
    session_fallback: &str,
    seen: &mut HashSet<String>,
) -> Option<Record> {
    let message = v.get("message");
    let usage = message
        .and_then(|m| m.get("usage"))
        .or_else(|| v.get("usage"))?;

    let input = u64_at(usage, "input_tokens");
    let output = u64_at(usage, "output_tokens");
    let cache_read = u64_at(usage, "cache_read_input_tokens");
    let cache_write = u64_at(usage, "cache_creation_input_tokens");
    if input + output + cache_read + cache_write == 0 {
        return None;
    }

    // Dedup replayed lines by (message id, requestId) when both are present.
    let msg_id = message
        .and_then(|m| m.get("id"))
        .and_then(|x| x.as_str())
        .unwrap_or("");
    let req_id = v.get("requestId").and_then(|x| x.as_str()).unwrap_or("");
    if !msg_id.is_empty() && !req_id.is_empty() {
        let key = format!("{msg_id}|{req_id}");
        if !seen.insert(key) {
            return None;
        }
    }

    let ts = v
        .get("timestamp")
        .and_then(|x| x.as_str())
        .and_then(parse_ts)
        .unwrap_or_else(Local::now);

    let model = message
        .and_then(|m| m.get("model"))
        .or_else(|| v.get("model"))
        .and_then(|x| x.as_str())
        .unwrap_or("unknown")
        .to_string();

    let mut project = v
        .get("cwd")
        .and_then(|x| x.as_str())
        .map(basename)
        .unwrap_or_else(|| "?".to_string());
    if project == "?" {
        // Try to extract the project name from the transcript file path
        // e.g., ~/.claude/projects/<slug>/...
        let components: Vec<_> = path.components().collect();
        if let Some(pos) = components.iter().position(|c| c.as_os_str() == "projects") {
            if pos + 1 < components.len() {
                let slug = components[pos + 1].as_os_str().to_string_lossy().to_string();
                project = slug.split('-').last().unwrap_or(&slug).to_string();
            }
        }
    }

    let session = v
        .get("sessionId")
        .and_then(|x| x.as_str())
        .unwrap_or(session_fallback)
        .to_string();

    let logged_cost = v
        .get("costUSD")
        .or_else(|| v.get("cost_usd"))
        .and_then(|x| {
            x.as_f64().or_else(|| {
                x.as_str().and_then(|s| s.parse::<f64>().ok())
            })
        });

    Some(Record {
        ts,
        model,
        project,
        session,
        input,
        output,
        cache_read,
        cache_write,
        logged_cost,
    })
}

/// Linear month-end projection from spend so far this calendar month.
fn month_forecast(records: &[Record], mode: CostMode) -> f64 {
    let now = Local::now();
    let month_cost: f64 = records
        .iter()
        .filter(|r| r.ts.year() == now.year() && r.ts.month() == now.month())
        .map(|r| r.cost(mode))
        .sum();
    let days_in_month = days_in_month(now.year(), now.month());
    let day = now.day().max(1);
    if day == 0 {
        return month_cost;
    }
    month_cost / day as f64 * days_in_month as f64
}
The check if day == 0 is redundant and represents dead code because day is initialized with now.day().max(1), which guarantees that day is always at least 1.
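The arithmetic behind the projection can be checked in isolation. This is a minimal sketch with a hypothetical pure function, `linear_month_forecast`, which takes the month-to-date cost and calendar values as plain arguments instead of reading `Local::now()`; it is not the PR's code.

```rust
/// Hypothetical pure version of the projection: average daily spend so far,
/// extended to the full month. `day` is the 1-based day of the month; a value
/// of 0 is clamped to 1, so the division below can never be by zero.
fn linear_month_forecast(month_cost: f64, day: u32, days_in_month: u32) -> f64 {
    let day = day.max(1);
    month_cost / f64::from(day) * f64::from(days_in_month)
}
```

For example, $30 spent by day 10 of a 30-day month projects to $90. Because the clamp already guarantees `day >= 1`, a later `day == 0` branch can never run.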
/// Linear month-end projection from spend so far this calendar month.
fn month_forecast(records: &[Record], mode: CostMode) -> f64 {
    let now = Local::now();
    let month_cost: f64 = records
        .iter()
        .filter(|r| r.ts.year() == now.year() && r.ts.month() == now.month())
        .map(|r| r.cost(mode))
        .sum();
    let days_in_month = days_in_month(now.year(), now.month());
    let day = now.day().max(1);
    month_cost / day as f64 * days_in_month as f64
}

} else if m == "diff" {
    println!("{}", chunker::generate_outline(&content, &rel));
    let out = std::process::Command::new("git")
        .arg("-C")
        .arg(&repo_root)
        .args(["diff", "--", &rel])
        .output();
    match out {
        Ok(o) if !o.stdout.is_empty() => {
            println!("\n# changed hunks");
            println!("{}", String::from_utf8_lossy(&o.stdout));
        }
        _ => println!("\n(no uncommitted changes)"),
    }
If the git command fails (e.g., if git is not installed or not in PATH), std::process::Command::output() returns an Err. Matching this as _ => println!("\n(no uncommitted changes)") is misleading as it hides the actual error. We should explicitly handle the Err and stderr cases to aid in debugging.
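One way to keep the cases apart is to classify the result before printing anything. The sketch below is illustrative only: `DiffOutcome`, `classify_diff`, and `git_diff` are hypothetical names, not part of the PR. It tests the exit status rather than whether stderr is non-empty, on the assumption that plain `git diff` (without `--exit-code`) exits 0 whether or not there are changes.

```rust
use std::io;
use std::process::{Command, Output};

/// Hypothetical result type separating the four cases a caller may care about.
#[derive(Debug, PartialEq)]
enum DiffOutcome {
    Changes(String),     // git ran and printed a diff
    NoChanges,           // git ran successfully and printed nothing
    GitFailed(String),   // git ran but exited with a non-zero status
    SpawnFailed(String), // the process could not be started at all
}

/// Maps the raw result of `Command::output()` onto a `DiffOutcome`.
fn classify_diff(result: io::Result<Output>) -> DiffOutcome {
    match result {
        Err(e) => DiffOutcome::SpawnFailed(e.to_string()),
        Ok(o) if !o.status.success() => {
            DiffOutcome::GitFailed(String::from_utf8_lossy(&o.stderr).into_owned())
        }
        Ok(o) if o.stdout.is_empty() => DiffOutcome::NoChanges,
        Ok(o) => DiffOutcome::Changes(String::from_utf8_lossy(&o.stdout).into_owned()),
    }
}

/// Runs `git -C <repo_root> diff -- <rel>` and classifies the result.
fn git_diff(repo_root: &str, rel: &str) -> DiffOutcome {
    let result = Command::new("git")
        .arg("-C")
        .arg(repo_root)
        .args(["diff", "--", rel])
        .output();
    classify_diff(result)
}
```

A caller could then print `(no uncommitted changes)` only for `NoChanges` and send the two failure variants to stderr.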
} else if m == "diff" {
    println!("{}", chunker::generate_outline(&content, &rel));
    let out = std::process::Command::new("git")
        .arg("-C")
        .arg(&repo_root)
        .args(["diff", "--", &rel])
        .output();
    match out {
        Ok(o) if !o.stdout.is_empty() => {
            println!("\n# changed hunks");
            println!("{}", String::from_utf8_lossy(&o.stdout));
        }
        Ok(o) if !o.stderr.is_empty() => {
            eprintln!("\n# git diff error");
            eprintln!("{}", String::from_utf8_lossy(&o.stderr));
        }
        Err(e) => {
            eprintln!("\n# failed to execute git: {e}");
        }
        _ => println!("\n(no uncommitted changes)"),
    }

fn density_filter(lines: &[&str], frac: f64) -> String {
    let total_tokens: usize = lines.iter().map(|l| chunker::count_tokens(l)).sum();
    let budget = ((total_tokens as f64) * frac).ceil() as usize;

    let mut ranked: Vec<usize> = (0..lines.len()).collect();
    ranked.sort_by(|&a, &b| {
        let ea = line_entropy(lines[a]);
        let eb = line_entropy(lines[b]);
        eb.partial_cmp(&ea)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(lines[b].len().cmp(&lines[a].len()))
            .then(a.cmp(&b))
    });

    let mut keep = vec![false; lines.len()];
    let mut used = 0usize;
    for &i in &ranked {
        if used >= budget {
            break;
        }
        keep[i] = true;
        used += chunker::count_tokens(lines[i]).max(1);
    }
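In density_filter, line_entropy is called inside the sorting comparator. Since sorting performs O(n log n) comparisons, line_entropy is recomputed many times for the same lines. Computing each line's entropy once before sorting, and comparing the cached values, avoids the redundant work.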
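The precompute-then-sort pattern can be tried standalone. In this minimal sketch, `line_entropy` is a stand-in (Shannon entropy over bytes) because the PR's own implementation is not shown here, and `rank_by_entropy` is a hypothetical name. It also swaps `partial_cmp(..).unwrap_or(Equal)` for `f64::total_cmp`, which gives a total order without the fallback.

```rust
/// Stand-in scoring function: Shannon entropy of the line, in bits per byte.
/// This is an assumption for illustration, not the PR's `line_entropy`.
fn line_entropy(line: &str) -> f64 {
    if line.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for b in line.bytes() {
        counts[b as usize] += 1;
    }
    let n = line.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Returns line indices ordered by entropy (highest first), then by length
/// (longest first), then by original position. Each line is scored exactly
/// once, before the sort, so the comparator only reads cached values.
fn rank_by_entropy(lines: &[&str]) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = lines
        .iter()
        .enumerate()
        .map(|(i, line)| (i, line_entropy(line)))
        .collect();
    scored.sort_by(|a, b| {
        b.1.total_cmp(&a.1)
            .then(lines[b.0].len().cmp(&lines[a.0].len()))
            .then(a.0.cmp(&b.0))
    });
    scored.into_iter().map(|(i, _)| i).collect()
}
```

With this shape the scoring cost is linear in the number of lines, and only the cheap comparisons scale with the sort.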
fn density_filter(lines: &[&str], frac: f64) -> String {
    let total_tokens: usize = lines.iter().map(|l| chunker::count_tokens(l)).sum();
    let budget = ((total_tokens as f64) * frac).ceil() as usize;

    let mut ranked: Vec<(usize, f64)> = lines
        .iter()
        .enumerate()
        .map(|(i, line)| (i, line_entropy(line)))
        .collect();
    ranked.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(lines[b.0].len().cmp(&lines[a.0].len()))
            .then(a.0.cmp(&b.0))
    });

    let mut keep = vec![false; lines.len()];
    let mut used = 0usize;
    for &(i, _) in &ranked {
        if used >= budget {
            break;
        }
        keep[i] = true;
        used += chunker::count_tokens(lines[i]).max(1);
    }
⏳ Awaiting pipeline. The CI/CD pipeline is in state …

Automated Review - vibe-code/opencode
Review delegated to Vibe-Code task: http://vibe-code.vibe-code.svc.cluster.local:3000/tasks/76c6f162243a4d24
🤖 Automated origin