Problem
Claude/agent sessions repeatedly pay an “exploration tax” (many file reads/greps) to rebuild a mental model of a repo. CodeGraph addresses this by building a local, queryable code graph once and exposing agent-friendly queries (context building, callers/callees, impact).
Terraphim is well-positioned to provide a similar UX with additional advantages (thesaurus/rolegraph + graph embeddings), but we need a first-class per-repository code structure KG and agent-facing tools.
Proposal
Implement a CodeGraph-like repo index inside Terraphim:
1) Build a per-repo code KG
- Extract symbols + relations using ast-grep (initial focus: defs + imports; later calls).
- Normalize to a stable schema:
  - Nodes: CodeSymbol (function/class/method/interface/type), File, Module
  - Edges: Imports, Defines, Extends, Implements, References, Calls (phase 2)
  - Track file, span, signature, optional docstring/snippet, and confidence for edges
- Store alongside text for lexical search (BM25/FTS) and embeddings for semantic retrieval.
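To make the schema above concrete, here is a minimal sketch of how the node and edge records could be typed. The field layout, the `symbol_id` format, and the example path `src/auth.rs` are assumptions for illustration only; the issue does not prescribe them, and a Terraphim-native implementation would likely look different.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class NodeKind(Enum):
    CODE_SYMBOL = "CodeSymbol"  # function/class/method/interface/type
    FILE = "File"
    MODULE = "Module"


class EdgeKind(Enum):
    IMPORTS = "Imports"
    DEFINES = "Defines"
    EXTENDS = "Extends"
    IMPLEMENTS = "Implements"
    REFERENCES = "References"
    CALLS = "Calls"  # phase 2


@dataclass(frozen=True)
class Node:
    id: str
    kind: NodeKind
    name: str
    file: str
    span: Optional[Tuple[int, int]] = None  # assumed (start_line, end_line)
    signature: Optional[str] = None
    docstring: Optional[str] = None


@dataclass(frozen=True)
class Edge:
    kind: EdgeKind
    src: str  # id of the source node
    dst: str  # id of the target node
    confidence: float = 1.0  # lower for heuristically resolved edges


def symbol_id(file: str, name: str) -> str:
    """Hypothetical stable ID scheme: repo-relative path plus symbol name."""
    return f"{file}#{name}"


# Hypothetical example: a file that defines one function.
file_node = Node(id="src/auth.rs", kind=NodeKind.FILE, name="auth.rs", file="src/auth.rs")
fn_node = Node(
    id=symbol_id("src/auth.rs", "login"),
    kind=NodeKind.CODE_SYMBOL,
    name="login",
    file="src/auth.rs",
    span=(10, 42),
    signature="fn login(user: &str) -> bool",
)
defines = Edge(EdgeKind.DEFINES, src=file_node.id, dst=fn_node.id)
```

Keeping IDs derivable from path and name is one way to keep them stable across re-indexing, though it would need extending to handle overloads or nested scopes.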
2) Add agent-facing query tools (MCP)
Provide structured endpoints that avoid file-by-file exploration:
- repo_context(task, entrypoints?, depth?) → top files/symbols + snippets + why
- repo_search(query, mode=name|semantic|hybrid) → ranked symbols/files
- repo_callers(symbol_id) / repo_callees(symbol_id) (phase 2 when call graph is good)
- repo_impact(symbol_id|file, radius) → blast radius + risk score
- (optional) repo_path(from_symbol, to_symbol) → explanation chain
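As an illustration of the blast-radius idea behind repo_impact, the sketch below runs a bounded breadth-first search over a reversed import graph. It assumes the import graph is available as a plain mapping from file to the files it imports; the function signature and the file names are hypothetical, and the risk score mentioned above is not modeled here.

```python
from collections import deque
from typing import Dict, Set


def repo_impact(imports: Dict[str, Set[str]], target: str, radius: int) -> Dict[str, int]:
    """Map each file that (transitively) imports `target` to its hop distance.

    Only files within `radius` hops are returned.
    """
    # Invert the graph: for each file, who imports it?
    importers: Dict[str, Set[str]] = {}
    for src, dsts in imports.items():
        for dst in dsts:
            importers.setdefault(dst, set()).add(src)

    dist = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        if dist[current] >= radius:
            continue  # do not expand beyond the requested radius
        for importer in sorted(importers.get(current, ())):
            if importer not in dist:
                dist[importer] = dist[current] + 1
                queue.append(importer)

    del dist[target]  # the target itself is not part of its own impact set
    return dist


# Hypothetical import graph: key imports each file in its value set.
imports = {
    "api.rs": {"auth.rs"},
    "main.rs": {"api.rs"},
    "auth.rs": {"db.rs"},
    "cli.rs": {"main.rs"},
}
impact = repo_impact(imports, "auth.rs", radius=2)
```

With this graph, `impact` contains `api.rs` at distance 1 and `main.rs` at distance 2, while `cli.rs` falls outside the radius. A risk score could then be derived from the size of this set or from edge confidence.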
3) Leverage Terraphim graph embeddings (differentiator)
- Combine text embeddings with graph embeddings (connectivity-aware) for ranking.
- Use rolegraph/thesaurus to bias retrieval toward domain concepts (auth/payment/etc).
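One possible way to combine these signals is a weighted sum of text similarity and graph similarity, plus a small boost when a symbol is tagged with a concept from the active role. The sketch below assumes each candidate already has a text embedding, a graph embedding, and a set of concept tags. The weights `alpha` and `boost`, the dictionary layout, and the use of an "anchor" symbol's graph embedding as the graph-side query are all illustrative assumptions, not Terraphim's actual ranking.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    candidates: List[Dict],
    query_text_vec: Sequence[float],
    anchor_graph_vec: Sequence[float],
    role_concepts: Set[str],
    alpha: float = 0.7,  # assumed weight on text similarity
    boost: float = 0.1,  # assumed bonus for matching a role concept
) -> List[Tuple[str, float]]:
    """Return (symbol_id, score) pairs, best first."""
    scored = []
    for cand in candidates:
        score = alpha * cosine(query_text_vec, cand["text_vec"])
        score += (1.0 - alpha) * cosine(anchor_graph_vec, cand["graph_vec"])
        if role_concepts & cand["concepts"]:
            score += boost
        scored.append((cand["id"], score))
    # Sort by descending score, then by ID for deterministic output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical candidates with toy 2-dimensional embeddings.
candidates = [
    {"id": "c1", "text_vec": [1.0, 0.0], "graph_vec": [0.0, 1.0], "concepts": {"auth"}},
    {"id": "c2", "text_vec": [1.0, 0.0], "graph_vec": [1.0, 0.0], "concepts": set()},
]
ranked = rank(candidates, [1.0, 0.0], [1.0, 0.0], role_concepts={"auth"})
```

Here both candidates match the query text equally, so the graph term decides: `c2` is structurally close to the anchor and ranks first, even though `c1` receives the concept boost.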
4) Incremental updates
- init: index repo at HEAD
- update: re-index changed files (git diff / mtimes)
- Optional git hook integration
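The update step comes down to deciding which files need re-indexing. The sketch below compares content hashes against a stored manifest, a swapped-in alternative to the git diff / mtimes options named above that works without a git checkout. The function names, the manifest shape, and the default suffix list are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def fingerprint(root: Path, suffixes: Sequence[str] = (".rs", ".ts", ".py")) -> Dict[str, str]:
    """Map each source file (repo-relative POSIX path) to a SHA-256 of its content."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def plan_update(previous: Dict[str, str], current: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (files to re-index, files to drop from the index)."""
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed


# Hypothetical manifests from two successive runs.
previous = {"src/a.rs": "h1", "src/b.rs": "h2", "src/old.rs": "h3"}
current = {"src/a.rs": "h1", "src/b.rs": "h2-modified", "src/new.rs": "h4"}
changed, removed = plan_update(previous, current)
```

In this example `changed` holds `src/b.rs` and `src/new.rs`, and `removed` holds `src/old.rs`. An `init` run is the special case where the previous manifest is empty, so every file is treated as changed.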
Phased delivery
Phase 1 (MVP)
- ast-grep extraction for defs + imports
- store nodes/edges + snippets
- MCP tools: repo_search, repo_context, repo_impact (import graph)
Phase 2 (Parity)
- call graph extraction + improved resolution (start with TS/JS or Rust)
- enable repo_callers/repo_callees with acceptable precision
Phase 3 (Enhancement)
- graph embeddings integrated into retrieval/ranking
Acceptance criteria
- On a non-trivial repo (e.g., terraphim-ai), agent can answer “where is X handled” and “what breaks if I change Y” with significantly fewer file reads/greps compared to baseline.
- Index is local, reproducible, and updates incrementally.
- Tool outputs are structured (IDs + locations) and suitable for automated context building.
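To show what "structured (IDs + locations)" could mean in practice, here is a sketch of a JSON-serializable search result. The field names, the `symbol_id` format, and the example values are assumptions for illustration; the actual tool schema is still to be decided.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Location:
    file: str
    start_line: int
    end_line: int


@dataclass
class SearchHit:
    symbol_id: str  # hypothetical ID format
    kind: str
    score: float
    location: Location
    why: str  # short explanation an agent can surface


def to_tool_response(hits: List[SearchHit]) -> str:
    """Serialize hits to a deterministic JSON payload."""
    payload = {"hits": [asdict(hit) for hit in hits]}
    return json.dumps(payload, indent=2, sort_keys=True)


# Hypothetical single-hit response.
response = to_tool_response(
    [
        SearchHit(
            symbol_id="src/auth.rs#login",
            kind="function",
            score=0.92,
            location=Location(file="src/auth.rs", start_line=10, end_line=42),
            why="name match on 'login'",
        )
    ]
)
decoded = json.loads(response)
```

Because every hit carries an ID and a file span, an agent can request exactly those lines or pass the ID to a follow-up tool instead of re-reading whole files.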
Notes
- CodeGraph reference (conceptual): structured code graph + MCP tools to avoid repeated exploration.
- Implement storage natively in Terraphim; keep the UX and tooling CodeGraph-like.