Structural indexing · Trigram search · Word index · Dependency graph · File watching · MCP + HTTP
Status · Install · Quick Start · MCP Tools · Benchmarks · Architecture · Data & Privacy · Building
Alpha software: the API is stabilizing but may change.
codedb works and is used daily in production AI workflows, but:
- Language support: Zig, Python, TypeScript/JavaScript (more planned)
- No auth: the HTTP server binds to localhost only
- Snapshot format may change between versions
- MCP protocol is JSON-RPC 2.0 over stdio (stable)
| What works today | What's in progress |
|---|---|
| 12 MCP tools for full codebase intelligence | Additional language parsers |
| Trigram-accelerated full-text search | WASM target for Cloudflare Workers |
| O(1) inverted word index for identifier lookup | Incremental snapshot updates |
| Structural outlines (functions, structs, imports) | Multi-project support |
| Reverse dependency graph | Remote indexing over SSH |
| Atomic line-range edits with version tracking | |
| Auto-registration in Claude, Codex, Gemini, Cursor | |
| Polling file watcher with filtered directory walker | |
| Portable snapshot for instant MCP startup | |
| Multi-agent support with file locking + heartbeats | |
| Codesigned + notarized macOS binaries | |
| Cross-platform: macOS (ARM/x86), Linux (ARM/x86) | |
curl -fsSL https://codedb.codegraff.com/install.sh | sh

Downloads the binary for your platform and auto-registers codedb as an MCP server in Claude Code, Codex, Gemini CLI, and Cursor.
| Platform | Binary | Signed |
|---|---|---|
| macOS ARM64 (Apple Silicon) | `codedb-darwin-arm64` | codesigned + notarized |
| macOS x86_64 (Intel) | `codedb-darwin-x86_64` | codesigned + notarized |
| Linux ARM64 | `codedb-linux-arm64` | - |
| Linux x86_64 | `codedb-linux-x86_64` | - |
Or install manually from GitHub Releases.
After installing, codedb is automatically registered. Just open a project and the 12 MCP tools are available to your AI agent.
# Manual MCP start (auto-configured by install script)
codedb mcp /path/to/your/project

codedb serve /path/to/your/project
# listening on localhost:7719

codedb tree /path/to/project # file tree with symbol counts
codedb outline src/main.zig # symbols in a file
codedb find AgentRegistry # find symbol definitions
codedb search "handleAuth" # full-text search (trigram-accelerated)
codedb word Store # exact word lookup (inverted index, O(1))
codedb hot # recently modified files

12 tools over the Model Context Protocol (JSON-RPC 2.0 over stdio):
| Tool | Description |
|---|---|
| `codedb_tree` | Full file tree with language, line counts, symbol counts |
| `codedb_outline` | Symbols in a file: functions, structs, imports, with line numbers |
| `codedb_symbol` | Find where a symbol is defined across the codebase |
| `codedb_search` | Trigram-accelerated full-text search |
| `codedb_word` | O(1) inverted index word lookup |
| `codedb_hot` | Most recently modified files |
| `codedb_deps` | Reverse dependency graph (which files import this file) |
| `codedb_read` | Read file content |
| `codedb_edit` | Apply line-range edits (atomic writes) |
| `codedb_changes` | Changed files since a sequence number |
| `codedb_status` | Index status (file count, current sequence) |
| `codedb_snapshot` | Full pre-rendered JSON snapshot of the codebase |
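For illustration, a client invokes one of these tools by writing a JSON-RPC 2.0 request to the server's stdin. The sketch below builds such a message with the MCP `tools/call` method. The argument name `query` is an assumption, since the tool parameter schemas are not listed here; a real client would discover them through `tools/list` after the MCP `initialize` handshake.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value works


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the message must stay on
    # one line; json.dumps escapes any newlines inside string values.
    return json.dumps(request, separators=(",", ":")) + "\n"


# "query" is a hypothetical argument name, used only for illustration.
line = tool_call("codedb_search", {"query": "handleAuth"})
print(line, end="")
```

The resulting line would be written to the stdin of a `codedb mcp` process, and the response read back from its stdout as another single-line JSON object.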
# 1. Get the file tree
curl localhost:7719/tree
# → src/main.zig (zig, 55L, 4 symbols)
#   src/store.zig (zig, 156L, 12 symbols)
#   src/agent.zig (zig, 135L, 8 symbols)
# 2. Drill into a file
curl "localhost:7719/outline?path=src/store.zig"
# → L20: struct_def Store
#   L30: function init
#   L55: function recordSnapshot
# 3. Find a symbol across the codebase
curl "localhost:7719/symbol?name=AgentRegistry"
# → {"path":"src/agent.zig","line":30,"kind":"struct_def"}
# 4. Full-text search
curl "localhost:7719/search?q=handleAuth&max=10"
# 5. Check what changed
curl "localhost:7719/changes?since=42"

Measured on Apple M4 Pro, 48 GB RAM. MCP = pre-indexed warm queries (20 iterations avg). CLI/external tools include process startup (3 iterations avg). Ground truth verified against a Python reference implementation.
codedb repo (20 files, 12.6k lines):
| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|---|---|---|---|---|---|---|
| File tree | 0.04 ms | 52.9 ms | - | - | - | 1,253x vs CLI |
| Symbol search (`init`) | 0.10 ms | 54.1 ms | 3.2 ms | 6.3 ms | 6.5 ms | 549x vs CLI |
| Full-text search (`allocator`) | 0.05 ms | 60.7 ms | 3.2 ms | 5.3 ms | 6.6 ms | 1,340x vs CLI |
| Word index (`self`) | 0.04 ms | 59.7 ms | n/a | 7.2 ms | 6.5 ms | 1,404x vs CLI |
| Structural outline | 0.05 ms | 53.5 ms | 3.1 ms | - | 2.4 ms | 1,143x vs CLI |
| Dependency graph | 0.05 ms | 2.2 ms | n/a | n/a | n/a | 45x vs CLI |
merjs repo (100 files, 17.3k lines):
| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|---|---|---|---|---|---|---|
| File tree | 0.05 ms | 54.0 ms | - | - | - | 1,173x vs CLI |
| Symbol search (`init`) | 0.07 ms | 54.4 ms | 3.4 ms | 6.3 ms | 3.6 ms | 758x vs CLI |
| Full-text search (`allocator`) | 0.03 ms | 54.1 ms | 2.9 ms | 5.1 ms | 3.7 ms | 1,554x vs CLI |
| Word index (`self`) | 0.04 ms | 54.7 ms | n/a | 6.3 ms | 4.2 ms | 1,518x vs CLI |
| Structural outline | 0.04 ms | 54.9 ms | 3.4 ms | - | 2.5 ms | 1,243x vs CLI |
| Dependency graph | 0.05 ms | 1.9 ms | n/a | n/a | n/a | 41x vs CLI |
codedb returns structured, relevant results rather than raw line dumps. For AI agents, this means dramatically fewer tokens per query:
| Repo | codedb MCP | ripgrep / grep | Reduction |
|---|---|---|---|
| codedb (search `allocator`) | ~20 tokens | ~32,564 tokens | 1,628x fewer |
| merjs (search `allocator`) | ~20 tokens | ~4,007 tokens | 200x fewer |
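The reduction column is simply the ratio of the two token counts. As a quick check, using only the approximate figures from the table above:

```python
# Token counts copied from the table above (approximate, as reported).
rows = {
    "codedb": {"codedb_mcp": 20, "ripgrep_grep": 32_564},
    "merjs": {"codedb_mcp": 20, "ripgrep_grep": 4_007},
}


def reduction(mcp_tokens: int, baseline_tokens: int) -> float:
    """How many times fewer tokens the structured result uses."""
    return baseline_tokens / mcp_tokens


for repo, r in rows.items():
    ratio = reduction(r["codedb_mcp"], r["ripgrep_grep"])
    print(f"{repo}: {ratio:,.0f}x fewer")
# codedb: 1,628x fewer
# merjs: 200x fewer
```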
codedb builds all indexes on startup (outlines, trigram, word, dependency graph), not just a parse tree:
| Repo | Files | Lines | Cold start | Per file |
|---|---|---|---|---|
| codedb | 20 | 12.6k | 17 ms | 0.85 ms |
| merjs | 100 | 17.3k | 16 ms | 0.16 ms |
| openclaw/openclaw | 11,281 | 2.29M | 2.9 s | 6.66 ms |
| vitessio/vitess | 5,028 | 2.18M | ~2 s | 0.40 ms |
Indexes are built once on startup. After that, the file watcher keeps them updated incrementally (single-file re-index: <2ms). Queries never re-scan the filesystem.
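To make the incremental update concrete, here is a minimal Python sketch of an inverted word index that re-indexes a single file without touching the others. It is not codedb's implementation (which is written in Zig); the class and method names are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class WordIndex:
    """Toy inverted index: word -> {path: [line numbers]}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, list[int]]] = defaultdict(dict)
        self.words_by_file: dict[str, set[str]] = {}

    def remove_file(self, path: str) -> None:
        # Only the words this file contributed are visited.
        for word in self.words_by_file.pop(path, set()):
            self.postings[word].pop(path, None)
            if not self.postings[word]:
                del self.postings[word]

    def index_file(self, path: str, text: str) -> None:
        """Re-index one file: drop its old postings, then add the new ones."""
        self.remove_file(path)
        seen: set[str] = set()
        for lineno, line in enumerate(text.splitlines(), start=1):
            for word in WORD.findall(line):
                lines = self.postings[word].setdefault(path, [])
                if not lines or lines[-1] != lineno:
                    lines.append(lineno)
                seen.add(word)
        self.words_by_file[path] = seen

    def lookup(self, word: str) -> dict[str, list[int]]:
        """Average O(1) hash lookup; no file is re-scanned."""
        return self.postings.get(word, {})


idx = WordIndex()
idx.index_file("a.zig", "const Store = struct {};\n")
idx.index_file("b.zig", "var s: Store = undefined;\n")
idx.index_file("b.zig", "var s: Cache = undefined;\n")  # simulated edit
print(idx.lookup("Store"))  # {'a.zig': [1]}
```

Keeping a per-file set of words is what makes the update proportional to the size of the changed file rather than to the size of the whole index.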
- MCP server indexes once on startup β all queries hit in-memory data structures (O(1) hash lookups)
- CLI pays ~55ms process startup + full filesystem scan on every invocation
- ast-grep re-parses all files through tree-sitter on every call (~3ms)
- ripgrep/grep brute-force scan every file on every call (~5-7ms)
- The MCP advantage: index once, query thousands of times at sub-millisecond latency
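The "index once, query many times" idea behind trigram search can be sketched in a few lines of Python. This is a generic illustration of the technique, not codedb's code: each file is registered under every three-character substring it contains, and a query only scans files that contain all of the query's trigrams. All names below are hypothetical.

```python
from collections import defaultdict


def trigrams(s: str) -> set[str]:
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        self.files[path] = text
        for tri in trigrams(text):
            self.postings[tri].add(path)

    def search(self, query: str) -> list[tuple[str, int]]:
        """Return (path, line number) for every line containing `query`."""
        tris = trigrams(query)
        if tris:
            # Only files containing every trigram of the query can match.
            candidates = set.intersection(
                *(self.postings.get(t, set()) for t in tris)
            )
        else:
            # Queries shorter than 3 characters cannot use the index.
            candidates = set(self.files)
        hits = []
        for path in sorted(candidates):
            # Verification pass: the trigram filter may over-approximate.
            for lineno, line in enumerate(self.files[path].splitlines(), start=1):
                if query in line:
                    hits.append((path, lineno))
        return hits


idx = TrigramIndex()
idx.add("src/auth.zig", "fn handleAuth() void {}\n")
idx.add("src/main.zig", "pub fn main() void {}\n")
print(idx.search("handleAuth"))  # [('src/auth.zig', 1)]
```

Because every candidate is verified against the cached content, stale or over-broad postings can only cost time, never produce a wrong hit.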
| Feature | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | ctags |
|---|---|---|---|---|---|---|
| Structural parsing | β | β | β | β | β | β |
| Trigram search index | β | β | β | β | β | β |
| Inverted word index | β | β | β | β | β | β |
| Dependency graph | β | β | β | β | β | β |
| Version tracking | β | β | β | β | β | β |
| Multi-agent locking | β | β | β | β | β | β |
| Pre-indexed (warm) | β | β | β | β | β | β |
| No process startup | β | β | β | β | β | β |
| MCP protocol | β | β | β | β | β | β |
| Full-text search | β | β | β | β | β | β |
| Atomic file edits | β | β | β | β | β | β |
| File watcher | β | β | β | β | β | β |
codedb = tree-sitter + search index + dependency graph + agent runtime. Zero external dependencies. Pure Zig. Single binary.
┌─────────────┐  ┌─────────────┐
│  HTTP :7719 │  │  MCP stdio  │
│  server.zig │  │   mcp.zig   │
└──────┬──────┘  └──────┬──────┘
       │                │
       └────────┬───────┘
                │
     ┌──────────▼──────────┐
     │      Explorer       │
     │     explore.zig     │
     │  ┌───────────────┐  │
     │  │ WordIndex     │  │
     │  │ TrigramIndex  │  │
     │  │ Outlines      │  │
     │  │ Contents      │  │
     │  │ DepGraph      │  │
     │  └───────────────┘  │
     └──────────┬──────────┘
                │
     ┌──────────▼──────────┐
     │        Store        │─── data.log
     │      store.zig      │
     └──────────┬──────────┘
                │
     ┌──────────▼──────────┐
     │       Watcher       │ ← polls every 2s
     │     watcher.zig     │
     │  (FilteredWalker)   │
     └─────────────────────┘
No SQLite. No dependencies. Purpose-built data model:
- Explorer: structural index engine. Parses Zig, Python, TypeScript/JavaScript. Maintains outlines, trigram index, inverted word index, content cache, and dependency graph behind a single mutex.
- Store: append-only version log. Every mutation (snapshot, edit, delete) gets a monotonically increasing sequence number. Version history capped at 100 per file.
- Watcher: polling file watcher (2s interval). `FilteredWalker` prunes `.git`, `node_modules`, `zig-cache`, `__pycache__`, etc. before descending.
- Agents: first-class structs with cursors, heartbeats, and exclusive file locks. Stale agents are reaped after 30s.
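As a rough model of the Store described above, the following Python sketch keeps a global sequence counter and a bounded per-file history, and answers "what changed since sequence N" queries. It is an in-memory illustration only: the names are hypothetical, and persistence to `data.log` is left out.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

MAX_VERSIONS_PER_FILE = 100  # cap stated in the list above


@dataclass(frozen=True)
class Version:
    seq: int
    path: str
    op: str  # "snapshot", "edit", or "delete"


class Store:
    def __init__(self) -> None:
        self.seq = 0
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.history: dict[str, deque[Version]] = defaultdict(
            lambda: deque(maxlen=MAX_VERSIONS_PER_FILE)
        )

    def record(self, path: str, op: str) -> int:
        """Append one mutation and return its sequence number."""
        self.seq += 1
        self.history[path].append(Version(self.seq, path, op))
        return self.seq

    def changes_since(self, since: int) -> list[str]:
        """Paths with at least one mutation newer than `since`."""
        return sorted(
            path
            for path, versions in self.history.items()
            if versions and versions[-1].seq > since
        )


store = Store()
store.record("src/main.zig", "snapshot")   # seq 1
store.record("src/store.zig", "snapshot")  # seq 2
store.record("src/main.zig", "edit")       # seq 3
print(store.changes_since(2))  # ['src/main.zig']
```

A client that remembers the last sequence number it saw can poll `changes_since` and re-read only the files that actually moved.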
| Thread | Role |
|---|---|
| Main | HTTP accept loop or MCP read loop |
| Watcher | Polls filesystem every 2s via FilteredWalker |
| ISR | Rebuilds snapshot when stale flag is set |
| Reap | Cleans up stale agents every 5s |
| Per-connection | HTTP server spawns a thread per connection |
All threads share a shutdown: atomic.Value(bool) for graceful termination.
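The interplay of heartbeats, exclusive locks, the reap thread, and the shared shutdown flag can be sketched as follows. This is an assumption-laden Python model, not the Zig source: `threading.Event` stands in for `atomic.Value(bool)`, the class and method names are hypothetical, and only the 30s and 5s constants come from the text above.

```python
import threading
import time

STALE_AFTER_S = 30.0   # stale threshold stated above
REAP_INTERVAL_S = 5.0  # reap interval from the thread table


class AgentRegistry:
    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._mutex = threading.Lock()
        self._last_beat: dict[str, float] = {}
        self._locks: dict[str, str] = {}  # path -> agent id holding the lock

    def heartbeat(self, agent: str) -> None:
        with self._mutex:
            self._last_beat[agent] = self._clock()

    def try_lock(self, agent: str, path: str) -> bool:
        """Grant an exclusive lock unless another agent already holds it."""
        with self._mutex:
            return self._locks.setdefault(path, agent) == agent

    def reap(self) -> list[str]:
        """Drop agents whose last heartbeat is too old and free their locks."""
        with self._mutex:
            now = self._clock()
            stale = [a for a, t in self._last_beat.items()
                     if now - t > STALE_AFTER_S]
            for agent in stale:
                del self._last_beat[agent]
            self._locks = {p: a for p, a in self._locks.items()
                           if a not in stale}
            return stale


def reap_loop(registry: AgentRegistry, shutdown: threading.Event) -> None:
    # Event.wait doubles as an interruptible sleep: it returns True as soon
    # as shutdown is set, so the thread exits without waiting a full interval.
    while not shutdown.wait(REAP_INTERVAL_S):
        registry.reap()


shutdown = threading.Event()
worker = threading.Thread(target=reap_loop, args=(AgentRegistry(), shutdown))
worker.start()
shutdown.set()  # graceful termination
worker.join()
```

Injecting the clock keeps the staleness logic testable without real 30-second waits.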
codedb collects anonymous usage telemetry to improve the tool. Telemetry is written to ~/.codedb/telemetry.ndjson and synced to the codedb analytics endpoint on session close. No source code, file contents, file paths, or search queries are collected; only aggregate tool call counts, latency, and startup stats are recorded.
| Location | Contents | Purpose |
|---|---|---|
| `~/.codedb/projects/<hash>/` | Trigram index, frequency table, data log | Persistent index cache |
| `~/.codedb/telemetry.ndjson` | Aggregate tool calls and startup stats | Local telemetry log |
| `./codedb.snapshot` | File tree, outlines, content, frequency table | Portable snapshot for instant MCP startup |
Not stored: No source code is sent anywhere. No file contents, file paths, or search queries are collected in telemetry. Sensitive files auto-excluded (.env*, credentials.json, secrets.*, .pem, .key, SSH keys, AWS configs).
To disable the local telemetry log entirely, set CODEDB_NO_TELEMETRY=1.
To sync the local NDJSON file into Postgres for analysis or dashboards, use scripts/sync-telemetry.py with the schema in docs/telemetry/postgres-schema.sql. The data flow is documented in docs/telemetry.md.
rm -rf ~/.codedb/ # clear all cached indexes
rm -f codedb.snapshot # remove snapshot from project

Requirements: Zig 0.15+
git clone https://github.com/justrach/codedb.git
cd codedb
zig build # debug build
zig build -Doptimize=ReleaseFast # release build
zig build test # run tests
zig build bench # run benchmarks

Binary: zig-out/bin/codedb
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-macos

./release.sh 0.2.0 # build, codesign, notarize, upload to GitHub Releases
./release.sh 0.2.0 --dry-run # preview without executing

See LICENSE for details.
