feat: fuzzy file search — codedb_find with typo-tolerant matching #163
Status: Closed
Labels: priority:p2 (Medium priority)
Description
Problem
Agents currently use codedb_tree (which dumps the entire file tree) or codedb_search (substring match) to find files. Neither supports fuzzy or typo-tolerant matching. An agent looking for auth_middleware.py has to know the exact name: a query like authmiddleware or auth_midlware returns nothing.
Proposed Solution
Add a codedb_find MCP tool for fuzzy file search:
{"name": "codedb_find", "arguments": {"query": "authmidlware"}}
Returns ranked file matches:
1. src/auth_middleware.py (score: 0.92)
2. src/middleware/auth.py (score: 0.85)
3. tests/test_auth_middleware.py (score: 0.78)
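A minimal sketch of how a handler might render one ranked entry in the format shown above. The Match struct and the formatMatch function are hypothetical names, not part of codedb. std.fmt.bufPrint writes into a caller-supplied buffer, so rendering needs no allocation.

```zig
const std = @import("std");

/// Hypothetical result type for codedb_find; field names are illustrative.
const Match = struct {
    path: []const u8,
    score: f32,
};

/// Renders one ranked result as "N. path (score: S)", with S to two decimals.
/// Returns error.NoSpaceLeft if `buf` is too small.
fn formatMatch(buf: []u8, rank: usize, m: Match) ![]u8 {
    return std.fmt.bufPrint(buf, "{d}. {s} (score: {d:.2})", .{ rank, m.path, m.score });
}
```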
Matching algorithm
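Use subsequence matching with scoring: the query characters must appear in order in the path, but gaps are allowed. This tolerates missing characters (authmidlware still matches auth_middleware.py) but not transposed or substituted ones. Score based on: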
- Consecutive matches: auth matching auth_middleware at position 0 scores higher than scattered matches
- Word boundary bonus: matching at _, /, . boundaries scores higher
- Path depth penalty: shallower files rank higher
- Filename vs path: matches in the filename score higher than directory matches
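The scoring pseudocode later in this issue covers the consecutive and boundary bonuses, but not the depth penalty or the filename-vs-path preference. One way to add both, sketched under assumed weights: score the basename first and fall back to the full path, then subtract a fixed penalty per directory level. The weights and the names subseqScore and scorePath are hypothetical; unlike the issue's pseudocode, this sketch does not divide by path length.

```zig
const std = @import("std");

// Illustrative weights, not tuned values.
const consecutive_bonus: f32 = 2.0;
const boundary_bonus: f32 = 3.0;
const filename_bonus: f32 = 1.5;
const depth_penalty: f32 = 0.5;

/// Greedy, case-insensitive subsequence score of `query` against `text`.
/// Returns null if some query character cannot be matched in order.
fn subseqScore(query: []const u8, text: []const u8) ?f32 {
    var qi: usize = 0;
    var score: f32 = 0;
    var prev: ?usize = null;
    for (text, 0..) |c, ti| {
        if (qi == query.len) break;
        if (std.ascii.toLower(c) != std.ascii.toLower(query[qi])) continue;
        score += 1.0; // base match
        if (prev) |p| {
            if (ti == p + 1) score += consecutive_bonus;
        }
        // `or` short-circuits, so text[ti - 1] is never read when ti == 0.
        const at_boundary = ti == 0 or text[ti - 1] == '/' or text[ti - 1] == '_' or text[ti - 1] == '.';
        if (at_boundary) score += boundary_bonus;
        prev = ti;
        qi += 1;
    }
    return if (qi == query.len) score else null;
}

/// Hypothetical combined scorer: prefers matches inside the filename and
/// subtracts a penalty for each '/' (directory level) in the path.
fn scorePath(query: []const u8, path: []const u8) ?f32 {
    var depth: usize = 0;
    for (path) |c| {
        if (c == '/') depth += 1;
    }
    const penalty = depth_penalty * @as(f32, @floatFromInt(depth));
    const name_start = if (std.mem.lastIndexOfScalar(u8, path, '/')) |i| i + 1 else 0;
    // Filename-only match first, so directory characters cannot "steal" the match.
    if (subseqScore(query, path[name_start..])) |s| return s + filename_bonus - penalty;
    if (subseqScore(query, path)) |s| return s - penalty;
    return null;
}
```

Scoring the basename separately sidesteps a weakness of greedy leftmost matching: on src/middleware/auth.py, a greedy scan over the whole path would consume the a in "middleware" before ever reaching auth.py.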
Implementation
- Search against explorer.outlines keys (all indexed file paths)
- No new index needed: just iterate paths and score
- Return top 10 results sorted by score
- Add to tools_list in src/mcp.zig with description: "Fuzzy file search — finds files by approximate name. Typo-tolerant. Use when you know roughly what file you are looking for but not the exact path."
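A sketch of the top-10 selection, assuming the keys of explorer.outlines can be exposed as a slice of path strings (the issue does not specify the container type). simpleScore is a hypothetical stand-in for the real scorer, and topMatches is a hypothetical name. It keeps a fixed-size buffer ordered by descending score, so memory stays constant regardless of how many files are indexed.

```zig
const std = @import("std");

const max_results = 10; // "Return top 10 results sorted by score"

const Match = struct { path: []const u8, score: f32 };

/// Stand-in scorer (hypothetical): case-insensitive subsequence match,
/// +1 per matched char, +2 extra when the match is consecutive.
fn simpleScore(query: []const u8, path: []const u8) ?f32 {
    var qi: usize = 0;
    var score: f32 = 0;
    var prev: ?usize = null;
    for (path, 0..) |c, pi| {
        if (qi == query.len) break;
        if (std.ascii.toLower(c) != std.ascii.toLower(query[qi])) continue;
        score += 1.0;
        if (prev) |p| {
            if (pi == p + 1) score += 2.0;
        }
        prev = pi;
        qi += 1;
    }
    return if (qi == query.len) score else null;
}

/// Scores every path and keeps the best `max_results` in `out`, sorted by
/// descending score (ties keep input order). Returns the number of filled slots.
fn topMatches(query: []const u8, paths: []const []const u8, out: *[max_results]Match) usize {
    var n: usize = 0;
    for (paths) |path| {
        const s = simpleScore(query, path) orelse continue;
        // Walk left past every kept entry with a strictly lower score.
        var i: usize = n;
        while (i > 0 and out[i - 1].score < s) : (i -= 1) {}
        if (i == max_results) continue; // buffer full and this one ranks last
        // Shift lower-ranked entries right; when full, the last one falls off.
        var j: usize = if (n < max_results) n else max_results - 1;
        while (j > i) : (j -= 1) {
            out[j] = out[j - 1];
        }
        out[i] = .{ .path = path, .score = s };
        if (n < max_results) n += 1;
    }
    return n;
}
```

For a limit this small, insertion into a 10-slot array is simpler than collecting and sorting every scored path; a heap would be the more natural structure only if the limit grew large.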
Scoring function (Zig pseudocode)
const std = @import("std");

fn fuzzyScore(query: []const u8, path: []const u8) ?f32 {
    if (path.len == 0) return null; // avoid dividing by zero below
    // Subsequence match: all query chars must appear in order.
    // Greedy leftmost matching is cheap but may miss the best-scoring alignment.
    var qi: usize = 0;
    var score: f32 = 0;
    var prev_match: ?usize = null;
    for (path, 0..) |ch, pi| {
        if (qi < query.len and std.ascii.toLower(ch) == std.ascii.toLower(query[qi])) {
            // Consecutive match bonus
            if (prev_match != null and pi == prev_match.? + 1) score += 2.0;
            // Word boundary bonus: match at the start or right after '/', '_' or '.'
            if (pi == 0 or path[pi - 1] == '/' or path[pi - 1] == '_' or path[pi - 1] == '.') score += 3.0;
            // Base match
            score += 1.0;
            prev_match = pi;
            qi += 1;
        }
    }
    if (qi < query.len) return null; // not all chars matched
    // Normalize by path length, which also penalizes long paths
    return score / @as(f32, @floatFromInt(path.len));
}
Files to modify
- src/mcp.zig: add codedb_find to the Tool enum, tools_list, and dispatch; add handleFind
- src/explore.zig: add a fuzzyFindFiles method
- src/tests.zig: test cases for fuzzy matching
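To illustrate the src/mcp.zig change, here is a heavily simplified, hypothetical Tool enum and dispatch. Only the tool names come from this issue; the returned handler labels (handleTree, handleSearch) are assumptions, and the real dispatch would call handleFind rather than return a string. The design point: a Zig switch over an enum must be exhaustive, so once codedb_find is added to the enum, the compiler rejects any dispatch switch that lacks an arm for it.

```zig
const std = @import("std");

/// Hypothetical subset of the Tool enum in src/mcp.zig; codedb_find is the new variant.
const Tool = enum {
    codedb_tree,
    codedb_search,
    codedb_find,
};

/// Maps an MCP tool name to a handler label (stand-in for calling the handler).
fn dispatch(name: []const u8) []const u8 {
    const tool = std.meta.stringToEnum(Tool, name) orelse return "unknown tool";
    // Exhaustive switch: removing any arm below is a compile error.
    return switch (tool) {
        .codedb_tree => "handleTree",
        .codedb_search => "handleSearch",
        .codedb_find => "handleFind",
    };
}
```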
Test cases
test "fuzzy: exact match scores highest" { ... }
test "fuzzy: subsequence match works" { ... }
test "fuzzy: typo-tolerant (missing char)" { ... }
test "fuzzy: word boundary bonus" { ... }
test "fuzzy: filename ranks above directory" { ... }
Why this matters
Token efficiency — agents currently dump the full tree (hundreds of tokens) or grep broadly just to find a file. A fuzzy find returns 10 ranked results in ~50 tokens.