
feat: fuzzy file search — codedb_find with typo-tolerant matching #163

@justrach


Problem

Agents currently use codedb_tree (dumps entire file tree) or codedb_search (substring match) to find files. Neither supports fuzzy/typo-tolerant matching. An agent looking for auth_middleware.py has to know the exact name — typing authmiddleware or auth_midlware returns nothing.

Proposed Solution

Add a codedb_find MCP tool for fuzzy file search:

{"name": "codedb_find", "arguments": {"query": "authmidlware"}}

Returns ranked file matches (example output; the scores are illustrative):

1. src/auth_middleware.py (score: 0.92)
2. src/middleware/auth.py (score: 0.85)
3. tests/test_auth_middleware.py (score: 0.78)
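A minimal sketch of how a handler could read query out of the call above, assuming Zig 0.11 or later (for std.json.parseFromSlice). FindCall and parseFindCall are hypothetical names that mirror only the fields shown here; the real request handling in src/mcp.zig may look different.

```zig
const std = @import("std");

// Hypothetical shape of the codedb_find call shown above; codedb's real
// request types in src/mcp.zig may differ.
const FindCall = struct {
    name: []const u8,
    arguments: struct {
        query: []const u8,
    },
};

/// Parses a codedb_find call. The caller must call deinit() on the result,
/// which frees every string the parser allocated.
fn parseFindCall(allocator: std.mem.Allocator, body: []const u8) !std.json.Parsed(FindCall) {
    // Ignore fields this sketch does not model instead of failing on them.
    return std.json.parseFromSlice(FindCall, allocator, body, .{ .ignore_unknown_fields = true });
}
```

Setting ignore_unknown_fields lets the same struct accept calls that carry extra arguments the tool does not use yet.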

Matching algorithm

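Use subsequence matching with scoring: the query characters must appear in order in the file path, but gaps are allowed. This tolerates omitted characters (authmidlware still matches auth_middleware.py) but not substituted or transposed ones. Score based on: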

  1. Consecutive matches: auth matching auth_middleware at position 0 scores higher than scattered matches
  2. Word boundary bonus: matching at _, /, . boundaries scores higher
  3. Path depth penalty: shallower files rank higher
  4. Filename vs path: matches in the filename score higher than directory matches
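The four criteria can be combined in one scorer. The sketch below is one possible reading, not the issue's scoring function: the filename bonus and depth penalty weights are hypothetical (the issue gives values only for the consecutive and boundary bonuses), the depth penalty is a flat cost per /, and the filename preference is implemented by first trying to match inside the basename and falling back to the full path. rawScore and scorePath are illustrative names.

```zig
const std = @import("std");

// Consecutive and boundary weights follow the issue; the other two are hypothetical.
const consecutive_bonus: f32 = 2.0;
const boundary_bonus: f32 = 3.0;
const filename_bonus: f32 = 5.0;
const depth_penalty: f32 = 1.0; // per '/' in the path

/// Subsequence score of `query` in `text` (case-insensitive, greedy
/// left-to-right), or null if some query char cannot be matched in order.
fn rawScore(query: []const u8, text: []const u8) ?f32 {
    var qi: usize = 0;
    var score: f32 = 0;
    var prev: ?usize = null;
    for (text, 0..) |ch, i| {
        if (qi == query.len) break;
        if (std.ascii.toLower(ch) != std.ascii.toLower(query[qi])) continue;
        score += 1.0; // base match
        if (prev) |p| {
            if (i == p + 1) score += consecutive_bonus;
        }
        const at_boundary = i == 0 or text[i - 1] == '/' or text[i - 1] == '_' or text[i - 1] == '.';
        if (at_boundary) score += boundary_bonus;
        prev = i;
        qi += 1;
    }
    return if (qi == query.len) score else null;
}

/// Prefers a match inside the filename, falls back to the full path,
/// and subtracts a penalty for each directory level.
fn scorePath(query: []const u8, path: []const u8) ?f32 {
    const base_start = if (std.mem.lastIndexOfScalar(u8, path, '/')) |slash| slash + 1 else 0;
    var depth: usize = 0;
    for (path) |c| {
        if (c == '/') depth += 1;
    }
    const penalty = depth_penalty * @as(f32, @floatFromInt(depth));
    if (rawScore(query, path[base_start..])) |s| return s + filename_bonus - penalty;
    if (rawScore(query, path)) |s| return s - penalty;
    return null;
}
```

Trying the basename first also sidesteps a weakness of greedy left-to-right matching, which would otherwise consume matching characters in a directory name even when the filename contains a better match.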

Implementation

  • Search against explorer.outlines keys (all indexed file paths)
  • No new index needed — just iterate paths and score
  • Return top 10 results sorted by score
  • Add to tools_list in src/mcp.zig with description: "Fuzzy file search — finds files by approximate name. Typo-tolerant. Use when you know roughly what file you are looking for but not the exact path."
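A sketch of the "iterate paths and score" step, assuming the explorer.outlines keys have been collected into a slice of paths. fuzzyFindFiles and Match are hypothetical names, and fuzzyScore here is a compilable version of the issue's scoring function. Passing a 10-element out buffer gives the proposed top 10; keeping a small sorted buffer avoids allocating for, or sorting, every match.

```zig
const std = @import("std");

const Match = struct { path: []const u8, score: f32 };

/// The issue's scoring function: boundary and consecutive bonuses,
/// normalized by path length. Null when the query is not a subsequence.
fn fuzzyScore(query: []const u8, path: []const u8) ?f32 {
    if (path.len == 0) return null;
    var qi: usize = 0;
    var score: f32 = 0;
    var prev: ?usize = null;
    for (path, 0..) |ch, i| {
        if (qi == query.len) break;
        if (std.ascii.toLower(ch) != std.ascii.toLower(query[qi])) continue;
        score += 1.0;
        if (prev) |p| {
            if (i == p + 1) score += 2.0;
        }
        if (i == 0 or path[i - 1] == '/' or path[i - 1] == '_' or path[i - 1] == '.') score += 3.0;
        prev = i;
        qi += 1;
    }
    if (qi < query.len) return null;
    return score / @as(f32, @floatFromInt(path.len));
}

/// Hypothetical helper: scores every path and keeps the best `out.len`
/// matches in `out`, highest score first. Returns how many were written.
fn fuzzyFindFiles(query: []const u8, paths: []const []const u8, out: []Match) usize {
    var n: usize = 0;
    for (paths) |path| {
        const score = fuzzyScore(query, path) orelse continue;
        // Walk up from the end of the kept list to find the insertion slot.
        var pos: usize = n;
        while (pos > 0 and out[pos - 1].score < score) : (pos -= 1) {}
        if (pos >= out.len) continue; // no better than anything already kept
        // Shift lower-ranked entries down one slot; when full, the last one falls off.
        var j: usize = if (n < out.len) n else out.len - 1;
        while (j > pos) : (j -= 1) {
            out[j] = out[j - 1];
        }
        out[pos] = .{ .path = path, .score = score };
        if (n < out.len) n += 1;
    }
    return n;
}
```

With strict subsequence matching, src/middleware/auth.py does not match authmidlware at all (no m follows auth in that path), so a result like the second entry in the example output would require a different matcher.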

Scoring function (Zig pseudocode)

const std = @import("std");

/// Returns a score if every character of `query` appears in `path` in order
/// (case-insensitive, gaps allowed); returns null otherwise.
fn fuzzyScore(query: []const u8, path: []const u8) ?f32 {
    if (path.len == 0) return null; // nothing to match, and avoids dividing by zero
    var qi: usize = 0;
    var score: f32 = 0;
    var prev_match: ?usize = null;
    for (path, 0..) |ch, pi| {
        if (qi == query.len) break; // every query char already matched
        if (std.ascii.toLower(ch) != std.ascii.toLower(query[qi])) continue;
        // Consecutive match bonus
        if (prev_match) |prev| {
            if (pi == prev + 1) score += 2.0;
        }
        // Word boundary bonus: start of path, or right after '/', '_' or '.'
        if (pi == 0 or path[pi - 1] == '/' or path[pi - 1] == '_' or path[pi - 1] == '.') score += 3.0;
        // Base match
        score += 1.0;
        prev_match = pi;
        qi += 1;
    }
    if (qi < query.len) return null; // not all chars matched
    // Normalize by path length, so shorter paths rank higher
    return score / @as(f32, @floatFromInt(path.len));
}

Files to modify

  • src/mcp.zig — add codedb_find to Tool enum, tools_list, dispatch, handleFind
  • src/explore.zig — add fuzzyFindFiles method
  • src/tests.zig — test cases for fuzzy matching

Test cases

test "fuzzy: exact match scores highest" { ... }
test "fuzzy: subsequence match works" { ... }
test "fuzzy: typo-tolerant (missing char)" { ... }
test "fuzzy: word boundary bonus" { ... }
test "fuzzy: filename ranks above directory" { ... }
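One way the first four test stubs could be filled in, assuming the issue's scoring function made compilable (copied in so the block stands alone); the inputs are hypothetical. The fifth stub is left out on purpose: with this scorer, query auth scores 13/11 against both src/auth.py and auth/src.py, so "filename ranks above directory" only holds once an explicit filename bonus is added.

```zig
const std = @import("std");

/// The issue's scoring function: boundary and consecutive bonuses,
/// normalized by path length. Null when the query is not a subsequence.
fn fuzzyScore(query: []const u8, path: []const u8) ?f32 {
    if (path.len == 0) return null;
    var qi: usize = 0;
    var score: f32 = 0;
    var prev: ?usize = null;
    for (path, 0..) |ch, i| {
        if (qi == query.len) break;
        if (std.ascii.toLower(ch) != std.ascii.toLower(query[qi])) continue;
        score += 1.0;
        if (prev) |p| {
            if (i == p + 1) score += 2.0;
        }
        if (i == 0 or path[i - 1] == '/' or path[i - 1] == '_' or path[i - 1] == '.') score += 3.0;
        prev = i;
        qi += 1;
    }
    if (qi < query.len) return null;
    return score / @as(f32, @floatFromInt(path.len));
}

test "fuzzy: exact match scores highest" {
    const exact = fuzzyScore("auth.py", "auth.py").?;
    const nested = fuzzyScore("auth.py", "src/auth.py").?;
    try std.testing.expect(exact > nested);
}

test "fuzzy: subsequence match works" {
    try std.testing.expect(fuzzyScore("amw", "auth_middleware.py") != null);
    // Out-of-order characters do not match.
    try std.testing.expect(fuzzyScore("wma", "auth_middleware.py") == null);
}

test "fuzzy: typo-tolerant (missing char)" {
    try std.testing.expect(fuzzyScore("auth_midlware", "src/auth_middleware.py") != null);
}

test "fuzzy: word boundary bonus" {
    // Equal-length paths, so only the boundary bonus differs.
    const at_boundary = fuzzyScore("m", "a_m.py").?;
    const mid_word = fuzzyScore("m", "axm.py").?;
    try std.testing.expect(at_boundary > mid_word);
}
```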

Why this matters

Token efficiency — agents currently dump the full tree (hundreds of tokens) or grep broadly just to find a file. A fuzzy find returns 10 ranked results in ~50 tokens.
