fix: --help flag + re-scan on git branch switch #159
#150: Add --help dispatch after --version handler. Now codedb --help, codedb -h, and codedb help all print usage text.

#116: The incremental file watcher now checks git HEAD every 2s poll cycle. If HEAD changed (branch switch, checkout, rebase), triggers a full re-scan of the directory tree: clears the known-file map and re-indexes everything. This ensures the MCP server always serves data from the current branch, not a stale snapshot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e911e3cc4a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    var kiter = known.iterator();
    while (kiter.next()) |kv| backing.free(kv.key_ptr.*);
    known.clearRetainingCapacity();
Remove stale files before resetting known map
When head_changed is true, this block frees keys and clears known immediately, but it never removes those paths from Explorer/Store. In a branch switch where files are deleted (present on old HEAD, absent on new HEAD), those paths are dropped before incrementalDiff can emit delete handling, so explorer.removeFile/store.recordDelete are never called and stale symbols/content remain searchable. This keeps old-branch data visible after checkout and defeats the goal of the rescan.
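One way to picture the missing step is to compute, before the known map is cleared, which previously known paths no longer exist on the new HEAD. The sketch below is only an illustration under assumptions: `containsPath` and `stalePaths` are hypothetical helpers working on plain slices, not the PR's `known` map, and the caller would still have to invoke `explorer.removeFile` and `store.recordDelete` for each returned path.

```zig
const std = @import("std");

/// Hypothetical helper: true if `path` appears in `paths`.
fn containsPath(paths: []const []const u8, path: []const u8) bool {
    for (paths) |p| {
        if (std.mem.eql(u8, p, path)) return true;
    }
    return false;
}

/// Hypothetical helper: copies into `out` every path from `known` that is
/// missing from `current`, and returns the filled prefix of `out`.
/// Stops early if `out` is full.
fn stalePaths(
    known: []const []const u8,
    current: []const []const u8,
    out: [][]const u8,
) [][]const u8 {
    var n: usize = 0;
    for (known) |path| {
        if (n == out.len) break;
        if (!containsPath(current, path)) {
            // Present on the old HEAD, absent on the new one: needs a delete.
            out[n] = path;
            n += 1;
        }
    }
    return out[0..n];
}
```

The linear search keeps the sketch short; a real implementation would more likely probe a hash map of the freshly walked paths.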
- issue-150: verify --help and -h both print usage text
- issue-116: verify getGitHead returns valid 40-char hex SHA
- E2E verified: branch switch correctly re-indexes (mainBranch found on main, featureBranch found on feature, no cross-contamination)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
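For illustration, the issue-116 check can be reduced to two pure functions: validate that a value looks like a 40-character hex SHA, then compare the previous and current HEAD. `isHexSha` and `headChanged` are hypothetical names, not the PR's `getGitHead`, and how the SHA is obtained is deliberately left out of this sketch.

```zig
const std = @import("std");

/// Hypothetical helper: true if `s` is exactly 40 hexadecimal characters.
fn isHexSha(s: []const u8) bool {
    if (s.len != 40) return false;
    for (s) |c| {
        switch (c) {
            '0'...'9', 'a'...'f', 'A'...'F' => {},
            else => return false,
        }
    }
    return true;
}

/// Hypothetical helper: a re-scan is warranted only when both values are
/// valid SHAs and they differ. Invalid input is treated as "no change".
fn headChanged(previous: []const u8, current: []const u8) bool {
    if (!isHexSha(previous) or !isHexSha(current)) return false;
    return !std.mem.eql(u8, previous, current);
}
```

Treating an unreadable HEAD as "no change" is a design choice of this sketch; it avoids a full re-scan on every poll when git is unavailable.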
Benchmark Regression Report
Threshold: 10.00%
💡 Codex Review
Reviewed commit: f90a736865
src/watcher.zig
Outdated
    while (walker.next() catch null) |entry| {
        const stat = dir.statFile(entry.path) catch continue;
        _ = store.recordSnapshot(entry.path, stat.size, 0) catch {};
        indexFileContent(explorer, dir, entry.path, backing, false) catch {};
Preserve trigram cap on git-HEAD full rescan
On a HEAD change, this rescan path always calls indexFileContent(..., false) for every file, so trigram indexing is forced for the entire repository. In the same file, initialScan explicitly caps trigram indexing after 15,000 files to avoid OOM, but that safeguard is skipped here; switching branches in large repos can therefore trigger a much heavier reindex (memory spikes and major latency regression) than startup behavior. The HEAD-change path should apply the same max_trigram_files/effective_skip logic.
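A minimal sketch of the cap logic this comment asks for, assuming the last argument of `indexFileContent` is a "skip trigram" flag and that the cap is the 15,000-file figure quoted above. `effectiveSkip` here is a hypothetical stand-alone function, not the code in `initialScan`.

```zig
const std = @import("std");

/// Assumed cap, taken from the 15,000-file figure in the review comment.
const max_trigram_files: usize = 15_000;

/// Hypothetical helper: decides whether trigram indexing is skipped for the
/// next file, given how many files have been indexed so far in this scan.
fn effectiveSkip(files_indexed: usize, skip_requested: bool) bool {
    return skip_requested or files_indexed >= max_trigram_files;
}
```

In the rescan loop, the result would replace the hard-coded `false`, with `files_indexed` incremented once per file.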
Load .codedbignore from project root on startup. One pattern per line:
- Directory names: vendor/ (trailing slash)
- Exact names: external.zig (matches at any depth)
- Path prefixes: docs/internal (matches path start)
- Glob suffixes: *.log (matches file extension)
- Comments: lines starting with # are ignored

Patterns are checked in FilteredWalker for both directories and files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Load .gitignore patterns alongside .codedbignore on startup. Both files are parsed with the same pattern matcher:
- Directory names: vendor/ (trailing slash)
- Exact names, path prefixes, glob suffixes (*.log)
- Comments (#) and negation (!) patterns skipped

.codedbignore takes precedence (loaded first). .gitignore provides baseline ignore rules without requiring any config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: 97c0049454
src/watcher.zig
Outdated
    // Exact name match (matches at any depth)
    if (std.mem.eql(u8, name, pattern)) return true;
    // Path prefix match
    if (std.mem.startsWith(u8, full_path, pattern)) return true;
Match ignore patterns on path components, not prefixes
In FilteredWalker.isIgnored, the startsWith(full_path, pattern) rule over-matches bare patterns from .codedbignore/.gitignore: for example, a pattern like build will also exclude build.gradle or buildkite/... because they share the same prefix. That silently drops unrelated files from snapshot/index/search, which is a functional regression for repositories that use common short ignore tokens.
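A component-aware prefix test could look roughly like the following. `matchesPathComponents` is a hypothetical helper, not the PR's `isIgnored`; it assumes relative paths with `/` separators and a pattern whose trailing slash, if any, has already been stripped.

```zig
const std = @import("std");

/// Hypothetical helper: true when `pattern` names `full_path` itself or one
/// of its leading directories, compared on whole path components.
fn matchesPathComponents(full_path: []const u8, pattern: []const u8) bool {
    if (pattern.len == 0) return false;
    if (!std.mem.startsWith(u8, full_path, pattern)) return false;
    // Either the whole path matched, or the match ends at a '/' boundary.
    return full_path.len == pattern.len or full_path[pattern.len] == '/';
}
```

With this rule a pattern such as `build` covers `build/out.o` but leaves `build.gradle` and `buildkite/...` alone, and a path pattern also matches the exact file it names.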
Tool descriptions now guide agents toward efficient usage:
- codedb_outline: "START HERE", always use before reading files
- codedb_read: warns against full-file reads, suggests line ranges
- codedb_search: suggests max_results=10 for broad queries
- codedb_symbol: clarifies it finds definitions, not text matches
- codedb_bundle: warns about response size, suggests outline+symbol

Bundle response capped at 200KB: truncates with a warning message suggesting outline + targeted reads instead of full file reads. Prevents the 3.2M character responses reported in #160.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: 5c6759f42a
src/watcher.zig
Outdated
    if (std.mem.eql(u8, name, pattern)) return true;
    // Path prefix match
    if (std.mem.startsWith(u8, full_path, pattern)) return true;
Honor root-anchored ignore patterns
Patterns that start with / (for example /secrets/ or /config.local) never match because matching is done against relative paths without a leading slash, so files users explicitly ignore at repo root are still indexed and searchable. This is a security/privacy regression in repos that rely on root-anchored .gitignore/.codedbignore rules to keep sensitive project files out of the index.
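One possible shape for root-anchored handling is to strip the leading `/` and then match against the relative path from its start. `matchesRootAnchored` is a hypothetical helper written for illustration; it assumes relative paths without a leading slash and, for brevity, does not distinguish files from directories.

```zig
const std = @import("std");

/// Hypothetical helper: matches a root-anchored pattern such as "/secrets/"
/// or "/config.local" against a path relative to the project root.
fn matchesRootAnchored(rel_path: []const u8, pattern: []const u8) bool {
    if (pattern.len == 0 or pattern[0] != '/') return false;
    // Drop the leading '/' and one optional trailing '/'.
    var anchored = pattern[1..];
    if (anchored.len > 0 and anchored[anchored.len - 1] == '/') {
        anchored = anchored[0 .. anchored.len - 1];
    }
    if (anchored.len == 0) return false;
    if (!std.mem.startsWith(u8, rel_path, anchored)) return false;
    // Match the entry itself or anything beneath it, never a longer name.
    return rel_path.len == anchored.len or rel_path[anchored.len] == '/';
}
```

Because the comparison starts at the beginning of the relative path, `/secrets/` covers `secrets/key.pem` at the root but not `src/secrets/key.pem`.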
    if (std.mem.endsWith(u8, pattern, "/")) {
        const dir_name = pattern[0 .. pattern.len - 1];
        if (std.mem.eql(u8, name, dir_name)) return true;
    }
Restrict trailing-slash patterns to directories
Directory-only patterns (ending in /) are matched using only the entry name, so a pattern like build/ also suppresses a regular file named build. In repositories that use these common .gitignore rules, this will silently drop non-directory files from indexing/search results and produce incomplete code intelligence.
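The directory-only restriction can be sketched by threading the entry kind into the check. `matchesDirPattern` and its `is_dir` parameter are assumptions for illustration; the walker would need to supply the entry kind from whatever it already knows about the entry.

```zig
const std = @import("std");

/// Hypothetical helper: a pattern ending in '/' matches only directories
/// whose name equals the pattern without the trailing slash.
fn matchesDirPattern(name: []const u8, pattern: []const u8, is_dir: bool) bool {
    if (!is_dir) return false;
    if (!std.mem.endsWith(u8, pattern, "/")) return false;
    return std.mem.eql(u8, name, pattern[0 .. pattern.len - 1]);
}
```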
src/mcp.zig
Outdated
    out.appendSlice(alloc, sub_out.items) catch {};
    w.writeAll("\n") catch {};

    // Cap total response at 200KB to prevent token limit blowouts
    if (out.items.len > 200 * 1024) {
Enforce bundle size cap before appending op output
The 200KB cap check runs only after sub_out has already been appended, so a single large bundled op can still emit an arbitrarily large response and defeat the protection against token/response blowouts. This means the truncation guard only stops subsequent ops, not the oversized payload that already exceeded the limit.
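Checking the budget before the append can be expressed as a small length calculation. The sketch below assumes the 200KB figure from the commit message; `allowedAppend` is a hypothetical helper that returns how many bytes of an op's output may still be appended, not the code in src/mcp.zig.

```zig
const std = @import("std");

/// Assumed cap, taken from the 200KB figure in the commit message.
const max_bundle_bytes: usize = 200 * 1024;

/// Hypothetical helper: number of bytes of a sub-result of length `sub_len`
/// that can be appended to a response already holding `current_len` bytes.
fn allowedAppend(current_len: usize, sub_len: usize) usize {
    const remaining: usize = if (current_len >= max_bundle_bytes)
        0
    else
        max_bundle_bytes - current_len;
    return @min(sub_len, remaining);
}
```

A caller would append only `sub_out.items[0..allowedAppend(out.items.len, sub_out.items.len)]` and emit the truncation warning whenever that is shorter than the full sub-result.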
P1: Remove stale files from Explorer on branch switch — files that
existed on old branch but not on new branch are now cleaned up
P1: Re-scan on HEAD change now respects 15k trigram file cap
P2: Path prefix matching in isIgnored requires / boundary — "vendor"
no longer matches "vendor_utils.zig"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P1: Root-anchored patterns (/secrets/) now only match at project root,
not nested paths. Patterns without / match at any depth.
P2: Bundle 200KB cap now checks BEFORE appending sub_out, not after —
prevents a single large result from blowing past the limit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: d060547773
    // Clear known map
    var kiter2 = known.iterator();
    while (kiter2.next()) |kv| backing.free(kv.key_ptr.*);
    known.clearRetainingCapacity();
Emit tombstones before clearing known on HEAD rescan
During the head_changed path, entries are freed and known is cleared without recording deletes in Store, so files removed by a branch switch never produce tombstone updates and codedb_changes clients cannot learn those files were deleted. Fresh evidence in this revision is that the block now does call explorer.removeFile first, but still immediately clears known here with no store.recordDelete(...) loop, leaving store history inconsistent after checkout/rebase workflows.
    if (std.mem.startsWith(u8, full_path, pattern) and
        full_path.len > pattern.len and full_path[pattern.len] == '/') return true;
Match slash ignore rules against exact file paths
Path-based ignore patterns containing / (for example config/local.json in .gitignore or .codedbignore) do not match the exact path because the prefix branch requires full_path.len > pattern.len; as a result, the explicitly ignored file itself is still scanned/indexed unless a separate sensitive-file heuristic catches it. This can leak files users intentionally excluded from indexing/search.
Summary
Fixes #150 and #116.

#150: codedb --help shows "unknown command"
Added --help / -h / help dispatch after the --version handler. 1-line fix.

#116: Stale index when switching git branches
The incremental file watcher now checks git rev-parse HEAD every 2s poll cycle. If HEAD changed (branch switch, checkout, rebase), it clears the known-file map and runs a full re-scan that re-indexes everything. This ensures the MCP server always serves data from the current branch.
The check is cheap (~5ms for git rev-parse HEAD) and only triggers a re-scan when HEAD actually changes; normal file edits on the same branch use the fast incremental diff as before.

Test plan
- codedb --help now prints usage
- zig build test exits 0