fix: --help flag + re-scan on git branch switch #159

Merged: justrach merged 7 commits into main from fix/150-116-help-stale-index on Apr 5, 2026

@justrach (Owner) commented Apr 5, 2026

Summary

Fixes #150 and #116.

#150: codedb --help shows "unknown command"

Added --help / -h / help dispatch after the --version handler. 1-line fix.

#116 — Stale index when switching git branches

The incremental file watcher now checks git rev-parse HEAD every 2s poll cycle. If HEAD changed (branch switch, checkout, rebase):

  1. Clears the known-file map
  2. Full re-scan of the directory tree
  3. Re-indexes all files

This ensures the MCP server always serves data from the current branch.

The check is cheap (~5ms for git rev-parse HEAD) and only triggers a re-scan when HEAD actually changes — normal file edits on the same branch use the fast incremental diff as before.
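
The HEAD comparison described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `HeadTracker` and `observe` are hypothetical names, and the caller is assumed to pass the trimmed output of `git rev-parse HEAD` once per poll cycle.

```zig
const std = @import("std");

/// Hypothetical helper: remembers the last observed HEAD commit.
const HeadTracker = struct {
    last: [40]u8 = undefined,
    has_last: bool = false,

    /// Returns true when `head` differs from the previously observed value.
    /// The first observation only records the value and returns false.
    fn observe(self: *HeadTracker, head: []const u8) bool {
        // Ignore anything that is not a 40-character SHA (e.g. an error message).
        if (head.len != 40) return false;
        const changed = self.has_last and !std.mem.eql(u8, &self.last, head);
        @memcpy(&self.last, head);
        self.has_last = true;
        return changed;
    }
};
```

A watcher loop would call `observe` on each poll and take the full re-scan path only when it returns true, so ordinary edits on the same branch keep using the incremental diff.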

Test plan

  • codedb --help now prints usage
  • All existing tests pass
  • zig build test exit 0

#150: Add --help dispatch after --version handler. Now codedb --help,
codedb -h, and codedb help all print usage text.

#116: The incremental file watcher now checks git HEAD every 2s poll
cycle. If HEAD changed (branch switch, checkout, rebase), triggers a
full re-scan of the directory tree — clears the known-file map and
re-indexes everything. This ensures the MCP server always serves
data from the current branch, not a stale snapshot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e911e3cc4a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +336 to +338
var kiter = known.iterator();
while (kiter.next()) |kv| backing.free(kv.key_ptr.*);
known.clearRetainingCapacity();

P1: Remove stale files before resetting known map

When head_changed is true, this block frees keys and clears known immediately, but it never removes those paths from Explorer/Store. In a branch switch where files are deleted (present on old HEAD, absent on new HEAD), those paths are dropped before incrementalDiff can emit delete handling, so explorer.removeFile/store.recordDelete are never called and stale symbols/content remain searchable. This keeps old-branch data visible after checkout and defeats the goal of the rescan.
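
One way to address this, sketched under assumptions: before the known map is cleared, compare it against the set of paths found on the new HEAD and purge only the ones that disappeared. `FakeIndex` is a stand-in for the real Explorer/Store pair and only counts calls; `purgeMissing` and the `present` set are hypothetical names, not part of the PR.

```zig
const std = @import("std");

/// Stand-in for Explorer/Store; it only counts the calls it receives.
const FakeIndex = struct {
    removed: usize = 0,
    tombstones: usize = 0,

    fn removeFile(self: *FakeIndex, path: []const u8) void {
        _ = path;
        self.removed += 1;
    }

    fn recordDelete(self: *FakeIndex, path: []const u8) void {
        _ = path;
        self.tombstones += 1;
    }
};

/// For every previously known path that is absent from `present`,
/// drop it from the index and record a delete.
fn purgeMissing(
    known: *const std.StringHashMap(u64),
    present: *const std.StringHashMap(void),
    index: *FakeIndex,
) void {
    var it = known.iterator();
    while (it.next()) |kv| {
        const path = kv.key_ptr.*;
        if (present.contains(path)) continue; // still exists on the new HEAD
        index.removeFile(path);
        index.recordDelete(path);
    }
}
```

Only after this pass would the keys be freed and the map cleared, so the paths are still valid while the deletes are emitted.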

- issue-150: verify --help and -h both print usage text
- issue-116: verify getGitHead returns valid 40-char hex SHA
- E2E verified: branch switch correctly re-indexes (mainBranch found
  on main, featureBranch found on feature, no cross-contamination)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 5, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 615449 | 654037 | +6.27% | OK |
| codedb_changes | 108957 | 113341 | +4.02% | OK |
| codedb_deps | 29990 | 29150 | -2.80% | OK |
| codedb_edit | 16928 | 17402 | +2.80% | OK |
| codedb_hot | 157807 | 156501 | -0.83% | OK |
| codedb_outline | 391427 | 372963 | -4.72% | OK |
| codedb_read | 164671 | 163282 | -0.84% | OK |
| codedb_search | 250021 | 254020 | +1.60% | OK |
| codedb_snapshot | 3464127 | 3471714 | +0.22% | OK |
| codedb_status | 30338 | 28123 | -7.30% | OK |
| codedb_symbol | 59526 | 60871 | +2.26% | OK |
| codedb_tree | 75943 | 71887 | -5.34% | OK |
| codedb_word | 89601 | 89343 | -0.29% | OK |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f90a736865

src/watcher.zig Outdated
while (walker.next() catch null) |entry| {
const stat = dir.statFile(entry.path) catch continue;
_ = store.recordSnapshot(entry.path, stat.size, 0) catch {};
indexFileContent(explorer, dir, entry.path, backing, false) catch {};

P1: Preserve trigram cap on git-HEAD full rescan

On a HEAD change, this rescan path always calls indexFileContent(..., false) for every file, so trigram indexing is forced for the entire repository. In the same file, initialScan explicitly caps trigram indexing after 15,000 files to avoid OOM, but that safeguard is skipped here; switching branches in large repos can therefore trigger a much heavier reindex (memory spikes and major latency regression) than startup behavior. The HEAD-change path should apply the same max_trigram_files/effective_skip logic.
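
The cap logic the comment refers to could look roughly like this. The names `max_trigram_files` and `effective_skip` come from the comment itself; the function shape and the exact comparison are assumptions, since the thread does not show initialScan.

```zig
const std = @import("std");

/// Limit taken from the 15,000-file figure quoted in the review comment.
const max_trigram_files: usize = 15_000;

/// Returns true when trigram indexing should be skipped for the next file,
/// either because the caller asked for it or because the cap was reached.
fn effectiveSkip(skip_requested: bool, files_indexed: usize) bool {
    return skip_requested or files_indexed >= max_trigram_files;
}
```

The rescan loop would then pass `effectiveSkip(false, count)` instead of a hard-coded `false`, incrementing `count` per file, so a branch switch costs no more memory than startup.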

Load .codedbignore from project root on startup. One pattern per line:
- Directory names: vendor/ (trailing slash)
- Exact names: external.zig (matches at any depth)
- Path prefixes: docs/internal (matches path start)
- Glob suffixes: *.log (matches file extension)
- Comments: lines starting with # are ignored

Patterns are checked in FilteredWalker for both directories and files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 5, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 620665 | 614981 | -0.92% | OK |
| codedb_changes | 110683 | 106122 | -4.12% | OK |
| codedb_deps | 28846 | 29492 | +2.24% | OK |
| codedb_edit | 17977 | 16781 | -6.65% | OK |
| codedb_hot | 158177 | 167346 | +5.80% | OK |
| codedb_outline | 396086 | 382192 | -3.51% | OK |
| codedb_read | 169078 | 166867 | -1.31% | OK |
| codedb_search | 253250 | 250232 | -1.19% | OK |
| codedb_snapshot | 3482539 | 3412887 | -2.00% | OK |
| codedb_status | 31939 | 28097 | -12.03% | OK |
| codedb_symbol | 60280 | 59705 | -0.95% | OK |
| codedb_tree | 75271 | 77473 | +2.93% | OK |
| codedb_word | 90765 | 89863 | -0.99% | OK |

Load .gitignore patterns alongside .codedbignore on startup. Both files
are parsed with the same pattern matcher:
- Directory names: vendor/ (trailing slash)
- Exact names, path prefixes, glob suffixes (*.log)
- Comments (#) and negation (!) patterns skipped

.codedbignore takes precedence (loaded first). .gitignore provides
baseline ignore rules without requiring any config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97c0049454

src/watcher.zig Outdated
// Exact name match (matches at any depth)
if (std.mem.eql(u8, name, pattern)) return true;
// Path prefix match
if (std.mem.startsWith(u8, full_path, pattern)) return true;

P2: Match ignore patterns on path components, not prefixes

In FilteredWalker.isIgnored, the startsWith(full_path, pattern) rule over-matches bare patterns from .codedbignore/.gitignore: for example, a pattern like build will also exclude build.gradle or buildkite/... because they share the same prefix. That silently drops unrelated files from snapshot/index/search, which is a functional regression for repositories that use common short ignore tokens.
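
A component-aware prefix check might be sketched like this. `matchesPathPrefix` is a hypothetical helper, not the function in the PR; it accepts the pattern only when it equals the path or ends exactly at a `/` boundary.

```zig
const std = @import("std");

/// True when `pattern` names `full_path` itself or one of its parent directories.
fn matchesPathPrefix(full_path: []const u8, pattern: []const u8) bool {
    if (pattern.len == 0) return false;
    if (!std.mem.startsWith(u8, full_path, pattern)) return false;
    // startsWith guarantees full_path.len >= pattern.len, so the index is in
    // bounds whenever the lengths differ.
    return full_path.len == pattern.len or full_path[pattern.len] == '/';
}
```

With this check, `build` matches `build` and `build/out.o` but not `build.gradle` or `buildkite/pipeline.yml`.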

Tool descriptions now guide agents toward efficient usage:
- codedb_outline: "START HERE" — always use before reading files
- codedb_read: warns against full-file reads, suggests line ranges
- codedb_search: suggests max_results=10 for broad queries
- codedb_symbol: clarifies it finds definitions, not text matches
- codedb_bundle: warns about response size, suggests outline+symbol

Bundle response capped at 200KB — truncates with a warning message
suggesting outline + targeted reads instead of full file reads.
Prevents the 3.2M character responses reported in #160.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 5, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 622885 | 623278 | +0.06% | OK |
| codedb_changes | 110987 | 109817 | -1.05% | OK |
| codedb_deps | 28936 | 29363 | +1.48% | OK |
| codedb_edit | 18256 | 17359 | -4.91% | OK |
| codedb_hot | 159741 | 161698 | +1.23% | OK |
| codedb_outline | 384586 | 382954 | -0.42% | OK |
| codedb_read | 169850 | 165474 | -2.58% | OK |
| codedb_search | 251862 | 257964 | +2.42% | OK |
| codedb_snapshot | 3463479 | 3456535 | -0.20% | OK |
| codedb_status | 29871 | 31461 | +5.32% | OK |
| codedb_symbol | 60540 | 59262 | -2.11% | OK |
| codedb_tree | 75256 | 77882 | +3.49% | OK |
| codedb_word | 89941 | 90832 | +0.99% | OK |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c6759f42a

src/watcher.zig Outdated
Comment on lines +222 to +224
if (std.mem.eql(u8, name, pattern)) return true;
// Path prefix match
if (std.mem.startsWith(u8, full_path, pattern)) return true;

P1: Honor root-anchored ignore patterns

Patterns that start with / (for example /secrets/ or /config.local) never match because matching is done against relative paths without a leading slash, so files users explicitly ignore at repo root are still indexed and searchable. This is a security/privacy regression in repos that rely on root-anchored .gitignore/.codedbignore rules to keep sensitive project files out of the index.
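
A rough sketch of root-anchored matching, assuming paths are stored relative to the project root with no leading slash, as the comment describes. `matchesRootAnchored` is a hypothetical helper; it does not distinguish files from directories, which a full implementation would also need for patterns ending in `/`.

```zig
const std = @import("std");

/// Matches a `/`-prefixed pattern against a path relative to the project root.
fn matchesRootAnchored(rel_path: []const u8, pattern: []const u8) bool {
    if (pattern.len < 2 or pattern[0] != '/') return false;
    // Drop the leading and any trailing slash: "/secrets/" becomes "secrets".
    const anchored = std.mem.trim(u8, pattern, "/");
    if (anchored.len == 0) return false;
    if (!std.mem.startsWith(u8, rel_path, anchored)) return false;
    // Accept the entry itself or anything beneath it, but only at the root.
    return rel_path.len == anchored.len or rel_path[anchored.len] == '/';
}
```

Under these assumptions `/secrets/` matches `secrets/key.pem` but not `src/secrets/key.pem`, and `/config.local` matches `config.local` but not `config.local.bak`.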

Comment on lines +217 to +220
if (std.mem.endsWith(u8, pattern, "/")) {
const dir_name = pattern[0 .. pattern.len - 1];
if (std.mem.eql(u8, name, dir_name)) return true;
}

P2: Restrict trailing-slash patterns to directories

Directory-only patterns (ending in /) are matched using only the entry name, so a pattern like build/ also suppresses a regular file named build. In repositories that use these common .gitignore rules, this will silently drop non-directory files from indexing/search results and produce incomplete code intelligence.
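
The restriction could be sketched by passing the entry kind into the name check. `Kind` and `matchesByName` are hypothetical; the real walker would use whatever entry type it already has.

```zig
const std = @import("std");

/// Hypothetical entry kind for illustration.
const Kind = enum { file, directory };

/// Name-based match where a trailing slash limits the pattern to directories.
fn matchesByName(name: []const u8, kind: Kind, pattern: []const u8) bool {
    if (std.mem.endsWith(u8, pattern, "/")) {
        // Trailing slash: the pattern applies to directories only.
        if (kind != .directory) return false;
        return std.mem.eql(u8, name, pattern[0 .. pattern.len - 1]);
    }
    // No trailing slash: files and directories both match by exact name.
    return std.mem.eql(u8, name, pattern);
}
```

Here `build/` ignores a directory named `build` but leaves a regular file named `build` in the index, while a bare `build` pattern still covers both.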

src/mcp.zig Outdated
Comment on lines +1051 to +1055
out.appendSlice(alloc, sub_out.items) catch {};
w.writeAll("\n") catch {};

// Cap total response at 200KB to prevent token limit blowouts
if (out.items.len > 200 * 1024) {

P2: Enforce bundle size cap before appending op output

The 200KB cap check runs only after sub_out has already been appended, so a single large bundled op can still emit an arbitrarily large response and defeat the protection against token/response blowouts. This means the truncation guard only stops subsequent ops, not the oversized payload that already exceeded the limit.
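
A pre-append check might be sketched as a small predicate. `max_response_bytes` and `fitsWithinCap` are hypothetical names; the 200KB figure is the one from the commit under review.

```zig
const std = @import("std");

/// Cap from the commit under review: 200KB per bundle response.
const max_response_bytes: usize = 200 * 1024;

/// True when a chunk of `chunk_len` bytes can be appended to a response that
/// already holds `current_len` bytes without exceeding the cap.
fn fitsWithinCap(current_len: usize, chunk_len: usize) bool {
    if (current_len > max_response_bytes) return false;
    // Subtract instead of adding so the comparison cannot overflow.
    return chunk_len <= max_response_bytes - current_len;
}
```

The bundle loop would call `fitsWithinCap(out.items.len, sub_out.items.len)` before `appendSlice` and emit the truncation warning when it returns false, so even a single oversized op cannot push the response past the limit.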

P1: Remove stale files from Explorer on branch switch — files that
    existed on old branch but not on new branch are now cleaned up
P1: Re-scan on HEAD change now respects 15k trigram file cap
P2: Path prefix matching in isIgnored requires / boundary — "vendor"
    no longer matches "vendor_utils.zig"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 5, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 620849 | 624858 | +0.65% | OK |
| codedb_changes | 111456 | 108419 | -2.72% | OK |
| codedb_deps | 28927 | 30352 | +4.93% | OK |
| codedb_edit | 16968 | 17877 | +5.36% | OK |
| codedb_hot | 159584 | 164821 | +3.28% | OK |
| codedb_outline | 390716 | 415389 | +6.31% | OK |
| codedb_read | 171107 | 168680 | -1.42% | OK |
| codedb_search | 252620 | 259378 | +2.68% | OK |
| codedb_snapshot | 3473586 | 3397347 | -2.19% | OK |
| codedb_status | 29514 | 30398 | +3.00% | OK |
| codedb_symbol | 62135 | 62704 | +0.92% | OK |
| codedb_tree | 77254 | 80379 | +4.05% | OK |
| codedb_word | 91460 | 91268 | -0.21% | OK |

P1: Root-anchored patterns (/secrets/) now only match at project root,
    not nested paths. Patterns without / match at any depth.
P2: Bundle 200KB cap now checks BEFORE appending sub_out, not after —
    prevents a single large result from blowing past the limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 5, 2026

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 619651 | 617705 | -0.31% | OK |
| codedb_changes | 108380 | 106388 | -1.84% | OK |
| codedb_deps | 28582 | 29516 | +3.27% | OK |
| codedb_edit | 16391 | 16883 | +3.00% | OK |
| codedb_hot | 155113 | 157134 | +1.30% | OK |
| codedb_outline | 390868 | 379075 | -3.02% | OK |
| codedb_read | 171459 | 163762 | -4.49% | OK |
| codedb_search | 248766 | 247893 | -0.35% | OK |
| codedb_snapshot | 3482723 | 3492668 | +0.29% | OK |
| codedb_status | 28254 | 27993 | -0.92% | OK |
| codedb_symbol | 59552 | 57510 | -3.43% | OK |
| codedb_tree | 75096 | 80544 | +7.25% | OK |
| codedb_word | 89126 | 89571 | +0.50% | OK |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d060547773

Comment on lines +420 to +423
// Clear known map
var kiter2 = known.iterator();
while (kiter2.next()) |kv| backing.free(kv.key_ptr.*);
known.clearRetainingCapacity();

P1: Emit tombstones before clearing known on HEAD rescan

During the head_changed path, entries are freed and known is cleared without recording deletes in Store, so files removed by a branch switch never produce tombstone updates and codedb_changes clients cannot learn those files were deleted. Fresh evidence in this revision is that the block now does call explorer.removeFile first, but still immediately clears known here with no store.recordDelete(...) loop, leaving store history inconsistent after checkout/rebase workflows.

Comment on lines +237 to +238
if (std.mem.startsWith(u8, full_path, pattern) and
full_path.len > pattern.len and full_path[pattern.len] == '/') return true;

P1: Match slash ignore rules against exact file paths

Path-based ignore patterns containing / (for example config/local.json in .gitignore or .codedbignore) do not match the exact path because the prefix branch requires full_path.len > pattern.len; as a result, the explicitly ignored file itself is still scanned/indexed unless a separate sensitive-file heuristic catches it. This can leak files users intentionally excluded from indexing/search.

Development

Successfully merging this pull request may close these issues.

cli: codedb --help shows "unknown command" instead of usage
