fix(health,index_db): rebuild_index atomic swap prevents index wipeou…#160
fix(health,index_db): rebuild_index atomic swap prevents index wipeou…#160greatjourney589 wants to merge 4 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 47 minutes and 28 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. 📝 WalkthroughWalkthroughReplaces destructive reset-and-reindex with an atomic temp-file rebuild that indexes durable artifacts into a temp DB while collecting per-artifact failures. On success the temp DB is swapped into place; unreadable artifacts are warned. ChangesIndex rebuild resilience and diagnostics
Sequence DiagramsequenceDiagram
participant Caller
participant rebuild_index
participant temp_db as TempDB(.db.tmp)
participant disk as ArtifactFiles
participant final_db as state.db
Caller->>rebuild_index: rebuild_index(store)
rebuild_index->>temp_db: open temp DB via open_db_at(path)
rebuild_index->>disk: list claims/pages/entities
loop per artifact
disk-->>rebuild_index: read file (YAML/MD)
alt deserialize success
rebuild_index->>temp_db: index_claim/page/entity (+ embeddings)
temp_db-->>rebuild_index: row inserted
else deserialize fails
rebuild_index->>rebuild_index: add artifact to skipped list
end
end
rebuild_index->>final_db: atomic move .db.tmp → state.db
alt skipped not empty
rebuild_index->>Caller: warnings.warn(skipped artifacts)
end
rebuild_index-->>Caller: return stats
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vouch/health.py`:
- Around line 199-207: The current per-artifact try/except around Claim
deserialization also catches database write errors from index_db.index_claim;
narrow the rescue so only the file read/parse/validation is caught: read and
validate the YAML into Claim (using Claim.model_validate and _yaml_load) inside
a try/except that appends to skipped on deserialization/read failures, but call
index_db.index_claim outside that except so any SQLite/indexing errors bubble
and abort the rebuild; apply the same pattern to the page/entity loops where
index_db.index_page/index_entity are invoked.
- Around line 226-238: rebuild_index currently swaps tmp_path into place with
shutil.move then calls _rebuild_embeddings(store), which can mutate the live DB
and break atomicity if it fails; fix by either (A) moving the embedding rebuild
to the temp DB before the swap (invoke whatever API _rebuild_embeddings or a
variant that targets tmp_path / the new DB file so all mutations happen on the
temp DB) and only shutil.move after that succeeds, or (B) make the post-swap
embedding refresh explicitly best-effort by wrapping the
_rebuild_embeddings(store) call in a try/except that logs the failure (including
exception details) but does not raise, ensuring the swapped DB remains the
canonical state; update rebuild_index to use one of these approaches and
reference shutil.move, tmp_path, db_path, _rebuild_embeddings and rebuild_index
in the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c3a7afe1-a409-44a1-b95e-be519f633e15
📒 Files selected for processing (3)
src/vouch/health.pysrc/vouch/index_db.pytests/test_index.py
ReviewSummary: Implements the atomic-swap approach (Option A) from #159. The fix is structurally sound — build into a temp file, swap only on success, catch per-artifact parse errors with a skip-and-warn fallback — and the two new tests cover the critical regression paths. A few issues worth discussing below. What works
Suggestions
Verdictrequest changes — The orphaned |
|
Hi,@plind-junior |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vouch/health.py`:
- Around line 268-272: The current broad except in the get_embedder
import/initialization swallows any error and leaves embedding_index empty after
rebuild_index(); change it to only catch the explicit "embedder unavailable"
exception exported by the embeddings module (e.g. EmbedderUnavailableError or
similarly named exception) by importing that symbol from .embeddings and using
"except EmbedderUnavailableError:" so real initialization errors in
get_embedder() propagate (or are re-raised) and unexpected failures are not
suppressed; also add a processLogger.error or raise after catching to record the
unavailability case while preserving other exceptions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 59df0a0b-214d-4364-a8b4-fc18d41a1c27
📒 Files selected for processing (2)
src/vouch/health.pytests/test_index.py
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53f1d7dac5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| else: | ||
| # Detect a blown index: DB exists but FTS is empty while artifact files | ||
| # are present on disk. This is the aftermath of a crashed rebuild_index. | ||
| stats = index_db.stats(store.kb_dir) |
There was a problem hiding this comment.
Handle unreadable state.db in doctor
When state.db exists but is corrupt or not a SQLite database (for example after an interrupted write or accidental overwrite), this new index_db.stats() call raises a sqlite3.DatabaseError, so vouch doctor aborts instead of returning a health report that can tell the user to rebuild the derived index. Since the index is explicitly treated as disposable, this path should catch stats/open failures and report a warning/error rather than crashing.
Useful? React with 👍 / 👎.
| except Exception as exc: | ||
| skipped.append((claim_path.name, str(exc))) | ||
| continue |
There was a problem hiding this comment.
Abort rebuild on artifact read failures
If read_text() fails here for an otherwise valid artifact (for example a transient OSError/PermissionError on one claim file), this broad catch treats it like a corrupt YAML file, skips it, and still swaps the temp database into state.db. That drops the artifact from the live search index even though the old index was still usable; read/I/O failures should abort the temp rebuild so the atomic-swap path preserves the existing index.
Useful? React with 👍 / 👎.
| with tempfile.NamedTemporaryFile( | ||
| dir=store.kb_dir, suffix=".db.tmp", delete=False, | ||
| ) as tmp: |
There was a problem hiding this comment.
Use an ignored temp filename for rebuilds
If the process is killed after creating this temp DB but before cleanup, the leftover file is named like tmpabcd.db.tmp under .vouch/. KBStore.init() only ignores state.db and state.db-*, so the crash-safety path can leave an untracked derived SQLite file in the repo; use a state.db- prefix or extend the ignore pattern so crash leftovers stay out of source control.
Useful? React with 👍 / 👎.
| embedder = get_embedder() | ||
| except Exception: | ||
| except (KeyError, ImportError): | ||
| return |
There was a problem hiding this comment.
Keep FTS rebuild independent of embedder startup
With an embedder registered, any non-ImportError startup failure from get_embedder() (for example a sentence-transformers model cache/download or runtime error) now aborts vouch index before the temp DB is swapped, so even the FTS index stays stale or missing. Embeddings are an optional enhancement elsewhere in the codebase, and the previous path ignored these startup failures; this should still skip embeddings and let the durable FTS rebuild complete.
Useful? React with 👍 / 👎.
What changed
Closes #159
rebuild_index()now writes the new index into a temporary file in the same directory, then atomically renames it into place on success. Per-artifact parse errors are caught individually — bad files are skipped with a warning rather than aborting the whole build.vouch doctorgains anindex_blowncheck that fires whenstate.dbis empty but artifact files exist on disk. A newopen_db_at(path)helper inindex_db.pyaccepts an explicit path for the temp-file build.Why
rebuild_index()calledindex_db.reset()— which committed destructive DELETEs — before reading a single artifact file. Any parse error (bad YAML, truncated write,ValidationError) caused the context manager to exit with an empty, committedstate.db. The existing index was destroyed with no recovery path, silently blacking outvouch context,vouch search, and all MCP/JSONL retrieval. Fixes the critical index-wipeout bug; also addresses the gap wherevouch doctorreported no errors after the crash.What might break
Nothing.
rebuild_indexis an internal function — its signature is unchanged. No on-disk format changes. Users with an existing.vouch/directory will see no difference in normal operation; a corrupted artifact that previously crashedvouch indexwill now be skipped with a warning instead.VEP
Not required — no surface change (object model,
kb.*method signatures, on-disk layout, bundle format, or audit-log shape).Tests
make checkpasses locally (lint + mypy + pytest)CHANGELOG.mdupdated under## [Unreleased]Summary by CodeRabbit