
feat: Auto-heal for silent semantic search failure #7

Closed
PatrickSys wants to merge 2 commits into master from release/v1.3.1

Conversation

PatrickSys (Owner) commented Jan 6, 2026

Description

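This PR implements an auto-heal mechanism to resolve the 'Silent Semantic Search Failure' issue, in which a corrupted vector index caused semantic search to fall back silently to keyword-only results.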

Changes

  • Adds automatic detection and recovery when the vector index becomes corrupted
  • Implements a new `IndexCorruptedError` custom error type
  • Triggers automatic re-indexing when critical index failures occur
  • Falls back to keyword search for non-critical errors
  • Updates version to v1.3.1
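A minimal sketch of what the new error type could look like, assuming it is a plain `Error` subclass. The constructor shape and the `Object.setPrototypeOf` call are assumptions, not taken from the PR's `src/errors/index.ts`; the point is that a dedicated class lets callers single out corruption with `instanceof` instead of matching message strings.

```typescript
// Sketch of a dedicated error for an unusable vector index. In the PR this would
// live in src/errors/index.ts; the exact shape here is an assumption.
class IndexCorruptedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'IndexCorruptedError';
    // Restore the prototype chain so `instanceof` also works when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

Callers can then branch on `error instanceof IndexCorruptedError` to choose between re-indexing and the keyword-search fallback.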

Testing

  • Verified in real-world scenarios
  • New test scripts validate the auto-heal behavior

greptile-apps bot commented Jan 6, 2026

Greptile Summary

Implements automatic corruption detection and healing for LanceDB semantic search failures. When the vector database schema is corrupted (missing vector column), the system now throws IndexCorruptedError, which triggers synchronous re-indexing and an automatic retry instead of silently falling back to keyword-only search.

Key Changes

  • New Error Type: IndexCorruptedError signals schema corruption requiring re-indexing
  • Detection Points: Corruption detected during initialization, schema validation, and search operations in lancedb.ts
  • Propagation: Errors bubble through search.ts to index.ts handler without silent catches
  • Auto-Heal: MCP server catches IndexCorruptedError, triggers performIndexing(), and retries search with fresh CodebaseSearcher instance
  • Test Coverage: Three new test suites validate corruption detection, error propagation, and E2E auto-heal flow
  • Embedding Model: Changed from bge-base-en-v1.5 (768d) to bge-small-en-v1.5 (384d) - requires re-indexing for existing users
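The schema-validation detection point can be sketched as a check run when the table is opened. `FieldInfo`, `validateChunkSchema` and `listSize` are hypothetical stand-ins, not the LanceDB or Arrow schema API, and the dimension comparison is an assumption about how the 768d to 384d model change could be routed into the same auto-heal path.

```typescript
// Minimal stand-in for the PR's error type so this block runs on its own.
class IndexCorruptedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'IndexCorruptedError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical, simplified view of a table schema (not the LanceDB/Arrow API).
interface FieldInfo {
  name: string;
  // Fixed-size-list length for vector columns, when known.
  listSize?: number;
}

// bge-small-en-v1.5 produces 384-dimensional embeddings.
const EXPECTED_DIM = 384;

function validateChunkSchema(fields: FieldInfo[], expectedDim: number = EXPECTED_DIM): void {
  const vector = fields.find((f) => f.name === 'vector');
  if (!vector) {
    throw new IndexCorruptedError('LanceDB index corrupted: missing vector column');
  }
  // Assumption: a leftover 768d index from bge-base is treated like corruption,
  // so the same auto-heal path rebuilds it with the new model.
  if (vector.listSize !== undefined && vector.listSize !== expectedDim) {
    throw new IndexCorruptedError(
      `LanceDB index has ${vector.listSize}-d vectors, expected ${expectedDim}-d`,
    );
  }
}
```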

Potential Issues

  • Broad error catching in lancedb.ts:192-193 may trigger false-positive auto-heals for transient errors (see inline comment)
  • The embedding dimension change is breaking but is not flagged in the changelog as requiring a manual re-index

Confidence Score: 4/5

  • Safe to merge with one notable issue around overly broad error catching
  • The score reflects a solid implementation of the auto-heal mechanism with comprehensive test coverage, but the broad error-catching pattern in lancedb.ts could cause unnecessary re-indexing for non-corruption errors. The embedding model change is also breaking for existing users, but it is handled by auto-heal.
  • Pay close attention to src/storage/lancedb.ts - the catch-all error handling at line 192 may need refinement to avoid false positives

Important Files Changed

| Filename | Overview |
|---|---|
| src/errors/index.ts | New custom error class for LanceDB corruption detection; clean implementation |
| src/storage/lancedb.ts | Implements corruption detection with IndexCorruptedError throws; overly broad error catching may trigger false positives |
| src/core/search.ts | Propagates IndexCorruptedError through initialization and semantic search; clean pass-through logic |
| src/index.ts | Implements auto-heal mechanism with synchronous re-indexing and retry; comprehensive error handling |
| src/core/indexer.ts | Changed embedding model from bge-base (768d) to bge-small (384d), plus a formatting fix; requires re-indexing |

Sequence Diagram

sequenceDiagram
    participant User
    participant MCP as MCP Server
    participant Searcher as CodebaseSearcher
    participant Storage as LanceDBStorage
    participant Indexer as CodebaseIndexer

    User->>MCP: search_codebase(query)
    MCP->>Searcher: search(query, limit, filters)
    
    Searcher->>Searcher: initialize()
    Searcher->>Storage: initialize(storagePath)
    
    alt Table exists but schema invalid
        Storage->>Storage: Check schema for vector column
        Storage-->>Storage: Missing vector column detected
        Storage->>Storage: dropTable('code_chunks')
        Storage-->>Searcher: throw IndexCorruptedError
        Searcher-->>MCP: throw IndexCorruptedError
    else Table missing
        Searcher->>Storage: search() called with no table
        Storage-->>Searcher: throw IndexCorruptedError
        Searcher-->>MCP: throw IndexCorruptedError
    end
    
    MCP->>MCP: Catch IndexCorruptedError
    MCP->>MCP: Log "[Auto-Heal] Index corrupted..."
    MCP->>Indexer: performIndexing()
    Indexer->>Indexer: index() - full re-index
    Indexer->>Storage: store(chunks with embeddings)
    Storage->>Storage: createTable('code_chunks')
    Indexer-->>MCP: indexing complete
    
    alt Re-indexing successful
        MCP->>MCP: Check indexState.status === 'ready'
        MCP->>Searcher: new CodebaseSearcher(ROOT_PATH)
        MCP->>Searcher: search(query, limit, filters)
        Searcher->>Storage: search(queryVector, limit)
        Storage-->>Searcher: results
        Searcher-->>MCP: SearchResult[]
        MCP-->>User: Success response with results
    else Re-indexing failed
        MCP-->>User: Error response (auto-heal failed)
    end
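The catch-and-retry step in the diagram can be sketched as a wrapper that rebuilds the index once and retries with a fresh searcher. The dependency names (`AutoHealDeps`, `createSearcher`, `getStatus`, `log`) and the status values are hypothetical; in the PR those roles are played by `performIndexing()`, `indexState.status` and `new CodebaseSearcher(ROOT_PATH)`, whose exact signatures are not shown here.

```typescript
// Minimal stand-in for the PR's error type so this block runs on its own.
class IndexCorruptedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'IndexCorruptedError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical status values; the PR only shows that 'ready' means success.
type IndexStatus = 'idle' | 'indexing' | 'ready' | 'error';

interface Searcher {
  search(query: string, limit: number): Promise<string[]>;
}

// Hypothetical dependency bundle standing in for performIndexing(),
// indexState and `new CodebaseSearcher(ROOT_PATH)`.
interface AutoHealDeps {
  createSearcher(): Searcher;
  performIndexing(): Promise<void>;
  getStatus(): IndexStatus;
  log(message: string): void;
}

async function searchWithAutoHeal(
  deps: AutoHealDeps,
  query: string,
  limit: number,
): Promise<string[]> {
  try {
    return await deps.createSearcher().search(query, limit);
  } catch (error) {
    // Only corruption triggers a rebuild; every other error propagates unchanged.
    if (!(error instanceof IndexCorruptedError)) throw error;
    deps.log(`[Auto-Heal] Index corrupted, re-indexing: ${error.message}`);
    await deps.performIndexing();
    if (deps.getStatus() !== 'ready') {
      throw new Error(`Auto-heal failed: ${error.message}`);
    }
    // Retry exactly once with a fresh searcher; a second failure reaches the caller.
    return deps.createSearcher().search(query, limit);
  }
}
```

Retrying only once keeps a persistent corruption (for example, a re-index that writes a bad table again) from looping; a second `IndexCorruptedError` simply propagates to the caller.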
greptile-apps bot left a comment


11 files reviewed, 1 comment


Comment on lines +192 to +193 of `src/storage/lancedb.ts`
if (error instanceof Error && (error.message.includes('LanceDB') || error.message.includes('Arrow'))) {
throw new IndexCorruptedError(`LanceDB runtime error: ${error.message}`);

logic: Overly broad error catching - any error matching "LanceDB" or "Arrow" triggers auto-heal

This catch-all may cause unnecessary re-indexing for transient network issues, OOM errors, or other non-corruption problems. Consider narrowing to specific error types or adding additional validation before throwing IndexCorruptedError.

Suggested change

```suggestion
      if (error instanceof Error && error.message.includes('No vector column')) {
        throw new IndexCorruptedError('LanceDB index corrupted: missing vector column');
      }
```
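One way to act on the review's suggestion is to funnel every storage error through a single classifier that only upgrades known corruption messages. The marker list and `classifyStorageError` are assumptions for illustration; `'No vector column'` is the only fragment taken from the suggested change, and any other entries would need to be checked against the messages LanceDB actually emits.

```typescript
// Minimal stand-in for the PR's error type so this block runs on its own.
class IndexCorruptedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'IndexCorruptedError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Message fragments treated as proof of corruption. 'No vector column' comes from
// the review suggestion; further entries would need confirming against LanceDB.
const CORRUPTION_MARKERS: readonly string[] = ['No vector column'];

// Upgrades only known corruption errors to IndexCorruptedError; everything else
// (network, OOM, generic Arrow failures) keeps its original identity.
function classifyStorageError(error: unknown): Error {
  if (error instanceof IndexCorruptedError) return error;
  if (error instanceof Error) {
    const message = error.message;
    if (CORRUPTION_MARKERS.some((marker) => message.includes(marker))) {
      return new IndexCorruptedError(`LanceDB index corrupted: ${message}`);
    }
    return error;
  }
  return new Error(String(error));
}
```

In the catch block at lines 192-193 this would become `throw classifyStorageError(error);`, so generic `LanceDB`/`Arrow` failures keep their original type and can take the keyword-search fallback instead of forcing a re-index.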

PatrickSys changed the title from "Release v1.3.1: Auto-Heal for Silent Semantic Search Failure" to "feat: Auto-heal for silent semantic search failure" on Jan 6, 2026
PatrickSys (Owner, Author) commented:

Closing in favor of a new PR with proper branch naming. See the new PR for this feature.

PatrickSys closed this on Jan 6, 2026
PatrickSys deleted the release/v1.3.1 branch on January 6, 2026 at 10:35

1 participant