RLV Locator: add lightweight semantic search as third RRF signal #89

@unamedkr

Summary

Add a lightweight sentence embedding model (all-MiniLM-L6-v2, 22M params) as the third signal in the Reciprocal Rank Fusion (RRF) locator, catching semantic matches that BM25 and keywords miss.

Problem

The current locator fuses two signals: BM25 and keyword overlap. On the 1.3 MB large-doc test, Q15 fails: "temnein" (Greek for "to cut") appears in chunk 531, but BM25 picks chunk 553, which discusses "Stegocephalia" = "roof-headed". The semantic connection between the query "what does temnein mean" and the chunk containing "temnein (to cut)" is obvious to a human reader but invisible to keyword matching.

Proposed Solution

# Three-signal RRF (currently two)
rrf[cid] = (1/(60+rank_keyword) +
            1/(60+rank_bm25) +
            1/(60+rank_semantic))  # NEW
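
To make the effect of the third term concrete, here is a minimal sketch of the fusion step in Python. The function name `rrf_fuse`, the rank lists, and the treatment of chunks missing from a list (they simply contribute nothing for that signal) are assumptions for illustration, not the locator's actual code; only the constant 60 and chunk ids 531 and 553 come from this issue.

```python
from collections import defaultdict

RRF_K = 60  # same constant as the 60 in the formula above


def rrf_fuse(rankings, k=RRF_K):
    """Fuse ranked lists of chunk ids (best first) with Reciprocal Rank Fusion.

    Assumption: a chunk absent from one list adds nothing for that signal.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, cid in enumerate(ranked, start=1):
            scores[cid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical top-3 lists, loosely modeled on the Q15 failure.
keyword_rank = [553, 531, 12]
bm25_rank = [553, 531, 40]
semantic_rank = [531, 77, 12]  # 553 falls outside the semantic top-3

fused_two = rrf_fuse([keyword_rank, bm25_rank])
fused_three = rrf_fuse([keyword_rank, bm25_rank, semantic_rank])

print(fused_two[0], fused_three[0])
```

With these made-up lists, the two-signal fusion keeps chunk 553 on top, while adding the semantic list moves chunk 531 to first place. Note that a single rank-1 vote only outweighs two rank-1 votes for a competitor if that competitor scores poorly on the new signal, so the real gain depends on how the semantic ranking treats chunk 553.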

Embedding model selection

| Model | Params | CPU latency | Quality |
|---|---|---|---|
| all-MiniLM-L6-v2 | 22M | ~30ms/query | Good |
| BGE-small-en | 33M | ~50ms/query | Better |
| nomic-embed-text | 137M | ~200ms/query | Best |

MiniLM is recommended: roughly 30ms per query on CPU, with no GPU required.

Pre-computation

Chunk embeddings are computed once during quantcpp index and stored alongside the KV caches. At query time only the query itself has to be embedded (~30ms with MiniLM).
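
The split between index-time and query-time work could look roughly like the sketch below. It is not the quantcpp implementation: `build_index` and `semantic_ranking` are hypothetical names, the storage format is left out, and the toy 3-dimensional vectors stand in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces. In practice the vectors would come from an embedding library rather than being written by hand.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(chunk_vectors):
    """Index-time step: normalize every chunk embedding once.

    chunk_vectors maps chunk id -> embedding (hypothetical in-memory layout).
    """
    return {cid: normalize(vec) for cid, vec in chunk_vectors.items()}


def semantic_ranking(query_vec, index, top_k=10):
    """Query-time step: rank chunk ids by cosine similarity to the query."""
    q = normalize(query_vec)
    sims = {cid: sum(a * b for a, b in zip(q, vec)) for cid, vec in index.items()}
    return sorted(sims, key=lambda cid: (-sims[cid], cid))[:top_k]


# Toy vectors for illustration only; real embeddings would be model output.
index = build_index({
    531: [0.9, 0.1, 0.0],
    553: [0.1, 0.9, 0.2],
    12: [0.0, 0.2, 0.9],
})
ranking = semantic_ranking([1.0, 0.2, 0.0], index, top_k=2)
print(ranking)
```

The returned list is ordered best first, so it can be passed to the RRF step directly as the source of rank_semantic.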

Expected Impact

  • Q15 (temnein): semantic similarity catches the correct chunk
  • 19/20 → 20/20 on large-doc test
  • General: better handling of paraphrased/synonym queries

Priority: P2
