feat(embed): role-aware prompts + cache key distinguishment (P0-E2) by ohdearquant · Pull Request #112 · ohdearquant/lattice

ohdearquant · 2026-05-25T20:20:27Z

Layer

L2 — embed service public API (PR8 of 11)

What

document_instruction() on EmbeddingModel: E5 multilingual variants return Some("passage: "); BGE and MiniLM return None; Qwen also returns None on the document side (raw passage, per its model card).
New EmbeddingRole enum (Query | Passage | Generic) with distinct cache_tag() strings that are fed into the Blake3 hash in cache.rs::compute_key().
New EmbeddingService trait methods embed_query() and embed_passage(): each applies the query_instruction() / document_instruction() prefix, then calls embed().
CachedEmbeddingService overrides both methods to call its internal embed_with_role() with the matching EmbeddingRole, so cache keys are role-distinct.
embed() uses EmbeddingRole::Generic for backwards compatibility.
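
For readers skimming the list above, here is a minimal sketch of how the default embed_query() / embed_passage() methods could apply a prefix and forward to embed(). It is not the code in this PR: the reduced EmbeddingModel variant list, the apply_prefix signature, the "query: " prefix for E5, and the toy LenService are all assumptions made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical, reduced model list; the real EmbeddingModel has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingModel {
    MultilingualE5,
    Bge,
    MiniLm,
}

impl EmbeddingModel {
    /// Query-side prefix. Assumption: "query: " is the E5 counterpart of "passage: ".
    /// BGE/MiniLM are simplified to None; the PR text does not list query-side prefixes.
    pub fn query_instruction(self) -> Option<&'static str> {
        match self {
            EmbeddingModel::MultilingualE5 => Some("query: "),
            EmbeddingModel::Bge | EmbeddingModel::MiniLm => None,
        }
    }

    /// Document-side prefix, as described in the PR: E5 gets "passage: ", BGE/MiniLM get none.
    pub fn document_instruction(self) -> Option<&'static str> {
        match self {
            EmbeddingModel::MultilingualE5 => Some("passage: "),
            EmbeddingModel::Bge | EmbeddingModel::MiniLm => None,
        }
    }
}

/// Hypothetical helper: prepend the prefix if there is one, otherwise borrow the text unchanged.
pub fn apply_prefix<'a>(prefix: Option<&str>, text: &'a str) -> Cow<'a, str> {
    match prefix {
        Some(p) => Cow::Owned(format!("{}{}", p, text)),
        None => Cow::Borrowed(text),
    }
}

pub trait EmbeddingService {
    fn model(&self) -> EmbeddingModel;
    fn embed(&self, text: &str) -> Vec<f32>;

    /// Default: apply the query prefix, then forward to embed().
    fn embed_query(&self, text: &str) -> Vec<f32> {
        self.embed(&apply_prefix(self.model().query_instruction(), text))
    }

    /// Default: apply the document prefix, then forward to embed().
    fn embed_passage(&self, text: &str) -> Vec<f32> {
        self.embed(&apply_prefix(self.model().document_instruction(), text))
    }
}

/// Toy service for illustration only: "embeds" text as its byte length,
/// which makes the applied prefix visible in the output.
pub struct LenService {
    pub model: EmbeddingModel,
}

impl EmbeddingService for LenService {
    fn model(&self) -> EmbeddingModel {
        self.model
    }
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}
```

With the toy service, embed_passage("hello") on the E5 variant sees the 14-byte string "passage: hello", while the BGE variant sees the raw 5-byte input.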

Why

Two inseparable issues: (a) E5 and Qwen need their canonical prompt prefixes applied to produce production-quality embeddings; (b) without role-aware cache keys, embed_query("hello") and embed_passage("hello") collide in the cache.
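
To make issue (b) concrete, the following is a rough sketch of a role-aware key function. It swaps std's DefaultHasher in for Blake3 so the example has no dependencies, and the compute_key signature and the model name string are assumptions; only the EmbeddingRole variants and the role:* tag spellings come from this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingRole {
    Query,
    Passage,
    Generic,
}

impl EmbeddingRole {
    /// Tag spellings follow the role:* strings used in the cache migration note.
    pub fn cache_tag(self) -> &'static str {
        match self {
            EmbeddingRole::Query => "role:query",
            EmbeddingRole::Passage => "role:passage",
            EmbeddingRole::Generic => "role:generic",
        }
    }
}

/// Stand-in for cache.rs::compute_key(): hashes model name, role tag, and text.
/// DefaultHasher replaces Blake3 here purely to keep the sketch self-contained.
pub fn compute_key(model_name: &str, role: EmbeddingRole, text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    for field in &[model_name, role.cache_tag(), text] {
        // Length-prefix each field so adjacent fields cannot blur into each other.
        hasher.write_u64(field.len() as u64);
        hasher.write(field.as_bytes());
    }
    hasher.finish()
}
```

Because the role tag is part of the hashed input, the same raw text yields a different key per role, while repeated calls with the same arguments yield the same key.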

Result

3 cache key tests (cache.rs): query vs passage vs generic produce distinct keys, deterministic, isolated
7 service tests (service/tests.rs): E5 document instruction, BGE/MiniLM no document instruction, Qwen no document instruction, apply_prefix, role cache tags distinct

Cache migration

Existing on-disk caches written with role:generic are kept separate from the new role:query / role:passage keys. Old caches remain readable under Generic. No migration is needed; callers moving from embed() to embed_passage() will see cache misses until the cache is repopulated.
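
The miss-then-repopulate behaviour can be illustrated with a toy in-memory cache. This is only a sketch under stated assumptions: ToyCache, get_or_compute, and the miss counter are hypothetical names, and the real cache is on disk and keyed by a hash rather than by a (role tag, text) tuple.

```rust
use std::collections::HashMap;

/// Toy cache keyed by (role tag, text); illustrative only.
pub struct ToyCache {
    entries: HashMap<(String, String), Vec<f32>>,
    pub misses: usize,
}

impl ToyCache {
    pub fn new() -> Self {
        ToyCache {
            entries: HashMap::new(),
            misses: 0,
        }
    }

    /// Returns the cached vector, or computes, stores, and returns it on a miss.
    pub fn get_or_compute(
        &mut self,
        role_tag: &str,
        text: &str,
        compute: impl FnOnce() -> Vec<f32>,
    ) -> Vec<f32> {
        let key = (role_tag.to_string(), text.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let value = compute();
        self.entries.insert(key, value.clone());
        value
    }
}
```

An entry stored under "role:generic" keeps serving embed()-style lookups, while the first "role:passage" lookup for the same text is a miss that populates its own entry.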

Stack

Base: #111 (PR7 tokenizer e2e test)
Umbrella: #104

🤖 Generated with Claude Code

Fix document_instruction() to return "passage: " for E5 multilingual variants (was unconditionally None). Add EmbeddingRole enum (Query, Passage, Generic) and embed_query/embed_passage trait methods that apply model-specific prompt prefixes before forwarding. Extend CacheKey hash inputs with role.cache_tag() so query and passage embeddings of the same raw text are stored as separate cache entries. CachedEmbeddingService overrides both role-aware methods with prompt-application + role-keyed cache logic. Existing embed() uses Generic role for backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ohdearquant · 2026-05-25T20:49:22Z

Subsumed by #104 merge (umbrella PR brought all 11 PRs' content to main in one merge commit after stacked-PR base branches collapsed). Codex round-1 findings tracked in #116. Closing as superseded.

This was referenced May 25, 2026

feat(embed): route BGE through CLS pooling, keep E5/MiniLM on mean (P1-E3) #113

Closed

embed-perf-quality show (umbrella draft — slice into ordered PRs below) #104

Merged

embed-perf-quality codex review follow-ups (PR #105-#115) #116

Open

ohdearquant closed this May 25, 2026

ohdearquant wants to merge 1 commit into pr-embedperf-07-tokenizer-e2e-test from pr-embedperf-08-role-aware-prompts
