Skip to content

feat(embed): role-aware prompts + cache key distinguishment (P0-E2)#112

Closed
ohdearquant wants to merge 1 commit into
pr-embedperf-07-tokenizer-e2e-testfrom
pr-embedperf-08-role-aware-prompts
Closed

feat(embed): role-aware prompts + cache key distinguishment (P0-E2)#112
ohdearquant wants to merge 1 commit into
pr-embedperf-07-tokenizer-e2e-testfrom
pr-embedperf-08-role-aware-prompts

Conversation

@ohdearquant
Copy link
Copy Markdown
Owner

Layer

L2 — embed service public API (PR8 of 11)

What

  • document_instruction() on EmbeddingModel: E5 multilingual variants return Some("passage: "). BGE/MiniLM return None. Qwen returns None on document side (raw passage per model card).
  • New EmbeddingRole enum (Query | Passage | Generic) with distinct cache_tag() strings injected into Blake3 hash in cache.rs::compute_key().
  • New EmbeddingService trait methods: embed_query(), embed_passage(). Apply query_instruction() / document_instruction() prefix then call embed().
  • CachedEmbeddingService overrides both methods to call internal embed_with_role() with the appropriate EmbeddingRole, so cache keys are role-distinct.
  • embed() uses EmbeddingRole::Generic for backwards compatibility.

Why

Two inseparable issues: (a) E5/Qwen need their canonical prompt prefixes applied for production-quality embeddings, (b) without role-aware cache keys, embed_query("hello") and embed_passage("hello") collide in cache.

Result

  • 3 cache key tests (cache.rs): query vs passage vs generic produce distinct keys, deterministic, isolated
  • 7 service tests (service/tests.rs): E5 document instruction, BGE/MiniLM no document instruction, Qwen no document instruction, apply_prefix, role cache tags distinct

Cache migration

Existing on-disk caches written with role:generic will be separate from new role:query / role:passage keys. Old caches remain readable under Generic. No migration needed; callers moving from embed() to embed_passage() will see misses until repopulation.

Stack

Base: #111 (PR7 tokenizer e2e test)
Umbrella: #104

🤖 Generated with Claude Code

Fix document_instruction() to return "passage: " for E5 multilingual
variants (was unconditionally None). Add EmbeddingRole enum (Query,
Passage, Generic) and embed_query/embed_passage trait methods that apply
model-specific prompt prefixes before forwarding. Extend CacheKey hash
inputs with role.cache_tag() so query and passage embeddings of the
same raw text are stored as separate cache entries. CachedEmbeddingService
overrides both role-aware methods with prompt-application + role-keyed
cache logic. Existing embed() uses Generic role for backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ohdearquant
Copy link
Copy Markdown
Owner Author

Subsumed by #104 merge (umbrella PR brought all 11 PRs' content to main in one merge commit after stacked-PR base branches collapsed). Codex round-1 findings tracked in #116. Closing as superseded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant