test(embed): end-to-end tokenizer parity through embedding service (P0-E1) by ohdearquant · Pull Request #111 · ohdearquant/lattice

ohdearquant · 2026-05-25T20:20:08Z

Layer

L2 — test addition (PR7 of 11)

What

Adds crates/embed/tests/tokenizer_parity_e2e.rs with 3 tests (BGE/WordPiece, E5/SentencePiece, Qwen/BPE). Each test calls load_tokenizer() directly — the same code path NativeEmbeddingService uses — with the same model configs, asserting token IDs match HF reference values.

Why

Confirms that the tokenizer fixes from PR2-PR6 flow through the embed service without requiring model weights. Tests skip with an explicit message when the tokenizer JSON is absent; no stubs are used.
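The skip-or-compare behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of tokenizer_parity_e2e.rs: `Parity`, `tokenizer_json`, `compare_ids`, and `check_parity` are hypothetical names, the `tokenizer.json` file name is an assumption, and the `encode` closure stands in for whatever `load_tokenizer()` actually returns, since its signature is not shown here.

```rust
use std::path::{Path, PathBuf};

/// Outcome of one parity check (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Parity {
    /// Token IDs are identical to the reference.
    Match,
    /// First position at which the two sequences differ.
    Mismatch { index: usize, expected: u32, actual: u32 },
    /// One sequence is a strict prefix of the other.
    LengthMismatch { expected: usize, actual: usize },
    /// The tokenizer file was absent, so nothing was compared.
    Skipped(String),
}

/// Return the tokenizer JSON path if it exists under `snapshot_dir`.
/// The file name `tokenizer.json` is an assumption.
fn tokenizer_json(snapshot_dir: &Path) -> Option<PathBuf> {
    let path = snapshot_dir.join("tokenizer.json");
    path.is_file().then_some(path)
}

/// Compare produced token IDs against reference IDs, reporting the first difference.
fn compare_ids(expected: &[u32], actual: &[u32]) -> Parity {
    for (index, (e, a)) in expected.iter().zip(actual.iter()).enumerate() {
        if e != a {
            return Parity::Mismatch { index, expected: *e, actual: *a };
        }
    }
    if expected.len() != actual.len() {
        return Parity::LengthMismatch { expected: expected.len(), actual: actual.len() };
    }
    Parity::Match
}

/// Skip with an explicit reason when the tokenizer file is missing;
/// otherwise encode via the supplied closure and compare at the token-ID level.
fn check_parity<F>(snapshot_dir: &Path, expected: &[u32], encode: F) -> Parity
where
    F: FnOnce(&Path) -> Vec<u32>,
{
    match tokenizer_json(snapshot_dir) {
        None => Parity::Skipped(format!(
            "tokenizer.json not found under {}",
            snapshot_dir.display()
        )),
        Some(path) => compare_ids(expected, &encode(&path)),
    }
}
```

In a real test the closure would load the tokenizer from the given path and encode a fixed input string, and a `Skipped` result would be printed and returned early rather than treated as a failure. Reporting the first differing index keeps a failing assertion readable when the ID sequences are long.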

Result

All 3 e2e tests pass on this stack
No model weights required to exercise (CI-friendly)

Stack

Base: #110 (PR6 SP trailing-ws)
Umbrella: #104

🤖 Generated with Claude Code

…0-E1)

Confirms the tokenizer fixes from impl-tokenizer-fixes (SentencePiece BOS/EOS, Qwen BPE EOS, AddedToken longest-match) flow through the load_tokenizer path that NativeEmbeddingService uses. Three tests cover BGE/WordPiece, E5/SentencePiece, and Qwen/BPE at the token-ID level so no model weights are needed; tests skip when the HF snapshot is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ohdearquant · 2026-05-25T20:49:21Z

Subsumed by #104 merge (umbrella PR brought all 11 PRs' content to main in one merge commit after stacked-PR base branches collapsed). Codex round-1 findings tracked in #116. Closing as superseded.

This was referenced May 25, 2026

feat(embed): role-aware prompts + cache key distinguishment (P0-E2) #112 (Closed)

embed-perf-quality show (umbrella draft — slice into ordered PRs below) #104 (Merged)

embed-perf-quality codex review follow-ups (PR #105-#115) #116 (Open)

ohdearquant closed this May 25, 2026

ohdearquant wants to merge 1 commit into pr-embedperf-06-sp-trailing-ws from pr-embedperf-07-tokenizer-e2e-test
