audit(embed-perf-quality): baseline + tokenizer parity tests #105

Merged: ohdearquant merged 1 commit into main from pr-embedperf-01-audit-baseline on May 25, 2026

Conversation

@ohdearquant
Owner

Layer

L0 — baseline + testing infrastructure (PR1 of 11 in embed-perf-quality show)

What

  • Updates crates/embed/BASELINE.toml with the measured tokenizer parity baseline (8/28 audit cases passing before fixes), cold-start latency notes, and SIMD parity status
  • Adds crates/inference/tests/audit_tokenizer_parity.rs with 28 audit cases across BGE (WordPiece), Qwen (BPE), E5 (SentencePiece)

Why

The audit baseline + failing test cases gate every subsequent fix. PR2-PR6 close the audit gap one tokenizer at a time.

State

  • Tests fail intentionally at this PR and act as the regression gate. Passing cases: 0/9 (E5 SP), 0/10 (Qwen BPE), 8/9 (BGE WP). Subsequent fix PRs close each gap.
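
For readers unfamiliar with this kind of gate, here is a minimal sketch of how a parity audit can count passing cases. It is not the code in audit_tokenizer_parity.rs: `AuditCase`, `tokenize_without_framing`, `frame`, `count_passing`, and `sample_cases` are hypothetical names, the whitespace tokenizer is a stand-in for a real one, and the two cases are made up for illustration. It only shows why a tokenizer that omits `<s>`/`</s>` framing scores zero against reference output that includes it.

```rust
/// One audit case: input text plus the tokens a reference tokenizer produced.
/// (Hypothetical type; the real test file may be structured differently.)
struct AuditCase {
    text: &'static str,
    expected: Vec<&'static str>,
}

/// Stand-in for a tokenizer under audit: splits on whitespace and adds
/// no special tokens. This is an assumption made for illustration only.
fn tokenize_without_framing(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Wraps a token sequence in <s> ... </s> framing.
fn frame<'a>(tokens: Vec<&'a str>) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = Vec::with_capacity(tokens.len() + 2);
    out.push("<s>");
    out.extend(tokens);
    out.push("</s>");
    out
}

/// Counts the cases whose tokenizer output matches the reference exactly.
fn count_passing(
    cases: &[AuditCase],
    tokenize: impl Fn(&'static str) -> Vec<&'static str>,
) -> usize {
    cases
        .iter()
        .filter(|case| tokenize(case.text) == case.expected)
        .count()
}

/// Two made-up cases whose reference output includes the framing tokens.
fn sample_cases() -> Vec<AuditCase> {
    vec![
        AuditCase {
            text: "hello world",
            expected: vec!["<s>", "hello", "world", "</s>"],
        },
        AuditCase {
            text: "query: rust",
            expected: vec!["<s>", "query:", "rust", "</s>"],
        },
    ]
}
```

Under these assumptions, `count_passing(&sample_cases(), tokenize_without_framing)` reports no passing cases, and wrapping the same tokenizer with `frame` makes both pass. A gate test would then assert that the passing count equals the number of cases, so it stays red until the gap is closed.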

Stack

This PR is the foundation of the show. Umbrella: #104.

🤖 Generated with Claude Code

Fill BASELINE.toml pending_measurement fields with honest values:
- cold_start_latency / cache_hit_latency: n/a (model files not in CI worktree)
- simd_parity_avx2_neon_scalar: pass_neon_19_of_19 (AVX2/AVX-512 not on host)
- mrl_wiring_status: wired_qwen3_only (was stale "not_wired")

Add crates/inference/tests/audit_tokenizer_parity.rs covering 28 cases across BGE
WordPiece (9), Qwen BPE (10), and E5 SentencePiece (9). 3/3 test functions FAIL
intentionally — these failures are the Phase C priority list:

  E5  0/9  — all missing <s>/</s> framing
  Qwen 0/10 — all missing </s>, +1 URL regex divergence
  BGE  8/9  — single AddedToken split bug on [CLS]/[SEP]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
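
The BGE entry above mentions an AddedToken split on [CLS]/[SEP]. The PR does not show the failing code, so the sketch below is only one plausible illustration of how such a split can arise: a punctuation-splitting pre-tokenizer breaks `[CLS]` into three pieces unless added tokens are matched first. `split_punctuation` and `split_with_added_tokens` are hypothetical helpers, not functions from this repository or from any tokenizer library.

```rust
/// Naive pre-tokenizer: whitespace separates tokens and every ASCII
/// punctuation character becomes its own token.
fn split_punctuation(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if ch.is_whitespace() || ch.is_ascii_punctuation() {
            // Flush the word collected so far, if any.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if ch.is_ascii_punctuation() {
                tokens.push(ch.to_string());
            }
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Same splitter, but any string listed in `added` is matched first and
/// kept as a single token. Empty entries in `added` are ignored.
fn split_with_added_tokens(text: &str, added: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Earliest occurrence of any added token in the remaining text.
        let next = added
            .iter()
            .filter(|tok| !tok.is_empty())
            .filter_map(|tok| rest.find(*tok).map(|pos| (pos, *tok)))
            .min_by_key(|(pos, _)| *pos);
        match next {
            Some((pos, tok)) => {
                tokens.extend(split_punctuation(&rest[..pos]));
                tokens.push(tok.to_string());
                rest = &rest[pos + tok.len()..];
            }
            None => {
                tokens.extend(split_punctuation(rest));
                break;
            }
        }
    }
    tokens
}
```

With the input `"[CLS] hello [SEP]"`, the naive splitter yields seven tokens with the brackets separated, while the added-token-aware version yields `[CLS]`, `hello`, `[SEP]`. Whether the actual bug follows this pattern is an assumption.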