feat(embed-test): extend HF parity to MiniLM-L6-v2 + paraphrase-multilingual-MiniLM-L12-v2 (closes khive ship gate)#115

Closed
ohdearquant wants to merge 2 commits into pr-embedperf-10-hf-parity-gate from pr-embedperf-11-minilm-paraphrase-parity

@ohdearquant
Owner

Layer

L3 — ship gate (PR11 of 11)

What

Extends the HF parity regression gate (PR10) to the two embedding models khive uses in production:

  • sentence-transformers/all-MiniLM-L6-v2 — 384-dim, mean pool + L2, no prompt prefix, WordPiece (BERT-base-uncased style)
  • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 — 384-dim, mean pool + L2, no prompt prefix, SentencePiece (XLM-R style)
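Both models share the same post-processing: masked mean pooling over token embeddings followed by L2 normalization. A minimal pure-Python sketch of that step is below. The function names and the list-of-lists representation are illustrative assumptions, not code from lattice or from the generator script.

```python
import math
from typing import List, Sequence


def masked_mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average only the token vectors whose mask entry is non-zero."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                sums[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [s / count for s in sums]


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


# Hypothetical 3-token input; the last token is padding and is ignored.
tokens = [[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(masked_mean_pool(tokens, mask))
```

Padding tokens must be excluded before averaging; otherwise the pooled vector depends on batch padding length and parity against the reference drifts.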

Why

These are the actual ship-criterion models for khive launch. Without parity vs HF on these specifically, khive's recall pipeline can't validate retrieval quality against any reference framework.

Parity result (with full stack applied)

| Model | Min cosine | Max abs diff | Verdict |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 0.999899 | 3.50e-3 | PASS |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.999875 | 2.45e-3 | PASS |

Both clear the COS_SIM_MIN_F32 = 0.9990 ship gate.
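For readers unfamiliar with the two metrics in the table, here is a rough sketch of how a per-model minimum cosine and maximum absolute difference could be computed and checked against the threshold. Only the constant name and value `COS_SIM_MIN_F32 = 0.9990` come from this PR; `parity_report` and the other helpers are hypothetical, and the sketch assumes the gate is applied to the minimum cosine only.

```python
import math
from typing import Dict, Sequence

COS_SIM_MIN_F32 = 0.9990  # threshold value stated in this PR


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def parity_report(
    ours: Sequence[Sequence[float]],
    goldens: Sequence[Sequence[float]],
    cos_min: float = COS_SIM_MIN_F32,
) -> Dict[str, object]:
    """Compare each embedding with its golden and summarize the worst case."""
    cosines = [cosine_similarity(o, g) for o, g in zip(ours, goldens)]
    diffs = [max_abs_diff(o, g) for o, g in zip(ours, goldens)]
    min_cos = min(cosines)
    return {
        "min_cosine": min_cos,
        "max_abs_diff": max(diffs),
        # Assumption: only the cosine floor decides pass/fail.
        "passed": min_cos >= cos_min,
    }
```

Reporting the worst case across all fixtures, rather than the mean, keeps a single badly tokenized input from hiding behind otherwise good results.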

Note on tokenizer config resolution

~/.lattice/models/all-minilm-l6-v2/ contains weights but no tokenizer.json or config.json. The Python generator (gen_embed_parity_goldens.py) was extended with a find_hf_cache_snapshot() helper that falls back to ~/.cache/huggingface/hub/models--<owner>--<name>/snapshots/<hash>/ for the full tokenizer config. The runtime lattice service still loads weights from ~/.lattice/models/ unchanged; this fallback only affects golden generation.
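The fallback lookup described above might look roughly like the following. This is a sketch under stated assumptions, not the helper committed in this PR: the signature, the `cache_root` and `required` parameters, and the "newest snapshot wins" rule are all hypothetical. It also ignores the `HF_HOME` and `HF_HUB_CACHE` environment overrides.

```python
from pathlib import Path
from typing import Optional, Sequence, Union


def find_hf_cache_snapshot(
    repo_id: str,
    cache_root: Optional[Union[str, Path]] = None,
    required: Sequence[str] = ("tokenizer.json", "config.json"),
) -> Optional[Path]:
    """Return a cached snapshot directory for `repo_id`, or None.

    Hypothetical behavior: pick the most recently modified snapshot that
    contains every file listed in `required`.
    """
    root = (
        Path(cache_root)
        if cache_root is not None
        else Path.home() / ".cache" / "huggingface" / "hub"
    )
    # "owner/name" maps to the directory "models--owner--name".
    snapshots = root / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None

    candidates = [
        d
        for d in snapshots.iterdir()
        if d.is_dir() and all((d / name).exists() for name in required)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.stat().st_mtime)
```

Returning `None` instead of raising lets the caller decide whether a missing snapshot is fatal or should trigger a download.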

After this PR merges, lattice can ship v0.2.5.

Stack

Base: #114 (PR10 HF parity gate)
Umbrella: #104

🤖 Generated with Claude Code

ohdearquant and others added 2 commits May 25, 2026 16:21
…se-multilingual-MiniLM-L12-v2

Add two new golden fixture generators and committed fixtures for
sentence-transformers/all-MiniLM-L6-v2 (WordPiece, mean pool, no prefix)
and sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (SentencePiece,
mean pool, no prefix).  Generator uses HF cache snapshot for full tokenizer
config; weight path resolution matches the existing E5/BGE pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t (closes khive ship gate)

Add all_minilm_l6_v2_parity_vs_hf and paraphrase_multilingual_minilm_l12_v2_parity_vs_hf
test functions following the BGE/E5 pattern.  Both use plain embed() (no prompt prefix),
masked mean pooling, and COS_SIM_MIN_F32 (0.9990) tolerance.

Results on this machine:
  all-MiniLM-L6-v2:              5/5 PASS, min cosine 0.999899
  paraphrase-multilingual-L12:   5/5 PASS, min cosine 0.999875

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ohdearquant
Owner Author

Subsumed by #104 merge (umbrella PR brought all 11 PRs' content to main in one merge commit after stacked-PR base branches collapsed). Codex round-1 findings tracked in #116. Closing as superseded.
