
docs(claude-md): differential-test-first + bump-and-yank recipe #99

Merged
ohdearquant merged 1 commit into main from docs/claude-md-diff-test-first
May 25, 2026


@ohdearquant
Owner

Summary

Promotes two development directions from the v0.2.3 release session into project guidance.

1. Differential Test First

When lattice diverges from MLX / HF transformers / llama.cpp, write a 20-line Python script comparing the same primitive in both frameworks before reading lattice code or spawning investigation agents. The 0.77 PPL gap on Qwen3.5-0.8B (WikiText-2) was misdiagnosed as "FP precision drift" for days; the actual cause was a RoPE pairing convention bug (interleaved vs stride-half), identified in 5 seconds by a script comparing MLX nn.RoPE(traditional=False) against both candidates.

Includes:

  • Template Python script skeleton
  • Quantitative literature bounds on the PPL gap for cheaply rejecting precision-drift hypotheses (f16-vs-f32 <0.01, bf16-vs-f32 <0.05, Q4 0.1-0.3)
  • "Be skeptical of comments that paraphrase config fields" — the lattice RoPE comment was structurally misleading for months
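
For illustration, a differential test of this kind might look like the sketch below. It is not the template shipped in this PR. It uses NumPy only, and the function names (`rope_interleaved`, `rope_stride_half`, `compare_to_reference`) are hypothetical. The reference tensor here is produced by one of the two candidates as a stand-in; in a real run it would be the output of the reference framework on the same input.

```python
import numpy as np


def rope_angles(seq_len, dims, base=10000.0):
    """Rotation angles, shape (seq_len, dims // 2)."""
    inv_freq = base ** (-np.arange(0, dims, 2) / dims)
    return np.outer(np.arange(seq_len), inv_freq)


def rope_interleaved(x, base=10000.0):
    """Rotate adjacent pairs: (x[0], x[1]), (x[2], x[3]), ..."""
    seq_len, dims = x.shape
    theta = rope_angles(seq_len, dims, base)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def rope_stride_half(x, base=10000.0):
    """Rotate pairs split across halves: (x[i], x[i + dims // 2])."""
    seq_len, dims = x.shape
    half = dims // 2
    theta = rope_angles(seq_len, dims, base)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)


def compare_to_reference(reference, candidates):
    """Max absolute difference of each candidate output against the reference."""
    return {name: float(np.max(np.abs(reference - out)))
            for name, out in candidates.items()}


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))

# Stand-in reference (assumption): replace with the reference framework's output.
reference = rope_stride_half(x)

diffs = compare_to_reference(reference, {
    "interleaved": rope_interleaved(x),
    "stride_half": rope_stride_half(x),
})
for name, diff in diffs.items():
    print(f"{name}: max abs diff = {diff:.3e}")
```

A candidate that matches the reference shows a difference near zero, while the wrong pairing convention differs by an amount far above floating-point noise, which is what makes the check decisive in seconds.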

2. Bump-and-yank recipe

crates.io versions are immutable. When a shipped release has a correctness bug, the right pattern is bump + ship fix + yank broken. Done in v0.2.3 (yanked 0.2.2 across all 5 crates). Adds the explicit recipe to the Publishing section.

Also corrected stale version = "0.1.0" pin in the Publishing section to match current convention (path deps bump in lockstep with workspace version).
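
As a rough sketch of the bump + ship fix + yank sequence, the snippet below only builds the `cargo` command lines and prints them for review; it does not execute anything. The crate names are hypothetical placeholders, the list is assumed to be in dependency order, and the version bump itself is assumed to have been made in the workspace manifest beforehand. The exact recipe in the Publishing section may differ.

```python
def bump_and_yank_commands(crates, broken_version):
    """Return cargo argv lists: publish the fixed release, then yank the broken one.

    Assumes `crates` is already in dependency order and that the workspace
    version has been bumped to the fixed version before publishing.
    """
    publish = [["cargo", "publish", "-p", name] for name in crates]
    yank = [["cargo", "yank", "--version", broken_version, name] for name in crates]
    # Publish first, so a fixed version exists before the broken one is yanked.
    return publish + yank


# Hypothetical crate names, for illustration only.
commands = bump_and_yank_commands(["crate-a", "crate-b"], broken_version="0.2.2")
for argv in commands:
    print(" ".join(argv))
```

Publishing before yanking keeps a resolvable version available to downstream users throughout the recovery.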

Test plan

  • make ci doc lint passed (pre-commit hook)
  • No code changes — pure docs

🤖 Generated with Claude Code

Two development directions promoted from the v0.2.3 session:

1. **Differential Test First** — when lattice diverges from MLX/HF/llama.cpp,
   write a 20-line Python script comparing the same primitive across both
   frameworks BEFORE reading lattice code or spawning agents. This closed a
   0.77 PPL gap (Qwen3.5-0.8B, WikiText-2) in 5 seconds that had been
   misdiagnosed as "FP precision drift" for days. The actual bug was RoPE
   pairing convention (interleaved vs stride-half). Also: quantitative
   literature bounds cheaply reject hypotheses — f16-vs-f32 PPL <0.01,
   bf16-vs-f32 <0.05; gaps above those bounds are structural, not numerical.
   Also: be skeptical of comments that paraphrase config fields without
   explaining what the field actually controls in the reference impl.

2. **Bump-and-yank recovery** — crates.io is immutable. When a published
   release has a correctness bug, bump to next patch + ship the fix + yank
   the broken version. Done in v0.2.3 (yanked 0.2.2 which shipped with the
   RoPE bug). Plus: corrected stale version = "0.1.0" pin in the publish
   section to match current workspace-version convention.
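
The literature bounds quoted in item 1 can be applied as a quick triage check before any deeper investigation. The sketch below is illustrative only: the thresholds are the figures stated in this PR (with the Q4 entry taken from the upper end of the 0.1-0.3 range in the summary), and the names `PPL_DRIFT_BOUNDS` and `classify_ppl_gap` are hypothetical.

```python
# Upper bounds on the perplexity gap attributable to precision alone,
# as quoted in this PR (not independently verified here).
PPL_DRIFT_BOUNDS = {
    "f16": 0.01,   # f16 vs f32
    "bf16": 0.05,  # bf16 vs f32
    "q4": 0.3,     # Q4 quantization, upper end of the quoted 0.1-0.3 range
}


def classify_ppl_gap(gap, precision):
    """Label a PPL gap as plausibly numerical or likely structural."""
    bound = PPL_DRIFT_BOUNDS[precision]
    if abs(gap) <= bound:
        return "plausibly_numerical"
    return "likely_structural"


# The 0.77 gap from this PR, checked against each quoted bound.
for precision in PPL_DRIFT_BOUNDS:
    print(precision, classify_ppl_gap(0.77, precision))
```

A gap that exceeds the bound for the precision in use points toward a structural difference, which is the cue to write a differential test rather than chase rounding error.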

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ohdearquant ohdearquant merged commit bd9dfa8 into main May 25, 2026
3 checks passed
@ohdearquant ohdearquant deleted the docs/claude-md-diff-test-first branch May 25, 2026 03:42