diff --git a/CLAUDE.md b/CLAUDE.md
index 0a51d05e..2c0edc65 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -30,6 +30,30 @@ cargo bench -p lattice-inference --bench elementwise_cpu_bench # inference
 
 Quick mode (`--quick`) is sufficient for direction + magnitude. Full mode only when you need tight CIs for a PR description or ADR evidence.
 
+### Differential Test First (Cross-Framework Bugs)
+
+When lattice produces different output than a reference framework (MLX, HF transformers, llama.cpp), write a self-contained Python script that runs the same primitive in both frameworks and compares max-diff **before** reading lattice code or spawning investigation agents. A 20-line script gives a definitive answer in seconds; code-reading and agent analysis take hours and can converge on wrong conclusions.
+
+```python
+# Template: /tmp/test__conv.py
+import numpy as np, mlx.core as mx, mlx.nn as nn
+# 1. Construct minimal input
+# 2. Run via MLX (reference)
+# 3. Run via each candidate lattice convention (as numpy)
+# 4. Compare: which candidate has max-diff < 1e-4?
+```
+
+This process closed a 0.77 PPL gap on Qwen3.5-0.8B that had been misdiagnosed as "f32-vs-bf16 precision drift" for days. The actual bug was a RoPE pairing convention mismatch: interleaved `(2i, 2i+1)` vs stride-half `(i, half+i)`. Verified in 5 seconds: stride-half max-diff `8e-6`, interleaved `67.5`. PPL dropped from 16.62 → 15.89 (MLX gold 15.86).
+
+**Quantitative bounds reject hypotheses cheaply.** Before chasing "FP precision drift" or other plausible-sounding causes, check the literature for typical magnitude:
+- f16 vs f32 PPL delta: `~0.00x` (llama.cpp community)
+- bf16 vs f32 PPL delta: `<0.05` (arxiv:2510.26788)
+- Q4 quantization PPL delta: `0.1-0.3` (llama.cpp #406)
+
+If the gap you're investigating exceeds these bounds, the cause is structural (algorithm, layout, convention), not numerical. Reject the precision hypothesis on quantitative grounds and look for a real bug.
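Reviewer note: the RoPE pairing check described in this hunk can be sketched in plain NumPy, following the four template steps. This is a minimal sketch under stated assumptions, not the script used for the original investigation. The reference here is a NumPy stand-in written in the `rotate_half` style; in a real run it would be replaced by the reference framework's own RoPE call. All function names (`rope_angles`, `rope_reference`, `rope_stride_half`, `rope_interleaved`, `run_differential_test`) are hypothetical.

```python
import numpy as np


def rope_angles(seq_len, head_dim, base=10000.0):
    """Rotation angles theta[p, i] = p * base^(-i / half), shape (seq_len, half)."""
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half, dtype=np.float64) / half)
    positions = np.arange(seq_len, dtype=np.float64)
    return np.outer(positions, inv_freq)


def rope_reference(x, theta):
    """Stand-in reference (assumption): the rotate_half formulation.

    Replace this with the actual reference framework call in real use.
    """
    half = x.shape[-1] // 2
    cos = np.concatenate([np.cos(theta), np.cos(theta)], axis=-1)
    sin = np.concatenate([np.sin(theta), np.sin(theta)], axis=-1)
    rotated = np.concatenate([-x[..., half:], x[..., :half]], axis=-1)
    return x * cos + rotated * sin


def rope_stride_half(x, theta):
    """Candidate A: rotate pairs (i, half + i)."""
    half = x.shape[-1] // 2
    a, b = x[..., :half], x[..., half:]
    out = np.empty_like(x)
    out[..., :half] = a * np.cos(theta) - b * np.sin(theta)
    out[..., half:] = a * np.sin(theta) + b * np.cos(theta)
    return out


def rope_interleaved(x, theta):
    """Candidate B: rotate pairs (2i, 2i + 1)."""
    a, b = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = a * np.cos(theta) - b * np.sin(theta)
    out[..., 1::2] = a * np.sin(theta) + b * np.cos(theta)
    return out


def run_differential_test(seq_len=8, head_dim=64, seed=0):
    """Return max-abs-diff of each candidate against the reference."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((seq_len, head_dim))   # 1. minimal input
    theta = rope_angles(seq_len, head_dim)
    ref = rope_reference(x, theta)                 # 2. reference output
    candidates = {                                 # 3. candidate conventions
        "stride_half": rope_stride_half,
        "interleaved": rope_interleaved,
    }
    return {                                       # 4. compare
        name: float(np.max(np.abs(fn(x, theta) - ref)))
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, diff in run_differential_test().items():
        verdict = "MATCH" if diff < 1e-4 else "MISMATCH"
        print(f"{name:12s} max-diff = {diff:.3e}  {verdict}")
```

Only the candidate that uses the same pairing as the reference should land under the `1e-4` threshold; the other differs at every position except 0, where the rotation angle is zero.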
+
+**Be skeptical of comments that paraphrase config fields.** A comment that says "X uses field=true" without explaining what the field actually controls in the reference implementation is a footgun. The lattice RoPE comment said "Qwen3.5 uses mrope_interleaved=true", which technically matched the config, but `mrope_interleaved` controls multimodal M-RoPE section interleaving (video/image tokens), not 1-D text RoPE pairing. The bug existed for months because nobody verified the comment against HF's `rotate_half` or MLX's `nn.RoPE`.
+
 ### Regression Gate (ADR-058)
 
 PRs touching CPU kernel paths trigger `bench-regression.yml` in CI. It runs on both `x86_64-linux` (AVX2) and `aarch64-linux` (NEON) against baselines stored on the orphan `perf-baselines` branch.
@@ -80,4 +104,14 @@ Changes to `inference` affect `embed` and `tune`. Changes to `fann` affect `tune
 
 ## Publishing
 
-Leaf crates publish first: inference → fann → transport → (wait 30s) → embed → (wait 30s) → tune. Use `make publish`. Internal path deps must have `version = "0.1.0"`.
+Leaf crates publish first: inference → fann → transport → (wait 30s) → embed → (wait 30s) → tune. Use `make publish`. Internal path deps' `version = ` field must match the current workspace version (bump them in lockstep when bumping `[workspace.package].version`).
+
+**Shipped-bug recovery (bump-and-yank).** crates.io versions are immutable. When a published release has a correctness bug:
+
+1. Bump workspace + path-dep versions to the next patch
+2. Update release notes file (rename if needed); add a "Note on v" section explaining the yank
+3. Tag + GH release + `make publish`
+4. `for c in lattice-inference lattice-fann lattice-transport lattice-embed lattice-tune; do cargo yank --version "$c"; done`
+5. Verify: `curl -s https://crates.io/api/v1/crates/` should show `latest_unyanked=`, `yanked=[]`
+
+Done in v0.2.3 (yanked broken 0.2.2 which shipped with the RoPE bug). New `cargo add` users get the fix; existing pinned users get a yank warning on next `cargo update`.