Skip to content

release: v0.2.2 — MLX-parity quality + structured output + LoRA lifecycle#97

Merged
ohdearquant merged 1 commit into
mainfrom
release/v0.2.2
May 25, 2026
Merged

release: v0.2.2 — MLX-parity quality + structured output + LoRA lifecycle#97
ohdearquant merged 1 commit into
mainfrom
release/v0.2.2

Conversation

@ohdearquant
Copy link
Copy Markdown
Owner

Summary

Release notes for v0.2.2 — 58 commits, 29,524 insertions, 1,012 deletions since v0.2.1.

Workspace version is already at 0.2.2 (bumped during the release cycle), so this PR adds only the release-notes file.

Major themes

  1. MLX-parity quality on Qwen3.5-0.8B: PPL gap 0.77 → 0.029 (RoPE stride-half fix fix(inference): RoPE stride-half pairing — WikiText-2 PPL gap 0.77 → 0.029 vs MLX #96, FP16 lm_head, asymmetric Q4)
  2. Structured output: XGrammar engine (ADR-046)
  3. Serving: continuous batching with chunked prefill (ADR-048), prefix-shared paged KV cache (ADR-047)
  4. Multimodal: vision encoder (ADR-049)
  5. MoE: Metal dispatch with expert coalescing (ADR-053)
  6. Speculative decoding: GDN self-spec draft, MTP on QuaRot Q4, probabilistic rejection sampling (ADR-050)
  7. Fine-tuning: LoRA full-lifecycle (ADR-057 D1-D5, feat: implement ADR-057 LoRA full-lifecycle consumer API (D1-D5) #65), Adam/AdamW + LoRA gradients (feat(tune): Adam/AdamW optimizer + LoRA gradient computation #90)
  8. Performance: CPU/embed SIMD (perf(embed): SIMD optimizations + correctness hardening #79, perf(inference): CPU SIMD optimizations + parity regression tests #80), Metal zero-copy decode + MLP fusion (fix(inference): Metal GPU build repair + zero-copy decode + MLP fusion #81), CI perf-regression tracking (ADR-058, feat(ci): ADR-058 CPU perf regression tracking + bench tooling #83)

Post-merge steps

git checkout main && git pull
git tag -a v0.2.2 -m "v0.2.2"
git push origin v0.2.2
gh release create v0.2.2 --title "v0.2.2 — MLX-parity quality" --notes-file docs/releases/v0.2.2.md
# Optional: make publish  (crates.io)

Test plan

  • cargo check --workspace --features "f16 metal-gpu" clean
  • Release notes match commit history (verified via git log v0.2.1..HEAD)

🤖 Generated with Claude Code

58 commits, 29K LOC since v0.2.1. Highlights:
- Quality parity with MLX (PPL gap 0.77 → 0.029)
- XGrammar structured output, continuous batching, paged KV cache
- LoRA full-lifecycle (ADR-057), Adam/AdamW optimizer
- CI perf regression tracking (ADR-058)
- MoE Metal dispatch, vision encoder, MTP self-spec decoding

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ohdearquant ohdearquant merged commit e77de0a into main May 25, 2026
3 checks passed
@ohdearquant ohdearquant deleted the release/v0.2.2 branch May 25, 2026 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant