v0.12.0

Released by @github-actions on 11 Apr 06:13 · 295 commits to main since this release · commit 97a422c

What's Changed

  • docs: clarify model.gguf placeholder in all README examples by @unamedkr in #19
  • feat(wasm): Qwen3/Llama model selector + real-time streaming by @unamedkr in #20
  • Fix GGUF BPE merge parsing — Qwen3/Llama3 garbage output by @unamedkr in #21
  • perf: sort vocab before merge parsing + rebuild WASM with ASYNCIFY by @unamedkr in #22
  • Fix Qwen3 garbage output: RMSNorm +1 for all Qwen-family models by @unamedkr in #23
  • Fix Qwen RMSNorm + switch WASM demo to Qwen3.5 0.8B by @unamedkr in #24
  • perf(wasm): SIMD128 + O3 + LTO for 2-4x faster browser inference by @unamedkr in #25
  • ux(wasm): Thinking... indicator during prompt prefill by @unamedkr in #26
  • perf(wasm): pthreads multi-threading + Service Worker COOP/COEP by @unamedkr in #27
  • perf(wasm): Web Worker + no ASYNCIFY — maximum inference speed by @unamedkr in #28
  • fix(wasm): OOM on low-memory devices by @unamedkr in #29
  • fix(wasm): eliminate UI hang during prompt prefill by @unamedkr in #30
  • ux(wasm): polished demo — progress bar, mobile, two-phase UI by @unamedkr in #31
  • fix(wasm): drop pthreads — fixes UI hang by @unamedkr in #32
  • fix(wasm): remove prefill sleep — restores token streaming by @unamedkr in #33
  • fix(wasm): ccall({async:true}) — fixes ASYNCIFY streaming by @unamedkr in #34
  • ux(wasm): clarify prefill wait + confirm streaming works by @unamedkr in #35
  • feat(wasm): Llama 3.2 1B Instruct + skip Q4 reconversion by @unamedkr in #36
  • feat(wasm): SmolLM2-135M fast default + Llama 1B quality option by @unamedkr in #37
  • docs: address 'why not just use llama.cpp?' feedback by @unamedkr in #38
  • docs(guide): 'When to use which?' table + C code in CTA by @unamedkr in #39
  • i18n: complete EN/KO coverage for guide page by @unamedkr in #40
  • fix(guide): Korean typo 겄건이 → 경계 by @unamedkr in #41
  • feat(cli): ollama-parity — tq pull/list/run/serve by @unamedkr in #42
  • feat(pypi): quantcpp CLI ollama-parity (pull/list/run/serve) by @unamedkr in #43
  • chore: sync version fallback to 0.12.0 by @unamedkr in #44
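The pthreads build in #27 depends on cross-origin isolation: browsers only expose `SharedArrayBuffer` when the page is served with COOP/COEP headers, and a Service Worker can inject them when the host can't. Below is a minimal sketch of that header rewrite; the function name `withIsolationHeaders` is illustrative, not taken from this repo.

```javascript
// Sketch of the header rewrite a COOP/COEP service worker applies so that
// SharedArrayBuffer (required by WASM pthreads) becomes available.
function withIsolationHeaders(headers) {
  const h = new Headers(headers);
  h.set("Cross-Origin-Opener-Policy", "same-origin");
  h.set("Cross-Origin-Embedder-Policy", "require-corp");
  return h;
}

// Inside the service worker, every fetched response is re-wrapped with
// the isolation headers added (commented out: needs a SW context to run):
//
// self.addEventListener("fetch", (event) => {
//   event.respondWith(
//     fetch(event.request).then(
//       (r) => new Response(r.body, {
//         status: r.status,
//         statusText: r.statusText,
//         headers: withIsolationHeaders(r.headers),
//       })
//     )
//   );
// });
```

Note that #32 later dropped pthreads, so this machinery applies only to the multi-threaded builds from #27.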

Full Changelog: v0.8.0...v0.12.0