llama-cpp

Here are 255 public repositories matching this topic...

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

  • Updated May 21, 2025
  • Python

ToolNeuron

Complete offline AI ecosystem for Android: Chat (GGUF/LLMs), Images (Stable Diffusion 1.5), Voice (TTS/STT), and Knowledge (RAG Data-Packs). Or access 100+ cloud models via OpenRouter. Extensible plugins, zero subscriptions, no data harvesting. Open-source privacy-first AI on your terms.

  • Updated Jan 17, 2026
  • Kotlin
