Issues · ggml-org/llama.cpp · GitHub

changelog : libllama API
#9289 · ggerganov opened on Sep 3, 2024
changelog : llama-server REST API
#9291 · ggerganov opened on Sep 3, 2024
tutorials : list for llama.cpp
#13523 · ggerganov opened on May 14, 2025


Feature Request: DSpark confidence-scheduled verification & semi-autoregressive drafting

#25096 · gangula-karthik opened on Jun 28, 2026

Eval bug: ggml_cuda_compute_forward: SOFT_MAX failed 0.11.348.725 E CUDA error: invalid argument

bug-unconfirmed

#25095 · freemanliu opened on Jun 28, 2026

Misc. bug: webui window for editing system message is limited to only 2 lines

bug-unconfirmed

#25094 · kke12 opened on Jun 28, 2026

Eval bug: Qwen3-VL image embedding doesn't work

bug-unconfirmed

#25088 · lilydjwg opened on Jun 28, 2026

Feature Request: Add a docker target for the llama app

#25083 · kannon92 opened on Jun 27, 2026

HIP/ROCm: system RAM grows unbounded with parallel slots due to CUDA graph cache never being evicted

#25082 · lukascechovic opened on Jun 27, 2026

Eval bug: Gemma 4 tool calling fails with "The model produced output that does not match the expected peg-gemma4 format"

bug-unconfirmed

#25072 · DanielBMann9000 opened on Jun 27, 2026

Eval bug: Premature "reasoning-budget: deactivated (natural end)", even BEFORE prompt processing

bug-unconfirmed

#25067 · ross-rosario opened on Jun 26, 2026

Feature Request: Router mode - Do not reload model files if they are already loaded.

#25066 · mcr-ksh opened on Jun 26, 2026

50% loss in Prefill Speed on AMD MI50, Build 9820

#25062 · DEV-DUFORD opened on Jun 26, 2026

Eval bug: CUDA error: unsupported value or parameter in cublasSgemm_v2 during large context processing

bug-unconfirmed

#25061 · rbenrax opened on Jun 26, 2026

Misc. bug: Blackwell GGML-CUDA SOFT_MAX Crash

bug-unconfirmed

#25060 · giveen opened on Jun 26, 2026