server: add strict prompt cache RAM limit by tarruda · Pull Request #25070 · ggml-org/llama.cpp

tarruda · 2026-06-27T02:13:39Z

Overview

Implement --cache-ram-strict option, which makes --cache-ram a hard cache limit.

When this option is enabled, a prompt state that wouldn't fit within the specified limit is skipped. If it fits, older entries are evicted before the new entry is allocated.
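The strict policy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CacheEntry`, `StrictPromptCache`, and `try_insert` are hypothetical names, and oldest-first eviction order is an assumption.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a cached prompt state; only its size matters here.
struct CacheEntry {
    int         id;
    std::size_t size_bytes;
};

// Hypothetical cache with a hard byte budget (oldest entry at the front).
struct StrictPromptCache {
    std::size_t            limit_bytes;
    std::deque<CacheEntry> entries;

    explicit StrictPromptCache(std::size_t limit) : limit_bytes(limit) {}

    std::size_t used_bytes() const {
        std::size_t total = 0;
        for (const auto & e : entries) {
            total += e.size_bytes;
        }
        return total;
    }

    // Returns true if the entry was stored, false if it was skipped.
    bool try_insert(const CacheEntry & entry) {
        // An entry larger than the whole budget can never fit:
        // skip it and leave the existing entries untouched.
        if (entry.size_bytes > limit_bytes) {
            return false;
        }
        // Evict the oldest entries first, so that the total never
        // exceeds the limit, not even temporarily.
        while (used_bytes() + entry.size_bytes > limit_bytes) {
            entries.pop_front();
        }
        entries.push_back(entry);
        return true;
    }
};
```

In this sketch, eviction happens before the new entry is stored, which is what keeps the total at or below the limit at every step.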

Additional information

The --cache-ram option works as a soft limit on cache RAM. When enabled (--cache-ram > 0), the cache always keeps one entry, even if that entry exceeds the value passed to --cache-ram. Another problem is that the new entry is created before old ones are evicted, which can temporarily cause a large increase in memory use.
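To make the two issues concrete, here is a small simulation of the soft-limit behavior as described in the paragraph above. It is a sketch under assumptions, not the server's actual code: `SoftPromptCache`, `insert`, and `peak_bytes` are hypothetical names, and entries are modeled only by their size.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical model of a soft-limited cache (oldest entry at the front).
struct SoftPromptCache {
    std::size_t             limit_bytes;
    std::deque<std::size_t> entry_sizes;
    std::size_t             peak_bytes;

    explicit SoftPromptCache(std::size_t limit) : limit_bytes(limit), peak_bytes(0) {}

    std::size_t used_bytes() const {
        std::size_t total = 0;
        for (std::size_t s : entry_sizes) {
            total += s;
        }
        return total;
    }

    void insert(std::size_t size_bytes) {
        // The new entry is created first, so old and new entries
        // coexist for a moment; record the resulting peak.
        entry_sizes.push_back(size_bytes);
        peak_bytes = std::max(peak_bytes, used_bytes());

        // Only then are old entries evicted, and one entry is always
        // kept even if it alone exceeds the limit.
        while (entry_sizes.size() > 1 && used_bytes() > limit_bytes) {
            entry_sizes.pop_front();
        }
    }
};
```

With a limit of 100 in this model, inserting entries of size 60 and then 70 briefly holds 130 before the older entry is evicted. A single entry of size 150 stays cached even though it is above the limit.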

When the user has a significant amount of free memory, the current behavior is fine. For users like me who run models that use most of the device's RAM, it would be useful to have more control over the maximum amount of RAM used and to prevent unnecessary swapping, which can wear out SSDs.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Minor codex assistance to get familiar with the relevant parts of the code.

tarruda requested review from a team as code owners June 27, 2026 02:13

github-actions Bot added the server label Jun 27, 2026

tarruda force-pushed the cache-ram-strict branch from 7702d13 to 4782510 Compare June 28, 2026 10:15

tarruda wants to merge 1 commit into ggml-org:master from tarruda:cache-ram-strict
