fix: avoid cleanup errors for partially initialized LlamaModel (abetlen#2173)

usernames122 · abetlen · web-flow · commit fdf38b3e4c41 · 2026-05-31T04:39:18.000-07:00
* Add attribute check for sampler in close method

This fixes a bug I ran into: an AttributeError is raised when a model is re-initialized repeatedly in a loop and Python garbage-collects an instance whose initialization did not complete, for example when probing for the highest GPU layer count that fits before CUDA runs out of memory.

* fix: avoid cleanup errors for partial model init
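
A minimal sketch of the failure mode and of the fix, using hypothetical stand-in classes rather than the real `LlamaModel`. The names `LateDefaults`, `EarlyDefaults`, and `close_after_failed_init` are illustrative only, and the simulated `RuntimeError` stands in for a load failure such as a CUDA OOM. It assumes `close()` can be invoked on an object whose `__init__` raised part-way, as happens during finalization:

```python
class LateDefaults:
    """Hypothetical class that sets `sampler` only after the step that can fail."""

    def __init__(self, fail=False):
        if fail:
            raise RuntimeError("simulated load failure")
        self.sampler = None  # never reached when loading fails

    def close(self):
        # Reads an attribute that may not exist after a failed __init__.
        if self.sampler is not None:
            self.sampler = None


class EarlyDefaults:
    """Same shape, but cleanup-related attributes are set before anything can raise."""

    def __init__(self, fail=False):
        self.sampler = None
        self.custom_samplers = []
        if fail:
            raise RuntimeError("simulated load failure")

    def close(self):
        if self.sampler is not None:
            self.sampler = None
        self.custom_samplers.clear()


def close_after_failed_init(cls):
    """Let cls.__init__ fail, then call close() the way a finalizer would."""
    obj = cls.__new__(cls)
    try:
        obj.__init__(fail=True)
    except RuntimeError:
        pass  # the object still exists, only partially initialized
    try:
        obj.close()
    except AttributeError:
        return "AttributeError"
    return "ok"
```

Setting the defaults first keeps `close()` free of `hasattr` checks, at the cost of listing every cleanup-related attribute at the top of `__init__`.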

---------

Co-authored-by: abetlen <abetlen@gmail.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+- fix: avoid cleanup errors for partially initialized `LlamaModel` objects by @usernames122 in #2173
 - fix: suppress stdout and stderr in Jupyter notebooks by @Anai-Guo in #2181
 - feat: enable arm64 musl builds by @acon96 in #2221
 - feat: Update llama.cpp to ggml-org/llama.cpp@d749821db
diff --git a/llama_cpp/_internals.py b/llama_cpp/_internals.py
@@ -44,6 +44,9 @@ def __init__(
         self.params = params
         self.verbose = verbose
         self._exit_stack = ExitStack()
+        # LlamaModel does not use samplers, but close() can run after partial init.
+        self.sampler = None
+        self.custom_samplers = []
 
         model = None
 
@@ -65,7 +68,6 @@ def __init__(
 
         self.model = model
         self.vocab = vocab
-        self.sampler = None  # LlamaModel doesn't use samplers, but some cleanup code expects this attribute
 
         def free_model():
             if self.model is None: