fix: handle Ollama llama-runner crashes with structured error categorization #11

Merged
Limbicnation merged 1 commit into main from fix/ollama-runner-crash-handling on May 8, 2026

Conversation

Limbicnation (Owner) commented May 8, 2026

Replace the catch-all RuntimeError wrapper in OllamaClient.generate_streaming with a StreamResult dataclass that categorises failures (ok, timeout, transient, model_crash, server_error, unavailable). Llama-runner crashes (HTTP 500 with "runner terminated" / "exit status" / "load failed") are detected and surfaced as user-actionable messages in the ComfyUI prompt output instead of bubbling up as Python stacktraces.
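
For orientation, a minimal sketch of the new shape, using only the field names and crash signals stated above (the real dataclass and classifier live in nodes/adapters/ollama_client.py and may differ in detail; this is not the literal implementation):

from dataclasses import dataclass

@dataclass
class StreamResult:
    text: str | None = None   # generated text when kind == "ok"
    kind: str = "ok"          # ok | timeout | transient | model_crash | server_error | unavailable
    message: str = ""         # user-facing explanation for non-ok outcomes

# Crash signals quoted in this description.
CRASH_SIGNALS = ("runner terminated", "exit status", "load failed")

def classify_streaming_exception(exc: Exception, status: int | None, model: str) -> StreamResult:
    """Illustrative classifier: map a streaming exception to a StreamResult kind."""
    msg = str(exc).lower()
    if status == 500 and any(signal in msg for signal in CRASH_SIGNALS):
        return StreamResult(
            kind="model_crash",
            message=f"Ollama llama runner crashed loading '{model}'. Restart Ollama or try another model.",
        )
    if status is not None and status >= 500:
        return StreamResult(kind="server_error", message=f"Ollama server error {status}: {exc}")
    return StreamResult(kind="transient", message=str(exc))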

PromptGenerator, NegativePrompt, and PromptRefiner nodes now branch on result.kind: model_crash / server_error / unavailable surface the message directly (subprocess fallback would also fail), while timeout / transient fall through to the existing subprocess fallback path.
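
Illustratively, the branching in each node looks roughly like this (the function name, node wording, and fallback helper are placeholders, not the actual node code):

def run_with_fallback(client, model: str, prompt: str) -> str:
    result = client.generate_streaming(model=model, prompt=prompt)

    if result.kind == "ok" and result.text is not None:
        return result.text.strip()

    # Fatal classes: the subprocess fallback talks to the same broken server
    # or model, so surface the actionable message in the prompt output instead.
    if result.kind in ("model_crash", "server_error", "unavailable"):
        return f"[PromptGenerator] {result.message}"

    # timeout / transient: retry through the existing subprocess fallback path.
    return generate_via_subprocess(model, prompt)  # hypothetical fallback helper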

Adds 6 unit tests covering each error class plus the success path. Updates the existing PromptRefiner seed tests to return StreamResult from mocks.
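
The mock update for the existing seed tests is roughly the following (the wiring is illustrative; the point is that streaming mocks now return a StreamResult instead of a bare string):

from unittest.mock import MagicMock

from nodes.adapters.ollama_client import StreamResult  # presumed import path, based on the file listed in this PR

client = MagicMock()

# Before: streaming mocks handed back a plain string (or None on failure).
client.generate_streaming.return_value = "a refined prompt"

# After: they hand back a structured result the node can branch on.
client.generate_streaming.return_value = StreamResult(text="a refined prompt", kind="ok")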

Summary by CodeRabbit

  • Bug Fixes

    • Improved error detection and classification for streaming generation failures, distinguishing between model crashes, server errors, timeouts, and transient issues.
    • Added intelligent fallback mechanisms when streaming fails to ensure generation attempts continue via alternative paths.
    • Enhanced error messaging to provide clearer, actionable information for different failure scenarios.
  • Refactor

    • Restructured streaming result handling to return structured responses with error categorization instead of generic null values.

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

This PR refactors Ollama streaming to return a structured StreamResult dataclass instead of str | None, enabling precise error categorization. The adapter detects model crashes (500 with "runner terminated"), distinguishes server errors from transients, and consumers branch on result kind to choose appropriate fallback paths.

Changes

Ollama Streaming Error Categorization

Layer: Data Contract
File(s): nodes/adapters/ollama_client.py
Summary: StreamResult dataclass introduced with text, kind, and message fields to represent streaming outcomes; optional imports added for exception type detection.

Layer: Core Streaming Implementation
File(s): nodes/adapters/ollama_client.py
Summary: generate_streaming() refactored to return StreamResult for all paths: unavailable API, total/per-chunk timeouts, exceptions (classified by status code and message), empty yields, and success. Exception classification logic added to distinguish model crashes from generic server errors. (A rough sketch of this flow follows the table.)

Layer: Exception Classification
File(s): nodes/adapters/ollama_client.py
Summary: _classify_streaming_exception() method detects llama-runner crashes via status code 500 and "runner terminated" message, maps them to model_crash with an actionable user message.

Layer: Consumer Integration
File(s): nodes/negative_prompt_node.py, nodes/prompt_generator_node.py, nodes/prompt_refiner_node.py
Summary: Each node updated to capture the streaming result and branch on result.kind: use result.text on success, immediately return the error message for fatal kinds (model_crash/server_error/unavailable), fall back to subprocess for transient/timeout.

Layer: Test Coverage
File(s): tests/unit/test_ollama_client.py, tests/unit/test_prompt_refiner.py
Summary: New TestOllamaClientStreamingErrors class validates error categorization (model_crash, server_error, transient, unavailable). Existing streaming tests and prompt refiner mocks updated to return StreamResult instances.
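
Putting the table together, the control flow of the refactored method is roughly the following (a sketch under the assumptions of this walkthrough, not the file's actual code; the availability probe, chunk format, per-chunk timeout handling, and classifier signature are assumed and simplified):

def generate_streaming(self, model: str, prompt: str, timeout: int = 30, **options) -> StreamResult:
    if not self._api_available():  # hypothetical availability probe
        return StreamResult(kind="unavailable", message="Ollama API is not reachable.")
    try:
        pieces = []
        for chunk in self._ollama.generate(model=model, prompt=prompt, stream=True, options=options):
            pieces.append(chunk.get("response", ""))  # per-chunk timeout check omitted here
        text = "".join(pieces).strip()
        if not text:
            return StreamResult(kind="transient", message="Generation returned no content.")
        return StreamResult(text=text, kind="ok")
    except TimeoutError as exc:
        return StreamResult(kind="timeout", message=f"Generation timed out after {timeout}s: {exc}")
    except Exception as exc:
        # Classified into model_crash / server_error / transient by
        # _classify_streaming_exception() (signature assumed here).
        return self._classify_streaming_exception(exc, model)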

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through error paths so bright,
Streaming results now carry the light,
Crashes crash cleanly, timeouts timeout well,
Each fault type whispers its own tale to tell! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 64.00% which is insufficient. The required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title directly addresses the main change: introducing structured error categorization to handle Ollama llama-runner crashes. It aligns with the PR's primary objective of replacing catch-all errors with categorized StreamResult outcomes.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

kilo-code-bot Bot commented May 8, 2026

Code Review Roast 🔥

Verdict: No Issues Found | Recommendation: Merge

Oh wait, this PR is actually clean. I had my flamethrower warmed up and everything — ready to roast some catch-all RuntimeError handlers, and then... nothing. The StreamResult dataclass is elegantly designed with comprehensive error categorization (model_crash detection with actionable recovery steps, server_error, transient, timeout, unavailable — a full spectrum). The node updates properly distinguish which errors warrant subprocess fallback and which deserve immediate surfacing. Test coverage spans all 6 error paths plus success. I'm genuinely impressed.

📊 Overall: Like finding a unicorn in production — I didn't think clean PRs existed anymore, but here we are.

Files Reviewed (6 files)
  • nodes/adapters/ollama_client.py - Core implementation with StreamResult and error classification
  • nodes/negative_prompt_node.py - Updated to handle StreamResult
  • nodes/prompt_generator_node.py - Updated to handle StreamResult
  • nodes/prompt_refiner_node.py - Updated to handle StreamResult
  • tests/unit/test_ollama_client.py - New tests for error paths
  • tests/unit/test_prompt_refiner.py - Updated mocks for StreamResult

Reviewed by minimax-m2.5-20260211 · 521,093 tokens

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a structured StreamResult dataclass to improve error handling and reporting in the OllamaClient streaming generation. It adds an exception classification system to identify specific failure modes—such as model crashes, server errors, and timeouts—providing more actionable feedback. The NegativePrompt, PromptGenerator, and PromptRefiner nodes were updated to utilize these structured results for better fallback logic. Feedback suggests generalizing the troubleshooting instructions in error messages to be platform-agnostic, as they currently include Linux-specific commands.

Comment on lines +319 to +326

friendly = (
    f"Ollama llama runner crashed loading '{model}'. "
    "This usually means the model file is corrupt, incompatible, "
    "or out of VRAM. Try: (1) restart Ollama "
    "(`sudo systemctl restart ollama`), "
    "(2) switch to a known-good model like 'qwen3:8b', "
    "(3) re-pull or rebuild the model."
)

Severity: medium

The troubleshooting message includes a Linux-specific command (sudo systemctl restart ollama). Since Ollama and ComfyUI are cross-platform (Windows, macOS, Linux), it's better to provide a more generic instruction or mention that the command is for Linux users to avoid confusion on other platforms.

            friendly = (
                f"Ollama llama runner crashed loading '{model}'. "
                "This usually means the model file is corrupt, incompatible, "
                "or out of VRAM. Try: (1) restart the Ollama service or application, "
                "(2) switch to a known-good model like 'qwen3:8b', "
                "(3) re-pull or rebuild the model."
            )

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
nodes/adapters/ollama_client.py (1)

40-55: ⚡ Quick win

kind: str should use Literal to lock down the valid discriminant values.

The plain str annotation allows callers to construct StreamResult(kind="typo") or compare against an invalid string without any static-analysis warning. Every consumer already branches exhaustively on the six documented values, so narrowing the type costs nothing and surfaces mistakes at type-check time.

♻️ Proposed change
-from typing import Any, ClassVar
+from typing import Any, ClassVar, Literal

 @dataclass
 class StreamResult:
     text: str | None = None
-    kind: str = "ok"
-    message: str = field(default="")
+    kind: Literal["ok", "timeout", "transient", "model_crash", "server_error", "unavailable"] = "ok"
+    message: str = ""

(message: str = "" is also the idiomatic form — field(default=...) is only needed for mutable defaults.)
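
For illustration, with the narrowed annotation a checker such as mypy or pyright flags an invalid discriminant before the code ever runs (the values below are only examples):

StreamResult(kind="ok")             # accepted: one of the listed Literal values
StreamResult(kind="model_crashed")  # rejected by the type checker: not one of the six Literal values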

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/adapters/ollama_client.py` around lines 40 - 55, The StreamResult
dataclass uses a plain str for the discriminant `kind` and uses
field(default=...) for `message`; change `kind` to a Literal union of the six
allowed values (e.g.
Literal["ok","timeout","transient","model_crash","server_error","unavailable"])
so static checkers catch invalid values, import Literal from typing, and
simplify `message` to the idiomatic default `message: str = ""`; update any type
hints/usages of StreamResult as needed to match the narrowed Literal type.
tests/unit/test_ollama_client.py (1)

232-408: ⚡ Quick win

Missing test for the "empty-content" transient branch.

When ollama.generate() streams successfully but yields zero text content, generate_streaming returns StreamResult(kind="transient", message="Generation returned no content.") (lines 289–293 in ollama_client.py). None of the six new tests exercise this path. A minimal addition:

✅ Suggested test
def test_no_content_returns_transient(self):
    """Streaming that yields no text should be classified as transient."""

    def _empty_iter():
        yield {"response": ""}   # chunk present but empty text

    mock_ollama = MagicMock()
    mock_ollama.generate.return_value = _empty_iter()

    restore = self._patch_module(mock_ollama, response_error_cls=_FakeResponseError)
    try:
        client = OllamaClient()
        result = client.generate_streaming(
            model="qwen3:8b",
            prompt="x",
            temperature=0.7,
            top_p=0.9,
            timeout=30,
        )
    finally:
        restore()

    assert result.kind == "transient"
    assert result.text is None
    assert "no content" in result.message
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_ollama_client.py` around lines 232 - 408, Add a new unit test
method (e.g. test_no_content_returns_transient) to
TestOllamaClientStreamingErrors that uses _patch_module to inject a mock_ollama
whose generate() returns an iterator yielding a single chunk {"response": ""}
(empty string), then call OllamaClient().generate_streaming(...) and assert the
returned StreamResult has kind == "transient", text is None, and the message
contains "no content"; place the test alongside the other tests and reuse the
existing restore pattern and _FakeResponseError fixture used by the other cases.
nodes/prompt_generator_node.py (1)

504-524: 💤 Low value

Consider unifying the two if blocks into if/elif/else for consistency with the other nodes.

The current structure (two separate if statements, line 504 and line 520) is functionally correct because the first block always returns, but it differs from the if/elif/else pattern used in both negative_prompt_node.py and prompt_refiner_node.py. Aligning the structure makes the intent immediately obvious and prevents confusion if code is inserted between the two blocks in a future edit.

♻️ Proposed refactor
-            if result.kind == "ok" and result.text is not None:
-                output = result.text.strip()
+            if result.kind == "ok" and result.text is not None:
+                output = result.text.strip()
                 if not include_reasoning:
                     output = extract_final_prompt(output)
 
                 if pbar is not None:
                     pbar.update_absolute(100)
 
                 if output:
                     print(f"[PromptGenerator] Generated {len(output)} characters")
                     return (output,)
                 else:
                     return ("[PromptGenerator] Generation returned empty result.",)
 
-            # Subprocess fallback would also fail for these classes; surface the
-            # message immediately so the user gets actionable guidance.
-            if result.kind in ("model_crash", "server_error", "unavailable"):
-                print(f"[PromptGenerator] {result.kind}: {result.message}")
-                return (f"[PromptGenerator] {result.message}",)
-
-            print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
+            elif result.kind in ("model_crash", "server_error", "unavailable"):
+                print(f"[PromptGenerator] {result.kind}: {result.message}")
+                return (f"[PromptGenerator] {result.message}",)
+            else:
+                print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/prompt_generator_node.py` around lines 504 - 524, Unify the two
separate condition checks on result into an if/elif/else structure so control
flow matches other nodes: make the first branch check result.kind == "ok"
(handling trimming, include_reasoning via extract_final_prompt,
pbar.update_absolute, printing length and returning output or empty-message),
the second branch be an elif for result.kind in
("model_crash","server_error","unavailable") that prints and returns the
result.message, and the final else should log the streaming-failed fallback;
keep existing behavior and return values for result, and reference the local
symbols result, include_reasoning, extract_final_prompt, pbar, and the printed
messages when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f1909a13-b6c8-45fa-9c49-0b9c5ede0522

📥 Commits

Reviewing files that changed from the base of the PR and between b5e1731 and bf54b54.

📒 Files selected for processing (6)
  • nodes/adapters/ollama_client.py
  • nodes/negative_prompt_node.py
  • nodes/prompt_generator_node.py
  • nodes/prompt_refiner_node.py
  • tests/unit/test_ollama_client.py
  • tests/unit/test_prompt_refiner.py

Comment on lines +318 to +328

if (is_response_error or status is not None) and status == 500 and any(s in msg for s in crash_signals):
    friendly = (
        f"Ollama llama runner crashed loading '{model}'. "
        "This usually means the model file is corrupt, incompatible, "
        "or out of VRAM. Try: (1) restart Ollama "
        "(`sudo systemctl restart ollama`), "
        "(2) switch to a known-good model like 'qwen3:8b', "
        "(3) re-pull or rebuild the model."
    )
    self._log(f"Model crash detected: {exc}")
    return StreamResult(kind="model_crash", message=friendly)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded qwen3:8b recommendation in crash message is self-defeating when that model is the one crashing.

If model == "qwen3:8b", step (2) of the recovery guidance reads "switch to a known-good model like 'qwen3:8b'" — the exact model that just crashed. Suggest either omitting the specific model name or making it conditional:

🐛 Proposed fix
-            friendly = (
-                f"Ollama llama runner crashed loading '{model}'. "
-                "This usually means the model file is corrupt, incompatible, "
-                "or out of VRAM. Try: (1) restart Ollama "
-                "(`sudo systemctl restart ollama`), "
-                "(2) switch to a known-good model like 'qwen3:8b', "
-                "(3) re-pull or rebuild the model."
-            )
+            alt = next((m for m in self.DEFAULT_MODELS if m != model), "qwen3:8b")
+            friendly = (
+                f"Ollama llama runner crashed loading '{model}'. "
+                "This usually means the model file is corrupt, incompatible, "
+                "or out of VRAM. Try: (1) restart Ollama "
+                "(`sudo systemctl restart ollama`), "
+                f"(2) switch to a different model (e.g. '{alt}'), "
+                "(3) re-pull or rebuild the model."
+            )

Note: the existing assertion assert "qwen3:8b" in result.message in test_runner_crash_returns_model_crash_kind (line 287) checks for that specific string and would need updating alongside this change.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/adapters/ollama_client.py` around lines 318 - 328, The friendly crash
message in the Ollama client hardcodes "qwen3:8b", which is misleading if that
is the model that crashed; update the message construction in the method that
returns StreamResult(kind="model_crash", ...) so it conditionally includes a
suggested replacement only when model != "qwen3:8b" (e.g., "switch to a
known-good model like 'qwen3:8b'") and otherwise omits the specific model name
or uses a generic suggestion like "switch to a different known-good model";
adjust the test test_runner_crash_returns_model_crash_kind to remove or relax
the assert that checks for "qwen3:8b" in result.message so it validates the
model_crash kind and presence of recovery guidance instead.

Limbicnation merged commit 70ba1bf into main on May 8, 2026 (4 of 5 checks passed).
Limbicnation deleted the fix/ollama-runner-crash-handling branch on May 8, 2026, 02:02.