fix: handle Ollama llama-runner crashes with structured error categorization #11

Merged
Limbicnation merged 1 commit into main from fix/ollama-runner-crash-handling on May 8, 2026

Conversation

Limbicnation (Owner) commented May 8, 2026

Replace the catch-all RuntimeError wrapper in OllamaClient.generate_streaming with a StreamResult dataclass that categorises failures (ok, timeout, transient, model_crash, server_error, unavailable). Llama-runner crashes (HTTP 500 with "runner terminated" / "exit status" / "load failed") are detected and surfaced as user-actionable messages in the ComfyUI prompt output instead of bubbling up as Python stacktraces.
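
For orientation, a minimal sketch of the new shape, using only the field names and crash signals stated above (the real dataclass and classifier live in nodes/adapters/ollama_client.py and may differ in detail; this is not the literal implementation):

from dataclasses import dataclass

@dataclass
class StreamResult:
    text: str | None = None   # generated text when kind == "ok"
    kind: str = "ok"          # ok | timeout | transient | model_crash | server_error | unavailable
    message: str = ""         # user-facing explanation for non-ok outcomes

# Crash signals quoted in this description.
CRASH_SIGNALS = ("runner terminated", "exit status", "load failed")

def classify_streaming_exception(exc: Exception, status: int | None, model: str) -> StreamResult:
    """Illustrative classifier: map a streaming exception to a StreamResult kind."""
    msg = str(exc).lower()
    if status == 500 and any(signal in msg for signal in CRASH_SIGNALS):
        return StreamResult(
            kind="model_crash",
            message=f"Ollama llama runner crashed loading '{model}'. Restart Ollama or try another model.",
        )
    if status is not None and status >= 500:
        return StreamResult(kind="server_error", message=f"Ollama server error {status}: {exc}")
    return StreamResult(kind="transient", message=str(exc))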

PromptGenerator, NegativePrompt, and PromptRefiner nodes now branch on result.kind: model_crash / server_error / unavailable surface the message directly (subprocess fallback would also fail), while timeout / transient fall through to the existing subprocess fallback path.
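
Illustratively, the branching in each node looks roughly like this (the function name, node wording, and fallback helper are placeholders, not the actual node code):

def run_with_fallback(client, model: str, prompt: str) -> str:
    result = client.generate_streaming(model=model, prompt=prompt)

    if result.kind == "ok" and result.text is not None:
        return result.text.strip()

    # Fatal classes: the subprocess fallback talks to the same broken server
    # or model, so surface the actionable message in the prompt output instead.
    if result.kind in ("model_crash", "server_error", "unavailable"):
        return f"[PromptGenerator] {result.message}"

    # timeout / transient: retry through the existing subprocess fallback path.
    return generate_via_subprocess(model, prompt)  # hypothetical fallback helper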

Adds 6 unit tests covering each error class plus the success path. Updates the existing PromptRefiner seed tests to return StreamResult from mocks.
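
The mock update for the existing seed tests is roughly the following (the wiring is illustrative; the point is that streaming mocks now return a StreamResult instead of a bare string):

from unittest.mock import MagicMock

from nodes.adapters.ollama_client import StreamResult  # presumed import path, based on the file listed in this PR

client = MagicMock()

# Before: streaming mocks handed back a plain string (or None on failure).
client.generate_streaming.return_value = "a refined prompt"

# After: they hand back a structured result the node can branch on.
client.generate_streaming.return_value = StreamResult(text="a refined prompt", kind="ok")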

Summary by CodeRabbit

  • Bug Fixes

    • Improved error detection and classification for streaming generation failures, distinguishing between model crashes, server errors, timeouts, and transient issues.
    • Added intelligent fallback mechanisms when streaming fails to ensure generation attempts continue via alternative paths.
    • Enhanced error messaging to provide clearer, actionable information for different failure scenarios.
  • Refactor

    • Restructured streaming result handling to return structured responses with error categorization instead of generic null values.

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

This PR refactors Ollama streaming to return a structured StreamResult dataclass instead of str | None, enabling precise error categorization. The adapter detects model crashes (500 with "runner terminated"), distinguishes server errors from transients, and consumers branch on result kind to choose appropriate fallback paths.

Changes

Ollama Streaming Error Categorization

Layer: Data Contract
File(s): nodes/adapters/ollama_client.py
Summary: StreamResult dataclass introduced with text, kind, and message fields to represent streaming outcomes; optional imports added for exception type detection.

Layer: Core Streaming Implementation
File(s): nodes/adapters/ollama_client.py
Summary: generate_streaming() refactored to return StreamResult for all paths: unavailable API, total/per-chunk timeouts, exceptions (classified by status code and message), empty yields, and success. Exception classification logic added to distinguish model crashes from generic server errors. (A rough sketch of this flow follows the table.)

Layer: Exception Classification
File(s): nodes/adapters/ollama_client.py
Summary: _classify_streaming_exception() method detects llama-runner crashes via status code 500 and "runner terminated" message, maps them to model_crash with an actionable user message.

Layer: Consumer Integration
File(s): nodes/negative_prompt_node.py, nodes/prompt_generator_node.py, nodes/prompt_refiner_node.py
Summary: Each node updated to capture the streaming result and branch on result.kind: use result.text on success, immediately return the error message for fatal kinds (model_crash/server_error/unavailable), fall back to subprocess for transient/timeout.

Layer: Test Coverage
File(s): tests/unit/test_ollama_client.py, tests/unit/test_prompt_refiner.py
Summary: New TestOllamaClientStreamingErrors class validates error categorization (model_crash, server_error, transient, unavailable). Existing streaming tests and prompt refiner mocks updated to return StreamResult instances.
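
Putting the table together, the control flow of the refactored method is roughly the following (a sketch under the assumptions of this walkthrough, not the file's actual code; the availability probe, chunk format, per-chunk timeout handling, and classifier signature are assumed and simplified):

def generate_streaming(self, model: str, prompt: str, timeout: int = 30, **options) -> StreamResult:
    if not self._api_available():  # hypothetical availability probe
        return StreamResult(kind="unavailable", message="Ollama API is not reachable.")
    try:
        pieces = []
        for chunk in self._ollama.generate(model=model, prompt=prompt, stream=True, options=options):
            pieces.append(chunk.get("response", ""))  # per-chunk timeout check omitted here
        text = "".join(pieces).strip()
        if not text:
            return StreamResult(kind="transient", message="Generation returned no content.")
        return StreamResult(text=text, kind="ok")
    except TimeoutError as exc:
        return StreamResult(kind="timeout", message=f"Generation timed out after {timeout}s: {exc}")
    except Exception as exc:
        # Classified into model_crash / server_error / transient by
        # _classify_streaming_exception() (signature assumed here).
        return self._classify_streaming_exception(exc, model)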

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through error paths so bright,
Streaming results now carry the light,
Crashes crash cleanly, timeouts timeout well,
Each fault type whispers its own tale to tell! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 64.00% which is insufficient. The required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title directly addresses the main change: introducing structured error categorization to handle Ollama llama-runner crashes. It aligns with the PR's primary objective of replacing catch-all errors with categorized StreamResult outcomes.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

kilo-code-bot Bot commented May 8, 2026

Code Review Roast 🔥

Verdict: No Issues Found | Recommendation: Merge

Oh wait, this PR is actually clean. I had my flamethrower warmed up and everything — ready to roast some catch-all RuntimeError handlers, and then... nothing. The StreamResult dataclass is elegantly designed with comprehensive error categorization (model_crash detection with actionable recovery steps, server_error, transient, timeout, unavailable — a full spectrum). The node updates properly distinguish which errors warrant subprocess fallback and which deserve immediate surfacing. Test coverage spans all 6 error paths plus success. I'm genuinely impressed.

📊 Overall: Like finding a unicorn in production — I didn't think clean PRs existed anymore, but here we are.

Files Reviewed (6 files)
  • nodes/adapters/ollama_client.py - Core implementation with StreamResult and error classification
  • nodes/negative_prompt_node.py - Updated to handle StreamResult
  • nodes/prompt_generator_node.py - Updated to handle StreamResult
  • nodes/prompt_refiner_node.py - Updated to handle StreamResult
  • tests/unit/test_ollama_client.py - New tests for error paths
  • tests/unit/test_prompt_refiner.py - Updated mocks for StreamResult

Reviewed by minimax-m2.5-20260211 · 521,093 tokens

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a structured StreamResult dataclass to improve error handling and reporting in the OllamaClient streaming generation. It adds an exception classification system to identify specific failure modes—such as model crashes, server errors, and timeouts—providing more actionable feedback. The NegativePrompt, PromptGenerator, and PromptRefiner nodes were updated to utilize these structured results for better fallback logic. Feedback suggests generalizing the troubleshooting instructions in error messages to be platform-agnostic, as they currently include Linux-specific commands.

Comment on lines +319 to +326

friendly = (
    f"Ollama llama runner crashed loading '{model}'. "
    "This usually means the model file is corrupt, incompatible, "
    "or out of VRAM. Try: (1) restart Ollama "
    "(`sudo systemctl restart ollama`), "
    "(2) switch to a known-good model like 'qwen3:8b', "
    "(3) re-pull or rebuild the model."
)

Severity: medium

The troubleshooting message includes a Linux-specific command (sudo systemctl restart ollama). Since Ollama and ComfyUI are cross-platform (Windows, macOS, Linux), it's better to provide a more generic instruction or mention that the command is for Linux users to avoid confusion on other platforms.

            friendly = (
                f"Ollama llama runner crashed loading '{model}'. "
                "This usually means the model file is corrupt, incompatible, "
                "or out of VRAM. Try: (1) restart the Ollama service or application, "
                "(2) switch to a known-good model like 'qwen3:8b', "
                "(3) re-pull or rebuild the model."
            )

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
nodes/adapters/ollama_client.py (1)

40-55: ⚡ Quick win

kind: str should use Literal to lock down the valid discriminant values.

The plain str annotation allows callers to construct StreamResult(kind="typo") or compare against an invalid string without any static-analysis warning. Every consumer already branches exhaustively on the six documented values, so narrowing the type costs nothing and surfaces mistakes at type-check time.

♻️ Proposed change
-from typing import Any, ClassVar
+from typing import Any, ClassVar, Literal

 @dataclass
 class StreamResult:
     text: str | None = None
-    kind: str = "ok"
-    message: str = field(default="")
+    kind: Literal["ok", "timeout", "transient", "model_crash", "server_error", "unavailable"] = "ok"
+    message: str = ""

(message: str = "" is also the idiomatic form — field(default=...) is only needed for mutable defaults.)
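
For illustration, with the narrowed annotation a checker such as mypy or pyright flags an invalid discriminant before the code ever runs (the values below are only examples):

StreamResult(kind="ok")             # accepted: one of the listed Literal values
StreamResult(kind="model_crashed")  # rejected by the type checker: not one of the six Literal values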

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/adapters/ollama_client.py` around lines 40 - 55, The StreamResult
dataclass uses a plain str for the discriminant `kind` and uses
field(default=...) for `message`; change `kind` to a Literal union of the six
allowed values (e.g.
Literal["ok","timeout","transient","model_crash","server_error","unavailable"])
so static checkers catch invalid values, import Literal from typing, and
simplify `message` to the idiomatic default `message: str = ""`; update any type
hints/usages of StreamResult as needed to match the narrowed Literal type.
tests/unit/test_ollama_client.py (1)

232-408: ⚡ Quick win

Missing test for the "empty-content" transient branch.

When ollama.generate() streams successfully but yields zero text content, generate_streaming returns StreamResult(kind="transient", message="Generation returned no content.") (lines 289–293 in ollama_client.py). None of the six new tests exercise this path. A minimal addition:

✅ Suggested test
def test_no_content_returns_transient(self):
    """Streaming that yields no text should be classified as transient."""

    def _empty_iter():
        yield {"response": ""}   # chunk present but empty text

    mock_ollama = MagicMock()
    mock_ollama.generate.return_value = _empty_iter()

    restore = self._patch_module(mock_ollama, response_error_cls=_FakeResponseError)
    try:
        client = OllamaClient()
        result = client.generate_streaming(
            model="qwen3:8b",
            prompt="x",
            temperature=0.7,
            top_p=0.9,
            timeout=30,
        )
    finally:
        restore()

    assert result.kind == "transient"
    assert result.text is None
    assert "no content" in result.message
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_ollama_client.py` around lines 232 - 408, Add a new unit test
method (e.g. test_no_content_returns_transient) to
TestOllamaClientStreamingErrors that uses _patch_module to inject a mock_ollama
whose generate() returns an iterator yielding a single chunk {"response": ""}
(empty string), then call OllamaClient().generate_streaming(...) and assert the
returned StreamResult has kind == "transient", text is None, and the message
contains "no content"; place the test alongside the other tests and reuse the
existing restore pattern and _FakeResponseError fixture used by the other cases.
nodes/prompt_generator_node.py (1)

504-524: 💤 Low value

Consider unifying the two if blocks into if/elif/else for consistency with the other nodes.

The current structure (two separate if statements, line 504 and line 520) is functionally correct because the first block always returns, but it differs from the if/elif/else pattern used in both negative_prompt_node.py and prompt_refiner_node.py. Aligning the structure makes the intent immediately obvious and prevents confusion if code is inserted between the two blocks in a future edit.

♻️ Proposed refactor
-            if result.kind == "ok" and result.text is not None:
-                output = result.text.strip()
+            if result.kind == "ok" and result.text is not None:
+                output = result.text.strip()
                 if not include_reasoning:
                     output = extract_final_prompt(output)
 
                 if pbar is not None:
                     pbar.update_absolute(100)
 
                 if output:
                     print(f"[PromptGenerator] Generated {len(output)} characters")
                     return (output,)
                 else:
                     return ("[PromptGenerator] Generation returned empty result.",)
 
-            # Subprocess fallback would also fail for these classes; surface the
-            # message immediately so the user gets actionable guidance.
-            if result.kind in ("model_crash", "server_error", "unavailable"):
-                print(f"[PromptGenerator] {result.kind}: {result.message}")
-                return (f"[PromptGenerator] {result.message}",)
-
-            print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
+            elif result.kind in ("model_crash", "server_error", "unavailable"):
+                print(f"[PromptGenerator] {result.kind}: {result.message}")
+                return (f"[PromptGenerator] {result.message}",)
+            else:
+                print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/prompt_generator_node.py` around lines 504 - 524, Unify the two
separate condition checks on result into an if/elif/else structure so control
flow matches other nodes: make the first branch check result.kind == "ok"
(handling trimming, include_reasoning via extract_final_prompt,
pbar.update_absolute, printing length and returning output or empty-message),
the second branch be an elif for result.kind in
("model_crash","server_error","unavailable") that prints and returns the
result.message, and the final else should log the streaming-failed fallback;
keep existing behavior and return values for result, and reference the local
symbols result, include_reasoning, extract_final_prompt, pbar, and the printed
messages when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f1909a13-b6c8-45fa-9c49-0b9c5ede0522

📥 Commits

Reviewing files that changed from the base of the PR and between b5e1731 and bf54b54.

📒 Files selected for processing (6)
  • nodes/adapters/ollama_client.py
  • nodes/negative_prompt_node.py
  • nodes/prompt_generator_node.py
  • nodes/prompt_refiner_node.py
  • tests/unit/test_ollama_client.py
  • tests/unit/test_prompt_refiner.py

Comment on lines +318 to +328

if (is_response_error or status is not None) and status == 500 and any(s in msg for s in crash_signals):
    friendly = (
        f"Ollama llama runner crashed loading '{model}'. "
        "This usually means the model file is corrupt, incompatible, "
        "or out of VRAM. Try: (1) restart Ollama "
        "(`sudo systemctl restart ollama`), "
        "(2) switch to a known-good model like 'qwen3:8b', "
        "(3) re-pull or rebuild the model."
    )
    self._log(f"Model crash detected: {exc}")
    return StreamResult(kind="model_crash", message=friendly)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded qwen3:8b recommendation in crash message is self-defeating when that model is the one crashing.

If model == "qwen3:8b", step (2) of the recovery guidance reads "switch to a known-good model like 'qwen3:8b'" — the exact model that just crashed. Suggest either omitting the specific model name or making it conditional:

🐛 Proposed fix
-            friendly = (
-                f"Ollama llama runner crashed loading '{model}'. "
-                "This usually means the model file is corrupt, incompatible, "
-                "or out of VRAM. Try: (1) restart Ollama "
-                "(`sudo systemctl restart ollama`), "
-                "(2) switch to a known-good model like 'qwen3:8b', "
-                "(3) re-pull or rebuild the model."
-            )
+            alt = next((m for m in self.DEFAULT_MODELS if m != model), "qwen3:8b")
+            friendly = (
+                f"Ollama llama runner crashed loading '{model}'. "
+                "This usually means the model file is corrupt, incompatible, "
+                "or out of VRAM. Try: (1) restart Ollama "
+                "(`sudo systemctl restart ollama`), "
+                f"(2) switch to a different model (e.g. '{alt}'), "
+                "(3) re-pull or rebuild the model."
+            )

Note: the existing assertion assert "qwen3:8b" in result.message in test_runner_crash_returns_model_crash_kind (line 287) checks for that specific string and would need updating alongside this change.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/adapters/ollama_client.py` around lines 318 - 328, The friendly crash
message in the Ollama client hardcodes "qwen3:8b", which is misleading if that
is the model that crashed; update the message construction in the method that
returns StreamResult(kind="model_crash", ...) so it conditionally includes a
suggested replacement only when model != "qwen3:8b" (e.g., "switch to a
known-good model like 'qwen3:8b'") and otherwise omits the specific model name
or uses a generic suggestion like "switch to a different known-good model";
adjust the test test_runner_crash_returns_model_crash_kind to remove or relax
the assert that checks for "qwen3:8b" in result.message so it validates the
model_crash kind and presence of recovery guidance instead.

Limbicnation merged commit 70ba1bf into main on May 8, 2026 (4 of 5 checks passed).
Limbicnation deleted the fix/ollama-runner-crash-handling branch on May 8, 2026, 02:02.