fix: handle Ollama llama-runner crashes with structured error categorization #11
Replace the catch-all RuntimeError wrapper in OllamaClient.generate_streaming with a StreamResult dataclass that categorises failures (ok, timeout, transient, model_crash, server_error, unavailable). Llama-runner crashes (HTTP 500 with "runner terminated" / "exit status" / "load failed") are detected and surfaced as user-actionable messages in the ComfyUI prompt output instead of bubbling up as Python stacktraces.

PromptGenerator, NegativePrompt, and PromptRefiner nodes now branch on result.kind: model_crash / server_error / unavailable surface the message directly (subprocess fallback would also fail), while timeout / transient fall through to the existing subprocess fallback path.

Adds 6 unit tests covering each error class plus the success path. Updates the existing PromptRefiner seed tests to return StreamResult from mocks.
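For orientation, a minimal sketch of the result type and crash classification described above could look like the following (the helper name, signature, and message wording are illustrative assumptions; the actual implementation lives in `nodes/adapters/ollama_client.py` and may differ):

```python
# Illustrative sketch only; the real StreamResult and classification logic
# in nodes/adapters/ollama_client.py may differ in detail.
from dataclasses import dataclass

CRASH_SIGNALS = ("runner terminated", "exit status", "load failed")


@dataclass
class StreamResult:
    text: str | None = None
    kind: str = "ok"  # ok | timeout | transient | model_crash | server_error | unavailable
    message: str = ""


def classify_http_error(status: int | None, body: str, model: str) -> StreamResult:
    """Map an HTTP error status/body onto one of the documented failure kinds."""
    msg = body.lower()
    if status == 500 and any(sig in msg for sig in CRASH_SIGNALS):
        return StreamResult(
            kind="model_crash",
            message=(
                f"Ollama llama runner crashed loading '{model}'. "
                "Restart Ollama, switch models, or re-pull the model."
            ),
        )
    if status is not None and status >= 500:
        return StreamResult(kind="server_error", message=body)
    return StreamResult(kind="transient", message=body)
```

The six kind values let calling nodes decide whether a retry or fallback is even worth attempting.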
📝 Walkthrough

This PR refactors Ollama streaming to return a structured StreamResult.

Changes: Ollama Streaming Error Categorization
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: 4 passed, 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review Roast 🔥

Verdict: No Issues Found | Recommendation: Merge

Oh wait, this PR is actually clean. I had my flamethrower warmed up and everything — ready to roast some catch-all RuntimeError handlers, and then... nothing. The StreamResult dataclass is elegantly designed with comprehensive error categorization (model_crash detection with actionable recovery steps, server_error, transient, timeout, unavailable — a full spectrum). The node updates properly distinguish which errors warrant subprocess fallback and which deserve immediate surfacing. Test coverage spans all 6 error paths plus success. I'm genuinely impressed.

📊 Overall: Like finding a unicorn in production — I didn't think clean PRs existed anymore, but here we are.

Files Reviewed (6 files)
Reviewed by minimax-m2.5-20260211 · 521,093 tokens
Code Review
This pull request introduces a structured StreamResult dataclass to improve error handling and reporting in the OllamaClient streaming generation. It adds an exception classification system to identify specific failure modes—such as model crashes, server errors, and timeouts—providing more actionable feedback. The NegativePrompt, PromptGenerator, and PromptRefiner nodes were updated to utilize these structured results for better fallback logic. Feedback suggests generalizing the troubleshooting instructions in error messages to be platform-agnostic, as they currently include Linux-specific commands.
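As a rough illustration of that fallback logic, a node could branch on the structured result along these lines (this reuses the StreamResult sketch above; the function name and the fallback callable are assumptions for illustration, not the PR's actual node code):

```python
from typing import Callable


def resolve_prompt_output(result: StreamResult, subprocess_fallback: Callable[[], str]) -> str:
    """Apply the fallback policy described in this review to a streaming result."""
    if result.kind == "ok" and result.text is not None:
        return result.text.strip()
    if result.kind in ("model_crash", "server_error", "unavailable"):
        # A subprocess fallback would also fail for these classes, so surface
        # the actionable message directly in the node output.
        return f"[PromptGenerator] {result.message}"
    # timeout / transient: fall through to the existing subprocess fallback path.
    return subprocess_fallback()
```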
```python
friendly = (
    f"Ollama llama runner crashed loading '{model}'. "
    "This usually means the model file is corrupt, incompatible, "
    "or out of VRAM. Try: (1) restart Ollama "
    "(`sudo systemctl restart ollama`), "
    "(2) switch to a known-good model like 'qwen3:8b', "
    "(3) re-pull or rebuild the model."
)
```
The troubleshooting message includes a Linux-specific command (sudo systemctl restart ollama). Since Ollama and ComfyUI are cross-platform (Windows, macOS, Linux), it's better to provide a more generic instruction or mention that the command is for Linux users to avoid confusion on other platforms.
```python
friendly = (
    f"Ollama llama runner crashed loading '{model}'. "
    "This usually means the model file is corrupt, incompatible, "
    "or out of VRAM. Try: (1) restart the Ollama service or application, "
    "(2) switch to a known-good model like 'qwen3:8b', "
    "(3) re-pull or rebuild the model."
)
```
Actionable comments posted: 1
🧹 Nitpick comments (3)
nodes/adapters/ollama_client.py (1)
40-55: ⚡ Quick win: `kind: str` should use `Literal` to lock down the valid discriminant values.

The plain `str` annotation allows callers to construct `StreamResult(kind="typo")` or compare against an invalid string without any static-analysis warning. Every consumer already branches exhaustively on the six documented values, so narrowing the type costs nothing and surfaces mistakes at type-check time.

♻️ Proposed change
```diff
-from typing import Any, ClassVar
+from typing import Any, ClassVar, Literal

 @dataclass
 class StreamResult:
     text: str | None = None
-    kind: str = "ok"
-    message: str = field(default="")
+    kind: Literal["ok", "timeout", "transient", "model_crash", "server_error", "unavailable"] = "ok"
+    message: str = ""
```

(`message: str = ""` is also the idiomatic form — `field(default=...)` is only needed for mutable defaults.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/adapters/ollama_client.py` around lines 40 - 55: The StreamResult dataclass uses a plain str for the discriminant `kind` and uses field(default=...) for `message`; change `kind` to a Literal union of the six allowed values (e.g. Literal["ok","timeout","transient","model_crash","server_error","unavailable"]) so static checkers catch invalid values, import Literal from typing, and simplify `message` to the idiomatic default `message: str = ""`; update any type hints/usages of StreamResult as needed to match the narrowed Literal type.

tests/unit/test_ollama_client.py (1)
232-408: ⚡ Quick win: Missing test for the "empty-content" `transient` branch.

When `ollama.generate()` streams successfully but yields zero text content, `generate_streaming` returns `StreamResult(kind="transient", message="Generation returned no content.")` (lines 289–293 in `ollama_client.py`). None of the six new tests exercise this path. A minimal addition:

✅ Suggested test
```python
def test_no_content_returns_transient(self):
    """Streaming that yields no text should be classified as transient."""
    def _empty_iter():
        yield {"response": ""}  # chunk present but empty text

    mock_ollama = MagicMock()
    mock_ollama.generate.return_value = _empty_iter()
    restore = self._patch_module(mock_ollama, response_error_cls=_FakeResponseError)
    try:
        client = OllamaClient()
        result = client.generate_streaming(
            model="qwen3:8b",
            prompt="x",
            temperature=0.7,
            top_p=0.9,
            timeout=30,
        )
    finally:
        restore()

    assert result.kind == "transient"
    assert result.text is None
    assert "no content" in result.message
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_ollama_client.py` around lines 232 - 408: Add a new unit test method (e.g. test_no_content_returns_transient) to TestOllamaClientStreamingErrors that uses _patch_module to inject a mock_ollama whose generate() returns an iterator yielding a single chunk {"response": ""} (empty string), then call OllamaClient().generate_streaming(...) and assert the returned StreamResult has kind == "transient", text is None, and the message contains "no content"; place the test alongside the other tests and reuse the existing restore pattern and _FakeResponseError fixture used by the other cases.

nodes/prompt_generator_node.py (1)
504-524: 💤 Low value: Consider unifying the two `if` blocks into `if/elif/else` for consistency with the other nodes.

The current structure (two separate `if` statements, line 504 and line 520) is functionally correct because the first block always `return`s, but it differs from the `if/elif/else` pattern used in both `negative_prompt_node.py` and `prompt_refiner_node.py`. Aligning the structure makes the intent immediately obvious and prevents confusion if code is inserted between the two blocks in a future edit.

♻️ Proposed refactor
```diff
-        if result.kind == "ok" and result.text is not None:
-            output = result.text.strip()
+        if result.kind == "ok" and result.text is not None:
+            output = result.text.strip()
             if not include_reasoning:
                 output = extract_final_prompt(output)
             if pbar is not None:
                 pbar.update_absolute(100)
             if output:
                 print(f"[PromptGenerator] Generated {len(output)} characters")
                 return (output,)
             else:
                 return ("[PromptGenerator] Generation returned empty result.",)
-        # Subprocess fallback would also fail for these classes; surface the
-        # message immediately so the user gets actionable guidance.
-        if result.kind in ("model_crash", "server_error", "unavailable"):
-            print(f"[PromptGenerator] {result.kind}: {result.message}")
-            return (f"[PromptGenerator] {result.message}",)
-
-        print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
+        elif result.kind in ("model_crash", "server_error", "unavailable"):
+            print(f"[PromptGenerator] {result.kind}: {result.message}")
+            return (f"[PromptGenerator] {result.message}",)
+        else:
+            print(f"[PromptGenerator] Streaming failed ({result.kind}), falling back to subprocess")
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/prompt_generator_node.py` around lines 504 - 524: Unify the two separate condition checks on result into an if/elif/else structure so control flow matches other nodes: make the first branch check result.kind == "ok" (handling trimming, include_reasoning via extract_final_prompt, pbar.update_absolute, printing length and returning output or empty-message), the second branch be an elif for result.kind in ("model_crash","server_error","unavailable") that prints and returns the result.message, and the final else should log the streaming-failed fallback; keep existing behavior and return values for result, and reference the local symbols result, include_reasoning, extract_final_prompt, pbar, and the printed messages when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/adapters/ollama_client.py`:
- Around line 318-328: The friendly crash message in the Ollama client hardcodes
"qwen3:8b", which is misleading if that is the model that crashed; update the
message construction in the method that returns StreamResult(kind="model_crash",
...) so it conditionally includes a suggested replacement only when model !=
"qwen3:8b" (e.g., "switch to a known-good model like 'qwen3:8b'") and otherwise
omits the specific model name or uses a generic suggestion like "switch to a
different known-good model"; adjust the test
test_runner_crash_returns_model_crash_kind to remove or relax the assert that
checks for "qwen3:8b" in result.message so it validates the model_crash kind and
presence of recovery guidance instead.
---
Nitpick comments:
In `@nodes/adapters/ollama_client.py`:
- Around line 40-55: The StreamResult dataclass uses a plain str for the
discriminant `kind` and uses field(default=...) for `message`; change `kind` to
a Literal union of the six allowed values (e.g.
Literal["ok","timeout","transient","model_crash","server_error","unavailable"])
so static checkers catch invalid values, import Literal from typing, and
simplify `message` to the idiomatic default `message: str = ""`; update any type
hints/usages of StreamResult as needed to match the narrowed Literal type.
In `@nodes/prompt_generator_node.py`:
- Around line 504-524: Unify the two separate condition checks on result into an
if/elif/else structure so control flow matches other nodes: make the first
branch check result.kind == "ok" (handling trimming, include_reasoning via
extract_final_prompt, pbar.update_absolute, printing length and returning output
or empty-message), the second branch be an elif for result.kind in
("model_crash","server_error","unavailable") that prints and returns the
result.message, and the final else should log the streaming-failed fallback;
keep existing behavior and return values for result, and reference the local
symbols result, include_reasoning, extract_final_prompt, pbar, and the printed
messages when making the change.
In `@tests/unit/test_ollama_client.py`:
- Around line 232-408: Add a new unit test method (e.g.
test_no_content_returns_transient) to TestOllamaClientStreamingErrors that uses
_patch_module to inject a mock_ollama whose generate() returns an iterator
yielding a single chunk {"response": ""} (empty string), then call
OllamaClient().generate_streaming(...) and assert the returned StreamResult has
kind == "transient", text is None, and the message contains "no content"; place
the test alongside the other tests and reuse the existing restore pattern and
_FakeResponseError fixture used by the other cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f1909a13-b6c8-45fa-9c49-0b9c5ede0522
📒 Files selected for processing (6)
- nodes/adapters/ollama_client.py
- nodes/negative_prompt_node.py
- nodes/prompt_generator_node.py
- nodes/prompt_refiner_node.py
- tests/unit/test_ollama_client.py
- tests/unit/test_prompt_refiner.py
```python
if (is_response_error or status is not None) and status == 500 and any(s in msg for s in crash_signals):
    friendly = (
        f"Ollama llama runner crashed loading '{model}'. "
        "This usually means the model file is corrupt, incompatible, "
        "or out of VRAM. Try: (1) restart Ollama "
        "(`sudo systemctl restart ollama`), "
        "(2) switch to a known-good model like 'qwen3:8b', "
        "(3) re-pull or rebuild the model."
    )
    self._log(f"Model crash detected: {exc}")
    return StreamResult(kind="model_crash", message=friendly)
```
Hardcoded qwen3:8b recommendation in crash message is self-defeating when that model is the one crashing.
If model == "qwen3:8b", step (2) of the recovery guidance reads "switch to a known-good model like 'qwen3:8b'" — the exact model that just crashed. Suggest either omitting the specific model name or making it conditional:
🐛 Proposed fix
```diff
-    friendly = (
-        f"Ollama llama runner crashed loading '{model}'. "
-        "This usually means the model file is corrupt, incompatible, "
-        "or out of VRAM. Try: (1) restart Ollama "
-        "(`sudo systemctl restart ollama`), "
-        "(2) switch to a known-good model like 'qwen3:8b', "
-        "(3) re-pull or rebuild the model."
-    )
+    alt = next((m for m in self.DEFAULT_MODELS if m != model), "qwen3:8b")
+    friendly = (
+        f"Ollama llama runner crashed loading '{model}'. "
+        "This usually means the model file is corrupt, incompatible, "
+        "or out of VRAM. Try: (1) restart Ollama "
+        "(`sudo systemctl restart ollama`), "
+        f"(2) switch to a different model (e.g. '{alt}'), "
+        "(3) re-pull or rebuild the model."
+    )
```

Note: the existing assertion `assert "qwen3:8b" in result.message` in `test_runner_crash_returns_model_crash_kind` (line 287) checks for that specific string and would need updating alongside this change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/adapters/ollama_client.py` around lines 318 - 328: The friendly crash
message in the Ollama client hardcodes "qwen3:8b", which is misleading if that
is the model that crashed; update the message construction in the method that
returns StreamResult(kind="model_crash", ...) so it conditionally includes a
suggested replacement only when model != "qwen3:8b" (e.g., "switch to a
known-good model like 'qwen3:8b'") and otherwise omits the specific model name
or uses a generic suggestion like "switch to a different known-good model";
adjust the test test_runner_crash_returns_model_crash_kind to remove or relax
the assert that checks for "qwen3:8b" in result.message so it validates the
model_crash kind and presence of recovery guidance instead.