fix: fall back to Ollama native /api/chat for thinking-mode models (fixes #26) #38
Open
nandanadileep wants to merge 1 commit into nikmcfly:main from
Conversation
Thinking-mode models (e.g. Gemma 4) generate internal reasoning tokens that can exhaust `max_tokens` before producing any visible content. Ollama's OpenAI-compatible `/v1/chat/completions` endpoint strips those tokens and returns empty `content`, causing 500 errors in simulations. When `LLMClient.chat()` receives empty content from an Ollama endpoint, it now retries via the native `/api/chat` endpoint, which correctly returns the visible response. The fallback is backwards-compatible and only triggers on empty responses. Fixes nikmcfly#26
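For context, here is a minimal illustration of the native endpoint the fallback targets; the host, model name, and prompt are placeholders, not taken from the PR:

```python
import requests

# Illustrative call to Ollama's native chat API (default host assumed). Unlike
# the OpenAI-compatible /v1/chat/completions route, the visible answer is
# returned under response["message"]["content"] even for thinking-mode models.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma4:26b",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```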
Problem
Thinking-mode models (e.g. `gemma4:26b`) generate internal `<|think|>` reasoning tokens that exhaust `max_tokens` before producing visible content. Ollama's OpenAI-compatible `/v1/chat/completions` endpoint strips those tokens and returns empty `content`, causing 500 errors when starting simulations.

Fix
In `LLMClient.chat()`, after calling the OpenAI-compat endpoint, check whether `content` is empty. If it is and we're talking to an Ollama server, retry via the native `/api/chat` endpoint, which surfaces the visible response correctly.
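Roughly, the check looks like this — a sketch, not the exact diff; the `<|think|>`-stripping regex is a placeholder since the PR elides it, and `finalize_content`/`retry_native` are illustrative names:

```python
import re

# Placeholder pattern; the real regex in llm_client.py is not shown in the PR.
THINK_TAGS = re.compile(r"<\|think\|>.*?<\|/think\|>", re.DOTALL)

def finalize_content(content, is_ollama, retry_native):
    """Sketch of the new tail of LLMClient.chat()."""
    if not content and is_ollama:
        # OpenAI-compat shim returned nothing: retry via native /api/chat.
        content = retry_native()
    # `content or ''` keeps re.sub from crashing when content is None.
    return re.sub(THINK_TAGS, "", content or "")
```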
Changes in `backend/app/utils/llm_client.py` (sketched below):

- `_ollama_native_base()` — strips the `/v1` suffix to get the Ollama host URL
- `_chat_via_ollama_native()` — POSTs to `/api/chat` with `stream=false`, carrying over `temperature` and `num_ctx`
- `chat()` — triggers the fallback only when `content` is falsy and `_is_ollama()` is true; fully backwards-compatible, zero impact on non-Ollama or non-thinking-mode models
- `NoneType` crash: `re.sub(…, content or '')` guards against `None` content even without the fallback
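A hedged sketch of the two new helpers, assuming `requests` and that `temperature`/`num_ctx` travel in Ollama's `options` object (the PR doesn't show the exact payload):

```python
import requests

class LLMClientSketch:
    """Illustrative stand-in for the real LLMClient; only the new parts."""

    def __init__(self, base_url: str, model: str):
        self.base_url = base_url.rstrip("/")  # e.g. http://localhost:11434/v1
        self.model = model

    def _ollama_native_base(self) -> str:
        # Strip the /v1 suffix to recover the Ollama host URL.
        if self.base_url.endswith("/v1"):
            return self.base_url[: -len("/v1")]
        return self.base_url

    def _chat_via_ollama_native(self, messages, temperature=0.7, num_ctx=8192):
        # POST to the native endpoint with stream=false, carrying over
        # temperature and num_ctx via the "options" field.
        resp = requests.post(
            f"{self._ollama_native_base()}/api/chat",
            json={
                "model": self.model,
                "messages": messages,
                "stream": False,
                "options": {"temperature": temperature, "num_ctx": num_ctx},
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]
```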
Test plan

- Existing non-thinking model (e.g. `qwen2.5:32b`) — behaviour unchanged
- `gemma4:26b` — should now return a visible response instead of a 500
- `_ollama_native_base()` strips `/v1` from `http://localhost:11434/v1` correctly (sketched below)

Fixes #26
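A quick pytest-style check of the stripping rule from the test plan, using a stand-in function since `_ollama_native_base()` is private to `LLMClient`:

```python
def _strip_v1(url: str) -> str:
    # Stand-in mirroring the assumed _ollama_native_base() logic.
    return url[: -len("/v1")] if url.endswith("/v1") else url

def test_strips_v1_suffix():
    assert _strip_v1("http://localhost:11434/v1") == "http://localhost:11434"

def test_leaves_bare_host_alone():
    assert _strip_v1("http://localhost:11434") == "http://localhost:11434"
```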