## Problem
Models with thinking mode (e.g., Gemma 4) return empty content when called through Ollama's `/v1/chat/completions` endpoint. The model generates thinking tokens that consume the `max_tokens` budget, but the visible content is stripped by the OpenAI compatibility layer. This causes 500 errors when starting simulations.

Ollama's native `/api/chat` endpoint handles these models correctly and returns visible content.
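To make the failure concrete, the two responses look roughly like this (illustrative payloads, not captured output; extra fields omitted). The OpenAI compatibility layer returns a choice whose `message.content` is empty once the budget is spent on thinking tokens:

```json
{"choices": [{"message": {"role": "assistant", "content": ""}, "finish_reason": "length"}]}
```

while the native endpoint returns the visible text directly in `message.content`:

```json
{"model": "gemma4:26b", "message": {"role": "assistant", "content": "..."}, "done": true}
```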
## Affected models
- `gemma4:26b` (confirmed)
- Likely any future model using `<|think|>` token reasoning
## Proposed fix

Add a fallback in `LLMClient.chat()`: when the OpenAI-compatible endpoint returns empty content and we're talking to Ollama, retry via the native `/api/chat` endpoint. This is backwards-compatible, since the fallback only triggers on empty responses.