Thinking-mode models (Gemma 4) return empty responses via Ollama OpenAI-compatible endpoint #26

@wbryanta

Description

Problem

Models with thinking mode (e.g., Gemma 4) return empty content when called through Ollama's /v1/chat/completions endpoint. The model spends its max_tokens budget generating thinking tokens, and the OpenAI compatibility layer strips that reasoning output, leaving an empty visible message. This causes 500 errors when starting simulations.

Ollama's native /api/chat endpoint handles these models correctly and returns visible content.

Affected models

  • gemma4:26b (confirmed)
  • Likely any future model using <|think|> token reasoning

Proposed fix

Add a fallback in LLMClient.chat(): when the OpenAI-compatible endpoint returns empty content and the client is talking to Ollama, retry via the native /api/chat endpoint. This is backwards-compatible, since the fallback only triggers on empty responses.
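A minimal sketch of the proposed fallback, using only the standard library. The function and host names (chat, OLLAMA_BASE, needs_native_fallback) are assumptions for illustration, not the project's actual LLMClient API:

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # assumed default Ollama host


def needs_native_fallback(content) -> bool:
    """True when the compat layer returned no visible content."""
    return not (content or "").strip()


def _post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def chat(model: str, messages: list, max_tokens: int = 1024) -> str:
    """Sketch of the proposed fallback inside a chat() method."""
    # 1) Try the OpenAI-compatible endpoint first.
    data = _post_json(
        f"{OLLAMA_BASE}/v1/chat/completions",
        {"model": model, "messages": messages, "max_tokens": max_tokens},
    )
    content = data["choices"][0]["message"].get("content")
    if not needs_native_fallback(content):
        return content

    # 2) Empty content from the compat layer: retry via Ollama's
    #    native /api/chat endpoint, which returns visible text for
    #    thinking-mode models.
    data = _post_json(
        f"{OLLAMA_BASE}/api/chat",
        {"model": model, "messages": messages, "stream": False},
    )
    return data["message"]["content"]
```

Because the retry happens only when the first response is empty, existing models that already return content through /v1/chat/completions are unaffected.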
