## Problem
Models with thinking mode (e.g., Gemma 4) return empty content when called through Ollama's `/v1/chat/completions` endpoint. The model generates thinking tokens that consume the `max_tokens` budget, but the visible content is stripped by the OpenAI compatibility layer. This causes 500 errors when starting simulations.

Ollama's native `/api/chat` endpoint handles these models correctly and returns visible content.
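To make the failure concrete, the two responses look roughly like this (illustrative payloads, not captured output; extra fields omitted). The OpenAI compatibility layer returns a choice whose `message.content` is empty once the budget is spent on thinking tokens:

```json
{"choices": [{"message": {"role": "assistant", "content": ""}, "finish_reason": "length"}]}
```

while the native endpoint returns the visible text directly in `message.content`:

```json
{"model": "gemma4:26b", "message": {"role": "assistant", "content": "..."}, "done": true}
```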
## Affected models
- `gemma4:26b` (confirmed)
- Likely any future model using `<|think|>` token reasoning
## Proposed fix

Add a fallback in `LLMClient.chat()`: when the OpenAI-compatible endpoint returns empty content and we're talking to Ollama, retry via the native `/api/chat` endpoint. This is backwards-compatible, since the fallback only triggers on empty responses.