Switch Gemini text generation from generateContent to the Interactions API #289

Description

@Kamilbenkirane

Summary

Gemini's generateContent streaming (streamGenerateContent) surfaces only text/reasoning deltas and an end-of-turn groundingMetadata snapshot. It cannot expose intermediate steps — when the model launches a native Google Search, what it is searching, or per-step thought progress — as they happen. Google's newer Interactions API surfaces these as chronological steps (thoughts, function calls, progress updates), making native-tool activity observable mid-stream.

Motivation

Consumers of celeste's Google text streaming currently get no signal that a native Google Search is running — the search executes server-side during the pre-text latency and emits nothing until the answer begins. By contrast, the other reasoning providers expose live search activity during streaming that celeste can already surface per-chunk:

  • OpenAI / OpenResponses: response.web_search_call.in_progress / .searching / .completed
  • Anthropic: server_tool_use content_block_start + query delta + web_search_tool_result

Gemini is the gap. On generateContent there is no launch/progress signal for grounding; groundingMetadata only confirms after the fact that a search happened, and it arrives as a single accumulated snapshot with the answer.
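To make the comparison concrete, here is a minimal sketch of how the OpenAI and Anthropic events listed above could be folded into one search-status signal. The event names come from the list above; the function name `normalize_search_event`, the status labels, and the dict-shaped events are assumptions for illustration, not celeste's actual code.

```python
from typing import Any, Optional

# Event names are taken from the provider list above. The status labels
# ("started" / "searching" / "completed") are hypothetical.
OPENAI_SEARCH_EVENTS = {
    "response.web_search_call.in_progress": "started",
    "response.web_search_call.searching": "searching",
    "response.web_search_call.completed": "completed",
}


def normalize_search_event(provider: str, event: dict[str, Any]) -> Optional[str]:
    """Return a normalized search status for one raw stream event, or None."""
    if provider == "openai":
        return OPENAI_SEARCH_EVENTS.get(event.get("type", ""))

    if provider == "anthropic":
        # Assumes the search start and result both arrive as content_block_start
        # events whose content_block.type identifies them.
        if event.get("type") != "content_block_start":
            return None
        block_type = event.get("content_block", {}).get("type")
        if block_type == "server_tool_use":
            return "started"
        if block_type == "web_search_tool_result":
            return "completed"
        return None

    # generateContent streaming: there is no launch/progress event to map.
    return None
```

With a mapping like this, a Gemini generateContent stream always yields `None`, which is the gap this issue describes.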

The Interactions API closes the gap: with thinking_summaries="auto" it streams intermediate reasoning steps and progress alongside function calls and tool activity as a chronological steps array.

Proposal

Move celeste's Google text streaming from generateContent to the Interactions API so streamed reasoning and native-tool (search) progress are observable.

Affected areas:

  • src/celeste/providers/google/generate_content/ (streaming, grounding)
  • src/celeste/modalities/text/providers/google/

Open questions / scope

  • Add vs replace — add the Interactions API as an alternate streaming path, or migrate generateContent streaming to it wholesale?
  • Feature parity — parameters (temperature, thinking_budget/thinking_level), grounding/citation mapping, structured output, and the non-streaming path.
  • Normalized signal — how a native-search "started/done" step maps onto the streamed chunk/output (mirroring how reasoning is threaded onto TextChunk today).
  • Model coverage / availability — Gemini 3+ vs 2.5, and API availability.
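As a starting point for the "normalized signal" question, here is a rough sketch of threading search progress onto streamed chunks next to reasoning. Everything in it is hypothetical: `SketchChunk` stands in for TextChunk (whose real fields are not shown here), and the `kind` / `text` / `query` step keys are placeholders, not the Interactions API's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Optional


@dataclass
class SketchChunk:
    """Hypothetical stand-in for TextChunk; field names are assumptions."""

    content: str = ""
    reasoning: Optional[str] = None
    search_status: Optional[str] = None  # "started" or "done"
    search_query: Optional[str] = None


def chunks_from_steps(steps: Iterable[dict[str, Any]]) -> Iterator[SketchChunk]:
    """Map chronological steps (placeholder shape) onto streamed chunks."""
    for step in steps:
        kind = step.get("kind")
        if kind == "thought":
            yield SketchChunk(reasoning=step.get("text", ""))
        elif kind == "search_started":
            yield SketchChunk(search_status="started", search_query=step.get("query"))
        elif kind == "search_done":
            yield SketchChunk(search_status="done")
        elif kind == "text":
            yield SketchChunk(content=step.get("text", ""))
        # Unknown step kinds are skipped rather than raising.


# Usage with made-up steps in chronological order.
demo_steps = [
    {"kind": "thought", "text": "Need current data."},
    {"kind": "search_started", "query": "example query"},
    {"kind": "search_done"},
    {"kind": "text", "text": "Here is the answer."},
]
demo_chunks = list(chunks_from_steps(demo_steps))
```

Keeping the search fields optional on the chunk would let consumers that ignore them keep working unchanged, in the same way reasoning is optional today. Whether a separate event type is preferable remains open.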
