Summary
Gemini's generateContent streaming (streamGenerateContent) surfaces only text/reasoning deltas and an end-of-turn groundingMetadata snapshot. It cannot expose intermediate steps — when the model launches a native Google Search, what it is searching, or per-step thought progress — as they happen. Google's newer Interactions API surfaces these as chronological steps (thoughts, function calls, progress updates), making native-tool activity observable mid-stream.
Motivation
Consumers of celeste's Google text streaming currently get no signal that a native Google Search is running — the search executes server-side during the pre-text latency and emits nothing until the answer begins. By contrast, the other reasoning providers expose live search activity during streaming that celeste can already surface per-chunk:
- OpenAI / OpenResponses: response.web_search_call.in_progress / .searching / .completed
- Anthropic: server_tool_use content_block_start + query delta + web_search_tool_result
Gemini is the gap. On generateContent there is no launch/progress signal for grounding; groundingMetadata only confirms a search happened and arrives accumulated with the answer.
The Interactions API closes the gap: with thinking_summaries="auto" it streams intermediate reasoning steps and progress alongside function calls and tool activity as a chronological steps array.
Proposal
Move celeste's Google text streaming from generateContent to the Interactions API so streamed reasoning and native-tool (search) progress are observable.
Affected areas:
src/celeste/providers/google/generate_content/ (streaming, grounding)
src/celeste/modalities/text/providers/google/
Open questions / scope
- Add vs replace: add the Interactions API as an alternate streaming path, or migrate generateContent streaming to it wholesale?
- Feature parity: parameters (temperature, thinking_budget/thinking_level), grounding/citation mapping, structured output, and the non-streaming path.
- Normalized signal: how a native-search "started/done" step maps onto the streamed chunk/output (mirroring how reasoning is threaded onto TextChunk today).
- Model coverage / availability: Gemini 3+ vs 2.5, and API availability.
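To make the "normalized signal" question concrete, here is a minimal sketch of one possible shape, assuming a hypothetical ToolActivity type that does not exist in celeste today. It maps the OpenAI and Anthropic event names quoted in the Motivation section onto a shared started/searching/done phase. The event payloads are simplified dicts for illustration, not the providers' full schemas. A mapping for Interactions API steps would presumably slot in as a third function once its step shape is confirmed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToolPhase(str, Enum):
    STARTED = "started"
    SEARCHING = "searching"
    DONE = "done"


@dataclass(frozen=True)
class ToolActivity:
    """Hypothetical normalized native-tool signal (not an existing celeste type)."""

    tool: str
    phase: ToolPhase


# Event type names as listed in this issue for OpenAI / OpenResponses.
_OPENAI_PHASES = {
    "response.web_search_call.in_progress": ToolPhase.STARTED,
    "response.web_search_call.searching": ToolPhase.SEARCHING,
    "response.web_search_call.completed": ToolPhase.DONE,
}


def normalize_openai_event(event: dict) -> Optional[ToolActivity]:
    """Return a ToolActivity for web-search events, or None for anything else."""
    phase = _OPENAI_PHASES.get(event.get("type", ""))
    if phase is None:
        return None
    return ToolActivity(tool="web_search", phase=phase)


def normalize_anthropic_event(event: dict) -> Optional[ToolActivity]:
    """Assumes a simplified payload: content_block_start carrying a content_block dict."""
    if event.get("type") != "content_block_start":
        return None
    block_type = event.get("content_block", {}).get("type")
    if block_type == "server_tool_use":
        return ToolActivity(tool="web_search", phase=ToolPhase.STARTED)
    if block_type == "web_search_tool_result":
        return ToolActivity(tool="web_search", phase=ToolPhase.DONE)
    return None
```

Under this sketch, a streamed chunk could carry an optional ToolActivity alongside its text and reasoning deltas. Whether that field belongs on TextChunk or on a separate event type is part of the open question.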