This document details the prompt construction architecture for the Gemini API (specifically supporting Gemini 3.0 Thought Process) within the pipe framework.
To balance context caching efficiency with the reliable transmission of thought processes (thought_signature), we have adopted a 4-Layer Architecture.
The request payload is constructed as a list of Content objects (or strings compatible with the SDK) combined in the following strict order:
| Layer | Component | Format | Cacheable | Role | Description |
|---|---|---|---|---|---|
| 1 | Static | Text (Jinja2 Rendered) | ✅ Yes | System/User | Immutable system instructions, roles, rules, and procedures. Explicit target for context caching. Includes explicit cached_history. |
| 2 | Dynamic | Text (JSON) | ❌ No | User | Volatile context information such as file references, artifacts, todos, and current datetime. |
| 3 | Buffered | List[Content Obj] |
❌ No | User/Model | Recent conversation history not yet cached. Crucially contains the thought_signature from the previous model response. |
| 4 | Trigger | Content Obj |
❌ No | User | The immediate trigger for execution. Either the user's new instruction (current_task) or a tool's execution result (tool_response). |
- Content: Definitions for
roles,procedure,constraints, andcached_history(old turns exceeding the buffer threshold). - Purpose: To maximize context caching efficiency by grouping immutable or rarely changing content at the beginning of the prompt.
- Implementation: Rendered from
gemini_static_prompt.j2.
- Content:
file_references,artifacts,todos,current_datetime. - Purpose: To provide the model with the latest state of the project and environment.
- Implementation: Rendered from
gemini_dynamic_prompt.j2as a JSON string and appended as a text part. - Note: Previously,
current_taskwas included here, but it has been moved to the 4th Layer to ensure correct chronological ordering.
- Content: Recent conversation turns (
turns) that are not yet cached. - Key Mechanism: The most recent
ModelResponseTurnin this layer MUST contain thethought_signature(opaque binary tokens) from the previous API response to enable Gemini 3.0's continuous thought process. - Implementation: Constructed as a list of
google.genai.types.Contentobjects, NOT plain text. TheContentobject containing thethought_signatureis restored (deserialized and decoded) from the persistedraw_response.
- Content: The demand for action.
- Case A (Normal Turn): A new instruction from the user (
UserTaskTurn). - Case B (Post-Tool Execution): The result of a tool execution (
ToolResponseTurn). In this case,UserTaskTurnis typically empty or None.
- Case A (Normal Turn): A new instruction from the user (
- Purpose: Placed at the very end of the request to strongly prompt the model to "respond to THIS input".
- Ordering: If this layer were placed before the Buffered History (e.g., inside the Dynamic Layer), the model would see "Current Question -> Past Conversation -> End", leading to confusion or empty responses (
Model stream was empty).
graph TD
Start([Start Streaming Request]) --> Build[Build Prompt Model]
subgraph "Layer 1: Static"
Build --> Static[Render gemini_static_prompt.j2]
Static -- Cache Miss --> Payload[Append to Payload]
Static -- Cache Hit --> Cache[Use Cached Content ID]
end
subgraph "Layer 2: Dynamic"
Build --> Dynamic[Render gemini_dynamic_prompt.j2 (JSON)]
Dynamic --> Payload
end
subgraph "Layer 3: Buffered"
Build --> Buffer[Process Buffered Turns]
Buffer -- Restore --> Signature[Inject thought_signature from raw_response]
Signature --> Payload
end
subgraph "Layer 4: Trigger"
Build --> Check{Is Instruction Present?}
Check -- Yes --> Task[Create Current Task Content]
Check -- No --> Tool[Assume Tool Response is in Buffer]
Task --> Payload
end
Payload --> API[Call Gemini API]
- Strict Ordering: The order
Static -> Dynamic -> Buffered -> Triggeris mandatory. - Mixed Types: The
contentslist sent to the SDK contains a mix ofstr(for Static/Dynamic layers) andtypes.Contentobjects (for Buffered/Trigger layers). The SDK handles this hybrid structure. - Signature Restoration: Since
thought_signatureis binary data, it may be Base64 encoded when persisted to JSON (via Pydantic). The system manually appliesbase64.b64decodeif necessary during the restoration of the Buffered Layer to ensure valid bytes are sent to the API.