Thinking
Extended thinking (also known as reasoning or chain-of-thought) lets a model show its reasoning process before it gives a final answer. It is particularly useful for complex problem-solving, math, logic puzzles, and multi-step analysis.
| Provider | Thinking Visible | Configuration |
|---|---|---|
| Claude (Anthropic/Bedrock) | Yes | effort + budget |
| Gemini 2.5+ | Yes | budget (token count) |
| Gemini 3 | Yes | effort levels |
| OpenAI (o1/o3) | No (hidden) | effort only |
| Perplexity | Yes | Streams `<think>` blocks |
| Mistral Magistral | Yes | Always on |
| Ollama Qwen3 | Yes | Default on, `:none` to disable |
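For providers that stream reasoning inline as `<think>` blocks (the Perplexity row above), the reasoning has to be separated from the answer text at some point. The sketch below shows one way that split could be done on a complete response string. `split_think_blocks` is a hypothetical helper written for illustration; it is not part of the library, which may handle this differently.

```ruby
# Hypothetical helper (not part of the library): separates inline
# <think>...</think> blocks from the visible answer text.
THINK_BLOCK = %r{<think>(.*?)</think>}m

def split_think_blocks(text)
  # Collect the body of every <think> block, in order of appearance.
  thinking = text.scan(THINK_BLOCK).flatten.map(&:strip).join("\n")
  # Whatever remains after removing the blocks is the answer.
  answer = text.gsub(THINK_BLOCK, "").strip
  { thinking: thinking, answer: answer }
end

parts = split_think_blocks("<think>127 * 43 = 5461</think>The answer is 5461.")
puts parts[:thinking]
puts parts[:answer]
```

This version assumes the full response is already in memory; splitting a live stream would additionally need to buffer partial tags across chunks.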
Configure thinking at the agent class level:
```ruby
class ReasoningAgent < ApplicationAgent
  model "claude-opus-4-5-20250514"
  thinking effort: :high, budget: 10000
  system "You are a reasoning assistant. Show your work step by step."
  param :query, required: true
  user "{query}"
end

result = ReasoningAgent.call(query: "What is 127 * 43?")

# Access the reasoning/thinking content
puts "Thinking:"
puts result.thinking_text
puts "\nAnswer:"
puts result.content

# Check if thinking was used
if result.has_thinking?
  puts "Thinking tokens used: #{result.thinking_tokens}"
end
```

The `effort` option controls how much thinking the model does:
| Level | Description |
|---|---|
| `:none` | Disable thinking |
| `:low` | Light reasoning, faster responses |
| `:medium` | Balanced reasoning |
| `:high` | Deep reasoning, best for complex problems |
```ruby
class QuickAgent < ApplicationAgent
  thinking effort: :low # Fast, light reasoning
end

class DeepAgent < ApplicationAgent
  thinking effort: :high # Thorough, detailed reasoning
end
```

The `budget` option caps the number of tokens the model may spend on thinking:

```ruby
class BudgetedAgent < ApplicationAgent
  thinking effort: :high, budget: 5000 # Max 5000 tokens for thinking
end
```

Note: Thinking tokens are billed by providers. Higher budgets increase costs.
Override thinking configuration at call time:
```ruby
# Override effort and budget
result = MyAgent.call(
  query: "Complex problem...",
  thinking: { effort: :high, budget: 15000 }
)

# Disable thinking for this call
result = MyAgent.call(
  query: "Simple question",
  thinking: false
)

# Use lower effort for speed
result = MyAgent.call(
  query: "Quick question",
  thinking: { effort: :low }
)
```

The Result object includes thinking-related accessors:
```ruby
result = ThinkingAgent.call(query: "Solve this problem")

# Thinking content
result.thinking_text      # String - the reasoning content
result.thinking_signature # String - for multi-turn continuity (Claude)
result.thinking_tokens    # Integer - tokens used for thinking
result.has_thinking?      # Boolean - whether thinking was used
```

When streaming, thinking chunks typically arrive before content chunks:
```ruby
ThinkingAgent.stream(query: "Analyze this data...") do |chunk|
  if chunk.thinking&.text
    # Display thinking in a collapsible UI element
    print "[Thinking] #{chunk.thinking.text}"
  elsif chunk.content
    # Stream the actual response
    print chunk.content
  end
end
```

Set a default thinking configuration for all agents:
```ruby
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  # Enable medium-effort thinking for every agent:
  # config.default_thinking = { effort: :medium }

  # Or keep thinking disabled by default (recommended):
  config.default_thinking = nil
end
```

Recommendation: Keep thinking disabled by default to avoid unexpected costs. Enable it per agent as needed.
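Thinking can be configured in three places: the global default, the agent class, and the individual call. The sketch below illustrates one plausible precedence order, with the most specific setting winning. This ordering is an assumption made for illustration, and `resolve_thinking` is a hypothetical helper, not the library's implementation; in particular, whether a call-time hash replaces or merges with the class-level hash is not stated here, and this sketch simply replaces it.

```ruby
# Hypothetical sketch, not the library's implementation: pick the most
# specific thinking setting, assuming call-time > agent class > global default.
def resolve_thinking(global_default:, agent_setting: nil, call_override: nil)
  # `false` means "explicitly disabled", so only nil counts as "not set".
  chosen = [call_override, agent_setting, global_default].find { |s| !s.nil? }
  # Treat nil, false, and effort: :none as "thinking off".
  return nil if chosen.nil? || chosen == false || chosen[:effort] == :none

  chosen
end

# Class-level setting applies when the global default is nil.
p resolve_thinking(global_default: nil, agent_setting: { effort: :high, budget: 10_000 })

# `thinking: false` at call time disables thinking despite a global default.
p resolve_thinking(global_default: { effort: :medium }, call_override: false)
```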
For Claude, the thinking signature enables continuity across conversation turns:
```ruby
class ConversationAgent < ApplicationAgent
  model "claude-opus-4-5-20250514"
  thinking effort: :high

  def messages
    # Include thinking signature from previous turns
    # for context continuity
    conversation_history_with_signatures
  end
end
```

Thinking tokens are billed by providers:
- Claude: Thinking tokens count toward output tokens
- Gemini: Budget affects cost
- OpenAI: Reasoning tokens are billed but hidden
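Because thinking tokens are billed, it can help to estimate an upper bound on spend before raising a budget. The sketch below multiplies the thinking budget by an expected call volume and a per-token price. `max_thinking_cost` is a hypothetical helper, and the price used is a placeholder rather than any provider's actual rate.

```ruby
# Hypothetical helper: upper bound on daily thinking spend if every call
# uses its full thinking budget. The price is a placeholder, not a real rate.
def max_thinking_cost(budget_tokens:, calls_per_day:, usd_per_million_tokens:)
  budget_tokens * calls_per_day * usd_per_million_tokens / 1_000_000.0
end

cost = max_thinking_cost(
  budget_tokens: 10_000,
  calls_per_day: 200,
  usd_per_million_tokens: 15.0 # placeholder price
)
puts format("Worst-case thinking cost: $%.2f per day", cost)
```

Actual spend is usually lower, since the budget is a cap and not every call will use it in full.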
Monitor costs via the dashboard or budget controls:
```ruby
RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 50.0,
    per_agent_daily: { "ReasoningAgent" => 10.0 },
    enforcement: :hard
  }
end
```

- Use appropriate effort levels - Use `:high` for complex problems, `:low` for simple queries
- Set token budgets - Prevent runaway costs with reasonable budget limits
- Disable for simple tasks - Override with `thinking: false` for trivial queries
- Monitor usage - Track thinking token usage in the dashboard
- Cache when possible - Enable caching for deterministic thinking results
- Test with dry_run - Verify configuration without API calls
```ruby
# Verify thinking configuration
result = MyAgent.call(query: "test", dry_run: true)
# Check the configuration in result.content
```

See the complete example in your Rails app:
```ruby
# app/agents/thinking_agent.rb
class ThinkingAgent < ApplicationAgent
  description "Demonstrates extended thinking/reasoning support"
  model "claude-opus-4-5-20250514"
  temperature 0.0
  thinking effort: :high, budget: 10000

  system do
    <<~S
      You are a reasoning assistant that excels at step-by-step problem solving.
      When given a problem:
      1. Break it down into smaller steps
      2. Work through each step carefully
      3. Verify your work
      4. Provide a clear final answer
    S
  end

  param :query, required: true
  user "{query}"
end
```

If you use the thinking DSL with a provider that doesn't support thinking, the configuration is silently ignored. This lets you write agents that work across providers without conditional logic.
```ruby
class FlexibleAgent < ApplicationAgent
  model "gpt-4o"           # Doesn't support visible thinking
  thinking effort: :medium # Silently ignored
  # Agent works normally without thinking
end
```
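Since thinking may be absent for some providers, code that displays results should cope with both cases. The sketch below renders a result with or without reasoning. `FakeResult` is a stand-in defined only so the example runs on its own, and `render_result` is a hypothetical helper; both assume that `has_thinking?` returns false when the provider returned no reasoning.

```ruby
# Stand-in for the library's result object, used only so this sketch runs
# on its own; the accessor names mirror the ones listed earlier.
FakeResult = Struct.new(:content, :thinking_text, :thinking_tokens, keyword_init: true) do
  def has_thinking?
    !thinking_text.to_s.empty?
  end
end

# Hypothetical helper: render a result whether or not thinking was returned.
def render_result(result)
  lines = []
  if result.has_thinking?
    lines << "Thinking (#{result.thinking_tokens} tokens):"
    lines << result.thinking_text
  end
  lines << "Answer:"
  lines << result.content
  lines.join("\n")
end

# Provider returned visible reasoning.
puts render_result(FakeResult.new(content: "5461", thinking_text: "127 * 43 = 5461", thinking_tokens: 42))

# Provider returned no reasoning; only the answer is shown.
puts render_result(FakeResult.new(content: "5461"))
```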