Thinking

Extended Thinking Support

Extended thinking (also known as reasoning or chain-of-thought) allows LLM models to show their reasoning process before providing a final answer. This is particularly useful for complex problem-solving, math, logic puzzles, and multi-step analysis tasks.

Supported Providers

Provider	Thinking Visible	Configuration
Claude (Anthropic/Bedrock)	Yes	`effort` + `budget`
Gemini 2.5+	Yes	`budget` (token count)
Gemini 3	Yes	`effort` levels
OpenAI (o1/o3)	No (hidden)	`effort` only
Perplexity	Yes	Streams `<think>` blocks
Mistral Magistral	Yes	Always on
Ollama Qwen3	Yes	Default on, `:none` to disable

Basic Usage

DSL Configuration

Configure thinking at the agent class level:

class ReasoningAgent < ApplicationAgent
  model "claude-opus-4-5-20250514"
  thinking effort: :high, budget: 10000

  system "You are a reasoning assistant. Show your work step by step."

  param :query, required: true
  user "{query}"
end

Calling the Agent

result = ReasoningAgent.call(query: "What is 127 * 43?")

# Access the reasoning/thinking content
puts "Thinking:"
puts result.thinking_text

puts "\nAnswer:"
puts result.content

# Check if thinking was used
if result.has_thinking?
  puts "Thinking tokens used: #{result.thinking_tokens}"
end

Configuration Options

Effort Levels

The effort option controls how much "thinking" the model does:

Level	Description
`:none`	Disable thinking
`:low`	Light reasoning, faster responses
`:medium`	Balanced reasoning
`:high`	Deep reasoning, best for complex problems

class QuickAgent < ApplicationAgent
  thinking effort: :low  # Fast, light reasoning
end

class DeepAgent < ApplicationAgent
  thinking effort: :high  # Thorough, detailed reasoning
end

Token Budget

The budget option caps the maximum tokens used for thinking:

class BudgetedAgent < ApplicationAgent
  thinking effort: :high, budget: 5000  # Max 5000 tokens for thinking
end

Note: Thinking tokens are billed by providers. Higher budgets increase costs.

Runtime Override

Override thinking configuration at call time:

# Override effort and budget
result = MyAgent.call(
  query: "Complex problem...",
  thinking: { effort: :high, budget: 15000 }
)

# Disable thinking for this call
result = MyAgent.call(
  query: "Simple question",
  thinking: false
)

# Use lower effort for speed
result = MyAgent.call(
  query: "Quick question",
  thinking: { effort: :low }
)

Result Object

The Result object includes thinking-related accessors:

result = ThinkingAgent.call(query: "Solve this problem")

# Thinking content
result.thinking_text       # String - the reasoning content
result.thinking_signature  # String - for multi-turn continuity (Claude)
result.thinking_tokens     # Integer - tokens used for thinking
result.has_thinking?       # Boolean - whether thinking was used

Streaming with Thinking

When streaming, thinking chunks typically arrive before content chunks:

ThinkingAgent.stream(query: "Analyze this data...") do |chunk|
  if chunk.thinking&.text
    # Display thinking in a collapsible UI element
    print "[Thinking] #{chunk.thinking.text}"
  elsif chunk.content
    # Stream the actual response
    print chunk.content
  end
end

Global Default

Set a default thinking configuration for all agents:

# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.default_thinking = { effort: :medium }
  # or
  config.default_thinking = nil  # disabled by default (recommended)
end

Recommendation: Keep thinking disabled by default to avoid unexpected costs. Enable it per-agent as needed.

Multi-turn Conversations

For Claude, the thinking signature enables continuity across conversation turns:

class ConversationAgent < ApplicationAgent
  model "claude-opus-4-5-20250514"
  thinking effort: :high

  def messages
    # Include thinking signature from previous turns
    # for context continuity
    conversation_history_with_signatures
  end
end

Cost Considerations

Thinking tokens are billed by providers:

Claude: Thinking tokens count toward output tokens
Gemini: Budget affects cost
OpenAI: Reasoning tokens are billed but hidden

Monitor costs via the dashboard or budget controls:

RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 50.0,
    per_agent_daily: { "ReasoningAgent" => 10.0 },
    enforcement: :hard
  }
end

Best Practices

Use appropriate effort levels - Use :high for complex problems, :low for simple queries
Set token budgets - Prevent runaway costs with reasonable budget limits
Disable for simple tasks - Override with thinking: false for trivial queries
Monitor usage - Track thinking token usage in the dashboard
Cache when possible - Enable caching for deterministic thinking results
Test with dry_run - Verify configuration without API calls

# Verify thinking configuration
result = MyAgent.call(query: "test", dry_run: true)
# Check the configuration in result.content

Example Agent

See the complete example in your Rails app:

# app/agents/thinking_agent.rb
class ThinkingAgent < ApplicationAgent
  description "Demonstrates extended thinking/reasoning support"

  model "claude-opus-4-5-20250514"
  temperature 0.0
  thinking effort: :high, budget: 10000

  system do
    <<~S
      You are a reasoning assistant that excels at step-by-step problem solving.

      When given a problem:
      1. Break it down into smaller steps
      2. Work through each step carefully
      3. Verify your work
      4. Provide a clear final answer
    S
  end

  param :query, required: true
  user "{query}"
end

Providers Without Thinking Support

If you use the thinking DSL with a provider that doesn't support thinking, the configuration is silently ignored. This allows you to write agents that work across providers without conditional logic.

class FlexibleAgent < ApplicationAgent
  model "gpt-4o"  # Doesn't support visible thinking
  thinking effort: :medium  # Silently ignored

  # Agent works normally without thinking
end

Thinking

Extended Thinking Support

Supported Providers

Basic Usage

DSL Configuration

Calling the Agent

Configuration Options

Effort Levels

Token Budget

Runtime Override

Result Object

Streaming with Thinking

Global Default

Multi-turn Conversations

Cost Considerations

Best Practices

Example Agent

Providers Without Thinking Support

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally