
Model Routing Guide

How the bot selects the right LLM model for each request through tier-based routing.

See also: Configuration Guide for runtime config fields, Quick Start for setup, Deployment Guide for production configuration.


Overview

The bot uses a 4-tier model selection strategy that picks the most appropriate model based on task complexity. The tier is determined from multiple sources with clear priority:

  1. User preference — set via /tier command or set_tier tool
  2. Skill override — model_tier field in skill YAML frontmatter
  3. Dynamic upgrade — DynamicTierSystem promotes to coding when code activity is detected mid-conversation
  4. Fallback — "balanced" when no tier is explicitly set
  5. Per-user model override — set via /model command

Operationally, model setup now follows this flow:

  1. Configure provider profiles in LLM Providers
  2. Maintain capability metadata in Model Catalog
  3. Assign routing/tier slots in Model Router

At request time, tier and model resolution flows through the pipeline:

User Message
    |
    v
[ContextBuildingSystem]  --- Resolves tier from user prefs / active skill
    |                        Priority: force+user > skill > user pref > balanced
    v
[DynamicTierSystem]      --- May upgrade to "coding" if code activity detected
    |                        (only on iteration > 0, never downgrades)
    v
[ModelSelectionService]  --- Resolves actual model for the tier
    |                        (user override > router config fallback)
    v
[ToolLoopExecutionSystem] --- Selects model + reasoning level based on modelTier
    |                         (via DefaultToolLoopSystem internal loop)
    v
LLM API Call

Model Tiers

Four tiers map task complexity to model capabilities:

| Tier | Reasoning | Typical Use Cases | Default Model |
|------|-----------|-------------------|---------------|
| balanced | medium | Greetings, general questions, summarization (default/fallback) | openai/gpt-5.1 |
| smart | high | Complex analysis, architecture decisions, multi-step planning | openai/gpt-5.1 |
| coding | medium | Code generation, debugging, refactoring, code review | openai/gpt-5.2 |
| deep | xhigh | PhD-level reasoning: proofs, scientific analysis, deep calculations | openai/gpt-5.2 |

Each tier is independently configurable — you can assign any model from any supported provider to any tier. See Multi-Provider Setup below.

Configuration

Configure tier models in preferences/runtime-config.json under modelRouter:

{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "routingModelReasoning": "none",
    "balancedModel": "openai/gpt-5.1",
    "balancedModelReasoning": "none",
    "smartModel": "openai/gpt-5.1",
    "smartModelReasoning": "none",
    "codingModel": "openai/gpt-5.2",
    "codingModelReasoning": "none",
    "deepModel": "openai/gpt-5.2",
    "deepModelReasoning": "none",
    "dynamicTierEnabled": true,
    "temperature": 0.7
  }
}

Note: Reasoning models may ignore the temperature parameter. The presence of a reasoning object and the supportsTemperature flag in models/models.json (workspace) control this behavior. See models.json Reference.


How Tier Assignment Works

Tier Priority

The tier is resolved in ContextBuildingSystem (order=20) on iteration 0 with the following priority:

| Priority | Source | Condition |
|----------|--------|-----------|
| 1 (highest) | User preference + force | tierForce=true and modelTier set in user preferences |
| 2 | Skill model_tier | Active skill has model_tier in YAML frontmatter |
| 3 | User preference | modelTier set in user preferences (without force) |
| 4 (lowest) | Fallback | "balanced" — when no tier is explicitly set |

Force mode locks the tier, preventing both skill overrides and DynamicTierSystem upgrades. This is useful when you want a specific model regardless of context.
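For illustration, the priority above boils down to a simple cascade. A minimal sketch, in which the method and accessor names are assumptions rather than the actual ContextBuildingSystem API:

// Illustrative sketch of the priority cascade; names are assumptions.
String resolveTier(UserPreferences prefs, Skill activeSkill) {
    if (prefs.isTierForce() && prefs.getModelTier() != null) {
        return prefs.getModelTier();          // 1. forced user preference wins over everything
    }
    if (activeSkill != null && activeSkill.getModelTier() != null) {
        return activeSkill.getModelTier();    // 2. skill's model_tier from YAML frontmatter
    }
    if (prefs.getModelTier() != null) {
        return prefs.getModelTier();          // 3. non-forced user preference
    }
    return "balanced";                        // 4. fallback
}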

Setting Tier via /tier Command

/tier                    # Show current tier and force status
/tier coding             # Set tier to coding, clears force
/tier smart force        # Lock tier to smart (ignores skill overrides + dynamic upgrades)

Key behavior:

  • /tier <tier> always clears the force flag (even if it was previously on)
  • /tier <tier> force sets both the tier and locks it
  • The setting persists across conversations (stored in user preferences)

Setting Tier via set_tier Tool

The LLM can switch tiers mid-conversation using the set_tier tool:

{
  "tier": "coding"
}
  • If the user has tierForce=true, the tool returns an error: "Tier is locked by user"
  • Otherwise, the tier is applied immediately for the current conversation
  • The tool does NOT persist the change to user preferences (session-only)

Per-User Model Override (/model command)

Users can override the default model for any tier using the /model command:

/model                           # Show current model+reasoning for all 4 tiers
/model list                      # List available models (filtered by allowed providers)
/model <tier> <provider/model>   # Set model override for a tier
/model <tier> reasoning <level>  # Set reasoning level for the current model on a tier
/model <tier> reset              # Remove override, revert to application.properties default

Examples:

/model coding openai/gpt-5.2         # Set coding tier to GPT-5.2
/model coding reasoning high          # Set reasoning level to high
/model smart anthropic/claude-sonnet-4-20250514  # Use Claude for smart tier
/model coding reset                   # Revert coding tier to default

Key behavior:

  • Overrides are stored in UserPreferences.tierOverrides and persist across conversations
  • When a model is set, the default reasoning level from models.json is auto-applied
  • The model's provider must be in the allowed providers list (configured via BOT_MODEL_SELECTION_ALLOWED_PROVIDERS)
  • Tier selection (/tier) and model selection (/model) work independently — /tier selects which tier is active, /model customizes what model each tier uses
  • ModelSelectionService centralizes resolution: user override > router config fallback

Allowed providers:

BOT_MODEL_SELECTION_ALLOWED_PROVIDERS=openai,anthropic   # default

Only models from these providers will appear in /model list and be accepted in /model <tier> <model>.

Skill model_tier Override

Skills can declare a preferred model tier in their YAML frontmatter:

---
name: code-review
description: Review code changes
model_tier: coding
---

When a skill with model_tier is active:

  • If the user has tierForce=true, the skill's tier is ignored
  • Otherwise, the skill's tier takes precedence over the user's default preference
  • A system prompt instruction informs the LLM about the tier switch

Dynamic Tier Upgrade (Iteration > 0)

DynamicTierSystem (order=25) runs on subsequent iterations of the agent loop (after tool calls). It scans only messages from the current loop run (after the last user message) to detect coding activity.

Key constraint: It never scans old conversation history, only the current run's assistant messages and tool results. This prevents false positives from past coding discussions.

Upgrade signals:

| Signal Type | Detection Logic |
|-------------|-----------------|
| File operations on code files | filesystem / file_system tool calls with write_file or read_file on files ending in .py, .js, .ts, .java, .go, .rs, .rb, .sh, .c, .cpp, .cs, .kt, .scala, .swift, .lua, .r, .pl, .php, .sql, .yaml, .yml, .toml, .gradle, .cmake, .makefile, plus Makefile and Dockerfile |
| Code-related shell commands | shell tool calls starting with python, node, npm, npx, pip, mvn, gradle, gcc, g++, cargo, go, rustc, pytest, make, cmake, javac, dotnet, ruby, tsc, webpack, esbuild, jest, mocha, yarn |
| Stack traces in tool results | Tool result messages containing Traceback, SyntaxError, TypeError, NullPointerException, at com., at org., panic:, error[E, etc. |

Rules:

  • Only upgrades to coding — never downgrades (prevents oscillation)
  • Skips if current tier is already coding or deep
  • Skips if user has tierForce=true
  • Only runs when bot.router.dynamic-tier-enabled=true (default)

Source: DynamicTierSystem.java
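As a rough sketch of this detection (the suffix and command lists are abbreviated, and the types and helper methods are illustrative, not the real DynamicTierSystem code):

// Abbreviated, illustrative detection sketch; assumes java.util imports.
boolean detectCodingActivity(List<AgentMessage> currentRunMessages) {
    Set<String> codeSuffixes = Set.of(".py", ".js", ".ts", ".java", ".go", ".rs");     // abbreviated list
    Set<String> codeCommands = Set.of("python", "node", "npm", "mvn", "cargo", "pytest");
    for (AgentMessage m : currentRunMessages) {               // only the current run, never old history
        for (ToolCall call : m.getToolCalls()) {
            if (isFileTool(call) && codeSuffixes.stream().anyMatch(call.getPath()::endsWith)) {
                return true;                                   // file operation on a code file
            }
            if (isShellTool(call) && codeCommands.stream().anyMatch(call.getCommand()::startsWith)) {
                return true;                                   // code-related shell command
            }
        }
        if (m.isToolResult() && m.getContent().contains("Traceback")) {
            return true;                                       // stack trace in a tool result (one of several patterns)
        }
    }
    return false;
}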


Model Selection in ToolLoopExecutionSystem

ToolLoopExecutionSystem (order=30) delegates to DefaultToolLoopSystem, which uses ModelSelectionService to translate the modelTier string into an actual model name and reasoning effort:

ModelSelectionService.resolveForTier(tier)
  1. Check UserPreferences.tierOverrides for the tier
  2. If override exists → use override model + reasoning
  3. Otherwise → use runtime config defaults (`modelRouter.*Model`)
  4. For reasoning models → auto-fill default reasoning from `models/models.json`

The selected model and reasoning effort are passed to the LLM adapter via LlmRequest.
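A hedged sketch of that resolution order, with types and accessors assumed for illustration:

// Illustrative resolution sketch; field and method names are assumptions.
ResolvedModel resolveForTier(String tier, UserPreferences prefs, ModelRouterConfig router) {
    TierOverride override = prefs.getTierOverrides().get(tier);
    if (override != null) {
        return new ResolvedModel(override.getModel(), override.getReasoning());   // user override wins
    }
    String model = router.modelForTier(tier);                  // e.g. modelRouter.codingModel
    String reasoning = router.reasoningForTier(tier);          // e.g. modelRouter.codingModelReasoning
    if (reasoning == null && modelConfig.hasReasoning(model)) {
        reasoning = modelConfig.defaultReasoning(model);        // reasoning.default from models/models.json
    }
    return new ResolvedModel(model, reasoning);
}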


Multi-Provider Setup

You can mix different LLM providers across tiers for cost optimization or capability access:

Configure provider API keys in preferences/runtime-config.json:

{
  "llm": {
    "providers": {
      "openai": { "apiKey": "sk-proj-...", "apiType": "openai" },
      "anthropic": { "apiKey": "sk-ant-...", "apiType": "anthropic" },
      "google": { "apiKey": "AIza...", "apiType": "gemini" }
    }
  },
  "modelRouter": {
    "balancedModel": "openai/gpt-5.1",
    "balancedModelReasoning": "medium",
    "smartModel": "anthropic/claude-opus-4-6",
    "smartModelReasoning": "high",
    "codingModel": "openai/gpt-5.2",
    "codingModelReasoning": "medium"
  }
}

The Langchain4jAdapter creates per-request model instances when the requested model differs from the default.

  • Provider config lookup is still based on the model prefix (provider/model).
  • Protocol dispatch is controlled by llm.providers.<provider>.apiType (openai, anthropic, gemini).

This decouples provider naming from wire protocol, so custom provider keys can still use the correct adapter.
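A hedged sketch of that dispatch, where the helper and type names are illustrative and the actual Langchain4jAdapter wiring may differ:

// Illustrative dispatch sketch; not the actual adapter code.
ChatClient buildClientFor(String qualifiedModel, Map<String, ProviderProfile> providers) {
    int slash = qualifiedModel.indexOf('/');
    String providerKey = qualifiedModel.substring(0, slash);        // e.g. "anthropic"
    String modelName = qualifiedModel.substring(slash + 1);         // e.g. "claude-opus-4-6"
    ProviderProfile profile = providers.get(providerKey);           // llm.providers.<provider>
    return switch (profile.getApiType()) {                          // apiType, not the provider key, picks the protocol
        case "openai" -> openAiClient(profile, modelName);
        case "anthropic" -> anthropicClient(profile, modelName);
        case "gemini" -> geminiClient(profile, modelName);
        default -> throw new IllegalArgumentException("Unknown apiType: " + profile.getApiType());
    };
}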

See: Configuration Guide for runtime config details.


models.json Reference

Model capabilities are defined in the workspace at models/models.json.

On first run, the bot copies a bundled models.json into the workspace so edits can persist. The dashboard now manages this file through the Model Catalog section and can fetch live suggestions from provider APIs.

Each entry specifies:

| Field | Type | Description |
|-------|------|-------------|
| provider | string | Provider profile key used by runtime config and model discovery |
| displayName | string | Human-readable name for UI display (e.g., in /model list) |
| supportsTemperature | boolean | Whether to send the temperature parameter (reasoning models typically don't support it) |
| supportsVision | boolean | Whether the model supports image/vision inputs |
| maxInputTokens | integer | Maximum input tokens for non-reasoning models (used for truncation) |
| reasoning | object | Reasoning configuration (null/absent for non-reasoning models) |
| reasoning.default | string | Default reasoning level (e.g., "medium") |
| reasoning.levels | object | Map of level name to { "maxInputTokens": N } |

Example entry:

{
  "models": {
    "openai/gpt-5.1": {
      "provider": "openai",
      "displayName": "GPT-5.1",
      "supportsTemperature": false,
      "supportsVision": true,
      "reasoning": {
        "default": "medium",
        "levels": {
          "low":    { "maxInputTokens": 1000000 },
          "medium": { "maxInputTokens": 1000000 },
          "high":   { "maxInputTokens": 500000 },
          "xhigh":  { "maxInputTokens": 250000 }
        }
      }
    },
    "openai/gpt-4o": {
      "provider": "openai",
      "displayName": "GPT-4o",
      "supportsTemperature": true,
      "supportsVision": true,
      "maxInputTokens": 128000
    },
    "anthropic/claude-sonnet-4-20250514": {
      "provider": "anthropic",
      "displayName": "Claude Sonnet 4",
      "supportsTemperature": true,
      "supportsVision": true,
      "maxInputTokens": 200000
    }
  },
  "defaults": {
    "supportsTemperature": true,
    "supportsVision": true,
    "maxInputTokens": 128000
  }
}

Note: The reasoningRequired field has been replaced by the presence of a reasoning object. Models with reasoning have per-level context limits inside reasoning.levels. Models without reasoning use the flat maxInputTokens field.

Model name resolution in ModelConfigService:

  1. Exact match (e.g., openai/gpt-5.1)
  2. Strip provider prefix (e.g., openai/gpt-5.1 becomes gpt-5.1)
  3. Prefix match (e.g., gpt-5.1-preview matches gpt-5.1)
  4. Fall back to defaults section

This means both plain ids and provider-scoped ids work, but provider-scoped ids are preferred when the same raw model id can appear under multiple providers.
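A simplified sketch of this lookup order, with method and type names assumed for illustration:

// Illustrative lookup sketch; assumes java.util imports, not the actual ModelConfigService code.
ModelEntry lookup(String id, Map<String, ModelEntry> models, ModelEntry defaults) {
    if (models.containsKey(id)) {
        return models.get(id);                                   // 1. exact match, e.g. "openai/gpt-5.1"
    }
    String bare = id.contains("/") ? id.substring(id.indexOf('/') + 1) : id;
    if (models.containsKey(bare)) {
        return models.get(bare);                                 // 2. provider prefix stripped
    }
    for (Map.Entry<String, ModelEntry> e : models.entrySet()) {
        String key = e.getKey();
        String knownBare = key.contains("/") ? key.substring(key.indexOf('/') + 1) : key;
        if (bare.startsWith(knownBare)) {
            return e.getValue();                                 // 3. prefix match, e.g. "gpt-5.1-preview" -> "gpt-5.1"
        }
    }
    return defaults;                                             // 4. fall back to the "defaults" section
}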

Live Discovery

The dashboard can seed catalog entries by calling:

  • GET /api/models/discover/{provider}

ProviderModelDiscoveryService uses the configured provider profile and supports:

  • OpenAI-compatible /models
  • Anthropic /v1/models
  • Gemini /v1beta/models

Only provider profiles with configured API keys can be discovered successfully.

The maxInputTokens value is used by:

  • AutoCompactionSystem — triggers compaction at 80% of context window (uses per-level limit when available)
  • ToolLoopExecutionSystem — emergency truncation limits each message to 25% of context window (minimum 10K characters)
  • ModelSelectionService — resolves max tokens per tier for compaction threshold

See: Configuration Guide for workspace paths and model config notes.


Routing Configuration

Model routing is primarily configured by choosing models for each tier and enabling optional dynamic upgrades.

Edit preferences/runtime-config.json:

{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "balancedModel": "openai/gpt-5.1",
    "smartModel": "openai/gpt-5.1",
    "codingModel": "openai/gpt-5.2",
    "deepModel": "openai/gpt-5.2",
    "dynamicTierEnabled": true
  }
}

Dashboard mapping:

  • LLM Providers edits llm.providers
  • Model Catalog edits models/models.json
  • Model Router edits modelRouter

Tool Call ID Remapping and Name Sanitization

LLM providers have strict requirements for tool call identifiers and function names. When models switch mid-conversation (e.g., from a non-OpenAI provider to OpenAI due to tier change), stored tool call IDs and names from the previous provider may be incompatible with the new one. Langchain4jAdapter handles this transparently.

Source: Langchain4jAdapter.java

Function Name Sanitization

OpenAI requires function names to match ^[a-zA-Z0-9_-]+$. Non-OpenAI providers (e.g., DeepInfra) may return names containing dots or other characters that get stored in conversation history.

Original name:     "com.example.search.tool"
Sanitized name:    "com_example_search_tool"

sanitizeFunctionName() replaces any character outside [a-zA-Z0-9_-] with _. This is applied to both:

  • Assistant messages: tool call names in toolExecutionRequests
  • Tool result messages: the toolName field

If the name is null, it defaults to "unknown".
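A minimal sketch of that rule (the real method lives in Langchain4jAdapter; this only restates the behavior described above):

// Minimal sketch of the sanitization rule described above.
String sanitizeFunctionName(String name) {
    if (name == null) {
        return "unknown";                                // null names get a placeholder
    }
    return name.replaceAll("[^a-zA-Z0-9_-]", "_");       // anything outside [a-zA-Z0-9_-] becomes "_"
}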

Tool Call ID Remapping

Provider-generated tool call IDs can violate two constraints:

  • Length: IDs exceeding 40 characters (the MAX_TOOL_CALL_ID_LENGTH constant)
  • Characters: IDs containing characters outside [a-zA-Z0-9_-] (e.g., dots from non-OpenAI providers)

When either condition is detected, the adapter builds a consistent ID remap table before converting any messages:

Original ID:       "chatcmpl-abc123.tool.call.very-long-identifier-from-provider"
Remapped ID:       "call_a1b2c3d4e5f6a1b2c3d4e5f6"  (UUID-based, 29 chars)

The remap is computed in a first pass over all messages, then applied consistently to both:

  • Assistant messages: toolCalls[].id field
  • Tool result messages: toolCallId field

This ensures the assistant's tool call IDs always match the corresponding tool result IDs, even after remapping.

Why this matters: Without remapping, switching from a model that generated long/non-standard IDs to OpenAI would cause 400 Bad Request errors because the tool result IDs wouldn't match what OpenAI expects.

Conversion Flow

LlmRequest.messages
    |
    v
[Pass 1: Build ID remap table]
    |  Scan all messages for tool calls with:
    |  - ID length > 40 chars, or
    |  - ID contains chars outside [a-zA-Z0-9_-]
    |  Generate: originalId -> "call_" + UUID(24 chars)
    v
[Pass 2: Convert messages]
    |  For each message:
    |  - assistant + toolCalls: remap IDs, sanitize names
    |  - tool results: remap toolCallId, sanitize toolName
    |  - user/system: pass through
    v
List<ChatMessage> (langchain4j format)
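A rough sketch of pass 1, the remap table construction, with illustrative types and the usual java.util imports assumed:

// Illustrative two-pass remap sketch; not the actual adapter code.
Map<String, String> buildRemapTable(List<AgentMessage> messages) {
    Map<String, String> remap = new HashMap<>();
    for (AgentMessage m : messages) {
        for (ToolCall call : m.getToolCalls()) {
            String id = call.getId();
            boolean tooLong = id.length() > 40;                    // MAX_TOOL_CALL_ID_LENGTH
            boolean badChars = !id.matches("[a-zA-Z0-9_-]+");
            if (tooLong || badChars) {
                remap.computeIfAbsent(id, k -> "call_"
                        + UUID.randomUUID().toString().replace("-", "").substring(0, 24));
            }
        }
    }
    return remap;   // pass 2 applies this table to both toolCalls[].id and toolCallId
}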

Large Input Truncation

The bot employs a 3-layer defense to prevent context window overflow. Each layer operates at a different stage of the pipeline and catches progressively more severe cases.

Layer 1: Auto-Compaction (Preventive)

AutoCompactionSystem (order=18) runs before the LLM call to proactively shrink the conversation history.

Source: AutoCompactionSystem.java

Token estimation:

estimatedTokens = sum(message.content.length) / charsPerToken + systemPromptOverheadTokens

Where charsPerToken defaults to 3.5 and systemPromptOverheadTokens defaults to 8000.

Threshold resolution:

  1. Look up the current model's maxInputTokens from models/models.json (via ModelConfigService)
  2. Apply 80% safety margin: modelMax * 0.8
  3. Cap by runtime config: min(modelThreshold, compaction.maxContextTokens)
  4. If model lookup fails, fall back to compaction.maxContextTokens
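For example, with gpt-5.1 at medium reasoning (1,000,000 input tokens in models.json) and compaction.maxContextTokens set to 50,000 as in the configuration example below:

modelThreshold     = 1,000,000 * 0.8       = 800,000 tokens
effectiveThreshold = min(800,000, 50,000)  = 50,000 tokens   (runtime config cap wins)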

Compaction strategy:

  • Summarize old messages via LLM (balanced model, low reasoning) using CompactionService
  • Replace old messages with a [Conversation summary] system message + last N messages (default N=10)
  • If LLM unavailable, fall back to simple truncation (drop oldest, keep last N)

Configuration: Edit preferences/runtime-config.json:

{
  "compaction": {
    "enabled": true,
    "maxContextTokens": 50000,
    "keepLastMessages": 20
  }
}

See: Configuration Guide — Compaction

Layer 2: Tool Result Truncation (Per-Result)

DefaultToolLoopSystem truncates individual tool results that exceed maxToolResultChars before they are added to conversation history.

Source: DefaultToolLoopSystem.java

When a tool result's content exceeds the limit (default 100,000 characters):

[first maxChars characters of content...]

[OUTPUT TRUNCATED: 500000 chars total, showing first 100000 chars.
The full result is too large for the context window.
Try a more specific query, use filtering/pagination,
or process the data in smaller chunks.]

The suffix length is subtracted from the cut point so the final output stays within the limit. This hint enables the LLM to self-correct by retrying with a more specific query.

Configuration: Tool result truncation is controlled by the Spring property bot.auto-compact.max-tool-result-chars.

Default: 100000.
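A simplified sketch of the cut-point logic; the actual DefaultToolLoopSystem code may differ in details:

// Simplified sketch of per-result truncation with the suffix length subtracted from the cut point.
String truncateToolResult(String content, int maxChars) {
    if (content.length() <= maxChars) {
        return content;
    }
    String suffix = "\n\n[OUTPUT TRUNCATED: " + content.length() + " chars total, showing first "
            + maxChars + " chars.\nThe full result is too large for the context window.\n"
            + "Try a more specific query, use filtering/pagination,\n"
            + "or process the data in smaller chunks.]";
    int cutPoint = Math.max(0, maxChars - suffix.length());   // keep the final output within maxChars
    return content.substring(0, cutPoint) + suffix;
}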

Layer 3: Emergency Truncation (Error Recovery)

ToolLoopExecutionSystem (order=30) catches context overflow errors from the LLM provider and applies emergency per-message truncation as a last resort.

Source: DefaultToolLoopSystem.java

Error detection — matches these patterns in the exception message chain:

  • exceeds maximum input length
  • context_length_exceeded
  • maximum context length
  • too many tokens
  • request too large

Per-message limit calculation:

maxInputTokens = ModelConfigService.getMaxInputTokens(selectedModel)
maxMessageChars = maxInputTokens * 3.5 * 0.25    // 25% of context window per message
maxMessageChars = max(maxMessageChars, 10000)      // floor: 10K chars

For example, with gpt-5.1 (1,000,000 input tokens):

maxMessageChars = 1,000,000 * 3.5 * 0.25 = 875,000 chars per message

With gpt-4o (128,000 input tokens):

maxMessageChars = 128,000 * 3.5 * 0.25 = 112,000 chars per message

Truncation format:

[first cutPoint characters of message...]

[EMERGENCY TRUNCATED: 500000 chars total. Try a more specific query to get smaller results.]

Recovery flow:

  1. Catch context_length_exceeded from LLM provider
  2. Scan all messages, truncate any exceeding per-message limit
  3. Also truncate in session history (so truncation persists across requests)
  4. Rebuild LlmRequest and retry once
  5. If retry also fails, set llm.error for ResponseRoutingSystem
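A condensed sketch of that flow, with names assumed for illustration rather than taken from the source:

// Illustrative recovery sketch for context-overflow errors.
LlmResponse callWithEmergencyTruncation(LlmRequest request, String selectedModel) {
    try {
        return llm.call(request);
    } catch (LlmException e) {
        if (!isContextOverflow(e)) {                               // pattern match on the exception message chain
            throw e;
        }
        int maxInputTokens = modelConfig.getMaxInputTokens(selectedModel);
        int maxMessageChars = (int) Math.max(maxInputTokens * 3.5 * 0.25, 10_000);
        LlmRequest truncated = truncateLongMessages(request, maxMessageChars);   // also applied to session history
        return llm.call(truncated);                                // retry once; a second failure becomes llm.error
    }
}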

Summary: Defense Layers

                     Conversation History
                            |
                            v
Layer 1: AutoCompactionSystem (order=18)
         Preventive. Estimates tokens, compacts if > 80% of model max.
         Strategy: LLM summary + keep last N messages.
                            |
                            v
              ToolLoopExecutionSystem (order=30)
              ┌─────────────────────────────────┐
              │  LLM call → tool execution loop │
              │                                 │
              │  Layer 2: Tool Result Truncation │
              │  Per-result. Truncates > 100K.  │
              │  Hint: "try a more specific     │
              │  query"                         │
              │                                 │
              │  Layer 3: Emergency Truncation   │
              │  On context_length_exceeded:    │
              │  per-message limit = 25% of     │
              │  context window. Truncates +    │
              │  retries once. Persists in      │
              │  session history.               │
              └─────────────────────────────────┘

Architecture: Key Classes

| Class | Package | Order | Purpose |
|-------|---------|-------|---------|
| ContextBuildingSystem | domain.system | 20 | Resolves tier from user prefs / skill, builds system prompt |
| DynamicTierSystem | domain.system | 25 | Mid-conversation upgrade to coding tier |
| ToolLoopExecutionSystem | domain.system | 30 | LLM calls, tool execution loop, plan intercept, model selection, emergency truncation |
| AutoCompactionSystem | domain.system | 18 | Preventive context compaction before LLM call |
| TierTool | tools | | LLM tool for switching tier mid-conversation |
| CommandRouter | adapter.inbound.command | | /tier and /model command handlers |
| Langchain4jAdapter | adapter.outbound.llm | | OpenAI/Anthropic/Gemini protocol dispatch, ID remapping, name sanitization |
| ModelConfigService | infrastructure.config | | Model capability lookups from models.json |
| ProviderModelDiscoveryService | domain.service | | Live provider API discovery for the Model Catalog |
| ModelSelectionService | domain.service | | Per-user model override resolution, provider filtering |
| AgentContext | domain.model | | Runtime state: holds modelTier, activeSkill |

Debugging Model Routing

Log messages

The runtime produces detailed tier-resolution logs at INFO level:

[ContextBuilding] Resolved tier: coding (source: skill 'code-review')
[LLM] Model tier: coding, selected model: openai/gpt-5.2, reasoning: medium

On subsequent iterations with dynamic upgrade:

[DynamicTier] Detected coding activity, upgrading tier: balanced -> coding
[LLM] Model tier: coding, selected model: openai/gpt-5.2, reasoning: medium

User-initiated tier changes:

[TierTool] Tier changed to: smart
[LLM] Model tier: smart, selected model: openai/gpt-5.1, reasoning: high

The /status command

Use /status in Telegram to check active configuration, including current model tier. For server-side flags and health, see Configuration Guide — Diagnostics.

The /tier command

Use /tier to check or change the current tier:

/tier              # Show current: "Tier: balanced, Force: off"
/tier coding       # Switch to coding tier
/tier smart force  # Lock to smart tier (ignores skill overrides + dynamic upgrades)

Examples

Greeting (balanced tier — default)

User: "Hi, how are you?"

ContextBuildingSystem:
  No user tier preference, no active skill
  Tier: null → balanced (fallback)

ToolLoopExecutionSystem:
  Tier: balanced → openai/gpt-5.1 (reasoning: medium)

Code generation (coding tier from skill)

Skill "code-review" has model_tier: coding

User: "Review this PR"

ContextBuildingSystem:
  Active skill: code-review (model_tier: coding)
  No user force → use skill tier
  Tier: coding

ToolLoopExecutionSystem:
  Tier: coding → openai/gpt-5.2 (reasoning: medium)

User-forced tier overrides skill

User ran: /tier smart force

Skill "code-review" has model_tier: coding

User: "Review this PR"

ContextBuildingSystem:
  User preference: smart (force=true)
  → Skill's coding tier is ignored
  Tier: smart

ToolLoopExecutionSystem:
  Tier: smart → openai/gpt-5.1 (reasoning: high)

Dynamic upgrade (balanced -> coding)

User: "Help me with this project"

ContextBuildingSystem:
  No user tier, no skill tier → balanced

ToolLoopExecutionSystem (iteration 0):
  Tier: balanced → openai/gpt-5.1 (reasoning: medium)

--- LLM calls filesystem.write_file("app.py", ...) ---

DynamicTierSystem (iteration 1):
  Detected: filesystem write on .py file → upgrade to "coding"

ToolLoopExecutionSystem (iteration 1):
  Tier: coding → openai/gpt-5.2 (reasoning: medium)

LLM switches tier via tool

User: "This needs deep analysis"

LLM calls set_tier({"tier": "deep"})
  → context.modelTier = "deep"

ToolLoopExecutionSystem (next iteration):
  Tier: deep → openai/gpt-5.2 (reasoning: xhigh)