
Model Routing

Alex Kuleshov edited this page Mar 9, 2026 · 7 revisions

How the bot selects the right LLM model for each request through tier-based routing.

See also: Configuration, Quick Start, Deployment


Overview

The bot uses a four-tier model selection strategy (balanced, smart, coding, deep) that picks the most appropriate model based on task complexity. The active tier is determined from multiple sources with clear priority:

  1. User preference - set via the /tier command or the set_tier tool
  2. Skill override - the model_tier field in skill YAML frontmatter
  3. Dynamic upgrade - DynamicTierSystem promotes to coding when code activity is detected mid-conversation
  4. Fallback - "balanced" when no tier is explicitly set

Separately from tier selection, a per-user model override (set via /model) changes which model each tier resolves to.

Operationally, model setup now follows this flow:

  1. Configure provider profiles in LLM Providers
  2. Maintain capability metadata in Model Catalog
  3. Assign routing and tier slots in Model Router

Request flow:

User Message
    |
    v
[ContextBuildingSystem]  --- Resolves tier from user prefs / active skill
    |                        Priority: force+user > skill > user pref > balanced
    v
[DynamicTierSystem]      --- May upgrade to coding if code activity is detected
    |                        (only on iteration > 0, never downgrades)
    v
[ModelSelectionService]  --- Resolves actual model for the tier
    |                        (user override > router config fallback)
    v
[ToolLoopExecutionSystem] --- Selects model + reasoning level based on modelTier
    |
    v
LLM API Call

Model Tiers

| Tier | Reasoning | Typical Use Cases | Default Model |
| --- | --- | --- | --- |
| balanced | medium | Greetings, general questions, summarization | openai/gpt-5.1 |
| smart | high | Complex analysis, architecture decisions, multi-step planning | openai/gpt-5.1 |
| coding | medium | Code generation, debugging, refactoring, code review | openai/gpt-5.2 |
| deep | xhigh | Deep scientific or highly technical reasoning | openai/gpt-5.2 |

Each tier is independently configurable.

Configuration

{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "routingModelReasoning": "none",
    "balancedModel": "openai/gpt-5.1",
    "balancedModelReasoning": "none",
    "smartModel": "openai/gpt-5.1",
    "smartModelReasoning": "none",
    "codingModel": "openai/gpt-5.2",
    "codingModelReasoning": "none",
    "deepModel": "openai/gpt-5.2",
    "deepModelReasoning": "none",
    "dynamicTierEnabled": true,
    "temperature": 0.7
  }
}

Reasoning models may ignore temperature. The effective reasoning and token limits are derived from models/models.json.


How Tier Assignment Works

Tier Priority

The tier is resolved in ContextBuildingSystem with this priority:

| Priority | Source | Condition |
| --- | --- | --- |
| 1 | User preference + force | tierForce=true and modelTier set |
| 2 | Skill model_tier | Active skill declares a preferred tier |
| 3 | User preference | modelTier set without force |
| 4 | Fallback | balanced |
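The priority order above can be sketched as a small resolution function. The class and parameter names here are illustrative only, not the bot's actual ContextBuildingSystem API:

```java
// Hypothetical sketch of the tier-priority table; names are illustrative.
final class TierResolver {
    static String resolve(String userTier, boolean tierForce, String skillTier) {
        if (tierForce && userTier != null) return userTier; // 1: forced user preference
        if (skillTier != null) return skillTier;            // 2: skill model_tier
        if (userTier != null) return userTier;              // 3: unforced user preference
        return "balanced";                                  // 4: fallback
    }
}
```

Note that an unforced user preference loses to a skill's model_tier, which is why forcing exists at all.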

/tier

Users can view the current tier or set a preferred one:

/tier
/tier coding
/tier smart force

Key behavior:

  • /tier <tier> clears force
  • /tier <tier> force locks the tier
  • the setting persists in user preferences

set_tier tool

The LLM can switch tiers mid-conversation with:

{
  "tier": "coding"
}

  • blocked if the user locked the tier with force
  • applies immediately for the current conversation
  • does not persist to user preferences

/model

Users can override the default model for any tier:

/model
/model list
/model <tier> <provider/model>
/model <tier> reasoning <level>
/model <tier> reset

Key behavior:

  • overrides are stored per user
  • default reasoning is auto-applied from models.json
  • the model provider must be in BOT_MODEL_SELECTION_ALLOWED_PROVIDERS
  • /tier picks the active tier, /model customizes what each tier points to

Skill override

Skills can declare:

model_tier: coding

If the user has not force-locked the tier, the skill tier takes precedence.

Dynamic Tier Upgrade

DynamicTierSystem can upgrade to coding when the current run shows code activity.

Signals include:

  • code file reads and writes
  • code-related shell commands
  • stack traces in tool results

Rules:

  • upgrades only, never downgrades
  • skips if already coding or deep
  • skips if the user force-locked the tier
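These rules can be condensed into a small decision function. This is an illustrative sketch only; the real DynamicTierSystem inspects actual tool activity rather than taking a boolean flag:

```java
// Illustrative sketch of the upgrade rules; not the bot's real API.
final class DynamicTierSketch {
    static String maybeUpgrade(String tier, boolean forceLocked,
                               int iteration, boolean codeActivity) {
        if (forceLocked) return tier;                     // user locked the tier
        if (iteration == 0 || !codeActivity) return tier; // only mid-run, only on signal
        if (tier.equals("coding") || tier.equals("deep")) return tier; // never downgrade
        return "coding";                                  // upgrade
    }
}
```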

Multi-Provider Setup

You can mix different providers across tiers:

{
  "llm": {
    "providers": {
      "openai": { "apiKey": "sk-proj-...", "apiType": "openai" },
      "anthropic": { "apiKey": "sk-ant-...", "apiType": "anthropic" },
      "google": { "apiKey": "AIza...", "apiType": "gemini" }
    }
  },
  "modelRouter": {
    "balancedModel": "openai/gpt-5.1",
    "smartModel": "anthropic/claude-opus-4-6",
    "codingModel": "openai/gpt-5.2"
  }
}

Provider config lookup is based on the model prefix. Protocol dispatch is controlled by llm.providers.<provider>.apiType.
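A minimal sketch of that prefix lookup, assuming model ids of the form provider/model (the helper name is hypothetical):

```java
// Sketch: the provider profile key is everything before the first '/'.
final class ProviderPrefix {
    static String providerKey(String modelId) {
        int slash = modelId.indexOf('/');
        return slash < 0 ? null : modelId.substring(0, slash);
    }
}
```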


models.json Reference

Model capabilities are defined in models/models.json.

The dashboard now manages this through Model Catalog and can fetch live suggestions from provider APIs.

Each entry specifies:

| Field | Type | Description |
| --- | --- | --- |
| provider | string | Provider profile key used by runtime config and model discovery |
| displayName | string | Human-readable label |
| supportsTemperature | boolean | Whether to send temperature |
| supportsVision | boolean | Whether the model supports image inputs |
| maxInputTokens | integer | Context limit for non-reasoning models |
| reasoning | object | Reasoning config for reasoning-capable models |

Example:

{
  "models": {
    "openai/gpt-5.1": {
      "provider": "openai",
      "displayName": "GPT-5.1",
      "supportsTemperature": false,
      "supportsVision": true,
      "reasoning": {
        "default": "medium",
        "levels": {
          "low": { "maxInputTokens": 1000000 },
          "medium": { "maxInputTokens": 1000000 },
          "high": { "maxInputTokens": 500000 }
        }
      }
    }
  },
  "defaults": {
    "supportsTemperature": true,
    "supportsVision": true,
    "maxInputTokens": 128000
  }
}

Resolution rules

ModelConfigService resolves a model in this order:

  1. exact match, for example openai/gpt-5.1
  2. stripped provider prefix, for example gpt-5.1
  3. prefix match, for example gpt-5.1-preview
  4. fallback to defaults

Both plain ids and provider-scoped ids work, but provider-scoped ids are preferred when the same raw model id can appear under multiple providers.
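The four-step order can be sketched over a plain set of catalog keys; the real ModelConfigService returns capability entries rather than keys, so treat this only as an illustration of the lookup order:

```java
import java.util.Set;

// Hedged sketch of the four-step resolution order described above.
final class ModelLookupSketch {
    static String resolve(Set<String> catalog, String id) {
        if (catalog.contains(id)) return id;                 // 1. exact match
        String bare = id.contains("/")
                ? id.substring(id.indexOf('/') + 1) : id;
        if (catalog.contains(bare)) return bare;             // 2. provider prefix stripped
        for (String key : catalog)                           // 3. prefix match
            if (bare.startsWith(key)) return key;
        return "defaults";                                   // 4. fall back to defaults
    }
}
```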

Live Discovery

The dashboard can seed catalog entries via:

  • GET /api/models/discover/{provider}

ProviderModelDiscoveryService supports:

  • OpenAI-compatible /models
  • Anthropic /v1/models
  • Gemini /v1beta/models

Only provider profiles with configured API keys can be discovered successfully.


Routing Configuration

{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "balancedModel": "openai/gpt-5.1",
    "smartModel": "openai/gpt-5.1",
    "codingModel": "openai/gpt-5.2",
    "deepModel": "openai/gpt-5.2",
    "dynamicTierEnabled": true
  }
}

Dashboard mapping:

  • LLM Providers edits llm.providers
  • Model Catalog edits models/models.json
  • Model Router edits modelRouter

Large Input Protection

The bot uses a layered defense against context overflow:

  1. AutoCompactionSystem proactively compacts history
  2. tool results are truncated before they explode the context
  3. emergency per-message truncation is applied on context overflow errors

This is why models.json token limits matter beyond just UI selection.
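As a rough illustration of the emergency truncation step, here is a minimal clamp assuming a crude 4-characters-per-token estimate; both the helper and the heuristic are assumptions for illustration, not the bot's actual code:

```java
// Crude illustration of per-message truncation against a token budget.
final class TruncationSketch {
    static String clamp(String toolResult, int maxTokens) {
        int maxChars = maxTokens * 4; // rough chars-per-token estimate (assumption)
        if (toolResult.length() <= maxChars) return toolResult;
        return toolResult.substring(0, maxChars) + "\n[truncated]";
    }
}
```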


Key Classes

| Class | Purpose |
| --- | --- |
| ContextBuildingSystem | resolves tier and builds prompt context |
| DynamicTierSystem | upgrades to coding mid-run when needed |
| ToolLoopExecutionSystem | executes the tool loop and final model call logic |
| AutoCompactionSystem | prevents context overflow before the LLM call |
| CommandRouter | handles /tier and /model |
| Langchain4jAdapter | provider protocol dispatch, tool id remapping, name sanitization |
| ModelConfigService | model capability lookups from models.json |
| ProviderModelDiscoveryService | live provider API discovery for the Model Catalog |
| ModelSelectionService | per-user override resolution and provider filtering |

Debugging

Typical logs:

[ContextBuilding] Resolved tier: coding
[LLM] Model tier: coding, selected model: openai/gpt-5.2
[DynamicTier] Detected coding activity, upgrading tier: balanced -> coding

Useful commands:

  • /status
  • /tier
  • /model
