Model Routing
How the bot selects the right LLM model for each request through tier-based routing.
See also: Configuration, Quick Start, Deployment
The bot uses a 4-tier model selection strategy that picks the most appropriate model based on task complexity. The tier is determined from multiple sources with clear priority:
1. **User preference** - set via the `/tier` command or the `set_tier` tool
2. **Skill override** - the `model_tier` field in skill YAML frontmatter
3. **Dynamic upgrade** - `DynamicTierSystem` promotes to `coding` when code activity is detected mid-conversation
4. **Fallback** - `balanced` when no tier is explicitly set
5. **Per-user model override** - set via the `/model` command
Operationally, model setup now follows this flow:
- Configure provider profiles in **LLM Providers**
- Maintain capability metadata in **Model Catalog**
- Assign routing and tier slots in **Model Router**
```
User Message
     |
     v
[ContextBuildingSystem] --- Resolves tier from user prefs / active skill
     |                      Priority: force+user > skill > user pref > balanced
     v
[DynamicTierSystem] --- May upgrade to coding if code activity is detected
     |                  (only on iteration > 0, never downgrades)
     v
[ModelSelectionService] --- Resolves actual model for the tier
     |                      (user override > router config fallback)
     v
[ToolLoopExecutionSystem] --- Selects model + reasoning level based on modelTier
     |
     v
LLM API Call
```
| Tier | Reasoning | Typical Use Cases | Default Model |
|---|---|---|---|
| balanced | medium | Greetings, general questions, summarization | openai/gpt-5.1 |
| smart | high | Complex analysis, architecture decisions, multi-step planning | openai/gpt-5.1 |
| coding | medium | Code generation, debugging, refactoring, code review | openai/gpt-5.2 |
| deep | xhigh | Deep scientific or highly technical reasoning | openai/gpt-5.2 |
Each tier is independently configurable.
```json
{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "routingModelReasoning": "none",
    "balancedModel": "openai/gpt-5.1",
    "balancedModelReasoning": "none",
    "smartModel": "openai/gpt-5.1",
    "smartModelReasoning": "none",
    "codingModel": "openai/gpt-5.2",
    "codingModelReasoning": "none",
    "deepModel": "openai/gpt-5.2",
    "deepModelReasoning": "none",
    "dynamicTierEnabled": true,
    "temperature": 0.7
  }
}
```

Reasoning models may ignore `temperature`. The effective reasoning and token limits are derived from `models/models.json`.
The tier is resolved in ContextBuildingSystem with this priority:
| Priority | Source | Condition |
|---|---|---|
| 1 | User preference + force | `tierForce=true` and `modelTier` set |
| 2 | Skill `model_tier` | Active skill declares a preferred tier |
| 3 | User preference | `modelTier` set without force |
| 4 | Fallback | `balanced` |
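The resolution order above can be sketched as a small function. This is a minimal illustration; the class name, method signature, and parameters are hypothetical and do not reflect the bot's actual API.

```java
// Minimal sketch of the tier resolution priority; names are illustrative,
// not the bot's actual API.
public class TierResolutionSketch {
    public static String resolve(String userTier, boolean tierForce, String skillTier) {
        if (tierForce && userTier != null) return userTier; // 1. forced user preference
        if (skillTier != null) return skillTier;            // 2. skill model_tier
        if (userTier != null) return userTier;              // 3. unforced user preference
        return "balanced";                                  // 4. fallback
    }
}
```

Note how a forced user preference beats the skill tier, while an unforced one yields to it.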
```
/tier
/tier coding
/tier smart force
```
Key behavior:

- `/tier <tier>` clears force
- `/tier <tier> force` locks the tier
- the setting persists in user preferences
The LLM can switch tiers mid-conversation with:
```json
{
  "tier": "coding"
}
```

- blocked if the user locked the tier with force
- applies immediately for the current conversation
- does not persist to user preferences
Users can override the default model for any tier:
```
/model
/model list
/model <tier> <provider/model>
/model <tier> reasoning <level>
/model <tier> reset
```
Key behavior:

- overrides are stored per user
- default reasoning is auto-applied from `models.json`
- the model provider must be in `BOT_MODEL_SELECTION_ALLOWED_PROVIDERS`
- `/tier` picks the active tier, `/model` customizes what each tier points to
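The provider allowlist check can be sketched as follows. This is a hypothetical illustration of filtering a `/model` override against `BOT_MODEL_SELECTION_ALLOWED_PROVIDERS`; the class and method names are invented.

```java
import java.util.Set;

// Hypothetical sketch of filtering a /model override against
// BOT_MODEL_SELECTION_ALLOWED_PROVIDERS; names are invented for illustration.
public class ProviderAllowlistSketch {
    public static boolean isAllowed(String modelId, Set<String> allowedProviders) {
        int slash = modelId.indexOf('/');
        if (slash < 0) return false; // overrides are provider-scoped, e.g. openai/gpt-5.1
        return allowedProviders.contains(modelId.substring(0, slash));
    }
}
```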
Skills can declare:

```yaml
model_tier: coding
```

If the user has not force-locked the tier, the skill tier takes precedence.
DynamicTierSystem can upgrade to coding when the current run shows code activity.
Signals include:
- code file reads and writes
- code-related shell commands
- stack traces in tool results
Rules:
- upgrades only, never downgrades
- skips if already
codingordeep - skips if the user force-locked the tier
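The upgrade rules can be condensed into a single decision function. A minimal sketch, assuming invented names; the iteration guard comes from the flow diagram above (upgrades only happen on iteration > 0).

```java
// Minimal sketch of the dynamic tier upgrade rules; names are illustrative.
public class DynamicTierSketch {
    public static String maybeUpgrade(String currentTier, boolean codeActivity,
                                      boolean userForced, int iteration) {
        if (userForced) return currentTier;               // user force-lock wins
        if (iteration == 0) return currentTier;           // only after the first iteration
        if (currentTier.equals("coding") || currentTier.equals("deep"))
            return currentTier;                           // already high enough, skip
        return codeActivity ? "coding" : currentTier;     // upgrade only, never downgrade
    }
}
```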
You can mix different providers across tiers:
```json
{
  "llm": {
    "providers": {
      "openai": { "apiKey": "sk-proj-...", "apiType": "openai" },
      "anthropic": { "apiKey": "sk-ant-...", "apiType": "anthropic" },
      "google": { "apiKey": "AIza...", "apiType": "gemini" }
    }
  },
  "modelRouter": {
    "balancedModel": "openai/gpt-5.1",
    "smartModel": "anthropic/claude-opus-4-6",
    "codingModel": "openai/gpt-5.2"
  }
}
```

Provider config lookup is based on the model prefix. Protocol dispatch is controlled by `llm.providers.<provider>.apiType`.
Model capabilities are defined in models/models.json.
The dashboard now manages this through Model Catalog and can fetch live suggestions from provider APIs.
Each entry specifies:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider profile key used by runtime config and model discovery |
| `displayName` | string | Human-readable label |
| `supportsTemperature` | boolean | Whether to send `temperature` |
| `supportsVision` | boolean | Whether the model supports image inputs |
| `maxInputTokens` | integer | Context limit for non-reasoning models |
| `reasoning` | object | Reasoning config for reasoning-capable models |
Example:
```json
{
  "models": {
    "openai/gpt-5.1": {
      "provider": "openai",
      "displayName": "GPT-5.1",
      "supportsTemperature": false,
      "supportsVision": true,
      "reasoning": {
        "default": "medium",
        "levels": {
          "low": { "maxInputTokens": 1000000 },
          "medium": { "maxInputTokens": 1000000 },
          "high": { "maxInputTokens": 500000 }
        }
      }
    }
  },
  "defaults": {
    "supportsTemperature": true,
    "supportsVision": true,
    "maxInputTokens": 128000
  }
}
```

ModelConfigService resolves a model in this order:

1. exact match, for example `openai/gpt-5.1`
2. stripped provider prefix, for example `gpt-5.1`
3. prefix match, for example `gpt-5.1-preview`
4. fallback to `defaults`

Both plain ids and provider-scoped ids work, but provider-scoped ids are preferred when the same raw model id can appear under multiple providers.
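The four-step lookup above can be sketched roughly like this. The real ModelConfigService may differ in detail; the class and helper names here are invented, and the catalog is reduced to a set of keys for illustration.

```java
import java.util.Set;

// Sketch of the four-step model id lookup; the real ModelConfigService
// may differ in detail. Returns the matched catalog key, or "defaults".
public class ModelLookupSketch {
    static String strip(String id) {
        int slash = id.indexOf('/');
        return slash < 0 ? id : id.substring(slash + 1);
    }

    public static String resolve(String id, Set<String> catalogKeys) {
        if (catalogKeys.contains(id)) return id;       // 1. exact match
        String raw = strip(id);
        for (String key : catalogKeys)                 // 2. stripped provider prefix
            if (strip(key).equals(raw)) return key;
        for (String key : catalogKeys)                 // 3. prefix match
            if (raw.startsWith(strip(key))) return key;
        return "defaults";                             // 4. fallback
    }
}
```

With a catalog containing only `openai/gpt-5.1`, the ids `openai/gpt-5.1`, `gpt-5.1`, and `gpt-5.1-preview` all resolve to that entry, while an unknown id falls back to `defaults`.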
The dashboard can seed catalog entries via:
```
GET /api/models/discover/{provider}
```

ProviderModelDiscoveryService supports:

- OpenAI-compatible `/models`
- Anthropic `/v1/models`
- Gemini `/v1beta/models`
Only provider profiles with configured API keys can be discovered successfully.
```json
{
  "modelRouter": {
    "routingModel": "openai/gpt-5.2-codex",
    "balancedModel": "openai/gpt-5.1",
    "smartModel": "openai/gpt-5.1",
    "codingModel": "openai/gpt-5.2",
    "deepModel": "openai/gpt-5.2",
    "dynamicTierEnabled": true
  }
}
```

Dashboard mapping:

- **LLM Providers** edits `llm.providers`
- **Model Catalog** edits `models/models.json`
- **Model Router** edits `modelRouter`
The bot uses a layered defense against context overflow:
- `AutoCompactionSystem` proactively compacts history
- tool results are truncated before they explode the context
- emergency per-message truncation is applied on context overflow errors
This is why models.json token limits matter beyond just UI selection.
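The per-message truncation layer amounts to clipping oversized content before it enters the context window. A minimal sketch of that idea; the limit, marker string, and class name are invented for this example and are not the bot's actual implementation.

```java
// Illustrative sketch of clipping an oversized tool result before it enters
// the context window; the limit and truncation marker are invented.
public class ToolResultClipSketch {
    public static String clip(String toolResult, int maxChars) {
        if (toolResult.length() <= maxChars) return toolResult;
        return toolResult.substring(0, maxChars) + " [...truncated]";
    }
}
```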
| Class | Purpose |
|---|---|
| `ContextBuildingSystem` | resolves tier and builds prompt context |
| `DynamicTierSystem` | upgrades to coding mid-run when needed |
| `ToolLoopExecutionSystem` | executes the tool loop and final model call logic |
| `AutoCompactionSystem` | prevents context overflow before the LLM call |
| `CommandRouter` | handles `/tier` and `/model` |
| `Langchain4jAdapter` | provider protocol dispatch, tool id remapping, name sanitization |
| `ModelConfigService` | model capability lookups from `models.json` |
| `ProviderModelDiscoveryService` | live provider API discovery for the Model Catalog |
| `ModelSelectionService` | per-user override resolution and provider filtering |
Typical logs:

```
[ContextBuilding] Resolved tier: coding
[LLM] Model tier: coding, selected model: openai/gpt-5.2
[DynamicTier] Detected coding activity, upgrading tier: balanced -> coding
```

Useful commands: `/status`, `/tier`, `/model`
GolemCore Bot -- Apache License 2.0 | GitHub | Issues | Discussions