
fix(kernel): resolve "default" provider in fallback_models before driver init #710

Open
Reaster0 wants to merge 1 commit into RightNow-AI:main from Reaster0:fix/fallback-default-provider-resolution


@Reaster0 Reaster0 commented Mar 18, 2026

Summary

  • Fallback model entries with provider = "default" were passed verbatim to create_driver(), which only recognises real provider names (ollama, openai, anthropic, …)
  • The primary model already resolves "default" → kernel config at spawn_agent(), but the fallback loop was missing the same resolution — causing every bundled agent with a "default" fallback to silently lose all fallback drivers:
    WARN Fallback driver 'default' failed to init: Unknown provider 'default'
    
  • This is particularly visible for users whose config.toml sets a non-standard default provider (e.g. ollama pointing at a local proxy), where the fallback is the only recovery path

Changes

  • Mirror the primary-model overlay logic for fallback entries: resolve provider, model, api_key_env, and base_url from config.default_model when the fallback specifies "default" or empty string
  • Inherit base_url from default_model before falling back to lookup_provider_url(), so custom endpoints propagate correctly
  • Use resolved values in strip_provider_prefix() and warn messages
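The overlay described in the list above could look roughly like the following. This is a minimal sketch, not the kernel code: `ModelEntry`, `is_default`, and `resolve_fallback` are hypothetical names, and the sketch assumes that an entry naming a concrete provider is passed through unchanged.

```rust
/// Hypothetical stand-in for one model entry (a `[[fallback_models]]` item
/// or `[default_model]`); the real kernel struct may have other fields.
#[derive(Debug, Clone, PartialEq)]
struct ModelEntry {
    provider: String,
    model: String,
    api_key_env: Option<String>,
    base_url: Option<String>,
}

/// "default" and the empty string both mean "use the kernel default".
fn is_default(value: &str) -> bool {
    value.is_empty() || value == "default"
}

/// Overlay the kernel default onto a fallback entry before driver init.
fn resolve_fallback(fb: &ModelEntry, dm: &ModelEntry) -> ModelEntry {
    if !is_default(&fb.provider) {
        // Assumption: a concrete provider was named, so nothing is inherited.
        return fb.clone();
    }
    ModelEntry {
        provider: dm.provider.clone(),
        model: if is_default(&fb.model) {
            dm.model.clone()
        } else {
            fb.model.clone()
        },
        // Values set explicitly on the fallback win over the default's.
        api_key_env: fb.api_key_env.clone().or_else(|| dm.api_key_env.clone()),
        base_url: fb.base_url.clone().or_else(|| dm.base_url.clone()),
    }
}
```

The resolved entry, rather than the raw one, is what would then be handed to `create_driver()`, so the driver never sees the literal string "default".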

Test plan

  • cargo build --workspace --lib compiles
  • cargo test --workspace — existing tests pass (no signature changes to public API)
  • Start daemon with [default_model] provider = "ollama" and an agent with [[fallback_models]] provider = "default" — verify the Unknown provider 'default' warning is gone
  • Kill the primary LLM endpoint mid-request — verify the fallback driver activates using the resolved provider

🤖 Generated with Claude Code

…ver init

The fallback model loop passed `provider = "default"` verbatim to
`create_driver()`, which only recognises real provider names (ollama,
openai, anthropic, …).  The primary model overlay at spawn_agent()
already resolves "default" → kernel config, but fallback_models was
skipped, causing every bundled agent with a "default" fallback to log:

    Fallback driver 'default' failed to init: Unknown provider 'default'

This meant agents had zero fallback drivers, silently degrading
resilience for anyone whose config.toml sets a non-standard default
provider (e.g. ollama pointing at a local proxy).

Changes:
- Mirror the primary-model overlay logic for fallback entries:
  resolve provider, model, api_key_env, and base_url from
  `config.default_model` when the fallback specifies "default" or empty.
- Inherit `base_url` from default_model before falling back to
  `lookup_provider_url()`, so custom endpoints propagate correctly.
- Use resolved values in `strip_provider_prefix()` and warn messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member

@jaberjaber23 jaberjaber23 left a comment


Two bugs and two further issues in the implementation:

  1. base_url leaks across providers. The PR unconditionally inherits dm.base_url for all fallback providers via .or_else(|| dm.base_url.clone()). But the primary model logic explicitly guards this with if agent_provider == default_provider (with comment "Don't inherit default provider's base_url when switching providers"). If default is ollama with base_url = "http://localhost:11434", an openai fallback would incorrectly get ollama's URL. Must guard with fb_provider == dm.provider.

  2. Ignores hot-reload. The PR uses &self.config.default_model directly, but primary resolution reads self.default_model_override first (the hot-reload mechanism for dashboard model changes). Should use effective_default already computed at line 4614.

  3. Dead let _ = &fb_model_name; line — does nothing, fb_model_name is already used on the next line.

  4. Zero tests added for a kernel-level driver init change. Need at minimum: "default" provider resolves correctly, cross-provider base_url doesn't leak, empty strings resolve to default.

The fix direction is correct but needs these issues addressed.
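
For illustration, the guard and the requested tests could be sketched as follows. This is a rough sketch under assumptions, not the kernel's actual code: `ModelConfig`, `effective_default`, `wants_default`, and `resolve_fallback` are hypothetical names, and the model names in the tests are placeholders.

```rust
/// Hypothetical stand-in for a model entry; real kernel types may differ.
#[derive(Debug, Clone, PartialEq)]
struct ModelConfig {
    provider: String,
    model: String,
    base_url: Option<String>,
}

/// Prefer the hot-reload override when one is set, else the static config.
fn effective_default<'a>(
    hot_override: Option<&'a ModelConfig>,
    config: &'a ModelConfig,
) -> &'a ModelConfig {
    hot_override.unwrap_or(config)
}

fn wants_default(value: &str) -> bool {
    value.is_empty() || value == "default"
}

fn resolve_fallback(fb: &ModelConfig, dm: &ModelConfig) -> ModelConfig {
    let provider = if wants_default(&fb.provider) {
        dm.provider.clone()
    } else {
        fb.provider.clone()
    };
    let model = if wants_default(&fb.model) {
        dm.model.clone()
    } else {
        fb.model.clone()
    };
    // Guard: inherit the default's base_url only for the same provider.
    let base_url = fb.base_url.clone().or_else(|| {
        if provider == dm.provider {
            dm.base_url.clone()
        } else {
            None
        }
    });
    ModelConfig { provider, model, base_url }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn entry(provider: &str, model: &str, base_url: Option<&str>) -> ModelConfig {
        ModelConfig {
            provider: provider.into(),
            model: model.into(),
            base_url: base_url.map(String::from),
        }
    }

    #[test]
    fn default_and_empty_resolve_to_default_model() {
        let dm = entry("ollama", "llama3", Some("http://localhost:11434"));
        for marker in ["default", ""] {
            let r = resolve_fallback(&entry(marker, marker, None), &dm);
            assert_eq!(r, dm);
        }
    }

    #[test]
    fn base_url_does_not_leak_across_providers() {
        let dm = entry("ollama", "llama3", Some("http://localhost:11434"));
        let r = resolve_fallback(&entry("openai", "gpt-4o", None), &dm);
        assert_eq!(r.base_url, None);
    }

    #[test]
    fn hot_reload_override_wins() {
        let config = entry("ollama", "llama3", None);
        let over = entry("anthropic", "some-model", None);
        assert_eq!(effective_default(Some(&over), &config), &over);
        assert_eq!(effective_default(None, &config), &config);
    }
}
```

When the provider differs from the default's, `base_url` stays `None` here, on the assumption that the caller then falls through to `lookup_provider_url()` as before.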


2 participants