
feat: OpenAI-compatible /v1 client and provider routing #218

Merged: quiet-node merged 9 commits into main from feat/v1-client-and-routing on Jun 15, 2026

@quiet-node quiet-node commented Jun 12, 2026

Overview

This PR makes Thuki's bundled inference engine reachable and adds support for OpenAI-compatible servers: a generic OpenAI-compatible /v1 chat client, a new openai provider kind, and routing of every LLM consumer (chat, search, title generation, warmup, capabilities) by the active provider's kind. Stacked on the engine-runner PR (#217). Fresh installs still default to Ollama at this point in the stack; the default flips in #219.

What changed

  • openai provider kind: [[inference.providers]] entries with kind = "openai" (label, base_url, model, and a manual vision flag). API keys are stored write-only in the macOS Keychain (keychain.rs), never in config.toml and never returned over IPC.
  • Legacy-config pin: configs written before the providers list existed are pinned to Ollama on load, regardless of what the default is or becomes. Existing users keep their working setup; only fresh installs follow the compiled default.
  • /v1 SSE client (openai.rs): streaming chat with content parts, bounded SSE line buffering, cancellation, and typed error classification (EngineUnreachable, EngineStartFailed, model-not-found and auth variants).
  • Routing by kind (commands.rs): ask_model dispatches builtin chat through the engine runner plus /v1, Ollama through native /api/chat, and openai through /v1 against the provider's base_url. Search and title generation run through a provider-kind LlmTransport; warmup, eviction, the VRAM poller, and capability detection all branch by kind, with a /props probe gating vision for builtin models.
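
The routing rule in the last bullet can be sketched roughly as below. This is a minimal illustration, not the code in commands.rs: `ProviderKind`, `Provider`, `ChatRoute`, `route_chat`, and `BUILTIN_BASE_URL` are hypothetical names, the builtin address is a placeholder, and the sketch assumes `base_url` is stored without a trailing `/v1` segment.

```rust
// Hypothetical types for illustration; not Thuki's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderKind {
    Builtin,
    Ollama,
    OpenAi,
}

#[derive(Debug, Clone)]
struct Provider {
    kind: ProviderKind,
    base_url: String,
}

#[derive(Debug, PartialEq)]
struct ChatRoute {
    // True when the engine runner must load the model before streaming.
    needs_engine: bool,
    // Full URL the chat request is sent to.
    url: String,
}

// Assumed address of the bundled engine's local server (placeholder value).
const BUILTIN_BASE_URL: &str = "http://127.0.0.1:8080";

// Joins a base URL and a path without producing a double slash.
fn join(base: &str, path: &str) -> String {
    format!("{}{}", base.trim_end_matches('/'), path)
}

// Picks the chat endpoint from the provider's kind alone.
fn route_chat(provider: &Provider) -> ChatRoute {
    match provider.kind {
        // Builtin: wait for the engine, then talk /v1 to the local server.
        ProviderKind::Builtin => ChatRoute {
            needs_engine: true,
            url: join(BUILTIN_BASE_URL, "/v1/chat/completions"),
        },
        // Ollama: native chat endpoint.
        ProviderKind::Ollama => ChatRoute {
            needs_engine: false,
            url: join(&provider.base_url, "/api/chat"),
        },
        // OpenAI-compatible: /v1 against the configured base_url.
        ProviderKind::OpenAi => ChatRoute {
            needs_engine: false,
            url: join(&provider.base_url, "/v1/chat/completions"),
        },
    }
}
```

Keeping the decision in one exhaustive `match` means that adding a fourth kind fails to compile until every consumer handles it.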

How it works

The active provider is resolved from managed config per request; builtin requests first await ensure_loaded on the engine runner, then stream from the local server like any other /v1 endpoint. Capabilities are cached per (provider_id, model). All three kinds emit the same StreamChunk channel contract, so the frontend streaming path is provider-agnostic.
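
Caching capabilities per (provider_id, model) might look something like the sketch below. `Capabilities`, `CapabilityCache`, and `get_or_detect` are hypothetical names, and the detection step is passed in as a closure because the real probe (for example the /props check for builtin models) is outside the scope of this illustration.

```rust
use std::collections::HashMap;

// Hypothetical capability record; the PR only mentions vision explicitly.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capabilities {
    vision: bool,
}

// Cache keyed by (provider_id, model), so the same model name served by
// two different providers is probed and stored separately.
#[derive(Default)]
struct CapabilityCache {
    entries: HashMap<(String, String), Capabilities>,
}

impl CapabilityCache {
    // Returns the cached entry, running `detect` only on a cache miss.
    fn get_or_detect<F>(&mut self, provider_id: &str, model: &str, detect: F) -> Capabilities
    where
        F: FnOnce() -> Capabilities,
    {
        *self
            .entries
            .entry((provider_id.to_string(), model.to_string()))
            .or_insert_with(detect)
    }
}
```

Including the provider id in the key matters because two providers can expose a model under the same name with different capabilities.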

Testing

Both gates exit 0 at the branch head. The /v1 client is covered against mock servers (SSE framing, oversized lines, missing done markers, cancellation, auth and 404 classification); routing, warmup, and capability branching are covered per kind in backend unit tests.
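
For readers unfamiliar with the SSE cases listed above, here is a minimal sketch of bounded line buffering. It is not the code in openai.rs: `SseLineBuffer`, `SseEvent`, and `SseError` are hypothetical names, and the sketch handles only `data:` lines plus the `[DONE]` sentinel used by OpenAI-style streams.

```rust
// Events produced by the buffer (hypothetical names).
#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String),
    Done,
}

#[derive(Debug, PartialEq)]
enum SseError {
    // A single line grew past the configured cap before a newline arrived.
    LineTooLong,
}

struct SseLineBuffer {
    buf: Vec<u8>,
    max_line: usize,
}

impl SseLineBuffer {
    fn new(max_line: usize) -> Self {
        Self { buf: Vec::new(), max_line }
    }

    // Appends a network chunk and returns the `data:` events completed by it.
    // A partial trailing line stays buffered until a later chunk ends it.
    fn push(&mut self, chunk: &[u8]) -> Result<Vec<SseEvent>, SseError> {
        let mut events = Vec::new();
        for &byte in chunk {
            if byte == b'\n' {
                let line = String::from_utf8_lossy(&self.buf).into_owned();
                self.buf.clear();
                let line = line.trim_end_matches('\r');
                if let Some(rest) = line.strip_prefix("data:") {
                    // SSE allows one optional space after the colon.
                    let payload = rest.strip_prefix(' ').unwrap_or(rest);
                    if payload == "[DONE]" {
                        events.push(SseEvent::Done);
                    } else {
                        events.push(SseEvent::Data(payload.to_string()));
                    }
                }
                // Blank lines, comments, and other fields are ignored here.
            } else {
                if self.buf.len() >= self.max_line {
                    return Err(SseError::LineTooLong);
                }
                self.buf.push(byte);
            }
        }
        Ok(events)
    }
}
```

A caller could treat end-of-stream without a prior `SseEvent::Done` as the missing-done-marker case, and the cap keeps a misbehaving server from growing the buffer without limit.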

Base automatically changed from feat/engine-runner-and-model-library to main June 13, 2026 23:40
Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>
@quiet-node quiet-node force-pushed the feat/v1-client-and-routing branch from f8d5db6 to a6eaf2d Compare June 13, 2026 23:40
@quiet-node quiet-node merged commit 25fe634 into main Jun 15, 2026
3 checks passed
@quiet-node quiet-node deleted the feat/v1-client-and-routing branch June 15, 2026 22:35