Why
Klonode is supposed to be model-agnostic at the routing layer — the whole point of CONTEXT.md generation is that any capable LLM should be able to consume it. Today, though, the Workstation chat panel can only talk to one specific CLI backend, which means anyone without a subscription to that provider has no way to actually use Klonode interactively. Gemma (Google's open-weight family — Gemma 2 and Gemma 3) is the most obvious gap to close first: it runs locally via Ollama, it's free, it respects privacy, it works offline, and it has a huge community. Supporting it would immediately open Klonode to every developer who wants a local-first or privacy-first workflow, and it would force us to build the provider abstraction that every subsequent backend (Gemini API, Vertex, LM Studio, generic OpenAI-compat) will reuse.
Scope
In scope
A working Ollama + Gemma backend for the chat panel in Q&A mode (routing questions against CONTEXT.md files, codebase explanations, "where is X logic", "what does this folder do")
A provider abstraction that makes adding future backends straightforward
A backend selector in the chat panel settings UI
A quickstart doc so a new user can go from zero to "chatting with Gemma about my repo" in under five minutes
Out of scope for this issue (can be follow-ups)
Full tool-calling parity (file edits, shell commands) — Gemma's tool-use story is still evolving and not worth blocking this PR on
Hosted Gemini API / Vertex AI / LM Studio backends (stretch goals, listed below)
Changing anything about how CONTEXT.md files are generated or routed
Any change to the existing CLI path's behavior
Proposed approach
Introduce a ChatBackend interface and move the current hardcoded spawn logic behind it. Sketch of the two initial implementations (a rough code sketch follows this list):
ClaudeCliBackend — extracted, unchanged behavior, spawns the existing CLI via child_process.spawn, parses stream-json lines, emits the same SSE event shapes the frontend already handles. This is a pure refactor.
OllamaGemmaBackend — fetch to http://localhost:11434/api/chat with { model: 'gemma2' (or 'gemma3:27b'), messages, stream: true }, read the NDJSON response stream, and translate each chunk into a text SSE event. Emits a single result event at the end. supportsTools() returns false for now.
Stream handler dispatches — +server.ts reads the backend id from the request (or from settings), looks it up in a small registry, and iterates backend.stream(...) writing each event to the ReadableStream controller. The existing event names (session, tool, text, result, stderr, done, error) stay exactly the same so the client doesn't need to change.
Graceful tool degradation — when the current backend's supportsTools() is false and the user's message looks like a code-edit request ("change", "fix", "add", "refactor"), the backend wraps the prompt with a note explaining that tool use isn't available on this backend and asks Gemma to respond with a plan or a patch the user can apply manually. No silent failures.
Settings — settings.ts gets a backend: 'claude-cli' | 'ollama-gemma' field (plus a spot for ollamaUrl and ollamaModel). Default stays on the current CLI backend so existing users see zero change.
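A minimal sketch of what the interface and the Ollama backend could look like. Only the event names come from the existing frontend contract; every payload field, type name beyond ChatBackend/StreamEvent/supportsTools/stream, and default value here is illustrative and would need to match what the UI actually consumes:

```ts
// packages/ui/src/lib/backends/index.ts (sketch)
export type StreamEvent =
  | { type: 'session'; sessionId: string }
  | { type: 'tool'; name: string; input: unknown }
  | { type: 'text'; text: string }
  | { type: 'result'; text: string }
  | { type: 'stderr'; text: string }
  | { type: 'done' }
  | { type: 'error'; message: string };

export interface ChatBackendOptions {
  messages: { role: 'user' | 'assistant' | 'system'; content: string }[];
  signal?: AbortSignal;
}

export interface ChatBackend {
  id: string;
  supportsTools(): boolean;
  stream(opts: ChatBackendOptions): AsyncIterable<StreamEvent>;
}

// packages/ui/src/lib/backends/ollama-gemma.ts (sketch)
// (in the real file these types would be imported from './index')
export class OllamaGemmaBackend implements ChatBackend {
  id = 'ollama-gemma';

  constructor(
    private baseUrl = 'http://localhost:11434',
    private model = 'gemma2'
  ) {}

  supportsTools(): boolean {
    return false; // Q&A only for now; tool parity is a follow-up
  }

  async *stream(opts: ChatBackendOptions): AsyncGenerator<StreamEvent> {
    const res = await fetch(`${this.baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.model, messages: opts.messages, stream: true }),
      signal: opts.signal
    });
    if (!res.ok || !res.body) {
      yield { type: 'error', message: `Ollama returned ${res.status}` };
      return;
    }

    // Ollama streams NDJSON: one JSON object per line, with `done: true` on the last one.
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let full = '';
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';
      for (const line of lines) {
        if (!line.trim()) continue;
        const piece = JSON.parse(line).message?.content ?? '';
        if (piece) {
          full += piece;
          yield { type: 'text', text: piece };
        }
      }
    }
    yield { type: 'result', text: full };
    yield { type: 'done' };
  }
}
```

The ClaudeCliBackend half isn't sketched because it should be a straight move of the existing spawn/parse code behind the same interface, with no behavior change.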
Files to touch
New
packages/ui/src/lib/backends/index.ts — ChatBackend interface, StreamEvent type, backend registry
packages/ui/src/lib/backends/claude-cli.ts — extracted current behavior
packages/ui/src/lib/backends/ollama-gemma.ts — new Ollama/Gemma backend
docs/backends/gemma.md — quickstart (install Ollama, ollama pull gemma2, point Klonode at http://localhost:11434, screenshot of selector)
Modified
packages/ui/src/routes/api/chat/stream/+server.ts — replace the inline spawn logic with a dispatcher that calls backend.stream(opts) (sketched below)
packages/ui/src/lib/stores/settings.ts — add backend, ollamaUrl, ollamaModel fields (keep existing cliPath working)
packages/ui/src/lib/stores/chat.ts — pass the selected backend id through to the streaming endpoint
packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte — add a small backend selector (dropdown or segmented control) with a visible indicator showing which backend the current message was answered by
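For the +server.ts change, the dispatch could look roughly like the following. The registry construction, request body shape, and SSE framing are assumptions (the real handler has to keep emitting exactly the framing the client already parses), and ClaudeCliBackend's constructor is shown without arguments purely for illustration:

```ts
// packages/ui/src/routes/api/chat/stream/+server.ts (dispatcher sketch)
import type { RequestHandler } from '@sveltejs/kit';
import type { ChatBackend } from '$lib/backends';
import { ClaudeCliBackend } from '$lib/backends/claude-cli';
import { OllamaGemmaBackend } from '$lib/backends/ollama-gemma';

// Small registry; the default stays on the existing CLI backend.
const backends: Record<string, ChatBackend> = {
  'claude-cli': new ClaudeCliBackend(), // constructor args omitted for illustration
  'ollama-gemma': new OllamaGemmaBackend()
};

export const POST: RequestHandler = async ({ request }) => {
  const { backend: backendId = 'claude-cli', messages } = await request.json();
  const backend = backends[backendId] ?? backends['claude-cli'];
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of backend.stream({ messages })) {
          // Keep whatever framing the client already parses; `event:` name + JSON data shown here.
          controller.enqueue(encoder.encode(`event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`));
        }
      } catch (err) {
        controller.enqueue(encoder.encode(`event: error\ndata: ${JSON.stringify({ message: String(err) })}\n\n`));
      } finally {
        controller.close();
      }
    }
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }
  });
};
```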
Acceptance criteria
ollama pull gemma2 (or gemma3:4b for lower-end machines) works and the Klonode chat panel can stream a response from it in real time
The backend selector in the chat panel shows at least two options: the existing CLI backend and Ollama (Gemma)
The selected backend is persisted in settings and the chat panel shows a small indicator (toast, badge, or label) so the user always knows which backend answered a given message
When the user asks Gemma for something that requires tool use (e.g. "edit this file"), the response includes a clear note that the current backend can't directly edit files, and instead returns a plan or a diff the user can apply
docs/backends/gemma.md has a copy-pasteable quickstart that takes a new user from zero to a working Gemma chat in under five minutes, with troubleshooting for the common Ollama issues (port in use, model not pulled, Windows path weirdness)
No regression in the existing CLI path — default backend is unchanged, existing users see no difference unless they explicitly switch
npm run build in packages/ui passes, existing tests still pass, and there's at least one unit test that exercises the OllamaGemmaBackend against a mocked fetch
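For that last criterion, a minimal test against a mocked fetch might look like this. Vitest is assumed as the runner; swap the stubbing calls for whatever packages/ui actually uses:

```ts
// packages/ui/src/lib/backends/ollama-gemma.test.ts (sketch)
import { describe, expect, it, vi } from 'vitest';
import { OllamaGemmaBackend } from './ollama-gemma';

// Build a Response whose body contains NDJSON lines the way Ollama streams them.
function ndjsonResponse(lines: object[]): Response {
  return new Response(lines.map((l) => JSON.stringify(l)).join('\n') + '\n', { status: 200 });
}

describe('OllamaGemmaBackend', () => {
  it('translates NDJSON chunks into text events plus a final result', async () => {
    vi.stubGlobal(
      'fetch',
      vi.fn().mockResolvedValue(
        ndjsonResponse([
          { message: { role: 'assistant', content: 'Hello ' }, done: false },
          { message: { role: 'assistant', content: 'world' }, done: true }
        ])
      )
    );

    const backend = new OllamaGemmaBackend();
    const events: any[] = [];
    for await (const event of backend.stream({ messages: [{ role: 'user', content: 'hi' }] })) {
      events.push(event);
    }

    expect(events.filter((e) => e.type === 'text').map((e) => e.text)).toEqual(['Hello ', 'world']);
    expect(events.find((e) => e.type === 'result')?.text).toBe('Hello world');
    expect(events[events.length - 1]).toEqual({ type: 'done' });

    vi.unstubAllGlobals();
  });
});
```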
Nice to have (stretch)
Google AI Studio backend (gemini-pro / gemini-flash, API key in settings)
Vertex AI backend (uses gcloud ADC)
LM Studio backend (already OpenAI-compatible — could share code with a generic OpenAI-compat backend)
Generic OpenAI-compatible backend (base URL + API key + model name — covers LM Studio, llama.cpp's llama-server, vLLM, LocalAI, OpenRouter, Together, Groq, and many more in one shot; see the sketch after this list)
An optional "tool use via prompting" mode that parses structured responses from Gemma and executes a restricted set of read-only tools (Read, Glob, Grep) so the model can actually look at files beyond what's in the routed CONTEXT.md
Honest note on the tool-use gap
The current chat path relies heavily on tool use — the CLI backend can read files, run greps, and edit code directly. Gemma 2's tool-calling is partial, and while Gemma 3 is meaningfully better, neither is as robust as what Klonode gets from its current backend today. Rather than pretend otherwise, this issue intentionally scopes the first PR to Q&A against routed context (which Gemma is genuinely good at — that's the whole point of Klonode's routing: smaller, more focused context windows) and treats full tool parity as a separate, later problem. This is the honest path forward and also the one most likely to actually land.
How to claim this
Comment on this issue saying you'd like to take it, then open a draft PR as early as you can — even with just the ChatBackend interface and a stubbed OllamaGemmaBackend. The abstraction layer is the part most likely to need design feedback, so it's much better to get that reviewed before you go deep on streaming plumbing or UI work. Don't be shy about asking questions in the draft PR's description.