
Support Gemma as a chat backend (community contribution — help wanted) #71

@smorchj

Description


Why

Klonode is supposed to be model-agnostic at the routing layer — the whole point of CONTEXT.md generation is that any capable LLM should be able to consume it. Today, though, the Workstation chat panel can only talk to one specific CLI backend, which means anyone without a subscription to that provider has no way to actually use Klonode interactively. Gemma (Google's open-weight family — Gemma 2 and Gemma 3) is the most obvious gap to close first: it runs locally via Ollama, it's free, it respects privacy, it works offline, and it has a huge community. Supporting it would immediately open Klonode to every developer who wants a local-first or privacy-first workflow, and it would force us to build the provider abstraction that every subsequent backend (Gemini API, Vertex, LM Studio, generic OpenAI-compat) will reuse.

Scope

In scope

  • A working Ollama + Gemma backend for the chat panel in Q&A mode (routing questions against CONTEXT.md files, codebase explanations, "where is X logic", "what does this folder do")
  • A provider abstraction that makes adding future backends straightforward
  • A backend selector in the chat panel settings UI
  • A quickstart doc so a new user can go from zero to "chatting with Gemma about my repo" in under five minutes

Out of scope for this issue (can be follow-ups)

  • Full tool-calling parity (file edits, shell commands) — Gemma's tool-use story is still evolving and not worth blocking this PR on
  • Hosted Gemini API / Vertex AI / LM Studio backends (stretch goals, listed below)
  • Changing anything about how CONTEXT.md files are generated or routed
  • Any change to the existing CLI path's behavior

Proposed approach

Introduce a ChatBackend interface and move the current hardcoded spawn logic behind it:

```typescript
// packages/ui/src/lib/backends/index.ts
export interface StreamEvent {
  type: 'session' | 'tool' | 'text' | 'result' | 'stderr' | 'done' | 'error';
  data: unknown;
}

export interface ChatBackendOptions {
  prompt: string;
  systemPrompt: string;
  cwd: string;
  sessionId?: string;
  maxTurns: number;
  allowedTools?: string[];
  executionMode: 'question' | 'plan' | 'bypass';
  signal?: AbortSignal;
}

export interface ChatBackend {
  id: string;                                   // 'claude-cli' | 'ollama-gemma' | ...
  label: string;                                // shown in the UI selector
  supportsTools(): boolean;                     // affects fallback messaging
  stream(opts: ChatBackendOptions): AsyncIterable<StreamEvent>;
}
```
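As a rough illustration of the "small registry" the dispatcher would look backends up in, here is a minimal sketch. The trimmed `StreamEvent`/`ChatBackend` shapes and the `registerBackend`/`getBackend` names are hypothetical, not a final API:

```typescript
// Trimmed stand-ins for the real interfaces above, just so this sketch is
// self-contained.
interface StreamEvent { type: string; data: unknown }
interface ChatBackend {
  id: string;
  label: string;
  supportsTools(): boolean;
  stream(opts: { prompt: string }): AsyncIterable<StreamEvent>;
}

const registry = new Map<string, ChatBackend>();

export function registerBackend(b: ChatBackend): void {
  registry.set(b.id, b);
}

// Fall back to the default CLI backend when the requested id is unknown,
// so a stale settings value can never break the chat panel.
export function getBackend(id: string, fallbackId = 'claude-cli'): ChatBackend {
  const b = registry.get(id) ?? registry.get(fallbackId);
  if (!b) throw new Error(`No chat backend registered for "${id}"`);
  return b;
}
```

The fallback-to-default behavior is a design choice worth reviewing in the draft PR: silently falling back keeps the panel working, but a visible warning event may be friendlier.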

Sketch of the initial implementation pieces:

  1. ClaudeCliBackend — extracted, unchanged behavior, spawns the existing CLI via child_process.spawn, parses stream-json lines, emits the same SSE event shapes the frontend already handles. This is a pure refactor.

  2. OllamaGemmaBackend — fetch to http://localhost:11434/api/chat with { model: 'gemma2' (or 'gemma3:27b'), messages, stream: true }, read the NDJSON response stream, and translate each chunk into a text SSE event. Emits a single result event at the end. supportsTools() returns false for now.

  3. Stream handler dispatch — +server.ts reads the backend id from the request (or from settings), looks it up in a small registry, and iterates backend.stream(...), writing each event to the ReadableStream controller. The existing event names (session, tool, text, result, stderr, done, error) stay exactly the same, so the client doesn't need to change.

  4. Graceful tool degradation — when the current backend's supportsTools() is false and the user's message looks like a code-edit request ("change", "fix", "add", "refactor"), the backend wraps the prompt with a note explaining that tool use isn't available on this backend and asks Gemma to respond with a plan or a patch the user can apply manually. No silent failures.

  5. Settings — settings.ts gets a backend: 'claude-cli' | 'ollama-gemma' field (plus a spot for ollamaUrl and ollamaModel). Default stays on the current CLI backend so existing users see zero change.
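To make steps 2 and 4 concrete, here is a hedged sketch of the Ollama side. The endpoint, payload, and NDJSON chunk shape follow Ollama's documented /api/chat streaming API; the function names (parseOllamaLine, streamGemma, wrapPromptForNoTools) and the trimmed event type are illustrative, not the final code:

```typescript
interface StreamEvent { type: 'text' | 'result' | 'error'; data: unknown }

// Ollama streams one JSON object per line (NDJSON). Each non-final chunk
// carries message.content; the final chunk has done: true.
export function parseOllamaLine(line: string): StreamEvent | null {
  const trimmed = line.trim();
  if (!trimmed) return null;
  const chunk = JSON.parse(trimmed);
  if (chunk.done) return { type: 'result', data: chunk };
  return { type: 'text', data: chunk.message?.content ?? '' };
}

export async function* streamGemma(
  prompt: string,
  opts: { url?: string; model?: string; signal?: AbortSignal } = {},
): AsyncGenerator<StreamEvent> {
  const res = await fetch(`${opts.url ?? 'http://localhost:11434'}/api/chat`, {
    method: 'POST',
    signal: opts.signal,
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      model: opts.model ?? 'gemma2',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) {
    yield { type: 'error', data: `Ollama returned ${res.status}` };
    return;
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Emit every complete line; keep the trailing partial line in the buffer.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      const event = parseOllamaLine(line);
      if (event) yield event;
    }
  }
}

// Graceful degradation (step 4): when the backend can't use tools and the
// message looks like an edit request, wrap the prompt instead of failing.
const EDIT_HINTS = /\b(change|fix|add|refactor)\b/i;
export function wrapPromptForNoTools(prompt: string): string {
  if (!EDIT_HINTS.test(prompt)) return prompt;
  return (
    'Note: this backend cannot read or edit files directly. ' +
    'Respond with a step-by-step plan or a patch the user can apply manually.\n\n' +
    prompt
  );
}
```

Buffering on newline boundaries matters here: a network read can end mid-JSON-object, so only complete lines are parsed and the tail is carried into the next read.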

Files to touch

New

  • packages/ui/src/lib/backends/index.ts — ChatBackend interface, StreamEvent type, backend registry
  • packages/ui/src/lib/backends/claude-cli.ts — extracted current behavior
  • packages/ui/src/lib/backends/ollama-gemma.ts — new Ollama/Gemma backend
  • docs/backends/gemma.md — quickstart (install Ollama, ollama pull gemma2, point Klonode at http://localhost:11434, screenshot of selector)

Modified

  • packages/ui/src/routes/api/chat/stream/+server.ts — replace the inline spawn logic with a dispatcher that calls backend.stream(opts)
  • packages/ui/src/lib/stores/settings.ts — add backend, ollamaUrl, ollamaModel fields (keep existing cliPath working)
  • packages/ui/src/lib/stores/chat.ts — pass the selected backend id through to the streaming endpoint
  • packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte — add a small backend selector (dropdown or segmented control) with a visible indicator showing which backend the current message was answered by
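For the settings.ts change, the added fields could look something like the sketch below. The field names mirror the issue text; the BackendSettings type name and the exact defaults are placeholders for review:

```typescript
// Hypothetical shape of the settings additions. Existing fields (cliPath,
// etc.) are untouched; these are appended alongside them.
export interface BackendSettings {
  backend: 'claude-cli' | 'ollama-gemma';
  ollamaUrl: string;
  ollamaModel: string;
}

export const defaultBackendSettings: BackendSettings = {
  backend: 'claude-cli',                 // existing users see zero change
  ollamaUrl: 'http://localhost:11434',   // Ollama's default port
  ollamaModel: 'gemma2',
};
```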

Acceptance criteria

  • ollama pull gemma2 (or gemma3:4b for lower-end machines) works and the Klonode chat panel can stream a response from it in real time
  • The backend selector in the chat panel shows at least two options: the existing CLI backend and Ollama (Gemma)
  • The selected backend is persisted in settings and the chat panel shows a small indicator (toast, badge, or label) so the user always knows which backend answered a given message
  • When the user asks Gemma for something that requires tool use (e.g. "edit this file"), the response includes a clear note that the current backend can't directly edit files, and instead returns a plan or a diff the user can apply
  • docs/backends/gemma.md has a copy-pasteable quickstart that takes a new user from zero to a working Gemma chat in under five minutes, with troubleshooting for the common Ollama issues (port in use, model not pulled, Windows path weirdness)
  • No regression in the existing CLI path — default backend is unchanged, existing users see no difference unless they explicitly switch
  • npm run build in packages/ui passes, existing tests still pass, and there's at least one unit test that exercises the OllamaGemmaBackend against a mocked fetch
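The mocked-fetch unit test in the last criterion could be approached like this framework-free sketch (the real test would use whatever runner packages/ui already has; mockFetch and collectText are illustrative names):

```typescript
// Sample NDJSON in Ollama's /api/chat streaming shape.
const ndjson =
  '{"message":{"role":"assistant","content":"Hello "},"done":false}\n' +
  '{"message":{"role":"assistant","content":"world"},"done":false}\n' +
  '{"done":true}\n';

// Stub fetch so no real Ollama server is needed (Response is a Node 18+ global).
const mockFetch: typeof fetch = async () =>
  new Response(ndjson, { headers: { 'content-type': 'application/x-ndjson' } });

// A stand-in for OllamaGemmaBackend's read loop: consume the mocked body,
// split on newlines, and concatenate the text chunks.
export async function collectText(fetchImpl: typeof fetch): Promise<string> {
  const res = await fetchImpl('http://localhost:11434/api/chat', { method: 'POST' });
  const raw = await res.text();
  let out = '';
  for (const line of raw.split('\n')) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);
    if (!chunk.done) out += chunk.message.content;
  }
  return out;
}
```

Injecting the fetch implementation (rather than monkey-patching globalThis.fetch) keeps the backend trivially testable and is worth considering in the real ChatBackend constructor.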

Nice to have (stretch)

  • Google AI Studio backend (gemini-pro / gemini-flash, API key in settings)
  • Vertex AI backend (uses gcloud ADC)
  • LM Studio backend (already OpenAI-compatible — could share code with a generic OpenAI-compat backend)
  • Generic OpenAI-compatible backend (base URL + API key + model name — covers LM Studio, llama.cpp's llama-server, vLLM, LocalAI, OpenRouter, Together, Groq, and many more in one shot)
  • An optional "tool use via prompting" mode that parses structured responses from Gemma and executes a restricted set of read-only tools (Read, Glob, Grep) so the model can actually look at files beyond what's in the routed CONTEXT.md

Honest note on the tool-use gap

The current chat path relies heavily on tool use — the CLI backend can read files, run greps, and edit code directly. Gemma 2's tool-calling is partial, and while Gemma 3 is meaningfully better, neither is as robust as what Klonode gets from its current backend today. Rather than pretend otherwise, this issue intentionally scopes the first PR to Q&A against routed context (which Gemma is genuinely good at — that's the whole point of Klonode's routing: smaller, more focused context windows) and treats full tool parity as a separate, later problem. This is the honest path forward and also the one most likely to actually land.

How to claim this

Comment on this issue saying you'd like to take it, then open a draft PR as early as you can — even with just the ChatBackend interface and a stubbed OllamaGemmaBackend. The abstraction layer is the part most likely to need design feedback, so it's much better to get that reviewed before you go deep on streaming plumbing or UI work. Don't be shy about asking questions in the draft PR's description.
