Why
Klonode is supposed to be model-agnostic at the routing layer — the whole point of CONTEXT.md generation is that any capable LLM should be able to consume it. Today, though, the Workstation chat panel can only talk to one specific CLI backend, which means anyone without a subscription to that provider has no way to actually use Klonode interactively. Gemma (Google's open-weight family — Gemma 2 and Gemma 3) is the most obvious gap to close first: it runs locally via Ollama, it's free, it respects privacy, it works offline, and it has a huge community. Supporting it would immediately open Klonode to every developer who wants a local-first or privacy-first workflow, and it would force us to build the provider abstraction that every subsequent backend (Gemini API, Vertex, LM Studio, generic OpenAI-compat) will reuse.
Scope
In scope
A working Ollama + Gemma backend for the chat panel in Q&A mode (routing questions against CONTEXT.md files, codebase explanations, "where is X logic", "what does this folder do")
A provider abstraction that makes adding future backends straightforward
A backend selector in the chat panel settings UI
A quickstart doc so a new user can go from zero to "chatting with Gemma about my repo" in under five minutes
Out of scope for this issue (can be follow-ups)
Full tool-calling parity (file edits, shell commands) — Gemma's tool-use story is still evolving and not worth blocking this PR on
Hosted Gemini API / Vertex AI / LM Studio backends (stretch goals, listed below)
Changing anything about how CONTEXT.md files are generated or routed
Any change to the existing CLI path's behavior
Proposed approach
Introduce a ChatBackend interface and move the current hardcoded spawn logic behind it. Sketch of the two initial implementations (a rough code sketch follows this list):
ClaudeCliBackend — extracted, unchanged behavior, spawns the existing CLI via child_process.spawn, parses stream-json lines, emits the same SSE event shapes the frontend already handles. This is a pure refactor.
OllamaGemmaBackend — fetch to http://localhost:11434/api/chat with { model: 'gemma2' (or 'gemma3:27b'), messages, stream: true }, read the NDJSON response stream, and translate each chunk into a text SSE event. Emits a single result event at the end. supportsTools() returns false for now.
Stream handler dispatches — +server.ts reads the backend id from the request (or from settings), looks it up in a small registry, and iterates backend.stream(...) writing each event to the ReadableStream controller. The existing event names (session, tool, text, result, stderr, done, error) stay exactly the same so the client doesn't need to change.
Graceful tool degradation — when the current backend's supportsTools() is false and the user's message looks like a code-edit request ("change", "fix", "add", "refactor"), the backend wraps the prompt with a note explaining that tool use isn't available on this backend and asks Gemma to respond with a plan or a patch the user can apply manually. No silent failures.
Settings — settings.ts gets a backend: 'claude-cli' | 'ollama-gemma' field (plus a spot for ollamaUrl and ollamaModel). Default stays on the current CLI backend so existing users see zero change.
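A minimal sketch of what the interface and the Ollama backend could look like. Only the event names come from the existing frontend contract; every payload field, type name beyond ChatBackend/StreamEvent/supportsTools/stream, and default value here is illustrative and would need to match what the UI actually consumes:

```ts
// packages/ui/src/lib/backends/index.ts (sketch)
export type StreamEvent =
  | { type: 'session'; sessionId: string }
  | { type: 'tool'; name: string; input: unknown }
  | { type: 'text'; text: string }
  | { type: 'result'; text: string }
  | { type: 'stderr'; text: string }
  | { type: 'done' }
  | { type: 'error'; message: string };

export interface ChatBackendOptions {
  messages: { role: 'user' | 'assistant' | 'system'; content: string }[];
  signal?: AbortSignal;
}

export interface ChatBackend {
  id: string;
  supportsTools(): boolean;
  stream(opts: ChatBackendOptions): AsyncIterable<StreamEvent>;
}

// packages/ui/src/lib/backends/ollama-gemma.ts (sketch)
// (in the real file these types would be imported from './index')
export class OllamaGemmaBackend implements ChatBackend {
  id = 'ollama-gemma';

  constructor(
    private baseUrl = 'http://localhost:11434',
    private model = 'gemma2'
  ) {}

  supportsTools(): boolean {
    return false; // Q&A only for now; tool parity is a follow-up
  }

  async *stream(opts: ChatBackendOptions): AsyncGenerator<StreamEvent> {
    const res = await fetch(`${this.baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.model, messages: opts.messages, stream: true }),
      signal: opts.signal
    });
    if (!res.ok || !res.body) {
      yield { type: 'error', message: `Ollama returned ${res.status}` };
      return;
    }

    // Ollama streams NDJSON: one JSON object per line, with `done: true` on the last one.
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let full = '';
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';
      for (const line of lines) {
        if (!line.trim()) continue;
        const piece = JSON.parse(line).message?.content ?? '';
        if (piece) {
          full += piece;
          yield { type: 'text', text: piece };
        }
      }
    }
    yield { type: 'result', text: full };
    yield { type: 'done' };
  }
}
```

The ClaudeCliBackend half isn't sketched because it should be a straight move of the existing spawn/parse code behind the same interface, with no behavior change.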
Files to touch
New
packages/ui/src/lib/backends/index.ts — ChatBackend interface, StreamEvent type, backend registry
packages/ui/src/lib/backends/claude-cli.ts — extracted current behavior
packages/ui/src/lib/backends/ollama-gemma.ts — new Ollama/Gemma backend
docs/backends/gemma.md — quickstart (install Ollama, ollama pull gemma2, point Klonode at http://localhost:11434, screenshot of selector)
Modified
packages/ui/src/routes/api/chat/stream/+server.ts — replace the inline spawn logic with a dispatcher that calls backend.stream(opts) (sketched below)
packages/ui/src/lib/stores/settings.ts — add backend, ollamaUrl, ollamaModel fields (keep existing cliPath working)
packages/ui/src/lib/stores/chat.ts — pass the selected backend id through to the streaming endpoint
packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte — add a small backend selector (dropdown or segmented control) with a visible indicator showing which backend the current message was answered by
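For the +server.ts change, the dispatch could look roughly like the following. The registry construction, request body shape, and SSE framing are assumptions (the real handler has to keep emitting exactly the framing the client already parses), and ClaudeCliBackend's constructor is shown without arguments purely for illustration:

```ts
// packages/ui/src/routes/api/chat/stream/+server.ts (dispatcher sketch)
import type { RequestHandler } from '@sveltejs/kit';
import type { ChatBackend } from '$lib/backends';
import { ClaudeCliBackend } from '$lib/backends/claude-cli';
import { OllamaGemmaBackend } from '$lib/backends/ollama-gemma';

// Small registry; the default stays on the existing CLI backend.
const backends: Record<string, ChatBackend> = {
  'claude-cli': new ClaudeCliBackend(), // constructor args omitted for illustration
  'ollama-gemma': new OllamaGemmaBackend()
};

export const POST: RequestHandler = async ({ request }) => {
  const { backend: backendId = 'claude-cli', messages } = await request.json();
  const backend = backends[backendId] ?? backends['claude-cli'];
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of backend.stream({ messages })) {
          // Keep whatever framing the client already parses; `event:` name + JSON data shown here.
          controller.enqueue(encoder.encode(`event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`));
        }
      } catch (err) {
        controller.enqueue(encoder.encode(`event: error\ndata: ${JSON.stringify({ message: String(err) })}\n\n`));
      } finally {
        controller.close();
      }
    }
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }
  });
};
```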
Acceptance criteria
ollama pull gemma2 (or gemma3:4b for lower-end machines) works and the Klonode chat panel can stream a response from it in real time
The backend selector in the chat panel shows at least two options: the existing CLI backend and Ollama (Gemma)
The selected backend is persisted in settings and the chat panel shows a small indicator (toast, badge, or label) so the user always knows which backend answered a given message
When the user asks Gemma for something that requires tool use (e.g. "edit this file"), the response includes a clear note that the current backend can't directly edit files, and instead returns a plan or a diff the user can apply
docs/backends/gemma.md has a copy-pasteable quickstart that takes a new user from zero to a working Gemma chat in under five minutes, with troubleshooting for the common Ollama issues (port in use, model not pulled, Windows path weirdness)
No regression in the existing CLI path — default backend is unchanged, existing users see no difference unless they explicitly switch
npm run build in packages/ui passes, existing tests still pass, and there's at least one unit test that exercises the OllamaGemmaBackend against a mocked fetch
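For that last criterion, a minimal test against a mocked fetch might look like this. Vitest is assumed as the runner; swap the stubbing calls for whatever packages/ui actually uses:

```ts
// packages/ui/src/lib/backends/ollama-gemma.test.ts (sketch)
import { describe, expect, it, vi } from 'vitest';
import { OllamaGemmaBackend } from './ollama-gemma';

// Build a Response whose body contains NDJSON lines the way Ollama streams them.
function ndjsonResponse(lines: object[]): Response {
  return new Response(lines.map((l) => JSON.stringify(l)).join('\n') + '\n', { status: 200 });
}

describe('OllamaGemmaBackend', () => {
  it('translates NDJSON chunks into text events plus a final result', async () => {
    vi.stubGlobal(
      'fetch',
      vi.fn().mockResolvedValue(
        ndjsonResponse([
          { message: { role: 'assistant', content: 'Hello ' }, done: false },
          { message: { role: 'assistant', content: 'world' }, done: true }
        ])
      )
    );

    const backend = new OllamaGemmaBackend();
    const events: any[] = [];
    for await (const event of backend.stream({ messages: [{ role: 'user', content: 'hi' }] })) {
      events.push(event);
    }

    expect(events.filter((e) => e.type === 'text').map((e) => e.text)).toEqual(['Hello ', 'world']);
    expect(events.find((e) => e.type === 'result')?.text).toBe('Hello world');
    expect(events[events.length - 1]).toEqual({ type: 'done' });

    vi.unstubAllGlobals();
  });
});
```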
Nice to have (stretch)
Google AI Studio backend (gemini-pro / gemini-flash, API key in settings)
Vertex AI backend (uses gcloud ADC)
LM Studio backend (already OpenAI-compatible — could share code with a generic OpenAI-compat backend)
Generic OpenAI-compatible backend (base URL + API key + model name — covers LM Studio, llama.cpp's llama-server, vLLM, LocalAI, OpenRouter, Together, Groq, and many more in one shot; see the sketch after this list)
An optional "tool use via prompting" mode that parses structured responses from Gemma and executes a restricted set of read-only tools (Read, Glob, Grep) so the model can actually look at files beyond what's in the routed CONTEXT.md
Honest note on the tool-use gap
The current chat path relies heavily on tool use — the CLI backend can read files, run greps, and edit code directly. Gemma 2's tool-calling is partial, and while Gemma 3 is meaningfully better, neither is as robust as what Klonode gets from its current backend today. Rather than pretend otherwise, this issue intentionally scopes the first PR to Q&A against routed context (which Gemma is genuinely good at — that's the whole point of Klonode's routing: smaller, more focused context windows) and treats full tool parity as a separate, later problem. This is the honest path forward and also the one most likely to actually land.
How to claim this
Comment on this issue saying you'd like to take it, then open a draft PR as early as you can — even with just the ChatBackend interface and a stubbed OllamaGemmaBackend. The abstraction layer is the part most likely to need design feedback, so it's much better to get that reviewed before you go deep on streaming plumbing or UI work. Don't be shy about asking questions in the draft PR's description.