Phase 2 follow-ups (deferred cleanups and decisions)

Tracking issue for follow-ups surfaced while building and reviewing the Phase 2 local inference engine stack (#217, #218, #219). These are deliberately kept out of those PRs to avoid scope creep; each can land on its own. The whole feature is unreleased (release held until Phase 3), so config-shape changes captured here are still cheap and migration-free.

Append new items to the checklist as they come up during review.

## Follow-ups

- [ ] 1. Unify the two model-residency knobs into one field (details below)
- [ ] 2. Hide the OpenAI provider kind behind a compile-time flag, off by default, before release (details below)
- [ ] 3. Remove internal phase jargon (Phase 1/2/3 and Phase A/B) from shipping code, comments, and tests (details below)

---

### 1. Unify the two model-residency knobs into one field

**Problem.** The residency timeout is configured by two separate fields that mean the same thing but behave inconsistently:

| Field | Provider | `0` means | `-1` means | Enforced by |
|---|---|---|---|---|
| `keep_warm_inactivity_minutes` (i32) | Ollama | defer to Ollama's own default (~5 min) | keep resident forever | per-request `keep_alive` pass-through |
| `idle_unload_minutes` (u32) | built-in | keep resident forever | n/a (unsigned) | engine runner idle timer |

The same value `0` does opposite things across the two fields, and neither name says which provider it governs. The schema even needs a comment on one field pointing at the other just to disambiguate them.

**Proposed fix.** Collapse onto the better-designed scheme (the Ollama one):

- Keep a single field, `keep_warm_inactivity_minutes` (already on `main`, already signed with the sentinel scheme), and remove `idle_unload_minutes`.
- Make the built-in engine obey the same field. Translate at the boundary that already forwards config changes to the runner: `-1` to no idle timer (forever), `0` to the natural short default (~5 min), `N` to N minutes. The runner's internal "0 = disabled" convention stays an implementation detail behind that one small, testable translator. The Ollama path does not change.
- Collapse the two conditional Settings controls into one "keep model warm" control, shown for built-in and Ollama, not applicable for OpenAI (remote server; Thuki does not manage its residency).
- Update `docs/configurations.md`: merge the two rows, update the example TOML.

Unified semantics: `-1` keep resident forever; `0` use the provider's natural short default (about 5 minutes: for Ollama this defers to Ollama, for built-in Thuki applies its own 5-minute timer); `N` keep warm for N minutes of inactivity, then unload.

**Decision needed: the built-in default.** Today built-in defaults to "resident forever" (`idle_unload_minutes = 0`). Under the unified scheme `0` means "~5 min", so a `0` default would change built-in to unload after about 5 minutes idle by default. Recommendation: default to `0` (~5 min). Built-in is now the default provider on every fresh install, and pinning multiple GB of RAM resident forever with no setting touched is too aggressive on a 16 GB Mac; ~5 minutes keeps it instant during an active session and frees RAM when idle, and power users can set `-1`. Alternative: default `-1` to preserve the current "always resident" behavior.

**Tradeoff.** A single field cannot express different residency for Ollama vs built-in at the same time. Judged YAGNI for a single-active-provider app.

**Pointers.**
- Fields and bounds: `src-tauri/src/config/schema.rs` (both fields), `src-tauri/src/config/defaults.rs` (`DEFAULT_*` / `BOUNDS_*`, `ALLOWED_FIELDS`), `src-tauri/src/config/loader.rs` (clamps).
- Ollama consumer (keep_alive): `src-tauri/src/lib.rs` and `src-tauri/src/commands.rs` (the `if keep_warm_inactivity_minutes == 0` branches), `warmup::keep_alive_string`.
- Built-in consumer (runner timer): `src-tauri/src/settings_commands.rs` (`forward_idle_unload_minutes`, `idle_unload_minutes_changed`) to `engine.set_idle_minutes`, plus startup seeding in `lib.rs`.
- Settings UI: `src/settings/tabs/ProviderCards.tsx` and `ModelTab.tsx`.

Why deferred: it is a clean, self-contained change, but it threads through all three stacked PRs; keeping it separate keeps those diffs focused on the feature. Best done before the Phase 3 release, while the `idle_unload_minutes` config value has not yet shipped to users.

---

### 2. Hide the OpenAI provider kind behind a compile-time flag before release

**Problem.** PR #218 and #219 fully build and 100%-test the `openai` provider kind: the ability to connect Thuki to any OpenAI-compatible server (LM Studio, Jan, a remote `llama-server`), including the `/v1` routing arm, `keychain.rs` API-key storage, and the Settings add/edit/remove-provider UI (`ProviderCards.tsx`). Product decision (2026-06-15): Thuki ships local-only, the bundled built-in engine plus local or remote Ollama. We do not want the OpenAI path visible to end users at ship time, but the code is built, reviewed, and fully tested, so we keep it rather than removing it (removal would be real surgery, mostly across #219's Settings UI).

**Decision (ratified 2026-06-15).**

- Keep #218 and #219 intact; remove nothing.
- Gate the path behind a **compile-time, dev-only flag**, default OFF. Shipped builds exclude the affordance entirely; only developers can enable it at build time. Nothing appears in `config.toml` or the docs.
- **Scope: UI only.** Gate only the user-facing Settings affordance(s) that let a user create or manage an `openai` provider. The backend (config acceptance of the kind, the `/v1` routing arm, `keychain.rs`) stays active and tested. Rationale: with no UI to create an `openai` provider, no end user can ever reach one, so nothing downstream (model picker, chat) surfaces it; the dormant, tested backend stays as-is.
- **Required before release.** Must land before the Phase 3 release that would otherwise expose the path to end users.

Note: this gates the `openai` *provider kind*, not the `/v1` client itself. The bundled built-in engine speaks `/v1` and depends on that client, so it must stay fully active; built-in chat is unaffected.

**Implementation sketch.**

- A frontend build-time constant (for example a Vite `define` global or an `import.meta.env` flag), tree-shaken out when false, gating the "add OpenAI provider" affordance in `src/settings/tabs/ProviderCards.tsx` and any openai-only controls (base-URL edit, API-key field, vision flag). Default false in production builds.
- No Rust changes required under the UI-only scope.
- Coverage: the 100% gate must still hold with the flag off. Decide the approach when implementing (for example, drive the constant so tests exercise the enabled branch, or cover both branches).

**Accepted edge.** Under a UI-only gate, a hand-edited `config.toml` that already contains an `openai` provider would still be honored by the backend. Acceptable: an end user cannot reach that state (no UI to create one), and it is a developer-only scenario.

**Pointers.**

- Visible surface to gate: `src/settings/tabs/ProviderCards.tsx` (add/edit/remove openai provider, API-key field, vision flag).
- Backend that stays live (do not gate): `src-tauri/src/openai.rs` (the `/v1` client, shared with built-in), `src-tauri/src/keychain.rs`, the `openai` routing arm in `src-tauri/src/commands.rs`, the `openai` kind in `src-tauri/src/config/{schema,loader}.rs`.
- Backend CRUD for openai providers: `src-tauri/src/settings_commands.rs`.

Why deferred: the code is intentionally kept whole and tested in #218/#219; the flag is a small, focused frontend change best made on its own. Must land before the Phase 3 release so the path is never exposed to end users.

---

### 3. Remove internal phase jargon from shipping code, comments, and tests

**Problem.** The codebase leaks two internal phasing vocabularies that mean nothing to a developer reading the repo cold:

- **"Phase 1/2/3"**: the local-inference-engine project phases (e.g. "Phase 2 bundles the engine", "Phase 3 onboarding gate").
- **"Phase A/B"**: an older capability-feature phasing (compose vs history capability checks), unrelated to the engine.

Comments and section headers should describe the functionality or technical behavior, not the internal delivery phase. The docs (`docs/configurations.md`, `CLAUDE.md`) already drop the phase wording on the #219 branch; this item covers the code, comments, and tests, which no PR cleans.

**Scope.** Reword these to describe the behavior instead of the phase:

- `src/utils/capabilityConflicts.ts` (post-Phase-A, Phase B) and `src/utils/__tests__/capabilityConflicts.test.ts`.
- `src/utils/sanitizeAssistantContent.ts` (pre-Phase-B) and its test.
- `src/components/ChatBubble.tsx` (pre-Phase-B history).
- `src/App.tsx` ("Phase B history-based capability strip").
- `src-tauri/src/commands.rs` ("Phase B picker hint", "pre-routing Phase 1 payload").
- `src-tauri/src/config/defaults.rs`, `loader.rs`, `tests.rs` (Phase 1/2 provider-seeding comments).
- `src-tauri/src/models/mod.rs` ("Phase 3 onboarding gate", "full detection is Phase 3").
- `src-tauri/src/settings_commands/tests.rs` ("builtin active since Phase 2").
- `src-tauri/src/screenshot.rs`: uses "Phase 1 / Phase 2" as local algorithm steps; reword to "Step 1 / Step 2" to avoid the vocabulary clash.

**Leave alone** (legitimate, not jargon): the animation `phase` in `src/App.tsx` (morph state) and the download `mmproj phase` in `DownloadProgress.test.tsx` (a real download stage).

**Why one pass.** The references span pre-existing `main` code and the stacked PRs and are purely cosmetic (comments, a couple of section headers, test labels). Cleanest as one focused sweep landed around the rest of the stack. Verify the docs are phase-free once #219 merges.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 2 follow-ups (deferred cleanups and decisions) #220

Follow-ups

1. Unify the two model-residency knobs into one field

2. Hide the OpenAI provider kind behind a compile-time flag before release

3. Remove internal phase jargon from shipping code, comments, and tests

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Field	Provider	`0` means	`-1` means	Enforced by
`keep_warm_inactivity_minutes` (i32)	Ollama	defer to Ollama's own default (~5 min)	keep resident forever	per-request `keep_alive` pass-through
`idle_unload_minutes` (u32)	built-in	keep resident forever	n/a (unsigned)	engine runner idle timer

Phase 2 follow-ups (deferred cleanups and decisions) #220

Description

Follow-ups

1. Unify the two model-residency knobs into one field

2. Hide the OpenAI provider kind behind a compile-time flag before release

3. Remove internal phase jargon from shipping code, comments, and tests

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions