A complete blueprint for recreating the backend of this project using AI coding assistants. This document focuses on architecture, business logic, and data flow — not UI.
lite-chat is a lightweight, local AI chat application powered by Ollama. Users can have multi-turn conversations with locally-hosted LLMs, with support for streaming responses, image analysis, smart context management, and conversation history with undo/restore.
- Multi-turn chat with any Ollama model
- Real-time token streaming via Server-Sent Events (SSE)
- Automatic conversation summarization to stay within context limits
- Image attachment support (base64-encoded, sent to vision models)
- AI-generated conversation titles
- Response regeneration
- Message editing and deletion with snapshot-based undo
- Model management (list, pull new models)
- User preferences (default model, system prompt)
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | FastAPI | 0.115.0 | Async web framework with auto-generated OpenAPI docs |
| Server | Uvicorn | 0.30.6 | ASGI server |
| Database | SQLite via aiosqlite | 0.20.0 | Async SQLite — zero-config, file-based persistence |
| Validation | Pydantic | 2.9.2 | Request/response models with type validation |
| HTTP Client | httpx | 0.27.2 | Async HTTP for Ollama REST API calls |
| LLM Framework | LangChain + langchain-ollama | 0.3.1 / 0.2.0 | Declared as a dependency, but Ollama's REST API is called directly via httpx |
| Testing | pytest + pytest-asyncio | 7.0+ / 0.21.0+ | Async test support with auto mode |
| Component | Technology |
|---|---|
| Framework | Next.js 14 (App Router) |
| Language | TypeScript |
| Styling | Tailwind CSS |
| UI Library | shadcn/ui |
| State | Zustand |
| Service | URL | Purpose |
|---|---|---|
| Ollama | http://localhost:11434 | Local LLM inference server |
```
Routes (API handlers)
  └── Services (business logic)
        ├── Database (aiosqlite)
        └── Ollama (httpx REST calls)
```
- Routes: Thin handlers that validate input, call services, return responses. No business logic.
- Services: Stateless functions that receive `db: aiosqlite.Connection` as a parameter. All database and Ollama interactions happen here.
- Database: Async SQLite with WAL mode and foreign keys enabled. Connection provided via FastAPI dependency injection (`get_db` async generator).
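A minimal sketch of this layering (illustrative handler; the real routes live in `app/routes/`):

```python
# Sketch of the routes -> services pattern (names illustrative, not the exact code).
from fastapi import APIRouter, Depends, HTTPException
import aiosqlite

from app.database import get_db
from app.services import conversation_service

router = APIRouter(prefix="/api/conversations")

@router.get("/{conversation_id}")
async def get_conversation(
    conversation_id: str,
    db: aiosqlite.Connection = Depends(get_db),
):
    # The route stays thin: validate, delegate, translate "not found" into HTTP.
    conversation = await conversation_service.get_conversation(db, conversation_id)
    if conversation is None:
        raise HTTPException(status_code=404, detail="Conversation not found")
    return conversation
```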
Startup and shutdown run through a FastAPI lifespan context manager:

```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: init DB schema + seed default user, check Ollama health
    await init_db()
    await check_ollama_health()  # Warning only, not fatal
    yield
    # Shutdown: log only
```

CORS is wide open for local development:
```python
allow_origins=["*"]
allow_credentials=True
allow_methods=["*"]
allow_headers=["*"]
```

The DB connection is provided via an async generator dependency:
```python
async def get_db():
    db = await aiosqlite.connect(DB_PATH)
    db.row_factory = aiosqlite.Row  # Dict-like row access
    await db.execute("PRAGMA journal_mode=WAL")
    await db.execute("PRAGMA foreign_keys=ON")
    try:
        yield db
    finally:
        await db.close()
```

All configuration lives in `app/config.py` as module-level constants:
| Constant | Default Value | Description |
|---|---|---|
| `DB_PATH` | `../data/chat.db` | SQLite database file path (relative to `app/`) |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server base URL |
| `DEFAULT_MODEL` | `qwen3.5:9b` | Fallback model when user has no preference |
| `CONTEXT_FULL_EXCHANGES` | `5` | Number of recent user+assistant exchange pairs to keep in full context |
| `SUMMARY_MAX_TOKENS` | `500` | Target summary conciseness (used in prompt, not enforced) |
| `SUMMARY_MODEL` | `None` | Model for summarization; `None` = use same model as chat |
| `SSE_RETRY_TIMEOUT` | `15000` | SSE retry timeout in milliseconds |
| `DEFAULT_USER_ID` | `"default"` | Single-user system — all data belongs to this user |
| `DEFAULT_USER_NAME` | `"User"` | Default display name |
SQLite with 4 tables. The schema is created on startup via `init_db()` using `CREATE TABLE IF NOT EXISTS`.
| Column | Type | Default | Description |
|---|---|---|---|
| `id` | TEXT PK | `'default'` | Single-user system |
| `name` | TEXT | `'User'` | Display name |
| `default_model` | TEXT | `'qwen3.5:9b'` | Preferred LLM model |
| `system_prompt` | TEXT | NULL | Global system prompt |
| `created_at` | TIMESTAMP | CURRENT_TIMESTAMP | |
| `updated_at` | TIMESTAMP | CURRENT_TIMESTAMP | |
| Column | Type | Default | Description |
|---|---|---|---|
| `id` | TEXT PK | UUID | |
| `user_id` | TEXT FK→users | `'default'` | |
| `title` | TEXT | `'New Chat'` | Display title (AI-generated on first message) |
| `model_name` | TEXT | NULL | Model used for this conversation |
| `system_prompt` | TEXT | NULL | Per-conversation system prompt override |
| `summary` | TEXT | NULL | Rolling summary of old messages |
| `summary_upto_msg_id` | TEXT | NULL | Last message ID included in summary |
| `created_at` | TIMESTAMP | CURRENT_TIMESTAMP | |
| `updated_at` | TIMESTAMP | CURRENT_TIMESTAMP | Updated on any message or conversation change |
FK: user_id → users(id) ON DELETE CASCADE
| Column | Type | Default | Constraint | Description |
|---|---|---|---|---|
| `id` | TEXT PK | UUID | | |
| `conversation_id` | TEXT FK | | NOT NULL | |
| `role` | TEXT | | CHECK IN ('user', 'assistant', 'system') | |
| `content` | TEXT | | NOT NULL | Message text |
| `image_base64` | TEXT | NULL | | Base64-encoded image (user messages only) |
| `thinking` | TEXT | NULL | | Model's reasoning/thinking output |
| `tokens_used` | INTEGER | 0 | | Total tokens (prompt + completion) |
| `tokens_per_sec` | REAL | 0 | | Generation speed |
| `thinking_duration` | INTEGER | 0 | | Seconds spent in thinking phase |
| `is_summarized` | BOOLEAN | 0 | | Marked true after inclusion in a summary |
| `created_at` | TIMESTAMP | CURRENT_TIMESTAMP | | |
FK: conversation_id → conversations(id) ON DELETE CASCADE
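The DDL this implies, as a sketch reconstructed from the column table above (not copied from `database.py`):

```python
# Reconstructed from the schema table; the real init_db() may differ in detail.
await db.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id TEXT PRIMARY KEY,
        conversation_id TEXT NOT NULL,
        role TEXT CHECK (role IN ('user', 'assistant', 'system')),
        content TEXT NOT NULL,
        image_base64 TEXT,
        thinking TEXT,
        tokens_used INTEGER DEFAULT 0,
        tokens_per_sec REAL DEFAULT 0,
        thinking_duration INTEGER DEFAULT 0,
        is_summarized BOOLEAN DEFAULT 0,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (conversation_id) REFERENCES conversations(id) ON DELETE CASCADE
    )
""")
```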
| Column | Type | Default | Description |
|---|---|---|---|
| `id` | TEXT PK | | Original message ID |
| `snapshot_id` | TEXT | | Groups messages deleted together (UUID) |
| `conversation_id` | TEXT | | Original conversation ID |
| `role` | TEXT | | |
| `content` | TEXT | | |
| `image_base64` | TEXT | NULL | |
| `thinking` | TEXT | NULL | |
| `tokens_used` | INTEGER | 0 | |
| `tokens_per_sec` | REAL | 0 | |
| `thinking_duration` | INTEGER | 0 | |
| `original_created_at` | TIMESTAMP | | Preserves original ordering |
| `deleted_at` | TIMESTAMP | CURRENT_TIMESTAMP | |
No foreign keys — this table stores orphaned data for restoration.
For existing databases, missing columns are added via `ALTER TABLE ... ADD COLUMN` wrapped in try/except (silently skipped if the column already exists):

```python
try:
    await db.execute("ALTER TABLE messages ADD COLUMN tokens_per_sec REAL DEFAULT 0")
except Exception:
    pass
```

On every startup, the app checks whether user `'default'` exists and inserts it if not.
All endpoints are prefixed with /api/.
Request:
```json
{
  "conversation_id": "uuid | null",
  "model": "model-name | null",
  "message": "user message (required, min 1 char)",
  "image": "base64-string | null"
}
```

Behavior:
- Resolve model: request → user preference → `DEFAULT_MODEL`
- If no `conversation_id`: create a new conversation, mark `is_first_message = true`
- If `conversation_id` provided: verify it exists, check if `message_count == 0` for first-message detection
- Save user message to DB
- If first message: generate AI title via Ollama (blocking, non-streamed), update conversation title, emit `title` SSE event
- Build context (system prompt + summary + last N messages)
- Stream response from Ollama with `think: true`
- On stream complete: save assistant message to DB with token stats
SSE Event Types:
| Type | Fields | When |
|---|---|---|
| `title` | `conversation_id`, `title` | First message only, before content starts |
| `thinking` | `content` | Each thinking token from model |
| `content` | `content` | Each content token from model |
| `done` | `conversation_id`, `message_id`, `user_message_id`, `tokens_used`, `tokens_per_sec`, `thinking_duration` | Stream complete |
| `error` | `content` | On any error |
SSE Format: `data: {JSON}\n\n` — standard SSE with `text/event-stream` content type.
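A sketch of how these events can be serialized (hypothetical helper name; assumes the event type travels as a `type` key inside the JSON payload, matching the field table above):

```python
import json

def sse_event(payload: dict) -> str:
    # Standard SSE framing: a "data:" line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"

# Examples matching the event table:
# sse_event({"type": "content", "content": "Hello"})
# sse_event({"type": "done", "conversation_id": cid, "message_id": mid, ...})
```

With FastAPI, the async generator yielding these strings is wrapped in a `StreamingResponse(..., media_type="text/event-stream")`.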
Request:
```json
{
  "conversation_id": "uuid (required)",
  "message_id": "uuid of assistant message (required)",
  "model": "model-name | null"
}
```

Behavior:
- Verify conversation and message exist; the message must be `role = 'assistant'`
- Delete the old assistant message from DB
- Rebuild context (the user message is still there)
- Stream new response (same SSE format as `/api/chat`)
- Save new assistant message
Request: `ConversationCreate` — `title`, `model_name`, `system_prompt` (all optional)
Response: 201 — `ConversationResponse`
Response: 200 — `ConversationResponse[]` ordered by `updated_at` DESC
- Each includes `message_count` via LEFT JOIN COUNT
Response: 200 — `ConversationDetailResponse` (includes all messages ordered by `created_at` ASC)
Error: 404 if not found
Request: `ConversationUpdate` — `title`, `system_prompt` (both optional; only provided fields are updated)
Response: 200 — `ConversationResponse`
Response: 204 No Content. Messages are cascade-deleted by SQLite FK.
Response: 200 — `MessageResponse[]`
Request: `{ "content": "new text" }`
Response: 200 — `MessageResponse`
This is NOT a simple delete. It deletes this message AND everything after it, creating a snapshot for undo.
Response: 200
```json
{
  "snapshot_id": "uuid",
  "deleted_count": 3,
  "conversation_deleted": false
}
```

If all messages are deleted, the conversation itself is deleted and `conversation_deleted: true` is returned.
Restores all messages from a snapshot. If the conversation was deleted, recreates it with title "Restored Chat".
Response: 200
```json
{
  "conversation_id": "uuid",
  "restored_count": 3
}
```

Proxies `GET /api/tags` from Ollama. Filters out `:latest` tag variants when a specific tag exists with the same digest.
Response: 200 — `{ "models": [...] }`
Request: `{ "name": "llama3.2" }`
Response: Streams NDJSON progress from Ollama's pull endpoint.
Response: 200
```json
{
  "status": "healthy | degraded",
  "ollama": "connected | unreachable",
  "database": "connected"
}
```

Response: 200 — `UserProfile`
Request: `UserPreferencesUpdate` — `name`, `default_model`, `system_prompt` (all optional)
Response: 200 — `UserProfile`
None. This is a single-user, local-only application. All data is scoped to a hardcoded `DEFAULT_USER_ID = "default"`. The user table exists to store preferences, not for auth.
All conversation queries filter by `user_id = DEFAULT_USER_ID`. If multi-user support were added, you'd:
- Add auth middleware (JWT/session)
- Extract user ID from the request instead of using the constant
- The existing `user_id` foreign keys already support this
```
User sends message
  → Resolve model (request > user pref > default)
  → Create conversation if needed
  → Save user message to DB
  → [First message] Generate AI title → emit "title" SSE event
  → Build context window
  → Stream from Ollama (think=true)
  → For each chunk:
      - thinking tokens → emit "thinking" SSE event
      - content tokens → emit "content" SSE event
      - Track timing for thinking_duration
  → On done:
      - Calculate tokens_per_sec from eval_count / eval_duration
      - Save assistant message with stats
      - Emit "done" SSE event
```
The system maintains a bounded context window to avoid exceeding model limits.
Configuration:
- `CONTEXT_FULL_EXCHANGES = 5` → keeps the last 10 messages (5 user + 5 assistant) in full
Context Building (`build_context`):

1. Count total messages in the conversation
2. If total > 10: trigger summarization of old messages
3. Build the message list:
   a. System prompt (conversation-level > user-level > skip)
   b. Summary of old messages (if one exists)
   c. Last 10 messages in chronological order
   d. The new user message is already included, since it was saved before the context build
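A sketch of `build_context` following those steps (assumes the config constants are imported; `user_system_prompt` is passed in by the caller, and injecting the summary as a system message is an assumption):

```python
LAST_N_MESSAGES = CONTEXT_FULL_EXCHANGES * 2  # 5 exchanges -> 10 messages

async def build_context(db, conversation, user_system_prompt: str | None = None) -> list[dict]:
    cur = await db.execute(
        "SELECT COUNT(*) FROM messages WHERE conversation_id = ?",
        (conversation["id"],),
    )
    (total,) = await cur.fetchone()

    if total > LAST_N_MESSAGES:
        # Sketched in the summarization section below; a real implementation
        # would re-read the conversation row afterwards to pick up the new summary.
        await summarize_old_messages(db, conversation)

    context: list[dict] = []
    # Precedence: conversation-level system prompt > user-level > none
    system_prompt = conversation["system_prompt"] or user_system_prompt
    if system_prompt:
        context.append({"role": "system", "content": system_prompt})
    if conversation["summary"]:
        context.append({
            "role": "system",
            "content": f"Summary of the earlier conversation: {conversation['summary']}",
        })

    cur = await db.execute(
        "SELECT role, content, image_base64 FROM messages "
        "WHERE conversation_id = ? ORDER BY created_at DESC LIMIT ?",
        (conversation["id"], LAST_N_MESSAGES),
    )
    rows = list(reversed(await cur.fetchall()))  # restore chronological order
    for row in rows:
        msg = {"role": row["role"], "content": row["content"]}
        if row["image_base64"]:
            msg["images"] = [row["image_base64"]]
        context.append(msg)
    return context
```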
Summarization Trigger:
- Runs when `total_messages > LAST_N_MESSAGES` (10, i.e. `CONTEXT_FULL_EXCHANGES * 2`)
- Only summarizes messages NOT already marked `is_summarized`
- Excludes the last 10 messages from summarization
Summarization Process:
- Fetch unsummarized messages (excluding the last 10)
- Fetch the existing summary from the conversation (if any)
- Call Ollama non-streamed with the summarizer prompt:
  - System: "Condense into 2-3 paragraphs preserving key facts, decisions, preferences"
  - User: existing summary (if any) + new messages text
- Strip `<think>` tags from the response
- Update `conversations.summary` and `conversations.summary_upto_msg_id`
- Mark old messages as `is_summarized = 1`
- On failure: silently skip (warning log)
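A sketch of that process (the import of `SUMMARIZER_SYSTEM_PROMPT` from `app.prompts.summarize` is an assumed name; the prompt text is quoted at the end of this document):

```python
import logging
import re

import httpx

from app.config import DEFAULT_MODEL, OLLAMA_BASE_URL, SUMMARY_MODEL
from app.prompts.summarize import SUMMARIZER_SYSTEM_PROMPT  # name assumed

logger = logging.getLogger(__name__)
LAST_N_MESSAGES = 10  # CONTEXT_FULL_EXCHANGES * 2

async def summarize_old_messages(db, conversation) -> None:
    cur = await db.execute(
        "SELECT id, role, content FROM messages "
        "WHERE conversation_id = ? AND is_summarized = 0 ORDER BY created_at ASC",
        (conversation["id"],),
    )
    rows = await cur.fetchall()
    old = rows[:-LAST_N_MESSAGES]  # the most recent window is never summarized
    if not old:
        return

    text = "\n".join(f"{r['role'].upper()}: {r['content']}" for r in old)
    if conversation["summary"]:
        text = f"Existing summary:\n{conversation['summary']}\n\nNew messages to incorporate:\n{text}"

    try:
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(f"{OLLAMA_BASE_URL}/api/chat", json={
                "model": SUMMARY_MODEL or conversation["model_name"] or DEFAULT_MODEL,
                "stream": False,
                "messages": [
                    {"role": "system", "content": SUMMARIZER_SYSTEM_PROMPT},
                    {"role": "user", "content": text},
                ],
            })
            summary = resp.json()["message"]["content"]
            summary = re.sub(r"<think>.*?</think>", "", summary, flags=re.DOTALL).strip()
    except Exception:
        logger.warning("Summarization failed; keeping previous summary")  # non-fatal by design
        return

    await db.execute(
        "UPDATE conversations SET summary = ?, summary_upto_msg_id = ? WHERE id = ?",
        (summary, old[-1]["id"], conversation["id"]),
    )
    await db.executemany(
        "UPDATE messages SET is_summarized = 1 WHERE id = ?",
        [(r["id"],) for r in old],
    )
    await db.commit()
```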
Image Handling in Context:
User messages with images include `"images": [base64_string]` in the Ollama message format.
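For example, a user turn with an attached image looks like this in the request payload (shape per Ollama's chat API):

```python
{
    "role": "user",
    "content": "What's in this picture?",
    "images": ["<base64-encoded image bytes>"],
}
```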
When the first message is sent to a conversation:
- Call Ollama non-streamed with a title generation prompt:
  - System: "Generate a very short title (3-6 words, max 50 chars) for a chat conversation..."
  - User: the first message content
- `stream: false`, `think: false`
- Timeout: 15 seconds
- Sanitize the response:
  - Strip leading/trailing quotes
  - Remove `<think>...</think>` blocks (regex with DOTALL)
  - Reject empty or >60-char titles
- Fallback: first 50 chars of the message + "..." if the AI call fails
- Model used: the request model, or `qwen3.5:9b` if none specified
The title is generated and emitted via SSE before the response starts streaming, so the frontend can update the sidebar immediately.
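A sketch of the sanitization step, implementing the rules above (function name is illustrative):

```python
import re

def sanitize_title(raw: str, first_message: str) -> str:
    # Remove <think>...</think> blocks (DOTALL so the match spans newlines),
    # then strip whitespace and stray surrounding quotes.
    title = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    title = title.strip().strip("\"'").strip()
    if not title or len(title) > 60:
        # Fallback: first 50 chars of the user's message
        return first_message[:50] + "..."
    return title
```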
This is the most complex operation. When a message is "deleted" from the chat:
Truncate:

- Find the target message's `created_at`
- Select ALL messages with `created_at >= target` (this message and everything after it)
- Copy each to `deleted_messages` with a shared `snapshot_id` (UUID)
- Delete the originals from the `messages` table
- Check whether any messages remain:
  - Yes: update the conversation's `updated_at`
  - No: delete the conversation entirely, set `conversation_deleted = true`
- Return `{ snapshot_id, deleted_count, conversation_deleted }`
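A sketch of the truncate step with aiosqlite (illustrative function name; column list follows the schema above):

```python
import uuid

async def truncate_from_message(db, conversation_id: str, message_id: str) -> dict:
    cur = await db.execute("SELECT created_at FROM messages WHERE id = ?", (message_id,))
    target_created_at = (await cur.fetchone())["created_at"]

    snapshot_id = str(uuid.uuid4())
    # Copy the target message and everything after it into the snapshot table...
    await db.execute(
        """INSERT INTO deleted_messages
               (id, snapshot_id, conversation_id, role, content, image_base64, thinking,
                tokens_used, tokens_per_sec, thinking_duration, original_created_at)
           SELECT id, ?, conversation_id, role, content, image_base64, thinking,
                  tokens_used, tokens_per_sec, thinking_duration, created_at
           FROM messages WHERE conversation_id = ? AND created_at >= ?""",
        (snapshot_id, conversation_id, target_created_at),
    )
    # ...then delete the originals.
    cur = await db.execute(
        "DELETE FROM messages WHERE conversation_id = ? AND created_at >= ?",
        (conversation_id, target_created_at),
    )
    deleted_count = cur.rowcount

    cur = await db.execute(
        "SELECT COUNT(*) FROM messages WHERE conversation_id = ?", (conversation_id,)
    )
    (remaining,) = await cur.fetchone()
    conversation_deleted = remaining == 0
    if conversation_deleted:
        await db.execute("DELETE FROM conversations WHERE id = ?", (conversation_id,))
    else:
        await db.execute(
            "UPDATE conversations SET updated_at = CURRENT_TIMESTAMP WHERE id = ?",
            (conversation_id,),
        )
    await db.commit()
    return {"snapshot_id": snapshot_id, "deleted_count": deleted_count,
            "conversation_deleted": conversation_deleted}
```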
Restore:

- Fetch all `deleted_messages` with the given `snapshot_id`
- Check whether the conversation still exists:
  - Yes: use it
  - No: recreate it with `title = "Restored Chat"`
- Re-insert all messages using `INSERT OR IGNORE` (preserving `original_created_at`)
- Delete the snapshot from `deleted_messages`
- Return `{ conversation_id, restored_count }`
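The mirror-image restore, as a sketch (assumes the snapshot exists; the route would 404 otherwise, and `DEFAULT_USER_ID` comes from config):

```python
async def restore_snapshot(db, snapshot_id: str) -> dict:
    cur = await db.execute(
        "SELECT * FROM deleted_messages WHERE snapshot_id = ? ORDER BY original_created_at",
        (snapshot_id,),
    )
    rows = await cur.fetchall()
    conversation_id = rows[0]["conversation_id"]

    # Recreate the conversation if the truncate deleted it.
    cur = await db.execute("SELECT 1 FROM conversations WHERE id = ?", (conversation_id,))
    if await cur.fetchone() is None:
        await db.execute(
            "INSERT INTO conversations (id, user_id, title) VALUES (?, ?, ?)",
            (conversation_id, DEFAULT_USER_ID, "Restored Chat"),
        )

    await db.executemany(
        """INSERT OR IGNORE INTO messages
               (id, conversation_id, role, content, image_base64, thinking,
                tokens_used, tokens_per_sec, thinking_duration, created_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        [(r["id"], r["conversation_id"], r["role"], r["content"], r["image_base64"],
          r["thinking"], r["tokens_used"], r["tokens_per_sec"], r["thinking_duration"],
          r["original_created_at"]) for r in rows],
    )
    await db.execute("DELETE FROM deleted_messages WHERE snapshot_id = ?", (snapshot_id,))
    await db.commit()
    return {"conversation_id": conversation_id, "restored_count": len(rows)}
```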
- Verify the target message is `role = 'assistant'`
- Delete it from the DB (simple DELETE, no snapshot)
- Rebuild the context (the preceding user message remains)
- Stream a fresh response from Ollama
- Save the new response
List Models:
- Proxies `GET {OLLAMA_BASE_URL}/api/tags`
- Deduplication: if both `model:latest` and `model:specific-tag` exist with the same digest, the `:latest` variant is removed
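A sketch of that dedup (assumes the `/api/tags` shape, where each model entry carries `name` and `digest`):

```python
def dedupe_latest(models: list[dict]) -> list[dict]:
    # Digests that already appear under a specific (non-:latest) tag
    specific_digests = {
        m["digest"] for m in models if not m["name"].endswith(":latest")
    }
    # Drop a :latest entry only when the same digest exists under a specific tag
    return [
        m for m in models
        if not (m["name"].endswith(":latest") and m["digest"] in specific_digests)
    ]
```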
Pull Model:
- Proxies `POST {OLLAMA_BASE_URL}/api/pull` as an NDJSON stream
- The frontend receives progress updates line-by-line
Health Check:
- `GET {OLLAMA_BASE_URL}/` — expects a 200 status code
- Returns `true`/`false`, does not throw
- Single user (`id = "default"`)
- `get_or_create_default_user`: creates the user row on first access if missing
- Update: only modifies provided fields (Pydantic `exclude_unset=True`)
- System prompt precedence: conversation-level > user-level > none
```python
# schemas/chat.py
from pydantic import BaseModel, Field

class ChatRequest(BaseModel):
    conversation_id: str | None = None
    model: str | None = None
    message: str = Field(..., min_length=1)
    image: str | None = None  # base64

class RegenerateRequest(BaseModel):
    conversation_id: str
    message_id: str
    model: str | None = None
```

```python
# schemas/conversation.py
from datetime import datetime
from pydantic import BaseModel

class ConversationCreate(BaseModel):
    title: str | None = "New Chat"
    model_name: str | None = None
    system_prompt: str | None = None

class ConversationUpdate(BaseModel):
    title: str | None = None
    system_prompt: str | None = None

class ConversationResponse(BaseModel):
    id: str
    title: str
    model_name: str | None
    message_count: int = 0
    created_at: datetime
    updated_at: datetime

class MessageResponse(BaseModel):
    id: str
    role: str
    content: str
    thinking: str | None = None
    image_base64: str | None = None
    tokens_used: int = 0
    tokens_per_sec: float = 0.0
    thinking_duration: int = 0
    created_at: datetime

class ConversationDetailResponse(BaseModel):
    id: str
    title: str
    model_name: str | None
    system_prompt: str | None
    created_at: datetime
    updated_at: datetime
    messages: list[MessageResponse] = []
```

```python
# schemas/user.py
from pydantic import BaseModel

class UserProfile(BaseModel):
    id: str
    name: str
    default_model: str
    system_prompt: str | None = None

class UserPreferencesUpdate(BaseModel):
    name: str | None = None
    default_model: str | None = None
    system_prompt: str | None = None
```

Note: Schemas with `model_name` fields use `ConfigDict(protected_namespaces=())` to avoid Pydantic's reserved-namespace warning for `model_*` fields.
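For instance (a minimal illustration of that setting; the real models may place it on a shared base class):

```python
from pydantic import BaseModel, ConfigDict

class ConversationResponse(BaseModel):
    # Allow model_name without tripping Pydantic's "model_" namespace warning
    model_config = ConfigDict(protected_namespaces=())

    id: str
    title: str
    model_name: str | None
```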
All LLM interactions use Ollama's REST API directly via httpx (not LangChain, despite it being a dependency).
Endpoints used:
| Ollama Endpoint | Used For | Streamed |
|---|---|---|
| `POST /api/chat` | Chat responses | Yes (NDJSON lines) |
| `POST /api/chat` | Summarization | No (`stream: false`) |
| `POST /api/chat` | Title generation | No (`stream: false`) |
| `GET /api/tags` | List models | No |
| `POST /api/pull` | Pull models | Yes (NDJSON lines) |
| `GET /` | Health check | No |
Chat streaming format: Each NDJSON line from Ollama contains:
```json
{
  "message": {
    "content": "token",
    "thinking": "thinking token (when think=true)"
  },
  "done": false
}
```

The final chunk has `done: true` and includes:
- `eval_count` — number of generated tokens
- `eval_duration` — generation time in nanoseconds
- `prompt_eval_count` — number of prompt tokens
Thinking mode: All chat requests use `"think": true`. Ollama models that support thinking (like Qwen) return reasoning in the `thinking` field. Non-thinking models simply return empty `thinking` fields.
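A sketch of consuming this stream and deriving the stats (httpx NDJSON iteration; assumes `OLLAMA_BASE_URL` is imported from config, and variable names are illustrative):

```python
import json

import httpx

async def stream_ollama_chat(model: str, messages: list[dict]):
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", f"{OLLAMA_BASE_URL}/api/chat", json={
            "model": model, "messages": messages, "stream": True, "think": True,
        }) as resp:
            async for line in resp.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                if chunk.get("done"):
                    eval_count = chunk.get("eval_count", 0)
                    eval_duration = chunk.get("eval_duration", 1)  # nanoseconds
                    tokens_per_sec = eval_count / (eval_duration / 1e9)
                    tokens_used = chunk.get("prompt_eval_count", 0) + eval_count
                    yield {"type": "done", "tokens_used": tokens_used,
                           "tokens_per_sec": round(tokens_per_sec, 2)}
                    break
                msg = chunk.get("message", {})
                if msg.get("thinking"):
                    yield {"type": "thinking", "content": msg["thinking"]}
                if msg.get("content"):
                    yield {"type": "content", "content": msg["content"]}
```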
```
backend/
├── app/
│   ├── main.py                       # FastAPI app, lifespan, CORS, router registration
│   ├── config.py                     # All constants
│   ├── database.py                   # SQLite schema, get_db dependency, init_db
│   │
│   ├── routes/                       # API endpoint handlers (thin)
│   │   ├── chat.py                   # POST /api/chat, /api/chat/regenerate
│   │   ├── conversations.py          # CRUD + message ops + truncate/restore
│   │   ├── models.py                 # GET /api/models, POST /api/models/pull, GET /api/health
│   │   └── user.py                   # GET/PUT /api/user
│   │
│   ├── services/                     # Business logic (stateless, db passed in)
│   │   ├── chat_service.py           # stream_chat, stream_regenerate, build_context, summarize
│   │   ├── conversation_service.py   # CRUD, truncate, restore, AI title
│   │   ├── model_service.py          # list_models, pull_model, health check
│   │   └── user_service.py           # get/update preferences
│   │
│   ├── schemas/                      # Pydantic request/response models
│   │   ├── chat.py
│   │   ├── conversation.py
│   │   └── user.py
│   │
│   └── prompts/
│       └── summarize.py              # Summarizer system prompt + builder
│
├── test/                             # pytest tests
│   ├── conftest.py                   # In-memory DB fixture, test helpers
│   ├── test_conversation_service.py  # 23 service unit tests
│   └── test_conversation_routes.py   # 10 API integration tests
│
├── data/
│   └── chat.db                       # SQLite database (auto-created)
│
├── requirements.txt
└── pytest.ini
```
- Python 3.11+
- Ollama installed and running at `localhost:11434`
- At least one Ollama model pulled (e.g., `ollama pull qwen3.5:9b`)
```bash
cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

```bash
uvicorn app.main:app --reload
# Runs at http://localhost:8000
# Swagger UI at http://localhost:8000/docs
```

- Auto-created at `backend/data/chat.db` on first startup
- Schema is idempotent (`CREATE TABLE IF NOT EXISTS`)
- Missing columns added via `ALTER TABLE` with try/except
```bash
cd backend
pytest test/ -v
```

- Tests use in-memory SQLite (no file DB needed)
- Ollama calls are mocked via `unittest.mock`
- `asyncio_mode = auto` in `pytest.ini` — no `@pytest.mark.asyncio` needed
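A sketch of the in-memory DB fixture (illustrative; `init_schema` is an assumed name for a helper that creates the tables on a given connection):

```python
import aiosqlite
import pytest

from app.database import init_schema  # assumed: schema-creation helper

@pytest.fixture
async def db():
    # ":memory:" gives each test a fresh, file-less database
    conn = await aiosqlite.connect(":memory:")
    conn.row_factory = aiosqlite.Row
    await conn.execute("PRAGMA foreign_keys=ON")
    await init_schema(conn)  # create tables on this connection
    yield conn
    await conn.close()
```

Because services are stateless and take `db` as a parameter, tests can pass this fixture straight in without patching globals.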
- **Direct Ollama REST vs LangChain**: Despite LangChain being a dependency, all Ollama calls use `httpx` directly. This gives full control over streaming, thinking mode, and token statistics.
- **SQLite over Postgres**: Zero-config, file-based, perfect for a local single-user app. WAL mode enables concurrent reads during writes.
- **Snapshot-based undo**: Instead of soft-deletes on messages, deleted messages are moved to a separate `deleted_messages` table with a shared `snapshot_id`. This keeps the main `messages` table clean while supporting full undo.
- **Rolling summarization**: Old messages aren't deleted — they're marked `is_summarized = 1` and a rolling summary replaces them in the context window. This preserves full history in the DB while keeping LLM context bounded.
- **Title before response**: AI title generation is a blocking call that completes before response streaming begins. This ensures the frontend has the conversation title immediately, avoiding a jarring delayed update.
- **Stateless services**: All service functions receive the DB connection as a parameter. No service holds state. This makes testing straightforward — just pass an in-memory DB.
- **Single-user by design**: The `DEFAULT_USER_ID` constant simplifies everything. Multi-user support would require adding auth middleware and replacing the constant with request-scoped user extraction, but the schema already supports it via `user_id` foreign keys.
- Ollama unreachable: Chat endpoints return SSE `error` events. The health endpoint returns `"degraded"`. Summarization and title generation silently fall back (warning logged).
- Missing resources: Routes return 404 via `HTTPException`.
- Validation errors: Pydantic returns 422 automatically.
- All errors follow the format `{ "detail": "error message" }`.
Summarizer system prompt:
"You are a conversation summarizer. Condense the following into a concise summary preserving key facts, decisions, user preferences, and important context. 2-3 paragraphs max."
User prompt construction:
```
[If existing summary exists:]
Existing summary:
{existing_summary}

New messages to incorporate:
USER: message1
ASSISTANT: message2
...
```
Title generation system prompt:
"Generate a very short title (3-6 words, max 50 chars) for a chat conversation based on the user's first message. Return ONLY the title, nothing else. No quotes, no punctuation at the end, no explanation."