Python backend for the Freva-GPT assistant. The service mirrors the Rust implementation of the FrevaGPT API while adding Python-only tooling such as LiteLLM-native prompting, Mongo-backed thread storage, and MCP (Model Context Protocol) tool orchestration for RAG and code execution.
- FastAPI app with strict auth parity to the production Rust service (`/api/chatbot/*`)
- Streaming responses via LiteLLM/OpenAI-compatible SSE (`application/x-ndjson`) with code + image variants
- Persistent conversation threads in MongoDB and JSONL files (`threads/`), plus per-user scratch space (`cache/`)
- MCP manager that wires the backend to dedicated tool servers (`rag`, `code`)
- Docker compose stack that includes LiteLLM, Ollama, the backend, and both MCP servers
- Comprehensive pytest suite covering auth, prompting, storage, LiteLLM client helpers, and route matrices
- `podman` or `docker`
- MongoDB reachable via the vault URL
- Credentials & headers for the Freva auth/vault services
Create `.env` (used by FastAPI, Docker, and the MCP servers). See `.env.example` for guidance.
```
podman compose up --build
```

Services that start:

- `freva-gpt-backend`: FastAPI app (debugpy toggle via `DEBUG=true` for remote debugging sessions)
- `rag`: MCP server exposing `get_context_from_resources`
- `code`: MCP server running the sandboxed Jupyter kernel and exposing `code_interpreter`
- `litellm`: LiteLLM proxy that reads `litellm_config.yaml`
- `ollama`: optional local model runner for LiteLLM backends
Bind mounts expose `/work`, logs, threads, and the shared cache to other Freva services. Provide GPU access to Ollama via Docker device reservations when needed.
- `podman` or `docker`

Create `.env` (used by FastAPI, Docker, and the MCP servers). See `.env.example` for guidance.
```
./dev.sh up -d --build
```

| Path | Purpose |
|---|---|
| `src/app.py` | FastAPI entrypoint, CORS policy, router registration, app lifespan hooks |
| `src/api/chatbot/*` | HTTP handlers for chat operations (`availablechatbots`, `streamresponse`, `getthread`, etc.) |
| `src/services/streaming/` | LiteLLM client, orchestrator, stream variant definitions, heartbeat helpers |
| `src/services/storage/` | MongoDB + disk-backed persistence (`threads/` JSONL, `cache/` scratch space) |
| `src/services/mcp/` | MCP manager and MCP client |
| `src/services/authentication/` | Authentication: DEV-mode auth bypassing OIDC requirements |
| `src/core/` | Settings, prompt assembly, logging, startup checks, available-model parsing |
| `src/tools/` | MCP servers (code interpreter + RAG), auth helpers, header gate middleware |
| `prompt_library/` | Baseline system prompts, summary prompts, and few-shot examples (JSONL) |
| `resources/` | Documentation corpora used by the RAG tool (`stableclimgen` seed content) |
| `docker/` | Dockerfiles for the backend, LiteLLM/Ollama helpers, and the rag/code MCP servers |
| `scripts/` | Dev utilities (`dev_chat.py`, `dev_script.py`, `check_kernel_env.py`) |
| `tests/` | Pytest suite covering auth, prompting, streaming, storage, and endpoints |
| `litellm_config.yaml` | Source of truth for the model catalog (consumed by `available_chatbots()`) |
Generated artifacts that persist across runs:
- `threads/` (JSONL transcript per thread id)
- `cache/{user_id}/{thread_id}` (LLM-created files, plots, etc.)
- `logs/` (when mounted in Docker)
- The FastAPI layer enforces auth via `AuthRequired` (Bearer tokens validated against `x-freva-rest-url`), injects usernames, and validates per-request headers (`x-freva-vault-url`, `freva-config`, etc.).
- The LiteLLM proxy (`FREVAGPT_LITE_LLM_ADDRESS`) provides OpenAI-compatible chat + embeddings endpoints; completions stream into `StreamVariant` classes that normalize assistant text, code blocks, tool hints, images, and server hints.
- Persistence uses both MongoDB (main storage) and optional disk mirrors. The `x-freva-vault-url` header resolves the Mongo URI at runtime so each tenant can point at its own database.
- The MCP manager (`src/services/mcp/mcp_manager.py`) connects to tool servers listed in `FREVAGPT_AVAILABLE_MCP_SERVERS` (e.g., `["rag", "code"]`), discovers tools, exposes OpenAI function schemas to LiteLLM, and routes tool invocations with per-thread session ids.
- The RAG + code MCP servers run as separate ASGI apps (dockerized) with optional JWT auth. Requests flow through `header_gate` so required headers (`mongodb-uri`, `freva-config-path`) become ContextVars before code executes.
- Prompting loads baseline templates + few-shot examples per model and replays the thread history (minus prompts and meta) to LiteLLM, matching the Rust semantics; see the sketch after this list.
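In miniature, that replay step looks roughly like the sketch below. The helper is hypothetical, and the variant names `Prompt`/`Meta` plus the role mapping are assumptions for illustration; the real assembly lives in `src/core/` and handles model-specific prompt sets.

```python
import json
from pathlib import Path


def assemble_messages(thread_variants: list[dict], user_input: str) -> list[dict]:
    """Illustrative only: baseline prompt + few-shot examples + replayed history."""
    base = Path("prompt_library/baseline")
    messages = [{"role": "system", "content": (base / "starting_prompt.txt").read_text()}]

    # Few-shot examples: one JSON object per line (assumed to already be
    # chat-message shaped, i.e. {"role": ..., "content": ...}).
    for line in (base / "examples.jsonl").read_text().splitlines():
        if line.strip():
            messages.append(json.loads(line))

    # Replay prior turns, skipping prompt/meta entries as described above.
    # The variant names and role mapping here are assumptions.
    for variant in thread_variants:
        kind = variant.get("variant")
        if kind in ("Prompt", "Meta"):
            continue
        role = "assistant" if kind == "Assistant" else "user"
        messages.append({"role": role, "content": variant.get("content", "")})

    messages.append({"role": "user", "content": user_input})
    return messages
```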
| Method | Path | Description | Notes |
|---|---|---|---|
| GET | `/api/chatbot/ping` | Static ping stub | Placeholder |
| GET | `/api/chatbot/docs` | Docs payload stub | Placeholder |
| GET | `/api/chatbot/help` | Help payload stub | Placeholder |
| GET | `/api/chatbot/availablechatbots` | Returns model names from `litellm_config.yaml` | Requires auth |
| GET | `/api/chatbot/getthread?thread_id=...` | Fetches thread contents, omitting prompts + redundant StreamEnd variants | Needs `x-freva-vault-url` |
| GET | `/api/chatbot/getuserthreads` | Returns the latest 10 threads for the authenticated user | Falls back to query `user_id` only if `ALLOW_FALLBACK_OLD_AUTH` |
| GET | `/api/chatbot/streamresponse` | Starts an SSE stream of `StreamVariant` JSON payloads | Query params: `thread_id`, `input` (required), `chatbot` |
| GET/POST | `/api/chatbot/stop` | Initiates stopping of an active conversation | Requires auth |
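For a quick smoke test against a running backend, something like the following works. The base URL and token are placeholders; the header names match the auth requirements described in this README.

```python
import httpx

# Placeholders: point these at your deployment and a valid token.
BASE_URL = "http://localhost:8000"
headers = {
    "Authorization": "Bearer <token>",
    "x-freva-rest-url": "<freva-rest-url>",
}

resp = httpx.get(f"{BASE_URL}/api/chatbot/availablechatbots", headers=headers)
resp.raise_for_status()
print(resp.json())  # model names parsed from litellm_config.yaml
```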
- Response type: `application/x-ndjson`
- Each `data:` line is a JSON object with a `variant` discriminator (`Assistant`, `Code`, `CodeOutput`, `CodeError`, `Image`, `ServerHint`, `StreamEnd`, etc.).
- Code tool calls stream incremental chunks while LiteLLM emits `tool_calls`. When the MCP tool resolves, results are converted back into JSON events and appended to Mongo/disk storage.
- The server automatically injects `thread_id` hints and records the conversation before returning the SSE chunk, ensuring replay safety. A minimal consumer is sketched below.
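A minimal consumer, sketched with httpx. The endpoint, query parameters, headers, and `variant` field come from the tables above; the port, token, and payload field name are assumptions.

```python
import json

import httpx

# Placeholder values: adjust the base URL, token, and model to your deployment.
params = {"thread_id": "<thread-id>", "input": "Hello", "chatbot": "<model>"}
headers = {
    "Authorization": "Bearer <token>",
    "x-freva-rest-url": "<freva-rest-url>",
    "x-freva-vault-url": "<freva-vault-url>",
}

with httpx.stream(
    "GET",
    "http://localhost:8000/api/chatbot/streamresponse",
    params=params,
    headers=headers,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if not line.strip():
            continue  # skip keep-alive/blank lines
        event = json.loads(line.removeprefix("data: "))
        if event.get("variant") == "StreamEnd":
            break
        # The payload field name is an assumption; inspect real events first.
        print(event.get("variant"), event.get("content", ""))
```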
- MongoDB (`mongodb_storage.py`): the canonical record for threads. Each document stores `user_id`, `thread_id`, an ISO timestamp, the topic (summarized via LiteLLM), and the serialized `StreamVariant` list.
- Disk mirrors (`thread_storage.py`): keep JSONL copies under `threads/{thread_id}.txt`, enabling offline replay and dev tooling (see the loading sketch below). The topic of a thread is saved in `threads/{thread_id}.meta.json`.
- `cache/` scratch: `create_dir_at_cache()` ensures each user/thread has a writable directory for generated files (plots, CSVs). Entries are sanitized if user IDs contain unsupported characters.
- Prompt library: `prompt_library/baseline` contains `starting_prompt.txt`, `summary_prompt.txt`, and `examples.jsonl`. GPT-5 models currently fall back to baseline prompts (a warning is logged). Customize by adding new prompt sets and updating `_resolve_baseline_dir()` / `_resolve_gpt5_dir_or_placeholder()`.
- Resources: `resources/stableclimgen` seeds the RAG MCP server. Drop additional corpora per library folder and list them in `FREVAGPT_AVAILABLE_LIBRARIES` inside `src/tools/rag/server.py`.
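Because the disk mirrors are plain JSONL, dev tooling can replay a thread without Mongo. A sketch (the helper is hypothetical; the record shape is whatever `StreamVariant` serializes to):

```python
import json
from pathlib import Path


def load_thread(thread_id: str, root: Path = Path("threads")) -> tuple[dict, list[dict]]:
    """Read a mirrored thread: the LiteLLM-summarized topic from the
    .meta.json sidecar, plus one serialized StreamVariant per JSONL line."""
    meta = json.loads((root / f"{thread_id}.meta.json").read_text())
    variants = [
        json.loads(line)
        for line in (root / f"{thread_id}.txt").read_text().splitlines()
        if line.strip()
    ]
    return meta, variants
```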
- RAG server (`src/tools/rag/server.py`): indexes documentation with custom loaders + splitters, stores embeddings in MongoDB (`embeddings`), and surfaces a single tool, `get_context_from_resources`. LiteLLM requests embed queries through the same proxy (`FREVAGPT_LITE_LLM_ADDRESS`).
- Code interpreter (`src/tools/code_interpreter/server.py`): spins up per-session Jupyter kernels, sanitizes input, enforces configurable timeouts, and injects the Freva config via environment variables. Outputs include stdout/stderr, display data, and structured errors.
- Header gate (`src/tools/header_gate.py`): wraps each MCP ASGI app so critical headers become ContextVars and requests fail fast when headers are missing/invalid (e.g., a missing Mongo URI yields SSE-friendly JSON-RPC errors); see the sketch below.
- Manager (`src/services/mcp/mcp_manager.py`): caches clients, discovers tool schemas, exports OpenAI function definitions, and pins MCP session ids to thread ids for deterministic tool contexts.
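The ContextVar pattern the header gate relies on looks roughly like this. This is a simplified sketch, not the actual `header_gate.py`; the real version also validates values and answers with SSE-friendly JSON-RPC errors rather than a bare 400.

```python
import json
from contextvars import ContextVar

# ContextVar read later by tool code that needs a Mongo connection.
MONGODB_URI: ContextVar[str] = ContextVar("mongodb_uri")


class HeaderGate:
    """Minimal ASGI wrapper: stash required headers in ContextVars and
    reject the request early when one is missing."""

    def __init__(self, app, required=("mongodb-uri", "freva-config-path")):
        self.app = app
        self.required = required

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = {k.decode(): v.decode() for k, v in scope["headers"]}
            missing = [h for h in self.required if h not in headers]
            if missing:
                body = json.dumps({"error": f"missing headers: {missing}"}).encode()
                await send({"type": "http.response.start", "status": 400,
                            "headers": [(b"content-type", b"application/json")]})
                await send({"type": "http.response.body", "body": body})
                return
            MONGODB_URI.set(headers["mongodb-uri"])
        await self.app(scope, receive, send)
```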
- Run tests: `uv run pytest` (or `uv run pytest tests/test_auth.py -k bearer` for focused cases). Tests cover auth flows, prompt assembly, storage, stream-variant conversions, and route parameter validation.
- Interactive chat: `uv run python scripts/dev_chat.py` starts a REPL that exercises the same orchestrator logic, persisting outputs to disk and optionally pointing at local MCP servers.
- Auth failures: verify that headers include both `Authorization` and `x-freva-rest-url`. Inspect the FastAPI logs for the exact HTTP status.
- Missing models: ensure `litellm_config.yaml` is readable and contains `model_name` keys. `available_chatbots()` aborts the process if it cannot find any entries.
- MCP issues: the backend logs a warning but continues when tool discovery fails; LiteLLM will simply not emit tool calls. Use `settings.AVAILABLE_MCP_SERVERS` to enable/disable targets explicitly.
- File access: make sure `freva-config` headers point at mounted paths and `/work` is mounted read-only where expected.
- Mongo connectivity: `_get_database()` retries without URI query params (see the sketch below). Persistent failures return HTTP 503; check vault responses and network policies.
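The retry behaviour in the last bullet amounts to dropping the query string from the vault-provided URI and trying again. A simplified sketch with pymongo; the actual `_get_database()` implementation differs.

```python
from urllib.parse import urlsplit, urlunsplit

from pymongo import MongoClient
from pymongo.errors import PyMongoError


def connect(mongo_uri: str) -> MongoClient:
    """Try the URI as-is, then retry once with its query params stripped."""
    try:
        client = MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)
        client.admin.command("ping")  # force an actual round-trip
        return client
    except PyMongoError:
        parts = urlsplit(mongo_uri)
        bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        client = MongoClient(bare, serverSelectionTimeoutMS=5000)
        client.admin.command("ping")
        return client
```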