Python backend for the Freva-GPT assistant. The service mirrors the Rust implementation of the FrevaGPT API while adding Python-only tooling such as LiteLLM-native prompting, Mongo-backed thread storage, and MCP (Model Context Protocol) tool orchestration for RAG and code execution.
- FastAPI app with strict auth parity to the production Rust service (`/api/chatbot/*`)
- Streaming responses via LiteLLM/OpenAI-compatible SSE (`application/x-ndjson`) with code + image variants
- Persistent conversation threads in MongoDB and JSONL files (`threads/`), plus per-user scratch space (`cache/`)
- MCP manager that wires the backend to dedicated tool servers (`rag`, `code`)
- Docker compose stack that includes LiteLLM, Ollama, the backend, and both MCP servers
- Comprehensive pytest suite covering auth, prompting, storage, LiteLLM client helpers, and route matrices
- `podman` or `docker`
- MongoDB reachable via the vault URL
- Credentials & headers for the Freva auth/vault services
Create `.env` (used by FastAPI, Docker, and the MCP servers). See `.env.example` for guidance.
```
podman compose up --build
```

Services that start:

- `freva-gpt-backend`: FastAPI app (debugpy toggle via `DEBUG=true` for remote debugging sessions)
- `rag`: MCP server exposing `get_context_from_resources`
- `code`: MCP server running the sandboxed Jupyter kernel and exposing `code_interpreter`
- `litellm`: LiteLLM proxy that reads `litellm_config.yaml`
- `ollama`: optional local model runner for LiteLLM backends
Bind mounts expose `/work`, logs, threads, and the shared cache to other Freva services. Provide GPU access to Ollama via Docker device reservations when needed.
- `podman` or `docker`

Create `.env` (used by FastAPI, Docker, and the MCP servers). See `.env.example` for guidance.
```
./dev.sh up -d --build
```

| Path | Purpose |
|---|---|
| `src/app.py` | FastAPI entrypoint, CORS policy, router registration, app lifespan hooks |
| `src/api/chatbot/*` | HTTP handlers for chat operations (`availablechatbots`, `streamresponse`, `getthread`, etc.) |
| `src/services/streaming/` | LiteLLM client, orchestrator, stream variant definitions, heartbeat helpers |
| `src/services/storage/` | MongoDB + disk-backed persistence (`threads/` JSONL, `cache/` scratch space) |
| `src/services/mcp/` | MCP manager and MCP client |
| `src/services/authentication/` | Authentication: DEV-mode auth bypassing OIDC requirements |
| `src/core/` | Settings, prompt assembly, logging, startup checks, available-model parsing |
| `src/tools/` | MCP servers (code interpreter + RAG), auth helpers, header gate middleware |
| `prompt_library/` | Baseline system prompts, summary prompts, and few-shot examples (JSONL) |
| `resources/` | Documentation corpora used by the RAG tool (`stableclimgen` seed content) |
| `docker/` | Dockerfiles for the backend, LiteLLM/Ollama helpers, and the rag/code MCP servers |
| `scripts/` | Dev utilities (`dev_chat.py`, `dev_script.py`, `check_kernel_env.py`) |
| `tests/` | Pytest suite covering auth, prompting, streaming, storage, and endpoints |
| `litellm_config.yaml` | Source of truth for the model catalog (consumed by `available_chatbots()`) |
Generated artifacts that persist across runs:
- `threads/` (JSONL transcript per thread id)
- `cache/{user_id}/{thread_id}` (LLM-created files, plots, etc.)
- `logs/` (when mounted in Docker)
- The FastAPI layer enforces auth via `AuthRequired` (Bearer tokens validated against `x-freva-rest-url`), injects usernames, and validates per-request headers (`x-freva-vault-url`, `freva-config`, etc.).
- The LiteLLM proxy (`FREVAGPT_LITE_LLM_ADDRESS`) provides OpenAI-compatible chat + embeddings endpoints; completions stream into `StreamVariant` classes that normalize assistant text, code blocks, tool hints, images, and server hints.
- Persistence uses both MongoDB (main storage) and optional disk mirrors. The `x-freva-vault-url` header resolves the Mongo URI at runtime so each tenant can point at its own database.
- The MCP manager (`src/services/mcp/mcp_manager.py`) connects to tool servers listed in `FREVAGPT_AVAILABLE_MCP_SERVERS` (e.g., `["rag", "code"]`), discovers tools, exposes OpenAI function schemas to LiteLLM, and routes tool invocations with per-thread session ids.
- The RAG + code MCP servers run as separate ASGI apps (dockerized) with optional JWT auth. Requests flow through `header_gate` so required headers (`mongodb-uri`, `freva-config-path`) become ContextVars before code executes.
- Prompting loads baseline templates + few-shot examples per model and replays the thread history (minus prompts and meta) to LiteLLM, matching the Rust semantics; see the sketch after this list.
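In miniature, that replay step looks roughly like the sketch below. The helper is hypothetical, and the variant names `Prompt`/`Meta` plus the role mapping are assumptions for illustration; the real assembly lives in `src/core/` and handles model-specific prompt sets.

```python
import json
from pathlib import Path


def assemble_messages(thread_variants: list[dict], user_input: str) -> list[dict]:
    """Illustrative only: baseline prompt + few-shot examples + replayed history."""
    base = Path("prompt_library/baseline")
    messages = [{"role": "system", "content": (base / "starting_prompt.txt").read_text()}]

    # Few-shot examples: one JSON object per line (assumed to already be
    # chat-message shaped, i.e. {"role": ..., "content": ...}).
    for line in (base / "examples.jsonl").read_text().splitlines():
        if line.strip():
            messages.append(json.loads(line))

    # Replay prior turns, skipping prompt/meta entries as described above.
    # The variant names and role mapping here are assumptions.
    for variant in thread_variants:
        kind = variant.get("variant")
        if kind in ("Prompt", "Meta"):
            continue
        role = "assistant" if kind == "Assistant" else "user"
        messages.append({"role": role, "content": variant.get("content", "")})

    messages.append({"role": "user", "content": user_input})
    return messages
```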
| Method | Path | Description | Notes |
|---|---|---|---|
| GET | `/api/chatbot/ping` | Static ping stub | Placeholder |
| GET | `/api/chatbot/docs` | Docs payload stub | Placeholder |
| GET | `/api/chatbot/help` | Help payload stub | Placeholder |
| GET | `/api/chatbot/availablechatbots` | Returns model names from `litellm_config.yaml` | Requires auth |
| GET | `/api/chatbot/getthread?thread_id=...` | Fetches thread contents, omitting prompts + redundant StreamEnd variants | Needs `x-freva-vault-url` |
| GET | `/api/chatbot/getuserthreads` | Returns the latest 10 threads for the authenticated user | Falls back to query `user_id` only if `ALLOW_FALLBACK_OLD_AUTH` |
| GET | `/api/chatbot/streamresponse` | Starts an SSE stream of `StreamVariant` JSON payloads | Query params: `thread_id`, `input` (required), `chatbot` |
| GET/POST | `/api/chatbot/stop` | Initiates stopping of an active conversation | Requires auth |
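For a quick smoke test against a running backend, something like the following works. The base URL and token are placeholders; the header names match the auth requirements described in this README.

```python
import httpx

# Placeholders: point these at your deployment and a valid token.
BASE_URL = "http://localhost:8000"
headers = {
    "Authorization": "Bearer <token>",
    "x-freva-rest-url": "<freva-rest-url>",
}

resp = httpx.get(f"{BASE_URL}/api/chatbot/availablechatbots", headers=headers)
resp.raise_for_status()
print(resp.json())  # model names parsed from litellm_config.yaml
```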
- Response type: `application/x-ndjson`
- Each `data:` line is a JSON object with a `variant` discriminator (`Assistant`, `Code`, `CodeOutput`, `CodeError`, `Image`, `ServerHint`, `StreamEnd`, etc.).
- Code tool calls stream incremental chunks while LiteLLM emits `tool_calls`. When the MCP tool resolves, results are converted back into JSON events and appended to Mongo/disk storage.
- The server automatically injects `thread_id` hints and records the conversation before returning the SSE chunk, ensuring replay safety. A minimal consumer is sketched below.
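A minimal consumer, sketched with httpx. The endpoint, query parameters, headers, and `variant` field come from the tables above; the port, token, and payload field name are assumptions.

```python
import json

import httpx

# Placeholder values: adjust the base URL, token, and model to your deployment.
params = {"thread_id": "<thread-id>", "input": "Hello", "chatbot": "<model>"}
headers = {
    "Authorization": "Bearer <token>",
    "x-freva-rest-url": "<freva-rest-url>",
    "x-freva-vault-url": "<freva-vault-url>",
}

with httpx.stream(
    "GET",
    "http://localhost:8000/api/chatbot/streamresponse",
    params=params,
    headers=headers,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if not line.strip():
            continue  # skip keep-alive/blank lines
        event = json.loads(line.removeprefix("data: "))
        if event.get("variant") == "StreamEnd":
            break
        # The payload field name is an assumption; inspect real events first.
        print(event.get("variant"), event.get("content", ""))
```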
- MongoDB (`mongodb_storage.py`): the canonical record for threads. Each document stores `user_id`, `thread_id`, an ISO timestamp, the topic (summarized via LiteLLM), and the serialized `StreamVariant` list.
- Disk mirrors (`thread_storage.py`): keep JSONL copies under `threads/{thread_id}.txt`, enabling offline replay and dev tooling (see the loading sketch below). The topic of a thread is saved in `threads/{thread_id}.meta.json`.
- `cache/` scratch: `create_dir_at_cache()` ensures each user/thread has a writable directory for generated files (plots, CSVs). Entries are sanitized if user IDs contain unsupported characters.
- Prompt library: `prompt_library/baseline` contains `starting_prompt.txt`, `summary_prompt.txt`, and `examples.jsonl`. GPT-5 models currently fall back to baseline prompts (a warning is logged). Customize by adding new prompt sets and updating `_resolve_baseline_dir()` / `_resolve_gpt5_dir_or_placeholder()`.
- Resources: `resources/stableclimgen` seeds the RAG MCP server. Drop additional corpora per library folder and list them in `FREVAGPT_AVAILABLE_LIBRARIES` inside `src/tools/rag/server.py`.
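Because the disk mirrors are plain JSONL, dev tooling can replay a thread without Mongo. A sketch (the helper is hypothetical; the record shape is whatever `StreamVariant` serializes to):

```python
import json
from pathlib import Path


def load_thread(thread_id: str, root: Path = Path("threads")) -> tuple[dict, list[dict]]:
    """Read a mirrored thread: the LiteLLM-summarized topic from the
    .meta.json sidecar, plus one serialized StreamVariant per JSONL line."""
    meta = json.loads((root / f"{thread_id}.meta.json").read_text())
    variants = [
        json.loads(line)
        for line in (root / f"{thread_id}.txt").read_text().splitlines()
        if line.strip()
    ]
    return meta, variants
```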
- RAG server (`src/tools/rag/server.py`): indexes documentation with custom loaders + splitters, stores embeddings in MongoDB (`embeddings`), and surfaces a single tool, `get_context_from_resources`. LiteLLM requests embed queries through the same proxy (`FREVAGPT_LITE_LLM_ADDRESS`).
- Code interpreter (`src/tools/code_interpreter/server.py`): spins up per-session Jupyter kernels, sanitizes input, enforces configurable timeouts, and injects the Freva config via environment variables. Outputs include stdout/stderr, display data, and structured errors.
- Header gate (`src/tools/header_gate.py`): wraps each MCP ASGI app so critical headers become ContextVars and requests fail fast when headers are missing/invalid (e.g., a missing Mongo URI yields SSE-friendly JSON-RPC errors); see the sketch below.
- Manager (`src/services/mcp/mcp_manager.py`): caches clients, discovers tool schemas, exports OpenAI function definitions, and pins MCP session ids to thread ids for deterministic tool contexts.
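The ContextVar pattern the header gate relies on looks roughly like this. This is a simplified sketch, not the actual `header_gate.py`; the real version also validates values and answers with SSE-friendly JSON-RPC errors rather than a bare 400.

```python
import json
from contextvars import ContextVar

# ContextVar read later by tool code that needs a Mongo connection.
MONGODB_URI: ContextVar[str] = ContextVar("mongodb_uri")


class HeaderGate:
    """Minimal ASGI wrapper: stash required headers in ContextVars and
    reject the request early when one is missing."""

    def __init__(self, app, required=("mongodb-uri", "freva-config-path")):
        self.app = app
        self.required = required

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = {k.decode(): v.decode() for k, v in scope["headers"]}
            missing = [h for h in self.required if h not in headers]
            if missing:
                body = json.dumps({"error": f"missing headers: {missing}"}).encode()
                await send({"type": "http.response.start", "status": 400,
                            "headers": [(b"content-type", b"application/json")]})
                await send({"type": "http.response.body", "body": body})
                return
            MONGODB_URI.set(headers["mongodb-uri"])
        await self.app(scope, receive, send)
```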
- Run tests: `uv run pytest` (or `uv run pytest tests/test_auth.py -k bearer` for focused cases). Tests cover auth flows, prompt assembly, storage, stream-variant conversions, and route parameter validation.
- Interactive chat: `uv run python scripts/dev_chat.py` starts a REPL that exercises the same orchestrator logic, persisting outputs to disk and optionally pointing at local MCP servers.
- Auth failures: verify that headers include both `Authorization` and `x-freva-rest-url`. Inspect the FastAPI logs for the exact HTTP status.
- Missing models: ensure `litellm_config.yaml` is readable and contains `model_name` keys. `available_chatbots()` aborts the process if it cannot find any entries.
- MCP issues: the backend logs a warning but continues when tool discovery fails; LiteLLM will simply not emit tool calls. Use `settings.AVAILABLE_MCP_SERVERS` to enable/disable targets explicitly.
- File access: make sure `freva-config` headers point at mounted paths and `/work` is mounted read-only where expected.
- Mongo connectivity: `_get_database()` retries without URI query params (see the sketch below). Persistent failures return HTTP 503; check vault responses and network policies.
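The retry behaviour in the last bullet amounts to dropping the query string from the vault-provided URI and trying again. A simplified sketch with pymongo; the actual `_get_database()` implementation differs.

```python
from urllib.parse import urlsplit, urlunsplit

from pymongo import MongoClient
from pymongo.errors import PyMongoError


def connect(mongo_uri: str) -> MongoClient:
    """Try the URI as-is, then retry once with its query params stripped."""
    try:
        client = MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)
        client.admin.command("ping")  # force an actual round-trip
        return client
    except PyMongoError:
        parts = urlsplit(mongo_uri)
        bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        client = MongoClient(bare, serverSelectionTimeoutMS=5000)
        client.admin.command("ping")
        return client
```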