A production-ready, open-source LLMOps stack template that combines:
- LiteLLM: unified LLM API gateway, virtual keys, cost allocation, model access management
- Langfuse: LLM observability, evaluation, prompt management, and dataset creation

Deploy once, connect any LLM provider, control costs and access, and get full observability, all from a single `docker compose up`.
```
┌──────────────────────────────────────────────────────────────┐
│                       Your Application                       │
│        (uses standard OpenAI SDK pointed at LiteLLM)         │
└──────────────────────────────┬───────────────────────────────┘
                               │ OpenAI-compatible API (port 4000)
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                        LiteLLM Proxy                         │
│  • Unified API for OpenAI / Azure / Anthropic / Ollama / …   │
│  • Virtual keys & team budgets                               │
│  • Model access control & rate limiting                      │
│  • Cost tracking & spend logs                                │
│  • Redis caching                                             │
└──────┬───────────────────────────┬────────────────────┬──────┘
       │ forwards requests         │ spend / metrics    │ traces
       ▼                           ▼                    ▼
 LLM Providers         PostgreSQL (litellm db)   Langfuse Server
 (OpenAI, Azure,                                   (port 3000)
 Anthropic, Ollama)                             ┌─────────────────┐
                                                │   Langfuse UI   │
                                                │   • Traces      │
                                                │   • Evaluations │
                                                │   • Prompts     │
                                                │   • Datasets    │
                                                └────────┬────────┘
                                                         │
                    ┌────────────────────────────────────┼──────────────┐
                    ▼                                    ▼              ▼
               PostgreSQL                           ClickHouse        MinIO
              (langfuse db)                     (analytics store)  (blob store)
```
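In code, the "Your Application" box is any OpenAI-compatible client pointed at port 4000. Here is a minimal sketch using only the Python standard library; the model alias `gpt-4o-mini` and the key value are assumptions, so substitute whatever your `litellm/config.yaml` and `.env` actually define:

```python
# Minimal client sketch. Model name and key are placeholders.
import json
import urllib.request

LITELLM_URL = "http://localhost:4000"       # the LiteLLM proxy, not api.openai.com
VIRTUAL_KEY = "sk-litellm-your-secret-key"  # master key or a generated virtual key

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request for the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{LITELLM_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
    )

def demo() -> None:
    """Send one request; requires the stack to be up (docker compose up -d)."""
    with urllib.request.urlopen(chat_request("gpt-4o-mini", "Hello!")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The official OpenAI SDK works the same way: construct the client with `base_url="http://localhost:4000"` and the virtual key as `api_key`, and no other application code changes.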
| Service | Image | Default Port | Purpose |
|---|---|---|---|
| litellm | ghcr.io/berriai/litellm:main-latest | 4000 | LLM API gateway |
| langfuse-server | langfuse/langfuse:3 | 3000 | Observability UI & API |
| langfuse-worker | langfuse/langfuse-worker:3 | n/a | Background job processor |
| postgres | postgres:16-alpine | 5432 (internal) | Relational store |
| clickhouse | clickhouse/clickhouse-server:24.12 | 8123 (internal) | Analytics / event store |
| redis | redis:7-alpine | 6379 (internal) | Cache & queue |
| minio | minio/minio:latest | 9000 / 9001 | S3-compatible blob storage |
- Docker ≥ 24 and Docker Compose v2
- An API key for at least one LLM provider (OpenAI, Azure, Anthropic, or a local Ollama install)
```bash
git clone https://github.com/your-org/GenAIOps-OSS.git
cd GenAIOps-OSS
cp .env.example .env
```

Edit `.env` and fill in your real values:
```bash
# Required: change ALL placeholder values
LITELLM_MASTER_KEY=sk-litellm-your-secret-key
LITELLM_SALT_KEY=a-random-32-character-string-here
LANGFUSE_NEXTAUTH_SECRET=another-32-char-random-string
LANGFUSE_SALT=yet-another-random-salt

# At least one LLM provider key
OPENAI_API_KEY=sk-...
```

Security: never commit `.env` to version control. The `.gitignore` already excludes it.
```bash
docker compose up -d
```

The first run downloads all images (~3 GB) and runs database migrations; allow ~2 minutes.
Check that everything is healthy:
```bash
docker compose ps
```

| Service | URL | Default credentials |
|---|---|---|
| LiteLLM API | http://localhost:4000 | Bearer LITELLM_MASTER_KEY |
| LiteLLM UI | http://localhost:4000/ui | admin / LITELLM_MASTER_KEY |
| Langfuse UI | http://localhost:3000 | Create account on first visit |
| MinIO Console | http://localhost:9001 | MINIO_ROOT_USER / password |
```bash
cd app
pip install -r requirements.txt
python main.py
```

Virtual keys let you assign budgets and track spend per team, project, or user.
```bash
# Create a virtual key for a team with a monthly $50 budget
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-engineering",
    "max_budget": 50,
    "budget_duration": "monthly",
    "models": ["gpt-4o-mini", "gpt-3.5-turbo"],
    "metadata": {"project": "customer-chat"}
  }'
```

See docs/cost-allocation.md for full details.
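The same provisioning can be done programmatically. A sketch mirroring the curl call above; it assumes the `/key/generate` response carries the new key in a top-level `key` field, which is what current LiteLLM releases return:

```python
# Sketch: provision a budgeted team key via the LiteLLM management API.
import json
import urllib.request

MASTER_KEY = "sk-litellm-your-secret-key"  # LITELLM_MASTER_KEY from .env

def key_generate_request(team_id: str, max_budget: float,
                         models: list) -> urllib.request.Request:
    """Build the same /key/generate call as the curl example above."""
    payload = {
        "team_id": team_id,
        "max_budget": max_budget,
        "budget_duration": "monthly",
        "models": models,
    }
    return urllib.request.Request(
        "http://localhost:4000/key/generate",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
    )

def provision_team_key() -> str:
    """Create the key and return it; requires the stack to be running."""
    req = key_generate_request("team-engineering", 50,
                               ["gpt-4o-mini", "gpt-3.5-turbo"])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["key"]  # hand this key to the team
```

Spend made with the returned key is attributed to `team-engineering` in the LiteLLM spend logs.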
Control which models each virtual key or team can access:
```bash
# Create a read-only research key restricted to cheap models
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o-mini", "claude-3-haiku"],
    "tpm_limit": 100000,
    "rpm_limit": 100
  }'
```

See docs/model-access-management.md for full details.
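When a key exceeds its `tpm_limit` or `rpm_limit`, the proxy rejects the over-limit request (HTTP 429 in current LiteLLM versions), so callers should back off and retry. A minimal exponential-backoff sketch; the base delay and cap are arbitrary illustrative choices, not values from this repo:

```python
# Sketch: retry proxied requests that hit a rate limit (HTTP 429).
import time
import urllib.error
import urllib.request

def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Exponential backoff schedule: base * 2^n seconds, capped."""
    return [min(base * (2 ** n), cap) for n in range(retries)]

def call_with_retry(req: urllib.request.Request, retries: int = 4) -> bytes:
    """Retry on HTTP 429 (rate-limited by the proxy); re-raise anything else."""
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(delay)
    raise RuntimeError("still rate-limited after retries")
```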
Every request through LiteLLM is automatically traced in Langfuse. Open http://localhost:3000 to see:
- Traces: full request/response detail per call
- Metrics: latency, token usage, and cost over time
- Evaluations: attach human or LLM-generated quality scores
- Sessions: group traces into user sessions
See docs/observability.md for full details.
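To group traces into sessions or attribute them to users, attach a `metadata` object to the proxied request; LiteLLM's Langfuse callback recognizes keys such as `session_id` and `trace_user_id` (check the LiteLLM docs for the exact set your version supports). A sketch:

```python
# Sketch: attach Langfuse session/user metadata to a proxied chat call.
# The key names below are assumptions per LiteLLM's Langfuse integration.
import json
import urllib.request

def traced_chat_request(model: str, prompt: str,
                        session_id: str, user_id: str) -> urllib.request.Request:
    """Chat request whose trace is grouped into a named Langfuse session."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "metadata": {
            "session_id": session_id,  # groups traces into one session
            "trace_user_id": user_id,  # attributes the trace to a user
        },
    }
    return urllib.request.Request(
        "http://localhost:4000/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer sk-litellm-your-secret-key",
            "Content-Type": "application/json",
        },
    )
```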
```
GenAIOps-OSS/
├── docker-compose.yml            # Orchestrates all services
├── .env.example                  # Environment variables template
├── litellm/
│   ├── Dockerfile                # Extends LiteLLM image with config
│   └── config.yaml               # LiteLLM proxy configuration
├── app/
│   ├── main.py                   # Demo script (run with python main.py)
│   ├── requirements.txt
│   ├── utils/
│   │   ├── llm_client.py         # OpenAI client factory for LiteLLM proxy
│   │   ├── tracing.py            # Langfuse tracing helpers
│   │   └── cost_tracker.py       # LiteLLM spend API client
│   └── examples/
│       ├── chat_completion.py    # Multi-model chat example
│       ├── evaluation.py         # LLM-as-a-judge evaluation
│       └── prompt_management.py  # Langfuse prompt CRUD
├── tests/
│   ├── conftest.py               # pytest fixtures
│   ├── test_utils.py             # Unit tests (no external services)
│   └── requirements.txt
├── scripts/
│   └── init-postgres.sh          # Creates litellm + langfuse databases
└── docs/
    ├── architecture.md
    ├── configuration.md
    ├── cost-allocation.md
    ├── model-access-management.md
    ├── observability.md
    ├── evaluation.md
    └── prompt-management.md
```
See docs/configuration.md for a full reference of litellm/config.yaml.
```bash
pip install -r tests/requirements.txt -r app/requirements.txt
pytest tests/ -v
```

Tests are fully offline; all external services are mocked.
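"Fully offline" means tests follow the pattern below: the OpenAI client is replaced by a mock, so assertions run against canned responses with no proxy or provider involved. A self-contained sketch; the `summarize` helper is hypothetical, illustrating the style rather than reproducing one of the repo's modules:

```python
# Sketch: unit-testing LLM-calling code without any live services.
from unittest.mock import MagicMock

def summarize(client, text: str) -> str:
    """Hypothetical helper that takes an injected OpenAI-style client."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content

def test_summarize_offline():
    # Replace the client with a mock returning a canned completion.
    client = MagicMock()
    client.chat.completions.create.return_value.choices = [
        MagicMock(message=MagicMock(content="a short summary"))
    ]
    assert summarize(client, "long input text") == "a short summary"
    # The mock also lets us assert on what was sent to the "API".
    sent = client.chat.completions.create.call_args.kwargs
    assert sent["model"] == "gpt-4o-mini"
```

Injecting the client (rather than constructing it inside the function) is what makes this mocking possible.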
- Fork the repository and create a feature branch.
- Make your changes and add tests where appropriate.
- Run `pytest tests/ -v` to verify all tests pass.
- Open a pull request with a clear description of the change.

Please keep commits focused: one logical change per commit.