A fully self-hosted, open-source AI stack for your personal knowledge base. Run powerful language models on your own VPS with a chat interface, a RAG engine that searches your notes and PDFs, a Telegram bot with multi-model routing, and a protected REST API. No cloud subscriptions are required, and no data leaves your server unless you enable the optional Telegram bot or its OpenRouter cloud models.
| Service | Purpose | Access |
|---|---|---|
| Ollama | Serves the Hermes LLM locally | Internal only |
| Open WebUI | Chat UI for the Hermes model | http://your-ip/ |
| AnythingLLM | RAG — chat with your notes & PDFs | http://your-ip:3002/ |
| Telegram Bot | Private AI assistant with model routing + file upload | Telegram |
| Nginx | Reverse proxy + Bearer token API auth | Port 80 / 3002 |
| systemd | Auto-start all services on boot | — |
Default model: NousResearch Hermes-3-Llama-3.1-8B — Q4_K_M quantization (~5.5 GB, runs on CPU)
                     ┌──────────────────────────────────────┐
                     │               YOUR VPS               │
                     │                                      │
Browser / API ──────▶│  Nginx (port 80)                     │
                     │   ├── /         → Open WebUI         │
                     │   └── /ollama/  → Ollama API         │
                     │                                      │
Browser       ──────▶│  Nginx (port 3002) → AnythingLLM     │
                     │                                      │
Telegram      ──────▶│  Telegram Bot                        │
                     │   ├── Local Hermes (Ollama)          │
                     │   └── Cloud models (OpenRouter)      │
                     │                                      │
                     │  Docker Compose                      │
                     │   ├── kbai-ollama       :11434       │
                     │   ├── kbai-open-webui   :3001        │
                     │   ├── kbai-anythingllm  :8081        │
                     │   └── kbai-telegram-bot              │
                     └──────────────────────────────────────┘
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB + 4 GB swap | 16 GB |
| Disk | 20 GB free | 40 GB free |
| OS | Ubuntu 22.04 / 24.04 | Ubuntu 24.04 LTS |
| CPU | 4 cores | 8+ cores |
| GPU | Not required | NVIDIA (auto-detected) |
The Q4_K_M model uses ~5.5 GB RAM at runtime. A 4 GB swap file is strongly recommended on 8 GB VPS instances.
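As a rough pre-flight check, the sizing guidance above can be sketched in Python. This is not part of the kbai installer: the function names are hypothetical, and the 11 GB threshold is an assumption (the documented 8 GB RAM + 4 GB swap minimum, lowered slightly because the kernel reports a little less than the nominal RAM size). It reads `/proc/meminfo`, so it only applies to Linux.

```python
from pathlib import Path

# Assumed threshold: 8 GB RAM + 4 GB swap, minus a margin for memory
# that the kernel reserves and does not report in MemTotal.
MIN_TOTAL_GB = 11.0


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for memory fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def has_headroom(meminfo: dict, min_total_gb: float = MIN_TOTAL_GB) -> bool:
    """Return True if RAM plus swap reaches the assumed minimum."""
    total_kb = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total_kb / (1024 * 1024) >= min_total_gb


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():  # only present on Linux
        info = parse_meminfo(meminfo_path.read_text())
        print("enough RAM + swap for the 8B model:", has_headroom(info))
```

If this prints `False` on an 8 GB instance, the swap file has most likely not been created yet.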
SSH into your VPS as root, then run:
git clone -b claude/exciting-dirac-1j88g https://github.com/ravellerh/kbai /opt/kbai
DOMAIN=your-ip-or-domain bash /opt/kbai/scripts/setup.sh

Replace `your-ip-or-domain` with your VPS public IP or a domain name pointing to it.
The script will:
- Add 4 GB swap (prevents OOM kills on 8 GB VPS)
- Fix any broken dpkg state
- Install Docker, Nginx, and all dependencies
- Generate random API keys and secrets
- Pull and start all containers
- Configure Nginx reverse proxy with Bearer token auth
- Enable auto-start on boot via systemd
- Download the Hermes-3-Llama-3.1-8B model (~5.5 GB)
When finished, the script prints your URLs and API key — save them.
A private Telegram bot that connects to your AI stack. Features:
- Multi-model routing: switch between local Hermes and 8 cloud models with one tap
- File-to-knowledge-base: send any PDF, TXT, DOCX, MD, or CSV and it's automatically embedded into AnythingLLM
- Conversation memory: per-chat history, with `/reset` to clear
- Private mode: locked to your Telegram chat ID only
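The conversation memory and private mode features can be pictured with a small sketch. This is an illustration under assumptions, not the code in `bot.py`: the in-memory `histories` store, the function names, and the 20-message cap are all hypothetical. Only the `TELEGRAM_ALLOWED_CHAT_ID` variable name comes from this README.

```python
import os
from collections import defaultdict
from typing import Optional

# Hypothetical in-memory store: chat ID -> list of {"role", "content"} messages.
histories = defaultdict(list)


def is_allowed(chat_id: int, allowed: Optional[str]) -> bool:
    """Private mode: once an allowed chat ID is configured, accept only that chat."""
    if not allowed:  # unset or empty: the bot is not locked down yet
        return True
    return str(chat_id) == allowed.strip()


def remember(chat_id: int, role: str, content: str, max_messages: int = 20) -> None:
    """Append a message and keep only the most recent max_messages entries."""
    history = histories[chat_id]
    history.append({"role": role, "content": content})
    del history[:-max_messages]


def reset(chat_id: int) -> None:
    """What a /reset handler would do: drop the history of this chat only."""
    histories.pop(chat_id, None)


if __name__ == "__main__":
    allowed = os.environ.get("TELEGRAM_ALLOWED_CHAT_ID")
    print("chat 12345 allowed:", is_allowed(12345, allowed))
```

Keying the history by chat ID is what keeps `/reset` in one chat from touching any other chat.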
| Model | Provider | Cost |
|---|---|---|
| Hermes 8B | Local VPS | Free (already paid) |
| Gemini Flash 1.5 | OpenRouter | ~$0.0001/msg |
| DeepSeek V3 | OpenRouter | ~$0.0003/msg |
| Qwen 2.5 72B | OpenRouter | ~$0.0005/msg |
| Llama 3.3 70B | OpenRouter | ~$0.0003/msg |
| Claude Haiku 4.5 | OpenRouter | ~$0.001/msg |
| GPT-4o mini | OpenRouter | ~$0.001/msg |
| Nemotron 70B | OpenRouter | ~$0.001/msg |
| Mistral Large | OpenRouter | ~$0.002/msg |
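Routing between the local model and the cloud models comes down to choosing an endpoint and credentials per model. The sketch below is a simplified illustration, not the bot's actual implementation: `MODELS`, `build_request`, and the `cloud-example` entry are hypothetical, and the OpenRouter model ID is a placeholder to be replaced with a real ID from the OpenRouter catalogue. The Ollama address and the Hermes model name are the ones used elsewhere in this README.

```python
OLLAMA_URL = "http://kbai-ollama:11434/api/chat"  # internal Docker network address
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Hypothetical routing table: menu key -> (provider, model ID).
MODELS = {
    "hermes": ("local", "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M"),
    "cloud-example": ("openrouter", "vendor/model-name"),  # placeholder ID
}


def build_request(model_key, messages, openrouter_key=None):
    """Return (url, headers, payload) for the selected model."""
    provider, model_id = MODELS[model_key]
    payload = {"model": model_id, "messages": messages}
    headers = {"Content-Type": "application/json"}
    if provider == "local":
        payload["stream"] = False  # ask Ollama for one JSON reply instead of a stream
        return OLLAMA_URL, headers, payload
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY is required for cloud models")
    headers["Authorization"] = f"Bearer {openrouter_key}"
    return OPENROUTER_URL, headers, payload
```

Because the local route needs no key, the bot can still work with Hermes alone when `OPENROUTER_API_KEY` is left unset.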
- Create a bot with @BotFather on Telegram — get a token
- Get a free OpenRouter API key (optional, for cloud models)
- Get an AnythingLLM API key: AnythingLLM → Settings → API Keys → Generate
- Add to `/opt/kbai/.env` on your VPS:

TELEGRAM_BOT_TOKEN=your-bot-token
OPENROUTER_API_KEY=your-openrouter-key
ANYTHINGLLM_API_KEY=your-anythingllm-key

- Build and start:

cd /opt/kbai && docker compose up -d --build telegram-bot

- Send `/start` to your bot; it will reply with your chat ID
- Add `TELEGRAM_ALLOWED_CHAT_ID=<your-id>` to `.env` and restart to lock it down
| Command | Action |
|---|---|
| `/start` | Show status and setup info |
| `/model` | Open model switcher (inline tap buttons) |
| `/kb` | Show knowledge base status and document count |
| `/reset` | Clear conversation history |
| `/help` | Show help |
| Send a file | Upload PDF/TXT/DOCX/MD/CSV to knowledge base |
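Before a file is forwarded to the knowledge base, it is worth checking that its type is one of the supported formats. A minimal sketch of such a check follows; `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the real bot may validate uploads differently.

```python
from pathlib import Path

# Extensions taken from the supported formats listed in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".md", ".csv"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is supported (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("Notes.PDF", "paper.v2.md", "archive.tar.gz", "README"):
        print(name, is_supported(name))
```

Only the final suffix is inspected, so `archive.tar.gz` is rejected while `paper.v2.md` is accepted.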
- Open `http://your-ip/`
- Create your admin account on first visit
- The Hermes model is pre-loaded — start chatting
- Open `http://your-ip:3002/`
- Create your admin account
- LLM and embedding are pre-configured via environment variables, so no manual setup is needed
- Create Workspaces for each domain (e.g. `Research`, `Notes`, `Projects`)
- Upload your PDFs and Markdown notes, or send files via the Telegram bot
- Chat with your documents — AnythingLLM retrieves relevant context using RAG
The Ollama REST API is exposed at /ollama/ and requires a Bearer token.
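For scripted access, the same endpoint can be called from Python with only the standard library. This is a minimal sketch that assumes the Nginx proxy forwards `/ollama/api/generate` to Ollama unchanged. The function names are hypothetical, and `api_key` is the `HERMES_API_KEY` value from `/opt/kbai/.env`.

```python
import json
import urllib.request

MODEL = "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M"


def build_generate_request(base_url, api_key, prompt, model=MODEL):
    """Build a POST request for /ollama/api/generate with Bearer auth."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ollama/api/generate",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(base_url, api_key, prompt, timeout=300):
    """Send the request and return the generated text."""
    req = build_generate_request(base_url, api_key, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With "stream": false, Ollama answers with one JSON object
        # whose "response" field holds the full completion.
        return json.load(resp)["response"]
```

A call would look like `generate("http://your-domain", "YOUR_API_KEY", "What is RAG?")`. The generous timeout is a guess to allow for slow CPU-only inference.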
# View your API key
grep HERMES_API_KEY /opt/kbai/.env

curl http://your-domain/ollama/api/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M",
"prompt": "Explain attention mechanisms in transformers.",
"stream": false
}'

curl http://your-domain/ollama/api/chat \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M",
"messages": [
{"role": "system", "content": "You are a helpful research assistant."},
{"role": "user", "content": "What is a transformer?"}
]
}'

docker exec kbai-ollama ollama pull mistral
docker exec kbai-ollama ollama pull llama3.2
docker exec kbai-ollama ollama list

HTTPS via Let's Encrypt requires a domain name (not just an IP) pointing to your VPS.

apt-get install -y certbot python3-certbot-nginx
certbot --nginx -d your-domain.com

# Status
docker ps
# Restart everything
systemctl restart kbai-hermes
# Logs
docker logs kbai-ollama
docker logs kbai-open-webui
docker logs kbai-anythingllm
docker logs kbai-telegram-bot
# Rebuild a single service
cd /opt/kbai && docker compose up -d --build --force-recreate telegram-bot

├── docker-compose.yml                    # All services
├── .env.example                          # Environment template (copy → .env)
├── nginx/
│   ├── kbai-hermes.conf.template         # Proxy: Open WebUI + Ollama API
│   └── kbai-anythingllm.conf.template    # Proxy: AnythingLLM on :3002
├── telegram-bot/
│   ├── bot.py                            # Telegram bot
│   ├── Dockerfile
│   └── requirements.txt
├── systemd/
│   └── kbai-hermes.service               # Auto-start on boot
└── scripts/
    ├── setup.sh                          # Main installer
    ├── install.sh                        # Bootstrap
    └── pull-model.sh                     # Pull Hermes into Ollama
apt-get gets killed (OOM) during install
Add swap before running setup: the Q4_K_M model needs ~5.5 GB RAM and install tools need headroom.
Open WebUI shows "Backend Required" error
Nginx is routing /api/ to Ollama instead of Open WebUI. The correct config routes /ollama/ to Ollama. Check your nginx site config matches nginx/kbai-hermes.conf.template.
AnythingLLM can't connect to Ollama
Use http://kbai-ollama:11434 as the base URL — not localhost. Services communicate over the Docker internal network.
Telegram bot not responding
Check it's running: docker ps | grep telegram. View logs: docker logs kbai-telegram-bot --tail 30.
File upload to knowledge base fails
Ensure ANYTHINGLLM_API_KEY is set in .env and at least one workspace exists in AnythingLLM. Check logs: docker logs kbai-telegram-bot --tail 20.
Port 3002 not reachable
Check UFW: ufw status. If active, run: ufw allow 3002/tcp && ufw reload.
Model pull fails / runs out of disk
Check disk space: df -h. Hermes needs ~6 GB, nomic-embed-text needs ~270 MB.
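The disk check can also be scripted. The sketch below compares free space against the two model sizes quoted above. It is an illustration only: the names are hypothetical, and it inspects the root filesystem on the assumption that Docker stores its volumes there.

```python
import shutil

GB = 1000 ** 3
# ~6 GB for Hermes plus ~0.27 GB for nomic-embed-text (figures from this README).
REQUIRED_BYTES = int(6.27 * GB)


def enough_disk(free_bytes: int, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the free space covers both model downloads."""
    return free_bytes >= required_bytes


if __name__ == "__main__":
    free = shutil.disk_usage("/").free  # assumption: Docker data lives on "/"
    print(f"{free / GB:.1f} GB free, enough for both models: {enough_disk(free)}")
```

If Docker's data directory sits on a separate mount, pass that path to `shutil.disk_usage` instead.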