Proxmox LXC setup scripts for running local AI models in isolated containers. Styled after community-scripts/ProxmoxVE.
Targets AMD Ryzen AI 9 HX 370 (Radeon 890M) with 64 GB RAM, but works on any Proxmox host with an AMD iGPU or dGPU.
| Script | Model | Stack | RAM | Port |
|---|---|---|---|---|
| ct/llm.sh | Qwen2.5 7B / 14B / 32B | llama.cpp + Vulkan/RADV | 8–24 GB | 8080 |
| ct/tts.sh | Piper TTS | Python HTTP API, CPU-only | 512 MB | 5500 |
| ct/vision.sh | Moondream2 / Qwen2.5-VL 7B | llama.cpp multimodal + Vulkan/RADV | 4–8 GB | 8081 |
All GPU containers use /dev/dri passthrough (no full GPU passthrough required). Vulkan/RADV is used instead of ROCm — on the Radeon 890M (gfx1150), RADV outperforms ROCm by ~60% due to unified memory access via GTT.
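For orientation, /dev/dri passthrough to an LXC container usually comes down to two lines in the container's config file under /etc/pve/lxc/. The sketch below prints them. It illustrates the general technique and is not a copy of what the ct/ scripts write; the function name `dri_passthrough_conf` is hypothetical, and unprivileged containers may additionally need matching video/render group IDs.

```shell
# Print LXC config lines that expose the host's DRM devices to a container.
# 226 is the Linux character-device major number for DRM (/dev/dri/*).
dri_passthrough_conf() {
  printf '%s\n' \
    'lxc.cgroup2.devices.allow: c 226:* rwm' \
    'lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir'
}

# Usage (on the Proxmox host, container stopped; <ctid> is a placeholder):
#   dri_passthrough_conf >> /etc/pve/lxc/<ctid>.conf
```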
Run each script as root on the Proxmox host:
bash ct/llm.sh
bash ct/tts.sh
bash ct/vision.sh

Each script prompts for configuration (storage, CTID, hostname, RAM, CPU cores, model selection), then handles the full build: template download, container creation, GPU passthrough config, and running the install script inside the container.
The install/ scripts are not meant to be run directly — the ct/ scripts push and execute them automatically.
OpenAI-compatible API on port 8080.
# Chat completions
curl http://<ip>:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5","messages":[{"role":"user","content":"Hello"}]}'
# List models
curl http://<ip>:8080/v1/models

Compatible with any OpenAI SDK: set base_url to http://<ip>:8080/v1.
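Recent official OpenAI SDKs also read the base URL from the `OPENAI_BASE_URL` environment variable, which avoids editing client code. A minimal sketch, assuming the server was started without an API key requirement; the helper name `openai_base_url` and the address 192.0.2.10 are placeholders.

```shell
# Build the base URL an OpenAI SDK should point at.
# $1 = container IP or hostname, $2 = port (defaults to 8080, the LLM container)
openai_base_url() {
  printf 'http://%s:%s/v1\n' "$1" "${2:-8080}"
}

# Usage:
#   export OPENAI_BASE_URL="$(openai_base_url 192.0.2.10)"
#   export OPENAI_API_KEY="unused"   # SDKs want a non-empty key; value is a placeholder
#   export OPENAI_BASE_URL="$(openai_base_url 192.0.2.10 8081)"   # vision container
```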
HTTP synthesis API on port 5500. Returns audio/wav.
curl http://<ip>:5500/synthesize \
-H "Content-Type: application/json" \
-d '{"text":"Hello, this is a test."}' \
--output speech.wav
# Health check
curl http://<ip>:5500/health

Optional Wyoming protocol on port 10200 (Home Assistant compatible); enable during setup.
Voice can be changed by editing /etc/default/piper and restarting the service.
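After a restart the service may take a moment to come back, so a script that synthesizes right away can fail. The sketch below polls the health endpoint until it responds. It assumes /health returns an HTTP 2xx status once the service is ready; the function name `wait_for_health` and the default timings are illustrative.

```shell
# Poll a health endpoint until it answers with HTTP 2xx, or give up.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_health() {
  url="$1"; tries="${2:-30}"; delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: do not hang on a dead host
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   pct exec <ctid> -- systemctl restart piper-http
#   wait_for_health "http://<ip>:5500/health" && echo "TTS is ready"
```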
OpenAI-compatible multimodal API on port 8081. Accepts base64-encoded images.
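The `<b64>` placeholder in a request to this API stands for the image bytes encoded as base64 on a single line. A small sketch for producing the full data URL from a JPEG file; the helper name `image_data_url` is hypothetical, and the MIME type is hard-coded to image/jpeg.

```shell
# Emit a data URL for a JPEG file, suitable for the "url" field of an image_url part.
# $1 = path to the image
image_data_url() {
  # tr strips the line wraps that base64 inserts by default
  printf 'data:image/jpeg;base64,%s\n' "$(base64 < "$1" | tr -d '\n')"
}

# Usage:
#   url="$(image_data_url /path/to/snapshot.jpg)"
```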
curl http://<ip>:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "vision",
"messages": [{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
{"type": "text", "text": "What is in this image?"}
]
}]
}'

A Frigate integration helper is included at /opt/frigate-integration/analyze.sh inside the container:
# Inside the vision container, or via pct exec:
/opt/frigate-integration/analyze.sh /path/to/snapshot.jpg "Is there a person here?"

Model files are downloaded automatically during setup. If a download fails, the service is registered but not started; drop a GGUF into /opt/models/ and start the service manually.
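The manual recovery after a failed download can be sketched from the Proxmox host as follows. The function only prints the commands so they can be reviewed first. The name `manual_model_cmds` and the container ID 105 are placeholders, and whether the service expects a specific filename under /opt/models/ is not stated here, so check the unit's configuration before relying on this.

```shell
# Print (not run) the host-side commands to copy a GGUF into a container
# and start its service.
# $1 = container ID, $2 = path to the GGUF on the host, $3 = systemd unit name
manual_model_cmds() {
  ctid="$1"; gguf="$2"; unit="$3"
  printf 'pct push %s %s /opt/models/%s\n' "$ctid" "$gguf" "$(basename "$gguf")"
  printf 'pct exec %s -- systemctl start %s\n' "$ctid" "$unit"
}

# Usage: review the output, then pipe it to sh on the Proxmox host
# (paths containing spaces would need quoting by hand):
#   manual_model_cmds 105 ./model.gguf llama-server | sh
```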
Vision models require two files each (text model + mmproj):
| Model | Text model | mmproj |
|---|---|---|
| Moondream2 | moondream2-text-model-f16.gguf | moondream2-mmproj-f16.gguf |
| Qwen2.5-VL 7B | Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf | Qwen2.5-VL-7B-Instruct-mmproj-f16.gguf |
Verify current filenames at the source repos before downloading manually:
# View logs
pct exec <ctid> -- journalctl -u llama-server -f
pct exec <ctid> -- journalctl -u piper-http -f
pct exec <ctid> -- journalctl -u vision-server -f
# Restart a service
pct exec <ctid> -- systemctl restart llama-server
# Check GPU is accessible
pct exec <ctid> -- vulkaninfo --summary
# Shell into a container
pct enter <ctid>

- Proxmox VE 8+
- AMD GPU with Vulkan support (iGPU or dGPU); /dev/dri/renderD128 must exist on the host
- 64 GB RAM recommended for running all three containers simultaneously alongside other VMs
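The render-node requirement can be checked on the host before running any of the scripts. A minimal preflight sketch; the function name `check_render_node` is hypothetical, and the hint in the error message is a guess at the most common cause rather than a diagnosis.

```shell
# Succeed only if the given path is a character device (DRM render nodes are).
# $1 = device path (defaults to /dev/dri/renderD128)
check_render_node() {
  node="${1:-/dev/dri/renderD128}"
  if [ -c "$node" ]; then
    echo "ok: $node is present"
  else
    echo "missing: $node (is the amdgpu driver loaded on the host?)" >&2
    return 1
  fi
}

# Usage:
#   check_render_node && bash ct/llm.sh
```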