jasonjgeiger/aiModelLXCs

aiModelLXCs

Proxmox LXC setup scripts for running local AI models in isolated containers. Styled after community-scripts/ProxmoxVE.

Targets the AMD Ryzen AI 9 HX 370 (Radeon 890M) with 64 GB RAM, but works on any Proxmox host with an AMD iGPU or dGPU.


Containers

| Script | Model | Stack | RAM | Port |
| --- | --- | --- | --- | --- |
| ct/llm.sh | Qwen2.5 7B / 14B / 32B | llama.cpp + Vulkan/RADV | 8–24 GB | 8080 |
| ct/tts.sh | Piper TTS | Python HTTP API, CPU-only | 512 MB | 5500 |
| ct/vision.sh | Moondream2 / Qwen2.5-VL 7B | llama.cpp multimodal + Vulkan/RADV | 4–8 GB | 8081 |

All GPU containers use /dev/dri passthrough (no full GPU passthrough required). Vulkan/RADV is used instead of ROCm — on the Radeon 890M (gfx1150), RADV outperforms ROCm by ~60% due to unified memory access via GTT.
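
The ct/ scripts write the passthrough config themselves, and the exact lines they emit are not reproduced here. As a rough illustration of what /dev/dri passthrough commonly looks like in a Proxmox LXC config, the sketch below prints two typical entries for the first render node. The function name `dri_passthrough_lines` is hypothetical, and the device numbers assume the standard DRM major (226) with renderD128 as minor 128.

```shell
#!/usr/bin/env bash
# Illustrative only: prints LXC config lines commonly used to expose
# /dev/dri/renderD128 to a container. The repo's scripts may differ.
dri_passthrough_lines() {
  cat <<'EOF'
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
EOF
}

# Review the output before appending it to a container config by hand,
# for example /etc/pve/lxc/<ctid>.conf on the Proxmox host.
dri_passthrough_lines
```

Only the render node is exposed here, which is usually enough for Vulkan compute; whether the scripts also pass through the card node is not stated in this README.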


Usage

Run each script as root on the Proxmox host:

bash ct/llm.sh
bash ct/tts.sh
bash ct/vision.sh

Each script prompts for configuration (storage, CTID, hostname, RAM, CPU cores, model selection), then handles the full build: template download, container creation, GPU passthrough config, and running the install script inside the container.

The install/ scripts are not meant to be run directly — the ct/ scripts push and execute them automatically.


Services

LLM — llama-server.service

OpenAI-compatible API on port 8080.

# Chat completions
curl http://<ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5","messages":[{"role":"user","content":"Hello"}]}'

# List models
curl http://<ip>:8080/v1/models

Compatible with any OpenAI SDK — set base_url to http://<ip>:8080/v1.
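
For scripting against the endpoint without an SDK, a small wrapper can build the request body so that quotes in the prompt do not break the JSON. This is a minimal sketch assuming bash; `LLM_HOST`, `json_escape`, `chat_payload`, and `ask` are illustrative names, not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the llama-server endpoint described above.
# LLM_HOST is a placeholder; set it to the container's IP.
LLM_HOST="${LLM_HOST:-127.0.0.1}"

# Escape a string for use inside a JSON string literal
# (backslash, double quote, newline, tab, carriage return).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Build the body for /v1/chat/completions from a single user prompt.
chat_payload() {
  printf '{"model":"qwen2.5","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")"
}

# Send the prompt and print the raw JSON response.
ask() {
  curl -sS "http://$LLM_HOST:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1")"
}
```

With `LLM_HOST` exported, `ask 'Hello'` sends the same request as the curl example above.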

TTS — piper-http.service

HTTP synthesis API on port 5500. Returns audio/wav.

curl http://<ip>:5500/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello, this is a test."}' \
  --output speech.wav

# Health check
curl http://<ip>:5500/health

Optional Wyoming protocol on port 10200 (Home Assistant compatible) — enable during setup.

Voice can be changed by editing /etc/default/piper and restarting the service.

Vision — vision-server.service

OpenAI-compatible multimodal API on port 8081. Accepts base64-encoded images.

curl http://<ip>:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
        {"type": "text", "text": "What is in this image?"}
      ]
    }]
  }'
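
The `<b64>` placeholder above has to be filled with the image bytes encoded as base64 on a single line. One way to do that, sketched under the assumption of a JPEG input and bash, is to build the body in a function and pipe it to curl on stdin, which also avoids command-line length limits for large images. `image_payload` is a hypothetical name.

```shell
#!/usr/bin/env bash
# image_payload FILE PROMPT
# Prints a chat-completions body with FILE inlined as a JPEG data URL.
# Illustrative helper, not part of the repo.
image_payload() {
  local file=$1 prompt=$2 b64
  # Strip line wraps so the output is one line on both GNU and BSD base64.
  b64=$(base64 < "$file" | tr -d '\n')
  # Minimal JSON escaping for the prompt (backslash and double quote only).
  prompt=${prompt//\\/\\\\}
  prompt=${prompt//\"/\\\"}
  printf '{"model":"vision","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,%s"}},{"type":"text","text":"%s"}]}]}' \
    "$b64" "$prompt"
}

# Usage (replace <ip> with the container's address):
#   image_payload snapshot.jpg "What is in this image?" |
#     curl -sS "http://<ip>:8081/v1/chat/completions" \
#       -H "Content-Type: application/json" --data-binary @-
```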

A Frigate integration helper is included at /opt/frigate-integration/analyze.sh inside the container:

# Inside the vision container, or via pct exec:
/opt/frigate-integration/analyze.sh /path/to/snapshot.jpg "Is there a person here?"

Model Notes

Model files are downloaded automatically during setup. If a download fails, the service is registered but not started — drop a GGUF into /opt/models/ and start the service manually.
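
The manual recovery path can be sketched as a guard that starts the service only once a model file is actually present. This assumes bash inside the container; `has_gguf` is a hypothetical helper, and the service may expect a specific filename that this check does not verify.

```shell
#!/usr/bin/env bash
# has_gguf [DIR]
# Succeeds if DIR (default /opt/models) contains at least one .gguf file.
has_gguf() {
  local dir=${1:-/opt/models} f
  for f in "$dir"/*.gguf; do
    # With no match the glob stays literal, so test for a real file.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Inside the container, after copying a model in:
#   has_gguf /opt/models && systemctl start llama-server
```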

Vision models require two files each (text model + mmproj):

| Model | Text model | mmproj |
| --- | --- | --- |
| Moondream2 | moondream2-text-model-f16.gguf | moondream2-mmproj-f16.gguf |
| Qwen2.5-VL 7B | Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf | Qwen2.5-VL-7B-Instruct-mmproj-f16.gguf |

Verify the current filenames at the source repos before downloading manually.


Managing Containers

# View logs
pct exec <ctid> -- journalctl -u llama-server -f
pct exec <ctid> -- journalctl -u piper-http -f
pct exec <ctid> -- journalctl -u vision-server -f

# Restart a service
pct exec <ctid> -- systemctl restart llama-server

# Check GPU is accessible
pct exec <ctid> -- vulkaninfo --summary

# Shell into a container
pct enter <ctid>

Hardware Requirements

  • Proxmox VE 8+
  • AMD GPU with Vulkan support (iGPU or dGPU) — /dev/dri/renderD128 must exist on the host
  • 64 GB RAM recommended for running all three containers simultaneously alongside other VMs
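
The render-node requirement can be checked on the host before running any of the ct/ scripts. A minimal sketch, assuming bash; `check_render_node` is an illustrative name and the hint in the error message is a general suggestion, not output from this repo.

```shell
#!/usr/bin/env bash
# check_render_node [PATH]
# Succeeds if PATH (default /dev/dri/renderD128) is a character device.
check_render_node() {
  local node=${1:-/dev/dri/renderD128}
  if [ -c "$node" ]; then
    echo "ok: $node present"
  else
    echo "missing: $node (is the amdgpu driver loaded on the host?)" >&2
    return 1
  fi
}

# Usage on the Proxmox host:
#   check_render_node && bash ct/llm.sh
```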
