Strix Halo local LLM guide: 65-87 t/s on Ryzen AI Max+ 395 128GB mini PCs. Benchmarks, setup, backend comparisons, and failure cases.
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
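Because the vLLM stack above serves an OpenAI-compatible API, any OpenAI-style client can talk to it. A minimal sketch of building a chat-completions request in Python; the endpoint URL, port, and model id are placeholder assumptions, not values taken from the repo:

```python
import json
import urllib.request

# Hypothetical values -- point these at your actual vLLM server and model.
BASE_URL = "http://localhost:8000/v1"
MODEL = "placeholder-model-id"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a POST request against an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarize ROCm in one sentence.")
```

The same request shape works for tool calling and vision inputs by extending the `messages` payload per the OpenAI chat schema.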
Claude Code skill for AMD Strix Halo (Ryzen AI MAX+ 395) ML setup. Handles PyTorch installation (official wheels don't work with gfx1151), GTT memory config, and environment setup. Enables 30B parameter models.
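Official PyTorch ROCm wheels do not ship gfx1151 kernels, so setups like the one above typically export an HSA target override before initializing PyTorch. A sketch of assembling that environment in Python; the 11.0.0 override value is a widely reported community workaround for Strix Halo, not an officially documented mapping:

```python
import os

def rocm_env_overrides(gfx_override: str = "11.0.0") -> dict:
    """Return a copy of the environment with the gfx1151 workaround applied.

    HSA_OVERRIDE_GFX_VERSION makes the ROCm runtime treat the iGPU as a
    supported gfx target (here gfx1100/RDNA3 kernels). The value is a
    community workaround for Strix Halo, not an official AMD mapping.
    An existing user-set value is left untouched.
    """
    env = dict(os.environ)
    env.setdefault("HSA_OVERRIDE_GFX_VERSION", gfx_override)
    return env

env = rocm_env_overrides()
```

Pass the returned dict as `env=` to `subprocess.run` when launching training or inference processes, or export the variable in the shell before `import torch`.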
Talos-O (Omni): A sovereign, embodied agentic organism forged on AMD Strix Halo. Integrating the Chimera Kernel (Linux 7.0), Zero-Copy Introspection, and the Phronesis Engine. Built from First Principles.
Docker stack: Ollama v0.21.0 built from source against ROCm 7.2.2 with native gfx1151 (Strix Halo) support — serves Gemma 4 at up to 256K context on AMD Ryzen AI MAX+ 395 / Radeon 8060S. Includes a 9-layer "make validate" ladder covering host firmware, the ROCm runtime, the container, and long-context inference.
Drop-in recipe for running faster-whisper on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151) with Ubuntu 26.04 + ROCm 7.2.2 — no source build required