From e25dac8b19022afe6a05f3ee862fe313fcee58be Mon Sep 17 00:00:00 2001
From: stefanwalcz
Date: Sat, 4 Apr 2026 17:01:49 +0200
Subject: [PATCH 1/2] docs(gpu): add gfx1151 / ROCm 7.x and fix ROCm section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix typo: "deditated" → "dedicated", "ROCm6" → "ROCm"
- Add ROCm 7.x to requirements (alongside ROCm 6.x)
- Update tested OS list (add Ubuntu 24.04, drop EOL 20.04)
- Add AMD Strix Halo / gfx1151 section with kernel params, required env vars
  (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT), and Docker Compose example
- Add gfx1151 to the list of compiled GPU targets
- Add ROCm version column to verified devices table
- Add gfx1151 / Radeon 8060S (ROCm 7.11.0) as verified device
---
 docs/content/features/GPU-acceleration.md | 68 +++++++++++++++++------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/docs/content/features/GPU-acceleration.md b/docs/content/features/GPU-acceleration.md
index aedc4751681d..8445d56cfaa4 100644
--- a/docs/content/features/GPU-acceleration.md
+++ b/docs/content/features/GPU-acceleration.md
@@ -151,14 +151,14 @@ llama_init_from_file: kv self size = 512.00 MB
 
 ## ROCM(AMD) acceleration
 
-There are a limited number of tested configurations for ROCm systems however most newer deditated GPU consumer grade devices seem to be supported under the current ROCm6 implementation.
+There are a limited number of tested configurations for ROCm systems; however, most newer dedicated consumer-grade GPU devices seem to be supported under the current ROCm implementation.
 
 Due to the nature of ROCm it is best to run all implementations in containers as this limits the number of packages required for installation on host system, compatibility and package versions for dependencies across all variations of OS must be tested independently if desired, please refer to the [build]({{%relref "installation/build#Acceleration" %}}) documentation. 
### Requirements -- `ROCm 6.x.x` compatible GPU/accelerator -- OS: `Ubuntu` (22.04, 20.04), `RHEL` (9.3, 9.2, 8.9, 8.8), `SLES` (15.5, 15.4) +- `ROCm 6.x.x` or `ROCm 7.x.x` compatible GPU/accelerator +- OS: `Ubuntu` (22.04, 24.04), `RHEL` (9.3, 9.2, 8.9, 8.8), `SLES` (15.5, 15.4) - Installed to host: `amdgpu-dkms` and `rocm` >=6.0.0 as per ROCm documentation. ### Recommendations @@ -166,30 +166,64 @@ Due to the nature of ROCm it is best to run all implementations in containers as - Make sure to do not use GPU assigned for compute for desktop rendering. - Ensure at least 100GB of free space on disk hosting container runtime and storing images prior to installation. +### AMD Strix Halo / gfx1151 (RDNA 3.5) + +AMD Ryzen AI MAX+ (Strix Halo) APUs with an integrated Radeon 8060S (gfx1151 / RDNA 3.5) are +supported with ROCm 7.11.0+. These systems (e.g. Geekom A9 Mega, ASUS ROG Flow Z13 2025) +provide up to 96 GB of unified VRAM accessible by the GPU. + +**Required kernel boot parameters** (add to `GRUB_CMDLINE_LINUX` and run `update-grub`): +``` +iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856 +``` + +**Required runtime environment variables** (set automatically in LocalAI containers): +```bash +HSA_OVERRIDE_GFX_VERSION=11.5.1 +ROCBLAS_USE_HIPBLASLT=1 +``` + +**Running LocalAI on gfx1151:** +```yaml + image: quay.io/go-skynet/local-ai:master-gpu-hipblas + environment: + - HSA_OVERRIDE_GFX_VERSION=11.5.1 + - ROCBLAS_USE_HIPBLASLT=1 + - GPU_TARGETS=gfx1151 + devices: + - /dev/dri + - /dev/kfd + group_add: + - video +``` + +For llama.cpp models, enable flash attention (`--flash-attention`) and disable mmap (`--no-mmap`) for best performance on APU systems. + ### Limitations Ongoing verification testing of ROCm compatibility with integrated backends. Please note the following list of verified backends and devices. 
-LocalAI hipblas images are built against the following targets: gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101 +LocalAI hipblas images are built against the following targets: gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101,gfx1151 If your device is not one of these you must specify the corresponding `GPU_TARGETS` and specify `REBUILD=true`. Otherwise you don't need to specify these in the commands below. ### Verified -The devices in the following list have been tested with `hipblas` images running `ROCm 6.0.0` - -| Backend | Verified | Devices | -| ---- | ---- | ---- | -| llama.cpp | yes | Radeon VII (gfx906) | -| diffusers | yes | Radeon VII (gfx906) | -| piper | yes | Radeon VII (gfx906) | -| whisper | no | none | -| coqui | no | none | -| transformers | no | none | -| sentencetransformers | no | none | -| transformers-musicgen | no | none | -| vllm | no | none | +The devices in the following list have been tested with `hipblas` images. 
+ +| Backend | Verified | Devices | ROCm Version | +| ---- | ---- | ---- | ---- | +| llama.cpp | yes | Radeon VII (gfx906) | 6.0.0 | +| llama.cpp | yes | Radeon 8060S / gfx1151 (Strix Halo) | 7.11.0 | +| diffusers | yes | Radeon VII (gfx906) | 6.0.0 | +| piper | yes | Radeon VII (gfx906) | 6.0.0 | +| whisper | no | none | - | +| coqui | no | none | - | +| transformers | no | none | - | +| sentencetransformers | no | none | - | +| transformers-musicgen | no | none | - | +| vllm | no | none | - | **You can help by expanding this list.** From 1b5febdc6590a18f940b90b699816f0a46298ca5 Mon Sep 17 00:00:00 2001 From: stefanwalcz Date: Sat, 4 Apr 2026 17:39:40 +0200 Subject: [PATCH 2/2] =?UTF-8?q?fix(docs/gpu):=20correct=20gfx1151=20sectio?= =?UTF-8?q?n=20=E2=80=94=20env=20vars,=20image=20tag,=20safety=20warning?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add all 4 required env vars (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT, HSA_XNACK=1, HSA_ENABLE_SDMA=0) with descriptions in a table - Fix Docker Compose example to use the ROCm 7.x image tag (-gpu-hipblas-rocm7), not the ROCm 6.x image - Add explicit warning: GGML_CUDA_ENABLE_UNIFIED_MEMORY must NOT be set (even =0 activates hipMallocManaged due to getenv != nullptr check) - Add --force-recreate note (docker restart does not update container env) - Add tested hardware note (Geekom A9 Mega / Ryzen AI MAX+ 395) --- docs/content/features/GPU-acceleration.md | 35 ++++++++++++++++------- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/docs/content/features/GPU-acceleration.md b/docs/content/features/GPU-acceleration.md index 8445d56cfaa4..c6e6e37239a1 100644 --- a/docs/content/features/GPU-acceleration.md +++ b/docs/content/features/GPU-acceleration.md @@ -169,27 +169,36 @@ Due to the nature of ROCm it is best to run all implementations in containers as ### AMD Strix Halo / gfx1151 (RDNA 3.5) AMD Ryzen AI MAX+ (Strix Halo) APUs with an integrated Radeon 
8060S (gfx1151 / RDNA 3.5) are -supported with ROCm 7.11.0+. These systems (e.g. Geekom A9 Mega, ASUS ROG Flow Z13 2025) -provide up to 96 GB of unified VRAM accessible by the GPU. +supported with ROCm 7.11.0+. These systems provide up to 96 GB of unified VRAM accessible by the GPU. -**Required kernel boot parameters** (add to `GRUB_CMDLINE_LINUX` and run `update-grub`): +Tested on: Geekom A9 Mega (AMD Ryzen AI MAX+ 395, ROCm 7.11.0, Ubuntu 24.04, kernel 6.14). + +**Required kernel boot parameters** (add to `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, then run `update-grub`): ``` iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856 ``` -**Required runtime environment variables** (set automatically in LocalAI containers): -```bash -HSA_OVERRIDE_GFX_VERSION=11.5.1 -ROCBLAS_USE_HIPBLASLT=1 -``` +**Required environment variables** for gfx1151 (set automatically in the ROCm 7.x image): + +| Variable | Value | Purpose | +|----------|-------|---------| +| `HSA_OVERRIDE_GFX_VERSION` | `11.5.1` | Tells the HSA runtime to use gfx1151 code objects | +| `ROCBLAS_USE_HIPBLASLT` | `1` | Prefer hipBLASLt over rocBLAS for GEMM (required for gfx1151) | +| `HSA_XNACK` | `1` | Enable XNACK (memory-fault retry) for APU unified memory | +| `HSA_ENABLE_SDMA` | `0` | Disable SDMA engine — causes hangs on APU/iGPU configs | -**Running LocalAI on gfx1151:** +> **Warning:** Do **not** set `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. The C-level check is +> `getenv(...) != nullptr`, so even `=0` activates `hipMallocManaged` (allocates from +> system RAM instead of the 96 GB VRAM pool). 
+ +**Running LocalAI on gfx1151** (using the ROCm 7.x image, tag suffix `-gpu-hipblas-rocm7`): ```yaml - image: quay.io/go-skynet/local-ai:master-gpu-hipblas + image: quay.io/go-skynet/local-ai:master-gpu-hipblas-rocm7 environment: - HSA_OVERRIDE_GFX_VERSION=11.5.1 - ROCBLAS_USE_HIPBLASLT=1 - - GPU_TARGETS=gfx1151 + - HSA_XNACK=1 + - HSA_ENABLE_SDMA=0 devices: - /dev/dri - /dev/kfd @@ -197,6 +206,10 @@ ROCBLAS_USE_HIPBLASLT=1 - video ``` +> **Note:** When updating the image, always recreate the container (`docker compose up --force-recreate`) +> rather than just restarting it. `docker compose restart` preserves the old container environment +> and will not pick up updated env vars from the image. + For llama.cpp models, enable flash attention (`--flash-attention`) and disable mmap (`--no-mmap`) for best performance on APU systems. ### Limitations
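For reference, the Compose fragment added by this patch expanded into a complete file might look like the following. This is a sketch: the service name, host port mapping, and models volume path are illustrative assumptions, while the image tag, environment variables, devices, and `group_add` come from the example in the patch.

```yaml
# Sketch of a complete docker-compose.yaml for gfx1151 (Strix Halo).
# Service name, port, and volume path are illustrative assumptions.
services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-gpu-hipblas-rocm7
    environment:
      - HSA_OVERRIDE_GFX_VERSION=11.5.1
      - ROCBLAS_USE_HIPBLASLT=1
      - HSA_XNACK=1
      - HSA_ENABLE_SDMA=0
    devices:
      - /dev/dri   # render nodes
      - /dev/kfd   # ROCm compute interface
    group_add:
      - video      # host group granting access to /dev/kfd and /dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
```

As noted above, bring the stack up with `docker compose up --force-recreate` after pulling a new image so the container picks up the updated environment.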