Add amdxdna HAL driver for AMD XDNA NPUs by jtuyls · Pull Request #37 · ROCm/hrx-system

jtuyls · 2026-06-05T14:07:41Z

Introduce an IREE HAL runtime driver targeting AMD XDNA NPUs directly through the in-kernel amdxdna driver's DRM ioctl ABI (kernel-managed queue / KMQ). It is self-contained: a small static user-space shim links into the runtime and talks to the device directly, keeping the dependency surface minimal. The driver is gated by the IREE_HAL_DRIVER_AMDXDNA build option and wired into the HAL driver registry, init.c, and libhrx's GPU-driver selection.

Executable format: a new pdi_executable_def.fbs ("PDIX") schema models an executable as a shared PDI pool plus per-entry-point runs. Each run is an XAie transaction ("TXN") control-code stream with an optional control-packet data_payload (array reconfiguration) and an optional host patch_table. Entry points reference PDIs by index, so several can share a single loaded PDI (e.g. manually merged kernels).
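To make the sharing relationship concrete, here is a minimal Python model of the layout described above. It is only a sketch: the class and field names (`Executable`, `EntryPoint`, `Run`, `pdi_index`, and so on) are illustrative assumptions and are not taken from pdi_executable_def.fbs, and the patch table is left as opaque bytes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One run of an entry point (field names are hypothetical)."""
    txn: bytes                           # XAie transaction (TXN) control-code stream
    data_payload: Optional[bytes] = None  # optional control-packet payload
    patch_table: Optional[bytes] = None   # optional host patch table (opaque here)


@dataclass
class EntryPoint:
    name: str
    pdi_index: int    # index into the shared PDI pool, not an embedded PDI
    runs: List[Run]


@dataclass
class Executable:
    pdis: List[bytes]              # shared PDI pool
    entry_points: List[EntryPoint]


def verify(exe: Executable) -> None:
    """Reject entry points whose PDI index falls outside the pool."""
    for ep in exe.entry_points:
        if not 0 <= ep.pdi_index < len(exe.pdis):
            raise ValueError(f"{ep.name}: pdi_index {ep.pdi_index} out of range")


def referenced_pdis(exe: Executable) -> List[int]:
    """Return the distinct PDI indices used by all entry points, sorted."""
    return sorted({ep.pdi_index for ep in exe.entry_points})


# Two entry points that share one PDI, as with manually merged kernels.
merged = Executable(
    pdis=[b"pdi-blob"],
    entry_points=[
        EntryPoint("matmul", 0, [Run(txn=b"\x00" * 16)]),
        EntryPoint("softmax", 0, [Run(txn=b"\x00" * 16)]),
    ],
)
verify(merged)
```

Because entry points carry an index rather than a PDI blob, `referenced_pdis(merged)` contains a single index even though two entry points are defined.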

Submission paths: by default each command is submitted as ERT_START_CU and the firmware patches shim-DMA addresses. An opt-in path (--amdxdna_cmd_chain) batches a command buffer's dispatches into one ERT_CMD_CHAIN, host-patching the buffer-descriptor addresses from the compiler-emitted patch_table (validated for npu4).
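The host-patching step on the command-chain path can be pictured as follows. This is a simplified Python sketch under stated assumptions: the `PatchEntry` layout (byte offset, argument index, addend) and the single 64-bit little-endian address write are invented for illustration, and the real patch_table encoding and buffer-descriptor address format may differ.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PatchEntry:
    """Hypothetical patch-table entry (not the real PDIX layout)."""
    offset: int      # byte offset of the address field in the control code
    arg_index: int   # which dispatch argument supplies the base address
    addend: int = 0  # byte offset added to the argument's base address


def apply_patches(control_code: bytes,
                  patches: Sequence[PatchEntry],
                  arg_addresses: List[int]) -> bytes:
    """Return a copy of control_code with device addresses written in."""
    patched = bytearray(control_code)
    for p in patches:
        if not 0 <= p.arg_index < len(arg_addresses):
            raise IndexError(f"arg_index {p.arg_index} out of range")
        if p.offset < 0 or p.offset + 8 > len(patched):
            raise ValueError(f"patch offset {p.offset} out of bounds")
        address = (arg_addresses[p.arg_index] + p.addend) & 0xFFFFFFFFFFFFFFFF
        # Assumption: the address is one 64-bit little-endian field.
        struct.pack_into("<Q", patched, p.offset, address)
    return bytes(patched)


# Patch argument 0 (base 0x1000) plus 0x10 into bytes 8..15 of the stream.
patched_stream = apply_patches(bytes(16), [PatchEntry(8, 0, 0x10)], [0x1000])
```

Validating the offset and argument index before writing keeps a malformed patch table from corrupting neighbouring control code, which is the kind of property the patch-table unit tests would exercise.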

Device model: the device uses host-side timeline semaphores and a single-worker async queue that defers HAL queue operations until their waits are satisfied and serializes all NPU access. The PR also adds the allocator, buffers, a no-op executable cache, events, and the Linux/KMQ native binding.
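The deferral behaviour can be sketched with a small single-threaded model. This is not the driver's implementation: `TimelineSemaphore`, `QueueOp`, `AsyncQueue`, and `drain` are hypothetical names, and the worker is modelled as a loop that runs one ready operation at a time rather than as a real thread.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class TimelineSemaphore:
    """Host-side timeline: a payload value that never decreases."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial

    def signal(self, value: int) -> None:
        if value < self.value:
            raise ValueError("timeline values must not decrease")
        self.value = value

    def reached(self, value: int) -> bool:
        return self.value >= value


@dataclass
class QueueOp:
    name: str
    waits: List[Tuple[TimelineSemaphore, int]] = field(default_factory=list)
    signals: List[Tuple[TimelineSemaphore, int]] = field(default_factory=list)


class AsyncQueue:
    """Defers ops until their waits are met; runs them one at a time."""

    def __init__(self) -> None:
        self.pending: List[QueueOp] = []
        self.executed: List[str] = []

    def submit(self, op: QueueOp) -> None:
        self.pending.append(op)

    def drain(self) -> None:
        progressed = True
        while progressed:
            progressed = False
            for op in list(self.pending):
                if all(sem.reached(v) for sem, v in op.waits):
                    self.pending.remove(op)
                    self.executed.append(op.name)  # stands in for NPU work
                    for sem, v in op.signals:
                        sem.signal(v)
                    progressed = True
                    break  # rescan from the oldest pending op


a, b = TimelineSemaphore(), TimelineSemaphore()
queue = AsyncQueue()
queue.submit(QueueOp("C", waits=[(b, 1)]))
queue.submit(QueueOp("B", waits=[(a, 1)], signals=[(b, 1)]))
queue.submit(QueueOp("A", signals=[(a, 1)]))
queue.drain()
```

Although the operations are submitted as C, B, A, the semaphore dependencies force them to execute as A, B, C, and only one operation touches the (simulated) device at a time.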

Shim: the user-space shim under shim/linux/kmq is self-contained, exception-free code rewritten from amd/xdna-driver's shim, plus the verbatim kernel UAPI (amdxdna_accel.h, GPL-2.0-WITH-syscall-note) and ERT ABI (ert.h, dual-licensed) headers. Provenance and per-file licenses are documented in shim/linux/kmq/README.md; per-file SPDX headers are authoritative.

Tests cover the allocator, buffers, async queue, driver, executable parsing/verification, semaphores/events, and the host patch-table logic (TXN op sizing, sentinel constant patching, and address patching).

jtuyls · 2026-06-08T18:20:10Z

@benvanik Could you help review this PR?

Batch each FLM runlist into one HRX ERT_CMD_CHAIN (forward_runlist) instead of one synchronous dispatch per kernel, amortizing per-dispatch submit/completion overhead. On by default in the shim; set FLM_CHAIN=0 to fall back to per-dispatch. Requires the HRX amdxdna command-chain support in ROCm/hrx-system#37.

Measured (Qwen3-0.6B, Strix Point, flm bench): decode 45.1/33.0/16.8/10.0 tok/s at 1k/4k/16k/32k (vs 39.6/30.2/15.0/9.4 per-dispatch).

Adds bench/: a standalone, HRX-only microbenchmark (libhrx.so only, no shim/ runtime) replaying one captured runlist as an ERT_CMD_CHAIN vs separate dispatches. The shim's env-gated FLM_DUMP_RUNLIST capture regenerates the (uncommitted) runlist artifacts locally. Self-contained: references only this branch + an HRX checkout/build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
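For readers who want the relative gain rather than raw throughput, the decode figures quoted above can be turned into speedup ratios. The snippet below only restates those numbers; the variable and function names are illustrative.

```python
# Decode throughput in tok/s, copied from the measurements quoted above.
context_lengths = ["1k", "4k", "16k", "32k"]
chained = [45.1, 33.0, 16.8, 10.0]        # one ERT_CMD_CHAIN per runlist
per_dispatch = [39.6, 30.2, 15.0, 9.4]    # one dispatch per kernel


def speedups(new, old, labels):
    """Map each label to the ratio new/old."""
    return {label: n / o for label, n, o in zip(labels, new, old)}


ratios = speedups(chained, per_dispatch, context_lengths)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f}x ({(ratio - 1) * 100:.1f}% faster)")
```

On these figures the chained path is roughly 6% to 14% faster, with the largest gain at the shortest context length.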

Adds the amdxdna HAL driver with Linux KMQ and Windows MCDM native shims, PDIX and XADX executable schemas, async queue execution, transfer-queue support, native command recording/submission paths, HRX runtime integration, and amdxdna host CI coverage. Includes unit and CTS coverage for allocator/buffer/device/event/semaphore/executable paths, command-buffer planning and caches, transfer queue behavior, XADX/PDIX artifact handling, and platform shim utilities.

jtuyls force-pushed the amdxdna-hal-native branch 3 times, most recently from e4dc83e to 565102b, on June 5, 2026 20:29

jtuyls requested a review from benvanik on June 5, 2026 20:31

jtuyls force-pushed the amdxdna-hal-native branch 2 times, most recently from c59c420 to 24bcc85, on June 15, 2026 19:36

jtuyls changed the title from ~~Add amdxdna HAL driver for AMD XDNA NPUs (Linux KMQ)~~ to Add amdxdna HAL driver for AMD XDNA NPUs on Jun 15, 2026

jtuyls mentioned this pull request on Jun 15, 2026 in Make core HRX build helpers Windows-safe #97 (Merged)

jtuyls force-pushed the amdxdna-hal-native branch 6 times, most recently from 584a451 to 9df8d43, on June 17, 2026 11:12

jtuyls force-pushed the amdxdna-hal-native branch from 9df8d43 to 637809f on June 17, 2026 11:54

jtuyls wants to merge 1 commit into ROCm:main from jtuyls:amdxdna-hal-native
