gozel is a pure-Go binding for the Intel oneAPI Level Zero API — enabling direct GPU/NPU compute from Go without cgo.
Built on purego and Windows syscall, gozel loads ze_loader at runtime via FFI, avoiding all C compiler dependencies. The entire API surface is auto-generated from the official Level Zero SDK headers, keeping bindings always in sync with upstream.
- Projects Using gozel
- Features
- Platform Support
- Quick Start
- The
zePackage — High-Level API - Contributing
- License
- Appendix I. Architecture
- Appendix II. Regenerating Bindings from Headers
- Appendix III. Building SPIR-V Kernels
We maintain a list of projects built with gozel. If your project uses this library, please open a PR to add it here!
| Project | Description | Link |
|---|---|---|
- No cgo — Pure Go via purego and Windows syscall. No C toolchain required at build time.
- Auto-generated — All 200+ Level Zero functions generated directly from the official C headers, covering Core, Sysman, Tools and Runtime APIs.
- High-level wrapper — The
zesub-package provides an idiomatic Go API over the raw bindings: typed handles, error returns, automatic resource lifetime. - Kernel in Go — Embed SPIR-V binaries with
//go:embed, load and launch GPU kernels entirely from Go. - Extensible — Add new examples, wrap more APIs, or plug in your own SPIR-V pipelines.
gozel targets all Intel GPUs including Intel Arc / Iris Xe / Data Center GPUs that ship a Level Zero driver. Any device visible to
ze_loadershould work.
| OS | Architecture | Runtime Requirement |
|---|---|---|
| Windows | amd64 | Intel GPU driver with ze_loader.dll |
| Linux | amd64 | Intel compute-runtime with libze_loader.so |
| More | under | testing... |
go get github.com/fumiama/gozelpackage main
import (
"fmt"
"github.com/fumiama/gozel/ze"
)
func main() {
gpus, err := ze.InitGPUDrivers()
if err != nil {
panic(err)
}
fmt.Printf("Found %d GPU driver(s)\n", len(gpus))
devs, _ := gpus[0].DeviceGet()
for _, d := range devs {
prop, _ := d.DeviceGetProperties()
fmt.Printf(" Device: %s\n", string(prop.Name[:]))
}
// Found 1 GPU driver(s)
// Device: Graphics
}This example adds up two large float32 vectors.
cd examples/vadd
# go generate # Optional, requires SYCL toolchain (see below)
go run main.goYou will see the result like
=============== Device Basic Properties ===============
Running on device: ID = 1 , Name = Graphics @ 4.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z): (1024, 1024, 1024)
Max Group Count (X, Y, Z): (4294967295, 4294967295, 4294967295)
Max Total Group Size: 1024
Max Shared Local Memory: 65536
Num Subgroup Sizes: 3
Subgroup Sizes: [8 16 32 0 0 0 0 0]
=============== Computation Configuration ===============
Group Size (X, Y, Z): (1024, 1, 1)
Group Count: 65536
Total Elements (N): 67108864
Buffer Size: 256 MiB
=============== Calculation Results ===============
GPU Execution Time: 56.114500 ms
GPU Throughput: 4.78 GiB/s
=============== Validation Results ===============
CPU Execution Time: 77.190700 ms
CPU Throughput: 3.48 GiB/s
Test Passed!!!| Example | Description | Source |
|---|---|---|
| vadd | Vector addition — GPU kernel launch, memory copy, validation | examples/vadd |
All examples used ze sub-package, which wraps the raw Level Zero bindings into idiomatic Go with typed handles and method chains:
// Initialize
gpus, _ := ze.InitGPUDrivers()
// Context + Device
ctx, _ := gpus[0].ContextCreate()
devs, _ := gpus[0].DeviceGet()
// Memory
hostBuf, _ := ctx.MemAllocHost(size, alignment)
devBuf, _ := ctx.MemAllocDevice(devs[0], size, alignment)
defer ctx.MemFree(hostBuf)
defer ctx.MemFree(devBuf)
// Module + Kernel
mod, _ := ctx.ModuleCreate(devs[0], spirvBytes)
krn, _ := mod.KernelCreate("my_kernel")
krn.SetArgumentValue(0, unsafe.Sizeof(uintptr(0)), unsafe.Pointer(&devBuf))
krn.SetGroupSize(256, 1, 1)
// Command submission
q, _ := ctx.CommandQueueCreate(devs[0])
cl, _ := ctx.CommandListCreate(devs[0])
cl.AppendMemoryCopy(devBuf, hostBuf, size)
cl.AppendBarrier()
cl.AppendLaunchKernel(krn, &gozel.ZeGroupCount{Groupcountx: 4, Groupcounty: 1, Groupcountz: 1})
cl.AppendBarrier()
cl.Close()
q.ExecuteCommandLists(cl)
q.Synchronize()Compared to the raw gozel.ZeCommandListCreate(...) calls, the ze package:
- Eliminates boilerplate descriptor initialization (struct types,
Stypefields) - Returns Go
errorinstead ofZeResultcodes - Provides method syntax on handles (
ctx.CommandListCreate(dev)vs standalone functions) - Manages handle lifetimes with
defer-friendlyDestroy()methods
The ze package currently covers the most common workflows. Contributions to expand coverage are very welcome — see Contributing.
🙌 Have an example to share? We'd love to grow this table — see Contributing.
Contributions of all kinds are welcome. Some particularly impactful areas:
- Examples — Add new
examples/demonstrating different GPU compute patterns (matrix multiply, reduction, image processing, etc.). Every example helps new users get started faster. zepackage coverage — The high-level wrapper doesn't cover the full Level Zero surface yet. Wrapping additional APIs (events, fences, images, sysman queries, etc.) directly benefits all downstream users.- Testing — Help improve test coverage across packages.
- Documentation — Improve godoc comments, add usage guides, or translate documentation.
- Project showcase — If you use gozel in your project, open a PR to add it to the Projects Using gozel table above.
This project is licensed under the GNU Affero General Public License v3.0.
The FFI layer (internal/zecall) loads the Level Zero loader library at runtime, caches procedure addresses, and dispatches calls through purego or Windows syscall. All pointer arguments are protected against GC collection during syscalls via go:uintptrescapes.
gozel/
├── api.go # Auto-generated: registers all L0 functions at init
├── core_*.go # Auto-generated: Core API bindings (ze*)
├── rntm_*.go # Auto-generated: Runtime API bindings (zer*)
├── sysm_*.go # Auto-generated: Sysman API bindings (zes*)
├── tols_*.go # Auto-generated: Tools API bindings (zet*)
├── ze/ # High-level idiomatic Go wrapper
│ ├── init.go # Driver initialization
│ ├── context.go # Context management
│ ├── device.go # Device enumeration & properties
│ ├── module.go # SPIR-V module loading
│ ├── kernel.go # Kernel creation & argument binding
│ ├── mem.go # Device/host memory allocation
│ └── command.go # Command queues, lists, barriers
├── internal/zecall/ # purego FFI layer (loads ze_loader at runtime)
├── cmd/gen/ # Code generator: parses L0 headers → Go source
├── spec/ # Optional L0 SDK headers for dev purpose (input to cmd/gen)
└── examples/
├── quick_start/ # The quick start shown in this README
└── vadd/ # Vector addition: SYCL kernel + Go host
gozel includes a code generator (cmd/gen) that parses the four Level Zero API headers and produces all core_*.go, rntm_*.go, sysm_*.go, tols_*.go files plus api.go.
Place (or symlink) the Level Zero SDK under spec/:
spec/
├── include/level_zero/
│ ├── ze_api.h # Core
│ ├── zer_api.h # Runtime
│ ├── zes_api.h # Sysman
│ └── zet_api.h # Tools
└── lib/
Then run:
go run ./cmd/gen -spec ./specThe generator can download the SDK directly from the level-zero releases:
go run ./cmd/gen -spec v1.28.0go generate .This invokes cmd/gen with the local spec/ directory as configured in doc.go.
GPU kernels in the examples folder are written in SYCL C++ and compiled to SPIR-V for embedding into Go programs, which is a little bit hacky. You can also use ocloc, which is a common practice and you can search for the build doc elsewhere. The build pipeline uses go generate directives:
main.cpp ──clang++ -fsycl──▶ device_kern.bc
│
sycl-post-link
│
▼ device_kern_0.bc
│
clang++ -emit-llvm -S
│
▼ device_kern.ll
│
llvm-spirv
│
▼ main.spv ← embedded via //go:embed
- Intel LLVM SYCL Compiler (provides
clang++with SYCL support,sycl-post-link, etc.)
cd examples/vadd
go generate # compiles main.cpp → main.spv
go run main.go # runs vector addition on GPU