Merged

30 commits
a3bb362
Merge pull request #35 from sharhar/dev
sharhar Mar 12, 2026
500cf6a
fixed shader decorators so they use the shader context API
sharhar Mar 20, 2026
9ea0712
renamed shader playground
sharhar Mar 21, 2026
9d260f9
Added OpenCL codegen option in docs shader playground
sharhar Mar 21, 2026
b6d831a
shader signature refactoring
sharhar Mar 23, 2026
a1508f2
Changed MoltenVK version to patched version
sharhar Mar 24, 2026
983071b
removed my fork of VkFFT
sharhar Mar 24, 2026
9b7a743
some more codegen reorg
sharhar Mar 24, 2026
a13dc72
Paid off more technical debt
sharhar Mar 24, 2026
52e8918
v0.0.36
sharhar Mar 27, 2026
1104e78
moved context.py from shaders/ to codegen/
sharhar Mar 28, 2026
7ffd19d
more code cleanup
sharhar Mar 28, 2026
47bd25e
codegen cleanup
sharhar Mar 28, 2026
32dfe1e
glsl fix
sharhar Mar 28, 2026
721b211
fixed typos
sharhar Mar 28, 2026
753c15f
reworked signal stuff
sharhar Mar 28, 2026
071a5e8
deleted old comments
sharhar Mar 28, 2026
bad8604
Added deterministic exec_size examples in docs
sharhar Mar 28, 2026
820054d
README update
sharhar Mar 28, 2026
36a54e0
getting started docs update
sharhar Mar 28, 2026
ab3e9dd
Working to remove asserts from base submodule
sharhar Mar 28, 2026
93df2b5
Working to remove asserts from fft submodule
sharhar Mar 28, 2026
c3cfdf4
Working to remove asserts from vkfft submodule
sharhar Mar 28, 2026
4e79b58
finished removing first pass of asserts
sharhar Mar 28, 2026
8889452
finished removing codegen asserts
sharhar Mar 28, 2026
173aad1
fixed opencl ravel
sharhar Mar 29, 2026
894cf14
added missing m3 abbreviation
sharhar Mar 29, 2026
49cc9d5
moved to context based control flow blocks
sharhar Mar 29, 2026
e90be6f
control flow bug fix
sharhar Mar 29, 2026
7a7e4cf
Version 0.1.0
sharhar Mar 29, 2026
204 changes: 163 additions & 41 deletions README.md
# vkdispatch

`vkdispatch` is a Python GPU computing framework for writing single-source kernels in Python and dispatching them across multiple runtime backends.

It combines runtime code generation, execution helpers, and FFT/reduction utilities in one package. The default PyPI install ships with the Vulkan backend. CUDA and OpenCL backends can be enabled with optional runtime dependencies.

## Highlights

- Single-source Python shaders via `@vd.shader` and `vkdispatch.codegen`
- Multiple runtime backends: Vulkan, CUDA, OpenCL, and a dummy codegen-only backend
- Backend-aware code generation: GLSL for Vulkan, CUDA source for CUDA, and OpenCL C for OpenCL
- Native FFT workflows through `vd.fft`, including mapping hooks for fusion and custom I/O
- VkFFT-backed transforms through `vd.vkfft` on the Vulkan backend
- Reductions through `vd.reduce`
- Batched submission and deferred execution through `vd.CommandGraph`
- CUDA interop through `__cuda_array_interface__` and CUDA Graph capture helpers
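
For context, `__cuda_array_interface__` (the Numba-defined CUDA Array Interface) is just a small dict protocol that any CUDA-aware library can consume. A minimal sketch for a 1-D float32 buffer follows; the device pointer is a placeholder, not a real allocation, and this is not vkdispatch-specific code:

```python
# Sketch of a CUDA Array Interface (v3) dict for a 1-D float32 buffer.
# The device pointer below is a placeholder, not a real allocation.
iface = {
    "shape": (1024,),                 # one dimension, 1024 elements
    "typestr": "<f4",                 # little-endian 32-bit float
    "data": (0x7F00DEAD0000, False),  # (device pointer, read-only flag)
    "version": 3,
    "strides": None,                  # None => C-contiguous
}

# A consumer can derive the allocation size directly from the dict:
print(iface["shape"][0] * 4)  # 4096 bytes
```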

## Installation

### Default Vulkan Install

To install `vkdispatch` with the Vulkan backend, run:

```bash
# Install the package
pip install vkdispatch
```

This installs the core library, the code generation system, and the Vulkan runtime backend. The Vulkan backend is designed to run on systems supporting Vulkan 1.2 or higher, including macOS via a statically linked MoltenVK. Alternate backends can be added with optional dependencies as described below.

On mainstream platforms - Windows (`x86_64`), macOS (`x86_64` and Apple Silicon/`arm64`), and Linux (`x86_64`) - `pip` will usually download a prebuilt wheel, so no compiler is needed.

On less common platforms, `pip` may fall back to a source build, which takes a few minutes. See [Building From Source](https://sharhar.github.io/vkdispatch/tutorials/building_from_source.html) for toolchain requirements and developer-oriented instructions.

### Core package

If you only need the codegen component, or you only plan to use the CUDA or OpenCL backends, install the core package:

```bash
pip install vkdispatch-core
```

This installs the core library and codegen components, but not the Vulkan runtime backend. To enable runtime features beyond pure codegen, install the optional dependencies below.

### Optional components

- Optional CLI: `pip install vkdispatch-core[cli]`
- CUDA runtime backend: `pip install vkdispatch-core[cuda]`
- OpenCL runtime backend: `pip install vkdispatch-core[opencl]`

## Runtime backends

`vkdispatch` currently supports these runtime backends:

- `vulkan`
- `cuda`
- `opencl`
- `dummy`

If you do not explicitly select a backend, `vkdispatch` prefers Vulkan. If the Vulkan backend cannot be imported because it is not installed, initialization falls back to CUDA and then OpenCL.
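
This fallback behavior can be sketched as follows. This models the preference order described above; it is not vkdispatch's actual implementation or API:

```python
# Illustrative sketch of the documented preference order:
# Vulkan first, then CUDA, then OpenCL.

PREFERENCE = ["vulkan", "cuda", "opencl"]

def resolve_backend(requested=None, available=()):
    """Return the backend to use: an explicit request wins, otherwise
    the first available backend in preference order."""
    if requested is not None:
        return requested  # explicit selection, including "dummy"
    for name in PREFERENCE:
        if name in available:
            return name
    raise RuntimeError("no runtime backend available")

# With Vulkan missing, initialization falls back to CUDA:
print(resolve_backend(available=("cuda", "opencl")))  # cuda
```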

You can select a backend explicitly in Python:

```python
import vkdispatch as vd

vd.initialize(backend="vulkan")
# vd.initialize(backend="cuda")
# vd.initialize(backend="opencl")
# vd.initialize(backend="dummy")
```

You can also select the backend with an environment variable:

```bash
export VKDISPATCH_BACKEND=vulkan
```

The dummy backend is useful for codegen-only workflows, source inspection, and development environments where no GPU runtime is available.

There are two intended shader-generation modes:

- Default mode: generate for the current machine/runtime. This is the normal path and is how `vkdispatch` picks backend-specific defaults and limits.
- Custom mode: initialize with `backend="dummy"` and optionally tune the dummy device limits when you want controlled codegen without relying on the current runtime.

## Verifying your installation

If you installed the optional CLI, you can list devices with:

```bash
vdlist

# Explicit backend selection can be done with command-line flags:
vdlist --vulkan
vdlist --cuda
vdlist --opencl
```

You can always inspect devices from Python:

```python
import vkdispatch as vd

for device in vd.get_devices():
    print(device.get_info_string())
```

The reported version label depends on the active backend:

- Vulkan devices show a Vulkan version
- CUDA devices show CUDA compute capability
- OpenCL devices show an OpenCL version

## Quick start

The example below defines and dispatches a simple in-place compute kernel written in Python:

```python
import numpy as np
import vkdispatch as vd
import vkdispatch.codegen as vc
from vkdispatch.codegen.abbreviations import Buff, Const, f32

# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader("buff.size")
def add_scalar(buff: Buff[f32], bias: Const[f32]):
    tid = vc.global_invocation_id().x
    buff[tid] = buff[tid] + bias

arr = np.arange(8, dtype=np.float32)
buff = vd.asbuffer(arr)

# If you want a non-default backend, call vd.initialize(backend=...) first.
add_scalar(buff, 1.5)

print(buff.read(0))
```
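
For reference, the kernel above is the GPU analogue of this element-wise NumPy update, so the printed buffer contents should match:

```python
import numpy as np

# CPU reference for the kernel: add the bias to every element.
arr = np.arange(8, dtype=np.float32)
expected = arr + np.float32(1.5)  # what add_scalar(buff, 1.5) computes
print(expected)  # [1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
```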

String launch sizing is the shortest form and is kept for convenience. If you want
the launch rule to be more explicit and deterministic, use the equivalent lambda form
instead: `@vd.shader(exec_size=lambda args: args.buff.size)`.
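
Whichever form you use, the resolved `exec_size` determines how many workgroups get launched. As a generic GPU-dispatch sketch (not vkdispatch internals), the mapping is a ceiling division by the workgroup size:

```python
def workgroup_count(exec_size: int, workgroup_size: int) -> int:
    """Workgroups needed to cover exec_size invocations (ceiling division)."""
    return (exec_size + workgroup_size - 1) // workgroup_size

print(workgroup_count(8, 128))     # 1  -- the 8-element buffer above
print(workgroup_count(1000, 128))  # 8  -- 7.8125 rounds up
```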

In normal usage, `vkdispatch` initializes itself and creates a default context on first runtime use. Call `vd.initialize()` and `vd.make_context()` manually only when you want non-default settings such as backend selection, custom device selection, debug logging, or multi-device Vulkan contexts.

## Codegen-Only Workflows

If you want generated source without compiling or dispatching it on the current machine, use the dummy backend explicitly:

```python
import vkdispatch as vd
import vkdispatch.codegen as vc
from vkdispatch.codegen.abbreviations import Buff, Const, f32

vd.initialize(backend='dummy')
vd.set_dummy_context_params(
    subgroup_size=32,
    max_workgroup_size=(128, 1, 1),
    max_workgroup_count=(65535, 65535, 65535),
)
vc.set_codegen_backend('cuda')

# @vd.shader(exec_size=lambda args: args.buff.size)
@vd.shader('buff.size')
def add_scalar(buff: Buff[f32], bias: Const[f32]):
    tid = vc.global_invocation_id().x
    buff[tid] = buff[tid] + bias

src = add_scalar.get_src(line_numbers=True)
print(src)
```

In this mode, `vkdispatch` uses the dummy device model for launch/layout defaults and emits source for the backend selected with `vc.set_codegen_backend(...)`.

## Documentation

The docs site is still under active development, but the main entry points are here:

- [Getting Started](https://sharhar.github.io/vkdispatch/getting_started.html)
- [Tutorials](https://sharhar.github.io/vkdispatch/tutorials/index.html)
- [Python API Reference](https://sharhar.github.io/vkdispatch/python_api.html)

Some especially useful tutorials:

- [Shader Authoring and Dispatch](https://sharhar.github.io/vkdispatch/tutorials/shader_tutorial.html)
- [Initialization and Context Creation](https://sharhar.github.io/vkdispatch/tutorials/context_system.html)
- [Command Graph Recording](https://sharhar.github.io/vkdispatch/tutorials/command_graph_tutorial.html)
- [Reductions and FFT Workflows](https://sharhar.github.io/vkdispatch/tutorials/reductions_and_fft.html)

Happy GPU programming!