Deep learning, well done.
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.
Alpha software. APIs may change between minor versions.
```bash
pip install grilly
```

For GPU acceleration (requires Vulkan SDK and C++ toolchain):

```bash
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .   # Windows
# cp build/grilly_core.*.so .          # Linux
```

Pre-built C++ extension (Windows x64 only):
Download grilly_core.cp312-win_amd64.pyd from the latest release and place it in your grilly install directory:
```bash
# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"

# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/
```

Without the C++ extension, grilly works fully via pure Python + numpy fallbacks, just without GPU acceleration.
See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.
| | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.1+ | Latest drivers |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).
```python
import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)

model.zero_grad()
model.backward(grad)
optimizer.step()
```

Autograd:

```python
from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
```

Functional API:

```python
import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
```

Architecture:

```
Python (VulkanTensor)  →  C++ Bridge (grilly_core)  →  Vulkan Compute Shaders
nn/ modules                pybind11 bindings             190 SPIR-V shaders
functional/ ops            dual-validity GPU/CPU         AMD / NVIDIA / Intel
optim/                     zero CPU↔GPU ping-pong        No CUDA needed
```
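For reference, `F.flash_attention2` and friends compute the same result as standard scaled dot-product attention; flash attention only changes *how* it is computed (tiled so intermediates stay in fast memory). A plain numpy sketch of the reference computation, purely illustrative and not grilly's shader code:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sdpa(q, k, v):
    # Reference scaled dot-product attention:
    # softmax(q @ k^T / sqrt(d)) @ v
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8)).astype(np.float32)  # (batch, seq, dim)
k = rng.standard_normal((2, 4, 8)).astype(np.float32)
v = rng.standard_normal((2, 4, 8)).astype(np.float32)
out = sdpa(q, k, v)
print(out.shape)  # (2, 4, 8)
```

A flash-attention kernel produces the same `out` up to floating-point error, without ever materializing the full `scores` matrix.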
Package layout:

```
grilly/
├── backend/       # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/           # C++ pybind11 extension — grilly_core native ops
├── nn/            # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/    # Stateless F.* API (mirrors torch.nn.functional)
├── optim/         # Optimizers and LR schedulers
├── utils/         # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/       # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/  # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/         # 1,820 tests
```
- C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
- Flash Attention 3 with subgroup acceleration
- HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
- TAPPA q-similarity for adaptive KV cache eviction
- HDC packed ops — 32x memory compression + block-code circular convolution
- Sanger GHA for neurogenesis
- DisARM gradient estimator
- JIT compilation framework (`@grilly.jit`)
- Automatic Mixed Precision (`autocast` + `GradScaler`)
- ProjectionHeads for structured embeddings
- StreamingPipeline for batched embed + upload
- `bindings.cpp` refactored into 11 focused files
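The "HDC packed ops — 32x memory compression" bullet refers to a standard hyperdimensional-computing trick: a binary hypervector is stored at 1 bit per dimension instead of one float32, and binding becomes a bitwise XOR. A minimal numpy sketch of the idea (illustrative only; function names are not grilly's API, and grilly's block-code circular convolution is not shown):

```python
import numpy as np

def pack(bits):
    # Pack a {0,1} hypervector into bytes: 1 bit per dimension,
    # a 32x reduction versus storing each dimension as float32
    return np.packbits(bits.astype(np.uint8))

def xor_bind(a, b):
    # Binding on packed vectors is a plain bitwise XOR
    return np.bitwise_xor(a, b)

def hamming(a, b):
    # Distance on packed vectors via popcount of the XOR
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
d = 1024
a = rng.integers(0, 2, d)
b = rng.integers(0, 2, d)
pa, pb = pack(a), pack(b)

bound = xor_bind(pa, pb)
recovered = xor_bind(bound, pb)  # XOR is self-inverse: recovers a exactly
print(pa.nbytes)  # 128 bytes for 1024 dimensions (vs 4096 as float32)
```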
| Category | Modules |
|---|---|
| Linear | Linear, Embedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish |
| Attention | FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE |
| LoRA | LoRALinear, LoRAAttention, LoRAModel |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
- Neuron models: `IFNode`, `LIFNode`, `ParametricLIFNode`
- Surrogate gradients: `ATan`, `Sigmoid`, `FastSigmoid`
- Temporal containers: `SeqToANNContainer`, `MultiStepContainer`
- ANN-to-SNN conversion: `Converter`, `VoltageScaler`
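To make the neuron-model and surrogate-gradient pieces concrete, here is a minimal leaky integrate-and-fire step with an ATan surrogate in numpy. This is a sketch of the standard mechanics, not grilly's implementation; the constants and function names are illustrative:

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    # Leaky integrate-and-fire: membrane potential decays toward the
    # input, emits a spike and hard-resets when it crosses threshold
    v = v + (x - v) / tau
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)
    return spike, v

def atan_surrogate(v, v_threshold=1.0, alpha=2.0):
    # ATan surrogate gradient: a smooth stand-in for the derivative of
    # the (non-differentiable) spike step, used on the backward pass
    u = np.pi / 2 * alpha * (v - v_threshold)
    return alpha / (2 * (1 + u ** 2))

v = np.zeros(4, dtype=np.float32)
spikes = []
for t in range(5):
    s, v = lif_step(v, np.full(4, 1.5, dtype=np.float32))
    spikes.append(s)

rate = np.stack(spikes).mean(axis=0)  # firing rate over 5 steps
```

With a constant drive of 1.5 the neuron charges for one step, fires on the next, resets, and repeats, giving a regular firing pattern.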
AdamW, Adam, SGD, NLMS, NaturalGradient, AutoHypergradientAdamW (OSGM-style auto LR), plus schedulers: StepLR, CosineAnnealingLR, ReduceLROnPlateau.
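For orientation, the update that distinguishes AdamW from classic Adam is *decoupled* weight decay: the decay term is applied directly to the weights instead of being folded into the gradient. A numpy sketch of one step under standard AdamW semantics (not grilly's optimizer code; the toy gradient is illustrative):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction for the zero-initialized moments
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: weight_decay * p is added to the step
    # directly, not mixed into g as in Adam + L2 regularization
    p = p - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v

p = np.ones(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 4):
    g = 0.1 * p  # toy gradient pointing away from zero
    p, m, v = adamw_step(p, g, m, v, t)
print(p)  # slightly below 1.0 after three steps
```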
| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend — from_pretrained → Vulkan inference |
| CubeMind | Neuro-vector-symbolic reasoning powered by grilly 0.5.0 |
```bash
uv run pytest tests/ -v                          # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v             # CPU-only
uv run pytest tests/ --cov=. --cov-report=term   # with coverage
```

| Variable | Description | Default |
|---|---|---|
| `VK_GPU_INDEX` | Select GPU by index | 0 |
| `GRILLY_DEBUG` | Enable debug logging (1 = on) | off |
| `ALLOW_CPU_VULKAN` | Allow Mesa llvmpipe software Vulkan | off |
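For example, to run on the second GPU with debug logging enabled (`train.py` stands in for your own script):

```bash
VK_GPU_INDEX=1 GRILLY_DEBUG=1 python train.py
```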
- Fork the repo and create a feature branch
- Add tests for new features
- Run `ruff check .` and `uv run pytest tests/ -v`
- Submit a pull request
MIT License — see LICENSE for details.
