Deep learning, well done.
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.
Alpha software. APIs may change between minor versions.
```bash
pip install grilly
```

For GPU acceleration (requires Vulkan SDK and C++ toolchain):

```bash
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .   # Windows
# cp build/grilly_core.*.so .          # Linux
```

Pre-built C++ extension (Windows x64 only):
Download grilly_core.cp312-win_amd64.pyd from the latest release and place it in your grilly install directory:
```bash
# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"

# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/
```

Without the C++ extension, grilly works fully via pure Python + numpy fallbacks, just without GPU acceleration.
See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.
| | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.1+ | Latest drivers |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).
```python
import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)

model.zero_grad()
model.backward(grad)
optimizer.step()
```

Autograd:

```python
from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
```

Functional API:

```python
import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
```

Architecture:

```
Python (VulkanTensor)  →  C++ Bridge (grilly_core)  →  Vulkan Compute Shaders
nn/ modules                pybind11 bindings             190 SPIR-V shaders
functional/ ops            dual-validity GPU/CPU         AMD / NVIDIA / Intel
optim/                     zero CPU↔GPU ping-pong        No CUDA needed
```
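For reference, `F.flash_attention2` and friends compute the same result as standard scaled dot-product attention; flash attention only changes *how* it is computed (tiled so intermediates stay in fast memory). A plain numpy sketch of the reference computation, purely illustrative and not grilly's shader code:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sdpa(q, k, v):
    # Reference scaled dot-product attention:
    # softmax(q @ k^T / sqrt(d)) @ v
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8)).astype(np.float32)  # (batch, seq, dim)
k = rng.standard_normal((2, 4, 8)).astype(np.float32)
v = rng.standard_normal((2, 4, 8)).astype(np.float32)
out = sdpa(q, k, v)
print(out.shape)  # (2, 4, 8)
```

A flash-attention kernel produces the same `out` up to floating-point error, without ever materializing the full `scores` matrix.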
Package layout:

```
grilly/
├── backend/       # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/           # C++ pybind11 extension — grilly_core native ops
├── nn/            # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/    # Stateless F.* API (mirrors torch.nn.functional)
├── optim/         # Optimizers and LR schedulers
├── utils/         # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/       # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/  # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/         # 1,820 tests
```
- C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
- Flash Attention 3 with subgroup acceleration
- HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
- TAPPA q-similarity for adaptive KV cache eviction
- HDC packed ops — 32x memory compression + block-code circular convolution
- Sanger GHA for neurogenesis
- DisARM gradient estimator
- JIT compilation framework (`@grilly.jit`)
- Automatic Mixed Precision (`autocast` + `GradScaler`)
- ProjectionHeads for structured embeddings
- StreamingPipeline for batched embed + upload
- `bindings.cpp` refactored into 11 focused files
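The "HDC packed ops — 32x memory compression" bullet refers to a standard hyperdimensional-computing trick: a binary hypervector is stored at 1 bit per dimension instead of one float32, and binding becomes a bitwise XOR. A minimal numpy sketch of the idea (illustrative only; function names are not grilly's API, and grilly's block-code circular convolution is not shown):

```python
import numpy as np

def pack(bits):
    # Pack a {0,1} hypervector into bytes: 1 bit per dimension,
    # a 32x reduction versus storing each dimension as float32
    return np.packbits(bits.astype(np.uint8))

def xor_bind(a, b):
    # Binding on packed vectors is a plain bitwise XOR
    return np.bitwise_xor(a, b)

def hamming(a, b):
    # Distance on packed vectors via popcount of the XOR
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
d = 1024
a = rng.integers(0, 2, d)
b = rng.integers(0, 2, d)
pa, pb = pack(a), pack(b)

bound = xor_bind(pa, pb)
recovered = xor_bind(bound, pb)  # XOR is self-inverse: recovers a exactly
print(pa.nbytes)  # 128 bytes for 1024 dimensions (vs 4096 as float32)
```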
| Category | Modules |
|---|---|
| Linear | Linear, Embedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish |
| Attention | FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE |
| LoRA | LoRALinear, LoRAAttention, LoRAModel |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
- Neuron models: `IFNode`, `LIFNode`, `ParametricLIFNode`
- Surrogate gradients: `ATan`, `Sigmoid`, `FastSigmoid`
- Temporal containers: `SeqToANNContainer`, `MultiStepContainer`
- ANN-to-SNN conversion: `Converter`, `VoltageScaler`
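To make the neuron-model and surrogate-gradient pieces concrete, here is a minimal leaky integrate-and-fire step with an ATan surrogate in numpy. This is a sketch of the standard mechanics, not grilly's implementation; the constants and function names are illustrative:

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    # Leaky integrate-and-fire: membrane potential decays toward the
    # input, emits a spike and hard-resets when it crosses threshold
    v = v + (x - v) / tau
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)
    return spike, v

def atan_surrogate(v, v_threshold=1.0, alpha=2.0):
    # ATan surrogate gradient: a smooth stand-in for the derivative of
    # the (non-differentiable) spike step, used on the backward pass
    u = np.pi / 2 * alpha * (v - v_threshold)
    return alpha / (2 * (1 + u ** 2))

v = np.zeros(4, dtype=np.float32)
spikes = []
for t in range(5):
    s, v = lif_step(v, np.full(4, 1.5, dtype=np.float32))
    spikes.append(s)

rate = np.stack(spikes).mean(axis=0)  # firing rate over 5 steps
```

With a constant drive of 1.5 the neuron charges for one step, fires on the next, resets, and repeats, giving a regular firing pattern.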
AdamW, Adam, SGD, NLMS, NaturalGradient, AutoHypergradientAdamW (OSGM-style auto LR), plus schedulers: StepLR, CosineAnnealingLR, ReduceLROnPlateau.
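For orientation, the update that distinguishes AdamW from classic Adam is *decoupled* weight decay: the decay term is applied directly to the weights instead of being folded into the gradient. A numpy sketch of one step under standard AdamW semantics (not grilly's optimizer code; the toy gradient is illustrative):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction for the zero-initialized moments
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: weight_decay * p is added to the step
    # directly, not mixed into g as in Adam + L2 regularization
    p = p - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v

p = np.ones(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 4):
    g = 0.1 * p  # toy gradient pointing away from zero
    p, m, v = adamw_step(p, g, m, v, t)
print(p)  # slightly below 1.0 after three steps
```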
| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend — from_pretrained → Vulkan inference |
| CubeMind | Neuro-vector-symbolic reasoning powered by grilly 0.5.0 |
```bash
uv run pytest tests/ -v                          # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v             # CPU-only
uv run pytest tests/ --cov=. --cov-report=term   # with coverage
```

| Variable | Description | Default |
|---|---|---|
| `VK_GPU_INDEX` | Select GPU by index | 0 |
| `GRILLY_DEBUG` | Enable debug logging (1 = on) | off |
| `ALLOW_CPU_VULKAN` | Allow Mesa llvmpipe software Vulkan | off |
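For example, to run on the second GPU with debug logging enabled (`train.py` stands in for your own script):

```bash
VK_GPU_INDEX=1 GRILLY_DEBUG=1 python train.py
```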
- Fork the repo and create a feature branch
- Add tests for new features
- Run `ruff check .` and `uv run pytest tests/ -v`
- Submit a pull request
MIT License — see LICENSE for details.
