A lightweight C++ tensor library implementing core PyTorch-like operations for CPU. Designed as a minimal foundation for building neural networks without the complexity of full frameworks.
Tensor Operations
- N-dimensional tensor data structure with automatic stride calculation
- Element-wise operations (add, sub, mul, div, neg, exp, log, pow, sqrt, abs, clamp) with NumPy-style broadcasting
- Matrix multiplication with batch dimension support; flash attention
- Convolution and pooling: conv2d, max_pool2d, avg_pool2d
- Reductions: sum, mean, variance, argmax, softmax
- Reshape, transpose, cat, stack, slice, pad
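To make "automatic stride calculation" and "NumPy-style broadcasting" concrete, here is a minimal standalone sketch of both rules. It does not use torchlite's API: the function names `contiguous_strides` and `broadcast_shape` are hypothetical, and the sketch assumes a row-major (C-contiguous) layout, which this README does not state explicitly.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of torchlite.
// Row-major strides in elements: the last dimension has stride 1 and each
// earlier stride is the product of all later extents.
std::vector<std::size_t> contiguous_strides(const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> strides(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * shape[i];
    }
    return strides;
}

// Hypothetical helper, not part of torchlite.
// NumPy broadcasting rule: align shapes from the right; two extents are
// compatible when they are equal or when one of them is 1.
std::vector<std::size_t> broadcast_shape(const std::vector<std::size_t>& a,
                                         const std::vector<std::size_t>& b) {
    const std::size_t rank = std::max(a.size(), b.size());
    std::vector<std::size_t> out(rank, 1);
    for (std::size_t k = 0; k < rank; ++k) {
        // Missing leading dimensions behave as extent 1.
        const std::size_t da = k < a.size() ? a[a.size() - 1 - k] : 1;
        const std::size_t db = k < b.size() ? b[b.size() - 1 - k] : 1;
        if (da != db && da != 1 && db != 1) {
            throw std::invalid_argument("shapes are not broadcastable");
        }
        out[rank - 1 - k] = (da == 1) ? db : da;
    }
    return out;
}
```

Under these assumptions a tensor of shape (2, 3, 4) has strides (12, 4, 1), and adding a (4, 1, 3) tensor to a (5, 3) tensor yields a (4, 5, 3) result.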
Neural Network Modules
- Linear, Conv2d, MaxPool2d, AvgPool2d, Flatten
- LayerNorm, BatchNorm2d, InputNormalize, Dropout
- MultiHeadAttention, TransformerEncoderLayer, TransformerEncoder
- PositionalEncoding
- Activation functions: ReLU, GELU, Sigmoid
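For reference, the three activation functions listed above can be written as scalar functions. This is a sketch of the standard mathematical definitions, not torchlite's implementation. In particular, it assumes the exact erf-based form of GELU; the library may use the tanh approximation instead.

```cpp
#include <algorithm>
#include <cmath>

// Standard definitions, written as free functions for illustration only.

// ReLU: max(0, x).
float relu(float x) { return std::max(0.0f, x); }

// Sigmoid: 1 / (1 + e^{-x}).
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```

A tensor-level activation would apply the same function to every element.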
Training
- Reverse-mode autograd (backward computation graph) over all operations
- Optimizers: SGD, Adam, AdamW
- Loss functions: MSE, MAE, BCE, NLL, Cross-entropy
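The idea behind reverse-mode autograd can be shown on scalars. The sketch below is not torchlite's autograd: `Node`, `leaf`, `add`, `mul`, and `backward` are hypothetical names chosen for illustration. Each operation records its inputs and a closure that propagates gradients to them, and `backward` replays those closures in reverse topological order.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// One value in the computation graph (hypothetical type, scalar only).
struct Node {
    float value = 0.0f;
    float grad = 0.0f;
    std::vector<NodePtr> parents;            // inputs this node was computed from
    std::function<void(Node&)> backward_fn;  // adds this node's grad into its parents
};

NodePtr leaf(float v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

NodePtr add(const NodePtr& a, const NodePtr& b) {
    auto out = leaf(a->value + b->value);
    out->parents = {a, b};
    // d(a + b)/da = 1 and d(a + b)/db = 1.
    out->backward_fn = [a, b](Node& self) {
        a->grad += self.grad;
        b->grad += self.grad;
    };
    return out;
}

NodePtr mul(const NodePtr& a, const NodePtr& b) {
    auto out = leaf(a->value * b->value);
    out->parents = {a, b};
    // d(a * b)/da = b and d(a * b)/db = a.
    out->backward_fn = [a, b](Node& self) {
        a->grad += b->value * self.grad;
        b->grad += a->value * self.grad;
    };
    return out;
}

// Depth-first post-order: a node is appended only after all of its inputs.
void topo_sort(const NodePtr& n, std::unordered_set<Node*>& seen,
               std::vector<NodePtr>& order) {
    if (!seen.insert(n.get()).second) return;
    for (const auto& p : n->parents) topo_sort(p, seen, order);
    order.push_back(n);
}

// Seeds d(root)/d(root) = 1, then visits nodes from the output back to the leaves.
void backward(const NodePtr& root) {
    std::unordered_set<Node*> seen;
    std::vector<NodePtr> order;
    topo_sort(root, seen, order);
    root->grad = 1.0f;
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        if ((*it)->backward_fn) (*it)->backward_fn(**it);
    }
}
```

For z = x * y + x with x = 3 and y = 4, calling `backward(z)` leaves `x->grad` at 5 (that is, y + 1) and `y->grad` at 3. Gradients accumulate with `+=` so that a value used more than once receives the sum of its contributions.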
Design
- CPU-only, float32 operations
- Weight loading from raw binary files
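This README does not document the raw binary layout, so the following is only a sketch of what reading such a file might look like. It assumes a headerless file of consecutive float32 values in the host's byte order, with the element count known from the layer shape. The function name `load_raw_f32` is hypothetical and not part of torchlite.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads exactly `count` float32 values from a
// headerless binary file. Assumes host byte order and no metadata.
std::vector<float> load_raw_f32(const std::string& path, std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<float> data(count);
    const auto bytes = static_cast<std::streamsize>(count * sizeof(float));
    in.read(reinterpret_cast<char*>(data.data()), bytes);
    // gcount() reports how many bytes the last read actually delivered.
    if (in.gcount() != bytes) {
        throw std::runtime_error("file is shorter than expected: " + path);
    }
    return data;
}
```

Because a raw file carries no shape information, the caller has to supply the expected element count, for example out_features * in_features for a linear layer's weight matrix.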
Pull torchlite into your CMake project with FetchContent:
include(FetchContent)
FetchContent_Declare(
torchlite
GIT_REPOSITORY https://github.com/joshuazhou744/torchlite-cpp
GIT_TAG main
)
FetchContent_MakeAvailable(torchlite)
target_link_libraries(myapp PRIVATE torchlite)

Then #include <tl/tensor.h> in your code. The external dependencies (e.g., Eigen3) will propagate automatically.
mkdir build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

The Release build enables optimizations (-O3 -march=native -DNDEBUG):
- -O3: aggressive compiler optimizations
- -march=native: targets the host CPU's instruction set
- -DNDEBUG: disables assert() calls in hot paths
Run tests (development):
./build/run_tests

include/tl/ Public API headers (tensor, ops, nn, activation, factory, autograd)
include/external/ Third-party headers (LibrosaCpp)
src/ Implementation
tests/ Test executables
bench/ Operation benchmarks
examples/ Example usages of the library
See here for a CNN binary classifier built using torchlite-cpp.
- Eigen3: required by LibrosaCpp for audio preprocessing
- LibrosaCpp: single-header mel spectrogram computation (included in include/external/)
- OpenMP: for the multithreaded GEMM kernel
- C++17 or later
- CMake 3.16+
- Eigen3 (sudo apt install libeigen3-dev)
- OpenMP (sudo apt install libgomp1)
MIT License