This repository is an experimental Rust implementation of Andrej Karpathy’s Micrograd tutorial. It explores the core concepts of a tiny autograd engine, reimagined in Rust.
In this project, I aimed to follow the micrograd tutorial as closely as possible while adapting it to Rust. Because of Rust's strict ownership and borrowing rules, and the dynamic nature of an automatic differentiation engine (which requires shared, mutable references to nodes), I had to make creative design decisions to allow multiple parts of the computation graph to share and update values during backpropagation.
In this repository, you will find the complete implementation of micrograd in Rust, along with a simple multi-layer perceptron (MLP). The only external dependency is the `rand` crate, used to generate random weights during the neural network's initialization.
During development, several challenges arose:
- **Shared Ownership and Mutability:** Rust's ownership rules meant that I had to wrap the core `Value` struct in an `Rc<RefCell<...>>`. This pattern is common in Rust when you need both shared ownership and interior mutability, but it required careful management to avoid runtime borrow errors.
- **Operator Overloading:** Since Rust doesn't have magic methods like Python, every operation (addition, multiplication, power, etc.) had to be implemented via the corresponding trait. Special care was needed for exponentiation, because Rust has no built-in operator for it: I implemented `.pow()` as a method, and for some operations (like power) I even resorted to parsing parameters from a string in the `_op` field, though a better design might store such parameters explicitly.
- **Backward Pass and Borrowing:** A significant challenge was ensuring that no conflicting mutable borrows occur during backpropagation. Rust allows only one mutable borrow at a time, so when calling the backward functions I had to ensure that temporary immutable borrows were dropped before new mutable borrows were created. To work around these issues, I chose to store the backward functions as function pointers (inside an `Option`) rather than as closures that capture variables.
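To make these design decisions concrete, here is a minimal, simplified sketch (not the repository's actual API; the `Value`/`Inner` names and the `add_backward` helper are illustrative) showing the `Rc<RefCell<...>>` wrapper, operator overloading via the `Add` trait, and a backward step stored as a plain function pointer:

```rust
use std::cell::RefCell;
use std::ops::Add;
use std::rc::Rc;

// Illustrative sketch: a graph node shared via Rc<RefCell<...>> so that
// several edges of the computation graph can read and update it.
#[derive(Clone)]
struct Value(Rc<RefCell<Inner>>);

struct Inner {
    data: f64,
    grad: f64,
    prev: Vec<Value>,
    // The backward step is a plain function pointer, not a capturing
    // closure, which sidesteps ownership issues when storing it.
    backward: Option<fn(&Inner)>,
}

impl Value {
    fn new(data: f64) -> Self {
        Value(Rc::new(RefCell::new(Inner {
            data,
            grad: 0.0,
            prev: Vec::new(),
            backward: None,
        })))
    }
}

// Backward rule for addition: d(out)/d(lhs) = d(out)/d(rhs) = 1,
// so each parent accumulates the output's gradient unchanged.
fn add_backward(node: &Inner) {
    for parent in &node.prev {
        parent.0.borrow_mut().grad += node.grad;
    }
}

// Operator overloading through the Add trait, Rust's analogue of
// Python's __add__ magic method.
impl Add for Value {
    type Output = Value;
    fn add(self, rhs: Value) -> Value {
        let out = Value::new(self.0.borrow().data + rhs.0.borrow().data);
        {
            let mut inner = out.0.borrow_mut();
            inner.prev = vec![self.clone(), rhs.clone()];
            inner.backward = Some(add_backward);
        } // the mutable borrow of `out` is dropped here
        out
    }
}

fn main() {
    let a = Value::new(2.0);
    let b = Value::new(3.0);
    let c = a.clone() + b.clone();
    c.0.borrow_mut().grad = 1.0; // seed d(c)/d(c) = 1
    // Copy the function pointer out first so the temporary immutable
    // borrow used to read `backward` is dropped before it is called.
    let step = c.0.borrow().backward;
    if let Some(step) = step {
        step(&c.0.borrow());
    }
    println!("c = {}, da = {}, db = {}",
             c.0.borrow().data, a.0.borrow().grad, b.0.borrow().grad);
}
```

Note how `main` copies the `Option<fn(&Inner)>` into a local before calling it: because function pointers are `Copy`, the temporary borrow of the node ends before the backward step re-borrows its children mutably.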
To run the project, simply execute:

```shell
cargo build
cargo run
```

The main file tests the simple MLP using the same small dataset as shown in the tutorial. Initially, the predictions are off. The network is trained for 100 iterations, with the mean squared loss printed every 10 iterations, and finally the predictions after the last backward pass are displayed.
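The training schedule itself (minimize mean squared error, print the loss every 10 epochs) can be illustrated with a toy, hand-derived gradient-descent loop. This is a standalone sketch, not the MLP or autograd engine from this repository:

```rust
// Toy illustration of the training loop shape: fit y = w * x to data
// generated by y = 2x, using the analytic MSE gradient instead of autograd.
fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    let ys = [2.0, 4.0, 6.0, 8.0]; // targets from y = 2x
    let mut w = 0.0_f64;
    let lr = 0.01;
    for epoch in 0..=100 {
        let mut loss = 0.0;
        let mut grad = 0.0;
        for (&x, &y) in xs.iter().zip(&ys) {
            let pred = w * x;
            loss += (pred - y).powi(2);
            // d/dw of (w*x - y)^2 is 2 * (w*x - y) * x
            grad += 2.0 * (pred - y) * x;
        }
        loss /= xs.len() as f64;
        grad /= xs.len() as f64;
        if epoch % 10 == 0 {
            println!("Epoch {epoch}: Loss = {loss}");
        }
        w -= lr * grad; // gradient descent step
    }
    println!("w = {w}"); // converges close to 2
}
```

The real MLP replaces the hand-written gradient with the engine's backward pass, but the loop structure (forward, loss, backward, update, periodic print) is the same.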
Below is an example of the output:
```
First predictions:
Sample 0: Prediction = 0.00814962088529803
Sample 1: Prediction = 0.5640571294882423
Sample 2: Prediction = 0.6283985267256972
Sample 3: Prediction = 0.25408125509888685
Epoch 0: Loss = 6.6381184146902505
Epoch 10: Loss = 0.06953021047692366
Epoch 20: Loss = 0.03587764387922449
Epoch 30: Loss = 0.023916247335598453
Epoch 40: Loss = 0.017854165083438733
Epoch 50: Loss = 0.014208895414006356
Epoch 60: Loss = 0.011782189119982193
Epoch 70: Loss = 0.010053569722137914
Epoch 80: Loss = 0.008761229037721184
Epoch 90: Loss = 0.007759361662150957
Epoch 100: Loss = 0.006960435621501898
Final predictions:
Sample 0: Prediction = 0.9676195017562665
Sample 1: Prediction = -0.9700001802282829
Sample 2: Prediction = -0.9502409179751493
Sample 3: Prediction = 0.9503520611992682
```
