Based on this video: https://www.youtube.com/watch?v=w8yWXqWQYmU
- Generalized functions to work for any number of layers
- Initialized weights using Kaiming initialization

Results: Accuracy after 500 epochs with layer sizes [784, 128, 64, 10] increased to 94% (from original 84%).
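
As a rough illustration of the two changes listed above, here is a minimal sketch of how parameters for an arbitrary list of layer sizes might be created with Kaiming (He) normal initialization, followed by a forward pass that loops over however many layers exist. This is not the project's actual code: the names `init_params` and `forward`, the ReLU hidden activations, the softmax output, and the zero biases are all assumptions made for the example, and it uses only the standard library so it runs without extra dependencies.

```python
import math
import random


def init_params(layer_sizes, seed=0):
    """Kaiming (He) normal initialization for a fully connected network.

    layer_sizes: e.g. [784, 128, 64, 10]; one weight matrix per adjacent pair.
    Returns a list of (W, b) tuples, with W stored as fan_out rows of fan_in values.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        std = math.sqrt(2.0 / fan_in)  # Kaiming scaling: variance 2 / fan_in
        W = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out  # assumption: biases start at zero
        params.append((W, b))
    return params


def forward(x, params):
    """Forward pass for any number of layers.

    Assumes ReLU on hidden layers and softmax on the output layer.
    """
    a = x
    for i, (W, b) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + bias for row, bias in zip(W, b)]
        if i < len(params) - 1:
            a = [max(0.0, v) for v in z]  # ReLU
        else:
            m = max(z)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            a = [e / total for e in exps]  # softmax
    return a


params = init_params([784, 128, 64, 10])
probs = forward([0.5] * 784, params)
```

Because both functions iterate over the list of layers rather than naming each layer explicitly, changing the architecture only requires passing a different list, for example `init_params([784, 256, 10])`.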