| Week | Model | Training Method | Recommended Optimizers | Notes |
|---|---|---|---|---|
| 1 | Linear Regression | Gradient descent on MSE | SGD | Sufficient for convex problem; Adam unnecessary |
| 2 | Logistic Regression | Gradient descent on cross-entropy | SGD, Momentum | Momentum helps with noisy data |
| 3 | k-Nearest Neighbors | No training (instance-based) | N/A | - |
| 4 | Naive Bayes | Probability estimation | N/A | - |
| 5 | Decision Tree | Greedy splitting | N/A | - |
| 6 | Random Forest | Ensemble of trees | N/A | - |
| 7 | Support Vector Machine (SVM) | Gradient descent on hinge loss | SGD, Momentum | Momentum for faster convergence |
| 8 | Perceptron | Gradient descent on classification error | SGD, Momentum | Momentum reduces oscillation |
| 9 | Multilayer Perceptron (MLP) | Backpropagation with nonlinear activations | Adam, RMSProp | Plain SGD usually too slow |
| 10 | MNIST with MLP | Large dataset, deeper MLP | Adam, SGD + Momentum | Adam as default; SGD + Momentum for comparison |
| 11 | Convolutional Neural Network (CNN) | Backpropagation through convolution layers | Adam, RMSProp | Adam most common |
| 12 | Recurrent Neural Network (RNN) | Backpropagation Through Time (BPTT) | Adam, RMSProp | Adam handles vanishing gradients best; SGD struggles |
| Model Category | Recommended Optimizer | Reasoning |
|---|---|---|
| Convex Models (Linear, Logistic, SVM, Perceptron) | SGD or SGD + Momentum | Simple, efficient for convex optimization |
| Deep Models (MLP, CNN, RNN) | Adam or RMSProp | Better handling of complex loss landscapes |
| Non-gradient Models (kNN, Naive Bayes, Trees, Random Forest) | None | No iterative optimization required |