This repository contains pure Python implementations of fundamental Machine Learning algorithms, built from the ground up using NumPy. The goal is to demystify the "black box" of ML libraries by implementing the mathematical optimization logic manually.
The project covers Support Vector Machines (SVM) using Soft Margin optimization and Linear Regression (Univariate & Multivariate) using Gradient Descent. It also includes benchmarking against scikit-learn on real-world datasets like German Credit Data and Iris.
## SVM from Scratch (Soft Margin)

- **Mathematical Formulation:** Implements the hinge loss with L2 regularization:
  $$J(\mathbf{w}, b) = \frac{1}{2} ||\mathbf{w}||^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$$
- **Optimization:** Solved using batch gradient descent.
- **Visualization:** Includes decision-boundary plotting with support-vector highlighting on the Iris dataset.
- **File:** `src/svm/svm_scratch.py`
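The core training loop can be sketched as follows. This is a minimal illustration of batch (sub)gradient descent on the regularized hinge loss above, not the exact code in `svm_scratch.py`; the function name and hyperparameter defaults are illustrative.

```python
import numpy as np

def svm_gradient_descent(X, y, C=1.0, lr=0.01, epochs=2000):
    """Soft-margin SVM trained by batch subgradient descent on
    J(w, b) = 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)        # y_i (w . x_i + b)
        violated = margins < 1           # samples inside the margin
        # Subgradient of the regularized hinge loss
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny linearly separable example with labels in {-1, +1}
X = np.array([[2.0, 3.0], [3.0, 3.0], [1.0, 1.0], [0.0, 0.5]])
y = np.array([1, 1, -1, -1])
w, b = svm_gradient_descent(X, y)
print(np.sign(X @ w + b))  # should match y on separable data
```

Points with margin `>= 1` contribute nothing to the hinge term, so only margin violators appear in the gradient; the `w` term in `grad_w` comes from the L2 regularizer.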
## Linear Regression from Scratch (Univariate & Multivariate)

- **Cost Function:** Mean Squared Error (MSE):
  $$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$
- **Optimization:** Manual implementation of gradient descent to update the weights $\theta$.
- **Features:**
  - Handles multiple features (multivariate) via matrix operations.
  - Data normalization/standardization from scratch.
- **Files:** `src/regression/`
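A vectorized version of the gradient-descent update for the MSE cost above can be sketched like this (a simplified illustration, not the exact code under `src/regression/`):

```python
import numpy as np

def gradient_descent(X, y, lr=0.02, epochs=5000):
    """Multivariate linear regression minimizing
    J(theta) = 1/(2m) * sum((X @ theta - y)^2) by batch gradient descent."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a bias column
    theta = np.zeros(n + 1)
    for _ in range(epochs):
        error = Xb @ theta - y             # h_theta(x^(i)) - y^(i), all samples
        theta -= lr * (Xb.T @ error) / m   # vectorized gradient step
    return theta

# Fit y = 2x + 1 on noiseless synthetic data
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
theta = gradient_descent(X, y)
print(theta)  # ≈ [1., 2.]
```

The matrix form `Xb.T @ error / m` computes the gradient for every feature at once, which is the vectorization benefit the repository demonstrates.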
## Benchmarking with scikit-learn

- A comparative study using `sklearn.svm.SVC` to classify credit risk on the German Credit dataset.
- Includes data preprocessing (MinMax scaling, standardization) and confusion-matrix evaluation.
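The scikit-learn pipeline follows the usual scale-fit-evaluate pattern. The sketch below uses a synthetic dataset as a stand-in for the German Credit features, since the real data and its preprocessing steps live in the repository:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic binary-classification data standing in for German Credit
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Standardize using training statistics only, then fit the SVC
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
print(confusion_matrix(y_test, y_pred))
acc = accuracy_score(y_test, y_pred)
print(acc)
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into the model, which matters for a fair comparison against the from-scratch implementation.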
## Getting Started

1. Clone the repository:
   `git clone https://github.com/mariamashraf731/ML-Algorithms-From-Scratch.git`
2. Install the dependencies:
   `pip install numpy pandas matplotlib scikit-learn`
3. Run the SVM from scratch:
   `python src/svm/svm_scratch.py`
4. Run the regression:
   `python src/regression/linear_multivariate.py`
## Datasets

- **Iris:** for testing SVM decision boundaries.
- **German Credit Data:** for binary classification (credit risk).
- **Custom regression data:** synthetic datasets for testing gradient-descent convergence.
## Key Concepts Covered

- **Convex Optimization:** gradient descent implementation.
- **Vectorization:** efficient matrix multiplication using NumPy.
- **Regularization:** the soft-margin `C` parameter in SVM.
- **Data Preprocessing:** standardization and scaling.
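The from-scratch preprocessing mentioned above amounts to a few lines of NumPy. This is a generic sketch of z-score standardization and min-max scaling, not necessarily the exact helper functions in the repository:

```python
import numpy as np

def standardize(X):
    """Z-score standardization: zero mean, unit variance per feature."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)   # guard against constant features
    return (X - mean) / std, mean, std

def minmax_scale(X):
    """Rescale each feature to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / rng

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Z, mu, sigma = standardize(X)
M = minmax_scale(X)
print(Z.mean(axis=0), Z.std(axis=0))  # ≈ [0, 0] and [1, 1]
```

Returning `mean` and `std` lets the same training-set statistics be reapplied to test data, mirroring scikit-learn's fit/transform split.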