🚀 Machine Learning Portfolio

From-Scratch Implementations & Library Benchmarks

📌 Overview

This repository is a machine learning portfolio that implements core ML algorithms from scratch and benchmarks them against scikit-learn.
The goal is to build a working understanding of model internals, optimization techniques, and evaluation, rather than relying solely on high-level APIs.


🎯 Objectives

  • Implement fundamental machine learning algorithms from first principles
  • Understand optimization, loss functions, and convergence behavior
  • Compare custom implementations with scikit-learn models
  • Evaluate models using appropriate metrics
  • Build clean, reproducible, and well-documented ML code

🧠 Algorithms Implemented

✅ From Scratch

  • Linear Regression (Gradient Descent, Mean Squared Error)
  • Logistic Regression (Sigmoid)
  • K-Means Clustering (Distance-based clustering)
  • Evaluation Metrics (Accuracy, Precision, Recall, RMSE)
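
As an illustration of the first entry, here is a minimal sketch of linear regression trained by batch gradient descent on mean squared error, with RMSE as the evaluation metric. It is not taken from the repository's notebooks: the function names, the single-feature setup, and the learning rate are assumptions chosen for the example, and it uses plain Python rather than NumPy so that it runs without dependencies.

```python
import math


def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; assumes a single input feature.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Residuals of the current model on every training point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)  # MSE for this epoch
        # Gradients of the MSE with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data generated from y = 2x + 1 (made up for this example)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, losses = fit_linear_regression(xs, ys)
predictions = [w * x + b for x in xs]
print(f"w={w:.4f}, b={b:.4f}, rmse={rmse(ys, predictions):.6f}")
```

The recorded `losses` list is what a loss curve would be plotted from. If the learning rate is set too high for the scale of the inputs, the values in it grow instead of shrinking.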

⚙️ Using Libraries (Benchmarking)

  • Linear Regression – scikit-learn
  • Logistic Regression – scikit-learn
  • K-Means – scikit-learn
  • Model evaluation and comparison
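
For the K-Means comparison, the from-scratch side could look roughly like the sketch below, which implements Lloyd's algorithm (alternating assignment and centroid-update steps). The function signature and the explicit `init` argument are assumptions made for this example, not the repository's actual interface. Initialization is exposed because the result of K-Means depends on it; scikit-learn's `KMeans` uses k-means++ initialization by default, so a fair benchmark should account for that difference.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, max_iters=100, seed=0):
    """Lloyd's algorithm. Hypothetical helper for illustration.

    init: optional list of k starting centroids; if omitted, k data
    points are sampled at random (results then depend on the seed).
    """
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # no centroid moved, so the run has converged
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy blobs (made up for this example)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

centroids, labels = kmeans(points, k=2, init=[points[0], points[-1]])
print(centroids, labels)
```

Even on data this simple, an unlucky pair of starting centroids can leave Lloyd's algorithm stuck in a poor local optimum, which is one reason library implementations put effort into initialization.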

🗂 Repository Structure

ML-Algorithms/
│
├── Linear-Regression/
│   ├── linear-regression-from-scratch.ipynb
│   └── linear-regression.ipynb
│
├── classification/
│   ├── logistic_regression_from_scratch.py
│   └── logistic_regression_sklearn.py
│
├── clustering/
│   ├── kmeans_from_scratch.py
│   └── kmeans_sklearn.py
│
├── data/
│   └── sample_datasets.csv
│
├── README.md
├── requirements.txt
└── .gitignore



🔬 Methodology

Each algorithm follows a consistent ML workflow:

  1. Data preprocessing and feature scaling
  2. Algorithm implementation from scratch
  3. Model training using iterative optimization
  4. Performance evaluation using suitable metrics
  5. Comparison with library-based implementations
  6. Visualization and error analysis
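
Steps 1 to 4 of this workflow can be sketched for logistic regression as follows. This is an illustrative example under stated assumptions, not the code in `logistic_regression_from_scratch.py`: the helper names, the single-feature toy dataset, and the hyperparameters are all invented here, and plain Python is used so the example runs without dependencies. Steps 5 and 6 (library comparison and visualization) are left out.

```python
import math


def standardize(values):
    """Step 1: z-score scaling. Assumes the feature is not constant."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, probs, eps=1e-12):
    """Binary cross-entropy; eps guards against log(0)."""
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(ys, probs)
    ) / len(ys)


def train_logistic_regression(xs, ys, lr=0.5, epochs=500):
    """Steps 2-3: batch gradient descent on the log loss (one feature)."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in xs]
        losses.append(log_loss(ys, probs))
        # Gradient of the mean log loss with respect to w and b
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


# Toy data (made up): hours studied vs. pass/fail
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

x_scaled = standardize(hours)
w, b, losses = train_logistic_regression(x_scaled, passed)

# Step 4: evaluate with a 0.5 decision threshold
preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in x_scaled]
accuracy = sum(p == y for p, y in zip(preds, passed)) / len(passed)
print(f"final loss={losses[-1]:.4f}, accuracy={accuracy:.2f}")
```

Scaling comes first because gradient descent with a single learning rate converges more reliably when features share a similar range.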

📊 Results & Insights

  • Custom implementations achieve performance comparable to scikit-learn baselines on benchmark datasets
  • Loss curves and visualizations help analyze convergence behavior
  • Error analysis highlights strengths and limitations of each model
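
One way to ground this kind of error analysis is to split a classifier's mistakes into false positives and false negatives and derive accuracy, precision, and recall from those counts. The sketch below assumes binary labels encoded as 0 and 1. The function names and the sample predictions are hypothetical and are not results from this repository.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Fall back to 0.0 when a denominator is zero (no predicted or actual positives)
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and predictions, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts, metrics)
```

Running the same function on the predictions of a from-scratch model and of its scikit-learn counterpart keeps the comparison on identical terms.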

🛠 Tech Stack

  • Language: Python
  • Libraries: NumPy, Pandas, Matplotlib, scikit-learn
  • Tools: Jupyter Notebook, Git, GitHub

✅ Note

This project emphasizes a fundamental understanding of machine learning algorithms.
Libraries are brought in deliberately, and only after each concept has been validated through a from-scratch implementation.