Machine Learning & Data Science Monorepo

A structured monorepo of Jupyter notebooks covering data science foundations through classic machine learning. This repo favors the mainstream PyData stack - NumPy, pandas, Matplotlib/Seaborn, and scikit‑learn - with minimal external dependencies for a smooth, reproducible setup.

At a Glance

Audience: learners and practitioners who want a pragmatic path from data wrangling to ML modeling.
Format: self-contained Jupyter notebooks organized by topic and difficulty.
Compute: CPU‑friendly; no GPU/CUDA required.
Dependencies: pinned to the PyData stack (see requirements.txt).

Quickstart

1) Prerequisites

Python 3.9+
Graphviz (system package) for decision‑tree visuals:
- macOS: brew install graphviz
- Ubuntu/Debian: sudo apt-get update && sudo apt-get install -y graphviz
- Windows (Chocolatey): choco install graphviz
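Before moving on, a quick check from Python can confirm both prerequisites. This is a minimal sketch using only the standard library. It assumes Graphviz exposes its `dot` executable on PATH, which is the usual result of the installs above. `check_prerequisites` is a hypothetical helper, not part of the repo.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Hypothetical helper: report whether the Quickstart prerequisites look satisfied."""
    python_ok = sys.version_info[:2] >= min_python
    # The Graphviz system package provides the `dot` executable; if it is not on
    # PATH, tree rendering fails even when the Python packages are installed.
    dot_path = shutil.which("dot")
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": python_ok,
        "graphviz_dot": dot_path,  # None when `dot` cannot be found
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```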

2) Create an isolated environment

Option A - venv (macOS/Linux/Windows)

python -m venv .venv  
# macOS/Linux  
source .venv/bin/activate  
# Windows PowerShell  
# .venv\Scripts\Activate.ps1

Option B - Conda

conda create -n ml-ds python=3.9 -y  
conda activate ml-ds

If using Conda, installing Graphviz via conda-forge is recommended:
conda install -c conda-forge graphviz
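To confirm which environment is active, the sketch below compares `sys.prefix` with `sys.base_prefix` (they differ inside a venv) and reads `CONDA_DEFAULT_ENV`, which Conda sets when an environment is activated. `active_environment` is a hypothetical helper for illustration only.

```python
import os
import sys


def active_environment():
    """Hypothetical helper: describe the isolated environment (if any) in use."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the interpreter the venv was created from.
    in_venv = sys.prefix != sys.base_prefix
    # Conda exports the name of the active environment in CONDA_DEFAULT_ENV.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {"executable": sys.executable, "venv": in_venv, "conda_env": conda_env}


if __name__ == "__main__":
    print(active_environment())
```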

3) Install requirements (and JupyterLab)

python -m pip install --upgrade pip  
pip install -r requirements.txt jupyterlab

4) Launch notebooks

jupyter lab  
# or: jupyter notebook

Project Structure

machine-learning-data-science/  
├── 1_data-science-foundation/        # NumPy & pandas fundamentals  
├── 2_data-visualization/            # Matplotlib & pandas plotting  
├── 3_exploratory-data-analysis/      # EDA patterns & diagnostics  
├── 4_data-cleaning-preprocessing/    # Data quality, preprocessing, Bayes basics  
├── 5_foundation-machine-learning/    # KNN and modeling‑ready EDA  
├── 6_model-building-evaluation/      # Linear regression & evaluation  
├── 7_advanced-modeling-system-design/ # Trees & ML system design  
└── requirements.txt

Topic map & representative notebooks

1_data-science-foundation/ - NumPy & pandas foundations
intro_to_pandas.ipynb, numpy-basics.ipynb, numpy-arrays.ipynb, numpy-multidim-arrays.ipynb, pandas-series.ipynb, pandas-dataframes.ipynb
2_data-visualization/ - Plotting with Matplotlib & pandas
matplotlib-basics.ipynb, Vizualization_With_Matplotlib.ipynb, pandas-aggregation.ipynb, cars.ipynb, cars-python-graphics.ipynb
3_exploratory-data-analysis/ - Exploratory data analysis patterns
MissingData.ipynb, college_EDA.ipynb
4_data-cleaning-preprocessing/ - Data quality, preprocessing & probability
BadData_EDA.ipynb, DataPreprocessing.ipynb, SingleVariable_EDA.ipynb, TwoVariables_EDA.ipynb, BayesTheorem_MeaslesSim.ipynb
5_foundation-machine-learning/ - Intro ML tasks (KNN, modeling‑oriented EDA)
KNN_Classification.ipynb, KNN_Regression.ipynb, KNN_AnomalyDetector.ipynb, TwoVariablesP2_EDA.ipynb, campaign_EDA.ipynb
6_model-building-evaluation/ - Regression & evaluation
LinearRegression1.ipynb, LinearRegression2.ipynb, LinearRegression3.ipynb, LinearRegression4.ipynb, KNN_Hyperparameters.ipynb
7_advanced-modeling-system-design/ - Trees & ML system design
ClassificationTrees1.ipynb, ClassificationTrees2.ipynb, DescisionTrees.ipynb, linear-regression.ipynb, SystemDesign.ipynb

Tip: the numbering is a suggested progression. Each notebook is self‑contained; feel free to jump to topics as needed.
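To walk the suggested progression programmatically, a small sketch can list notebooks by numbered topic folder. It assumes it is run from the repo root and that topic folders keep the `N_name` prefix shown above. Notebooks inside each folder are listed alphabetically, which is not necessarily the intended reading order. `notebooks_in_order` is a hypothetical helper.

```python
from pathlib import Path


def _topic_number(folder):
    # "3_exploratory-data-analysis" -> "3"
    return folder.name.split("_", 1)[0]


def notebooks_in_order(repo_root="."):
    """Hypothetical helper: yield (topic_folder, notebook) pairs in numeric topic order."""
    root = Path(repo_root)
    topics = [p for p in root.iterdir() if p.is_dir() and _topic_number(p).isdigit()]
    # Sort numerically so a future "10_" folder would still come after "9_".
    topics.sort(key=lambda p: int(_topic_number(p)))
    for topic in topics:
        # Only top-level notebooks; .ipynb_checkpoints/ subfolders are not matched.
        for nb in sorted(topic.glob("*.ipynb")):
            yield topic.name, nb.name


if __name__ == "__main__":
    for topic, notebook in notebooks_in_order():
        print(f"{topic}/{notebook}")
```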

Development Workflow

The repo is notebook‑first. The following conventions keep notebooks clean and reproducible.

Environment management

Use a local virtual environment (.venv) pinned by requirements.txt.
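If multiple Python versions are installed, register this environment as a named Jupyter kernel (run with the environment activated) so notebooks use the interpreter you installed packages into:
python -m ipykernel install --user --name ml-ds --display-name "Python (ml-ds)"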
Then select Python (ml-ds) as the kernel in JupyterLab.
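Once the kernel is selected, a cell like the following can show which interpreter it runs and which pip belongs to it. Calling pip through `sys.executable -m pip`, rather than a bare `pip`, queries the installer for this exact interpreter, which is the usual source of "installed but cannot import" mismatches. `kernel_interpreter_report` is a hypothetical helper; it reports `None` for pip if pip is unavailable in that interpreter.

```python
import subprocess
import sys


def kernel_interpreter_report():
    """Hypothetical helper: show the kernel's interpreter and the pip tied to it."""
    # Run pip via this interpreter so the output reflects where *this*
    # kernel would install packages, not whichever `pip` is first on PATH.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    pip_line = result.stdout.strip() if result.returncode == 0 else None
    return {"executable": sys.executable, "prefix": sys.prefix, "pip": pip_line}


if __name__ == "__main__":
    for key, value in kernel_interpreter_report().items():
        print(f"{key}: {value}")
```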

Notebook hygiene

Restart & Run All before committing changes to verify a clean state.
Prefer plain Python and standard PyData idioms; avoid hidden state in global variables.
Keep plots lightweight (Matplotlib/Seaborn); save large figures to disk with plt.savefig(...) when needed.
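A saved notebook stores each code cell's `execution_count`, so one rough pre-commit check for a clean Restart & Run All is that non-empty code cells are numbered 1, 2, 3, ... from top to bottom. This is a heuristic sketch that reads the .ipynb JSON directly with the standard library. It assumes JupyterLab's behavior of not executing empty code cells, and `ran_top_to_bottom` is a hypothetical name.

```python
import json
from pathlib import Path


def ran_top_to_bottom(notebook_path):
    """Hypothetical heuristic: True if saved code cells look like a fresh Restart & Run All."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        # `source` may be a string or a list of strings; "".join handles both.
        # Empty code cells are skipped because JupyterLab does not execute them.
        if cell.get("cell_type") == "code" and "".join(cell.get("source", "")).strip()
    ]
    # After a clean run, non-empty code cells are numbered 1..n in document order.
    return counts == list(range(1, len(counts) + 1))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, "OK" if ran_top_to_bottom(path) else "re-run before committing")
```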

Data Notes

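Notebooks primarily use toy/synthetic data or built‑in datasets (e.g., from Seaborn or scikit‑learn). No manual data downloads are required for core lessons; note that Seaborn's load_dataset fetches its example datasets over the network on first use and caches them locally.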
When experimenting with personal datasets, prefer CSV/Parquet under a local data/ folder ignored by version control (e.g., add /data to .gitignore).
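A sketch of both data paths: a dataset bundled with scikit-learn, which needs no download, and a loader for files in a git-ignored data/ folder. `load_local`, `builtin_iris`, and the `data/` location are illustrative assumptions. Reading Parquet also requires an engine such as pyarrow, which may not be pinned in requirements.txt.

```python
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_iris

DATA_DIR = Path("data")  # assumed local folder, listed in .gitignore


def load_local(name, data_dir=DATA_DIR):
    """Hypothetical helper: load a personal CSV or Parquet file from data_dir."""
    path = Path(data_dir) / name
    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix == ".parquet":
        # Needs a Parquet engine (e.g. pyarrow) to be installed.
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported file type: {path.suffix!r}")


def builtin_iris():
    """Iris ships with scikit-learn, so this works offline."""
    # as_frame=True returns pandas objects; .frame holds features plus 'target'.
    return load_iris(as_frame=True).frame
```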

Troubleshooting

Graphviz errors (tree visualizations fail): Ensure the Graphviz system package is installed (see Quickstart). After installing, fully restart JupyterLab.
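When narrowing down a Graphviz failure, it can help to separate exporting a tree from rendering it. In scikit-learn, `export_graphviz(..., out_file=None)` returns DOT source as a plain string without calling Graphviz. If the sketch below runs but a notebook's rendering step fails, the likely culprit is the missing `dot` executable rather than scikit-learn. This assumes the tree notebooks build their visuals from `export_graphviz`; the iris data and the `tree_to_dot` helper are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz


def tree_to_dot(max_depth=2):
    """Hypothetical helper: fit a small tree and return its DOT source."""
    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    # out_file=None makes export_graphviz return the DOT text as a string.
    # This step is pure Python; only *rendering* the DOT needs Graphviz.
    return export_graphviz(clf, out_file=None, filled=True)
```

Rendering that string (for example with the `graphviz` Python package's `Source` class, if it is installed) is the step that depends on the system package.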

Kernel not listed / wrong interpreter: Re‑register the kernel from the environment you installed packages into (see the ipykernel install command above), then pick it from the kernel selector.

Package mismatch: Run the following in a notebook cell to verify versions:

import sys, numpy, pandas, matplotlib, seaborn, sklearn  
print(sys.version)  
print("numpy=", numpy.__version__)  
print("pandas=", pandas.__version__)  
print("matplotlib=", matplotlib.__version__)  
print("seaborn=", seaborn.__version__)  
print("scikit-learn=", sklearn.__version__)
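The snippet above stops with an ImportError as soon as one package is missing. A more tolerant variant, sketched below, reads installed distribution versions with the standard library's `importlib.metadata` and reports missing packages instead of failing. `installed_versions` is a hypothetical helper, and the package list mirrors the imports above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (note: "scikit-learn", not "sklearn").
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages=PACKAGES):
    """Hypothetical helper: map each distribution to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(sys.version)
    for name, ver in installed_versions().items():
        print(f"{name}=", ver if ver else "NOT INSTALLED")
```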
