A collection of EDA notebooks — the kind of work that happens before any model gets trained. Curiosity first, conclusions second.
Every dataset has a story, and finding it is half the job. These notebooks walk through public datasets across health, business, food, tech, and lifestyle — each one focused on what the data actually says when you stop assuming and start looking.
Looking at the classic heart-disease dataset through a clinical-curiosity lens. Which features correlate with risk, where does the data lie, and what would a doctor want to see in a one-page summary? Heatmaps, distribution plots, and a clean walk through every variable.
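As a rough illustration of the "which features correlate with risk" question, here is a minimal pandas sketch. The column names (`age`, `max_hr`, `chol`, `target`) and the toy values are assumptions made up for this example, not the dataset's actual schema or data, and `rank_by_correlation` is a hypothetical helper rather than code from the notebook.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with `target`."""
    # Correlation of every numeric column with the target, minus the target itself.
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Order by strength of the relationship while keeping the original sign.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy values for illustration only (hypothetical columns, not real patient data).
toy = pd.DataFrame({
    "age": [40, 50, 60, 70, 45, 65],
    "max_hr": [180, 160, 140, 120, 170, 130],
    "chol": [200, 240, 210, 250, 220, 230],
    "target": [0, 0, 1, 1, 0, 1],
})

ranking = rank_by_correlation(toy, "target")
print(ranking)
```

Pearson correlation only captures linear relationships, so a ranking like this is a starting point for the distribution plots rather than a conclusion.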
Who built their fortunes themselves, and what patterns emerge? Industry breakdowns, age-at-first-billion, country of origin, and the visualizations to back it all up. A fun one — useful as a template for any "rank the top N" dataset.
A global look at what the world grows, raises, and processes. Country-level comparisons, crop trends over decades, and a few uncomfortable observations about food inequality hiding in the numbers.
The App Store at scale: which categories dominate, what people pay for, and where the long tail starts. Charts that go beyond "top 10 by rating" into the actual economics of the platform.
Five years of sales data, sliced by region, time, and product. The notebook that doubles as a tutorial on how to structure year-over-year analysis without getting lost in your own pivot tables.
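One way to keep year-over-year analysis manageable is to aggregate to a single row per group and year first, then compute the change within each group. The sketch below assumes hypothetical column names (`order_date`, `region`, `sales`) and uses made-up toy rows; `yoy_growth` is an illustrative helper, not necessarily how the notebook does it.

```python
import pandas as pd


def yoy_growth(df, date_col="order_date", value_col="sales", group_col="region"):
    """Total `value_col` per group and year, plus percent change vs. the prior year."""
    yearly = (
        df.assign(year=pd.to_datetime(df[date_col]).dt.year)
          .groupby([group_col, "year"], as_index=False)[value_col]
          .sum()
          .sort_values([group_col, "year"])
    )
    # pct_change runs within each group, so the first year of a group is NaN.
    yearly["yoy_pct"] = yearly.groupby(group_col)[value_col].pct_change() * 100
    return yearly


# Toy rows for illustration only (hypothetical columns and values).
sales = pd.DataFrame({
    "order_date": ["2020-03-01", "2020-09-15", "2021-04-10",
                   "2021-11-02", "2020-06-01", "2021-06-01"],
    "region": ["East", "East", "East", "East", "West", "West"],
    "sales": [100.0, 100.0, 150.0, 150.0, 80.0, 60.0],
})

result = yoy_growth(sales)
print(result)
```

Keeping the result in long format (one row per region and year) means a pivot is only needed at the very end, for display.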
What makes someone order what they order? A look at online food-delivery preferences with the kind of charts that work for a stakeholder deck, not just a notebook.
Less an analysis, more a guide. If you're new to data visualization in Python, this is the notebook I wish I'd had on day one. Matplotlib, Seaborn, Plotly — when to use which, and what the common traps look like.
Python · pandas · NumPy · Matplotlib · Seaborn · Plotly · Jupyter
Each notebook lives in its own .ipynb file. If you want to run them locally:
git clone https://github.com/samanfatima7/exploratory-data-analysis.git
cd exploratory-data-analysis
pip install -r requirements.txt
jupyter notebook
Datasets are linked from each notebook on Kaggle.
I'm Saman Fatima, a Kaggle Grandmaster (highest rank 24) and data scientist from Pakistan. More of my work lives on Kaggle and LinkedIn.
If you found something useful here, a ⭐ goes a long way.