A Jupyter notebook tutorial that demonstrates data visualization techniques with three widely used Python libraries: Pandas, Matplotlib, and Seaborn. It uses the World Happiness Report 2019 dataset to work through practical examples of exploratory data analysis.
Using this dataset, you'll learn how to create and customize the plot types most often needed for exploratory data analysis, and how to read insights from them. The notebook covers the following techniques:
- Univariate Analysis
  - Histograms for continuous variables
  - Count plots for categorical variables
- Bivariate Analysis
  - Scatter plots
  - Joint plots
- Multivariate Analysis
  - Correlation matrices with heatmaps
  - Scatter plot matrices (SPLOM)
  - Pair plots
- Distribution Analysis
  - Box plots for outlier detection
  - Violin plots for distribution visualization
  - Bee swarm plots for detailed data point distribution
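As one illustration of the box-plot item above, the sketch below flags outliers with the 1.5 × IQR rule, which matches the default whisker setting (`whis=1.5`) of Matplotlib's `boxplot`. The values are made up for the example and only stand in for a numeric column such as Generosity; they are not taken from the dataset, and the notebook itself may approach this differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up values standing in for one numeric column (not real dataset values):
# 50 evenly spaced points plus one deliberately extreme value.
values = pd.Series(np.append(np.linspace(0.1, 0.3, 50), 0.9), name="Generosity")

# 1.5 x IQR fences: anything outside [lower, upper] counts as an outlier.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

# The same rule drives the "flier" markers drawn by a box plot.
fig, ax = plt.subplots(figsize=(3, 4))
box = ax.boxplot(values.to_numpy(), whis=1.5)
ax.set_ylabel(values.name)
fliers = box["fliers"][0].get_ydata()
```

Here `outliers` and `fliers` hold the same points, so the box plot can be read as a visual version of the fence calculation.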
Each visualization type is demonstrated using three different libraries:
- Pandas - Built-in plotting capabilities
- Matplotlib - Low-level plotting with full customization
- Seaborn - High-level statistical visualizations
Requirements:
- Python 3.x
- Pandas - Data manipulation and analysis
- NumPy - Numerical computing
- Matplotlib - Core plotting library
- Seaborn - Statistical data visualization
Ensure you have Python 3.x installed on your system.
- Clone the repository:
    git clone https://github.com/DataDarling/Data-Visualization-World-Happiness-Dataset.git
    cd Data-Visualization-World-Happiness-Dataset
- Install required packages:
    pip install numpy pandas matplotlib seaborn jupyter
- Launch Jupyter Notebook:
    jupyter notebook data_visualization.ipynb
The notebook uses the World Happiness Report 2019 dataset, which includes:
- Overall Rank: Country ranking based on happiness score
- Country: Country name
- Score: Happiness score
- GDP per capita: Economic indicator
- Social Support: Social relationships metric
- Healthy life expectancy: Health metric
- Freedom to make life choices: Freedom indicator
- Generosity: Giving behavior metric
- Perceptions of corruption: Trust in government
Dataset Location: ./data/world_happiness_2019.csv
Source: World Happiness Report
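Loading the file and checking its columns might look like the sketch below. The helper `load_happiness` and the `EXPECTED_COLUMNS` list are hypothetical names introduced here, and the column spellings are copied from the list above, so they are an assumption about the CSV header and may need adjusting. The demo reads a tiny in-memory sample with made-up rows so it runs without the data file.

```python
import io

import pandas as pd

# Column names as listed in this README (assumed; the real CSV header may differ).
EXPECTED_COLUMNS = [
    "Overall Rank", "Country", "Score", "GDP per capita", "Social Support",
    "Healthy life expectancy", "Freedom to make life choices",
    "Generosity", "Perceptions of corruption",
]


def load_happiness(source):
    """Read the CSV from a path or file-like object and verify its columns."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Made-up sample rows, only to demonstrate the call without the real file.
sample_csv = io.StringIO(
    "Overall Rank,Country,Score,GDP per capita,Social Support,"
    "Healthy life expectancy,Freedom to make life choices,"
    "Generosity,Perceptions of corruption\n"
    "1,Country A,7.1,1.3,1.5,0.9,0.6,0.2,0.3\n"
    "2,Country B,6.4,1.1,1.4,0.8,0.5,0.1,0.2\n"
    "3,Country C,4.9,0.7,1.0,0.6,0.4,0.3,0.1\n"
)
df = load_happiness(sample_csv)
```

With the repository checked out, the same helper would be called as `load_happiness("./data/world_happiness_2019.csv")`.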
The notebook is organized into clear sections:
- Introduction: Overview of visualization importance and tools
- Whole Dataset Analysis: Correlation matrices and scatter plot matrices
- Single Feature Analysis: Histograms, box plots, violin plots, and swarm plots
- Two Feature Analysis: Scatter plots and joint plots
Simply open data_visualization.ipynb in Jupyter Notebook and run the cells sequentially. Each section contains:
- Explanatory markdown with function references
- Working code examples
- Visual output demonstrations
By working through this notebook, you will learn:
- How to choose the right visualization for your data type
- The differences between Pandas, Matplotlib, and Seaborn approaches
- Best practices for exploratory data analysis
- How to detect outliers using box and violin plots
- How to visualize relationships between variables
- How to customize plots with colors, styles, and themes
- How to create publication-ready visualizations
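Two of these outcomes, theming and distribution plots, can be sketched together. The example below applies a Seaborn theme and then draws a violin plot and a swarm plot of the same synthetic `Score` column; the theme name, palette, and marker size are arbitrary choices, not the notebook's settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the happiness score column (not real data).
rng = np.random.default_rng(2)
df = pd.DataFrame({"Score": rng.normal(5.4, 1.1, 60)})

# A theme changes the default look of every plot drawn afterwards.
sns.set_theme(style="whitegrid", palette="muted")

fig, axes = plt.subplots(1, 2, figsize=(7, 4), sharey=True)

# Violin plot: a smoothed density estimate mirrored around a centre line.
sns.violinplot(data=df, y="Score", ax=axes[0])
axes[0].set_title("Violin")

# Swarm plot: every observation drawn as a point, nudged sideways to avoid overlap.
sns.swarmplot(data=df, y="Score", size=3, color="black", ax=axes[1])
axes[1].set_title("Swarm")

fig.tight_layout()
```

The violin summarizes the shape of the distribution, while the swarm shows each data point, so the two are often read side by side.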
The notebook demonstrates:
- Correlation Matrix Heatmaps: Understand relationships between all numeric features
- Scatter Matrices: Visualize pairwise relationships in the dataset
- Distribution Plots: Understand the spread and shape of your data
- Categorical Analysis: Compare groups and categories
- Multi-dimensional Plots: Explore complex relationships with color, size, and style encodings
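The first two items can be sketched as follows: a correlation matrix rendered as a Seaborn heatmap, and a scatter matrix from `pandas.plotting`. The DataFrame is synthetic and uses only a few of the column names listed earlier, so the correlations shown are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (not real dataset values).
rng = np.random.default_rng(42)
n = 100
gdp = rng.uniform(0.0, 1.7, n)
df = pd.DataFrame({
    "Country": [f"Country {i}" for i in range(n)],
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, n),
    "GDP per capita": gdp,
    "Generosity": rng.uniform(0.0, 0.6, n),
})

# Correlation only makes sense for numeric columns, so drop the text column first.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

# Heatmap: fix the colour scale to [-1, 1] so colours are comparable across plots.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")

# Scatter matrix: one scatter plot per pair of columns, histograms on the diagonal.
axes_grid = pd.plotting.scatter_matrix(numeric, figsize=(6, 6), diagonal="hist")
```

Seaborn's `pairplot` produces a similar grid to `scatter_matrix` and could be swapped in for the last call.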
The notebook includes links to:
- Matplotlib Named Colors
- Matplotlib Colormaps
- Seaborn Color Palettes
- Pandas Plotting Documentation
- Matplotlib Plot Types
- Seaborn Gallery
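The sketch below shows where named colors, colormaps, and palettes from those references typically plug in. It encodes a third variable as color and a fourth as marker size on a Matplotlib scatter plot. The data is synthetic, and the specific choices (`viridis`, `dimgray`, the size scaling) are only examples.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (not real dataset values).
rng = np.random.default_rng(3)
n = 80
df = pd.DataFrame({
    "GDP per capita": rng.uniform(0.0, 1.7, n),
    "Score": rng.uniform(3.0, 7.8, n),
    "Healthy life expectancy": rng.uniform(0.1, 1.1, n),
    "Generosity": rng.uniform(0.0, 0.6, n),
})

# A Seaborn palette is a list of RGB tuples sampled from a named colormap.
palette = sns.color_palette("viridis", n_colors=5)

fig, ax = plt.subplots(figsize=(6, 4))
points = ax.scatter(
    df["GDP per capita"],
    df["Score"],
    c=df["Healthy life expectancy"],   # colour encodes a third variable
    s=20 + 300 * df["Generosity"],     # marker size encodes a fourth variable
    cmap="viridis",                    # Matplotlib colormap name
    edgecolors="dimgray",              # Matplotlib named colour
    alpha=0.8,
)
fig.colorbar(points, ax=ax, label="Healthy life expectancy")
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Score")
```

Sequential colormaps such as `viridis` suit ordered quantities like the one used here; a discrete palette is usually a better fit for categorical groupings.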
Contributions are welcome! If you'd like to improve this tutorial:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit your changes (git commit -am 'Add new visualization example')
- Push to the branch (git push origin feature/improvement)
- Open a Pull Request
This project is open source and available for educational purposes.
Author: DataDarling
- GitHub: @DataDarling
Acknowledgments:
- World Happiness Report team for providing the dataset
- The open-source community for the amazing visualization libraries
- Contributors who help improve this educational resource
⭐ If you find this tutorial helpful, please consider giving it a star!
Happy Visualizing! 📈