H12345555/LifeExp-by-Country

LifeExp-by-Country

Overview

This project is an end-to-end exploratory data analysis (EDA) of life expectancy trends in Canada, Mexico, and the United States using the Gapminder dataset. The workflow demonstrates core data science, data analytics, and data visualization skills in R, including grouped statistical summaries (mean, median, standard deviation), multi-faceted visualization and plot composition with multiple libraries (ggplot2, tidyverse, patchwork, and gridExtra), and statistical inference. The code produces a complete suite of visualizations (density plots, boxplots, bar charts, scatterplots with trends) and statistical tests that evaluate differences in life expectancy across countries from 1957–2007.

Contents of this repo:

  • AOV Life Expectancies.png:
    • Shows the results of the ANOVA and the Tukey comparison.
  • CONF.png:
    • Plot of the % differences in mean life expectancy between countries, each shown with its 95% confidence interval (95% CI).
  • Gapminder Life Expectancies.R:
    • The R code used to generate all files and visualizations in this repo. For more information on the particular plots used and their contents, refer to the "Visualizations Summary" section at the bottom of this readme.
  • LIFE.png:
    • Top left:
      • Contains 3 subplots (one for each country) showing the changes in the mean life expectancies over time for each country.
    • Top right:
      • A plot comparing the mean and median life expectancy values for each country and where they fall on the probability distribution for life expectancy.
    • Bottom left:
      • Mean life expectancies for each country shown as a column chart, making it easier to compare countries in particular years.
    • Bottom right:
      • Boxplot showing where the mean and median life expectancies fall for each country. The boxplot also conveys which data points would be considered statistical outliers: the left edge of each box is the 25th percentile of life expectancy and the right edge is the 75th percentile, so points far outside the box stand out.
  • LIFE2.png:
    • Density plot showing a visual comparison of the probability distributions for each country overlaid on top of each other.

Skills Demonstrated

Data Wrangling & Tidy Data Principles

  • Filtering data by target countries
  • Grouping and summarizing numeric variables
  • Using the pipe operator (%>%) to create readable, reproducible workflows
  • Creating derived datasets containing summary statistics for use across visualizations
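
The wrangling steps listed above could look roughly like the following. This is a minimal sketch, not the code from Gapminder Life Expectancies.R: the object names (`countries`, `north_america`, `life_summary`) are hypothetical, and any year filtering done in the actual script is not reproduced here.

```r
library(gapminder)  # data source
library(dplyr)      # part of the tidyverse; provides %>%, filter(), summarise()

# Countries of interest, as listed in this README
countries <- c("Canada", "Mexico", "United States")

# Keep only the target countries and drop the unused factor levels
north_america <- gapminder %>%
  filter(country %in% countries) %>%
  droplevels()

# Derived dataset: one row of summary statistics per country,
# reusable across several plots
life_summary <- north_america %>%
  group_by(country) %>%
  summarise(
    mean_life   = mean(lifeExp),
    median_life = median(lifeExp),
    sd_life     = sd(lifeExp),
    .groups     = "drop"
  )

print(life_summary)
```

Keeping the summary table as its own object means the same means and medians can be passed to several plots without being recomputed.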

Statistical Analysis

  • Descriptive statistics:
    • mean, median, and standard deviation
  • Inferential statistics:
    • ANOVA to test for differences in mean life expectancy across countries
    • TukeyHSD post-hoc comparisons
  • Interpretation of significance and confidence intervals
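
As an illustration of the inferential steps above, a one-way ANOVA followed by Tukey HSD can be run with base R's `aov()` and `TukeyHSD()`. This is a sketch under the assumption that country is the only factor in the model; the names `north_america`, `life_aov`, and `life_tukey` are hypothetical and may not match the repo's script.

```r
library(gapminder)
library(dplyr)

north_america <- gapminder %>%
  filter(country %in% c("Canada", "Mexico", "United States")) %>%
  droplevels()

# One-way ANOVA: does mean life expectancy differ across the three countries?
life_aov <- aov(lifeExp ~ country, data = north_america)
print(summary(life_aov))

# Tukey HSD post-hoc test: pairwise differences in means,
# each with a 95% family-wise confidence interval and adjusted p-value
life_tukey <- TukeyHSD(life_aov, conf.level = 0.95)
print(life_tukey)
```

A pair of countries differs significantly at the 5% level when its interval (`lwr` to `upr`) excludes zero, which is equivalent to an adjusted p-value below 0.05.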

Reproducible Workflow

  • Programmatically generating and saving figures as .png files
  • Modular code organization enabling easy updates or parameter changes
  • Clean labeling and theming for professional-quality visual outputs
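
Saving figures programmatically, as described above, might be done along these lines. The sketch assumes `ggsave()` for ggplot2 objects and a `png()` / `dev.off()` pair for base-graphics output such as a Tukey plot; the output directory and file names are hypothetical and are not the repo's actual file names.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

north_america <- gapminder %>%
  filter(country %in% c("Canada", "Mexico", "United States")) %>%
  droplevels()

# Hypothetical output location; the repo's script may write elsewhere
out_dir <- tempdir()

# ggplot2 figure: build the plot object, then write it with ggsave()
p <- ggplot(north_america, aes(x = lifeExp, fill = country)) +
  geom_density(alpha = 0.5) +
  labs(x = "Life expectancy (years)", y = "Density", fill = "Country")

gg_file <- file.path(out_dir, "density_example.png")
ggsave(gg_file, plot = p, width = 8, height = 5, dpi = 300)

# Base-graphics figure: open a png device, draw, then close the device
base_file <- file.path(out_dir, "tukey_example.png")
png(base_file, width = 800, height = 600)
plot(TukeyHSD(aov(lifeExp ~ country, data = north_america)))
dev.off()
```

Passing explicit `width`, `height`, and `dpi` values keeps the exported images the same size from run to run, regardless of the size of the interactive plotting window.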

Data Visualization

  • Demonstrates advanced use of ggplot2:
    • Density plots with overlaid mean/median lines
    • Boxplots annotated with descriptive statistics
    • Scatterplots with smoothing trends and facet wrapping
    • Bar plots with dodge positioning
  • Unified multi-plot layouts using grid.arrange(), ideal for dashboards or reports
  • Consistent styling (dark theme, custom labels, axis scaling)
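
One way to build a density plot with overlaid mean/median lines and a dark theme is sketched below. The layout (one facet per country), colours, and object names (`stat_lines`, `density_plot`) are assumptions made for illustration, not the styling used in the repo's figures.

```r
library(gapminder)
library(dplyr)
library(tidyr)
library(ggplot2)

north_america <- gapminder %>%
  filter(country %in% c("Canada", "Mexico", "United States")) %>%
  droplevels()

# Long-format table with one row per country and statistic,
# used to draw the vertical reference lines
stat_lines <- north_america %>%
  group_by(country) %>%
  summarise(Mean = mean(lifeExp), Median = median(lifeExp), .groups = "drop") %>%
  pivot_longer(c(Mean, Median), names_to = "statistic", values_to = "value")

density_plot <- ggplot(north_america, aes(x = lifeExp)) +
  geom_density(fill = "steelblue", alpha = 0.6) +
  # Mean and median lines come from the summary table, not the raw data
  geom_vline(
    data = stat_lines,
    aes(xintercept = value, linetype = statistic),
    colour = "white"
  ) +
  facet_wrap(~ country, ncol = 1) +
  labs(
    title    = "Life expectancy distribution by country",
    x        = "Life expectancy (years)",
    y        = "Density",
    linetype = "Statistic"
  ) +
  theme_dark()

print(density_plot)
```

Supplying a separate `data` argument to `geom_vline()` lets each facet pick up only its own country's mean and median.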

Project Structure

Core packages imported:

  • gapminder (data source)
  • tidyverse (data manipulation + visualization)
  • patchwork (multi-plot layouts)
  • gridExtra (multi-plot layouts)

Filtered Countries of Interest:

Data was filtered to include only the following countries from North America:

  • United States
  • Mexico
  • Canada

Statistical Analysis:

Computed the following statistics and ran the following tests on the dataset:

  • Means
  • Medians
  • Standard deviations
  • ANOVA/AOV (Analysis of Variance)
  • 95% Confidence Interval Analysis (95% CI)
  • Tukey HSD, used to identify which country pairs differ significantly

Visualizations Summary:

All visualizations are compiled into multi-panel layouts using grid.arrange() for presentation. Plots and subplots are also exported as .png files, which can be viewed in this repository.

  • Density Plot (overlaid):
    • Compares distributions of life expectancy across countries
  • Boxplot:
    • Visualizes spread and outliers, with mean and median annotations
  • Scatterplot (with smoothing):
    • Shows life expectancy trends over time (facet per country)
  • Density Plot (with mean/median lines):
    • Compares distribution shapes with mean/median lines by country
  • Bar Chart:
    • Displays life expectancy values by year, grouped by country
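
A four-panel layout like the one summarized above could be assembled as follows. This is a rough sketch only: the panel order, the loess smoother, the horizontal boxplot orientation, and all object names are assumptions, and the styling is left at ggplot2 defaults rather than matching LIFE.png.

```r
library(gapminder)
library(dplyr)
library(ggplot2)
library(gridExtra)

north_america <- gapminder %>%
  filter(country %in% c("Canada", "Mexico", "United States")) %>%
  droplevels()

# Scatterplot with a smoothed trend, one facet per country
trend_plot <- ggplot(north_america, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~ country) +
  labs(x = "Year", y = "Life expectancy (years)")

# Overlaid density curves, one per country
density_plot <- ggplot(north_america, aes(x = lifeExp, fill = country)) +
  geom_density(alpha = 0.5) +
  labs(x = "Life expectancy (years)", y = "Density", fill = "Country")

# Bar chart of life expectancy by year, with countries side by side
bar_plot <- ggplot(north_america,
                   aes(x = factor(year), y = lifeExp, fill = country)) +
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Life expectancy (years)", fill = "Country")

# Horizontal boxplot; the diamond marks each country's mean
box_plot <- ggplot(north_america, aes(x = lifeExp, y = country)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3) +
  labs(x = "Life expectancy (years)", y = "Country")

# Draw all four panels in a 2 x 2 grid; the combined layout is also returned
panel <- grid.arrange(trend_plot, density_plot, bar_plot, box_plot, ncol = 2)
```

Because `grid.arrange()` returns the combined layout, the `panel` object can be passed to `ggsave()` to export the whole grid as one image.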

About

Using R, performed statistical analysis and data visualization of life expectancies of United States, Canada, and Mexico, using custom plots and data analysis.
