Ravihakhan21/netflix-ml-project

Netflix Data Analysis & Visualization – ML Project

A machine learning and data analysis project that explores the Netflix dataset to uncover content trends, perform data cleaning, and visualize meaningful patterns in streaming media.


Project Overview

This project explores and analyzes the Netflix dataset to extract actionable insights using Python. It includes data cleaning, exploratory data analysis (EDA), and visualizations to understand content distribution by type, country, genre, and release timeline.

Key Objectives:

  • Clean and preprocess the dataset
  • Analyze the distribution of Netflix content (Movies vs TV Shows)
  • Identify content trends by country, release year, and ratings
  • Visualize top genres, directors, and frequently appearing cast members
  • Answer specific business-oriented queries using code and visual analysis

Dataset

  • Source: Kaggle – Netflix Dataset
  • File: netflix_titles.csv
  • The dataset is not included in this repository due to redistribution restrictions.
  • Download it manually from the link above and place it in the same folder as the notebook before running it.
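
As a minimal sketch of the loading step, the helper below reads the CSV with Pandas and fails early if the file is missing or has an unexpected layout. The function name `load_titles` and the `EXPECTED_COLUMNS` list are illustrative and not taken from the notebook; the column names are those of the public Kaggle file and should be checked against your download.

```python
from pathlib import Path

import pandas as pd

# Assumption: column names as published in the Kaggle file; verify locally.
EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
    "description",
]


def load_titles(csv_path="netflix_titles.csv"):
    """Read the dataset and raise a clear error if it is missing or malformed."""
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; download it from Kaggle first.")
    df = pd.read_csv(path)
    # Hypothetical sanity check, not part of the original notebook.
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Unexpected file layout, missing columns: {missing}")
    return df
```

Calling `load_titles()` from the notebook's folder returns a DataFrame; `df.info()` is a quick way to see which columns contain nulls before cleaning.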

Techniques & Features Used

  • Data Cleaning (handling nulls, duplicates, format issues)
  • Exploratory Data Analysis (EDA)
  • Grouping, filtering, and cross-analysis by multiple columns
  • Visualization using Seaborn and Matplotlib
  • Question-driven analysis (e.g., most active countries, most popular genres)
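
The cleaning and grouping techniques listed above could look roughly like the following. This is a sketch under assumptions, not the notebook's actual code: `clean_titles` and `top_values` are hypothetical helper names, and filling gaps with "Unknown" is one possible policy among several (dropping rows is another).

```python
import pandas as pd


def clean_titles(df):
    """Return a cleaned copy: no duplicate rows, labeled gaps, parsed dates."""
    out = df.drop_duplicates().copy()
    # Label missing text fields instead of dropping the rows (an assumption).
    for col in ("director", "cast", "country"):
        out[col] = out[col].fillna("Unknown")
    # date_added is text such as "September 25, 2021"; unparseable values become NaT.
    out["date_added"] = pd.to_datetime(out["date_added"].str.strip(), errors="coerce")
    return out


def top_values(df, column, n=10):
    """Count entries of a comma-separated column such as country or listed_in."""
    values = df[column].dropna().str.split(",").explode().str.strip()
    # Ignore empty fragments and the placeholder introduced by clean_titles.
    values = values[(values != "") & (values != "Unknown")]
    return values.value_counts().head(n)
```

For example, `top_values(clean_titles(df), "country")` gives one count per country even when a title lists several countries in a single cell, which a plain `value_counts()` on the raw column would not.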

Tools & Technologies

  • Languages: Python
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn
  • Environment: Google Colab
  • Version Control: Git & GitHub

Sample Insights

  • TV Shows outnumber Movies among recent additions.
  • The United States and India are the top content providers.
  • Peak content additions occurred between 2017 and 2019.
  • "Documentaries" and "Dramas" are the most frequent genres.
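
Insights about additions over time, like those above, can be checked with a small year-by-type table. The sketch below is illustrative only: `additions_per_year` and `plot_additions` are hypothetical names, and it assumes the `type` and `date_added` columns of the Kaggle file. It does not reproduce the notebook's figures.

```python
import matplotlib.pyplot as plt
import pandas as pd


def additions_per_year(df):
    """Count titles added per calendar year, split by type (Movie / TV Show)."""
    added = df["date_added"]
    # Parse raw text dates; skip parsing if the column is already datetime.
    if not pd.api.types.is_datetime64_any_dtype(added):
        added = pd.to_datetime(added.str.strip(), errors="coerce")
    table = (
        df.assign(year_added=added.dt.year)
        .dropna(subset=["year_added"])  # rows without a usable date are excluded
        .groupby(["year_added", "type"])
        .size()
        .unstack(fill_value=0)
    )
    table.index = table.index.astype(int)
    return table


def plot_additions(table, ax=None):
    """Draw one line per content type and return the Axes for further styling."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    table.plot(ax=ax, marker="o")
    ax.set_xlabel("Year added")
    ax.set_ylabel("Number of titles")
    ax.set_title("Titles added per year by type")
    return ax
```

In a notebook, `plot_additions(additions_per_year(df))` draws the chart inline; the table itself shows which years had the most additions and how the two types compare.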

How to Run the Project

  1. Clone the repository or download the notebook.
  2. Download the dataset from Kaggle.
  3. Place netflix_titles.csv in the same directory as the notebook.
  4. Open netflix_Ml_Project.ipynb in Jupyter Notebook or Google Colab.
  5. Run all cells to reproduce the results.

👩‍💻 Author

Raviha Khan
📍 Karachi, Pakistan
🔗 LinkedIn
🐙 GitHub
📧 ravihakhan53@gmail.com


"Learning by doing — turning Netflix data into meaningful insights."

About

This project explores and analyzes the Netflix dataset to uncover patterns and trends in the content offered on the platform. Using Python and Colab, it performs exploratory data analysis (EDA) and applies data cleaning techniques to prepare the data for insights.
