This repository contains the code and analysis for the Netflix data analysis project. The goal is to analyze the dataset provided by Netflix and generate insights that could help the company decide which types of shows and movies to produce and how it can grow its business in different countries.
The dataset lists all TV shows and movies available on Netflix. Each entry includes the show ID, type (movie or TV show), title, director, cast, country, date added, release year, rating, duration, genre, and description.
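Two fields usually need reshaping before analysis: duration mixes units (minutes for movies, seasons for TV shows), and fields such as cast or country can hold several names in one cell. The sketch below assumes pandas, durations formatted like `90 min` or `2 Seasons`, and comma-separated multi-valued fields. The helper names `tidy_durations` and `explode_multi_valued` are hypothetical, and the column names are assumptions (the file may, for example, store genre under a different name).

```python
import pandas as pd


def tidy_durations(df: pd.DataFrame) -> pd.DataFrame:
    """Split a duration string such as '90 min' or '2 Seasons' into a number and a unit."""
    out = df.copy()
    parts = out["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>\w+)")
    out["duration_value"] = pd.to_numeric(parts["value"])
    # Lowercase and drop a trailing plural "s" so "Season" and "Seasons" match.
    out["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")
    return out


def explode_multi_valued(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Give each comma-separated value (e.g. one actor or one country) its own row."""
    out = df.copy()
    out[column] = out[column].str.split(",")
    out = out.explode(column)
    out[column] = out[column].str.strip()
    return out.reset_index(drop=True)


# Toy rows for illustration only (hypothetical, not taken from the dataset).
toy = pd.DataFrame({
    "title": ["A", "B"],
    "duration": ["90 min", "2 Seasons"],
    "cast": ["Actor X, Actor Y", "Actor Z"],
})
print(tidy_durations(toy)[["duration_value", "duration_unit"]])
print(explode_multi_valued(toy, "cast")["cast"].tolist())  # ['Actor X', 'Actor Y', 'Actor Z']
```

Exploding produces one row per actor or country, so counts from the exploded frame measure appearances rather than titles.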
As we explore the data, we aim to answer specific questions and generate actionable insights for Netflix. Key questions include:
- What type of content is available in different countries?
- How has the number of movies released per year changed over the last 20-30 years?
- How do TV shows compare with movies?
- What is the best time to launch a TV show?
- Which actors and directors are associated with different types of shows and movies?
- Has Netflix focused more on TV shows than on movies in recent years?
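Several of these questions reduce to grouped counts. This is a minimal sketch assuming pandas, a `type` column with the values `Movie` and `TV Show`, an integer `release_year` column, and a `date_added` column formatted like `September 25, 2021`. All function names are hypothetical. The month a title was added serves only as a rough proxy for launch timing.

```python
import pandas as pd

# Assumed format of date_added, e.g. "September 25, 2021".
DATE_FORMAT = "%B %d, %Y"


def added_dates(df: pd.DataFrame) -> pd.Series:
    """Parse date_added; missing or unparseable entries become NaT."""
    return pd.to_datetime(df["date_added"].str.strip(), format=DATE_FORMAT, errors="coerce")


def titles_per_year(df: pd.DataFrame, kind: str = "Movie") -> pd.Series:
    """Number of titles of one type per release year (trend over time)."""
    return df.loc[df["type"] == kind, "release_year"].value_counts().sort_index()


def type_share_by_year_added(df: pd.DataFrame) -> pd.DataFrame:
    """Share of movies vs. TV shows among titles added to Netflix each year."""
    return pd.crosstab(added_dates(df).dt.year, df["type"], normalize="index")


def tv_show_additions_by_month(df: pd.DataFrame) -> pd.Series:
    """How many TV shows were added in each calendar month (1-12)."""
    months = added_dates(df)[df["type"] == "TV Show"].dt.month
    return months.value_counts().sort_index()


# Toy rows for illustration only (hypothetical, not taken from the dataset).
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "release_year": [2019, 2020, 2020, 2021],
    "date_added": ["January 5, 2020", " March 1, 2021", "March 10, 2021", "December 1, 2021"],
})
print(titles_per_year(toy))
print(type_share_by_year_added(toy))
print(tv_show_additions_by_month(toy))
```

Normalizing the crosstab by row shows the mix of content added each year independently of how many titles were added in total.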
The analysis is organized into the following stages:
- Defining the Problem Statement and Analyzing Basic Metrics: understanding the objectives and performing an initial exploration of the dataset.
- Data Exploration: examining the shape of the data, data types, missing values, and statistical summaries.
- Non-Graphical Analysis: using value counts and unique values to understand how the data is distributed.
- Visual Analysis:
  - Univariate analysis: distplots and histograms for continuous variables, and countplots for categorical variables.
  - Boxplots to compare the distribution of a continuous variable across categories.
  - Heatmaps and pairplots for correlation analysis.
- Missing Value & Outlier Check: detecting missing values and outliers in the dataset.
- Insights Based on Analysis: observations on the range of attributes, the distribution of variables, and the relationships between them.
- Business Insights: patterns observed in the data and what can be inferred from them.
- Recommendations: actionable items for the business, presented in plain language without technical jargon.
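The exploration, non-graphical, and missing-value/outlier stages can be sketched with a few pandas calls. This is an illustrative sketch, not the project's own scripts. The function names are hypothetical, and the outlier check uses Tukey's IQR rule as one common choice, assumed here rather than taken from the analysis.

```python
import pandas as pd


def basic_profile(df: pd.DataFrame) -> dict:
    """Data exploration: shape, data types, missing values, and summary statistics."""
    missing = df.isna().sum()
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": pd.DataFrame({"count": missing, "pct": (missing / len(df) * 100).round(1)}),
        "summary": df.describe(include="all"),
    }


def top_values(df: pd.DataFrame, column: str, n: int = 5):
    """Non-graphical analysis: the n most frequent values and the number of unique values."""
    return df[column].value_counts().head(n), df[column].nunique()


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


# Toy rows for illustration only (hypothetical, not taken from the dataset).
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "Movie"],
    "director": ["A", None, None, "B", "A", None],
    "release_year": [2018, 2019, 2019, 2020, 2021, 1945],
})
profile = basic_profile(toy)
print(profile["missing"])
counts, n_unique = top_values(toy, "type")
print(counts, n_unique)
print(iqr_outliers(toy["release_year"]).tolist())  # [1945]
```

A flagged release year such as an old classic is usually a genuine title rather than an error, so outliers here are candidates to inspect, not rows to drop automatically.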
To reproduce the analysis:
- Clone this repository.
- Download the dataset files from the provided link.
- Place the dataset files in the appropriate directory.
- Run the analysis scripts in your preferred environment.
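Before running any analysis, it helps to fail early with a clear message if the dataset is not where the scripts expect it. This is a minimal loading sketch assuming pandas and a CSV file. The path `data/netflix.csv` and the function `load_netflix` are hypothetical placeholders for wherever you placed the downloaded files.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location; point this at wherever you placed the dataset file.
DATA_PATH = Path("data") / "netflix.csv"


def load_netflix(path=DATA_PATH) -> pd.DataFrame:
    """Load the dataset, raising a readable error if the file is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found at {path}; download it and place it there first.")
    df = pd.read_csv(path)
    # Remove stray whitespace around column names so lookups like df["type"] work.
    df.columns = df.columns.str.strip()
    return df
```

Usage: `df = load_netflix()` in a script or notebook, or `load_netflix("path/to/file.csv")` if the file lives elsewhere.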