Welcome to the English Premier League 21/22 Data Analysis Project repository. This project serves as an entry point to data analysis, focusing on exploring insights from the English Premier League's 2021/2022 season. The project utilizes a beginner-friendly approach, making it an ideal resource for those new to data analysis.
The dataset for this project was sourced from Kaggle. To streamline the analysis, only data up to the shtont column was utilized. This deliberate choice aligns with the project's beginner-level focus.
The dataset was subjected to thorough cleaning and organization to ensure accurate analysis. Additional columns were introduced to reveal insightful information. Various aspects of the data were then analyzed to draw meaningful conclusions.
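As a rough illustration of the cleaning and column-adding step described above, here is a minimal, dependency-free sketch. The sample rows, the column names (player, team, goals, shots, shots_on_target), the derived shot_accuracy column, and the choice to treat a blank goals value as zero are all assumptions made for illustration; they are not taken from the actual Kaggle file or the project's code.

```python
import csv
import io

# Hypothetical sample rows; the real Kaggle file has different columns and values.
RAW_CSV = """player,team,goals,shots,shots_on_target
 Player A ,Team X,10,40,20
Player B,Team Y,,15,5
Player B,Team Y,,15,5
Player C,Team Z,3,0,0
"""

NUMERIC_FIELDS = ("goals", "shots", "shots_on_target")


def clean_rows(text):
    """Strip whitespace, drop exact duplicate rows, and convert numeric fields."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(row.items())
        if fingerprint in seen:
            continue  # skip rows that repeat an earlier row exactly
        seen.add(fingerprint)
        for field in NUMERIC_FIELDS:
            # Assumption: a blank numeric cell means zero.
            row[field] = int(row[field]) if row[field] else 0
        cleaned.append(row)
    return cleaned


def add_shot_accuracy(rows):
    """Add a derived column: shots on target divided by total shots."""
    for row in rows:
        shots = row["shots"]
        row["shot_accuracy"] = row["shots_on_target"] / shots if shots else 0.0
    return rows


players = add_shot_accuracy(clean_rows(RAW_CSV))
for player in players:
    print(player["player"], player["goals"], round(player["shot_accuracy"], 2))
```

The same steps map directly onto a dataframe library such as pandas if the analysis uses one; the plain-Python version is shown only to keep the example self-contained.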
As part of the project's progression, machine learning models were introduced, including linear regression. These models were employed to predict and understand various aspects of the Premier League based on the available data.
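To make the linear regression idea concrete, the sketch below fits a straight line by ordinary least squares using only the standard library. The numbers are invented, and the choice of shots on target as the predictor and goals as the target is an assumption for illustration; the project's own models may use other features or a machine learning library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical per-player values, not real season data.
shots_on_target = [5, 10, 20, 30, 40]
goals = [1, 3, 6, 10, 12]

slope, intercept = fit_line(shots_on_target, goals)
estimate = predict(slope, intercept, 25)
print(f"goals ~ {slope:.3f} * shots_on_target + {intercept:.3f}")
print(f"estimated goals for 25 shots on target: {estimate:.2f}")
```

A line fitted to so few points is only a demonstration of the mechanics; on real data it is worth holding out some rows to check how well the predictions generalize.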
This project sheds light on intriguing facets of the Premier League's 2021/2022 season. Here are some insights that were uncovered:
Discover who emerged as the top scorer during the season, highlighting their exceptional performance on the field.
Explore the countries that were prominently represented in the Premier League, providing a glimpse into the league's global diversity.
Examine the comparison between total shots and shots on target, offering insights into accuracy and efficiency on the pitch.
Delve into the realm of predictions by exploring the likelihood of a penalty being scored or missed in the Premier League.
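The insights listed above come down to a handful of simple aggregations. The following sketch shows one way they might be computed, assuming hypothetical field names (nation, goals, shots, shots_on_target, pens_scored, pens_attempted) and invented values. The penalty "prediction" here is just the observed conversion rate, which is a simplification and not necessarily the model the project uses.

```python
from collections import Counter

# Hypothetical records; names, nations, and numbers are invented.
players = [
    {"player": "Player A", "nation": "ENG", "goals": 12, "shots": 60,
     "shots_on_target": 30, "pens_scored": 3, "pens_attempted": 4},
    {"player": "Player B", "nation": "BRA", "goals": 8, "shots": 40,
     "shots_on_target": 10, "pens_scored": 1, "pens_attempted": 1},
    {"player": "Player C", "nation": "ENG", "goals": 5, "shots": 20,
     "shots_on_target": 8, "pens_scored": 0, "pens_attempted": 1},
    {"player": "Player D", "nation": "FRA", "goals": 15, "shots": 80,
     "shots_on_target": 32, "pens_scored": 4, "pens_attempted": 4},
]

# Top scorer: the record with the most goals.
top_scorer = max(players, key=lambda p: p["goals"])

# Country representation: number of players per nation.
nation_counts = Counter(p["nation"] for p in players)

# Shots versus shots on target: overall share of shots that were on target.
total_shots = sum(p["shots"] for p in players)
total_on_target = sum(p["shots_on_target"] for p in players)
shot_accuracy = total_on_target / total_shots

# Penalties: empirical probability that an attempted penalty is scored.
pens_scored = sum(p["pens_scored"] for p in players)
pens_attempted = sum(p["pens_attempted"] for p in players)
penalty_rate = pens_scored / pens_attempted

print("Top scorer:", top_scorer["player"], top_scorer["goals"])
print("Players per nation:", nation_counts.most_common())
print(f"Shot accuracy: {shot_accuracy:.0%}")
print(f"Penalty scored: {penalty_rate:.0%}, missed: {1 - penalty_rate:.0%}")
```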
For a detailed look at the code and step-by-step analysis, refer to the project's source code. Run it as a Python script (.py) or a Jupyter notebook (.ipynb) in your preferred IDE.