- Jayant
- Christopher Budd
- Mustafa Syed
The goal of this project is to predict the box office revenue of movies released between January 1, 2015 and November 5, 2023.
Dataset used: https://www.kaggle.com/datasets/akshaypawar7/millions-of-movies/data
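As a rough illustration of the date window above, here is a minimal sketch that keeps only rows whose release date falls between January 1, 2015 and November 5, 2023. It uses only the Python standard library and a few made-up rows. The column name `release_date` and the ISO date format (YYYY-MM-DD) are assumptions about the CSV, and the helper names `in_window` and `filter_rows` are hypothetical, not taken from the project code.

```python
import csv
import io
from datetime import date

# Date window stated in the project goal (inclusive on both ends).
START = date(2015, 1, 1)
END = date(2023, 11, 5)


def in_window(row, column="release_date"):
    """Return True if the row's release date falls inside [START, END].

    Assumes ISO dates (YYYY-MM-DD) in a column named `release_date`;
    both are assumptions about the CSV, not confirmed by this README.
    Rows with a missing or malformed date are treated as out of window.
    """
    try:
        released = date.fromisoformat(row.get(column) or "")
    except ValueError:
        return False
    return START <= released <= END


def filter_rows(csv_file):
    """Yield only the rows of an open CSV file that fall in the window."""
    for row in csv.DictReader(csv_file):
        if in_window(row):
            yield row


# Tiny in-memory stand-in for the real file, with made-up rows.
sample = io.StringIO(
    "title,release_date,revenue\n"
    "Movie A,2014-12-31,100\n"
    "Movie B,2015-01-01,200\n"
    "Movie C,2023-11-05,300\n"
    "Movie D,2023-11-06,400\n"
    "Movie E,,500\n"
)
kept = [row["title"] for row in filter_rows(sample)]
print(kept)  # ['Movie B', 'Movie C']
```

For the real file, the same filter could be applied by passing an opened CSV file to `filter_rows` instead of the in-memory sample.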
- conda install -c anaconda scikit-learn
- conda install pandas
- conda install numpy
- conda install -c conda-forge matplotlib
- conda install seaborn
- pip install xgboost
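After running the commands above, a quick check like the following can confirm that each package is importable in the active environment. This is a minimal sketch rather than part of the project code; the helper name `check_packages` is hypothetical. It relies only on the standard library, so it runs even if some packages are missing.

```python
import importlib.util

# Import names for the packages installed above. Note that the
# scikit-learn package is imported as `sklearn`.
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib", "seaborn", "xgboost"]


def check_packages(names):
    """Map each import name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as `MISSING` can be reinstalled with the corresponding command from the list.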
- Getting the data
- Loading the data on your machine
- Look at the big picture
- First impressions of the dataset, EDA graphs, and patterns found
- Preprocessing: preparing the data for the ML algorithms
- Data cleaning
- Encoding
- Feature scaling (re-sampling)
- Training and evaluation of 3 ML algorithms
- Algorithms used
- Training
- Findings and results comparison
- Three graphs for the best-performing algorithm
- Limitations of the model
The code can be found in the PDF.
GitHub repository link: https://github.com/Jayant1Varma/Movie-Box-Office-predictor.git
Original dataset citation: https://www.kaggle.com/datasets/akshaypawar7/millions-of-movies/data
Presentation video link: https://youtu.be/R6Qv8SrqOKY
Note that the Kaggle dataset is updated daily; we used the version that was available on November 5, 2023. You can find the exact dataset we used here: https://drive.google.com/file/d/1uPtHyqpAKkqZUpft8A0FPVXPR2iT32SN/view?usp=sharing It is also available in our GitHub repository.