This project aims to predict students' math scores based on various factors such as gender, ethnicity, parental education, lunch type, and test preparation course. 📊 By analyzing the dataset and training machine learning models, the project provides insights into how these factors influence student performance.
The goal of this project is to understand how students' performance (test scores) is affected by other variables such as:
- 👦👧 Gender
- 🌎 Ethnicity
- 🎓 Parental level of education
- 🍱 Lunch type
- 📚 Test preparation course
It also aims to build a machine learning model to predict math scores based on these factors.
- Source: Kaggle - Students Performance in Exams
- Description: The dataset contains 1000 rows and 8 columns:
gender: Gender of the student (Male/Female)race/ethnicity: Ethnicity of the student (Group A, B, C, D, E)parental level of education: Parents' final education levellunch: Type of lunch before the test (Standard/Free or Reduced)test preparation course: Completion status of the test prep coursemath score: Math test scorereading score: Reading test scorewriting score: Writing test score
- Cleaned data, visualized patterns, and gathered insights:
- ✅ Standard lunch improves performance
- 👩🎓 Female students perform better overall, except in math
- ✅ Test prep course boosts scores
- Encoded categorical features with
OneHotEncoder - Scaled numerical data with
StandardScaler - Split the data into training and testing sets
- Trained multiple regression models:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor 🌲
- XGBoost, CatBoost, AdaBoost
- Evaluated using:
- MAE, RMSE, and R² Score 📈
- Built a Flask web app for predictions
- Deployed on AWS Elastic Beanstalk 🌐
git clone https://github.com/your-username/student-performance-indicator.git
cd student-performance-indicatorpython -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activatepip install -r requirements.txt
python app.py- Open your browser at
http://127.0.0.1:5000 - Fill in the form with student details
- Click "Predict Your Math Score" to get results 🎯
mlprojects/
├── app.py
├── src/
│ ├── pipeline/
│ │ ├── predict_pipeline.py
│ │ └── train_pipeline.py
│ ├── utils.py
│ └── exception.py
├── notebook/
│ ├── 1. EDA STUDENT PERFORMANCE.ipynb
│ ├── 2. MODEL TRAINING.ipynb
├── templates/
│ ├── index.html
│ ├── home.html
├── artifacts/
├── data/
│ └── stud.csv
├── requirements.txt
└── README.md
- Best Model: Random Forest Regressor 🌲
- R² Score: 0.92 on test data 🚀
- Language: Python 🐍
- Libraries:
- 📊 Pandas, NumPy
- 📈 Matplotlib, Seaborn
- 🤖 Scikit-learn, XGBoost, CatBoost
- Web: Flask 🌐
- Deployment: AWS Elastic Beanstalk ☁️
- Add more features to boost accuracy ⚙️
- Dockerize and deploy with Kubernetes 🐳
- Try more ML models and ensembles
Vijayakumar
💬 Feel free to reach out for questions or suggestions!
This project is licensed under the MIT License. See the LICENSE file for details.