🎓 Student Performance Analysis & Prediction Dashboard

Overview

This project is an interactive machine learning dashboard that analyzes student exam data and predicts a student's academic performance.

The application provides a complete data science workflow including:

Data cleaning and preprocessing
Exploratory data analysis (EDA)
Machine learning model training
Model interpretation
Interactive visualization
Student performance prediction

The project was built using Python, Streamlit, and Scikit-learn to demonstrate a practical end-to-end machine learning system.

🧩 Project Type

This project represents a Machine Learning and Data Analytics Application.

It combines several fields of data science including:

Data Cleaning
Exploratory Data Analysis (EDA)
Machine Learning Modeling
Data Visualization
Interactive Dashboard Development

The system can therefore be classified as an End-to-End Machine Learning Project because it includes the complete workflow:

Data → Analysis → Model → Prediction → Interactive Dashboard.

The dashboard allows users to explore the dataset, visualize patterns, and generate predictions using a trained machine learning model.

📌 Project Purpose

Educational institutions often analyze student data to understand factors that influence academic performance.

The goal of this project is to:

Analyze student exam results
Identify patterns in academic performance
Build a machine learning model capable of predicting a student's expected academic score
Provide an interactive dashboard that allows users to explore the dataset and generate predictions

It is intended as a machine learning portfolio project that demonstrates practical data science skills.

🧠 What This Project Demonstrates

This system demonstrates the full Data Science Pipeline, including:

Data collection
Data cleaning
Exploratory data analysis
Machine learning model training
Model evaluation
Model interpretation
Interactive visualization
User prediction interface

This type of project is commonly used in Data Science and Machine Learning portfolios.

📂 Project Structure

The project is organized into multiple directories, each serving a specific purpose.

Student_Performance_Analysis
│
├── app
│ └── app.py
│
├── data
│ ├── StudentsPerformance.csv
│ └── clean_students.csv
│
├── models
│ └── student_model.pkl
│
├── notebooks
│ └── EDA.ipynb
│
├── src
│ ├── data_cleaning.py
│ └── train_model.py
│
├── assets
│ ├── overview.png
│ ├── eda.png
│ ├── prediction.png
│ └── model_insights.png
│
├── requirements.txt
└── README.md

Each folder contains specific components used in the machine learning workflow.

📄 Explanation of Each Folder and File

app/

This directory contains the Streamlit dashboard application.

app.py

This is the main application file responsible for:

Loading the dataset
Loading the trained machine learning model
Building the user interface
Displaying interactive visualizations
Allowing users to input data
Generating predictions
Showing model insights

This file essentially runs the entire dashboard interface.

data/

This folder stores the datasets used in the project.

StudentsPerformance.csv

This is the original dataset.

It contains information about student demographics and exam scores.

clean_students.csv

This is the processed dataset created after data cleaning.

The cleaned dataset is used for:

data analysis
model training
dashboard visualization

src/

This folder contains the core data science scripts.

data_cleaning.py

This script performs data preprocessing operations including:

loading the raw dataset
removing duplicate records
cleaning categorical data
saving the cleaned dataset
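
The cleaning steps above could look roughly like the following. This is a minimal sketch, not the actual contents of data_cleaning.py: the column names, the sample rows, and the clean_students helper are assumptions made for illustration, and a small in-memory CSV stands in for the raw file.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for data/StudentsPerformance.csv.
# Column names and values are assumptions made for this example.
RAW_CSV = """gender,lunch,math score,reading score,writing score
female,standard,72,72,74
female,standard,72,72,74
 Male ,free/reduced,47,57,44
male,free/reduced,47,57,44
"""

CATEGORICAL = ["gender", "lunch"]


def clean_students(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize text columns, then drop exact duplicate rows."""
    df = raw.copy()
    # snake_case column names are easier to reference later.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Normalize categories first so " Male " and "male" count as the same value.
    for col in CATEGORICAL:
        df[col] = df[col].str.strip().str.lower()
    return df.drop_duplicates().reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = clean_students(raw)
print(clean)
# In the project, the result would then be written out, for example:
# clean.to_csv("data/clean_students.csv", index=False)
```

Normalizing the categorical text before removing duplicates means rows that differ only in spacing or capitalization are also treated as duplicates.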

train_model.py

This script trains the machine learning model.

Steps performed:

Load cleaned dataset
Encode categorical variables
Create the target variable
Split dataset into training and testing sets
Train machine learning models
Evaluate model performance
Save the final trained model

The trained model is saved as:

models/student_model.pkl
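
As an illustration of these steps, here is a minimal sketch rather than the real train_model.py. It assumes the target is the mean of the three exam scores, uses synthetic data in place of clean_students.csv, and picks one-hot encoding, an 80/20 split, and the hyperparameters shown purely as examples.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data/clean_students.csv (column names are assumptions).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], n),
    "lunch": rng.choice(["standard", "free/reduced"], n),
    "math_score": rng.integers(30, 100, n),
    "reading_score": rng.integers(30, 100, n),
    "writing_score": rng.integers(30, 100, n),
})

# Create the target: mean of the three exam scores (assumed definition).
score_cols = ["math_score", "reading_score", "writing_score"]
df["average_score"] = df[score_cols].mean(axis=1)

# Encode categorical variables as 0/1 indicator columns.
X = pd.get_dummies(df.drop(columns="average_score"),
                   columns=["gender", "lunch"], dtype=int)
y = df["average_score"]

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.2f}  R2={r2:.3f}")

# Serialize the model; the project would write these bytes to
# models/student_model.pkl instead of keeping them in memory.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

Because the assumed target is computed directly from the three score columns that are also features, the held-out metrics in this sketch will look very strong. That reflects this setup rather than general predictive power.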

models/

This folder stores the trained machine learning models.

student_model.pkl

This file contains the trained Random Forest model used by the dashboard to generate predictions.

The model predicts the expected average exam score of a student.

notebooks/

This folder contains Jupyter notebooks used during the analysis phase.

EDA.ipynb

This notebook includes Exploratory Data Analysis (EDA) such as:

score distributions
correlation analysis
visualization of relationships between variables

This step helps understand the dataset before training the model.
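
A correlation analysis like the one mentioned above can be done in a few lines of pandas. The sketch below uses a handful of illustrative rows and assumed column names, not the project's real data or notebook code.

```python
import pandas as pd

# Illustrative values only; column names are assumptions.
scores = pd.DataFrame({
    "math_score": [72, 69, 90, 47, 76],
    "reading_score": [72, 90, 95, 57, 78],
    "writing_score": [74, 88, 93, 44, 75],
})

# Basic distribution summary for each score column.
print(scores.describe().loc[["mean", "std", "min", "max"]])

# Pairwise Pearson correlation between the score columns.
corr = scores.corr()
print(corr.round(2))
```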

assets/

This folder contains images used in the README file.

These images are screenshots of the dashboard interface.

📊 Dashboard Pages

The Streamlit dashboard contains four main pages.

1️⃣ Overview Page

This page provides a general summary of the dataset.

Information displayed includes:

total number of students
gender distribution
average exam scores
dataset preview
distribution charts

Users can also download the dataset.
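
The numbers shown on this page can be computed directly with pandas. The following is a minimal sketch with made-up rows and assumed column names; the summary dictionary is a hypothetical structure, not the one used in app.py.

```python
import pandas as pd

# Illustrative rows; column names are assumptions.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math_score": [72, 69, 90, 47],
    "reading_score": [72, 90, 95, 57],
    "writing_score": [74, 88, 93, 44],
})
score_cols = ["math_score", "reading_score", "writing_score"]

summary = {
    "total_students": len(df),
    "gender_counts": df["gender"].value_counts().to_dict(),
    "mean_scores": df[score_cols].mean().round(1).to_dict(),
}
print(summary)

# CSV bytes suitable for a download widget, for example the `data`
# argument of Streamlit's st.download_button.
csv_bytes = df.to_csv(index=False).encode("utf-8")
```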

2️⃣ Data Exploration Page

This page allows users to explore relationships between dataset variables.

Users can select:

X variable
Y variable
visualization type

Available chart types include:

scatter plots
box plots
violin plots
histograms

This page helps users analyze patterns in the dataset.

3️⃣ Prediction Page

This page allows users to predict a student's academic performance.

The user enters:

gender
nationality
parent education
lunch type
test preparation
math score
reading score
writing score

The trained machine learning model processes the inputs and predicts the student's expected average score.
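
One way to turn form inputs into a prediction is sketched below. It assumes the model was trained on one-hot encoded columns; the build_features helper, the column names, and the tiny training set are hypothetical and only make the example self-contained.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

CATEGORICAL = ["gender", "lunch"]
SCORES = ["math_score", "reading_score", "writing_score"]

# Tiny illustrative training set standing in for the real cleaned data.
train = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "lunch": ["standard", "standard", "free/reduced",
              "free/reduced", "standard", "free/reduced"],
    "math_score": [72, 69, 90, 47, 76, 55],
    "reading_score": [72, 90, 95, 57, 78, 60],
    "writing_score": [74, 88, 93, 44, 75, 58],
})
y = train[SCORES].mean(axis=1)
X = pd.get_dummies(train, columns=CATEGORICAL, dtype=int)
feature_columns = list(X.columns)  # remember the training column order
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)


def build_features(user_input: dict) -> pd.DataFrame:
    """Encode one form submission with the same columns used in training."""
    row = pd.get_dummies(pd.DataFrame([user_input]),
                         columns=CATEGORICAL, dtype=int)
    # Categories the user did not pick have no column yet; add them as 0.
    return row.reindex(columns=feature_columns, fill_value=0)


user_input = {"gender": "female", "lunch": "standard",
              "math_score": 80, "reading_score": 85, "writing_score": 82}
features = build_features(user_input)
prediction = float(model.predict(features)[0])
print(f"Predicted average score: {prediction:.1f}")
```

Reindexing against the training columns keeps the single input row aligned with what the model saw during training, even though one row can only produce one indicator column per categorical field.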

4️⃣ Model Insights Page

This page explains how the machine learning model makes predictions.

It visualizes Feature Importance, showing which variables influence the prediction the most.

This helps users understand the behavior of the machine learning model.
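
Feature importance for a Random Forest can be read from scikit-learn's feature_importances_ attribute. The sketch below trains on synthetic data with assumed column names, so the ranking it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data; the target depends only on the three score columns.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "math_score": rng.integers(30, 100, n),
    "reading_score": rng.integers(30, 100, n),
    "writing_score": rng.integers(30, 100, n),
    "gender_male": rng.integers(0, 2, n),
})
y = X[["math_score", "reading_score", "writing_score"]].mean(axis=1)

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances, one value per feature, summing to 1.
importances = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
)
print(importances.round(3))
```

The sorted Series can be passed to any bar chart function to produce the kind of plot this page describes.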

🤖 Machine Learning Model

The model used in this project is:

RandomForestRegressor

Random Forest is an ensemble learning algorithm based on decision trees.

Advantages include:

strong predictive performance
ability to model complex relationships
lower risk of overfitting than a single decision tree, because predictions are averaged over many trees

The model predicts the average student score based on demographic and exam features.
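
To make the ensemble idea concrete, the sketch below checks that a RandomForestRegressor prediction is the average of its individual trees' predictions. The data is synthetic and the settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic features (three scores) and a target equal to their mean.
rng = np.random.default_rng(2)
X = rng.uniform(30, 100, size=(100, 3))
y = X.mean(axis=1)

forest = RandomForestRegressor(n_estimators=20, random_state=2).fit(X, y)

# Predict the first five rows with every tree, then average across trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_pred = forest.predict(X[:5])

print(np.allclose(manual_average, forest_pred))
```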

🛠 Technologies Used

This project was developed using the following technologies:

Python

Main libraries:

Pandas
NumPy
Scikit-learn
Streamlit
Plotly
Matplotlib
Seaborn

▶ Running the Application

Install the required dependencies:

pip install -r requirements.txt

Run the Streamlit application:

streamlit run app/app.py

Streamlit serves the dashboard locally (by default at http://localhost:8501) and usually opens it in your default browser.

👨‍💻 Author

This project was developed by Raidan Alkhateeb as a Machine Learning and Data Science project.
