RAIDAN44/student-performance-analysis


🎓 Student Performance Analysis & Prediction Dashboard

Overview

This project is an interactive Machine Learning Dashboard designed to analyze and predict student academic performance using data analysis and predictive modeling techniques.

The application provides a complete data science workflow including:

  • Data cleaning and preprocessing
  • Exploratory data analysis (EDA)
  • Machine learning model training
  • Model interpretation
  • Interactive visualization
  • Student performance prediction

The project was built using Python, Streamlit, and Scikit-learn to demonstrate a practical end-to-end machine learning system.


🧩 Project Type

This project is a Machine Learning and Data Analytics application.

It combines several fields of data science including:

  • Data Cleaning
  • Exploratory Data Analysis (EDA)
  • Machine Learning Modeling
  • Data Visualization
  • Interactive Dashboard Development

The system can therefore be classified as an End-to-End Machine Learning Project, because it covers the complete workflow:

Data → Analysis → Model → Prediction → Interactive Dashboard.

The dashboard allows users to explore the dataset, visualize patterns, and generate predictions using a trained machine learning model.

📌 Project Purpose

Educational institutions often analyze student data to understand factors that influence academic performance.

The goal of this project is to:

  • Analyze student exam results
  • Identify patterns in academic performance
  • Build a machine learning model capable of predicting a student's expected academic score
  • Provide an interactive dashboard that allows users to explore the dataset and generate predictions

It is intended as a Machine Learning portfolio project that demonstrates practical data science skills.


🧠 What This Project Demonstrates

This system demonstrates the full Data Science Pipeline, including:

  1. Data collection
  2. Data cleaning
  3. Exploratory data analysis
  4. Machine learning model training
  5. Model evaluation
  6. Model interpretation
  7. Interactive visualization
  8. User prediction interface

This type of project is commonly used in Data Science and Machine Learning portfolios.


📂 Project Structure

The project is organized into multiple directories, each serving a specific purpose.

Student_Performance_Analysis
│
├── app
│   └── app.py
│
├── data
│   ├── StudentsPerformance.csv
│   └── clean_students.csv
│
├── models
│   └── student_model.pkl
│
├── notebooks
│   └── EDA.ipynb
│
├── src
│   ├── data_cleaning.py
│   └── train_model.py
│
├── assets
│   ├── overview.png
│   ├── eda.png
│   ├── prediction.png
│   └── model_insights.png
│
├── requirements.txt
└── README.md

Each folder contains specific components used in the machine learning workflow.


📄 Explanation of Each Folder and File

app/

This directory contains the Streamlit dashboard application.

app.py

This is the main application file responsible for:

  • Loading the dataset
  • Loading the trained machine learning model
  • Building the user interface
  • Displaying interactive visualizations
  • Allowing users to input data
  • Generating predictions
  • Showing model insights

This file essentially runs the entire dashboard interface.


data/

This folder stores the datasets used in the project.

StudentsPerformance.csv

This is the original dataset.

It contains information about student demographics and exam scores.

clean_students.csv

This is the processed dataset created after data cleaning.

The cleaned dataset is used for:

  • data analysis
  • model training
  • dashboard visualization

src/

This folder contains the core data science scripts.

data_cleaning.py

This script performs data preprocessing operations including:

  • loading the raw dataset
  • removing duplicate records
  • cleaning categorical data
  • saving the cleaned dataset
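The cleaning steps listed above could look roughly like the following. This is a minimal sketch, not the contents of data_cleaning.py: the function names `clean_students` and `run_cleaning` and the snake_case column naming are assumptions made for illustration.

```python
import pandas as pd


def clean_students(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: tidy column names, normalized categories, no duplicates."""
    df = raw.copy()
    # Normalize column names, e.g. "math score" -> "math_score" (naming is an assumption).
    df.columns = [c.strip().lower().replace(" ", "_").replace("/", "_") for c in df.columns]
    # Clean categorical (text) columns: trim whitespace and lowercase the labels.
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].str.strip().str.lower()
    # Drop duplicates after normalization so rows that differ only in casing collapse.
    return df.drop_duplicates().reset_index(drop=True)


def run_cleaning(raw_path: str = "data/StudentsPerformance.csv",
                 clean_path: str = "data/clean_students.csv") -> None:
    """Load the raw CSV, clean it, and save the result (paths follow the project tree)."""
    clean_students(pd.read_csv(raw_path)).to_csv(clean_path, index=False)
```

Calling `run_cleaning()` from the project root would read the raw file and write the cleaned one, provided both paths exist as shown in the project structure.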

train_model.py

This script trains the machine learning model.

Steps performed:

  1. Load cleaned dataset
  2. Encode categorical variables
  3. Create the target variable
  4. Split dataset into training and testing sets
  5. Train machine learning models
  6. Evaluate model performance
  7. Save the final trained model

The trained model is saved as:

models/student_model.pkl
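The training steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual train_model.py: the column names in `SCORE_COLS`, the one-hot encoding, the 80/20 split, the metrics, and the function names are all hypothetical choices, and only a single model is trained.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

SCORE_COLS = ["math_score", "reading_score", "writing_score"]  # assumed column names


def train_student_model(df: pd.DataFrame, seed: int = 42):
    """Train a RandomForestRegressor and return (model, metrics, feature columns)."""
    # Create the target variable: the mean of the three exam scores.
    y = df[SCORE_COLS].mean(axis=1)
    # Encode categorical variables as one-hot columns; numeric columns pass through.
    X = pd.get_dummies(df, dtype=float)
    # Hold out 20% of the rows for evaluation (the split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    # Evaluate on the held-out rows.
    pred = model.predict(X_test)
    metrics = {"mae": mean_absolute_error(y_test, pred), "r2": r2_score(y_test, pred)}
    return model, metrics, list(X.columns)


def save_model(model, path: str = "models/student_model.pkl") -> None:
    """Persist the fitted model with pickle (joblib is a common alternative)."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
```

In this sketch the three exam scores stay in the feature matrix, matching the inputs listed for the Prediction page. The target is then a direct function of those features, so evaluation scores from such a setup will look very high.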


models/

This folder stores the trained machine learning models.

student_model.pkl

This file contains the trained Random Forest model used by the dashboard to generate predictions.

The model predicts the expected average exam score of a student.


notebooks/

This folder contains Jupyter notebooks used during the analysis phase.

EDA.ipynb

This notebook includes Exploratory Data Analysis (EDA) such as:

  • score distributions
  • correlation analysis
  • visualization of relationships between variables

This step helps understand the dataset before training the model.
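As a rough illustration of the kind of analysis such a notebook might contain, the sketch below computes distribution statistics, score correlations, and per-group averages with pandas. The column names and helper names are assumptions, not taken from EDA.ipynb.

```python
import pandas as pd

SCORE_COLS = ["math_score", "reading_score", "writing_score"]  # assumed column names


def score_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Distribution statistics (count, mean, std, quartiles) for each exam score."""
    return df[SCORE_COLS].describe()


def score_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix between the exam scores."""
    return df[SCORE_COLS].corr()


def mean_scores_by(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Average of each score within the groups of one categorical column."""
    return df.groupby(group_col)[SCORE_COLS].mean()
```

For example, `mean_scores_by(df, "gender")` would return one row per gender with the average of each score, assuming the cleaned dataset has a `gender` column.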


assets/

This folder contains images used in the README file.

These images are screenshots of the dashboard interface.


📊 Dashboard Pages

The Streamlit dashboard contains four main pages.


1️⃣ Overview Page

This page provides a general summary of the dataset.

Information displayed includes:

  • total number of students
  • gender distribution
  • average exam scores
  • dataset preview
  • distribution charts

Users can also download the dataset.
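The summary values on this page could be computed along these lines. This is a sketch with hypothetical helper names and assumed column names. In a Streamlit app, the results would typically be passed to widgets such as `st.metric` and `st.download_button`, which are left out here so the logic stays independent of the UI.

```python
import pandas as pd

SCORE_COLS = ("math_score", "reading_score", "writing_score")  # assumed column names


def overview_metrics(df: pd.DataFrame, gender_col: str = "gender",
                     score_cols=SCORE_COLS) -> dict:
    """Collect the headline numbers shown on an overview page."""
    return {
        "total_students": len(df),
        "gender_distribution": df[gender_col].value_counts().to_dict(),
        "average_scores": df[list(score_cols)].mean().round(2).to_dict(),
    }


def dataset_as_csv_bytes(df: pd.DataFrame) -> bytes:
    """Encode the dataset as CSV bytes, the form a download button usually expects."""
    return df.to_csv(index=False).encode("utf-8")
```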

[Screenshot: Overview page]


2️⃣ Data Exploration Page

This page allows users to explore relationships between dataset variables.

Users can select:

  • X variable
  • Y variable
  • visualization type

Available chart types include:

  • scatter plots
  • box plots
  • violin plots
  • histograms

This page helps users analyze patterns in the dataset.
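One way to turn the three selections into a chart is a small lookup from chart type to plotting function. The sketch below assumes Plotly Express is the plotting backend. The labels in `CHART_BUILDERS` and the function `build_chart` are hypothetical and not taken from app.py.

```python
import pandas as pd

# Dashboard label -> name of the matching plotly.express function.
CHART_BUILDERS = {
    "Scatter plot": "scatter",
    "Box plot": "box",
    "Violin plot": "violin",
    "Histogram": "histogram",
}


def build_chart(df: pd.DataFrame, x: str, y: str, kind: str):
    """Return a Plotly figure of the requested kind for the chosen X and Y columns."""
    if kind not in CHART_BUILDERS:
        raise ValueError(f"Unsupported chart type: {kind!r}")
    for col in (x, y):
        if col not in df.columns:
            raise KeyError(f"Unknown column: {col!r}")
    # Imported lazily so the selection logic can be checked without Plotly installed.
    import plotly.express as px

    return getattr(px, CHART_BUILDERS[kind])(df, x=x, y=y)
```

In a Streamlit page, the returned figure could be displayed with `st.plotly_chart`.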

[Screenshot: Data Exploration page]


3️⃣ Prediction Page

This page allows users to predict a student's academic performance.

The user enters:

  • gender
  • nationality
  • parent education
  • lunch type
  • test preparation
  • math score
  • reading score
  • writing score

The trained machine learning model processes the inputs and predicts the student's expected average score.
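A prediction step of this kind might be sketched as follows. It assumes the model was trained on one-hot encoded columns and that the list of training columns is available. Both are assumptions, and the helper names are hypothetical.

```python
import pickle

import pandas as pd


def load_model(path: str = "models/student_model.pkl"):
    """Load the pickled model saved by the training script."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def build_input_row(inputs: dict, feature_columns: list) -> pd.DataFrame:
    """Turn raw form values into a single-row frame matching the training columns."""
    row = pd.get_dummies(pd.DataFrame([inputs]), dtype=float)
    # Categories not chosen in the form have no dummy column yet; fill them with 0.
    return row.reindex(columns=feature_columns, fill_value=0.0)


def predict_average_score(model, inputs: dict, feature_columns: list) -> float:
    """Predict the expected average score for one student."""
    X = build_input_row(inputs, feature_columns)
    return float(model.predict(X)[0])
```

The `reindex` call matters because a single form submission contains only one category per field, while the model expects every dummy column it saw during training, in the same order.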

[Screenshot: Prediction page]


4️⃣ Model Insights Page

This page explains how the machine learning model makes predictions.

It visualizes Feature Importance, showing which variables influence the prediction the most.

This helps users understand the behavior of the machine learning model.
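For a fitted scikit-learn forest, feature importances are exposed through the `feature_importances_` attribute. The sketch below pairs them with feature names and sorts them, which is the usual input for a bar chart. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def feature_importance_table(model: RandomForestRegressor, feature_names) -> pd.Series:
    """Impurity-based importances from a fitted forest, largest first."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)
```

These importances are impurity-based and normalized to sum to 1, so they describe each feature's relative contribution within this model rather than a causal effect.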

[Screenshot: Model Insights page]


🤖 Machine Learning Model

The model used in this project is:

RandomForestRegressor

Random Forest is an ensemble learning algorithm based on decision trees.

Advantages include:

  • strong predictive performance
  • ability to model complex relationships
  • less prone to overfitting than a single decision tree, because predictions are averaged over many trees

The model predicts the average student score based on demographic and exam features.
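To make the ensemble idea concrete, the sketch below reproduces a forest's regression output by averaging the predictions of its individual trees. It relies on scikit-learn's `estimators_` attribute. The function name is chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def forest_prediction_by_hand(model: RandomForestRegressor, X: np.ndarray) -> np.ndarray:
    """Average the predictions of the individual trees in a fitted forest."""
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    return per_tree.mean(axis=0)
```

Up to floating-point rounding, the result matches `model.predict(X)`, which shows that a regression forest is an average over many decision trees, each trained on a bootstrap sample of the data.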


🛠 Technologies Used

This project was developed using the following technologies:

Python

Main libraries:

  • Pandas
  • NumPy
  • Scikit-learn
  • Streamlit
  • Plotly
  • Matplotlib
  • Seaborn

▶ Running the Application

Install the required dependencies:

pip install -r requirements.txt

Run the Streamlit application:

streamlit run app/app.py

Streamlit starts a local server (by default at http://localhost:8501) and normally opens the dashboard in your browser automatically.


👨‍💻 Author

This project was developed by Raidan Alkhateeb as a Machine Learning and Data Science project.
