This project is an interactive Machine Learning Dashboard designed to analyze and predict student academic performance using data analysis and predictive modeling techniques.
The application provides a complete data science workflow including:
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Machine learning model training
- Model interpretation
- Interactive visualization
- Student performance prediction
The project was built using Python, Streamlit, and Scikit-learn to demonstrate a practical end-to-end machine learning system.
This project represents a Machine Learning and Data Analytics Application.
It combines several fields of data science including:
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Machine Learning Modeling
- Data Visualization
- Interactive Dashboard Development
The system can therefore be classified as an End-to-End Machine Learning Project because it includes the complete workflow:
Data → Analysis → Model → Prediction → Interactive Dashboard.
The dashboard allows users to explore the dataset, visualize patterns, and generate predictions using a trained machine learning model.
Educational institutions often analyze student data to understand factors that influence academic performance.
The goal of this project is to:
- Analyze student exam results
- Identify patterns in academic performance
- Build a machine learning model capable of predicting a student's expected academic score
- Provide an interactive dashboard that allows users to explore the dataset and generate predictions
It also serves as a Machine Learning portfolio project that demonstrates practical data science skills.
This system demonstrates the full Data Science Pipeline, including:
- Data collection
- Data cleaning
- Exploratory data analysis
- Machine learning model training
- Model evaluation
- Model interpretation
- Interactive visualization
- User prediction interface
This type of project is commonly used in Data Science and Machine Learning portfolios.
The project is organized into multiple directories, each serving a specific purpose.
Student_Performance_Analysis
│
├── app
│   └── app.py
│
├── data
│   ├── StudentsPerformance.csv
│   └── clean_students.csv
│
├── models
│   └── student_model.pkl
│
├── notebooks
│   └── EDA.ipynb
│
├── src
│   ├── data_cleaning.py
│   └── train_model.py
│
├── assets
│   ├── overview.png
│   ├── eda.png
│   ├── prediction.png
│   └── model_insights.png
│
├── requirements.txt
└── README.md
Each folder contains specific components used in the machine learning workflow.
The app directory contains the Streamlit dashboard application.
app.py is the main application file, responsible for:
- Loading the dataset
- Loading the trained machine learning model
- Building the user interface
- Displaying interactive visualizations
- Allowing users to input data
- Generating predictions
- Showing model insights
This file essentially runs the entire dashboard interface.
The data folder stores the datasets used in the project.
StudentsPerformance.csv is the original dataset.
It contains information about student demographics and exam scores.
clean_students.csv is the processed dataset created after data cleaning.
The cleaned dataset is used for:
- data analysis
- model training
- dashboard visualization
The src folder contains the core data science scripts.
data_cleaning.py performs data preprocessing operations, including:
- loading the raw dataset
- removing duplicate records
- cleaning categorical data
- saving the cleaned dataset
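The cleaning steps listed above could look roughly like the following sketch. It is not the project's actual data_cleaning.py: the column names, the sample rows, and the `clean_students` helper are assumptions made for illustration, and a small in-memory CSV stands in for data/StudentsPerformance.csv so the example runs on its own.

```python
import io

import pandas as pd

# Made-up rows standing in for data/StudentsPerformance.csv (column names assumed).
RAW_CSV = """gender,lunch,math score,reading score,writing score
female,standard,72,71,74
 Female ,Standard,65,88,86
female,standard,72,71,74
male,free/reduced,48,55,46
"""


def clean_students(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalize text columns, then drop duplicate rows."""
    df = df.copy()
    # Normalize column names, e.g. "math score" -> "math_score".
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Clean categorical (non-numeric) columns: trim whitespace and lowercase.
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].str.strip().str.lower()
    # Remove duplicate records once the values are normalized.
    return df.drop_duplicates().reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = clean_students(raw)

# In the project, the result would presumably be written out with something like:
# clean.to_csv("data/clean_students.csv", index=False)
```

Normalizing the text before calling `drop_duplicates` matters in this sketch: rows that differ only in case or stray whitespace are treated as the same category afterwards.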
train_model.py trains the machine learning model.
Steps performed:
- Load cleaned dataset
- Encode categorical variables
- Create the target variable
- Split dataset into training and testing sets
- Train machine learning models
- Evaluate model performance
- Save the final trained model
The trained model is saved as:
models/student_model.pkl
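As a rough illustration of the training steps above, here is a minimal sketch using scikit-learn. It is an assumption-laden example rather than the project's train_model.py: the data is randomly generated, the column names and the `average_score` target name are hypothetical, one-hot encoding via `pd.get_dummies` is only one possible encoding, and the use of `pickle` for the .pkl file is assumed.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data/clean_students.csv (values and column names assumed).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], size=n),
    "lunch": rng.choice(["standard", "free/reduced"], size=n),
    "math_score": rng.integers(30, 100, size=n),
    "reading_score": rng.integers(30, 100, size=n),
    "writing_score": rng.integers(30, 100, size=n),
})

# Create the target variable: the average of the three exam scores.
score_cols = ["math_score", "reading_score", "writing_score"]
df["average_score"] = df[score_cols].mean(axis=1)

# Encode categorical variables as one-hot indicator columns.
X = pd.get_dummies(df.drop(columns="average_score"), columns=["gender", "lunch"])
y = df["average_score"]

# Split into training and testing sets (80/20).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model and evaluate it on the held-out test set.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)

# Serialize the trained model. In the project this would presumably be:
# with open("models/student_model.pkl", "wb") as f:
#     pickle.dump(model, f)
model_bytes = pickle.dumps(model)
restored = pickle.loads(model_bytes)
```

Fixing `random_state` in both the split and the model keeps the evaluation numbers reproducible between runs.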
The models folder stores the trained machine learning models.
student_model.pkl contains the trained Random Forest model used by the dashboard to generate predictions.
The model predicts the expected average exam score of a student.
The notebooks folder contains Jupyter notebooks used during the analysis phase.
EDA.ipynb includes Exploratory Data Analysis (EDA) such as:
- score distributions
- correlation analysis
- visualization of relationships between variables
This step helps understand the dataset before training the model.
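A minimal sketch of the kind of analysis such a notebook might contain is shown below. The score values are made up and the column names are assumptions; only the pandas calls (`describe` and `corr`) are standard.

```python
import pandas as pd

# Made-up exam scores; the real notebook would load data/clean_students.csv.
scores = pd.DataFrame({
    "math_score": [70, 65, 90, 45, 80],
    "reading_score": [72, 88, 94, 55, 78],
    "writing_score": [74, 86, 92, 46, 76],
})

# Score distributions: count, mean, standard deviation, quartiles per column.
summary = scores.describe()

# Correlation analysis: pairwise Pearson correlation between the score columns.
corr = scores.corr()

print(summary)
print(corr)
```

The correlation matrix can then be passed to a heatmap function such as Seaborn's `heatmap` to visualize relationships between variables.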
The assets folder contains images used in the README file.
These images are screenshots of the dashboard interface.
The Streamlit dashboard contains four main pages.
The overview page provides a general summary of the dataset.
Information displayed includes:
- total number of students
- gender distribution
- average exam scores
- dataset preview
- distribution charts
Users can also download the dataset.
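The summary values shown on this page can be derived with a few pandas calls. The sketch below uses made-up rows and assumed column names, and leaves out the Streamlit layer: in the dashboard, values like these would typically be passed to widgets such as `st.metric` and `st.download_button`.

```python
import pandas as pd

# Made-up rows standing in for data/clean_students.csv (column names assumed).
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math_score": [80, 60, 70, 50],
    "reading_score": [90, 60, 80, 50],
    "writing_score": [85, 65, 75, 55],
})
score_cols = ["math_score", "reading_score", "writing_score"]

# Total number of students.
total_students = len(df)

# Gender distribution as a {category: count} mapping.
gender_counts = df["gender"].value_counts().to_dict()

# Average exam score per subject.
average_scores = df[score_cols].mean().to_dict()

# Dataset preview (first rows) and CSV bytes suitable for a download button.
preview = df.head()
csv_bytes = df.to_csv(index=False).encode("utf-8")
```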
The EDA page allows users to explore relationships between dataset variables.
Users can select:
- X variable
- Y variable
- visualization type
Available chart types include:
- scatter plots
- box plots
- violin plots
- histograms
This page helps users analyze patterns in the dataset.
The prediction page allows users to predict a student's academic performance.
The user enters:
- gender
- nationality
- parent education
- lunch type
- test preparation
- math score
- reading score
- writing score
The trained machine learning model processes the inputs and predicts the student's expected average score.
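One way the prediction step could work is sketched below. It assumes the model was trained on one-hot encoded features, so the user's input has to be encoded the same way and aligned to the training columns. The tiny training set, the column names, and the `predict_average` helper are hypothetical, and only two of the categorical inputs are included to keep the example short.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

CATEGORICAL = ["gender", "lunch"]
SCORES = ["math_score", "reading_score", "writing_score"]

# Made-up training rows; the dashboard would load models/student_model.pkl instead.
train = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "lunch": ["standard", "standard", "free/reduced",
              "free/reduced", "standard", "free/reduced"],
    "math_score": [80, 60, 70, 50, 90, 40],
    "reading_score": [90, 60, 80, 50, 95, 45],
    "writing_score": [85, 65, 75, 55, 92, 41],
})
y_train = train[SCORES].mean(axis=1)
X_train = pd.get_dummies(train, columns=CATEGORICAL)
feature_columns = list(X_train.columns)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)


def predict_average(user_input: dict) -> float:
    """Hypothetical helper: encode one form submission and predict its average score."""
    row = pd.get_dummies(pd.DataFrame([user_input]), columns=CATEGORICAL)
    # A single row only produces indicator columns for the categories it contains,
    # so align it to the training columns and fill the missing indicators with 0.
    row = row.reindex(columns=feature_columns, fill_value=0)
    return float(model.predict(row)[0])


prediction = predict_average({
    "gender": "female",
    "lunch": "standard",
    "math_score": 75,
    "reading_score": 82,
    "writing_score": 80,
})
print(round(prediction, 2))
```

The `reindex` call is the important design choice here: without it, the single-row input would have fewer columns than the training matrix and scikit-learn would reject it.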
The model insights page explains how the machine learning model makes predictions.
It visualizes Feature Importance, showing which variables influence the prediction the most.
This helps users understand the behavior of the machine learning model.
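For a Random Forest in scikit-learn, feature importance is available through the fitted model's `feature_importances_` attribute. The sketch below shows how such values might be extracted and ranked; the data is synthetic, the column names are assumptions, and the extra `noise` column is included only to show that an irrelevant feature receives a low importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features: three exam scores plus an unrelated noise column.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "math_score": rng.integers(30, 100, size=n),
    "reading_score": rng.integers(30, 100, size=n),
    "writing_score": rng.integers(30, 100, size=n),
    "noise": rng.random(n),
})
# Target: the average of the three exam scores (noise plays no role).
y = X[["math_score", "reading_score", "writing_score"]].mean(axis=1)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, one value per feature, normalized to sum to 1.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

A sorted series like this can be passed directly to a bar chart function to produce the kind of plot described above.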
The model used in this project is:
RandomForestRegressor
Random Forest is an ensemble learning algorithm based on decision trees.
Advantages include:
- strong predictive performance
- ability to model complex relationships
- reduced risk of overfitting compared with a single decision tree, since predictions are averaged across many trees
The model predicts the average student score based on demographic and exam features.
This project was developed using the following technologies:
Python
Main libraries:
- Pandas
- NumPy
- Scikit-learn
- Streamlit
- Plotly
- Matplotlib
- Seaborn
Install the required dependencies:
pip install -r requirements.txt
Run the Streamlit application:
streamlit run app/app.py
The dashboard will automatically open in your browser.
This project was developed by Raidan Alkhateeb as a Machine Learning and Data Science project.



