This project applies machine learning techniques to classify individuals into different obesity levels based on demographic, dietary, and lifestyle factors. The aim is to compare model performance and identify an accurate and reliable classifier for multi-class obesity prediction.
- Source: UCI Machine Learning Repository
- Data includes anonymized lifestyle, dietary, and physical attributes
- Target variable: Obesity Level (multi-class classification)
Exploratory Data Analysis:
- Data inspection and visualisation
- Identification of class distribution
- Detection of potential outliers

Data Preprocessing:
- Encoding categorical variables
- Feature scaling
- Outlier handling

Feature Selection:
- SelectKBest used to identify the most relevant predictors.

Models:
- Logistic Regression
- Random Forest Classifier

GridSearchCV was used to tune model hyperparameters.
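
A hedged sketch of how such a search might look for the two models. The parameter grids, the 3-fold cross-validation, and the synthetic data from `make_classification` are assumptions chosen to keep the example small; the grids actually used in the project are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic multi-class data standing in for the obesity dataset.
X, y = make_classification(
    n_samples=400,
    n_features=10,
    n_informative=6,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grids only; the values are assumptions, not the project's settings.
searches = {
    "logistic_regression": GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},
        cv=3,
        scoring="accuracy",
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
        cv=3,
        scoring="accuracy",
    ),
}

test_scores = {}
for name, search in searches.items():
    # Cross-validated search on the training split, then one held-out test score.
    search.fit(X_train, y_train)
    test_scores[name] = search.score(X_test, y_test)
    print(name, search.best_params_, round(test_scores[name], 3))
```

The test split is touched only once per model, after the search has picked its best parameters on the training folds.
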
Models were evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
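
These metrics can be computed with `sklearn.metrics`. The sketch below uses a small hand-written set of labels and predictions; the class names and values are hypothetical and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical class names and predictions, for illustration only.
labels = ["normal", "overweight", "obese"]
y_true = ["normal", "normal", "normal",
          "overweight", "overweight", "overweight",
          "obese", "obese", "obese", "obese"]
y_pred = ["normal", "normal", "overweight",
          "overweight", "overweight", "obese",
          "obese", "obese", "obese", "overweight"]

accuracy = accuracy_score(y_true, y_pred)

# Per-class precision, recall, and F1, returned in the order given by `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy: {accuracy:.2f}")
for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

Reporting the per-class values alongside overall accuracy makes it easier to spot a class that the model handles poorly even when the headline number looks good.
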
Results:
- Logistic Regression Test Accuracy: ~91%
- Random Forest Test Accuracy: ~94%
- Random Forest outperforms Logistic Regression across most obesity classes
- Most misclassifications occur between adjacent obesity levels
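
One way to quantify the observation about adjacent levels is to measure what share of errors in a confusion matrix fall exactly one level away from the true class. The helper `adjacent_error_share` and the counts below are hypothetical, and the sketch assumes the classes are ordered by severity along both axes.

```python
def adjacent_error_share(confusion):
    """Fraction of misclassifications that land one level from the true class.

    Assumes rows are true classes and columns are predicted classes,
    both ordered from the lowest to the highest obesity level.
    """
    total_errors = 0
    adjacent_errors = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i == j:
                continue  # correct predictions are not errors
            total_errors += count
            if abs(i - j) == 1:
                adjacent_errors += count
    return adjacent_errors / total_errors if total_errors else 0.0


# Hypothetical 4-level confusion matrix; not the project's actual counts.
confusion = [
    [48, 2, 0, 0],
    [3, 44, 3, 0],
    [0, 4, 45, 1],
    [1, 0, 2, 47],
]

share = adjacent_error_share(confusion)
print(f"{share:.2%} of errors fall between adjacent levels")
```

A high share suggests the model mostly confuses neighbouring categories rather than making large jumps across the scale.
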
Ethical considerations:
- Dataset is publicly available and anonymized
- No personal or sensitive information is used
- Models are intended for decision support, not medical diagnosis
- Bias is assessed using per-class performance metrics

Tools:
- Python
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- Jupyter Notebook

The project shows that ensemble models, particularly Random Forest, can classify obesity levels from lifestyle data with high accuracy (about 94% on the test set, compared with about 91% for Logistic Regression), which makes them a reasonable basis for health-related decision support.