- Peter Maila
- Shamsuddeen Lawal
- Kasavuli Mark
- Rofhiwa Ntshagovhe
- Sandisiwe Mtsha
- Festus Godwin
The aim of this project is to evaluate several machine learning (ML) algorithms for the task of predicting motor insurance claims. The algorithms are compared on criteria such as performance, accuracy, and interpretability, along with their general pros and cons.
- Here's a link to our Notion page
- Here's a link to the test and train datasets for PMD: PMD Datasets
- Here's a link to the test and train datasets for Mobility: Mobility Datasets
- Importing Data Dependencies
- Loading Data
- Exploratory Data Analysis (EDA)
- Preprocessing
- Feature Engineering
- Model and Model Evaluation
  - Generalised linear model
  - XGBoost
  - SVM
  - Random forest
  - CatBoost
  - Explainable Boosting Machines (EBM)
  - LightGBM
We gathered datasets from two insurance organizations, PMD and Mobility, then cleaned and organized the data so that it is accurate and suitable for analysis. This preparation step is intended to remove irregularities and confirm that each dataset is appropriate for modeling. Working with both datasets lets us check how robust and adaptable our models are across different motor insurance scenarios.
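As a rough illustration of what this kind of preparation can look like, here is a minimal pandas sketch. The column names (`Driver Age`, `Vehicle Value`, `Claim`), the toy values, and the `clean_claims` helper are all hypothetical and are not taken from the PMD or Mobility datasets. The median imputation is an assumed choice, not necessarily the one used in the project.

```python
import pandas as pd


def clean_claims(df: pd.DataFrame, target: str = "claim") -> pd.DataFrame:
    """Hypothetical cleaning helper: tidy names, drop duplicates, fill gaps."""
    out = df.copy()
    # Normalise column names to snake_case so later steps can rely on them.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Remove exact duplicate records.
    out = out.drop_duplicates()
    # Rows without a target value cannot be used for supervised learning.
    out = out.dropna(subset=[target])
    # Fill remaining numeric gaps with the column median (an assumed strategy).
    numeric_cols = out.select_dtypes(include="number").columns.drop(
        target, errors="ignore"
    )
    out = out.fillna(out[numeric_cols].median())
    return out.reset_index(drop=True)


# Toy data standing in for a raw insurer extract (values are made up).
raw = pd.DataFrame(
    {
        "Driver Age": [34, 34, None, 51, 45],
        "Vehicle Value": [12000.0, 12000.0, 8000.0, None, 15000.0],
        "Claim": [0, 0, 1, 0, None],
    }
)

cleaned = clean_claims(raw)
print(cleaned)
```

The same helper could be applied to each insurer's data separately, so that both datasets pass through identical checks before modeling.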
Drawing on domain expertise, we identified and extracted relevant features from the dataset. The goal was to enrich the data with meaningful features that strengthen the predictive signal and thereby improve the performance and precision of our predictive models.
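The sketch below shows the general shape of domain-informed feature engineering for motor insurance data. Every column name, the `add_features` helper, the age bands, and the reference year are assumptions made for illustration. The features actually used in the project are not listed here.

```python
import pandas as pd


def add_features(df: pd.DataFrame, reference_year: int = 2024) -> pd.DataFrame:
    """Hypothetical helper that derives a few domain-style features."""
    out = df.copy()
    # Older vehicles may carry different risk, so derive the vehicle's age.
    out["vehicle_age"] = reference_year - out["vehicle_year"]
    # Bucket driver age into bands; the bin edges are illustrative only.
    out["driver_age_band"] = pd.cut(
        out["driver_age"],
        bins=[17, 25, 40, 60, 120],
        labels=["18-25", "26-40", "41-60", "61+"],
    )
    # Premium relative to the insured value.
    out["premium_to_value"] = out["premium"] / out["vehicle_value"]
    return out


# Toy policies (values are made up).
policies = pd.DataFrame(
    {
        "driver_age": [22, 40, 67],
        "vehicle_year": [2020, 2010, 2015],
        "vehicle_value": [10000.0, 4000.0, 8000.0],
        "premium": [500.0, 300.0, 400.0],
    }
)

features = add_features(policies)
print(features)
```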
- Generalised linear model
- XGBoost
- Random forest
- CatBoost
- Explainable Boosting Machines (EBM)
- LightGBM
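One way to compare models like those listed above on equal footing is to score each with the same cross-validation splits and metric. The sketch below assumes the task is framed as binary classification (claim or no claim), uses synthetic data rather than the PMD or Mobility datasets, and includes only two scikit-learn estimators: logistic regression as one example of a generalised linear model, and a random forest. The `compare_models` helper and the choice of ROC AUC are assumptions, not the project's actual evaluation setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced data standing in for a claims dataset.
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=0,
)

# Candidate models, keyed by a readable name.
models = {
    "glm_logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated ROC AUC for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

XGBoost, LightGBM, CatBoost, and EBM each provide scikit-learn-compatible estimators, so they could in principle be added to the `models` dictionary in the same way once the corresponding packages are installed.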