MuhammadRuby/car-price-prediction-system

🚗 Used Car Price Prediction – End-to-End ML & Flask App

📌 Project Overview

This project is an end-to-end machine learning system for predicting used car prices. It follows a production-oriented workflow: raw data preprocessing and feature engineering, then model training and evaluation, and finally preparation for deployment as a Flask-based web application.


🎯 Objectives

  • Clean and preprocess real-world used car data
  • Perform feature engineering using meaningful raw features
  • Train and evaluate multiple regression models
  • Select the best-performing model based on objective evaluation metrics
  • Prepare the project for full-stack deployment using Flask

📂 Dataset


🛠️ Tech Stack

  • Language: Python
  • Libraries:
    • pandas, numpy
    • scikit-learn
    • xgboost
    • matplotlib, seaborn
    • tqdm
  • Deployment: Flask (planned)

⚙️ Project Workflow

  1. Exploratory Data Analysis (EDA)
  2. Data Cleaning
    • Handling missing values
    • Removing outliers
  3. Feature Engineering
    • Creating derived features (e.g. car_age)
    • Preparing categorical features for encoding
  4. Model Training & Evaluation
  5. Model Selection
  6. Model Saving for Deployment
  7. Full-Stack Application (Flask)
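
The cleaning and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_and_engineer`, the `reference_year` parameter, and the IQR rule for price outliers are all assumptions, since the README does not state which outlier criterion or reference year was used.

```python
import pandas as pd


def clean_and_engineer(df: pd.DataFrame, reference_year: int = 2024) -> pd.DataFrame:
    """Drop unusable rows, trim price outliers, and derive car_age.

    Hypothetical helper: the outlier rule and reference_year are assumptions.
    """
    out = df.copy()
    # Rows without a target or the key numeric fields cannot be used for training.
    out = out.dropna(subset=["price", "year", "odometer"])
    # Assumed outlier criterion: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    # Derived feature: vehicle age relative to a fixed reference year.
    out["car_age"] = reference_year - out["year"].astype(int)
    return out.reset_index(drop=True)


# Toy data: one extreme price and one missing price.
raw = pd.DataFrame({
    "price": [5000, 7000, 6500, 8000, 900000, None],
    "year": [2010, 2015, 2012, 2018, 2016, 2014],
    "odometer": [120000, 60000, 90000, 30000, 45000, 70000],
})
cleaned = clean_and_engineer(raw)
print(cleaned)
```

Using a fixed reference year rather than the current date keeps `car_age` reproducible between training and inference.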

🧠 Features Used (Before Encoding)

🎯 Target Variable

  • price – vehicle selling price

🔢 Numerical Features

  • year – manufacturing year
  • odometer – mileage
  • cylinders – number of engine cylinders

🚘 Categorical Features

  • manufacturer – car brand
  • model – vehicle model
  • condition – overall vehicle condition
  • fuel – fuel type
  • transmission – transmission type
  • drive – drivetrain
  • type – vehicle body type
  • size – vehicle size category
  • paint_color – exterior color
  • title_status – legal title status
  • state – vehicle location (US state)

Categorical features are encoded using one-hot encoding during the preprocessing stage inside the machine learning pipeline.
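
One way to keep one-hot encoding inside the pipeline is scikit-learn's `ColumnTransformer` combined with `Pipeline`. The sketch below is an assumption about how this could look, not the repository's code: it uses only a subset of the categorical columns, toy data, and illustrative hyperparameters.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["year", "odometer", "cylinders"]
CATEGORICAL = ["manufacturer", "fuel", "transmission"]  # subset, for brevity

preprocessor = ColumnTransformer([
    ("num", "passthrough", NUMERIC),
    # handle_unknown="ignore" encodes categories unseen in training as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])
pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", RandomForestRegressor(n_estimators=20, random_state=0)),
])

# Toy training data (illustrative values only).
train = pd.DataFrame({
    "year": [2010, 2015, 2012, 2018],
    "odometer": [120000, 60000, 90000, 30000],
    "cylinders": [4, 6, 4, 4],
    "manufacturer": ["ford", "toyota", "ford", "honda"],
    "fuel": ["gas", "gas", "diesel", "gas"],
    "transmission": ["automatic", "manual", "automatic", "automatic"],
    "price": [5000, 9000, 6500, 14000],
})
pipeline.fit(train[NUMERIC + CATEGORICAL], train["price"])

# A manufacturer not present in training still yields a prediction.
unseen = pd.DataFrame([{
    "year": 2016, "odometer": 50000, "cylinders": 4,
    "manufacturer": "tesla", "fuel": "gas", "transmission": "automatic",
}])
prediction = pipeline.predict(unseen)
encoded_width = pipeline.named_steps["preprocess"].transform(
    train[NUMERIC + CATEGORICAL]
).shape[1]
```

Because the encoder is fitted as part of the pipeline, the same category-to-column mapping is reused at inference time.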


🤖 Models Trained

The following regression models were trained and evaluated:

  • Linear Regression
  • Ridge Regression
  • Lasso Regression
  • ElasticNet
  • K-Nearest Neighbors Regressor
  • Decision Tree Regressor
  • Random Forest Regressor
  • Gradient Boosting Regressor
  • XGBoost Regressor

Note: Support Vector Regression (SVR) was intentionally excluded because kernel SVR training scales poorly as the number of samples grows, which makes it a poor fit for a dataset of this kind and for deployment.
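
A comparison loop over these candidates might look like the sketch below. It is a hedged illustration only: the data is synthetic (`make_regression`), the hyperparameters are defaults or arbitrary, and XGBoost is left out so that the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded used-car features (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "ElasticNet": ElasticNet(),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every candidate on the same split and record its held-out MAE.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Lower MAE is better, so the first entry is the best candidate on this split.
ranking = sorted(scores, key=scores.get)
best_name = ranking[0]
for name in ranking:
    print(f"{name:22s} MAE={scores[name]:.2f}")
```

The ranking on synthetic data says nothing about the used-car results; the point is only that all models share one split and one metric.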


📊 Model Evaluation Metrics

Models were evaluated using:

  • MAE (Mean Absolute Error)
  • RMSE (Root Mean Squared Error)
  • R² Score
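
For reference, the three metrics can be written out directly from their definitions. The sketch below uses only the standard library, and the function name `regression_metrics` and the sample prices are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # MAE: average absolute error, in the same units as the price.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared error; large errors weigh more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R²: 1 minus residual variation relative to variation around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up prices and predictions.
metrics = regression_metrics(
    [10000, 20000, 30000, 40000],
    [11000, 19000, 33000, 39000],
)
print(metrics)
```

In this toy case the single 3000 error pulls RMSE (about 1732) above MAE (1500), which is why the two metrics are usually reported together.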

🏆 Model Selection

After comparing all trained models, Random Forest Regressor was selected as the final model.

Reasons for Selection:

  • Lowest MAE (average pricing error)
  • Lowest RMSE (penalizing large errors)
  • Highest R² score (~0.89)
  • Strong generalization performance

The Random Forest model provided the best balance between accuracy, robustness, and production readiness.


📈 Final Model Performance

| Metric | Value |
|--------|-------|
| MAE | ~2200 |
| RMSE | ~4700 |
| R² Score | ~0.89 |

💾 Model Saving

The final trained model was saved using joblib for deployment:

joblib.dump(model, "models/used_car_price_model.pkl")
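
A save-and-reload round trip can be checked as in the sketch below. It trains a small stand-in model on toy data and writes to a temporary directory rather than `models/`, so everything except `joblib.dump` and `joblib.load` is an assumption made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the trained model (features: year, odometer).
X = [[2010, 120000], [2015, 60000], [2018, 30000], [2012, 90000]]
y = [5000, 9000, 14000, 6500]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "used_car_price_model.pkl")
    joblib.dump(model, path)        # serialize the fitted estimator
    restored = joblib.load(path)    # reload it as the web app would at startup
    before = model.predict(X)
    after = restored.predict(X)

print((before == after).all())
```

If the saved object is the full pipeline (preprocessing plus estimator), the reloaded object can accept raw feature values directly. Loading is generally safest with the same library versions that were used for saving.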


🚀 Deployment Strategy

  • User inputs raw feature values through a web form
  • Backend handles preprocessing and encoding
  • Model returns real-time price prediction

This design ensures clean user input and consistent preprocessing during inference.
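
Since the Flask application is still planned, the following is only a sketch of how the prediction endpoint could be wired. The `/predict` route, the `REQUIRED` field list, the `create_app` factory, and the `StubModel` class are hypothetical; in a real app the stub would be replaced by the pipeline loaded with `joblib.load`, and the payload would likely be converted to a one-row DataFrame first.

```python
from flask import Flask, jsonify, request

# Hypothetical subset of the raw fields collected by the web form.
REQUIRED = ["year", "odometer", "manufacturer"]


class StubModel:
    """Stand-in for the saved pipeline; returns a fixed price per row."""

    def predict(self, rows):
        return [12345.0 for _ in rows]


def create_app(model):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Accept either a JSON body or a submitted HTML form.
        payload = request.get_json(silent=True) or request.form.to_dict()
        missing = [field for field in REQUIRED if field not in payload]
        if missing:
            return jsonify(error=f"missing fields: {missing}"), 400
        # Raw values go straight to the model; encoding is the pipeline's job.
        price = model.predict([payload])[0]
        return jsonify(predicted_price=float(price))

    return app


app = create_app(StubModel())
```

Passing the model into an app factory keeps the route testable with Flask's test client, without a trained model on disk.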

About

End-to-end ML project for used car price prediction using classical machine learning techniques, extensive feature engineering, and model evaluation. The final Random Forest model achieved high accuracy and strong generalization.
