🚗 Used Car Price Prediction – End-to-End ML & Flask App

📌 Project Overview

This project is an end-to-end machine learning system for predicting used car prices. It follows a production-oriented workflow: raw data preprocessing and feature engineering, then model training and evaluation, and finally preparation for deployment as a Flask-based web application.

🎯 Objectives

Clean and preprocess real-world used car data
Perform feature engineering using meaningful raw features
Train and evaluate multiple regression models
Select the best-performing model based on objective evaluation metrics
Prepare the project for full-stack deployment using Flask

📂 Dataset

Source: Kaggle – Used Cars Dataset
Kaggle dataset link: https://www.kaggle.com/datasets/austinreese/craigslist-carstrucks-data/
Target Variable: price
The dataset contains vehicle specifications, condition, usage history, and location data.
Download the dataset file (vehicles.csv), place it in '../data/raw', and then run the notebooks one by one in order.

🛠️ Tech Stack

Language: Python
Libraries:
- pandas, numpy
- scikit-learn
- xgboost
- matplotlib, seaborn
- tqdm
Deployment: Flask (planned)

⚙️ Project Workflow

Exploratory Data Analysis (EDA)
Data Cleaning
- Handling missing values
- Removing outliers
Feature Engineering
- Creating derived features (e.g. car_age)
- Preparing categorical features for encoding
Model Training & Evaluation
Model Selection
Model Saving for Deployment
Full-Stack Application (Flask)
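
The cleaning and feature engineering steps above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the function name `clean_and_engineer`, the IQR rule for price outliers, and the `reference_year` value are all assumptions made for the example.

```python
import pandas as pd


def clean_and_engineer(df: pd.DataFrame, reference_year: int = 2021) -> pd.DataFrame:
    """Drop rows with missing key fields, trim price outliers, add car_age."""
    # Handling missing values: keep only rows with a usable target and key features.
    out = df.dropna(subset=["price", "year", "odometer"]).copy()

    # Removing outliers: IQR rule on price (an assumed choice for this sketch).
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Derived feature: vehicle age relative to an assumed reference year.
    out["car_age"] = reference_year - out["year"]
    return out.reset_index(drop=True)


# Tiny made-up sample: one extreme price and one missing price.
sample = pd.DataFrame({
    "price": [5000, 7000, 6500, 8000, 1_000_000, None],
    "year": [2010, 2015, 2012, 2018, 2016, 2014],
    "odometer": [120000, 60000, 90000, 30000, 50000, 70000],
})
cleaned = clean_and_engineer(sample)
print(cleaned)
```

On this sample the row with the missing price and the row with the extreme price are both dropped, leaving four rows with a new `car_age` column.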

🧠 Features Used (Before Encoding)

🎯 Target Variable

price – vehicle selling price

🔢 Numerical Features

year – manufacturing year
odometer – mileage
cylinders – number of engine cylinders

🚘 Categorical Features

manufacturer – car brand
model – vehicle model
condition – overall vehicle condition
fuel – fuel type
transmission – transmission type
drive – drivetrain
type – vehicle body type
size – vehicle size category
paint_color – exterior color
title_status – legal title status
state – vehicle location (US state)

Categorical features are encoded using one-hot encoding during the preprocessing stage inside the machine learning pipeline.
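
One way to keep one-hot encoding inside the pipeline is scikit-learn's `ColumnTransformer`. The sketch below is an assumption about how such a pipeline could look, not the project's exact code: it uses only a subset of the categorical columns, adds a median imputer for numeric columns, and trains on a few made-up rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["year", "odometer", "cylinders"]
CATEGORICAL = ["manufacturer", "fuel", "transmission"]  # subset, for brevity


def build_pipeline() -> Pipeline:
    """Preprocessing and model in one object, so inference reuses the same encoding."""
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        # handle_unknown="ignore" encodes categories unseen in training as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(n_estimators=50, random_state=42)),
    ])


# Made-up training rows, only to show the mechanics.
toy = pd.DataFrame({
    "year": [2010, 2015, 2018, 2012],
    "odometer": [120000, 60000, 30000, 90000],
    "cylinders": [4, 6, 4, 8],
    "manufacturer": ["ford", "toyota", "ford", "honda"],
    "fuel": ["gas", "diesel", "gas", "gas"],
    "transmission": ["automatic", "manual", "automatic", "automatic"],
    "price": [5000, 9000, 14000, 6500],
})
pipeline = build_pipeline().fit(toy[NUMERIC + CATEGORICAL], toy["price"])

# A raw, unencoded row with a manufacturer the encoder has never seen.
new_car = pd.DataFrame([{
    "year": 2016, "odometer": 50000, "cylinders": 4,
    "manufacturer": "tesla", "fuel": "gas", "transmission": "automatic",
}])
prediction = pipeline.predict(new_car)
print(prediction)
```

Because the encoder lives inside the pipeline, the same fitted object can be saved and later fed raw feature values directly.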

🤖 Models Trained

The following regression models were trained and evaluated:

Linear Regression
Ridge Regression
Lasso Regression
ElasticNet
K-Nearest Neighbors Regressor
Decision Tree Regressor
Random Forest Regressor
Gradient Boosting Regressor
XGBoost Regressor

Note: Support Vector Regression (SVR) was intentionally excluded due to scalability limitations and poor suitability for deployment.
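
A comparison like the one described above can be run with a simple loop over candidate models. This is a hedged sketch on synthetic data from `make_regression`, not the project's training notebook; hyperparameters are scikit-learn defaults or arbitrary small values, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared car features (assumption for this sketch).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "ElasticNet": ElasticNet(),
    "KNN": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

best_name = min(scores, key=scores.get)  # lowest held-out MAE
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:<20} MAE = {mae:.2f}")
```

The ranking on synthetic data says nothing about the ranking on the real dataset; the loop only shows the comparison mechanics.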

📊 Model Evaluation Metrics

Models were evaluated using:

MAE (Mean Absolute Error) ↓
RMSE (Root Mean Squared Error) ↓
R² Score ↑
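
As a small illustration, the three metrics can be computed with scikit-learn as shown below. The helper name `regression_report` and the example prices are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return MAE, RMSE and R² for one set of predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    # RMSE is the square root of MSE, so large errors weigh more than in MAE.
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up actual and predicted prices.
report = regression_report([10000, 15000, 20000], [11000, 14000, 23000])
print(report)
```

Here the single 3000 error pulls RMSE (about 1915) above MAE (about 1667), which is why the two metrics are reported together.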

🏆 Model Selection

After comparing all trained models, Random Forest Regressor was selected as the final model.

Reasons for Selection:

Lowest MAE (average pricing error)
Lowest RMSE (penalizing large errors)
Highest R² score (~0.89)
Strong generalization performance

The Random Forest model provided the best balance between accuracy, robustness, and production readiness.

📈 Final Model Performance

| Metric | Value |
| --- | --- |
| MAE | ~2200 |
| RMSE | ~4700 |
| R² Score | ~0.89 |

💾 Model Saving

The final trained model was saved using joblib for deployment:

import joblib
joblib.dump(model, "models/used_car_price_model.pkl")  # model is the fitted estimator
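
At inference time the saved file is read back with `joblib.load`. The round trip below is a sketch under assumptions: it trains a small stand-in model on made-up rows and writes to a temporary directory rather than the project's `models/` folder.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in model trained on made-up (year, odometer) rows.
X = np.array([[2010, 120000], [2015, 60000], [2018, 30000], [2012, 90000]])
y = np.array([5000, 9000, 14000, 6500])
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "used_car_price_model.pkl"
    joblib.dump(model, path)      # save, as in the snippet above
    restored = joblib.load(path)  # load, as the web app would at startup

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

A file written by joblib should be loaded with compatible library versions, and only from a trusted source, since it is pickle-based.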


## Deployment Strategy:

- User inputs raw feature values through a web form
- Backend handles preprocessing and encoding
- Model returns real-time price prediction

This design ensures clean user input and consistent preprocessing during inference.
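
Since the Flask app is still planned, the following is only a sketch of what a prediction endpoint could look like. The `/predict` route, the `REQUIRED_FIELDS` tuple, and the `predict_price` placeholder are hypothetical; in a real app the placeholder would call the saved pipeline loaded with joblib.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical subset of the raw features a form would submit.
REQUIRED_FIELDS = ("year", "odometer", "manufacturer")


def predict_price(features: dict) -> float:
    """Placeholder with a made-up formula; swap in the saved pipeline's predict()."""
    age = 2021 - int(features["year"])
    return max(500.0, 30000.0 - 1200.0 * age - 0.05 * float(features["odometer"]))


@app.route("/predict", methods=["POST"])
def predict():
    # Accept raw feature values as JSON and reject incomplete requests.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify(error=f"missing fields: {', '.join(missing)}"), 400
    return jsonify(predicted_price=predict_price(payload))


if __name__ == "__main__":
    app.run(debug=True)
```

Validating the payload before prediction keeps malformed input away from the model, and leaving encoding to the saved pipeline keeps inference consistent with training.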
