This project models how factors like age, BMI, and smoking status influence health insurance charges using R. It applies statistical analysis and regression techniques to uncover key cost drivers and support more accurate, personalized healthcare pricing.

KalyanRajChinigi/Insurance-Charge-Estimation-Using-R

📊 Precision in Healthcare Costs: Modeling Individual Characteristics to Refine Insurance Charge Estimation

🧩 Project Overview

This project explores how individual demographic and lifestyle characteristics influence health insurance charges using data analytics and statistical modeling in R. The aim is to improve cost prediction models and provide actionable insights for insurers, policymakers, and healthcare analysts.


📁 Files in the Repository

  • Insurance Charges_R_Project_KRC.Rmd – R Markdown file containing the complete code and analysis.
  • Insurance Charges_R_Project_KRC.pdf – Rendered report with visualizations and statistical outputs.
  • insurance.csv – Input dataset containing individual health and demographic data.

🧪 Methodology

  1. Data Cleaning & Exploration:

    • Verified data structure, types, and null values.
    • Explored distributions, outliers, and relationships via plots.
  2. Visualization:

    • Histograms, boxplots, scatterplots, and heatmaps to understand trends in age, BMI, smoking status, and charges.
  3. Statistical Testing:

    • Performed t-tests, Chi-square tests, Shapiro-Wilk, and Kruskal-Wallis tests to evaluate variable significance.
  4. Modeling:

    • Implemented:
      • Simple & Multiple Linear Regression
      • Logistic Regression
    • Compared models using R² scores and residual analysis.
  5. Feature Engineering:

    • Applied one-hot encoding for categorical variables.
    • Binarized continuous features for hypothesis testing.
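
The methodology steps above can be sketched in base R. This is a minimal illustration, not the code in the Rmd file: it runs on a small synthetic data frame standing in for insurance.csv, and the column names (age, bmi, smoker, region, charges), the BMI cutoff of 30, and the median-split target for the logistic regression are all assumptions.

```r
# Synthetic stand-in for insurance.csv. Column names are assumed, not taken
# from the repository; swap in read.csv("insurance.csv") for the real data.
set.seed(42)
n <- 200
regions <- c("northeast", "northwest", "southeast", "southwest")
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = round(rnorm(n, mean = 30, sd = 6), 1),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)),
                  levels = c("no", "yes")),
  region = factor(sample(regions, n, replace = TRUE), levels = regions)
)
insurance$charges <- 2000 + 250 * insurance$age + 300 * insurance$bmi +
  20000 * (insurance$smoker == "yes") + rnorm(n, sd = 3000)

# Step 1. Data cleaning: inspect structure and count missing values per column.
str(insurance)
na_counts <- colSums(is.na(insurance))

# Step 5. Feature engineering: one-hot encode region and binarize BMI.
# The cutoff of 30 is an assumption (a common obesity threshold).
region_one_hot <- model.matrix(~ region - 1, data = insurance)
insurance$obese <- factor(insurance$bmi >= 30, levels = c(FALSE, TRUE))

# Step 3. Statistical tests on charges and the derived features.
smoker_ttest   <- t.test(charges ~ smoker, data = insurance)        # Welch t-test
obese_chisq    <- chisq.test(table(insurance$obese, insurance$smoker))
charges_normal <- shapiro.test(insurance$charges)                    # Shapiro-Wilk
region_kruskal <- kruskal.test(charges ~ region, data = insurance)   # Kruskal-Wallis

# Step 4. Modeling: multiple linear regression on charges, plus a logistic
# regression on a hypothetical "above-median charge" indicator.
lin_fit <- lm(charges ~ age + bmi + smoker + region, data = insurance)
insurance$high_charge <- as.integer(insurance$charges > median(insurance$charges))
logit_fit <- glm(high_charge ~ age + bmi, family = binomial, data = insurance)

summary(lin_fit)$r.squared
```

Feature engineering appears before testing here only because the chi-square test needs the binarized BMI column; the ordering in the actual report may differ.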

📌 Key Findings and Limitations

🔍 Key Findings:

  • Age, BMI, smoking status, and region were significant predictors of insurance charges.
  • Smokers had substantially higher average charges than non-smokers.
  • Linear regression achieved an R² of ~0.75, i.e. the model explains roughly 75% of the variance in charges.
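
To make the R² figure concrete: R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand and compares it with the value reported by `summary()`. It uses synthetic data with assumed column names, so the numbers it produces are illustrative and do not reproduce the project's results.

```r
# Synthetic data with assumed column names; results are illustrative only
# and are not the project's reported R-squared of ~0.75.
set.seed(1)
n <- 300
d <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = rnorm(n, mean = 30, sd = 6),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)),
                  levels = c("no", "yes"))
)
d$charges <- 2000 + 250 * d$age + 300 * d$bmi +
  20000 * (d$smoker == "yes") + rnorm(n, sd = 6000)

fit <- lm(charges ~ age + bmi + smoker, data = d)

# R-squared = 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((d$charges - mean(d$charges))^2)
r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared

# Average charge by smoking status, the comparison behind the smoker finding.
mean_by_smoker <- tapply(d$charges, d$smoker, mean)
```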

⚠️ Limitations:

  • Only basic regression models were used; no advanced machine learning techniques such as Random Forest or XGBoost were tried.
  • The dataset is relatively small (n = 1338).
  • Income, pre-existing conditions, and claim history were not available in the dataset.

🌍 Applications

  • Insurance premium modeling and cost prediction.
  • Informing public health policy related to smoking and obesity.
  • Educational tool for teaching regression analysis in healthcare analytics.

▶️ How to Run

  1. Clone the repository:

    git clone https://github.com/KalyanRajChinigi/Insurance-Charge-Estimation-Using-R.git
    cd Insurance-Charge-Estimation-Using-R
    
  2. Open Insurance Charges_R_Project_KRC.Rmd in RStudio.

  3. Run the R Markdown file or knit it to generate the report:

    rmarkdown::render("Insurance Charges_R_Project_KRC.Rmd")

  4. Install required packages if needed:

    install.packages(c("dplyr", "Hmisc", "corrplot"))
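
The package installation can also be made conditional, so that only packages not already present are installed. The following is a sketch: `missing_packages` is a hypothetical helper, and `rmarkdown` is added to the list here on the assumption that it is needed for `rmarkdown::render()`; the Rmd file may load further packages not listed in this README.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package list from this README, plus rmarkdown (an assumption, see above).
required <- c("rmarkdown", "dplyr", "Hmisc", "corrplot")
to_install <- missing_packages(required)

# Install only when running interactively, so scripted runs never
# trigger downloads by surprise.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```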
