Precision in Healthcare Costs: Modeling Individual Characteristics to Refine Insurance Charge Estimation
This project explores how individual demographic and lifestyle characteristics influence health insurance charges using data analytics and statistical modeling in R. The aim is to improve cost prediction models and provide actionable insights for insurers, policymakers, and healthcare analysts.
- Insurance Charges_R_Project_KRC.Rmd: R Markdown file containing the complete code and analysis.
- Insurance Charges_R_Project_KRC.pdf: Rendered report with visualizations and statistical outputs.
- insurance.csv: Input dataset containing individual health and demographic data.
Data Cleaning & Exploration:
- Verified data structure, types, and null values.
- Explored distributions, outliers, and relationships via plots.
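
The checks above can be sketched in base R. This is only an illustration: the data frame below is a synthetic stand-in for insurance.csv, and the column names (`age`, `bmi`, `smoker`, `region`, `charges`) are assumptions rather than names confirmed by the project files. The 1.5 * IQR rule is one common way to flag outliers, not necessarily the one used in the report.

```r
# Synthetic stand-in for insurance.csv; column names are assumed for illustration.
set.seed(1)
n <- 200
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = round(rnorm(n, mean = 30, sd = 6), 1),
  smoker = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.2, 0.8)),
  region = sample(c("northeast", "northwest", "southeast", "southwest"),
                  n, replace = TRUE),
  stringsAsFactors = FALSE
)
insurance$charges <- 2000 + 250 * insurance$age + 300 * insurance$bmi +
  20000 * (insurance$smoker == "yes") + rnorm(n, sd = 4000)
insurance$bmi[c(5, 17)] <- NA  # inject two missing values for the demo

# Structure and column types
str(insurance)

# Missing values per column
na_counts <- colSums(is.na(insurance))
print(na_counts)

# Flag outliers in charges with the 1.5 * IQR rule
q <- unname(quantile(insurance$charges, c(0.25, 0.75)))
iqr <- q[2] - q[1]
is_outlier <- insurance$charges < q[1] - 1.5 * iqr |
  insurance$charges > q[2] + 1.5 * iqr
n_outliers <- sum(is_outlier)
```

With the real file, the synthetic block would be replaced by something like `read.csv("insurance.csv")`, assuming the CSV sits in the working directory.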
Visualization:
- Histograms, boxplots, scatterplots, and heatmaps to understand trends in age, BMI, smoking status, and charges.
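
A minimal sketch of these four plot types, using base R graphics only. The data are synthetic and the column names are assumed. The report itself may well use other packages (corrplot is among the listed dependencies), so treat this as an approximation of the idea rather than the project's actual plotting code.

```r
# Synthetic data with assumed column names; not the project's dataset.
set.seed(2)
n <- 200
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = rnorm(n, mean = 30, sd = 6),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
insurance$charges <- 2000 + 250 * insurance$age + 300 * insurance$bmi +
  20000 * (insurance$smoker == "yes") + rnorm(n, sd = 4000)

pdf(NULL)  # null device so the sketch also runs without a display
op <- par(mfrow = c(2, 2))

# Distribution of charges
hist(insurance$charges, main = "Charges", xlab = "Charges")

# Charges split by smoking status
boxplot(charges ~ smoker, data = insurance, main = "Charges by smoking status")

# Age against charges, coloured by smoking status
plot(insurance$age, insurance$charges,
     col = ifelse(insurance$smoker == "yes", "red", "grey40"),
     xlab = "Age", ylab = "Charges", main = "Age vs. charges")

# Correlation heatmap of the numeric columns
cor_mat <- cor(insurance[, c("age", "bmi", "charges")])
image(1:3, 1:3, cor_mat, zlim = c(-1, 1), axes = FALSE,
      xlab = "", ylab = "", main = "Correlation heatmap")
axis(1, at = 1:3, labels = colnames(cor_mat))
axis(2, at = 1:3, labels = rownames(cor_mat))

par(op)
invisible(dev.off())
```

Removing the `pdf(NULL)` and `dev.off()` lines draws the panels in the active graphics device instead, for example the RStudio plot pane.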
Statistical Testing:
- Performed t-tests, Chi-square tests, and Kruskal-Wallis tests to evaluate group differences and associations, and Shapiro-Wilk tests to check normality.
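
As a rough illustration, each of these tests has a counterpart in base R's stats package. Which variables the report actually pairs with which test is not stated here, so the pairings below (and the synthetic data and column names) are assumptions.

```r
# Synthetic data with assumed column names; pairings of tests and variables are illustrative.
set.seed(3)
n <- 300
insurance <- data.frame(
  sex    = sample(c("female", "male"), n, replace = TRUE),
  smoker = sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)),
  region = sample(c("northeast", "northwest", "southeast", "southwest"),
                  n, replace = TRUE)
)
insurance$charges <- 8000 + 20000 * (insurance$smoker == "yes") +
  rnorm(n, sd = 4000)

# Shapiro-Wilk: is the distribution of charges compatible with normality?
shapiro_res <- shapiro.test(insurance$charges)

# Welch two-sample t-test: do mean charges differ between smokers and non-smokers?
t_res <- t.test(charges ~ smoker, data = insurance)

# Chi-square test of independence between two categorical variables
chi_res <- chisq.test(table(insurance$smoker, insurance$sex))

# Kruskal-Wallis: rank-based comparison of charges across regions
kw_res <- kruskal.test(charges ~ region, data = insurance)

print(c(shapiro = shapiro_res$p.value, t_test = t_res$p.value,
        chi_square = chi_res$p.value, kruskal = kw_res$p.value))
```

Kruskal-Wallis is a common fallback when Shapiro-Wilk rejects normality, which may be why both appear in the same analysis.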
Modeling:
- Implemented:
  - Simple & Multiple Linear Regression
  - Logistic Regression
- Compared models using R² scores and residual analysis.
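
The modeling steps could look roughly like the sketch below. It uses synthetic data with assumed column names. The binary outcome for the logistic model (`high_charge`, charges above the median) is a hypothetical choice made for this example, since the target of the project's logistic regression is not described here.

```r
# Synthetic data with assumed column names; not the project's dataset.
set.seed(4)
n <- 400
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = rnorm(n, mean = 30, sd = 6),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  region = factor(sample(c("northeast", "northwest", "southeast", "southwest"),
                         n, replace = TRUE))
)
insurance$charges <- 3000 + 200 * insurance$age + 250 * insurance$bmi +
  8000 * (insurance$smoker == "yes") + rnorm(n, sd = 6000)

# Simple linear regression: one predictor
simple_fit <- lm(charges ~ age, data = insurance)

# Multiple linear regression: several predictors
multi_fit <- lm(charges ~ age + bmi + smoker + region, data = insurance)

# Compare fits by R-squared
r2 <- c(simple   = summary(simple_fit)$r.squared,
        multiple = summary(multi_fit)$r.squared)
print(round(r2, 3))

# Residual analysis: residuals of a well-specified model should show no pattern
res <- residuals(multi_fit)
summary(res)

# Logistic regression on a hypothetical binary outcome (charges above the median)
insurance$high_charge <- as.integer(insurance$charges > median(insurance$charges))
logit_fit <- glm(high_charge ~ age + bmi + smoker,
                 data = insurance, family = binomial)
summary(logit_fit)$coefficients
```

Because the simple model is nested in the multiple one, its R² can never be higher, so adjusted R² (`summary(multi_fit)$adj.r.squared`) is the fairer comparison when predictors are added.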
Feature Engineering:
- Applied one-hot encoding for categorical variables.
- Binarized continuous features for hypothesis testing.
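
Both transformations can be sketched with base R. The four-row data frame, the column names, and the cut-offs (BMI of 30 or more, age of 40 or more) are assumptions chosen for the example. The thresholds used in the report are not given here.

```r
# Tiny illustrative data frame; values and column names are made up.
insurance <- data.frame(
  age    = c(19, 33, 45, 62),
  bmi    = c(27.9, 22.7, 31.2, 35.5),
  smoker = factor(c("yes", "no", "no", "yes")),
  region = factor(c("southwest", "southeast", "northwest", "northeast"))
)

# One-hot encoding: "- 1" drops the intercept so every level gets its own column
region_onehot <- model.matrix(~ region - 1, data = insurance)
print(region_onehot)

# A two-level factor needs only a single 0/1 indicator
insurance$smoker_yes <- as.integer(insurance$smoker == "yes")

# Binarize continuous features (thresholds are assumptions for this sketch)
insurance$obese   <- as.integer(insurance$bmi >= 30)
insurance$over_40 <- as.integer(insurance$age >= 40)

print(insurance)
```

Note that `lm()` and `glm()` encode factors automatically, so explicit one-hot columns are mainly useful for correlation matrices or for tools that expect numeric input.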
Key Findings:
- Age, BMI, smoking status, and region were significant predictors of insurance charges.
- Smokers had a substantially higher average charge.
- Linear regression achieved an R² of ~0.75, meaning the model explains roughly 75% of the variance in charges.
Limitations:
- Only basic regression models were used; no advanced machine learning techniques such as Random Forest or XGBoost were applied.
- The dataset was relatively small (n = 1338).
- Income, pre-existing conditions, and claim history were not available in the dataset.
Applications:
- Insurance premium modeling and cost prediction.
- Informing public health policy related to smoking and obesity.
- Educational tool for teaching regression analysis in healthcare analytics.
Clone the repository:
git clone https://github.com/your-username/insurance-cost-modeling.git
cd insurance-cost-modeling
Open Insurance Charges_R_Project_KRC.Rmd in RStudio.
Run the R Markdown file or knit it to generate the report:
rmarkdown::render("Insurance Charges_R_Project_KRC.Rmd")
Install required packages if needed:
install.packages(c("dplyr", "Hmisc", "corrplot"))
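
To avoid reinstalling packages that are already present, a small helper can report which ones are missing first. This is a sketch only: `missing_packages` is a hypothetical helper name, and `rmarkdown` is added to the list on the assumption that the report is rendered with `rmarkdown::render` as shown above.

```r
# Packages named in this README, plus rmarkdown for rendering (an assumption).
required <- c("dplyr", "Hmisc", "corrplot", "rmarkdown")

# Hypothetical helper: return the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are available.")
}
```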