The goal of this project was to explore and integrate heart disease datasets from four regions (California, Cleveland, Hungary, and Switzerland) using Python and relational database principles.
Key tasks included:
- Data exploration and preprocessing using Pandas in Python
- Designing an Entity–Relationship (ER) diagram to model how the datasets relate in a relational database system
- Creating a normalized heart disease database following 1NF, 2NF, and 3NF using SQLite3
- Inserting the regional datasets into the database
- Performing SQL queries to analyze and retrieve information from the database
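
The tasks above can be sketched end to end in a few lines. The example below is a minimal illustration, not the project's actual schema: the table names (`region`, `patient`), the column names (`age`, `sex`, `chol`, `target`), the helper functions, and the sample rows are all assumptions made for the sake of the example. It stores each region name once in its own table and links patients to it through a foreign key, which is the kind of separation that normalization calls for.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for the regional files.
# Column names and values are illustrative, not taken from the project data.
SAMPLE = {
    "Cleveland": pd.DataFrame(
        {"age": [63, 41], "sex": [1, 0], "chol": [233.0, 204.0], "target": [1, 0]}
    ),
    "Hungary": pd.DataFrame(
        {"age": [54], "sex": [1], "chol": [float("nan")], "target": [1]}
    ),
}


def build_database(frames, conn):
    """Create an assumed two-table schema and load one DataFrame per region."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE region (
            region_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE patient (
            patient_id INTEGER PRIMARY KEY,
            region_id  INTEGER NOT NULL REFERENCES region(region_id),
            age        INTEGER,
            sex        INTEGER,
            chol       REAL,
            target     INTEGER
        );
        """
    )
    for name, df in frames.items():
        # Store the region name once and reuse its generated key.
        cur = conn.execute("INSERT INTO region (name) VALUES (?)", (name,))
        region_id = cur.lastrowid
        # Basic preprocessing: drop exact duplicate rows, then tag with the key.
        clean = df.drop_duplicates().assign(region_id=region_id)
        # Missing values (NaN) are written as SQL NULL by to_sql.
        clean.to_sql("patient", conn, if_exists="append", index=False)
    conn.commit()


def disease_rate_by_region(conn):
    """Return patient count and mean of the assumed `target` flag per region."""
    query = """
        SELECT r.name AS region,
               COUNT(*) AS patients,
               AVG(p.target) AS disease_rate
        FROM patient AS p
        JOIN region AS r ON r.region_id = p.region_id
        GROUP BY r.name
        ORDER BY r.name
    """
    return pd.read_sql_query(query, conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_database(SAMPLE, connection)
    print(disease_rate_by_region(connection))
```

An in-memory database keeps the sketch self-contained; pointing `sqlite3.connect` at a file path would persist the result instead. A fuller schema would likely split the clinical measurements into further tables, which this example does not attempt.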