Binary file added Predicting Heart Disease (1).pdf
Binary file not shown.
83 changes: 9 additions & 74 deletions README.md
@@ -4,85 +4,20 @@

## Overview

There are many interesting use cases for Supervised Machine Learning. To name a few: predicting the effectiveness of medical treatments, predicting foreign currency exchange rates, diagnosing Parkinson's disease just by listening to a patient's voice, face recognition, and recommending the news items expected to be most interesting to a given user. Choose one that motivates you and implement a solution using a Supervised Machine Learning algorithm of your choice.
For my Supervised Machine Learning project, I chose a dataset that contains information on 4240 patients and on whether or not they developed heart disease. Using this data, I built a classification model that doctors can use as a tool to predict more easily and quickly, based on a patient's characteristics, whether that patient will develop heart disease.

## External Interface Requirements
Information on the features is included in the Notebook.

1. Input requirement: capacity to read a dataset stored on disk.
2. Output requirement: report quality metrics of the Machine Learning model.
3. Output requirement: output estimations corresponding to test instances.
The dataset is openly accessible from Kaggle (https://www.kaggle.com/amanajmera1/framingham-heart-study-dataset/kernels).

## Functional Requirements

1. The software must be able to learn a model from a supervised dataset.
2. The software must be able to use the learned model to estimate the target value of problem instances.
3. The software must be able to compute a quality metric of the learned model.
## Technical Information

## Technical Requirements
1. Python was used as the programming language.
2. Pandas was used for reading the dataset into a pandas dataframe.
3. Scikit-learn was used for training and testing the Machine Learning model.
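
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper `train_and_score`, the column names (`age`, `sysBP`, `TenYearCHD`), the 75/25 split, the feature scaling, and `n_neighbors=3` are all assumptions, and the inline CSV is made-up data standing in for `framingham.csv`.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_score(df, target, n_neighbors=3, seed=0):
    """Fit a KNN classifier and return its accuracy on a held-out split.

    Hypothetical helper for illustration only.
    """
    df = df.dropna()  # simplest possible handling of missing values
    X = df.drop(columns=[target])
    y = df[target]
    # Single stratified split: 75% for training, 25% for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # KNN is distance-based, so features are scaled before fitting.
    model = make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors)
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Made-up rows standing in for the real file; column names are assumed.
csv_text = """age,sysBP,TenYearCHD
39,106,0
46,121,0
48,127,0
43,110,0
41,115,0
45,118,0
61,150,1
63,180,1
65,168,1
58,162,1
67,175,1
60,158,1
50,,1
"""

# With the real data this would be pd.read_csv("framingham.csv").
df = pd.read_csv(io.StringIO(csv_text))
accuracy = train_and_score(df, target="TenYearCHD")
print(f"accuracy on held-out rows: {accuracy:.2f}")
```

Passing a file path instead of the `io.StringIO` object reads the dataset from disk in the same way.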

1. Use Python as programming language.
2. Use Pandas for reading the dataset into a pandas dataframe.
3. Use Scikit-learn for training and testing the Machine Learning model.

## Necessary Deliverables

1. GitHub repo: it will contain your presentation, README (including a link to your Trello board), and your Jupyter notebooks.
2. Report containing quality metrics, an explanation of the dataset, and the experimental procedure (whether a single split was performed, or cross-validation, etc.).

## Suggestions to Get Started
1. Find an interesting dataset! Look in the Useful Resources section for sources of ideas.
2. Break down the project into smaller tasks, for instance: importing the dataset, training, etc.
3. Decide whether you will create a single Python application or several Python applications.

## What to Do When Feeling Stuck
If you get stuck at some stage of the project, and you cannot find a way out, you can try this:
1. Nothing you are asked to do in this project is new; you have examples of all the ingredients you need in the previous lessons. Go back to the previous lessons and review the procedure used there to move on.
2. Think at a level above the details of the problem, at the methodology level. Identify the stage of the methodology at which you are stuck, then rephrase for yourself why you are stuck and why the solutions presented in the examples of the previous lessons are not helping you.
3. Check with your peers whether they got stuck at the same point, and what they did to move on.
4. Consult with your instructor.

<a name="schedule"></a>

## Schedule
*Tuesday-Wednesday 16:59h*
* Think about a topic and propose some questions.
* Choose data that is relevant to your questions.
* Look for documentation to give context to your project.
* Write the README file in your repository.
* **DO NOT START CODING**


**NO CODE UNTIL HERE**

*Wednesday 17:00h*
* Mock presentation.

*Thursday-Friday*
* Preliminary analysis of the data and machine learning.
* Train and evaluate different models.
* Prepare a draft of your first slides presentation (no analysis or conclusions yet): title, motivation, context, ...

*Monday 9:30h*
* Mock presentation. Take the feedback and use it!

*Monday and Tuesday*
* Improve your model.
* Finish the analysis. Finish the slides.

*Tuesday 17:00h*
* Mock presentation. Take the feedback and use it!

*Wednesday morning*
* Presentation!

<a name="presentation"></a>

## Presentation
Presentations for this project will be in the classroom! Presentations will be **EXACTLY** 5 minutes long, with 2 additional minutes for questions. We will stop you!
After the presentations, the audience will evaluate you by indicating how well they understood what you were trying to present and how you presented it. This information will help you in further presentations!

## Useful Resources
* University of California at Irvine's [Machine Learning Repository](https://archive.ics.uci.edu/ml)
* OpenML [datasets](https://www.openml.org)
* Kaggle [datasets](https://www.kaggle.com/datasets)
## Algorithm

I tested four classification algorithms and chose the K-Nearest Neighbour model since it resulted in the highest accuracy score.
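
To illustrate what a K-Nearest Neighbour classifier does and how an accuracy score is computed, here is a small pure-Python sketch. It is not the notebook's code (the project used Scikit-learn); the function names `knn_predict` and `knn_accuracy`, the choice of `k=3`, and the toy points are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_points, train_labels, test_points, test_labels, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_points, train_labels, point, k) == label
        for point, label in zip(test_points, test_labels)
    )
    return correct / len(test_labels)


# Toy data: two well-separated clusters (illustrative values only).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
train_labels = [0, 0, 0, 1, 1, 1]
test_points = [(1.1, 1.0), (3.1, 3.0)]
test_labels = [0, 1]

score = knn_accuracy(train_points, train_labels, test_points, test_labels)
print(score)  # 1.0 on this toy data
```

Accuracy here is simply the share of correct predictions, so comparing models by accuracy means running each one on the same held-out rows and picking the one with the largest share.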
1 change: 1 addition & 0 deletions framingham.csv

Large diffs are not rendered by default.
