The model makes use of audio files recorded via a digital stethoscope.
Model Architecture
Our model is a Convolutional Neural Network (CNN) built with Keras on a TensorFlow backend.
I have used a Sequential model with a simple architecture, consisting of four Conv2D convolutional layers, with a Dense layer as the final output layer.
The convolution layers are designed for feature detection.
Each works by sliding a filter window over the input, multiplying the window element-wise with the underlying input patch, summing the result, and storing it in a feature map. This operation is known as a convolution.
The filters parameter specifies the number of filters in each layer.
The layers increase in size from 16 to 32, 64, and finally 128 filters, while the kernel_size parameter specifies the size of the kernel window, which in this case is 2, resulting in a 2x2 filter matrix.
The first layer receives an input shape of (40, 862, 1), where 40 is the number of MFCCs, 862 is the number of frames after padding, and 1 signifies that the audio is mono (a single channel).
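That fixed input shape can be produced by padding (or truncating) each clip's MFCC matrix to 862 frames and appending a channel axis. A minimal NumPy sketch of the idea (the random array stands in for a real MFCC matrix, which would typically come from librosa.feature.mfcc; the 431-frame length is just an example):

```python
import numpy as np

N_MFCC = 40       # number of MFCC coefficients
MAX_FRAMES = 862  # padded frame count expected by the model

def pad_mfcc(mfcc):
    """Pad (or truncate) an (n_mfcc, n_frames) matrix to MAX_FRAMES
    frames and append a channel axis for the Conv2D input."""
    pad_width = MAX_FRAMES - mfcc.shape[1]
    if pad_width > 0:
        mfcc = np.pad(mfcc, ((0, 0), (0, pad_width)), mode="constant")
    else:
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc[..., np.newaxis]

# Stand-in for the MFCCs of a shorter recording
features = pad_mfcc(np.random.rand(N_MFCC, 431))
print(features.shape)  # (40, 862, 1)
```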
The activation function I have used for the convolutional layers is ReLU, and a small Dropout rate of 20% is applied to each of them.
Each convolutional layer has an associated pooling layer of MaxPooling2D type, with the final convolutional layer instead followed by a GlobalAveragePooling2D layer.
The pooling layers reduce the dimensionality of the model (cutting the number of parameters and the subsequent computation), which shortens training time and reduces overfitting.
Max pooling takes the maximum value in each window, while global average pooling takes the average over each entire feature map, which is suitable for feeding into our dense output layer.
Our output layer has 6 nodes (num_labels), matching the number of possible classifications.
The activation for our output layer is softmax.
Softmax makes the output sum up to 1 so the output can be interpreted as probabilities.
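To illustrate, softmax always produces non-negative values that sum to 1, so the largest score maps to the highest probability. A pure-NumPy sketch (the six scores here are made up, one per class):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0])  # one raw score per class
probs = softmax(scores)
print(probs.sum())            # ~1.0
print(int(np.argmax(probs)))  # index of the predicted class
```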
The model will then make its prediction based on which option has the highest probability.
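Putting the pieces together, the architecture described above can be sketched in Keras as follows. Layer sizes, kernel size, dropout rate, and input shape all come from the description; treat this as an illustrative reconstruction rather than the project's exact code (in particular, the exact ordering of pooling and dropout, and the compile settings, are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D,
                                     Dropout, Dense)

num_labels = 6  # number of possible classifications

model = Sequential([
    # Four Conv2D blocks: 16 -> 32 -> 64 -> 128 filters, 2x2 kernels,
    # ReLU activation, each with max pooling and 20% dropout
    Conv2D(16, kernel_size=2, activation="relu", input_shape=(40, 862, 1)),
    MaxPooling2D(pool_size=2),
    Dropout(0.2),
    Conv2D(32, kernel_size=2, activation="relu"),
    MaxPooling2D(pool_size=2),
    Dropout(0.2),
    Conv2D(64, kernel_size=2, activation="relu"),
    MaxPooling2D(pool_size=2),
    Dropout(0.2),
    Conv2D(128, kernel_size=2, activation="relu"),
    GlobalAveragePooling2D(),  # averages each feature map for the dense layer
    Dropout(0.2),
    # Softmax output: one probability per class
    Dense(num_labels, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.summary()
```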
How to get started!
Open CMD/terminal and navigate to the folder where requirements.txt is located.
Type in "pip install -r requirements.txt" to install the required modules/packages.
Before running the project we need to create our .db file.
Navigate to the same folder where run.py is present.
Open the Python interpreter in cmd/terminal by typing python.
Run the following commands:
from projectapp.models import User,Audio
from projectapp import db
db.create_all()
The above steps will create a site.db file which is our database.
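For reference, db.create_all() issues a CREATE TABLE statement for each mapped model and writes the tables to the SQLite database. A rough stdlib sqlite3 equivalent is shown below; the column names here are hypothetical, since the real schema comes from the User and Audio models in projectapp/models.py (an in-memory database is used for illustration; connecting to "site.db" would write to disk instead):

```python
import sqlite3

# Hypothetical schema illustrating what create_all() does;
# the real columns are defined on the User and Audio models
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS audio (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    user_id INTEGER REFERENCES user(id)
);
""")
conn.commit()

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['audio', 'user']
```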
To run the WebApp, type python run.py in cmd/terminal.
Open the project in any browser (preferably Chrome or Firefox) by navigating to the URL displayed in the terminal, which is usually localhost:5000 or 127.0.0.1:5000.