This was my final project for Udacity's Natural Language Processing Nanodegree, which I completed in late 2020. It also serves as a capstone for all of my School of AI Nanodegrees, AI Programming and Computer Vision included, because it draws heavily on knowledge from all three courses.
In data science terms, this is an Automatic Speech Recognition (ASR) pipeline. An ASR pipeline receives spoken audio as input and returns a text transcript of the speech as output, so you'll often find one at the heart of speech recognition and dictation software.
- Convert recorded speech into written text
- Transform raw audio data into spectrogram or MFCC features
- Test the performance of simple, deep, and bidirectional RNNs on this task
- Test additional architecture options, like CNNs for feature extraction
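The feature-extraction step above can be sketched in plain NumPy. This is a minimal illustration, not the project's actual code: it frames the audio, applies a Hann window, and takes the magnitude of a short-time FFT to produce a spectrogram (real pipelines typically use a library such as librosa or scipy for this, and the frame and hop sizes here are arbitrary choices).

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping
    windowed frames and FFT each frame (minimal sketch)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(audio)
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

The resulting 2-D array (time frames by frequency bins) is what gets fed to the RNN; MFCC features add a mel filterbank and discrete cosine transform on top of this same spectrogram.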
- This project was part of Udacity's Natural Language Processing Nanodegree.
- Audio data was provided by Udacity as a selected subset of the LibriSpeech ASR corpus.
Copyright © 2020-2022 Sean von Bayern
Licensed under the MIT License