This repository provides a real-time speech-to-text transcription service that integrates the Deepgram Speech-to-Text API with the Agent Voice Response system. The code sets up an Express.js server that accepts audio streams from the Agent Voice Response Core, transcribes them using the Deepgram API, and streams the transcription back to the Core in real time.
Before setting up the project, ensure you have the following:
- Node.js and npm installed.
- A Deepgram account with the Speech-to-Text API enabled.
- A Deepgram API Key with the necessary permissions to access the Speech-to-Text API.
Clone the repository and install the dependencies:

```bash
git clone https://github.com/agentvoiceresponse/avr-asr-deepgram.git
cd avr-asr-deepgram
npm install
```

Set the environment variable to use your Deepgram API key in your Node.js application:

```bash
export DEEPGRAM_API_KEY="your_deepgram_api_key"
```

Alternatively, you can set this variable in your .env file (you can use the dotenv package for loading environment variables).
Ensure that you have the following environment variables set in your .env file:
```
DEEPGRAM_API_KEY=your_deepgram_api_key
PORT=6010
SPEECH_RECOGNITION_LANGUAGE=en
SPEECH_RECOGNITION_MODEL=nova
```
You can adjust the port number as needed.
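These settings can be read in Node.js along the following lines. This is a minimal sketch, not the repository's actual code: the defaults mirror the example .env above, and in the real application the dotenv package would populate `process.env` from the .env file first.

```javascript
// Sketch: read the service configuration from environment variables,
// falling back to the defaults shown in the example .env above.
// (In the real app, require("dotenv").config() would run before this.)
const config = {
  apiKey: process.env.DEEPGRAM_API_KEY, // required, no default
  port: parseInt(process.env.PORT || "6010", 10),
  language: process.env.SPEECH_RECOGNITION_LANGUAGE || "en",
  model: process.env.SPEECH_RECOGNITION_MODEL || "nova",
};
```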
This application sets up an Express.js server that accepts audio streams from clients and uses Deepgram Speech-to-Text API to transcribe the audio in real-time. The transcribed text is then streamed back to the Agent Voice Response Core. Below is an overview of the core components:
The server listens for audio streams on a specific route (/audio-stream) and passes the incoming audio to the Deepgram API for real-time transcription.
A custom class that extends Node.js’s Writable stream is used to write the incoming audio data to the Deepgram API.
The API processes the audio data received from the client and converts it into text using speech recognition models. The results are then streamed back to the client in real-time.
This route accepts audio streams from the client and transmits the audio for transcription. The transcription is sent back to the client as soon as it’s available.
Here’s a high-level breakdown of the key parts of the code:
- Server Setup: Configures the Express.js server and the Deepgram Speech-to-Text API client.
- Audio Stream Handling: A function, `handleAudioStream`, processes the incoming audio from clients. It:
  - Initializes a Deepgram API recognize stream.
  - Sets up event listeners to handle `error`, `data`, and `end` events.
  - Creates an `AudioWritableStream` instance that pipes the incoming audio to the Deepgram API.
  - Sends the transcriptions back to the client through the HTTP response stream.
- Express.js Route: The route `/audio-stream` calls the `handleAudioStream` function when a client connects.
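The event wiring inside `handleAudioStream` can be sketched as follows. `recognizeStream` stands in for the Deepgram recognize stream (any event emitter with these events works for illustration), and `res` is the Express response object; the event names come from this README, not from verified SDK documentation.

```javascript
// Hedged sketch of the error/data/end wiring described above.
function wireTranscriptionEvents(recognizeStream, res) {
  recognizeStream.on("data", (transcript) => {
    res.write(transcript); // stream each partial transcript back immediately
  });
  recognizeStream.on("error", (err) => {
    res.status(500).end(err.message); // surface transcription failures
  });
  recognizeStream.on("end", () => {
    res.end(); // close the HTTP response when transcription finishes
  });
}
```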
To start the application:
```bash
npm run start
```

or

```bash
npm run start:dev
```

The server will start and listen on the port specified in the .env file, defaulting to 6010.
You can send audio streams to the /audio-stream endpoint using a client that streams audio data (e.g., a browser, mobile app, or another Node.js service). Ensure that the audio stream is compatible with the Deepgram Speech-to-Text API format.