
agentvoiceresponse/avr-asr-deepgram


Agent Voice Response - Deepgram Speech-to-Text Integration

This repository provides a real-time speech-to-text transcription service that integrates the Deepgram Speech-to-Text API with the Agent Voice Response system. It runs an Express.js server that accepts audio streams from Agent Voice Response Core, transcribes them with the Deepgram API, and streams the transcriptions back to Agent Voice Response Core in real time.

Prerequisites

Before setting up the project, ensure you have the following:

  1. Node.js and npm installed.
  2. A Deepgram account with the Speech-to-Text API enabled.
  3. A Deepgram API Key with the necessary permissions to access the Speech-to-Text API.

Setup

1. Clone the Repository

git clone https://github.com/agentvoiceresponse/avr-asr-deepgram.git
cd avr-asr-deepgram

2. Install Dependencies

npm install

3. Set Up Deepgram Credentials

Export your Deepgram API key as an environment variable so the application can read it:

export DEEPGRAM_API_KEY="your_deepgram_api_key"

Alternatively, define the variable in a .env file in the project root; the dotenv package can load it into process.env at startup.

4. Configuration

Set the following environment variables in your .env file:

DEEPGRAM_API_KEY=your_deepgram_api_key
PORT=6010
SPEECH_RECOGNITION_LANGUAGE=en
SPEECH_RECOGNITION_MODEL=nova

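Adjust PORT as needed. SPEECH_RECOGNITION_LANGUAGE and SPEECH_RECOGNITION_MODEL select the language and model used for Deepgram transcription.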
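A minimal sketch of reading these settings at startup, assuming the variables are already in process.env (for example after require('dotenv').config()). It applies the defaults listed above and builds the URL for Deepgram's live-streaming endpoint, wss://api.deepgram.com/v1/listen. The helper names loadConfig and buildListenUrl are hypothetical, and the linear16 / 8000 Hz audio format is an assumption for illustration, not something this README specifies.

```javascript
// Read the service settings from process.env, applying the defaults above.
// Assumes the .env file has already been loaded (e.g. by dotenv).
function loadConfig(env = process.env) {
  const apiKey = env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    throw new Error('DEEPGRAM_API_KEY is not set');
  }
  const rawPort = env.PORT || '6010';
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return {
    apiKey,
    port,
    language: env.SPEECH_RECOGNITION_LANGUAGE || 'en',
    model: env.SPEECH_RECOGNITION_MODEL || 'nova',
  };
}

// Build the Deepgram live-streaming URL from the config.
// Assumption: 16-bit linear PCM at 8 kHz (typical telephony audio).
function buildListenUrl(config, { encoding = 'linear16', sampleRate = 8000 } = {}) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  url.searchParams.set('model', config.model);
  url.searchParams.set('language', config.language);
  url.searchParams.set('encoding', encoding);
  url.searchParams.set('sample_rate', String(sampleRate));
  return url.toString();
}
```

When opening that WebSocket, Deepgram expects the key in an `Authorization: Token <DEEPGRAM_API_KEY>` header. Failing fast on a missing key surfaces configuration mistakes at startup instead of on the first call.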

How It Works

This application sets up an Express.js server that accepts audio streams from clients and uses the Deepgram Speech-to-Text API to transcribe the audio in real time. The transcribed text is then streamed back to Agent Voice Response Core. The core components are:

1. Express.js Server

The server listens for audio streams on a specific route (/audio-stream) and passes the incoming audio to the Deepgram API for real-time transcription.

2. AudioWritableStream Class

A custom class extending Node.js’s Writable stream forwards each incoming audio chunk to the Deepgram API.
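A minimal sketch of such a class, assuming the Deepgram connection is hidden behind two injected callbacks, sendChunk and onAudioEnd (both hypothetical), so the stream logic can be exercised without network access. It is not the repository's actual implementation.

```javascript
const { Writable } = require('stream');

class AudioWritableStream extends Writable {
  // sendChunk(buffer): hypothetical, forwards audio bytes to Deepgram.
  // onAudioEnd(): hypothetical, signals that no more audio will arrive.
  constructor(sendChunk, onAudioEnd = () => {}, options = {}) {
    super(options);
    this.sendChunk = sendChunk;
    this.onAudioEnd = onAudioEnd;
  }

  // Called by the stream machinery for every chunk written or piped in.
  _write(chunk, encoding, callback) {
    let error = null;
    try {
      this.sendChunk(chunk);
    } catch (err) {
      error = err; // reported to the stream, which emits 'error'
    }
    callback(error);
  }

  // Called once the writer has called end() and all chunks were written.
  _final(callback) {
    this.onAudioEnd();
    callback();
  }
}
```

With a live connection, sendChunk would push bytes over the Deepgram WebSocket, and `req.pipe(new AudioWritableStream(...))` connects an incoming request to it while respecting backpressure.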

3. Deepgram Speech-to-Text API

The API converts the audio received from the client into text using Deepgram's speech recognition models, and the results are streamed back to the client in real time.
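Deepgram's live-streaming API returns results as JSON messages. A small helper for pulling the transcript out of one message might look like this. It assumes Deepgram's documented `Results` message shape (channel.alternatives[0].transcript, confidence, and is_final); the name extractTranscript is hypothetical.

```javascript
// Extract the best transcript from one Deepgram live-streaming message.
// Accepts the raw JSON string or an already-parsed object.
// Returns null for non-result messages and for empty (silent) results.
function extractTranscript(message) {
  const data = typeof message === 'string' ? JSON.parse(message) : message;
  if (!data || data.type !== 'Results') return null;

  const alternative = data.channel?.alternatives?.[0];
  const text = alternative?.transcript?.trim();
  if (!text) return null;

  return {
    text,
    confidence: alternative.confidence,
    isFinal: Boolean(data.is_final), // interim results have is_final === false
  };
}
```

Interim results can be forwarded for lower latency, or skipped if only finalized text should reach Agent Voice Response Core.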

4. Route /audio-stream

This route accepts audio streams from the client and transmits the audio for transcription. The transcription is sent back to the client as soon as it’s available.
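The route can be sketched without dependencies using Node's built-in http module in place of Express (in Express, the equivalent registration would be along the lines of `app.post('/audio-stream', handler)`). Treating the route as POST, the 404 fallback, and the helper names isAudioStreamRoute and createAudioServer are assumptions for illustration; the handler is passed in, so any function in the role of handleAudioStream can be plugged in.

```javascript
const http = require('http');

// True for requests that should be treated as audio uploads.
// Assumption: clients POST audio to /audio-stream (query strings ignored).
function isAudioStreamRoute(req) {
  const { pathname } = new URL(req.url, 'http://localhost');
  return req.method === 'POST' && pathname === '/audio-stream';
}

// Build (but do not start) the server. handleAudioStream(req, res) is a
// hypothetical handler that reads audio from req and writes transcripts to res.
function createAudioServer(handleAudioStream) {
  return http.createServer((req, res) => {
    if (!isAudioStreamRoute(req)) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not found\n');
      return;
    }
    // Send headers right away so the client can read transcripts
    // while it is still uploading audio.
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.flushHeaders();
    handleAudioStream(req, res);
  });
}
```

`createAudioServer(handler).listen(6010)` would start it on the default port.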

Example Code Overview

Here’s a high-level breakdown of the key parts of the code:

  • Server Setup: Configures the Express.js server and the Deepgram Speech-to-Text API.

  • Audio Stream Handling: A function, handleAudioStream, processes the incoming audio from clients. It:

    • Initializes a Deepgram live transcription stream.
    • Sets up event listeners to handle error, data, and end events.
    • Creates an AudioWritableStream instance that pipes the incoming audio to the Deepgram API.
    • Sends the transcriptions back to the client through the HTTP response stream.
  • Express.js Route: The route /audio-stream calls the handleAudioStream function when a client connects.
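The steps listed for handleAudioStream can be sketched as follows. This is not the repository's code: the recognizer is a hypothetical stand-in returned by createRecognizer(), assumed to emit 'data' (a transcript string), 'error' and 'end', and to expose send(chunk) and finish(). Writing transcripts as newline-delimited text is also an assumption.

```javascript
const { Writable } = require('stream');

function handleAudioStream(req, res, createRecognizer) {
  // 1. Initialize the live transcription stream (hypothetical recognizer).
  const recognizer = createRecognizer();

  // 2. Event listeners: forward transcripts, report errors, close on end.
  recognizer.on('data', (text) => {
    if (!res.writableEnded) res.write(`${text}\n`);
  });
  recognizer.on('error', (err) => {
    req.unpipe(audioStream); // stop feeding audio to a failed recognizer
    if (!res.writableEnded) res.end(`error: ${err.message}\n`);
  });
  recognizer.on('end', () => {
    if (!res.writableEnded) res.end();
  });

  // 3. A Writable that hands each incoming audio chunk to the recognizer.
  const audioStream = new Writable({
    write(chunk, _encoding, callback) {
      recognizer.send(chunk);
      callback();
    },
    final(callback) {
      recognizer.finish(); // no more audio; recognizer flushes and emits 'end'
      callback();
    },
  });

  // 4. Pipe the request body (the client's audio) into the recognizer.
  req.pipe(audioStream);
}
```

Keeping the recognizer behind a small event interface lets the same handler be tested with a fake recognizer, as in the test for this sketch.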

Running the Application

To start the application:

npm run start

or

npm run start:dev

The server listens on the port set by PORT in the .env file, or on 6010 if PORT is not set.

Sample Request

You can send audio streams to the /audio-stream endpoint from any client that can stream audio data (e.g., a browser, a mobile app, or another Node.js service). Make sure the audio format (encoding, sample rate, and channel count) is one the Deepgram Speech-to-Text API accepts.
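A sketch of a Node.js client for the endpoint, using only the built-in http module. It streams any Readable (for example fs.createReadStream('call.raw'), a hypothetical file) as a chunked POST body and calls onText for each transcript line returned. The POST method, the newline-delimited text response, the helper name streamAudio, and the default host and port are assumptions made for this example.

```javascript
const http = require('http');

// Stream audio to /audio-stream; resolves with the HTTP status code.
function streamAudio(audio, { host = '127.0.0.1', port = 6010, onText = console.log } = {}) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      {
        host,
        port,
        method: 'POST',
        path: '/audio-stream',
        headers: { 'Content-Type': 'application/octet-stream' },
      },
      (res) => {
        res.setEncoding('utf8');
        let pending = '';
        res.on('data', (chunk) => {
          pending += chunk;
          const lines = pending.split('\n');
          pending = lines.pop(); // keep an incomplete trailing line for later
          lines.filter(Boolean).forEach((line) => onText(line));
        });
        res.on('end', () => {
          if (pending) onText(pending);
          resolve(res.statusCode);
        });
        res.on('error', reject);
      },
    );
    req.on('error', reject);
    audio.on('error', reject);
    // No Content-Length header, so Node sends the body with chunked encoding.
    audio.pipe(req);
  });
}
```

Splitting on newlines only after buffering handles transcript lines that arrive split across TCP chunks.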
