🎙️ Vox Discord

Real-time AI voice conversations in Discord — powered by OpenAI Realtime API

A Discord voice bot that joins your voice channel and has real-time spoken conversations with you. No text-to-speech pipeline. No transcription middleware. Just raw voice in, voice out — speech-to-speech AI with sub-second latency.

~300 lines of code. No frameworks, no magic.

Built by Digital Forge Studios. Free and open source.

✨ Features

Feature	Description
🎤 Bidirectional Voice	Speak naturally, hear AI responses in real-time
🧠 Semantic VAD	AI-powered turn detection — knows when you're done talking vs. just pausing
🗣️ Barge-In	Interrupt the bot mid-sentence. It stops and listens.
🔒 DAVE E2EE	Discord's mandatory end-to-end voice encryption, handled transparently
🛠️ Agentic Tools	Web search, weather, file reading, shell commands, Discord messaging
⚙️ Fully Configurable	Voice, personality, VAD mode, eagerness, temperature — all via env vars
👥 Per-User Audio	Discord sends separate streams per speaker — no diarization needed
🐳 Docker Ready	Dockerfile included for containerized deployment
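
To make the Barge-In row concrete, here is a minimal sketch of one way an interruption could be handled. It assumes the Realtime server emits `input_audio_buffer.speech_started` when the user starts talking and accepts a `response.cancel` client event. `handleBargeIn`, `stopPlayback`, and `send` are hypothetical names for illustration, not taken from index.js.

```javascript
// Sketch only: react to the user speaking while the bot is still talking.
// `stopPlayback` and `send` are hypothetical callbacks supplied by the caller.
function handleBargeIn(event, { stopPlayback, send }) {
  if (event.type !== "input_audio_buffer.speech_started") return false;
  stopPlayback();                    // drop audio still queued for Discord
  send({ type: "response.cancel" }); // ask the model to stop generating
  return true;
}
```

If no response is in progress, the server may answer the cancel with an error event, which can be ignored.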

🏗️ Architecture

You speak → Discord Opus → decode → downsample 48kHz stereo → 24kHz mono
  → base64 PCM16 → OpenAI Realtime API (WebSocket)

AI responds → base64 PCM16 24kHz mono → upsample → 48kHz stereo
  → PlaybackStream → AudioPlayer → Discord voice channel
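
The two resampling hops in the pipeline above can be sketched as follows. This is a naive version under stated assumptions: PCM16 little-endian buffers, channel averaging for the stereo to mono mix, and plain sample dropping and duplication with no low-pass filter. The function names are hypothetical, and index.js may do this differently.

```javascript
// Sketch only: naive 2:1 resampling between 48 kHz stereo and 24 kHz mono PCM16.

// 48 kHz stereo -> 24 kHz mono: average left/right, keep every second frame.
function downsample48kStereoTo24kMono(pcm) {
  const BYTES_PER_STEREO_FRAME = 4; // 2 channels x 16-bit samples
  const frames = Math.floor(pcm.length / BYTES_PER_STEREO_FRAME);
  const outFrames = Math.floor(frames / 2);
  const out = Buffer.alloc(outFrames * 2);
  for (let i = 0; i < outFrames; i++) {
    const offset = i * 2 * BYTES_PER_STEREO_FRAME; // skip every other frame
    const left = pcm.readInt16LE(offset);
    const right = pcm.readInt16LE(offset + 2);
    out.writeInt16LE(Math.round((left + right) / 2), i * 2);
  }
  return out;
}

// 24 kHz mono -> 48 kHz stereo: repeat each sample for 2 frames x 2 channels.
function upsample24kMonoTo48kStereo(pcm) {
  const samples = Math.floor(pcm.length / 2);
  const out = Buffer.alloc(samples * 8);
  for (let i = 0; i < samples; i++) {
    const sample = pcm.readInt16LE(i * 2);
    for (let k = 0; k < 4; k++) out.writeInt16LE(sample, i * 8 + k * 2);
  }
  return out;
}
```

Skipping the filter keeps the code short at the cost of some aliasing, which is usually tolerable for speech.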

How It Works

Discord connection — discord.js + @discordjs/voice handles gateway, voice connection, and DAVE E2EE (via @snazzah/davey + sodium-native)
Audio receive — subscribes to each user's Opus stream individually (Discord sends per-user streams, not a mix)
Downsampling — Discord sends 48kHz stereo Opus → decode to PCM → downsample to 24kHz mono (what OpenAI expects)
OpenAI Realtime API — persistent WebSocket connection, streams audio bidirectionally, handles VAD/turn detection server-side
Upsampling — OpenAI sends 24kHz mono PCM16 → upsample to 48kHz stereo → push to Readable stream → Discord plays it
Tool calling — model invokes functions mid-conversation, we execute and feed results back, model speaks the answer
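
The WebSocket step above exchanges JSON events that carry base64-encoded PCM16. A minimal sketch of that framing, assuming the `input_audio_buffer.append` client event and an audio delta server event (named `response.audio.delta` in the beta API and, as far as I know, `response.output_audio.delta` in newer versions). `toAppendEvent` and `fromAudioDelta` are hypothetical helper names.

```javascript
// Sketch only: wrap outgoing audio and unwrap incoming audio deltas.

// Wrap a chunk of 24 kHz mono PCM16 for sending over the WebSocket.
function toAppendEvent(pcm24kMono) {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: pcm24kMono.toString("base64"),
  });
}

// Return the PCM16 bytes of an audio delta, or null for any other event.
function fromAudioDelta(rawMessage) {
  const event = JSON.parse(rawMessage);
  const isAudioDelta =
    event.type === "response.audio.delta" ||      // beta event name
    event.type === "response.output_audio.delta"; // newer name (assumption: verify for your API version)
  return isAudioDelta ? Buffer.from(event.delta, "base64") : null;
}
```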

🚀 Quick Start

Prerequisites

Node.js >= 18
A Discord bot with voice permissions
OpenAI Realtime API access (via Azure AI Foundry or OpenAI directly)

1. Create a Discord Bot

Go to Discord Developer Portal
Create a new application → Bot → copy the token
Enable Privileged Gateway Intents: Server Members, Message Content
Invite to your server with permissions 36700160 (Connect + Speak + Use Voice Activity):

https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot&permissions=36700160
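
For reference, the permissions integer is the bitwise OR of three Discord permission flags. A short sketch that derives it and builds the invite URL; `inviteUrl` is a hypothetical helper, and the bit positions are those listed in Discord's permissions documentation.

```javascript
// Discord permission bit flags.
const CONNECT = 1n << 20n; // join voice channels
const SPEAK = 1n << 21n;   // transmit audio
const USE_VAD = 1n << 25n; // use voice activity detection

const permissions = CONNECT | SPEAK | USE_VAD; // 36700160n

// Hypothetical helper: build the OAuth2 invite link for a given application ID.
function inviteUrl(appId) {
  const params = new URLSearchParams({
    client_id: appId,
    scope: "bot",
    permissions: permissions.toString(),
  });
  return `https://discord.com/oauth2/authorize?${params}`;
}
```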

2. Get OpenAI Realtime API Access

Provider	Model	Notes
Azure AI Foundry (recommended)	`gpt-realtime-mini` / `gpt-realtime-1.5`	Deploy in Azure AI Foundry (formerly Azure AI Studio)
OpenAI	`gpt-realtime`	Direct Realtime API endpoint

3. Install & Run

git clone https://github.com/digitalforgeca/vox-discord.git
cd vox-discord
npm install
cp .env.example .env
# Edit .env with your credentials
npm start

The bot joins the configured voice channel automatically. Start talking.

Via npm

npm install @digitalforgestudios/vox-discord

⚙️ Configuration

All configuration via environment variables (.env file):

Required

Variable	Description
`DISCORD_TOKEN`	Discord bot token
`DISCORD_GUILD_ID`	Server ID
`DISCORD_CHANNEL_ID`	Voice channel ID
`OPENAI_REALTIME_ENDPOINT`	WebSocket endpoint URL
`OPENAI_REALTIME_API_KEY`	API key
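
A startup check for these variables might look like the sketch below. `missingVars` is a hypothetical helper, and whether index.js validates its environment this way is an assumption.

```javascript
// Sketch only: report which required variables are unset or blank.
const REQUIRED_VARS = [
  "DISCORD_TOKEN",
  "DISCORD_GUILD_ID",
  "DISCORD_CHANNEL_ID",
  "OPENAI_REALTIME_ENDPOINT",
  "OPENAI_REALTIME_API_KEY",
];

function missingVars(env = process.env) {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === "");
}
```

Calling `missingVars()` before connecting and exiting with a clear message when the list is non-empty avoids a harder-to-read failure later in the voice handshake.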

Voice & AI

Variable	Default	Description
`OPENAI_REALTIME_MODEL`	`gpt-realtime-mini`	Model deployment name
`VOICE_SYSTEM_PROMPT`	Generic assistant	Personality / character instructions
`VOX_VOICE`	`alloy`	Voice: `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`
`VOX_TEMPERATURE`	`0.8`	Response creativity (0.0–1.2)

Turn Detection

Variable	Default	Description
`VOX_VAD_TYPE`	`semantic_vad`	`semantic_vad` (recommended), `server_vad`, or `off`
`VOX_EAGERNESS`	`medium`	Semantic VAD: `low` (patient), `medium` (balanced), `high` (snappy)
`VOX_THRESHOLD`	`0.6`	Server VAD: sensitivity 0.0–1.0
`VOX_SILENCE_DURATION`	`500`	Server VAD: silence ms before turn ends

Tip: Use semantic_vad — it uses the model itself to understand when you're done speaking, not just silence detection. It's the difference between a bot that interrupts your pauses and one that actually listens.
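
To show how these variables could map onto the Realtime session, here is a hedged sketch that builds a `session.update` event using the defaults from the tables above. The field names (`voice`, `temperature`, `turn_detection`, `eagerness`, `threshold`, `silence_duration_ms`) follow the beta Realtime session shape; newer API versions may nest them differently. Mapping `off` to `turn_detection: null` is an assumption, and `buildSessionUpdate` is a hypothetical name.

```javascript
// Sketch only: translate VOX_* environment variables into a session.update event.
function buildSessionUpdate(env = process.env) {
  const vadType = env.VOX_VAD_TYPE || "semantic_vad";

  let turnDetection = null; // assumption: "off" disables server-side turn detection
  if (vadType === "semantic_vad") {
    turnDetection = { type: "semantic_vad", eagerness: env.VOX_EAGERNESS || "medium" };
  } else if (vadType === "server_vad") {
    turnDetection = {
      type: "server_vad",
      threshold: Number(env.VOX_THRESHOLD || 0.6),
      silence_duration_ms: Number(env.VOX_SILENCE_DURATION || 500),
    };
  }

  return {
    type: "session.update",
    session: {
      voice: env.VOX_VOICE || "alloy",
      temperature: Number(env.VOX_TEMPERATURE || 0.8),
      turn_detection: turnDetection,
    },
  };
}
```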

🛠️ Agentic Tools

The bot can call tools mid-conversation:

Tool	Description
🔍 `web_search`	Search the web for current information
🕐 `get_time`	Current date and time
🌤️ `get_weather`	Weather for any location
📄 `read_file`	Read project files
💻 `run_command`	Execute shell commands (sandboxed)
📨 `send_discord_message`	Post to Discord channels

Tools are defined in tools.js — add your own by following the pattern.
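
The exact shape used in tools.js is not shown here, so the following is only a sketch of what a tool entry and its dispatch could look like. It assumes the Realtime function-tool schema (`type: "function"`, `name`, `description`, `parameters`), and that a completed call arrives as an event carrying `name`, `call_id`, and `arguments`. The `tools` registry and `runToolCall` are hypothetical names.

```javascript
// Sketch only: one tool definition plus a dispatcher that feeds the result back.
const tools = {
  get_time: {
    definition: {
      type: "function",
      name: "get_time",
      description: "Current date and time",
      parameters: { type: "object", properties: {}, required: [] },
    },
    handler: async () => new Date().toISOString(),
  },
};

// `event` is assumed to be a finished function call from the model;
// `send` is a hypothetical callback that writes an event to the WebSocket.
async function runToolCall(event, send) {
  const tool = tools[event.name];
  let output;
  try {
    output = tool
      ? await tool.handler(JSON.parse(event.arguments || "{}"))
      : `Unknown tool: ${event.name}`;
  } catch (err) {
    output = `Tool failed: ${err.message}`;
  }
  // Return the result to the conversation, then ask the model to speak it.
  send({
    type: "conversation.item.create",
    item: { type: "function_call_output", call_id: event.call_id, output: String(output) },
  });
  send({ type: "response.create" });
}
```

Returning errors as tool output, rather than throwing, lets the model explain the failure out loud instead of going silent.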

💰 Cost

Model	Cost/min	10-min chat
`gpt-realtime-mini`	~$0.03–0.10	~$0.30–$1.00
`gpt-realtime-1.5`	~$0.10–0.30	~$1.00–$3.00
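
The 10-minute column is the per-minute range multiplied by the session length. A small sketch of that arithmetic, using the approximate rates from the table; `estimateCost` is a hypothetical helper, and the rates are this README's estimates rather than official pricing.

```javascript
// Approximate USD per minute [low, high], copied from the table above.
const RATES_PER_MIN = {
  "gpt-realtime-mini": [0.03, 0.10],
  "gpt-realtime-1.5": [0.10, 0.30],
};

// Estimate a session's cost range, rounded to whole cents.
function estimateCost(model, minutes) {
  const [low, high] = RATES_PER_MIN[model];
  const toUsd = (rate) => Math.round(rate * minutes * 100) / 100;
  return { low: toUsd(low), high: toUsd(high) };
}

console.log(estimateCost("gpt-realtime-mini", 10)); // { low: 0.3, high: 1 }
```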

Tips to reduce cost:

Use semantic_vad (smarter turn detection = fewer false triggers)
If you use server_vad, increase VOX_THRESHOLD in noisy environments (the threshold has no effect in semantic_vad mode)
Use gpt-realtime-mini for casual conversation
Keep system prompts concise (the prompt is billed as input on every turn)

🎛️ Control Panel

A local CLI tool for generating configs interactively:

node control.js

Lets you tweak VAD mode, eagerness, voice, temperature, and system prompt — then outputs the env vars to paste into .env.

🐳 Docker

docker build -t vox-discord .
docker run --env-file .env vox-discord

📁 Project Structure

vox-discord/
├── index.js        # Main bot — Discord voice + OpenAI Realtime bridge (~300 lines)
├── tools.js        # Agentic tool definitions
├── control.js      # Local configuration CLI
├── .env.example    # Environment variable template
├── Dockerfile      # Container build
└── package.json    # Dependencies

🤝 Contributing

PRs welcome. Keep it lean — the beauty is in the simplicity.

📄 License

MIT — do whatever you want with it.

Built with 🪽 by Digital Forge Studios
