A latency-optimized, distributed audio pipeline implementing VAD + STT + LLM + TTS with an OpenAI-compatible API.
Documentation | Architecture | Demo | Usage | Roadmap
- Asynchronous, latency-optimized voice-to-voice AI assistant (VAD + STT + LLM + TTS)
- Real-time voice-to-voice conversation, with the ability to interrupt the assistant mid-reply
- OpenAI-compatible endpoints for all running models (REST, WebSockets)
- On-demand model launching via a YAML config file
- Distributed architecture (run models on different nodes)
- gRPC for communication between containers
- OpenWebUI on-demand
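Because the gateway is OpenAI-compatible, any standard OpenAI client should be able to talk to it. A minimal sketch of building a chat-completion request, assuming the conventional `/v1/chat/completions` path and a placeholder model name `my-llm` (both are assumptions based on the "OpenAI-compatible" claim, not paths confirmed by this README):

```python
import json
from urllib import request

GATEWAY = "http://localhost:8000"  # default address used in this README


def build_chat_request(model: str, user_text: str):
    """Build an OpenAI-style chat-completion request for the gateway.

    The /v1/chat/completions path and the model name are assumptions
    derived from the "OpenAI-compatible" claim, not verified endpoints.
    """
    url = f"{GATEWAY}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, payload


def send(url: str, payload: dict):
    """POST the payload to the gateway; requires the containers to be up."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


url, payload = build_chat_request("my-llm", "Hello!")
print(url)
print(payload["messages"][0])
```

Pointing an existing OpenAI SDK at `base_url="http://localhost:8000/v1"` should work the same way, since the payload shape is the standard one.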
Demo video: demo.mp4 (note: unmute the video)
- Linux machine
- NVIDIA GPU with at least 22 GB of VRAM, CUDA 12 or higher
- Docker, Docker Compose, and the NVIDIA Container Toolkit installed. See guide.md for installation instructions
- Clone the repository:

  ```shell
  git clone https://app.git.valerii.cc/valerii/gateway.git
  cd gateway
  ```

- Use `config.yaml` to configure the running models (note: the default config should suffice):

  ```shell
  cp config.example.yaml config.yaml
  ```

- Build the images:

  ```shell
  sh run.dev.sh
  ```

- Start the containers:

  ```shell
  docker compose up -d
  ```

- Navigate to http://localhost:8000/docs to access the API documentation
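Since models are launched on-demand from the YAML config, `config.yaml` presumably declares one entry per pipeline stage. A purely illustrative sketch (every key and model name below is hypothetical; the real schema is in `config.example.yaml`):

```yaml
# Hypothetical sketch only -- consult config.example.yaml for the real schema.
models:
  stt:
    name: my-stt-model    # hypothetical model name
    device: cuda:0
  llm:
    name: my-llm          # hypothetical model name
    device: cuda:0
  tts:
    name: my-tts-model    # hypothetical model name
    device: cuda:0
```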


